Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Sql > Re: in() VS exi...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 1 Topic 3346 of 3799
Post > Topic >>

Re: in() VS exists()

by Bart.Degryse@[EMAIL PROTECTED] ("Bart Degryse") Mar 13, 2008 at 03:28 PM

--=__Part2B0DB8E4.0__=
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

how about
select count(distinct s.specimen_id) from specimens sp INNER JOIN
sequences s 
on s.specimen_id = sp.id;


>>> Julien <jcigar@[EMAIL PROTECTED]
> 2008-03-13 17:27 >>>
mmh no because it's a one to many relation (a specimen can have more
than one sequence) :

muridae=> select count(sp.id) from specimens sp INNER JOIN sequences s
on s.specimen_id = sp.id;
count 
-------
  1536
(1 row)

Time: 81.242 ms
muridae=> select count(sp.id) from specimens sp where sp.id in (select
specimen_id from sequences group by specimen_id);
count 
-------
  1431
(1 row)

Time: 81.736 ms
muridae=> 

(of course this is a bad example, because I could just do: select
count(specimen_id) from sequences group by specimen_id;, but in my
application I have more fields coming from specimens of course)

Julien

On Thu, 2008-03-13 at 15:12 +0100, Bart Degryse wrote:
> I think that just
> select count(sp.id) from specimens sp INNER JOIN sequences s on
> s.specimen_id = sp.id;
> should be enough
> 
> >>> Julien <jcigar@[EMAIL PROTECTED]
> 2008-03-13 17:10 >>>
> If I understood well the query plan, the planner optimize the
> IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) :
> 
> ->  Hash IN Join  (cost=240.95..4011.20 rows=1436 width=4) (actual
> time=93.971..201.908 rows=1431 loops=1)
>          Hash Cond: ("outer".id = "inner".specimen_id)
> 
> so I guess that :
> 
> select count(sp.id) from specimens sp where sp.id in (select
> specimen_id
> from sequences);
> 
> is almost the same as :
> 
> select count(sp.id) from specimens sp INNER JOIN (select specimen_id
> from sequences GROUP BY specimen_id) as foo on foo.specimen_id =
> sp.id;
> 
> ?
> 
> Thanks,
> Julien
> 
> On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote:
> > The chapter on indexes in the manual should give you a pretty good
> > idea on the why.
> > IN and EXISTS are not the only possibilities, you can also use
inner
> > or outer joins.
> > Which solution performs best depends on the data, the database
> > version, the available indexes, ...
> > 
> > >>> Julien <jcigar@[EMAIL PROTECTED]
> 2008-03-13 15:47 >>>
> > Hello,
> > 
> > Does anyone has an idea why sometimes:
> > - select ... where ... in (select ...)
> > is faster than :
> > - select ... where ... exists(select ...)
> > and sometimes it's the opposite ?
> > 
> > I had such a situation, I've pasted the queries on:
> > http://rafb.net/p/KXNZ6892.html
and http://rafb.net/p/jvo5DO38.html

> > 
> > It's running PostgreSQL 8.1 with an effective_cache_size of 30000.
> > 
> > specimens.id is the primary key and there are indexes on
> > sequences(specimen_id) and specimen_measurements(specimen_id)
> > 
> > Is there a general "rule" to know when to use the in() version and
> > when
> > to use the exists() version ? Is it true to say that the exists()
> > version is more scalable (with many rows) than the in() version
> (from
> > the little tests I made it seems the case) ?
> > 
> > Thanks,
> > Julien
> > 
> > -- 
> > Julien Cigar
> > Belgian Biodiversity Platform
> > http://www.biodiversity.be
( http://www.biodiversity.be/
)
> > Université Libre de Bruxelles (ULB)
> > Campus de la Plaine CP 257
> > Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> > Boulevard du Triomphe, entrée ULB 2
> > B-1050 Bruxelles
> > Mail: jcigar@[EMAIL PROTECTED]
 
> > @[EMAIL PROTECTED]
 http://biobel.biodiversity.be/person/show/471

> > Tel : 02 650 57 52
> > 
> > 
> > -- 
> > Sent via pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-sql

> > 
> -- 
> Julien Cigar
> Belgian Biodiversity Platform
> http://www.biodiversity.be
( http://www.biodiversity.be/
)
> Université Libre de Bruxelles (ULB)
> Campus de la Plaine CP 257
> Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> Boulevard du Triomphe, entrée ULB 2
> B-1050 Bruxelles
> Mail: jcigar@[EMAIL PROTECTED]
 
> @[EMAIL PROTECTED]
 http://biobel.biodiversity.be/person/show/471

> Tel : 02 650 57 52
> 
> 
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

> 
-- 
Julien Cigar
Belgian Biodiversity Platform
http://www.biodiversity.be
( http://www.biodiversity.be/
)
Université Libre de Bruxelles (ULB)
Campus de la Plaine CP 257
Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
Boulevard du Triomphe, entrée ULB 2
B-1050 Bruxelles
Mail: jcigar@[EMAIL PROTECTED]
 
@[EMAIL PROTECTED]
 http://biobel.biodiversity.be/person/show/471

Tel : 02 650 57 52


-- 
Sent via pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

--=__Part2B0DB8E4.0__=
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8">
<META content=3D"MSHTML 6.00.2800.1264" name=3DGENERATOR></HEAD>
<BODY style=3D"MARGIN: 4px 4px 1px; FONT: 10pt Tahoma">
<DIV>how about</DIV>
<DIV>select count(distinct s.specimen_id)&nbsp;from specimens sp INNER =
JOIN sequences s </DIV>
<DIV>on s.specimen_id =3D sp.id;<BR><BR><BR>&gt;&gt;&gt; Julien
&lt;jcigar@[EMAIL PROTECTED]
 2008-03-13 17:27 &gt;&gt;&gt;<BR>mmh no because it's a one =
to many relation (a specimen can have more<BR>than one sequence)
:<BR><BR>m=
uridae=3D&gt; select count(sp.id) from specimens sp INNER JOIN sequences =
s<BR>on s.specimen_id =3D sp.id;<BR>count <BR>-------<BR>&nbsp; 1536<BR>(1
=
row)<BR><BR>Time: 81.242 ms<BR>muridae=3D&gt; select count(sp.id) from =
specimens sp where sp.id in (select<BR>specimen_id from sequences group by
=
specimen_id);<BR>count <BR>-------<BR>&nbsp; 1431<BR>(1 row)<BR><BR>Time:
=
81.736 ms<BR>muridae=3D&gt; <BR><BR>(of course this is a bad example, =
because I could just do: select<BR>count(specimen_id) from sequences group
=
by specimen_id;, but in my<BR>application I have more fields coming from =
specimens of course)<BR><BR>Julien<BR><BR>On Thu, 2008-03-13 at 15:12 =
+0100, Bart Degryse wrote:<BR>&gt; I think that just<BR>&gt; select =
count(sp.id) from specimens sp INNER JOIN sequences s on<BR>&gt;
s.specimen=
_id =3D sp.id;<BR>&gt; should be enough<BR>&gt; <BR>&gt; &gt;&gt;&gt; =
Julien &lt;jcigar@[EMAIL PROTECTED]
 2008-03-13 17:10 &gt;&gt;&gt;<BR>&gt; If I
=
understood well the query plan, the planner optimize the<BR>&gt; IN(SELECT
=
....) version with a JOIN (line 19-20 of the first paste) :<BR>&gt; =
<BR>&gt; -&gt;&nbsp; Hash IN Join&nbsp; (cost=3D240.95..4011.20
rows=3D1436=
 width=3D4) (actual<BR>&gt; time=3D93.971..201.908 rows=3D1431
loops=3D1)<B=
R>&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Hash Cond: =
("outer".id =3D "inner".specimen_id)<BR>&gt; <BR>&gt; so I guess that =
:<BR>&gt; <BR>&gt; select count(sp.id) from specimens sp where sp.id in =
(select<BR>&gt; specimen_id<BR>&gt; from sequences);<BR>&gt; <BR>&gt; is =
almost the same as :<BR>&gt; <BR>&gt; select count(sp.id) from specimens =
sp INNER JOIN (select specimen_id<BR>&gt; from sequences GROUP BY =
specimen_id) as foo on foo.specimen_id =3D<BR>&gt; sp.id;<BR>&gt; <BR>&gt;
=
?<BR>&gt; <BR>&gt; Thanks,<BR>&gt; Julien<BR>&gt; <BR>&gt; On Thu, =
2008-03-13 at 14:46 +0100, Bart Degryse wrote:<BR>&gt; &gt; The chapter on
=
indexes in the manual should give you a pretty good<BR>&gt; &gt; idea on =
the why.<BR>&gt; &gt; IN and EXISTS are not the only possibilities, you =
can also use inner<BR>&gt; &gt; or outer joins.<BR>&gt; &gt; Which =
solution performs best depends on the data, the database<BR>&gt; &gt; =
version, the available indexes, ...<BR>&gt; &gt; <BR>&gt; &gt;
&gt;&gt;&gt;=
 Julien &lt;jcigar@[EMAIL PROTECTED]
 2008-03-13 15:47 &gt;&gt;&gt;<BR>&gt; =
&gt; Hello,<BR>&gt; &gt; <BR>&gt; &gt; Does anyone has an idea why =
sometimes:<BR>&gt; &gt; - select ... where ... in (select ...)<BR>&gt; =
&gt; is faster than :<BR>&gt; &gt; - select ... where ... exists(select =
....)<BR>&gt; &gt; and sometimes it's the opposite ?<BR>&gt; &gt; <BR>&gt;
=
&gt; I had such a situation, I've pasted the queries on:<BR>&gt; &gt; <A =
href=3D"http://rafb.net/p/KXNZ6892.html">http://rafb.net/p/KXNZ6892.html</A=
> and <A
href=3D"http://rafb.net/p/jvo5DO38.html">http://rafb.net/p/jvo5DO3=
8.html</A><BR>&gt; &gt; <BR>&gt; &gt; It's running PostgreSQL 8.1 with an
=
effective_cache_size of 30000.<BR>&gt; &gt; <BR>&gt; &gt; specimens.id is
=
the primary key and there are indexes on<BR>&gt; &gt;
sequences(specimen_id=
) and specimen_measurements(specimen_id)<BR>&gt; &gt; <BR>&gt; &gt; Is =
there a general "rule" to know when to use the in() version and<BR>&gt; =
&gt; when<BR>&gt; &gt; to use the exists() version ? Is it true to say =
that the exists()<BR>&gt; &gt; version is more scalable (with many rows) =
than the in() version<BR>&gt; (from<BR>&gt; &gt; the little tests I made =
it seems the case) ?<BR>&gt; &gt; <BR>&gt; &gt; Thanks,<BR>&gt; &gt; =
Julien<BR>&gt; &gt; <BR>&gt; &gt; -- <BR>&gt; &gt; Julien Cigar<BR>&gt; =
&gt; Belgian Biodiversity Platform<BR>&gt; &gt; <A
href=3D"http://www.biodi=
versity.be/">http://www.biodiversity.be</A><BR>&gt;
&gt; Universit=C3=A9 =
Libre de Bruxelles (ULB)<BR>&gt; &gt; Campus de la Plaine CP 257<BR>&gt; =
&gt; B=C3=A2timent NO, Bureau 4 N4 115C (Niveau 4)<BR>&gt; &gt; Boulevard
=
du Triomphe, entr=C3=A9e ULB 2<BR>&gt; &gt; B-1050 Bruxelles<BR>&gt; &gt;
=
Mail: jcigar@[EMAIL PROTECTED]
>&gt; &gt; @[EMAIL PROTECTED]
 <A
href=3D"http://biobel.biodi=
versity.be/person/show/471">http://biobel.biodiversity.be/person/show/471</=
A><BR>&gt; &gt; Tel : 02 650 57 52<BR>&gt; &gt; <BR>&gt; &gt; <BR>&gt; =
&gt; -- <BR>&gt; &gt; Sent via pgsql-sql mailing list
(pgsql-sql@[EMAIL PROTECTED]
)<BR>&gt; &gt; To make changes to your subscription:<BR>&gt; &gt; <A
=
href=3D"http://www.postgresql.org/mailpref/pgsql-sql">http://www.postgresql=
..org/mailpref/pgsql-sql</A><BR>&gt; &gt; <BR>&gt; -- <BR>&gt; Julien =
Cigar<BR>&gt; Belgian Biodiversity Platform<BR>&gt; <A
href=3D"http://www.b=
iodiversity.be/">http://www.biodiversity.be</A><BR>&gt;
Universit=C3=A9 =
Libre de Bruxelles (ULB)<BR>&gt; Campus de la Plaine CP 257<BR>&gt; =
B=C3=A2timent NO, Bureau 4 N4 115C (Niveau 4)<BR>&gt; Boulevard du =
Triomphe, entr=C3=A9e ULB 2<BR>&gt; B-1050 Bruxelles<BR>&gt; Mail: =
jcigar@[EMAIL PROTECTED]
>&gt; @[EMAIL PROTECTED]
 <A
href=3D"http://biobel.biodiversity.be/=
person/show/471">http://biobel.biodiversity.be/person/show/471</A><BR>&gt;
=
Tel : 02 650 57 52<BR>&gt; <BR>&gt; <BR>&gt; -- <BR>&gt; Sent via =
pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)<BR>&gt; To make changes
=
to your subscription:<BR>&gt; <A
href=3D"http://www.postgresql.org/mailpref=
/pgsql-sql">http://www.postgresql.org/mailpref/pgsql-sql</A><BR>&gt;
=
<BR>-- <BR>Julien Cigar<BR>Belgian Biodiversity Platform<BR><A
href=3D"http=
://www.biodiversity.be/">http://www.biodiversity.be</A><BR>Universit=C3=A9
=
Libre de Bruxelles (ULB)<BR>Campus de la Plaine CP 257<BR>B=C3=A2timent =
NO, Bureau 4 N4 115C (Niveau 4)<BR>Boulevard du Triomphe, entr=C3=A9e ULB
=
2<BR>B-1050 Bruxelles<BR>Mail: jcigar@[EMAIL PROTECTED]
>@[EMAIL PROTECTED]
 <A
href=3D"http=
://biobel.biodiversity.be/person/show/471">http://biobel.biodiversity.be/pe=
rson/show/471</A><BR>Tel : 02 650 57 52<BR><BR><BR>-- <BR>Sent via =
pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)<BR>To make changes to =
your subscription:<BR><A
href=3D"http://www.postgresql.org/mailpref/pgsql-s=
ql">http://www.postgresql.org/mailpref/pgsql-sql</A><BR></DIV></BODY></HTML=
>

--=__Part2B0DB8E4.0__=--
 




 1 Posts in Topic:
Re: in() VS exists()
Bart.Degryse@[EMAIL PROTE  2008-03-13 15:28:04 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Dec 1 21:32:04 CST 2008.