--=__PartB1972271.0__=
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
I think that just
select count(sp.id) from specimens sp INNER JOIN sequences s on
s.specimen_id = sp.id;
should be enough
>>> Julien <jcigar@[EMAIL PROTECTED]
> 2008-03-13 17:10 >>>
If I understood well the query plan, the planner optimize the
IN(SELECT ...) version with a JOIN (line 19-20 of the first paste) :
-> Hash IN Join (cost=240.95..4011.20 rows=1436 width=4) (actual
time=93.971..201.908 rows=1431 loops=1)
Hash Cond: ("outer".id = "inner".specimen_id)
so I guess that :
select count(sp.id) from specimens sp where sp.id in (select
specimen_id
from sequences);
is almost the same as :
select count(sp.id) from specimens sp INNER JOIN (select specimen_id
from sequences GROUP BY specimen_id) as foo on foo.specimen_id =
sp.id;
?
Thanks,
Julien
On Thu, 2008-03-13 at 14:46 +0100, Bart Degryse wrote:
> The chapter on indexes in the manual should give you a pretty good
> idea on the why.
> IN and EXISTS are not the only possibilities, you can also use inner
> or outer joins.
> Which solution performs best depends on the data, the database
> version, the available indexes, ...
>
> >>> Julien <jcigar@[EMAIL PROTECTED]
> 2008-03-13 15:47 >>>
> Hello,
>
> Does anyone has an idea why sometimes:
> - select ... where ... in (select ...)
> is faster than :
> - select ... where ... exists(select ...)
> and sometimes it's the opposite ?
>
> I had such a situation, I've pasted the queries on:
> http://rafb.net/p/KXNZ6892.html
and http://rafb.net/p/jvo5DO38.html
>
> It's running PostgreSQL 8.1 with an effective_cache_size of 30000.
>
> specimens.id is the primary key and there are indexes on
> sequences(specimen_id) and specimen_measurements(specimen_id)
>
> Is there a general "rule" to know when to use the in() version and
> when
> to use the exists() version ? Is it true to say that the exists()
> version is more scalable (with many rows) than the in() version
(from
> the little tests I made it seems the case) ?
>
> Thanks,
> Julien
>
> --
> Julien Cigar
> Belgian Biodiversity Platform
> http://www.biodiversity.be
( http://www.biodiversity.be/
)
> Université Libre de Bruxelles (ULB)
> Campus de la Plaine CP 257
> Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
> Boulevard du Triomphe, entrée ULB 2
> B-1050 Bruxelles
> Mail: jcigar@[EMAIL PROTECTED]
> @[EMAIL PROTECTED]
http://biobel.biodiversity.be/person/show/471
> Tel : 02 650 57 52
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
--
Julien Cigar
Belgian Biodiversity Platform
http://www.biodiversity.be
( http://www.biodiversity.be/
)
Université Libre de Bruxelles (ULB)
Campus de la Plaine CP 257
Bâtiment NO, Bureau 4 N4 115C (Niveau 4)
Boulevard du Triomphe, entrée ULB 2
B-1050 Bruxelles
Mail: jcigar@[EMAIL PROTECTED]
@[EMAIL PROTECTED]
http://biobel.biodiversity.be/person/show/471
Tel : 02 650 57 52
--
Sent via pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
--=__PartB1972271.0__=
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8">
<META content=3D"MSHTML 6.00.2800.1264" name=3DGENERATOR></HEAD>
<BODY style=3D"MARGIN: 4px 4px 1px; FONT: 10pt Tahoma">
<DIV>I think that just</DIV>
<DIV>select count(sp.id) from specimens sp INNER JOIN sequences s on
=
s.specimen_id =3D sp.id;<BR>should be enough<BR><BR>>>> Julien =
<jcigar@[EMAIL PROTECTED]
2008-03-13 17:10 >>><BR>If I understood
=
well the query plan, the planner optimize the<BR>IN(SELECT ...) version =
with a JOIN (line 19-20 of the first paste) :<BR><BR>-> Hash IN =
Join (cost=3D240.95..4011.20 rows=3D1436 width=3D4) (actual<BR>time=
=3D93.971..201.908 rows=3D1431
loops=3D1)<BR> =
Hash Cond: ("outer".id =3D
"inner".specimen_id)<BR><BR>s=
o I guess that :<BR><BR>select count(sp.id) from specimens sp where sp.id
=
in (select specimen_id<BR>from sequences);<BR><BR>is almost the same as =
:<BR><BR>select count(sp.id) from specimens sp INNER JOIN (select =
specimen_id<BR>from sequences GROUP BY specimen_id) as foo on
foo.specimen_=
id =3D sp.id;<BR><BR>?<BR><BR>Thanks,<BR>Julien<BR><BR>On Thu, 2008-03-13
=
at 14:46 +0100, Bart Degryse wrote:<BR>> The chapter on indexes in the
=
manual should give you a pretty good<BR>> idea on the why.<BR>> IN =
and EXISTS are not the only possibilities, you can also use inner<BR>>
=
or outer joins.<BR>> Which solution performs best depends on the data,
=
the database<BR>> version, the available indexes, ...<BR>> <BR>>
=
>>> Julien <jcigar@[EMAIL PROTECTED]
2008-03-13 15:47
>>><=
BR>> Hello,<BR>> <BR>> Does anyone has an idea why
sometimes:<BR>&=
gt; - select ... where ... in (select ...)<BR>> is faster than =
:<BR>> - select ... where ... exists(select ...)<BR>> and sometimes
=
it's the opposite ?<BR>> <BR>> I had such a situation, I've pasted =
the queries on:<BR>> <A
href=3D"http://rafb.net/p/KXNZ6892.html">http://=
rafb.net/p/KXNZ6892.html</A> and <A
href=3D"http://rafb.net/p/jvo5DO38.html=
">http://rafb.net/p/jvo5DO38.html</A><BR>>
<BR>> It's running =
PostgreSQL 8.1 with an effective_cache_size of 30000.<BR>> <BR>> =
specimens.id is the primary key and there are indexes on<BR>>
sequences(=
specimen_id) and specimen_measurements(specimen_id)<BR>> <BR>> Is =
there a general "rule" to know when to use the in() version and<BR>> =
when<BR>> to use the exists() version ? Is it true to say that the =
exists()<BR>> version is more scalable (with many rows) than the in() =
version (from<BR>> the little tests I made it seems the case) ?<BR>>
=
<BR>> Thanks,<BR>> Julien<BR>> <BR>> -- <BR>> Julien =
Cigar<BR>> Belgian Biodiversity Platform<BR>> <A
href=3D"http://www.b=
iodiversity.be/">http://www.biodiversity.be</A><BR>>
Universit=C3=A9 =
Libre de Bruxelles (ULB)<BR>> Campus de la Plaine CP 257<BR>> =
B=C3=A2timent NO, Bureau 4 N4 115C (Niveau 4)<BR>> Boulevard du =
Triomphe, entr=C3=A9e ULB 2<BR>> B-1050 Bruxelles<BR>> Mail: =
jcigar@[EMAIL PROTECTED]
>> @[EMAIL PROTECTED]
<A
href=3D"http://biobel.biodiversity.be/=
person/show/471">http://biobel.biodiversity.be/person/show/471</A><BR>>
=
Tel : 02 650 57 52<BR>> <BR>> <BR>> -- <BR>> Sent via =
pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)<BR>> To make changes
=
to your subscription:<BR>> <A
href=3D"http://www.postgresql.org/mailpref=
/pgsql-sql">http://www.postgresql.org/mailpref/pgsql-sql</A><BR>>
=
<BR>-- <BR>Julien Cigar<BR>Belgian Biodiversity Platform<BR><A
href=3D"http=
://www.biodiversity.be/">http://www.biodiversity.be</A><BR>Universit=C3=A9
=
Libre de Bruxelles (ULB)<BR>Campus de la Plaine CP 257<BR>B=C3=A2timent =
NO, Bureau 4 N4 115C (Niveau 4)<BR>Boulevard du Triomphe, entr=C3=A9e ULB
=
2<BR>B-1050 Bruxelles<BR>Mail: jcigar@[EMAIL PROTECTED]
>@[EMAIL PROTECTED]
<A
href=3D"http=
://biobel.biodiversity.be/person/show/471">http://biobel.biodiversity.be/pe=
rson/show/471</A><BR>Tel : 02 650 57 52<BR><BR><BR>-- <BR>Sent via =
pgsql-sql mailing list (pgsql-sql@[EMAIL PROTECTED]
)<BR>To make changes to =
your subscription:<BR><A
href=3D"http://www.postgresql.org/mailpref/pgsql-s=
ql">http://www.postgresql.org/mailpref/pgsql-sql</A><BR></DIV></BODY></HTML=
>
--=__PartB1972271.0__=--


|