Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: Group by mo...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 10 of 14 Topic 4002 of 4361
Post > Topic >>

Re: Group by more efficient than distinct?

by llonergan@[EMAIL PROTECTED] (Luke Lonergan) Apr 20, 2008 at 10:35 PM

> This message is in MIME format. Since your mail reader does not
understand
this format, some or all of this message may not be legible.

--B_3291575759_709166
Content-type: text/plain;
	charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

Hi Francisco,

Generally, PG sorting is much slower than hash aggregation for performing
the distinct operation.  There may be small sizes where this isn=B9t true,
bu=
t
for large amounts of data (in-memory or not), hash agg (used most often,
bu=
t
not always by GROUP BY) is faster.

We=B9ve implemented a special optimization to PG sorting that does the
distinct processing within the sort, instead of afterward, but it=B9s
limited
to some small-ish number (10,000) of distinct values due to it=B9s use of
a
memory and processing intensive heap.

So, you=B9re better off using GROUP BY and making sure that the planner is
using hash agg to do the work.

- Luke=20


On 4/17/08 8:46 PM, "Francisco Reyes" <lists@[EMAIL PROTECTED]
> wrote:

> I am trying to get a distinct set of rows from 2 tables.
> After looking at someone else's query I noticed they were doing a group
b=
y
> to obtain the unique list.
>=20
> After comparing on multiple machines with several tables, it seems using
> group by to obtain a distinct list is substantially faster than using
> select distinct.
>=20
> Is there any dissadvantage of using "group by" to obtain a unique list?
>=20
> On a small dataset the difference was about 20% percent.
>=20
> Group by
>  HashAggregate  (cost=3D369.61..381.12 rows=3D1151 width=3D8) (actual
> time=3D76.641..85.167 rows=3D2890 loops=3D1)
>=20
> Distinct
>  Unique  (cost=3D1088.23..1174.53 rows=3D1151 width=3D8) (actual
> time=3D90.516..140.123 rows=3D2890 loops=3D1)
>=20
> Although I don't have the numbers here with me, a simmilar result was
> obtaining against a query that would return 100,000 rows. 20% and more
> speed differnce between "group by" over "select distinct".
>=20
> --
> Sent via pgsql-performance mailing list
(pgsql-performance@[EMAIL PROTECTED]
)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>=20


--B_3291575759_709166
Content-type: text/html;
	charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

<HTML>
<HEAD>
<TITLE>Re: [PERFORM] Group by more efficient than distinct?</TITLE>
</HEAD>
<BODY>
<FONT SIZE=3D"4"><FONT FACE=3D"Verdana, Helvetica, Arial"><SPAN
STYLE=3D'font-siz=
e:14pt'>Hi Francisco,<BR>
<BR>
Generally, PG sorting is much slower than hash aggregation for performing
t=
he distinct operation. &nbsp;There may be small sizes where this
isn&#8217;t=
 true, but for large amounts of data (in-memory or not), hash agg (used
most=
 often, but not always by GROUP BY) is faster.<BR>
<BR>
We&#8217;ve implemented a special optimization to PG sorting that does the
=
distinct processing within the sort, instead of afterward, but it&#8217;s
li=
mited to some small-ish number (10,000) of distinct values due to
it&#8217;s=
 use of a memory and processing intensive heap.<BR>
<BR>
So, you&#8217;re better off using GROUP BY and making sure that the
planner=
 is using hash agg to do the work.<BR>
<BR>
- Luke <BR>
<BR>
<BR>
On 4/17/08 8:46 PM, &quot;Francisco Reyes&quot;
&lt;lists@[EMAIL PROTECTED]
 wrote:<BR>
<BR>
</SPAN></FONT></FONT><BLOCKQUOTE><FONT SIZE=3D"4"><FONT FACE=3D"Verdana,
Helvet=
ica, Arial"><SPAN STYLE=3D'font-size:14pt'>I am trying to get a distinct
set o=
f rows from 2 tables.<BR>
After looking at someone else's query I noticed they were doing a group
by<=
BR>
to obtain the unique list.<BR>
<BR>
After comparing on multiple machines with several tables, it seems
using<BR=
>
group by to obtain a distinct list is substantially faster than using<BR>
select distinct.<BR>
<BR>
Is there any dissadvantage of using &quot;group by&quot; to obtain a
unique=
 list?<BR>
<BR>
On a small dataset the difference was about 20% percent.<BR>
<BR>
Group by<BR>
&nbsp;HashAggregate &nbsp;(cost=3D369.61..381.12 rows=3D1151 width=3D8)
(actual<B=
R>
time=3D76.641..85.167 rows=3D2890 loops=3D1)<BR>
<BR>
Distinct<BR>
&nbsp;Unique &nbsp;(cost=3D1088.23..1174.53 rows=3D1151 width=3D8)
(actual<BR>
time=3D90.516..140.123 rows=3D2890 loops=3D1)<BR>
<BR>
Although I don't have the numbers here with me, a simmilar result was<BR>
obtaining against a query that would return 100,000 rows. 20% and more<BR>
speed differnce between &quot;group by&quot; over &quot;select
distinct&quo=
t;. &nbsp;<BR>
<BR>
--<BR>
Sent via pgsql-performance mailing list
(pgsql-performance@[EMAIL PROTECTED]
)<=
BR>
To make changes to your subscription:<BR>
<a
href=3D"http://www.postgresql.org/mailpref/pgsql-performance">http://www.p=
ostgresql.org/mailpref/pgsql-performance</a><BR>
<BR>
</SPAN></FONT></FONT></BLOCKQUOTE>
</BODY>
</HTML>


--B_3291575759_709166--
 




 14 Posts in Topic:
Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-17 23:46:08 
Re: Group by more efficient than distinct?
mlists@[EMAIL PROTECTED]   2008-04-18 09:25:04 
Re: Group by more efficient than distinct?
stark@[EMAIL PROTECTED]   2008-04-18 10:36:02 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-18 12:35:04 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-20 11:15:36 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-22 01:34:40 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-21 19:50:22 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-21 21:39:15 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-20 11:12:10 
Re: Group by more efficient than distinct?
llonergan@[EMAIL PROTECTE  2008-04-20 22:35:58 
Re: Group by more efficient than distinct?
matthew@[EMAIL PROTECTED]  2008-04-22 11:34:23 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-22 08:01:20 
Re: Group by more efficient than distinct?
matthew@[EMAIL PROTECTED]  2008-04-22 13:22:20 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-22 09:04:30 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Wed Oct 15 20:52:05 CDT 2008.