On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <stark@[EMAIL PROTECTED]> wrote:
> "Francisco Reyes" <lists@[EMAIL PROTECTED]> writes:
>
>> Is there any disadvantage to using "group by" to obtain a unique list?
>>
>> On a small dataset the difference was about 20%.
>>
>> Group by
>> HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual
>> time=76.641..85.167 rows=2890 loops=1)
Basically:
- If you process up to some percentage of your RAM worth of data, hashing
  is going to be a lot faster
- If the size of the hash grows larger than your RAM, hashing will fail
  miserably and sorting will be much faster since PG's disksort is really
  good
- GROUP BY knows this and acts accordingly
- DISTINCT doesn't know this, it only knows sorting, so it sorts
- If you need DISTINCT x ORDER BY x, sorting may be faster too (depending
  on the % of distinct rows)
- If you need DISTINCT ON, well, you're stuck with the Sort
- So, for the time being, you can replace DISTINCT with GROUP BY...
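
To make the last point concrete, here is a rough sketch of the rewrite. The table "items" and column "x" are made up for illustration and do not come from the original question, and the plans you get will depend on your version, statistics and work_mem, so treat this as something to try rather than a guaranteed result.

```sql
-- Hypothetical test table: 100000 rows, 1000 distinct values of x.
CREATE TEMP TABLE items (x integer);
INSERT INTO items SELECT i % 1000 FROM generate_series(1, 100000) AS g(i);
ANALYZE items;

-- DISTINCT form: per the points above, expect a sort-based plan here.
EXPLAIN ANALYZE SELECT DISTINCT x FROM items;

-- GROUP BY form: returns the same set of x values, but the planner is free
-- to choose HashAggregate or a sort-based plan depending on estimated size.
EXPLAIN ANALYZE SELECT x FROM items GROUP BY x;
```

Compare the top plan node and the actual time of the two EXPLAIN ANALYZE outputs. If the results must also come back ordered, add ORDER BY x to both queries before comparing, since that can tip the balance back toward sorting.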
-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

