Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: Group by mo...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 7 of 14 Topic 4002 of 4424
Post > Topic >>

Re: Group by more efficient than distinct?

by mark@[EMAIL PROTECTED] (Mark Mielke) Apr 21, 2008 at 07:50 PM

PFC wrote:
>     Actually, the memory used by the hash depends on the number of 
> distinct values, not the number of rows which are processed...
>     Consider :
>
> SELECT a GROUP BY a
> SELECT a,count(*) GROUP BY a
>
>     In both cases the hash only holds discinct values. So if you have 
> 1 million rows to process but only 10 distinct values of "a", the hash 
> will only contain those 10 values (and the counts), so it will be very 
> small and fast, it will absorb a huge seq scan without problem. If 
> however, you have (say) 100 million distinct values for a, using a 
> hash would be a bad idea. As usual, divide the size of your RAM by the 
> number of concurrent connections or something.
>     Note that "a" could be a column, several columns, anything, the 
> size of the hash will be pro****tional to the number of distinct 
> values, ie. the number of rows returned by the query, not the number 
> of rows processed (read) by the query. Same with hash joins etc, 
> that's why when you join a very small table to a large one Postgres 
> likes to use seq scan + hash join on the small table.

This surprises me - hash values are lossy, so it must still need to 
confirm against the real list of values, which at a minimum should 
require references to the rows to check against?

Is PostgreSQL doing something beyond my imagination? :-)

Cheers,
mark

-- 
Mark Mielke <mark@[EMAIL PROTECTED]
>


-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 14 Posts in Topic:
Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-17 23:46:08 
Re: Group by more efficient than distinct?
mlists@[EMAIL PROTECTED]   2008-04-18 09:25:04 
Re: Group by more efficient than distinct?
stark@[EMAIL PROTECTED]   2008-04-18 10:36:02 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-18 12:35:04 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-20 11:15:36 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-22 01:34:40 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-21 19:50:22 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-21 21:39:15 
Re: Group by more efficient than distinct?
lists@[EMAIL PROTECTED]   2008-04-20 11:12:10 
Re: Group by more efficient than distinct?
llonergan@[EMAIL PROTECTE  2008-04-20 22:35:58 
Re: Group by more efficient than distinct?
matthew@[EMAIL PROTECTED]  2008-04-22 11:34:23 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-22 08:01:20 
Re: Group by more efficient than distinct?
matthew@[EMAIL PROTECTED]  2008-04-22 13:22:20 
Re: Group by more efficient than distinct?
mark@[EMAIL PROTECTED] (  2008-04-22 09:04:30 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Dec 1 8:58:20 CST 2008.