Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Bugs > Re: BUG #4264: ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 4 of 4 Topic 3876 of 4088
Post > Topic >>

Re: BUG #4264: Optimizer fails to use hash_aggregate when appropriate.

by tgl@[EMAIL PROTECTED] (Tom Lane) Jun 26, 2008 at 11:47 AM

"Scott Carey" <scott@[EMAIL PROTECTED]
> writes:
> 2. The query planner does not estimate the number of returned rows
> appropriately when table inheritance / partitioning is involved, leading
to
> poor choices for aggregation.  Here are two explains that demonstrate
this.
> The parent table has 0 rows, estimated as 1 row.  Ideally it should not
> affect the query plan.

One extra row is certainly not going to affect that plan materially.

It looks to me like the problem is the row *width* estimate --- the
parent table is getting what is probably a default estimate, and IIRC
the append relation just takes on the widest of the input widths.
We could probably do something a bit smarter there; maybe weight the
widths according to number of rows in each appended table?

The reason this matters is presumably that with the overly wide width,
the planner thinks the hash table wouldn't fit in work_mem.

> Thanks to your insight, ran some tests and demonstrated the vast
difference
> in query plans resulting from the two queries (on a large table , on a
> column with few values) of the form:
> SELECT column from TABLE GROUP BY column;   (uses hash_aggregate)
> SELECT DISTINCT column from TABLE;  (full sort, then aggregate)

Yeah, the SELECT DISTINCT code is old and crufty and tightly intertwined
with ORDER BY, which means that it's always implemented by sorting,
whereas with GROUP BY we can consider both implementations.
Sometime we need to rewrite all that; but it's hard to see how to do it
without breaking DISTINCT ON.  The latter seems like it *needs* to be
intertwined with sorting ...

>> DISTINCT forces a sort, so there wouldn't be any advantage.

> I contend that this is false.  Example:  Hash_aggregate from 25 million
rows
> to 10000 unique rows, then sort the result.

You need to go back and think again about what it means to have a
DISTINCT aggregate (which is unrelated to SELECT DISTINCT, at least
from an implementation standpoint) inside a GROUP BY.  We could
conceivably do it without any sorting if there were a separate hash
aggregation table set up for each GROUP BY group, but the odds of
running out of memory are much too great for my taste.  Estimating
the size of the hash tables would depend on guessing how many distinct
values of one variable are associated with each value of another
variable, and it's hard enough to know even how many distinct values
there are overall let alone what the correlation is.  I think the
only safe way would be to assume that each per-group hash table would
reach maximum size, which would probably end up with the planner
never choosing the approach at all anyway.

			regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
 




 4 Posts in Topic:
BUG #4264: Optimizer fails to use hash_aggregate when appropriat
scott@[EMAIL PROTECTED]   2008-06-25 18:00:33 
Re: BUG #4264: Optimizer fails to use hash_aggregate when approp
tgl@[EMAIL PROTECTED] (T  2008-06-25 16:55:55 
Re: BUG #4264: Optimizer fails to use hash_aggregate when approp
scott@[EMAIL PROTECTED]   2008-06-25 19:40:23 
Re: BUG #4264: Optimizer fails to use hash_aggregate when approp
tgl@[EMAIL PROTECTED] (T  2008-06-26 11:47:57 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Oct 11 12:08:41 CDT 2008.