Re: Understanding histograms

by lenshap@[EMAIL PROTECTED] ("Len Shapiro") Apr 29, 2008 at 11:32 PM

Tom,

Thank you for your prompt reply.

On Tue, Apr 29, 2008 at 10:19 PM, Tom Lane <tgl@[EMAIL PROTECTED]> wrote:
> Len Shapiro <len@[EMAIL PROTECTED]> writes:
>  > 1. Why does Postgres come up with a negative n_distinct?
>
>  It's a fractional representation.  Per the docs:
>
>  > stadistinct   float4          The number of distinct nonnull data
>  > values in the column. A value greater than zero is the actual number of
>  > distinct values. A value less than zero is the negative of a fraction of
>  > the number of rows in the table (for example, a column in which values
>  > appear about twice on the average could be represented by stadistinct =
>  > -0.5). A zero value means the number of distinct values is unknown.

I asked about n_distinct, whose documentation reads in part "The
negated form is used when ANALYZE believes that the number of distinct
values is likely to increase as the table grows", and I asked why
ANALYZE believes that the number of distinct values is likely to
increase.  I'm unclear why you quoted to me the documentation on
stadistinct.
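
For reference, the sign convention in the quoted documentation can be written out as a small Python sketch. The function name and the row counts are hypothetical, chosen only for illustration; nothing here is taken from the PostgreSQL source.

```python
def estimated_distinct(n_distinct: float, row_count: int) -> float:
    """Turn a stored n_distinct/stadistinct value into an absolute count.

    Hypothetical helper: it only restates the rule from the quoted docs.
    """
    if n_distinct > 0:
        # Positive: the value is the actual number of distinct values.
        return n_distinct
    if n_distinct < 0:
        # Negative: the magnitude is a fraction of the number of rows.
        return -n_distinct * row_count
    # Zero: the docs say the number of distinct values is unknown.
    raise ValueError("n_distinct = 0 means the number of distinct values is unknown")
```

With the docs' own example of -0.5, a 1000-row table would be treated as having 500 distinct values, and the same stored value would give 1000 once the table holds 2000 rows. That scaling is what the negated form expresses.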
>
>
>  > The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
>  > sense to me for other values of const not in the MCV list.
>  > For example, if I run the query
>  > EXPLAIN SELECT * from sailors where rank = -1000;
>  > Postgres still gives an estimate of "rows=2".
>
>  I'm not sure what estimate you'd expect instead?

Instead I would expect an estimate of "rows=0" for values of const
that are not in the MCV list and not in the histogram.  When the
histogram has fewer than the maximum number of entries, implying (I am
guessing here) that all non-MCV values are in the histogram list, this
seems like a simple strategy and has the virtue of being accurate.
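
The strategy proposed above could be sketched roughly as follows. This is only an illustration of the proposal, not of how Postgres behaves: the function, its parameters, and the sample statistics are all hypothetical, and it depends on the guess stated above that a short histogram lists every non-MCV value.

```python
def proposed_eq_rows(const, mcv_rows, histogram_bounds,
                     max_histogram_entries, fallback_rows):
    """Row estimate for `column = const` under the proposed strategy.

    mcv_rows: dict mapping each most-common value to its estimated rows.
    histogram_bounds: list of values stored in the histogram.
    fallback_rows: whatever estimate the planner would otherwise produce.
    All names are hypothetical.
    """
    if const in mcv_rows:
        return mcv_rows[const]
    # ASSUMPTION (a guess, per the text): a histogram with fewer than the
    # maximum number of entries contains every non-MCV value.
    histogram_is_complete = len(histogram_bounds) < max_histogram_entries
    if histogram_is_complete and const not in histogram_bounds:
        return 0
    return fallback_rows
```

With made-up statistics such as `mcv_rows={9: 10, 7: 8}`, `histogram_bounds=[1, 5]` and a maximum of 10 entries, a lookup of -1000 would return 0, while 1 or 5 would still return the fallback estimate.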

Where in the source is the code that manipulates the histogram?

> The code has a built in
>  assumption that no value not present in the MCV list can be more
>  frequent than the last member of the MCV list, so it's definitely not
>  gonna guess *more* than 2.

That's interesting.  Where is this in the source code?
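
To make the assumption Tom describes concrete, here is a minimal sketch of an estimate for a constant that is not in the MCV list, capped at the frequency of the last MCV entry. It is not copied from the Postgres source and does not say where that logic lives. The function name, the even spread of leftover frequency over the remaining distinct values, and the sample numbers are assumptions made for illustration.

```python
def non_mcv_eq_selectivity(mcv_freqs, n_distinct, null_frac=0.0):
    """Selectivity guess for `column = const` when const is not an MCV.

    mcv_freqs: MCV frequencies ordered from most to least common.
    n_distinct: absolute number of distinct values in the column.
    Hypothetical sketch; not the actual planner code.
    """
    # Frequency mass left over once the MCVs and NULLs are accounted for.
    remaining = 1.0 - sum(mcv_freqs) - null_frac
    # ASSUMPTION: spread that mass evenly over the other distinct values.
    other_distinct = n_distinct - len(mcv_freqs)
    sel = remaining / other_distinct if other_distinct > 1 else remaining
    # The cap from the text: a non-MCV value is assumed to be no more
    # frequent than the last (least common) member of the MCV list.
    if mcv_freqs:
        sel = min(sel, mcv_freqs[-1])
    return sel
```

For example, with hypothetical MCV frequencies `[0.25, 0.125]` and 4 distinct values, the even spread would give 0.3125, so the cap brings the estimate down to 0.125.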

Thanks for all your help.

All the best,

Len Shapiro

>                         regards, tom lane
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 



