Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Novice > Re: Explain exp...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 4 of 12 Topic 3026 of 3322
Post > Topic >>

Re: Explain explained

by markus@[EMAIL PROTECTED] ("Markus Stocker") Mar 4, 2008 at 03:52 PM

Tom,

Thanks for your inputs, very enlightening!

On Tue, Mar 4, 2008 at 1:26 PM, Tom Lane <tgl@[EMAIL PROTECTED]
> wrote:
> "Markus Stocker" <markus@[EMAIL PROTECTED]
> writes:
>  > 1/ How does postgresql know that the constrain individual_name.name =
>  > 'http://www.University0.edu'
matches one row (which is in fact true)?
>
>  It doesn't *know* that, except in the special case where it sees
there's
>  a unique index on that column.

Indeed, there is a unique index on that column, sorry about that. In
fact, pg_stats says that there are no MCVs for that column
(obviously). So that explains a lot. I get

sel(individual_name.name = 'http://www.University0.edu')
= (1 - null_frac) / num_distinct
= (1 - 0) / 17174

rows
= rel_cardinality * sel
= 17174 * 1 / 17174

which I suppose explains too why the expected number of rows is 1.

> There's a new section in the 8.3 docs that gives some more details about
the estimation process:
>  http://www.postgresql.org/docs/8.3/static/planner-stats-details.html

Thanks for pointing to this, very well do***ented.

>  > Further, why expects postgresql
>  > for each operation a row size of 1?
>
>  It's not expecting more than one join partner at each step.  Again
>  that's just statistical.

Sounds reasonable. The first constrain (i.e. individual_name.name)
returns (estimated) 1 row (see above) hence successive joins can't
have more than one.

>  > 2/ Sequential scans seem to me more expensive compared to index
scans.
>  > I'm wondering why the sequential scan on individual_name is the first
>  > executed in the plan.
>
>  I was wondering that too; it looks like it should be a candidate for an
>  index search.  Datatype problem maybe?  Again, you've not shown us the
>  table definitions...

I guess this is explained too now, at least partially. If I drop the
index on individual_name.name there is still pg_stats telling that
there are no MCVs. Hence, the expected number of rows doesn't change
(even after analyze, the values simply are unique). Not sure how the
sequential scan influences the overall costs compared to the index
scan, though.

The schema for individual_name relation is

 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 name   | text    | not null
Indexes:
    "individual_name_pkey" PRIMARY KEY, btree (id)
    "individual_name_name_idx" UNIQUE, btree (name)

whereas the schema for the concept_assertion relation is

   Column   |  Type   | Modifiers
------------+---------+-----------
 concept    | integer | not null
 individual | integer | not null
Indexes:
    "concept_assertion_pkey" PRIMARY KEY, btree (concept, individual)
    "concept_assertion_concept_idx" btree (concept)
Foreign-key constraints:
    "concept_assertion_concept_fkey" FOREIGN KEY (concept) REFERENCES
tbox_name(id)
    "concept_assertion_individual_fkey" FOREIGN KEY (individual)
REFERENCES individual_name(id)


>  > 3/ There is one condition in the query, i.e.
concept_assertion.concept
>  > = 5 with an empty result set, i.e. selectivity 0. In fact, the last
>  > index scan on concept_assertion ca_1 in the plan is never executed
>  > (this is what 'explain analyze' tells me). I'm wondering, why this
>  > constrain is not executed first. By executing this first, we could
>  > just never execute everything else.
>
>  Postgres never uses a rowcount estimate of less than one row,
>  so it's not going to try to optimize that way.  There are various
>  rationales behind that choice, but the main one is that we don't
>  trust the statistics unreservedly.  The odds that an estimate of zero
>  is more accurate than an estimate of one just aren't good enough,
>  and the likelihood of generating a really awful plan if we did believe
>  zero rows is too high.

Agreed, makes a lot of sense. If I try to calculate the estimated rows
for concept_assertion.concept = 5 (i.e. the constrain which
effectively returns an empty result set) I get

sel(concept_assertion.concept = 5)
= (1 - sum(mvf)) / (n_distinct - num_mcv)
= 0.021545454

rows
= rel_cardinality * sel
= 18128 * 0.021545454
= (rounded) 391

Clearly, at least by inspecting the two constraints individually, i.e.
the individual_name.name = 'http://www.University0.edu'
and the
concept_assertion.concept = 5 constrain, the optimizer chooses the
first as it returns less expected rows, i.e. 1 vs. 391.

Hope the explanations sound somewhat reasonable.

Thanks again,
markus

--
Sent via pgsql-novice mailing list (pgsql-novice@[EMAIL PROTECTED]
)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-novice
 




 12 Posts in Topic:
Explain explained
markus@[EMAIL PROTECTED]   2008-03-04 09:53:32 
Re: Explain explained
rdeleonp@[EMAIL PROTECTED  2008-03-04 10:21:07 
Re: Explain explained
tgl@[EMAIL PROTECTED] (T  2008-03-04 13:26:17 
Re: Explain explained
markus@[EMAIL PROTECTED]   2008-03-04 15:52:30 
Re: Explain explained
tgl@[EMAIL PROTECTED] (T  2008-03-04 17:30:31 
Re: Explain explained
markus@[EMAIL PROTECTED]   2008-03-04 17:59:45 
numeric definition advice
coder@[EMAIL PROTECTED]   2008-03-07 11:40:55 
Re: numeric definition advice
tgl@[EMAIL PROTECTED] (T  2008-03-07 08:37:30 
Re: numeric definition advice
coder@[EMAIL PROTECTED]   2008-03-07 19:49:04 
warnings restoring a db
coder@[EMAIL PROTECTED]   2008-04-09 08:59:48 
Re: warnings restoring a db
tgl@[EMAIL PROTECTED] (T  2008-04-09 10:43:53 
Re: warnings restoring a db
coder@[EMAIL PROTECTED]   2008-04-09 16:52:00 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Fri Dec 5 4:07:28 CST 2008.