Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: large table...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 10 of 15 Topic 3978 of 4424
Post > Topic >>

Re: large tables and simple "= constant" queries using indexes

by lists@[EMAIL PROTECTED] (PFC) Apr 10, 2008 at 11:37 PM

> Thanks a lot, all of you - this is excellent advice. With the data=20=20
> clustered and statistics at a more reasonable value of 100, it now=20=20
> reproducibly takes even less time - 20-57 ms per query.

	1000x speedup with proper tuning - always impressive, lol.
	IO seeks are always your worst enemy.

> After reading the section on "Statistics Used By the Planner" in the=20=
=20
> manual, I was a little concerned that, while the statistics sped up the=
=20=20
> queries that I tried immeasurably, that the most_common_vals array
was=20=
=20
> where the speedup was happening, and that the values which wouldn't fit=
=20=20
> in this array wouldn't be sped up. Though I couldn't offhand find an=20=
=20
> example where this occurred, the clustering approach seems
intuitively=20=
=20
> like a much more complete and scalable solution, at least for a=20=20
> read-only table like this.

	Actually, with statistics set to 100, then 100 values will be stored
in=20=
=20
most_common_vals. This would mean that the values not in most_common_vals=
=20=20
will have less than 1% frequency, and probably much less than that.
The=20=
=20
choice of plan for these rare values is pretty simple.

	With two columns, "interesting" stuff can happen, like if you have
col1=20=
=20
in [1...10] and col2 in [1...10] and use a condition on col1=3Dconst
and=20=
=20
col2=3Dconst, the selectivity of the result depends not only on the=20=20
distribution of col1 and col2 but also their correlation.

	As for the tests you did, it's hard to say without seeing the explain=20=
=20
analyze outputs. If you change the stats and the plan choice (EXPLAIN)=20=
=20
stays the same, and you use the same values in your query, any difference=
=20=20
in timing comes from caching, since postgres is executing the same
plan=20=
=20
and therefore doing the exact same thing. Caching (from PG and from
the=20=
=20
OS) can make the timings vary a lot.

> - Trying the same constant a second time gave an instantaneous
result,=20=
=20
> I'm guessing because of query/result caching.

	PG does not cache queries or results. It caches data & index pages in
its=
=20=20
shared buffers, and then the OS adds another layer of the usual disk
cache.
	A simple query like selecting one row based on PK takes about 60=20=20
microseconds of CPU time, but if it needs one seek for the index and
one=20=
=20
for the data it may take 20 ms waiting for the moving parts to move...=20=
=20
Hence, CLUSTER is a very useful tool.

	Bitmap index scans love clustered tables because all the interesting
rows=
=20=20
end up being grouped together, so much less pages need to be visited.

> - I didn't try decreasing the statistics back to 10 before I ran the=20=
=20
> cluster command, so I can't show the search times going up because
of=20=
=20
> that. But I tried killing the 500 meg process. The new process uses
less=
=20=20
> than 5 megs of ram, and still reproducibly returns a result in less
than=
=20=20
> 60 ms. Again, this is with a statistics value of 100 and the data=20=20
> clustered by gene_prediction_view_gene_ref_key.

	Killing it or just restarting postgres ?
	If you let postgres run (not idle) for a while, naturally it will
fill=20=
=20
the RAM up to the shared_buffers setting that you specified in the=20=20
configuration file. This is good, since grabbing data from postgres'
own=20=
=20
cache is faster than having to make a syscall to the OS to get it from
the=
=20=20
OS disk cache (or disk). This isn't bloat.
	But what those 500 MB versus 6 MB show is that before, postgres had
to=20=
=20
read a lot of data for your query, so it stayed in the cache ; after=20=20
tuning it needs to read much less data (thanks to CLUSTER) so the
cache=20=
=20
stays empty.


--=20
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 15 Posts in Topic:
large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-09 16:58:27 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-09 23:21:20 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-10 09:13:39 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 10:25:48 
Re: large tables and simple "= constant" queries using
matthew@[EMAIL PROTECTED]  2008-04-10 10:51:13 
Re: large tables and simple "= constant" queries using
john.e.beaver@[EMAIL PROT  2008-04-10 10:44:59 
Re: large tables and simple "= constant" queries using indexes
erik@[EMAIL PROTECTED] (  2008-04-10 10:02:48 
Re: large tables and simple "= constant" queries using indexes
Gaetano Mendola <mendo  2008-04-10 18:18:15 
Re: large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-10 12:37:45 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 23:37:50 
Re: large tables and simple "= constant" queries using
gsmith@[EMAIL PROTECTED]   2008-04-10 14:47:59 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-09 17:36:09 
Re: large tables and simple "= constant" queries using
jgh@[EMAIL PROTECTED] (J  2008-04-09 23:07:50 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 00:31:00 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-10 13:08:54 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Dec 1 8:13:12 CST 2008.