Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: large table...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 6 of 15 Topic 3978 of 4206
Post > Topic >>

Re: large tables and simple "= constant" queries using

by john.e.beaver@[EMAIL PROTECTED] (John Beaver) Apr 10, 2008 at 10:44 AM

Thanks a lot, all of you - this is excellent advice. With the data 
clustered and statistics at a more reasonable value of 100, it now 
reproducibly takes even less time - 20-57 ms per query.

After reading the section on "Statistics Used By the Planner" in the 
manual, I was a little concerned that, while the statistics sped up the 
queries that I tried immeasurably, that the most_common_vals array was 
where the speedup was happening, and that the values which wouldn't fit 
in this array wouldn't be sped up. Though I couldn't offhand find an 
example where this occurred, the clustering approach seems intuitively 
like a much more complete and scalable solution, at least for a 
read-only table like this.

As to whether the entire index/table was getting into ram between my 
statistics calls, I don't think this was the case. Here's the behavior 
that I found:
- With statistics at 10, the query took 25 (or so) seconds no matter how 
many times I tried different values. The query plan was the same as for 
the 200 and 800 statistics below.
- Trying the same constant a second time gave an instantaneous result, 
I'm guessing because of query/result caching.
- Immediately on increasing the statistics to 200, the query took a 
reproducibly less amount of time. I tried about 10 different values
- Immediately on increasing the statistics to 800, the query 
reproducibly took less than a second every time. I tried about 30 
different values.
- Decreasing the statistics to 100 and running the cluster command 
brought it to 57 ms per query.
- The Activity Monitor (OSX) lists the relevant postgres process as 
taking a little less than 500 megs.
- I didn't try decreasing the statistics back to 10 before I ran the 
cluster command, so I can't show the search times going up because of 
that. But I tried killing the 500 meg process. The new process uses less 
than 5 megs of ram, and still reproducibly returns a result in less than 
60 ms. Again, this is with a statistics value of 100 and the data 
clustered by gene_prediction_view_gene_ref_key.

And I'll consider the idea of using triggers with an ancillary table for 
other purposes; seems like it could be a useful solution for something.

Matthew wrote:
> On Thu, 10 Apr 2008, PFC wrote:
>
> ... Lots of useful advice ...
>
>>     - If you often query rows with the same gene_ref, consider using 
>> CLUSTER to physically group those rows on disk. This way you can get 
>> all rows with the same gene_ref in 1 seek instead of 2000. Clustered 
>> tables also make Bitmap scan happy.
>
> In my opinion this is the one that will make the most difference. You 
> will need to run:
>
> CLUSTER gene_prediction_view USING gene_prediction_view_gene_ref_key;
>
> after you insert significant amounts of data into the table. This 
> re-orders the table according to the index, but new data is always 
> written out of order, so after adding lots more data the table will 
> need to be re-clustered again.
>
>> - Switch to a RAID10 (4 times the IOs per second, however zero gain 
>> if you're single-threaded, but massive gain when concurrent)
>
> Greg Stark has a patch in the pipeline that will change this, for 
> bitmap index scans, by using fadvise(), so a single thread can utilise 
> multiple discs in a RAID array.
>
> Matthew
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 15 Posts in Topic:
large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-09 16:58:27 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-09 23:21:20 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-10 09:13:39 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 10:25:48 
Re: large tables and simple "= constant" queries using
matthew@[EMAIL PROTECTED]  2008-04-10 10:51:13 
Re: large tables and simple "= constant" queries using
john.e.beaver@[EMAIL PROT  2008-04-10 10:44:59 
Re: large tables and simple "= constant" queries using indexes
erik@[EMAIL PROTECTED] (  2008-04-10 10:02:48 
Re: large tables and simple "= constant" queries using indexes
Gaetano Mendola <mendo  2008-04-10 18:18:15 
Re: large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-10 12:37:45 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 23:37:50 
Re: large tables and simple "= constant" queries using
gsmith@[EMAIL PROTECTED]   2008-04-10 14:47:59 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-09 17:36:09 
Re: large tables and simple "= constant" queries using
jgh@[EMAIL PROTECTED] (J  2008-04-09 23:07:50 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 00:31:00 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-10 13:08:54 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Thu Aug 21 18:36:18 CDT 2008.