Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: large table...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 4 of 15 Topic 3978 of 4361
Post > Topic >>

Re: large tables and simple "= constant" queries using indexes

by lists@[EMAIL PROTECTED] (PFC) Apr 10, 2008 at 10:25 AM

>> Perfect - thanks Arjen. Using your value of 200 decreased the time to  
>> 15 seconds, and using a value of 800 makes it almost instantaneous. I'm
 
>> really not concerned about space usage; if having more statistics  
>> increases performance this much, maybe I'll just default it to 1000?
>>  Strangely, the steps taken in the explain analyze are all the same.  
>> The only differences are the predicted costs (and execution times).
>>  explain analyze for a statistics of 200:


	Actually, since you got the exact same plans and the second one is a lot 

faster, this can mean that the data is in the disk cache, or that the  
second query has all the rows it needs contiguous on disk whereas the  
first one has its rows all over the place. Therefore you are IO-bound.  
Statistics helped, perhaps (impossible to know since you don't provide the
 
plan wit statistics set to 10), but your main problem is IO.
	Usually setting the statistics to 100 is enough...

	Now, here are some solutions to your problem in random order :

	- Install 64 bit Linux, 64 bit Postgres, and get lots of RAM, lol.
	- Switch to a RAID10 (4 times the IOs per second, however zero gain if  
you're single-threaded, but massive gain when concurrent)

	- If you just need a count by gene_ref, a simple solution is to keep it  
in a separate table and update it via triggers, this is a frequently used 

solution, it works well unless gene_ref is updated all the time (which is 

probably not your case). Since you will be vacuuming this count-cache  
table often, don't put the count as a field in your sgd_annotations table,
 
just create a small table with 2 fields, gene_ref and count (unless you  
want to use the count for other things and you don't like the join).

	From your table definition gene_ref references another table. It would  
seem that you have many rows in gene_prediction_view with the same  
gene_ref value.

	- If you often query rows with the same gene_ref, consider using CLUSTER 

to physically group those rows on disk. This way you can get all rows with
 
the same gene_ref in 1 seek instead of 2000. Clustered tables also make  
Bitmap scan happy.
	This one is good since it can also speed up other queries (not just the  
count).
	You could also cluster on (gene_ref,go_id) perhaps, I don't know what  
your columns mean. Only you can decide that because clustering order has  
to be meaningful (to group rows according to something that makes sense  
and not at random).

	* Lose some weight :

CREATE INDEX ix_gene_prediction_view_gene_ref
  ON gene_prediction_view
  USING btree
  (gene_ref);

	- This index is useless since you have an UNIQUE on (gene_ref, go_id)  
which is also an index.
	Remove the index on (gene_ref), it will leave space in the disk cache for
 
other things.

	- Since (gene_ref, go_id) is UNIQUE NOT NULL, you might be able to use  
that as your primary key, but only if it is never updated of course. Saves
 
another index.

	- If you often do queries that fetch many rows, but seldom fetch the  
description, tell PG to always store the description in offline compressed
 
form (read the docs on ALTER TABLE ... SET STORAGE ..., I forgot the  
syntax). Point being to make the main table smaller.

	- Also I see a category as VARCHAR. If you have a million different  
categories, that's OK, but if you have 100 categories for your 15M rows,  
put them in a separate table and replace that by a category_id (normalize 

!)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 15 Posts in Topic:
large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-09 16:58:27 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-09 23:21:20 
Re: large tables and simple "= constant" queries using
acmmailing@[EMAIL PROTECT  2008-04-10 09:13:39 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 10:25:48 
Re: large tables and simple "= constant" queries using
matthew@[EMAIL PROTECTED]  2008-04-10 10:51:13 
Re: large tables and simple "= constant" queries using
john.e.beaver@[EMAIL PROT  2008-04-10 10:44:59 
Re: large tables and simple "= constant" queries using indexes
erik@[EMAIL PROTECTED] (  2008-04-10 10:02:48 
Re: large tables and simple "= constant" queries using indexes
Gaetano Mendola <mendo  2008-04-10 18:18:15 
Re: large tables and simple "= constant" queries using indexes
john.e.beaver@[EMAIL PROT  2008-04-10 12:37:45 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 23:37:50 
Re: large tables and simple "= constant" queries using
gsmith@[EMAIL PROTECTED]   2008-04-10 14:47:59 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-09 17:36:09 
Re: large tables and simple "= constant" queries using
jgh@[EMAIL PROTECTED] (J  2008-04-09 23:07:50 
Re: large tables and simple "= constant" queries using indexes
lists@[EMAIL PROTECTED]   2008-04-10 00:31:00 
Re: large tables and simple "= constant" queries using
wmoran@[EMAIL PROTECTED]   2008-04-10 13:08:54 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Wed Oct 15 20:52:52 CDT 2008.