Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: Question ab...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 3 of 5 Topic 4017 of 4424
Post > Topic >>

Re: Question about disk IO an index use and seeking advice

by lists@[EMAIL PROTECTED] (PFC) Apr 24, 2008 at 06:56 PM

> Our ~600,000,000
> row table is changed very infrequently and is on a 12 disk
software=20=20
> raid-6
> for historical reasons using an  LSI Logic / Symbios Logic SAS1068 PCI-X
> Fusion-MPT SAS  Our ~50,000,000 row staging table is on a 12 disk=20=20
> hardware
> raid-10 using a Dell PowerEdge Expandable RAID controller 5.

> So my disk IO and index question.  When I issue a query on the big table
> like this:
> SELECT    column, count(*)
> FROM    bigtable
> GROUP BY column
> ORDER BY count DESC
> When I run dstat to see my disk IO I see the software raid-6
consistently
> holding over 70M/sec.  This is fine with me, but I generally don't like=
=20=20
> to
> do queries that table scan 600,000,000 rows.  So I do:

	Note that RAID5 or 6 is fine when reading, it's the small random
writes=20=
=20
that kill it.
	Is the table being inserted to while you run this query, which will=20=20
generate small random writes for the index updates ?
	Or is the table only inserted to during the nightly cron job ?

	70 MB/s seems to me quite close to what a single SATA disk could do
these=
=20=20
days.
	My software RAID 5 saturates the PCI bus in the machine and pushes
more=20=
=20
than 120 MB/s.
	You have PCI-X and 12 disks so you should get huuuuge disk
throughput,=20=
=20
really mindboggling figures, not 70 MB/s.
	Since this seems a high-budget system perhaps a good fast hardware RAID ?
	Or perhaps this test was performed under heavy load and it is actually a=
=20=20
good result.


> All of the
> rows in the staging table are changed at least once and then deleted and
> recreated in the bigger table.  All of the staging table's indexes are
on
> the raid-10.  The postgres data directory itself is on the raid-6. 
I=20=
=20
> think
> all the disks are SATA 10Ks. The setup is kind of a beast.
>
> SELECT    column, count(*)
> FROM    bigtable
> WHERE date > '4-24-08'
> GROUP BY column
> ORDER BY count DESC
> When I run dstat I see only around 2M/sec and it is not consistent
at=20=
=20
> all.
>
> So my question is, why do I see such low IO load on the index scan=20=20
> version?

	First, it is probably choosing a bitmap index scan, which means it needs=
=20=20
to grab lots of pages from the index. If your index is fragmented,
just=20=
=20
scanning the index could take a long time.
	Then, i is probably taking lots of random bites in the table data.
	If this is an archive table, the dates should be increasing
sequentially.=
=20=20
If this is not the case you will get random IO which is rather bad on
huge=
=20=20
data sets.

	So.

	If you need the rows to be grouped on-disk by date (or perhaps
another=20=
=20
field if you more frequently run other types of query, like grouping
by=20=
=20
category, or perhaps something else, you decide) :

	The painful thing will be to reorder the table, either
	- use CLUSTER
	- or recreate a table and INSERT INTO it ORDER BY the field you
chose.=20=
=20
This is going to take a while, set sort_mem to a large value. Then create=
=20=20
the indexes.

	Then every time you insert data in the archive, be sure to insert it
in=20=
=20
big batches, ORDER BY the field you chose. That way new inserts will
be=20=
=20
still in the order you want.=09

	While you're at it you might think about partitioning the monster on
a=20=
=20
useful criterion (this depends on your querying).

> If I could tweak some setting to make more aggressive use of IO, would
it
> actually make the query faster?  The field I'm scanning has a .960858
> correlation, but I haven't vacuumed since im****ting any of the data that

	You have ANALYZEd at least ?
	Cause if you didn't and an index scan (not bitmap) comes up on this kind=
=20=20
of query and it does a million index hits you have a problem.

> I'm
> scanning, though the correlation should remain very high.  When I do a
> similar set of queries on the hardware raid I see similar performance
> except  the numbers are both more than doubled.
>
> Here is the explain output for the queries:
> SELECT    column, count(*)
> FROM    bigtable
> GROUP BY column
> ORDER BY count DESC
> "Sort  (cost=3D74404440.58..74404444.53 rows=3D1581 width=3D10)"
> "  Sort Key: count(*)"
> "  ->  HashAggregate  (cost=3D74404336.81..74404356.58 rows=3D1581
width=
=3D10)"
> "        ->  Seq Scan on bigtable (cost=3D0.00..71422407.21
rows=3D596385=
921
> width=3D10)"

	Plan is OK (nothing else to do really)

> ---------------
> SELECT    column, count(*)
> FROM    bigtable
> WHERE date > '4-24-08'
> GROUP BY column
> ORDER BY count DESC
> "Sort  (cost=3D16948.80..16948.81 rows=3D1 width=3D10)"
> "  Sort Key: count(*)"
> "  ->  HashAggregate  (cost=3D16948.78..16948.79 rows=3D1 width=3D10)"
> "        ->  Index Scan using date_idx on bigtable
(cost=3D0.00..16652.77
> rows=3D59201 width=3D10)"
> "              Index Cond: (date > '2008-04-21
00:00:00'::timestamp=20=20
> without
> time zone)"

	Argh.
	So you got an index scan after all.
	Is the 59201 rows estimate right ? If it is 10 times that you really
have=
=20=20
a problem.
	Is it ANALYZEd ?

> So now the asking for advice part.  I have two questions:
> What is the fastest way to copy data from the smaller table to the
larger
> table?

	INSERT INTO SELECT FROM (add ORDER BY to taste)

> We plan to rearrange the setup when we move to Postgres 8.3. 
We'll=20=20
> probably
> move all the storage over to a SAN and slice the larger table into=20=20
> monthly
> or weekly tables.  Can someone point me to a good page on partitioning?=
=20=20=20
> My
> gut tells me it should be better, but I'd like to learn more about why.

	Because in your case, records having the dates you want will be in
1=20=20
partition (or 2), so you get a kind of automatic CLUSTER. For instance if=
=20=20
you do your query on last week's data, it will seq scan last week's=20=20
partition (which will be a much more manageable size) and not even look
at=
=20=20
the others.

Matthew said :
> You could possibly not bother with a staging table, and replacethe mass=
=20=20
> copy with making a new partition. Not sure of the details myself though.

	Yes you could do that.
	When a partition ceases to become actively updated, though, you
should=20=
=20
CLUSTER it so it is really tight and fast.
	CLUSTER on a partition which has a week's worth of data will obviously
be=
=20=20
much faster than CLUSTERing your monster archive.









--=20
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 5 Posts in Topic:
Question about disk IO an index use and seeking advice
nik9000@[EMAIL PROTECTED]  2008-04-24 09:59:20 
Re: Question about disk IO an index use and seeking advice
matthew@[EMAIL PROTECTED]  2008-04-24 16:27:39 
Re: Question about disk IO an index use and seeking advice
lists@[EMAIL PROTECTED]   2008-04-24 18:56:34 
Re: Question about disk IO an index use and seeking advice
nik9000@[EMAIL PROTECTED]  2008-04-24 14:19:08 
Re: Question about disk IO an index use and seeking advice
lists@[EMAIL PROTECTED]   2008-04-24 22:02:40 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Dec 1 8:14:44 CST 2008.