Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Question about ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 5 Topic 4017 of 4156
Post > Topic >>

Question about disk IO an index use and seeking advice

by nik9000@[EMAIL PROTECTED] ("Nikolas Everett") Apr 24, 2008 at 09:59 AM

------=_Part_11_9236393.1209045562147
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I have a question about index us and IO and am seeking advice.

We are running postgres 8.2.  We have two big big tables.  Our
~600,000,000
row table is changed very infrequently and is on a 12 disk software raid-6
for historical reasons using an  LSI Logic / Symbios Logic SAS1068 PCI-X
Fusion-MPT SAS  Our ~50,000,000 row staging table is on a 12 disk hardware
raid-10 using a Dell PowerEdge Expandable RAID controller 5.  All of the
rows in the staging table are changed at least once and then deleted and
recreated in the bigger table.  All of the staging table's indexes are on
the raid-10.  The postgres data directory itself is on the raid-6.  I
think
all the disks are SATA 10Ks.  The setup is kind of a beast.

So my disk IO and index question.  When I issue a query on the big table
like this:
SELECT    column, count(*)
FROM    bigtable
GROUP BY column
ORDER BY count DESC
When I run dstat to see my disk IO I see the software raid-6 consistently
holding over 70M/sec.  This is fine with me, but I generally don't like to
do queries that table scan 600,000,000 rows.  So I do:
SELECT    column, count(*)
FROM    bigtable
WHERE date > '4-24-08'
GROUP BY column
ORDER BY count DESC
When I run dstat I see only around 2M/sec and it is not consistent at all.

So my question is, why do I see such low IO load on the index scan
version?
If I could tweak some setting to make more aggressive use of IO, would it
actually make the query faster?  The field I'm scanning has a .960858
correlation, but I haven't vacuumed since im****ting any of the data that
I'm
scanning, though the correlation should remain very high.  When I do a
similar set of queries on the hardware raid I see similar performance
except  the numbers are both more than doubled.

Here is the explain output for the queries:
SELECT    column, count(*)
FROM    bigtable
GROUP BY column
ORDER BY count DESC
"Sort  (cost=74404440.58..74404444.53 rows=1581 width=10)"
"  Sort Key: count(*)"
"  ->  HashAggregate  (cost=74404336.81..74404356.58 rows=1581 width=10)"
"        ->  Seq Scan on bigtable (cost=0.00..71422407.21 rows=596385921
width=10)"
---------------
SELECT    column, count(*)
FROM    bigtable
WHERE date > '4-24-08'
GROUP BY column
ORDER BY count DESC
"Sort  (cost=16948.80..16948.81 rows=1 width=10)"
"  Sort Key: count(*)"
"  ->  HashAggregate  (cost=16948.78..16948.79 rows=1 width=10)"
"        ->  Index Scan using date_idx on bigtable (cost=0.00..16652.77
rows=59201 width=10)"
"              Index Cond: (date > '2008-04-21 00:00:00'::timestamp
without
time zone)"

So now the asking for advice part.  I have two questions:
What is the fastest way to copy data from the smaller table to the larger
table?

We plan to rearrange the setup when we move to Postgres 8.3.  We'll
probably
move all the storage over to a SAN and slice the larger table into monthly
or weekly tables.  Can someone point me to a good page on partitioning? 
My
gut tells me it should be better, but I'd like to learn more about why.
Does anyone have experience migrating large databases to a SAN?  I
understand that it'll give me better fail over capabilities so long as the
SAN itself doesn't go out, but are we going to be sacrificing performance
for this?  That doesn't even mention the cost....

Thanks so much for reading through all this,

--Nik

------=_Part_11_9236393.1209045562147
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I have a question about index us and IO and am seeking advice.<br><br>We
are running postgres 8.2.&nbsp; We have two big big tables.&nbsp; Our
~600,000,000 row table is changed very infrequently and is on a 12 disk
software raid-6 for historical reasons using an&nbsp; LSI Logic / Symbios
Logic SAS1068 PCI-X Fusion-MPT SAS&nbsp; Our ~50,000,000 row staging table
is on a 12 disk hardware raid-10 using a Dell PowerEdge Expandable RAID
controller 5.&nbsp; All of the rows in the staging table are changed at
least once and then deleted and recreated in the bigger table.&nbsp; All
of the staging table&#39;s indexes are on the raid-10.&nbsp; The postgres
data directory itself is on the raid-6.&nbsp; I think all the disks are
SATA 10Ks.&nbsp; The setup is kind of a beast.<br>
<br>So my disk IO and index question.&nbsp; When I issue a query on the
big table like this:<br>SELECT&nbsp;&nbsp;&nbsp; column,
count(*)<br>FROM&nbsp;&nbsp;&nbsp; bigtable<br>GROUP BY column<br>ORDER BY
count DESC<br>When I run dstat to see my disk IO I see the software raid-6
consistently holding over 70M/sec.&nbsp; This is fine with me, but I
generally don&#39;t like to do queries that table scan 600,000,000
rows.&nbsp; So I do:<br>
SELECT&nbsp;&nbsp;&nbsp; column, count(*)<br>
FROM&nbsp;&nbsp;&nbsp; bigtable<br>WHERE date &gt; &#39;4-24-08&#39;<br>
GROUP BY column<br>
ORDER BY count DESC<br>When I run dstat I see only around 2M/sec and it is
not consistent at all.<br><br>So my question is, why do I see such low IO
load on the index scan version?&nbsp; If I could tweak some setting to
make more aggressive use of IO, would it actually make the query
faster?&nbsp; The field I&#39;m scanning has a .960858 correlation, but I
haven&#39;t vacuumed since im****ting any of the data that I&#39;m
scanning, though the correlation should remain very high.&nbsp; When I do
a similar set of queries on the hardware raid I see similar performance
except&nbsp; the numbers are both more than doubled.<br>
<br>Here is the explain output for the
queries:<br>SELECT&nbsp;&nbsp;&nbsp; column,
count(*)<br>FROM&nbsp;&nbsp;&nbsp; bigtable<br>GROUP BY column<br>ORDER BY
count DESC<br>
&quot;Sort&nbsp; (cost=74404440.58..74404444.53 rows=1581
width=10)&quot;<br>&quot;&nbsp; Sort Key: count(*)&quot;<br>&quot;&nbsp;
-&gt;&nbsp; HashAggregate&nbsp; (cost=74404336.81..74404356.58 rows=1581
width=10)&quot;<br>&quot;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-&gt;&nbsp; Seq Scan on bigtable (cost=0.00..71422407.21 rows=596385921
width=10)&quot;<br>
---------------<br>SELECT&nbsp;&nbsp;&nbsp; column, count(*)<br>
FROM&nbsp;&nbsp;&nbsp; bigtable<br>WHERE date &gt; &#39;4-24-08&#39;<br>
GROUP BY column<br>
ORDER BY count DESC<br>&quot;Sort&nbsp; (cost=16948.80..16948.81 rows=1
width=10)&quot;<br>&quot;&nbsp; Sort Key: count(*)&quot;<br>&quot;&nbsp;
-&gt;&nbsp; HashAggregate&nbsp; (cost=16948.78..16948.79 rows=1
width=10)&quot;<br>&quot;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-&gt;&nbsp; Index Scan using date_idx on bigtable (cost=0.00..16652.77
rows=59201 width=10)&quot;<br>
&quot;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Index Cond: (date &gt; &#39;2008-04-21 00:00:00&#39;::timestamp without
time zone)&quot;<br><br>So now the asking for advice part.&nbsp; I have
two questions:<br>What is the fastest way to copy data from the smaller
table to the larger table?<br>
<br>We plan to rearrange the setup when we move to Postgres 8.3.&nbsp;
We&#39;ll probably move all the storage over to a SAN and slice the larger
table into monthly or weekly tables.&nbsp; Can someone point me to a good
page on partitioning?&nbsp; My gut tells me it should be better, but
I&#39;d like to learn more about why.<br>
Does anyone have experience migrating large databases to a SAN?&nbsp; I
understand that it&#39;ll give me better fail over capabilities so long as
the SAN itself doesn&#39;t go out, but are we going to be sacrificing
performance for this?&nbsp; That doesn&#39;t even mention the cost....<br>
<br>Thanks so much for reading through all this,<br><br>--Nik<br>

------=_Part_11_9236393.1209045562147--
 




 5 Posts in Topic:
Question about disk IO an index use and seeking advice
nik9000@[EMAIL PROTECTED]  2008-04-24 09:59:20 
Re: Question about disk IO an index use and seeking advice
matthew@[EMAIL PROTECTED]  2008-04-24 16:27:39 
Re: Question about disk IO an index use and seeking advice
lists@[EMAIL PROTECTED]   2008-04-24 18:56:34 
Re: Question about disk IO an index use and seeking advice
nik9000@[EMAIL PROTECTED]  2008-04-24 14:19:08 
Re: Question about disk IO an index use and seeking advice
lists@[EMAIL PROTECTED]   2008-04-24 22:02:40 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan13V112 Sun Jul 6 19:50:10 CDT 2008.