Re: Concurrent COPY commands

by phillip@[EMAIL PROTECTED] ("Phillip Sitbon") Jul 9, 2008 at 09:35 AM

Sorry about the late reply.

I only have two fast SATA drives on software RAID, but that really isn't
the issue: while the COPY commands are going, disk activity is relatively
low. By relatively I mean that I have seen it a lot higher under certain
circumstances, and I know for sure the disks aren't holding anything back.
I know it's a bad comparison, but the process generating this huge amount
of data can write directly to the disk very fast and still be CPU-bound,
while it eventually ends up waiting for postgres when I try to pipe it
into the database. I figured some overhead was to be expected and that's
why I tried the parallel setup in the first place.

What I see is that after some buffering (not sure it is buffering, but
after it gets some data), one postgres process will ramp up to 100% CPU
(on one core) for some time, thus blocking its input FIFO. That is when
the hard drive activity goes up a bit, but whatever it is doing is
definitely CPU-bound on that core.

No more than one worker process does this at a time. And no matter what
kind of FIFO buffers and select() calls I use, the calling process
eventually gets blocked because the postgres processes don't appear to be
working in parallel as well as they could be; hence, postgres doesn't take
in any more data for a while. I'm really curious about why going parallel
x6 is so much slower than one process when the disks aren't being pushed
that hard compared to their capabilities.
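
For reference, the setup under discussion (several connections, each
running COPY against its own FIFO) can be sketched as below. This is a
minimal Python illustration, not the C program from this thread: the table
name `measurements`, the `copy_N.fifo` file names, and both helper
functions are hypothetical. It only creates the pipes and builds the SQL
text; sending each statement over its own connection and feeding the pipes
is left out.

```python
import os
import tempfile


def make_fifos(directory, count):
    """Create `count` named pipes in `directory` and return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"copy_{i}.fifo")  # hypothetical naming
        os.mkfifo(path)  # POSIX only
        paths.append(path)
    return paths


def copy_statements(table, fifo_paths):
    """Build one server-side COPY statement per FIFO.

    Assumes the paths contain no single quotes; no escaping is done here.
    """
    return [f"COPY {table} FROM '{path}'" for path in fifo_paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fifos = make_fifos(tmp, 6)  # six pipes, as in the "parallel x6" run
        for statement in copy_statements("measurements", fifos):
            print(statement)
```

With server-side COPY the postgres backend opens the file itself, so the
FIFO paths have to be readable by the server process, and each statement
should block until a writer opens the other end of its pipe.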

I suspect something is wrong with my config, but I can't be sure. Is
1-2 GB for work_mem ok? Would that hurt it?
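
As a rough sanity check on the work_mem question: PostgreSQL applies
work_mem per sort or hash operation in each backend, not as a server-wide
cap, so several sessions can each claim that much at once. The arithmetic
below is a sketch assuming six sessions (the "parallel x6" case) with one
such operation each on the 16GB machine; the function name is made up for
illustration.

```python
GIB = 1024 ** 3


def worst_case_work_mem(work_mem_bytes, sessions, ops_per_session=1):
    """Upper bound on memory claimed if every operation uses its full work_mem."""
    return work_mem_bytes * sessions * ops_per_session


if __name__ == "__main__":
    total_ram = 16 * GIB  # machine size mentioned in the thread
    for setting_gib in (1, 2):
        peak = worst_case_work_mem(setting_gib * GIB, sessions=6)
        print(
            f"work_mem={setting_gib}GB x 6 sessions: "
            f"up to {peak // GIB} GB of {total_ram // GIB} GB RAM"
        )
```

A plain COPY into a table does not sort, so the setting may matter little
for the load itself; the estimate only shows why a 1-2 GB value gets risky
once queries with sorts or hashes run concurrently.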

On a positive note, I let the single-process version run to completion and
I now have a solid TB of data that I can access and use at lightning
speed :)

Cheers,

  Phillip

On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson@[EMAIL PROTECTED]> wrote:

> On Wednesday 02 July 2008, Phillip Sitbon <phillip@[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I am running some queries that use multiple connections to issue COPY
> > commands which bring data into the same table via different files
> > (FIFOs to be precise). This is being done on a SMP machine and I am
> > noticing that none of the postgres worker processes operate in
> > parallel, even though there is data available to all of them. The
> > performance is nearly exactly the same as it is for issuing a single
> > COPY command. Is this normal behavior, even with all of the separate
> > transactions still in progress? Would I be better off doing
> > multithreaded bulk inserts from my C program rather than sending the
> > data to FIFOs?
>
> Sounds like you're I/O bound - I doubt any other concurrency mechanism
> will change that much.
>
> >
> > The machine I am using has 16GB of memory and 8 cores, so I've tried
> > to optimize the configuration accordingly but I am a little lost in
> > some places.
>
> Ah, but what does your RAID controller and drives look like?
>
>
> --
> Alan
