Sorry about the late reply.

I only have two fast SATA drives on software RAID, but that really isn't
the issue: while the COPY commands are running, disk activity is
relatively low. By relatively I mean that I have seen it a lot higher
under certain circumstances, and I know for sure the disks aren't holding
anything back. I know it's a bad comparison, but the process generating
this huge amount of data can write directly to the disk very fast and
still be CPU-bound, while it eventually ends up waiting for postgres when
I try to pipe it into the database. I figured some overhead was to be
expected, and that's why I tried the parallel setup in the first place.

What I see is that after some buffering (not sure it is buffering, but
after it gets some data), one postgres process will ramp up to 100% CPU
(on one core) for some time, thus blocking its input FIFO. That is when
the hard drive activity goes up a bit, but whatever it is doing is
definitely CPU-bound on that core.

No more than one worker process does this at a time. And no matter what
kind of FIFO buffers and select() calls I use, the calling process
eventually gets blocked because the postgres processes don't appear to be
working in parallel as well as they could be; hence, postgres doesn't take
in any more data for a while. I'm really curious about why going parallel
x6 is so much slower than one process when the disks aren't being pushed
that hard compared to their capabilities.
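
To make the feeding pattern concrete, here is a minimal Python sketch of
a producer that uses select() to hand each chunk to whichever pipe is
currently writable, instead of blocking on one stalled consumer. It is
only an illustration: the real feeder in this thread is a C program that
was not posted, and the name `distribute`, the least-loaded choice, and
the chunk-size check are assumptions made for the example.

```python
import os
import select


def distribute(chunks, write_fds, timeout=1.0):
    """Send each chunk to a pipe that select() reports as writable.

    Returns a dict mapping each file descriptor to the number of chunks
    it accepted. Hypothetical helper, not part of any library.
    """
    counts = {fd: 0 for fd in write_fds}
    for chunk in chunks:
        # A write of at most PIPE_BUF bytes to a pipe that select()
        # reported as writable will not block.
        if len(chunk) > select.PIPE_BUF:
            raise ValueError("chunk larger than PIPE_BUF may block")
        _, writable, _ = select.select([], write_fds, [], timeout)
        if not writable:
            # Every consumer is stalled: the situation described above.
            raise TimeoutError("no pipe became writable in time")
        # Among the writable pipes, pick the one that has received least.
        fd = min(writable, key=counts.__getitem__)
        os.write(fd, chunk)
        counts[fd] += 1
    return counts
```

A feeder like this only avoids stalling on a single slow reader. If the
readers cannot make progress at the same time, every pipe eventually
fills and the producer waits anyway, which matches what is described
above.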

I suspect something is wrong with my config, but I can't be sure. Is 1-2
GB for work_mem OK? Would that hurt it?
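
As a rough way to reason about that question: work_mem is a limit per
sort or hash operation in each backend, not a server-wide cap, so the
worst case scales with the number of connections. The Python sketch
below just does that arithmetic for the figures mentioned in this thread
(2 GB work_mem, 6 connections, 16 GB of RAM). The function names and the
one-operation-per-backend default are assumptions for illustration, and
the sketch says nothing about whether COPY itself uses that memory.

```python
GIB = 1024 ** 3


def worst_case_work_mem(work_mem_bytes, backends, ops_per_backend=1):
    """Upper bound on memory claimed if every backend runs
    ops_per_backend sort/hash operations at the full work_mem limit.
    Hypothetical helper for back-of-the-envelope sizing."""
    return work_mem_bytes * backends * ops_per_backend


def fraction_of_ram(work_mem_bytes, backends, ram_bytes, ops_per_backend=1):
    """The same bound expressed as a fraction of physical memory."""
    total = worst_case_work_mem(work_mem_bytes, backends, ops_per_backend)
    return total / ram_bytes


if __name__ == "__main__":
    # Figures taken from the thread: 2 GB work_mem, 6 connections, 16 GB RAM.
    print(worst_case_work_mem(2 * GIB, 6) // GIB, "GiB worst case")
    print(fraction_of_ram(2 * GIB, 6, 16 * GIB), "of RAM")
```

With those inputs the bound is 12 GiB, or three quarters of the
machine's memory, before shared buffers and the OS cache are counted.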

On a positive note, I let the single-process version run to completion
and I now have a solid TB of data that I can access and use at lightning
speed :)

Cheers,
Phillip

On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson@[EMAIL PROTECTED]> wrote:

> On Wednesday 02 July 2008, Phillip Sitbon <phillip@[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I am running some queries that use multiple connections to issue COPY
> > commands which bring data into the same table via different files
> > (FIFOs to be precise). This is being done on an SMP machine and I am
> > noticing that none of the postgres worker processes operate in
> > parallel, even though there is data available to all of them. The
> > performance is nearly exactly the same as it is for issuing a single
> > COPY command. Is this normal behavior, even with all of the separate
> > transactions still in progress? Would I be better off doing
> > multithreaded bulk inserts from my C program rather than sending the
> > data to FIFOs?
>
> Sounds like you're I/O bound - I doubt any other concurrency mechanism
> will change that much.
>
> >
> > The machine I am using has 16GB of memory and 8 cores, so I've tried
> > to optimize the configuration accordingly but I am a little lost in
> > some places.
>
> Ah, but what do your RAID controller and drives look like?
>
>
> --
> Alan