Re: High inserting by syslog

by dev@[EMAIL PROTECTED] (Richard Huxton) Jul 3, 2008 at 05:08 PM

Valter Douglas Lisbôa Jr. wrote:
> Hello all, I have a perl script thats load a entire day squid log to a
> postgres table. I run it at midnight by cronjob and turns off the indexes
> before do it (turning it on after). The script works fine, but I want to
> change this to a diferent approach.
>
> I'd like to insert on the fly the log lines, so long it be generated to have
> the data on-line. But the table has some indexes and the load of lines is
> about 300.000/day, so the average inserting is 3,48/sec. I think this could
> overload the database server (i did not test yet), so if I want to create a
> no indexed table to receive the on-line inserting and do a job moving all
> lines to the main indexed table at midnight.

There are two things to bear in mind.

1. What you need to worry about is the peak rate of inserts, not the
average. Even at 30 rows/sec that's not too bad.
2. What will your system do if the database is taken offline for a
period? How will it catch up?

The limiting factor will be the speed of your disks. Assuming a single
disk (no battery-backed raid cache) you'll be limited to roughly one commit
per disk revolution (e.g. 10,000 commits / minute on a 10,000 RPM disk).
That will fall off rapidly if you only have one disk and it's busy doing
other reads/writes. But, if you batch many log-lines together you need far
fewer commits.
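
To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes the rule of thumb above (about one commit per disk revolution) and ignores every other source of disk load; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rows_per_sec(rows_per_day: int) -> float:
    """Average insert rate for a given daily volume."""
    return rows_per_day / SECONDS_PER_DAY


def max_commits_per_sec(rpm: int) -> float:
    """Rule of thumb: at most about one commit per disk revolution."""
    return rpm / 60.0


def max_rows_per_sec(rpm: int, batch_size: int) -> float:
    """Upper bound on rows/sec when each commit carries batch_size rows."""
    return max_commits_per_sec(rpm) * batch_size


if __name__ == "__main__":
    print(average_rows_per_sec(300_000))   # average load from the question
    print(max_commits_per_sec(10_000))     # commit ceiling of a 10,000 RPM disk
    print(max_rows_per_sec(10_000, 100))   # same disk, 100 rows per commit
```

Even one row per commit leaves a lot of headroom over the average rate on an otherwise idle disk; batching matters for the peaks and for a disk that is busy with other work.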

So - to address both points above, I'd use a script with a flexible
batch-size.
1. Estimate how many log-lines need to be saved to the database.
2. Batch together a suitable number of lines (1-1000) and commit them to
the database.
3. Sleep 1-10 secs
4. Back to #1, disconnect and reconnect every once in a while.

If the database is unavailable for any reason, this script will
automatically feed rows faster when it returns, because the backlog
pushes the batch size up.
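
The loop above could be sketched roughly like this in Python. This is only an illustration under assumptions: `commit_batch` is a hypothetical callable that inserts a list of lines in one transaction and raises if the database is unreachable, the queue is an in-memory deque standing in for wherever the log lines wait, and the periodic disconnect/reconnect of step 4 is left to whatever sits behind `commit_batch`.

```python
import time
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


def choose_batch_size(pending: int, max_batch: int = 1000) -> int:
    # Steps 1-2: the backlog decides the batch size, capped at max_batch.
    return min(pending, max_batch)


def feed(
    queue: Deque[str],
    commit_batch: Callable[[List[str]], None],
    rounds: int,
    sleep_secs: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `rounds` passes of the loop; return how many lines were committed."""
    committed = 0
    for _ in range(rounds):
        size = choose_batch_size(len(queue))
        if size:
            batch = list(islice(queue, size))  # peek, do not remove yet
            try:
                commit_batch(batch)  # one transaction, one commit
            except Exception:
                pass  # database unavailable: keep the lines queued, retry later
            else:
                for _ in range(size):
                    queue.popleft()  # drop lines only after a successful commit
                committed += size
        sleep(sleep_secs)  # step 3
    return committed
```

While the database is down nothing is removed from the queue, so the first passes after it comes back commit full-size batches until the backlog is gone.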

> My question is, Does exists a better solution, or this tatic is a good way to
> do this?

You might want to partition the table monthly. That will make it easier
to manage a few years from now.
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
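
As a small illustration of the monthly scheme, the loading script needs to know which partition a given log line belongs to and what date range that partition covers. The helper below is a sketch only: the parent table name `squid_log` and the `parent_YYYY_MM` naming pattern are assumptions, not something from the original setup.

```python
from datetime import date
from typing import Tuple


def monthly_partition(day: date, parent: str = "squid_log") -> Tuple[str, date, date]:
    """Return (child table name, first day of month, first day of next month)."""
    start = day.replace(day=1)
    # Roll over to January of the next year after December.
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)
    return f"{parent}_{start:%Y_%m}", start, end
```

The half-open range (start inclusive, end exclusive) is the usual way to make sure adjacent months never overlap.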

Also, consider increasing checkpoint_segments if you find the system
gets backed-up.
Perhaps consider setting synchronous_commit to off (but only for the
connection saving the log-lines to the database)
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html
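
Limiting synchronous_commit to one connection can be done with a plain SET on that session. A minimal sketch, assuming a Python DB-API cursor from some PostgreSQL driver; the function name is made up for illustration, and the setting itself requires PostgreSQL 8.3 or later.

```python
def relax_commit_for_session(cursor) -> None:
    """Turn off synchronous commit for the session that owns this cursor.

    Other connections keep the server-wide default.
    """
    cursor.execute("SET synchronous_commit TO off")
```

The loader would call this once right after connecting (and again after every reconnect). The trade-off is that a server crash can lose the last few commits that were already reported as successful, which is usually acceptable for log lines.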

--
   Richard Huxton
   Archonet Ltd
