Valter Douglas Lisbôa Jr. wrote:
> Hello all, I have a perl script thats load a entire day squid log to a
> postgres table. I run it at midnight by cronjob and turns off the indexes
> before do it (turning it on after). The script works fine, but I want to
> change this to a diferent approach.
>
> I'd like to insert on the fly the log lines, so long it be generated to have
> the data on-line. But the table has some indexes and the load of lines is
> about 300.000/day, so the average inserting is 3,48/sec. I think this could
> overload the database server (i did not test yet), so if I want to create a
> no indexed table to receive the on-line inserting and do a job moving all
> lines to the main indexed table at midnight.
There are two things to bear in mind.
1. What you need to worry about is the peak rate of inserts, not the
average. Even at 30 rows/sec that's not too bad.
2. What will your system do if the database is taken offline for a
period? How will it catch up?
The limiting factor will be the speed of your disks. Assuming a single
disk (no battery-backed RAID cache) you'll be limited by its RPM, at
roughly one commit per rotation (e.g. 10,000 commits/minute). That will
fall off rapidly if you only have one disk and it's busy doing other
reads/writes. But if you batch many log-lines together you need far
fewer commits.
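A rough back-of-the-envelope check of those figures, as a sketch only. It assumes at most one commit per disk rotation (the rule of thumb above) and reuses the 300,000 lines/day and 30 rows/sec numbers from this thread; the batch size of 100 is purely illustrative.

```python
# Rough commit budget for a single disk with no battery-backed cache.
# Assumption: at most one commit per platter rotation.
RPM = 10_000
LINES_PER_DAY = 300_000
PEAK_ROWS_PER_SEC = 30   # peak figure used earlier in this reply
BATCH_SIZE = 100         # illustrative value, not a recommendation

max_commits_per_sec = RPM / 60               # ceiling on commits per second
avg_rows_per_sec = LINES_PER_DAY / 86_400    # 86,400 seconds in a day

# One commit per row versus one commit per batch, both at the peak rate.
commits_unbatched = PEAK_ROWS_PER_SEC
commits_batched = PEAK_ROWS_PER_SEC / BATCH_SIZE

print(f"commit ceiling:        {max_commits_per_sec:.1f}/sec")
print(f"average insert rate:   {avg_rows_per_sec:.2f} rows/sec")
print(f"peak, commit per row:  {commits_unbatched} commits/sec")
print(f"peak, commit per batch: {commits_batched:.1f} commits/sec")
```

Even unbatched, the peak stays well under the ceiling on an idle disk; batching matters once the disk is also busy with other reads and writes.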
So, to address both points above, I'd use a script with a flexible
batch-size.
1. Estimate how many log-lines need to be saved to the database.
2. Batch together a suitable number of lines (1-1000) and commit them to
the database.
3. Sleep 1-10 secs.
4. Go back to #1, disconnecting and reconnecting every once in a while.
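The four steps above could be sketched roughly as follows. This is a minimal illustration, not a finished loader: `feed_log_lines`, `connect`, `pending` and `insert_sql` are hypothetical names, `connect` is assumed to return any Python DB-API 2.0 connection (for PostgreSQL that might be a driver such as psycopg2), and `pending` stands in for whatever queue holds parsed squid log lines that have not been saved yet.

```python
import itertools
import time
from collections import deque


def _close_quietly(conn):
    """Close a connection, ignoring errors from an already-dead link."""
    try:
        conn.close()
    except Exception:
        pass


def feed_log_lines(connect, pending, insert_sql, max_batch=1000,
                   sleep_secs=1.0, reconnect_every=100, cycles=None):
    """Drain `pending` (a deque of row tuples) into the database in batches.

    connect -- hypothetical zero-argument callable returning a DB-API connection
    cycles  -- stop after this many loops (None means run forever)
    """
    conn = None
    done = 0
    while cycles is None or done < cycles:
        # Steps 1-2: the batch is the whole backlog, capped at max_batch,
        # so a large backlog (e.g. after an outage) means larger batches.
        batch = list(itertools.islice(pending, max_batch))
        try:
            if conn is None:
                conn = connect()
            if batch:
                conn.cursor().executemany(insert_sql, batch)
                conn.commit()              # one commit for the whole batch
                for _ in batch:            # dequeue only after the commit
                    pending.popleft()
        except Exception:
            # Database unavailable or insert failed: rows stay queued and
            # are retried on a fresh connection in a later cycle.
            if conn is not None:
                _close_quietly(conn)
            conn = None
        done += 1
        # Step 4: drop the connection every so often and reconnect.
        if conn is not None and done % reconnect_every == 0:
            _close_quietly(conn)
            conn = None
        time.sleep(sleep_secs)             # step 3
    if conn is not None:
        _close_quietly(conn)
```

Rows are removed from the queue only after a successful commit, so nothing is lost while the database is down; the backlog simply grows and is worked off in larger batches afterwards.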
If the database is unavailable for any reason, this script will
automatically feed rows faster once it comes back.
> My question is, Does exists a better solution, or this tatic is a good way to
> do this?
You might want to partition the table monthly. That will make it easier
to manage a few years from now.
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
Also, consider increasing checkpoint_segments if you find the system
gets backed up.
Perhaps also consider setting synchronous_commit to off (but only for the
connection saving the log-lines to the database).
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html
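One way the per-connection setting might look from a client script, as a sketch. `relax_commit_durability` is a hypothetical helper name, and `conn` is assumed to be a Python DB-API 2.0 connection to a PostgreSQL 8.3 or later server; the `SET` statement itself is standard PostgreSQL.

```python
def relax_commit_durability(conn):
    """Turn synchronous_commit off for this session only.

    Other connections keep the server default, so only the log-loading
    session risks losing its most recent commits after a crash.
    """
    cur = conn.cursor()
    cur.execute("SET synchronous_commit TO off")
    cur.close()
    # A session-level SET is undone if its transaction rolls back,
    # so commit to make it stick for the rest of the session.
    conn.commit()
```

Call it once, right after connecting and before the first batch of inserts.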
--
Richard Huxton
Archonet Ltd
--
Sent via pgsql-general mailing list (pgsql-general@[EMAIL PROTECTED])