Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql General > Re: High insert...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 7 of 7 Topic 15970 of 17437
Post > Topic >>

Re: High inserting by syslog

by Rainer Gerhards <rgerhards@[EMAIL PROTECTED] > Jul 4, 2008 at 01:41 AM

Let me look at the source of the traffic. You could easy lots of the
problems (except for the resulting database table size) if you push
the log file into the database with some rate-limiting. One approach
to do this is to use rsyslog's file monitor [1]. I am not sure if it
will work for your exact needs, but it is easy to find out. Once the
data is inside rsyslog, you can use it's rate-limiting feature (e.g.
wait a bit after each insert). Rsyslog also permits you to insert any
kind of log data in queued modes, which enable you to de-couple the
log producer from the consumer (the database in this case). Actually,
this is for what the system was designed. I have an example scenario
description for queued high-rate database inserts in [2].

HTH,
Rainer

[1] http://www.rsyslog.com/doc-imfile.html
[2] http://www.rsyslog.com/doc-rsyslog_high_database_rate.html

Disclaimer: I am the rsyslog author, so I tend to have a bias ;)

On Jul 3, 6:32 pm, scrawf...@[EMAIL PROTECTED]
 (Steve Crawford)
wrote:
> Valter Douglas Lisb=F4a Jr. wrote:> Hello all, I have a perl script
thats=
 load a entire day squid log to a
> > postgres table. I run it at midnight by cronjob and turns off the
index=
es
> > before do it (turning it on after). The script works fine, but I want
t=
o
> > change this to a diferent approach.
>
> > I'd like to insert on the fly the log lines, so long it be generated
to=
 have
> > the data on-line. But the table has some indexes and the load of lines
=
is
> > about 300.000/day, so the average inserting is 3,48/sec. I think this
c=
ould
> > overload the database server (i did not test yet), so if I want to
crea=
te a
> > no indexed table to receive the on-line inserting and do a job moving
a=
ll
> > lines to the main indexed table at midnight.
>
> > My question is, Does exists a better solution, or this tatic is a good
=
way to
> > do this?
>
> The average matters less than the peak. Unless your traffic is even
> 24x7, your rate will be higher. If your log is concentrated in an 8-hour
> workday, your average daytime rate will be closer to 10/second with
> peaks that are much higher. You might consider some form of buffering
> between the Squid log and the database to avoid blocking. Your current
> method has the advantage of moving the database workload to off-hours.
>
> Instead of moving data, you might look into partitioning your data. How
> long do you keep your logs actively available in PostgreSQL? I know one
> company that partitions their log data into months (parent table with
> child table for each month). They keep 12-months of data live so they
> rotate through the child tables. At the start of a month, that month's
> table is truncated. Modify as appropriate for your load - perhaps a
> partition (child-table) for each day. Or a current-day child-table that
> is migrated into a main-table nightly. Either way you can make it appear
> that the parent-table is an up-to-date complete table.
>
> You will need to do some reading on table partitioning if you go this
> route. Pay special attention to the requirements needed to optimize
queri=
es.
>
> You might also want to check your stats tables to make sure the indexes
> you currently maintain are actually used by your queries and remove any
> that are unnecessary to reduce index-maintenance overhead.
>
> Another possible technique would be to have a nightly process that
> creates partial-indexes. One set of indexes would cover all data prior
> to midnight and the other set all data after midnight. Depending on the
> nature of your "real-time" vs. historical queries, these might even be
> different indexes. You will have to tweak your queries to make use of
> your indexes but your live data won't have to update your "historical"
> indexes. Warning: the date-constraint in the partial index must be
> static - you can't do something like "...where squidlog_timestamp >
> current_date...".  Your nightly process will be creating new indexes
> with a new date-constraint. You might even be able to get away with
> having no indexes on the current-day's data and just recreate historical
> indexes nightly (similar to your no-index with nightly-insert).
>
> But don't try the above till you determine you have a problem. On modest
> 3-year-old non-dedicated (also running file-storage, rsync backup,
> mail...) hardware with basic SATA RAID1 we are handling a similar load
> without strain.
>
> Cheers,
> Steve
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@[EMAIL PROTECTED]
)
> To make changes to your
subscription:http://www.postgresql.org/mailpref/p=
gsql-general
 




 7 Posts in Topic:
High inserting by syslog
douglas@[EMAIL PROTECTED]  2008-07-03 12:05:15 
Re: High inserting by syslog
jd@[EMAIL PROTECTED] (&q  2008-07-03 09:03:49 
Re: High inserting by syslog
douglas@[EMAIL PROTECTED]  2008-07-03 13:23:24 
Re: High inserting by syslog
dev@[EMAIL PROTECTED] (R  2008-07-03 17:08:26 
Re: High inserting by syslog
ahodgson@[EMAIL PROTECTED  2008-07-03 09:27:26 
Re: High inserting by syslog
scrawford@[EMAIL PROTECTE  2008-07-03 09:32:53 
Re: High inserting by syslog
Rainer Gerhards <rgerh  2008-07-04 01:41:37 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Nov 22 16:38:00 CST 2008.