Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Admin > Handling large ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 10 Topic 4890 of 5124
Post > Topic >>

Handling large volumes of data

by jspies@[EMAIL PROTECTED] (Johann Spies) Apr 8, 2008 at 11:42 AM

I want to set up a system where logs of all kinds can be put in tables
so that queries and re****ts can be generated from there.  Our Firewall
logs alone generate about 600,000,000 lines per month and that will
increase as we get more bandwidth.

I am testing postgresql's ability to handle large datasets.  I have
loaded about 4,900,000,000 in one of two tables with 7200684 in the
second table in database 'firewall', built one index using one
date-field (which took a few days) and used that index to copy about
3,800,000,000 of those records from the first to a third table,
deleted those copied record from the first table and dropped the third
table.

This took about a week on a 2xCPU quadcore server with 8Gb RAM.  During
and
after that exercise the server was lethargic and constantly ran at a
load average of at least 5.

So I decided maybe it is a good thing to run 'autovacuum analyse' to
clean up things.  After a few days the process is still running and
the load average still constantly about 5.  

I then decided to just drop the tables.  I did that about 18 hours ago
and still there is no sign of Postgresql fini****ng that (pid 18614 below).

The machine is sluggish.  A 'ps fanx'  shows:

12501 ?        S      0:00 /usr/lib/postgresql/8.1/bin/postmaster -D
/var/lib/postgresql/8.1/main -c
config_file=/etc/postgresql/8.1/main/postgresql.conf
12504 ?        D      0:54  \_ postgres: writer process
12505 ?        S      1:42  \_ postgres: stats buffer process
12506 ?        S      1:03  |   \_ postgres: stats collector process
15918 ?        D    1214:28  \_ postgres: autovacuum process   firewall
18613 ?        D    130:08  \_ postgres: exilog exilog
146.232.128.197(53907) SELECT
18614 ?        S     12:42  \_ postgres: exilog exilog
146.232.128.197(53908) idle
31932 ?        D     16:11  \_ postgres: exilog exilog
146.232.128.106(36547) INSERT
10380 ?        S      0:00  \_ postgres: log firewall [local] DROP TABLE
waiting
20753 ?        D      5:11  \_ postgres: exilog exilog
146.232.128.197(43581) INSERT
16370 ?        S      5:04  \_ postgres: exilog exilog
146.232.128.59(40620) idle
 3483 ?        S      0:50  \_ postgres: exilog exilog
146.232.128.49(33803) INSERT
 3484 ?        S      0:04  \_ postgres: exilog exilog
146.232.128.49(33804) idle
 3485 ?        S      0:02  \_ postgres: exilog exilog
146.232.128.49(33805) idle

dstat-output shows a lot of IO.  It is actually better now. Earlier today
it was constantly about 50M per line:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq|_read _writ|_recv _send|__in_ _out_|_int_ _csw_
  4   1  87   9   0   0|  24M   15M|   0     0 |  96B   10k|1108  1902
  3   1  85  11   0   0|  38M   32M|7242B 8796B|   0     0 |1175  1501
  2   0  88  10   0   0|  27M   33M|9949B   10k|   0     0 | 737   714
  1   1  87  10   0   0|  24M   28M|2412B 2948B|   0     0 | 637   533
  2   0  87  11   0   0|  28M   38M|2507B 2306B|   0     0 | 789   765
  2   1  87   9   0   0|  40M   38M|2268B 2450B|   0     0 | 900   795
  2   1  85  12   0   0|  33M   25M|4753B 3376B|  36k    0 | 950  1217
  2   2  80  16   0   0|  24M   28M|2590B 2738B|   0     0 | 899  1487
  2   1  84  12   0   0|  32M   40M|2603B 3025B|   0     0 |1042  1377
  2   0  86  11   0   0|  28M   30M|8530B 9116B|   0     0 |1054  1302
  1   0  77  22   0   0|8412k   12M|  12k   12k|   0     0 | 854  1286

Apparently the best approach is not to have very large tables.  I am
thinking of making (as far as the firewall is concerned) a different
table for each day and then drop the older tables as necessary.

Any advice on how to best handle this kind of setup will be
appreciated.

Regards
Johann
-- 
Johann Spies          Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

     "Be careful for nothing; but in every thing by prayer
      and supplication with thanksgiving let your requests
      be made known unto God. And the peace of God, which
      passeth all understanding, shall keep your hearts
      hearts and minds through Christ Jesus."
                           Philippians 4:6,7  

-- 
Sent via pgsql-admin mailing list (pgsql-admin@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
 




 10 Posts in Topic:
Handling large volumes of data
jspies@[EMAIL PROTECTED]   2008-04-08 11:42:34 
Re: Handling large volumes of data
shoaibmir@[EMAIL PROTECTE  2008-04-08 19:49:04 
Re: Handling large volumes of data
postgresql@[EMAIL PROTECT  2008-04-08 11:55:00 
Re: Handling large volumes of data
jspies@[EMAIL PROTECTED]   2008-04-08 14:16:03 
Re: Handling large volumes of data
michael.monnerie@[EMAIL P  2008-04-08 12:01:14 
Re: Handling large volumes of data
jspies@[EMAIL PROTECTED]   2008-04-08 12:13:36 
Re: Handling large volumes of data
postgresql@[EMAIL PROTECT  2008-04-08 14:00:07 
Re: Handling large volumes of data
michael.monnerie@[EMAIL P  2008-04-08 21:24:07 
Re: Handling large volumes of data
ajs@[EMAIL PROTECTED] (A  2008-04-09 10:00:45 
Re: Handling large volumes of data
tsakai@[EMAIL PROTECTED]   2008-04-08 08:47:06 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan13V112 Sun Jul 6 17:17:17 CDT 2008.