Hi!
How can I make a Update of a column in a very large table for all rows
without using the double amount of disc space and without any need for
atomic operation?
I have a very large table with about 60 million rows. I sometimes need to
do a simple update to ALL rows that resets a status-flag to zero.
I don't need to have transactional integrity (but of course if the system
crashes, there should be no data corruption. A separate flag in the file
system can well save the fact that that bulk update was in progress) for
this, I don't care or want a abort or "all or nothing" like SQL mandates.
The idea is basically that either this update succeeds or it succeeds or -
there is no "not". It must succeed. It must be tried until it works. If the
update is halfway finished, that's okay.
If I just do an
UPDATE table SET flag=0;
then Pg will make a copy of every row which must be cleaned up by vaccuum.
I understand - and don't like during this specific problem - that PG is a
MVCC database and will keep my old versions of the rows before the update
alive. This is very bad.
If I do a batched loop like this:
UPDATE table SET flag=0 where id>=0 and id <200;
UPDATE table SET flag=0 where id>=200 and id <400;
UPDATE table SET flag=0 where id>=400 and id <600;
....
then PG will seek all over my harddrive I think.
It would be much better if it could just start in the beginning of the
table and work it's way towards the end. But which sort-criteria would
satisfy this? I don't think that there is any SQL that does something like
that.
Another ideas (and I think it's a quite good idea) would be to
drop the column and recreate it with a new default value.
But the problem is that this is not actually MY database, but an appliance
(which has a harddrive that does not have the double amount of space
available btw) and it has to work reliably whenever something special
happens.
And I don't think I should create new columns (the old one would be hidden
and their internal column ids lost I think) all the time, that might have a
limit.
Can I do it maybe every day??
Is there any other way to go?
I would really like to kind of "skip transactions". Of course basic data
integrity in on disc structures, but not atomicy for this bulk update!
I read that PG has an internal command language that is used to build up a
database when all the tables for e.g. table-names are not present yet.
Could I use that to hack my way around transactions?
Basically I can do everything to this PG installation, this is an extremly
controlled, locked down environment. I don't need to care for anyone else,
it's always the same schema, the same version, same OS, etc. and I could
happily apply any hack that solves the problem.
Many thanks for any pointers or ideas how to solve this problem!
Christian
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
--
Sent via pgsql-general mailing list (pgsql-general@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


|