Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql General > Re: clustering ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 6 of 15 Topic 15434 of 16301
Post > Topic >>

Re: clustering without locking

by craig@[EMAIL PROTECTED] (Craig Ringer) May 3, 2008 at 08:21 AM

Scott Ribe wrote:
>> Wouldn't new / updated tuples just get put in the hole, fairly rapidly
>> un-clustering the table again?
>>     
>
> How is that different than putting them in newly-allocated space at the
end
> of the table?
>
>   
It isn't, I just wasn't thinking straight.

This is probably a stupid idea as I'm fairly clueless about Pg's 
innards, but I find myself wondering about doing a non-exclusive cluster 
in small progressive chunks in series of short transactions.

In each transaction tuples from a smallish contiguous chunk of pages 
(beginning at the start of the table and moving onward as the cluster 
progresses) are copied to free or newly allocated space later in the 
table and given the xid of the transaction. The old versions are marked 
as having been obsoleted by this transaction. The transaction then 
commits. This step is much like a standalone UPDATE that's not actually 
changing the field values and that has some weird tuple selection
criteria.

Progressive clustering then waits until the last transaction that could 
see the old copies of the tuples that were just relocated finishes. When 
the space that was occupied by the moved tuples becomes reclaimable (ie 
vacuum could mark it as free if it was to run) progressive clustering 
resumes. A new transaction is begun. It looks up tuples from the index 
being clustered on in order and reads enough to fill the space just 
freed into RAM. It then writes those tuples into the freed space 
(without ever actually marking it as free, so no other transactions 
could ever write to it), giving them the xid of the writing transaction. 
The old copies are marked as obsoleted by this transaction and the 
transaction commits. Again, it's like an UPDATE, but combined with a 
small selective vacuum that never bothers to update the free space map 
because it instantly reuses the space.

Now a small chunk of the table, at the start, is in order. The process 
can now begin again with the next chunk of unordered pages. It never 
needs to create a huge hole in the table or expand the table massively. 
It can even space the data out if a fillfactor has been set by leaving 
gaps as it writes the replacement data into the freed chunks and 
updating the free space map.

The progressive cluster would also need to be able to run a periodic 
vacuum (or do the equivalent internally) with a restriction that 
prevented it from examining pages in the range the progressive cluster 
was currently trying to free. Otherwise all the space freed by old 
versions of rows that're now neatly ordered in the table wouldn't be 
freed and couldn't be used as scratch space.

In my ignorance of Pg's innards it seems like all this should work, 
though it'd basically never finish if there were long running 
transactions touching the table being clustered. The other potential 
problem I wonder about is bloating the indexes, since every record in 
the table is being moved twice over the course of the progressive 
cluster operation.

Oh: There's actually only a need for one transaction per step:

begin transaction
move ordered tuples into just-freed hole
move tuples from next block of pages to free space later in table
commit
wait for all transactions that can see the old versions to complete
repeat

So ... is this crazy? Concurrently clustering the table by moving each 
record *twice*, in batches, with pauses to allow old versions to cease 
being visible by any live transaction? Or can it actually work?

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
 




 15 Posts in Topic:
clustering without locking
fschmidt@[EMAIL PROTECTED  2008-05-01 17:12:52 
Re: clustering without locking
kleptog@[EMAIL PROTECTED]  2008-05-02 08:18:39 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 07:51:57 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-02 22:09:01 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 08:29:28 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-03 08:21:43 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-02 20:41:35 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-03 09:53:55 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-03 13:27:14 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-04 02:19:14 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-03 14:33:18 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-04 03:04:56 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-02 10:51:07 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 09:08:41 
Re: clustering without locking
masao.fujii@[EMAIL PROTEC  2008-05-02 16:11:24 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Wed Aug 20 8:35:56 CDT 2008.