Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql General > Re: clustering ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 8 of 15 Topic 15434 of 17602
Post > Topic >>

Re: clustering without locking

by craig@[EMAIL PROTECTED] (Craig Ringer) May 3, 2008 at 09:53 AM

Tom Lane wrote:
> Craig Ringer <craig@[EMAIL PROTECTED]
> writes:
>> So ... is this crazy? Concurrently clustering the table by moving each 
>> record *twice*, in batches, with pauses to allow old versions to cease 
>> being visible by any live transaction? Or can it actually work?
> 
> It seems to me you'd have a pretty horrid bloat problem: at completion,
> the table and all its indexes must be at least twice the minimum size,

I was hoping that wouldn't be the case for the table its self, though I 
expected the indexes would be pretty messed up and need their own 
cleanup process.

I'll try to explain my thinking, as I'm curious about what I've 
misunderstood.

The do***entation on VACUUM (well, on pg_freespacemap actually) suggests 
that vacuum can recover space from pages that contain some free space 
but are not wholly free. Is that right?

In this theoretical progressive cluster, tuples are moved in chunks from 
their original locations to free space later in the table (probably 
newly allocated at the end). However, if vacuum can recover partial 
pages shouldn't it be marking some of the scratch space and originally 
allocated space as free (thus permitting its reuse as scratch space) as 
tuples are picked out and moved back to the start of the table in order?

Guesswork:

If the initial order of tuples in the table before clustering starts is 
completely random then early on the table would expand by a full 
progressive cluster chunk size each step, because the pages being moved 
back to the start would be from all over the place and the space being 
freed would be very scattered and hard to reclaim.

Later on, though, less new space would have to be allocated because more 
and more of the space allocated earlier to hold moved tuples would be 
being freed up in useful chunks that could be reused. That'd also permit 
inserts and updates unrelated to the ongoing progressive clustering 
process to be written inside already allocated space rather than being 
appended to the table after all the progressive clustering scratch space.

So, if I understand vacuum's reclaiming correctly then even starting off 
with a completely record order it should expand the table somewhat for 
scratch space, but to less than double its original size. How much less, 
and how much could be truncated at the end, would depend on how good 
vacuum is at finding small holes to shove new/moved tuples into, and how 
similar the tuple sizes are. Right?

That assumes that the initial ordering of tuples is in fact random. If 
you're re-clustering a table it's probably far from random, and many of 
the tuples will already be in roughly the right areas. That should 
permit a much smaller allocation of scratch space, since much more of 
the data from the scratch area will be copied back and marked as free 
for reuse (for new unrelated inserts or for more moved tuples) within 
the next few steps of the progressive cluster. Especially if there's a 
non-100% fillfactor it should also be possible to truncate most of the 
newly allocated space at the end, as new inserts can be put in sensible 
places in the table rather than at the end.

Now, even if that's right the indexes will presumably be in an awful 
state. I've noticed that PostgreSQL has `CREATE INDEX CONCURRENTLY' but 
not `REINDEX CONCURRENTLY'. That's not too surprising, as nobody's 
trying to use an index while you're initially creating it. If there's no 
way to clean up the indexes after an operation like this then it's 
probably not worth it ... so, is there any way to clean up / optimize 
and index that doesn't require a long exclusive lock? A REINDEX 
CONCURRENTLY equivalent?

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
 




 15 Posts in Topic:
clustering without locking
fschmidt@[EMAIL PROTECTED  2008-05-01 17:12:52 
Re: clustering without locking
kleptog@[EMAIL PROTECTED]  2008-05-02 08:18:39 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 07:51:57 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-02 22:09:01 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 08:29:28 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-03 08:21:43 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-02 20:41:35 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-03 09:53:55 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-03 13:27:14 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-04 02:19:14 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-03 14:33:18 
Re: clustering without locking
craig@[EMAIL PROTECTED]   2008-05-04 03:04:56 
Re: clustering without locking
tgl@[EMAIL PROTECTED] (T  2008-05-02 10:51:07 
Re: clustering without locking
scott_ribe@[EMAIL PROTECT  2008-05-02 09:08:41 
Re: clustering without locking
masao.fujii@[EMAIL PROTEC  2008-05-02 16:11:24 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Dec 1 18:50:27 CST 2008.