Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql General > Multithreaded q...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 4 Topic 15779 of 17437
Post > Topic >>

Multithreaded queue in PgSQL

by alokin1@[EMAIL PROTECTED] (Nikola Milutinovic) Jun 10, 2008 at 03:40 AM

--0-152070369-1213094451=:45349
Content-Type: text/plain; charset=us-ascii

Hi all.

This may be trivial, but I cannot find good references for it. The problem
is this:

Suppose we have one table in PgSQL which is a job queue, each row
represents one job with several status flags, IDs,... Several processes
will attempt to access the queue and "take" their batch of jobs, the batch
will have some parameterizable size. So, the simple idea is "select N
lowest IDs that do not have a flag <in process> set and set the flag",
"then proceed with whatever it is that should be done".

Trouble is, with MVCC I don't see a way to prevent overlapping and race
conditions. Oh, sure, if I issue select for update, it will lock rows,
but, if I understand correctly, the change may not be instantaneous and
atomic, so I might get transaction to roll back and then there is error
handling that will lead to the uglies serialization I can think of. Let me
clarify this, so somebody can tell me if I got it wrong.

Imagine Queue table with 20 rows, ID: 1,...,20, status="new". Imagine 2
processes/threads (P1, P2) attempting to get 10 jobs each.How to do that?

P1: UPDATE job_queue SET process_id=$1, status="in process" WHERE id IN (
    SELECT id FROM job_queue WHERE status="new" and id IN (   
        SELECT id FROM job_queue WHERE status="new" ORDER BY id LIMIT 10
FOR UPDATE)
    )
)
P2: the same
P1: SELECT * FROM job_queue WHERE process_id=$1 ....
P2: SELECT * FROM job_queue WHERE process_id=$1 ....

The reason for the 2 selects is that if 2 or more processes content for
the same set of jobs, the first one will set the status. The second will,
after P1 has released the rows get those rows, that are already taken. Of
course, this will most likely return 0 rows for P2, since all 10 will be
taken. If I leave out the LIMIT 10 in the inner select, I am effectively
locking the entire table. Is that the way to go?

LOCK TABLE job_queue EXCLUSIVE;
UPDATE ...
UNLOCK TABLE job_queue;

Nix.



      
--0-152070369-1213094451=:45349
Content-Type: text/html; charset=us-ascii

<html><head><style type="text/css"><!-- DIV {margin:0px;}
--></style></head><body><div style="font-family:times new roman, new york,
times, serif;font-size:12pt"><div>Hi all.<br><br>This may be trivial, but I
cannot find good references for it. The problem is this:<br><br>Suppose we
have one table in PgSQL which is a job queue, each row represents one job
with several status flags, IDs,... Several processes will attempt to
access the queue and "take" their batch of jobs, the batch will have some
parameterizable size. So, the simple idea is "select N lowest IDs that do
not have a flag &lt;in process&gt; set and set the flag", "then proceed
with whatever it is that should be done".<br><br>Trouble is, with MVCC I
don't see a way to prevent overlapping and race conditions. Oh, sure, if I
issue select for update, it will lock rows, but, if I understand correctly,
the change may not be instantaneous and atomic, so I might get transaction
to roll back and then
 there is error handling that will lead to the uglies serialization I can
think of. Let me clarify this, so somebody can tell me if I got it
wrong.<br><br>Imagine Queue table with 20 rows, ID: 1,...,20,
status="new". Imagine 2 processes/threads (P1, P2) attempting to get 10
jobs each.How to do that?<br><br>P1: UPDATE job_queue SET process_id=$1,
status="in process" WHERE id IN (<br>&nbsp;&nbsp;&nbsp; SELECT id FROM
job_queue WHERE status="new" and id IN ( &nbsp;  <br>&nbsp;&nbsp;&nbsp;
&nbsp;&nbsp;&nbsp; SELECT id FROM job_queue WHERE status="new" ORDER BY id
LIMIT 10 FOR UPDATE)<br>&nbsp;&nbsp;&nbsp; )<br>)<br>P2: the same<br>P1:
SELECT * FROM job_queue WHERE process_id=$1 ....<br>P2: SELECT * FROM
job_queue WHERE process_id=$1 ....<br><br>The reason for the 2 selects is
that if 2 or more processes content for the same set of jobs, the first
one will set the status. The second will, after P1 has released the rows
get those rows, that are already taken.
 Of course, this will most likely return 0 rows for P2, since all 10 will
be taken. If I leave out the LIMIT 10 in the inner select, I am
effectively locking the entire table. Is that the way to go?<br><br>LOCK
TABLE job_queue EXCLUSIVE;<br>UPDATE ...<br>UNLOCK TABLE
job_queue;<br><br>Nix.<br></div></div><br>



      </body></html>
--0-152070369-1213094451=:45349--
 




 4 Posts in Topic:
Multithreaded queue in PgSQL
alokin1@[EMAIL PROTECTED]  2008-06-10 03:40:51 
Re: Multithreaded queue in PgSQL
dfontaine@[EMAIL PROTECTE  2008-06-10 12:54:54 
Re: Multithreaded queue in PgSQL
laserlist@[EMAIL PROTECTE  2008-06-10 22:19:12 
Re: Multithreaded queue in PgSQL
scott.marlowe@[EMAIL PROT  2008-06-10 08:57:37 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Nov 22 15:55:35 CST 2008.