Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Re: Creating la...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 6 of 6 Topic 3981 of 4352
Post > Topic >>

Re: Creating large database of MD5 hash values

by fweimer@[EMAIL PROTECTED] (Florian Weimer) Apr 11, 2008 at 07:04 PM

* Jon Stewart:

>>  BYTEA is slower to load and a bit inconvenient to use from DBI, but
>>  occupies less space on disk than TEXT or VARCHAR in hex form (17 vs 33
>>  bytes with PostgreSQL 8.3).

> Can you clarify the "slower to load" point? Where is that pain point
> in the postgres architecture?

COPY FROM needs to read 2.5 bytes on average, instead 2, and a complex
form of double-decoding is necessary.

> Storing the values in binary makes intuitive sense to me since the
> data is twice as dense, thus getting you more bang for the buck on
> comparisons, caching, and streaming reads. I'm not too concerned about
> raw convenience, as there's not going to be a lot of code around my
> application.

The main issue is that you can't use the parameter-providing version
of $sth->execute (or things like $sth->selectarray, $sth->do), you
must use explicit binding by parameter index in order to specify the
type information.

> The idea is that you have named sets of hash values, and hash values
> can be in multiple sets.

The ID step is only going to help you if your sets are very large and
you use certain types of joins, I think.  So it's better to
denormalize in this case (if that's what you were alluding to in your
original post).

> The big operations will be to calculate the unions, intersections, and
> differences between sets. That is, I'll load a new set into the
> database and then see whether it has anything in common with another
> set (probably throw the results into a temp table and then dump it
> out).

In this case, PostgreSQL's in-memory bitmap indices should give you
most of the effect of your hash <-> ID mapping anyway.

> I will also periodically run queries to determine the size of
> the intersection of two sets for all pairs of sets (in order to
> generate some nice graphs).

I think it's very difficult to compute that efficiently, but I haven't
thought much about it.  This type of query might benefit from your
hash <-> ID mapping, however, because the working set is smaller.

--=20
Florian Weimer                <fweimer@[EMAIL PROTECTED]
>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstra=DFe 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

--=20
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 6 Posts in Topic:
Creating large database of MD5 hash values
jonathan.l.stewart@[EMAIL  2008-04-10 10:48:59 
Re: Creating large database of MD5 hash values
dmagick@[EMAIL PROTECTED]  2008-04-11 15:11:34 
Re: Creating large database of MD5 hash values
fweimer@[EMAIL PROTECTED]  2008-04-11 16:05:00 
Re: Creating large database of MD5 hash values
alvherre@[EMAIL PROTECTED  2008-04-11 10:25:44 
Re: Creating large database of MD5 hash values
jonathan.l.stewart@[EMAIL  2008-04-11 11:28:55 
Re: Creating large database of MD5 hash values
fweimer@[EMAIL PROTECTED]  2008-04-11 19:04:00 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sun Oct 12 9:33:13 CDT 2008.