Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Sql > Having trouble ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 1 Topic 3525 of 3717
Post > Topic >>

Having trouble designing my database

by "Ken Camann" <kjcamann@[EMAIL PROTECTED] > Jun 12, 2008 at 05:15 AM

Hello everyone,

I am struggling with the design of a database for a scientific
application. 
I've been running around in circles trying to figure this out for weeks
and 
I really need help with it.  I am working on a project with comparitively 
little resources and I need to start making progress soon, so thanks in 
advance if you manage to read even half of this (I know its long!).

My frustration stems from a combination of the problem being hard and me 
being new to SQL and not having a lot of experience with very large 
databases.  It has very different usage characteristics compared to normal

databases.  For example, I consider the following im****tant:

-"Doing it right the first time", because the database will be around for
a 
long time
-Design that relies as much as possible on SQL standards
-I have to be able to afford it (i.e., it has to be free)...this and the 
previous item led me to PostgreSQL
-Referential integrity (foreign keys)
-The ability to store at the very least several hundred million, but 
possibly up to several hundred billion (!) rows in a table

That last requirement may seem pretty ridiculous, but consider that

-Despite the huge number of rows, I'll never SELECT more than a few
million 
at time
-There are usually between 3-6 fixed format columns in most tables...It's 
scientific data, mostly numbers.  So it's not as big as it sounds, and it 
probably won't seem that large in 5-10 years, which might be how long it 
takes to get that big..for the first few years it will be 100 gigabytes at

the most
-I don't care about performance in the same way most people using
databases 
do.  I'll never have a huge number of users accessing it, and I won't be 
concerned with metrics like X queries per second (there will probably be
at 
the most a few queries per minute...and probably not even that many, with 
the occasional update, insert, or delete).  My definition of performance
is 
the "the query takes 2 seconds" instead of 2 or 20 minutes.  If it takes 2

milliseconds great, but I don't really care.
-Here's the interesting part: not only is most of the data naturally 
partitionable, it's "so natural" that the full table is never actually 
needed, just the partitions.  That is, there is no query that will ever be

performed on a union of 2 or more partitions. Ever.  The only reason to
even 
use a database is referential integrity.  I have to force it to be one big

table, because the foreign key reference needs to know about a single 
logical name.  But really, all it amounts to is a big, well-organized 
filesystem with lots of meta-data to help me find things.

It seems like by that last point, I can break up each 2-3 million row 
partition into a table and not even bother with partitioning.  Except if I

do that, I'll have hundreds of tables with automatically generated names
and 
hundreds of odd foreign keys everywhere.  In addition to being "ugly",
this 
seems like it would be hard to administer and maintain.

In PostgreSQL it makes no difference anyway, since all partitions are 
explicitly visible in the schema either way, and you have to directly
manage 
them either way.  In MySQL or Oracle (AFAIK), the physical partitions are 
hidden from you, managed automatically, and appear as one nice logical 
partition.  Unfortunately, MySQL partitions are limited to 1024 tables and

they totally remove the foreign key system (so I guess they removed the 
thing that seems like it would actually make the implementation hard). 
Oracle probably does not have these limitations, but costs arbitrarily
large 
amounts of money.

Anyway, should I even be using an RDBMS for this?  Surely other people
have 
had to do things like before.  Is this what people mean by "data
warehouse" 
(the definitions I've heard seem so vauge I honestly don't know what it 
means).  How would you guys implement something like this?

Thanks for reading!
Ken
 




 1 Posts in Topic:
Having trouble designing my database
"Ken Camann" &l  2008-06-12 05:15:56 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Mon Oct 13 4:19:53 CDT 2008.