Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Databases General > Re: Hundreds of...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 6 of 16 Topic 3157 of 3283
Post > Topic >>

Re: Hundreds of columns, index on all of those.

by rock_sand81@[EMAIL PROTECTED] May 6, 2008 at 03:25 AM

Hi!

This is not a data warehouse. I think so: Old data is deleted quite
quickly. New data arrives and all of the old data is deleted. There is
e.g. almost no reason to make backups of the DB.
One item still consists of e.g. 900 columns.


About the indexes.
Actually there would probably be same amount of indexes (entries in an
index) with all (first and third) table structures.
Each field would be have an entry in an index in both cases.

From now on I am talking about table structure 3. (Two tables, all 900
columns are on separate rows in table #2.)

The size of the DB would increase quite a lot, because each field /
column ID would need Item ID.
So if there is 900 columns per one Item, there would also be needed
900 Item IDs per one Item, if all columns are on separate rows in
table #2.
There is required to be an index on Item ID column in Table 2.
So there would be 900 Item IDs per one Item in the table and also 900
Item IDs per one Item in the index too.
1800 Item IDs would be needed to be stored in total, for handling
table 2.

I assume there would be needed 2 indexes for table 2:

  one index for only Item ID column in table 2
  and one index for the pair(value, column ID) in table 2 (Item ID
would not need to be in this index.).

Size of an Item ID (number) would be really several bytes. I think
small numbers cannot be used. Item IDs can take huge amounts space,
really easily a lot of GBs.

Amount of 900 columns might be more rare than e.g. amount of 100
columns per one Item. More general would be the case of 100 columns
per one Item. It would still consume a lot of GBs in every case:
- the table 2 (xx GBs)
- the index for table 2, which is on Item ID column, (xx GBs)

It consumes quite a lot of space. I have tried the above where is only
one index on Item ID column in table 2. Index was not created on
(column ID, value) pair yet! Only this existed:

Table 2:
  Item ID, column ID, value
  Only one index: on (Item ID)

No other indexes were on Table 2.
And it consumed a lot of disk space.

- - - - - - -
A little more information about the items.

There are millions of items. So, there are millions of Item IDs.

Those items / Item IDs belong to groups too.
There are only few hundred different groups.

If I search for some values from one of the 900 columns, I do not
search from all of the groups. Search is done "inside" only one group.
The query is made for Items which belong to one specific group. Values
are not searched from several groups at the same time. Values are
searched from only one group at time.

So, there is a group ID too. And all Items belong to some group ID.
There can be e.g. hundreds of thousands of Items in one group.

> Also: are you positively sure you need to retrieve
> data by any of those indexes?

Sadly, it seems so. I would like to avoid it.


> What's the cardinality (# of different values)
> of each of these columns, for instance?

There can be quite a lot of different values. Few characters or
numbers. I would say the data range is bigger than 900 (the amount of
"columns"). Might be quite a lot bigger even, if there can be e.g.
decimal values too, I am not 100% sure yet.



I may need to think should I even add group ID to the table 2 too. (?)
I think about it, because index on (value, column ID) might give quite
a lot of Item IDs, which belong to several group IDs, but I do not
know would that be slow.
Just to repeat this: The query is always made on the specific group,
so items, which belong to other groups can be ignored.
There is also an index on Item ID column on table 1.

Group ID can exist in Table 1 in every case. Table 1 does not consume
disk space much:

Table 1:
Item ID, Item name, group ID
Index 1.1 on (Item ID)
Index 1.2 on (group ID) (or something like this)

Table 2:
Item ID, column ID, value
Index 2.1 on (Item ID)
Index 2.2 on (value, column ID).

Table 2 consumes a lot of space (!). Table 1 is not a problem.

Would it be wise (or stupid) to add the group ID also to Table 2 to
make the query fast?
Or would the query be fast enough without group ID (in Table 2)
already?

Would it be needed to add group ID to table 2 to make the performance
of the query good. Group ID would be added to the index 2.2 too:

Table 2:
Item ID, column ID, value, group ID
Index 2.1 on (Item ID)
Index 2.2 on (value, column ID, group ID).


> hence I can say: go for it.
> Trade your columns with rows and you'll be much happier.

That might be the only possiblity.


Thank you for all of the comments!
 




 16 Posts in Topic:
Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-06 00:05:09 
Re: Hundreds of columns, index on all of those.
Jeff North <jnorth@[EM  2008-05-06 07:54:21 
Re: Hundreds of columns, index on all of those.
Marco Mariani <marco@[  2008-05-06 10:05:27 
Re: Hundreds of columns, index on all of those.
Thomas Kellerer <YQDHX  2008-05-06 15:15:31 
Re: Hundreds of columns, index on all of those.
Marco Mariani <marco@[  2008-05-06 10:09:02 
Re: Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-06 03:25:50 
Re: Hundreds of columns, index on all of those.
Marco Mariani <marco@[  2008-05-06 13:08:06 
Re: Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-06 04:41:23 
Re: Hundreds of columns, index on all of those.
Marco Mariani <marco@[  2008-05-06 15:06:57 
Re: Hundreds of columns, index on all of those.
Ed Prochak <edprochak@  2008-05-06 05:45:54 
Re: Hundreds of columns, index on all of those.
--CELKO-- <jcelko212@[  2008-05-07 09:51:29 
Re: Hundreds of columns, index on all of those.
"David Cressey"  2008-05-07 19:26:47 
Re: Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-07 22:23:56 
Re: Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-07 23:34:52 
Re: Hundreds of columns, index on all of those.
Ed Prochak <edprochak@  2008-05-08 05:29:54 
Re: Hundreds of columns, index on all of those.
rock_sand81@[EMAIL PROTEC  2008-05-09 04:04:06 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat Nov 22 15:23:38 CST 2008.