On May 8, 12:23 am, rock_san...@[EMAIL PROTECTED]
wrote:
> > From the last one, I assume no updates between loads, only deletes and
> > inserts.
>
> Yes, there are no updates. There are only insertions and deletions,
> running all the time.
>
> Good idea, own implementation of a DBMS with files.
>
> One table?
> Would it still require 900 indexes, if there would be 900 columns. I
> would just need to create the indexes by my self.
>
> - -
> Items arrive to several groups at the same time and all the time.
>
> Even if there would be own table/file per one group, there would be
> 900 columns per Item still.
>
> There is also table partitioning in some RDBMs. Deletion would be very
> fast, if table (table 2) would be partitioned (or if there would be
> separate table per each group).
> A partitioned table is quite the same as having separate tables per
> each group.
> Group ID would be needed to be added to the table 2 regarding how
> partitioning of the table would be best to be implemented.
>
> But I am consentrating on the queries and having an index on all of
> the columns in a good way, at least from performance of insertions and
> performance of queries points of views.
> - -
>
> One thought is in my mind too...
>
> If e.g. Oracle or some other RDBMS has not been able to offer a better
> solution(or do those offer?), would I be able to implement a better
> one?
>
> I have not implemented indexes etc. by myself, so I probably just do
> not know enough about this kind of implementation of my "own DBMS". I
> would need to investigate this issue more.
>
> Some practical examples presented in theory would be needed, if
> someone is able to implement this kind of mechanism / system.
>
> - -
>
> Reasons to "stick" with an official RDBMS (MySQL, Oracle,
> Postgre, ...):
> - Indexes
> - Array insert (Oracle, others?)
> - Sup****t to use the DB with threads or by several process at the same
> time (mutexes)
> - Compression of indexes (and maybe compression of tables, in some
> version of some RDBMS)
> - My own implementation of a "DB system" with flat data files might
> probably get more easily corrupted.
> An already existing RDBMS might be more robust.
> - Need to access the DB with different programming languages from
> tier-2 (queries) and tier-3 (insertions).
> - Partitioned tables (in some RDBMSs)
> - Some other reasons? ( An SQL interface was already mentioned. It
> might be quite im****tant too. )
>
> All ideas are welcome.
> I need to go through all of those.
>
> > Why don't you tell us what problem you were trying to solve
>
> I can only tell that Items arrive and values from those are
> monitored and analyzed.
>
> - -
>
> > What leads you to require so many columns?
>
> It is the situation. An Item which arrives has all of those columns.
> It cannot be affected.
>
> > Why is an index required on all
>
> columns?
>
> A value is wanted to be searched from one column (inside one group) at
> a time. Queries would be too slow, if there would not be an index on
> those columns.
> It seems to be the situation. And these queries are the issue which I
> try to solve. Index would be needed or some other solution(?) to have
> good performance with such queries.
>
> > Does each column specify exactly one attribute?
>
> Yes.
> Each column is independent in an Item.
> All columns still belong to the Item.
>
> > Do all the attributes
>
> describe the same "thing"
>
> I am not sure what is exactly being asked.
> All columns (or attributes) describe the Item.
> Columns are attributes of an Item.
>
> > How do you plan on dealing with situations where there is a row,
> > and a
> > column, but no value that ought to be stored in
> > the intersetion of the
> > given row and the given column?
>
> I am sorry, you would need to give an example.
> I was not able to understand exactly what the situation would be.
I was trying to get an example from you. Let me be very distinct in my
question:
Are all 900 columns identical? meaning:
same data type
(the columns are all CHAR, NUMBER, BLOB, other???)
same domain of data
(the column are all character, representing DNA sequences)
same semantics
(the columns are all CHAR, representing Colors, which are a coloring
of a map)
We are trying to understand YOUR situation.
>
> If there is 900 columns per one Item. Then all of those columns would
> have a value.
The same value?
> Next Item could have only 100 columns (or attributes) and then only
> those 100 columns would have a value.
Aha! Now a new clue. If there can be variations, are the attributes
really groups of attributes? These groups could be normalized in
different tables.
>
> So, there are:
> - Groups
> - Items, which belong to groups
> - attributes which belong to the Items
> Then each Item has also a type. One Item can have e.g. 900 attributes
> or 100 attributes depending on the type.
Aha
If they are different types, your table is really denormalized. You
could split them on load into tables according to their type.
I still have to question the need for being able to query on any
column. Basically what you are saying is any column could be to only
one in the condition (WHERE) clause of a query AND that there is
enough data in the table to make a full table scan too slow. I still
am having trouble believing that.
I think you should consider normalizing the data.then even if the
query requirement is true, the number of rows in the 900 column table
will be reduced possibly to the point where the 900 indices are not
absolutely necessary.
This discussion in the abstract really is less productive than
necessary. Please consider giving some examples of your real problem.
Obscure the real problem domain if you can (if this is proprietary or
otherwise secret), but I really see little progress without knowing
what the data looks like.
But if I must, I'll give one final general comment about this:
Your problem is finding the level of optimization needed (do you need
indices or not) to achieve a given performance level. Generally, the
way to find out performance improvements is to test. Knowing you data
and database features helps, but actually running tests is the way to
confirm performance meets the goal.
Ed


|