Marshall Spight wrote:
> Bob Badour wrote:
>
>>Marshall Spight wrote:
>>
>>
>>>If we have a requirement to distinguish between
>>>exists-but-we-don't-know
>>>and doesn't-exist, then the cardinality-0 model is insufficient. But
>>>there
>>>are also cases where we won't need to so distinguish, and cases
>>>where doesn't-exist is not a possibility. In those cases, I would
>>>propose that cardinality-0 is often a better choice than SQL's NULL.
>>
>>Yes, well, you are not putting the bar very high. Now, are you? Besting
>>SQL's NULL for handling missing information is rather like tripping over
>>a shoelace.
>>
>>The closed world assumption is going to bite you more often than you
>>seem to suppose. Presumably, if you have a relation valued attribute,
>>you intend to use the relational algebra or calculus on it. However, the
>>closed world assumption is central to both which means they will yield
>>incorrect results in many cases unless the user takes extraordinary
>>measures to account for the not-quite-closed nature of the model's
world.
>>
>>One must consider those consequences when considering relation valued
>>attributes as a means to describe missing information.
>
> I do not disagree, but I'm not sure if I'm seeing all the consequences
> you refer to. I agree that how the algebra behaves is im****tant.
>
> Lame example:
> Given a Persons relation with an Age attribute. (I know it is better
> to have a Birthdate attribute, but for this artifical example, this
> will
> suffice.) The Age attribute is an RVA of a single int attribute with
> an empty key (that is, it is either an int or empty, indicating we
> don't
> know the age.) If we wanted to calculate the average age of
> Persons, and we took sum(Age) / count(Age), (waving my hands
> around the nest/unnest issue), then it would seem to me that
> what you would most likely want to know is, of the people for whom
> the age is known, what is the average, which is exactly what
> I'd expect the RVA-Age schema to give you.
Your example works exactly as NULL does, and I consider the result
incorrect. The average age of the entire cohort is unknown, and it is
misleading to pretend otherwise.
Consider the situation where one also has a (possibly unknown) Weight
attribute. If one queries for average Age, one gets a numeric answer. If
one queries for average Weight, one gets a numeric answer. If one
queries for average Age and average Weight, one gets two numeric answers
that may not equal the answers above.
This sort of thing has been a source of many errors I have been asked to
correct. Basically, it breaks the identity: SUM(A) + SUM(B) = SUM(A+B)
Some folks get really antsy when the numbers don't balance.
>>>I don't think it's a complete solution, though. I think some kind of
>>>sum type, as in SML, is also quite desirable. This lets the
>>>modeller create special values according to the requirements
>>>of the domain.
>>
>>I could not find a comprehensive enough reference for SML online to
>>decypher what you are saying above. Is a sum type similar to a union
>>type of some sort?
>
> Yes; specifically a tagged union, unlike the untagged C union.
> Here's a decent article:
>
> http://en.wikipedia.org/wiki/Tagged_union
Yes, I agree. That would be very useful.


|