On Jun 30, 7:33 am, "David Cressey" <cresse...@[EMAIL PROTECTED]
> wrote:
> "Ed Prochak" <edproc...@[EMAIL PROTECTED]
> wrote in message
>
> news:4b1507c9-207e-4555-bd27-42dd0e84de9d@[EMAIL PROTECTED]
>
>
>
> > On Jun 26, 12:31 pm, "David Cressey" <cresse...@[EMAIL PROTECTED]
> wrote:
[]
> > > I can't tell you the number of times programmers have come to me for
> help
> > > with a query
> > > that produces wrong results, and when I try to change their
"select" to
> > > "select distinct" to see if maybe that's the problem, they exclaim
"Oh
> I
> > > never use 'select distinct'! It runs too slow!"
> > > I patiently explained to them that I first want to come up with a
query
> that
> > > is logically correct, then if necessary, come up with one that's
> logically
> > > equivalent, but runs fast.
>
> > When I see
> > SELECT DISTINCT
> > in anything but an ad hoc query, I know something is wrong with the
> > query. Possibly it is joining to a wrong table (to a detail table
> > instead of a header/parent table for example) or some filtering
> > condition is missed (a column not in the select list needs to be
> > used), or some join condition was missed (there is a compound key and
> > one component was left out). Even in an ad hoc query I tend to use
> > SELECT COUNT(*)
> > since that gives me a little bit more information for the same cost as
> > DISTINCT (i.e. it reads the same amount of data).
>
> > Seeing this post, I'm actually a little disappointed in you, David.
> > Your last sentence does show you mainly follow a good approach. It's
> > just that DISTINCT should be left to ad hoc queries IMO.
>
> > Have a good day.
> > Ed
>
> Your comment doesn't agree with my experience. In general, there are
two
> cir***stances where SELECT DISTINCT is the right approach. Both involve
> situations where SELECT generates a bag, but the desired result is a
set.
>
> The first is a situation where the database has been misdesigned. If
the
> database has been frozen for a year or more, and there is a lot of
> production software that depends on the existing table design, it may
> simply be unfeasable to correct the design in order to make one query
> logically easy to write.
Okay, you have a case where the design error is not in the query. We
agree there is still something wrong here.
>
> The second is a situation where the query requires a subset of the data
that
> does not include any candidate keys from the underlying tables. In
that
> case, the generation of a query requires its own mechanism for
eliminating
> duplicates.
I'd like to see an example of this case.
Then the question would be: is this case a rare exception?
I think it is very rare, since honestly I haven't seen it.
You seem to be saying it is not so rare (especially in your other
post). If so, an example or two should be easy to show.
Maybe I am making some queries more complicated than they need to be.
Please show me the error of my ways.
>
> Perhaps the development of a new program, a long time after the database
is
> in production, and not easily altered, has much the same flavor of what
you
> have called an "ad hoc query". Using the database to organize data in a
way
> that the designer did not contemplate.
I am thinking of ad hoc as in a throwaway query. One you might do to
research an issue in the system. Occasionally it might be a business
request, but most are developer queries as part of new development
(especially when you just started a new job and there is little or no
do***entation). And I would say it includes organizing data in ways
the designer did not anticipate. Whether these new ways are useful or
not is yet to be determined. So I guess, yes, in a few cases these
queries might work their way into a new program.
Ed


|