On Thu, 26 Oct 2006 21:49:11 -0500 in alt.comp.databases, Ed Katzman
<none@[EMAIL PROTECTED]> wrote:
>Hi. I'm a complete novice in the area of text processing and am hoping
>that someone can point me in a good direction to get started.
>
>Here is the problem: I work in consumer products marketing. I have a
>database of over 100,000 products. Each record is for the initial
>introduction of a product into the market and it provides some basic
>overview information. While some of the information is arranged in
>separate fields (date of intro, manufacturer, etc.), most of the valuable
>information for our purpose is contained in a free-form description
>field.
>
>I am hoping to do some cluster analysis or even some cladistics on the
>data, but it seems like I need to pull the relevant text information out
>of the description field and put it in some group of individual fields.
>
>I don't know where to start to process this data so it comes out more
>structured as input to other uses.
>
>Can anyone give me some advice?
Most major database vendors sell full-text search add-ons, if you can
afford them.
If you can't, generate a cross-reference (xref) of the products against
the words in their descriptions.
Look at word counts per product and across the whole database.
Eliminate frequently occurring "noise" words (the, and, with, etc.).
Then look at correlations between product attributes and the remaining
words, concentrating initially on the most and least frequently
occurring words.
That may give you some ideas on where to go next.
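
The steps above could be sketched roughly as follows in Python. This is
only an illustration under assumptions: the sample records, the
NOISE_FRACTION cutoff, and all function names are made up for the
example, and manufacturer stands in for whatever product attribute you
want to compare against.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records: (product_id, manufacturer, description).
PRODUCTS = [
    (1, "Acme", "New low-fat yogurt in a resealable cup"),
    (2, "Acme", "Low-fat granola bar with yogurt coating"),
    (3, "Bolt", "Rechargeable cordless drill with a carrying case"),
    (4, "Bolt", "Cordless screwdriver with a rechargeable battery"),
]

# Assumed cutoff: a word found in more than half of all products is noise.
NOISE_FRACTION = 0.5


def tokenize(text):
    """Lowercase the text and split it into words, keeping hyphenated terms."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_xref(products):
    """Map each word to the set of product ids whose description contains it."""
    xref = defaultdict(set)
    for product_id, _manufacturer, description in products:
        for word in tokenize(description):
            xref[word].add(product_id)
    return xref


def overall_counts(products):
    """Count every occurrence of every word across all descriptions."""
    counts = Counter()
    for _product_id, _manufacturer, description in products:
        counts.update(tokenize(description))
    return counts


def drop_noise(xref, n_products, max_fraction=NOISE_FRACTION):
    """Keep only words that appear in at most max_fraction of the products."""
    return {
        word: ids
        for word, ids in xref.items()
        if len(ids) / n_products <= max_fraction
    }


def words_by_attribute(products, xref):
    """For each manufacturer, count how many of its products use each word."""
    maker = {pid: manufacturer for pid, manufacturer, _desc in products}
    counts = defaultdict(Counter)
    for word, ids in xref.items():
        for pid in ids:
            counts[maker[pid]][word] += 1
    return counts


xref = build_xref(PRODUCTS)
totals = overall_counts(PRODUCTS)
filtered = drop_noise(xref, len(PRODUCTS))
by_maker = words_by_attribute(PRODUCTS, filtered)

for manufacturer, counter in sorted(by_maker.items()):
    print(manufacturer, counter.most_common(3))
```

Words that turn up often for one manufacturer (or date range, or
category) and rarely for the others are candidates for the separate
fields you want to extract. On a real 100,000-row table you would read
the rows from the database instead of a list, and the noise cutoff
would need tuning by eye.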
--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada
Brian.Inglis@[EMAIL PROTECTED]
(Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply

