Hi. I'm a complete novice in the area of text processing and am hoping
that someone can point me in a good direction to get started.
Here is the problem: I work in consumer products marketing. I have a
database of over 100,000 products. Each record is for the initial
introduction of a product into the market and it provides some basic
overview information. While some of the information is arranged in
separate fields (date of intro, manufacturer, etc.), most of the valuable
information for our purposes is contained in a free-form description
field.
I am hoping to do some cluster analysis or even some cladistics on the
data, but it seems like I first need to pull the relevant text information
out of the description field and put it into a set of individual fields.
I don't know where to start processing this data so that it comes out
structured enough to serve as input to other uses.
Can anyone give me some advice?
Many thanks.
Ed
NewProductWorks