Re: [GENERAL] ts_headline

by bruce@[EMAIL PROTECTED] (Bruce Momjian) Mar 3, 2008 at 10:19 PM

I have applied the attached documentation patch to show ts_headline()
using a configuration name.
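
A minimal sketch of the difference the patch documents, assuming PostgreSQL 8.3 or later with the built-in `english` configuration. The two-argument form of ts_headline() takes its configuration from the `default_text_search_config` setting, while the three-argument form names it explicitly. The sketch also names the configuration for to_tsquery(), which accepts the same optional first argument. The sample sentence is shortened from the documentation example.

```sql
-- Two-argument form: the configuration comes from the session's
-- default_text_search_config setting, so results can differ between servers.
SHOW default_text_search_config;

SELECT ts_headline('The most common type of search is to find all documents',
                   to_tsquery('search'));

-- Three-argument form: the configuration is named explicitly, so the
-- result no longer depends on that setting.
SELECT ts_headline('english',
                   'The most common type of search is to find all documents',
                   to_tsquery('english', 'search'));
```

Naming the configuration in both calls is what makes an example reproducible regardless of how a given server was initialized.
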

---------------------------------------------------------------------------

Oleg Bartunov wrote:
> On Sat, 23 Feb 2008, Stephen Davies wrote:
> 
> > As it turns out, all I needed was in the doco but the key element - the first
> > config arg to ts_headline - was not in any of the examples so I missed it.
> 
> Aha. The original examples were based on the default
> configuration; then the concept was changed, but the examples were not
> updated.
> 
> >
> > Would it be possible for ts_headline to work with the pre-parsed
> > tsvector?
> 
> It's impossible; Richard has already explained the reasons to you.
> 
> >
> > I see references to future plans for phrase searching in ts. Is there a date
> > for this?
> > for this?
> 
> Not yet. The problem is mostly algebraic :) A simple 'exact search' is doable,
> but we need something more, since we support boolean operators,
> pluggable dictionaries (which could produce several lexemes, for example),
> and document structure (lexeme weights). So we need to define a consistent
> algebra for text, to have predictable results. This is quite a complex task,
> which requires a lot of dedicated time, which we don't have.
> 
> >
> > Cheers and thanks,
> > Stephen
> > Davies
> >
> >
> > On Friday 22 February 2008 22:54, Oleg Bartunov wrote:
> >> On Fri, 22 Feb 2008, Stephen Davies wrote:
> >>> Hmmmm!
> >>> I think I now understand the ts position better, thank you.
> >>>
> >>> Part of my problem has been that I am used to the functionality of Open
> >>> Text's LCS (aka BASIS) product, which handles text differently.
> >>>
> >>> It includes the position (and context) information in the index and does
> >>> "remember" how the text was parsed, so it does not need to reparse to insert
> >>> hit navigation tags, nor does it need pointers as to how to parse queries.
> >>> (It also supports phrase searching.)
> >>>
> >>> Now that I have a better understanding of ts, I think I will be able to
> >>> make it do at least most of what I hoped for.
> >>
> >> I wonder if that isn't described in the text search documentation :)
> >>
> >>> Thank you again for your help with this.
> >>>
> >>> Cheers,
> >>> Stephen Davies
> >>>
> >>> On Friday 22 February 2008 20:45, Richard Huxton wrote:
> >>>> Stephen Davies wrote:
> >>>>> Unfortunately, my link to the box with the test database is down due to
> >>>>> lack of maintenance by our local telco (Telstra) but I think that I
> >>>>> also missed the optional config arg to ts_headline.
> >>>>>
> >>>>> The lack of link also means that I cannot confirm your findings but
> >>>>> your logic looks good.
> >>>>
> >>>> Looks like ALTER DATABASE SET default_text_search_config='english' is what
> >>>> you need.
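
A sketch of that suggestion, assuming a hypothetical database named `mydb`. The server setting involved is `default_text_search_config`. `ALTER DATABASE ... SET` only affects sessions started after the change, so a plain `SET` is included for trying the effect in the current session.

```sql
-- Per-database default for the hypothetical database "mydb"; picked up by
-- sessions that connect after this statement runs.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- Per-session override, effective immediately.
SET default_text_search_config = 'pg_catalog.english';
SHOW default_text_search_config;
```
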
> >>>>
> >>>>> It raises the question, however, as to why ts_headline needs to reparse
> >>>>> the raw text.
> >>>>
> >>>> It needs to line up tsvector lexemes with actual characters in the text.
> >>>> The tsvector is missing punctuation and any stopwords (the, it, a), as well
> >>>> as being stemmed (if your dictionary does that).
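
To see what is lost, compare the tsvector of a short sentence with the headline generated from the same text. This is a sketch assuming the built-in `english` configuration; the sample sentence is made up for illustration.

```sql
-- The tsvector keeps only normalized lexemes and their word positions:
-- the stopwords "The" and "it" and the punctuation are dropped, and
-- "foxes"/"jumped" are stemmed (lexemes: brown, fox, jump, quick).
SELECT to_tsvector('english', 'The quick brown foxes, it jumped!');

-- ts_headline re-parses the original text, so it can highlight the word
-- as written ("foxes") and keep the surrounding punctuation.
SELECT ts_headline('english', 'The quick brown foxes, it jumped!',
                   to_tsquery('english', 'fox'));
```
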
> >>>>
> >>>> Also, it's looking for a short span of words that provides the best
> >>>> match. That might not be a complete match, of course, and is different
> >>>> from how you'd normally use a tsvector.
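
The span selection can be tuned through ts_headline's optional options string. A sketch using the documentation's sample text; the specific values are arbitrary. `MaxWords` and `MinWords` bound the length of the returned fragment, and `StartSel`/`StopSel` replace the default `<b>`/`</b>` markers.

```sql
-- MaxWords/MinWords bound the fragment ts_headline picks around the
-- best-matching span; StartSel/StopSel replace the <b> and </b> markers.
SELECT ts_headline('english',
       'The most common type of search is to find all documents containing '
       || 'given query terms and return them in order of their similarity to the query.',
       to_tsquery('english', 'query & similarity'),
       'MaxWords=10, MinWords=5, StartSel = <, StopSel = >');
```

Which exact words end up in the fragment depends on the cover ts_headline selects, so only the bounds and markers should be relied on.
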
> >>>>
> >>>>> At least in my case, I am using a trigger to parse the combination of
> >>>>> Title and Abstract into a tsvector field in the table row (as suggested
> >>>>> in 12.2.2 and 12.4.3 in the doco) so that the tsvector is already
> >>>>> available to ts_headline.
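
The trigger approach Stephen describes can be sketched with the built-in `tsvector_update_trigger()` function. The table `papers`, its columns, and the trigger and index names are hypothetical. The configuration has to be passed schema-qualified (here `pg_catalog.english`); the GIN index is optional but is what makes `tsv @@ query` fast on larger tables.

```sql
-- Hypothetical table: tsv caches the parsed title + abstract.
CREATE TABLE papers (
    id       serial PRIMARY KEY,
    title    text,
    abstract text,
    tsv      tsvector
);

-- Recompute tsv from title and abstract with the English configuration
-- whenever a row is inserted or updated.
CREATE TRIGGER papers_tsv_update
    BEFORE INSERT OR UPDATE ON papers
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', title, abstract);

-- Index the cached tsvector for fast @@ matching.
CREATE INDEX papers_tsv_idx ON papers USING gin (tsv);
```
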
> >>>>>
> >>>>> If ts_headline had the ability to use that pre-parsed tsvector, my
> >>>>> problem would never have arisen - and the performance of ts_headline
> >>>>> would be improved.
> >>>>
> >>>> Maybe. It would still have to parse the text to some degree, though,
> >>>> just to get the original words & punctuation into the headline.
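
Putting the two halves together: a sketch in which a tsvector is used for matching and the raw text is handed to ts_headline. A CTE with made-up rows stands in for a real table (all names hypothetical); in practice the `tsv` column would be stored and kept current by a trigger, as Stephen describes.

```sql
-- A CTE stands in for a hypothetical "papers" table; in a real schema the
-- tsv column would be stored rather than computed here.
WITH papers(id, title, abstract) AS (
    VALUES (1, 'Ranking results', 'Documents are returned in order of their similarity.'),
           (2, 'Indexing', 'GIN indexes speed up full text search.')
), indexed AS (
    SELECT id, title, abstract,
           to_tsvector('english', coalesce(title, '') || ' ' || coalesce(abstract, '')) AS tsv
    FROM papers
)
SELECT id,
       -- the headline is built from the raw text, not from tsv
       ts_headline('english', coalesce(title, '') || ' ' || coalesce(abstract, ''), q) AS headline
FROM indexed, to_tsquery('english', 'similarity') AS q
WHERE tsv @@ q;   -- matching uses the pre-parsed tsvector
```

Only the rows that pass the `@@` test pay the cost of re-parsing for the headline, which limits the overhead Stephen was concerned about.
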
> >>
> >>  	Regards,
> >>  		Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg@[EMAIL PROTECTED] http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >
> >
> 
>  	Regards,
>  		Oleg

-- 
  Bruce Momjian  <bruce@[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/textsearch.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/textsearch.sgml,v
retrieving revision 1.40
diff -c -c -r1.40 textsearch.sgml
*** doc/src/sgml/textsearch.sgml	13 Dec 2007 06:32:47 -0000	1.40
--- doc/src/sgml/textsearch.sgml	4 Mar 2008 02:55:17 -0000
***************
*** 1102,1108 ****
      For example:
  
  <programlisting>
! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms 
  and return them in order of their similarity to the
  query.', to_tsquery('query &amp; similarity'));
--- 1102,1108 ----
      For example:
  
  <programlisting>
! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms 
  and return them in order of their similarity to the
  query.', to_tsquery('query &amp; similarity'));
***************
*** 1112,1118 ****
   and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
   &lt;b&gt;query&lt;/b&gt;.
  
! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',
--- 1112,1118 ----
   and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
   &lt;b&gt;query&lt;/b&gt;.
  
! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',

--
Sent via pgsql-patches mailing list (pgsql-patches@[EMAIL PROTECTED])