Re: gsoc, text search selectivity and dllist enhancments

by tgl@[EMAIL PROTECTED] (Tom Lane) Jul 10, 2008 at 05:02 PM

Jan Urbański <j.urbanski@[EMAIL PROTECTED]> writes:
> Still, there's a decision to be made: after how many lexemes should the 
> pruning occur?

The way I think it ought to work is that the number of lexemes stored in
the final pg_statistic entry is statistics_target times a constant
(perhaps 100).  I don't like having it vary depending on tsvector width
--- why for example should a column having a few wide tsvectors get a
bigger stats entry than one with many narrow ones?  (Not to mention the
issue of having to estimate the average or max width before you can
start the counting run.)
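
A minimal sketch of that sizing rule, in Python purely for illustration. The names LEXEMES_PER_TARGET and target_lexeme_count are hypothetical, and 100 is only the "perhaps" value floated above, not a settled constant:

```python
# Hypothetical constant: the "perhaps 100" multiplier suggested above.
LEXEMES_PER_TARGET = 100


def target_lexeme_count(statistics_target: int,
                        multiplier: int = LEXEMES_PER_TARGET) -> int:
    """Number of lexemes to keep in the final stats entry.

    Depends only on the statistics target, never on tsvector width, so no
    width estimate is needed before the counting run starts.
    """
    if statistics_target < 0:
        raise ValueError("statistics_target must be non-negative")
    return statistics_target * multiplier
```

Under this rule a column of a few wide tsvectors and a column of many narrow ones get the same budget for a given statistics target.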

But in any case, given a target number of lexemes to accumulate,
I'd suggest pruning with that number as the bucket width (pruning
distance).   Or perhaps use some multiple of the target number, but
the number itself seems about right.  The LC paper says that the
bucket width w is equal to ceil(1/e) where e is the maximum frequency
estimation error, and that the maximum number of table entries needed
is log(eN)/e after N lexemes have been scanned.  For the values of e
and N we are going to be dealing with, this is likely to work out to
a few times 1/e, in other words the table size is a few times w.
(They prove it's at most 7w given reasonable assumptions about data
distribution, regardless of how big N gets; though I think our values
for N aren't large enough for that to matter.)
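
For readers who have not seen the LC (Lossy Counting) algorithm, the pruning scheme described above can be sketched roughly as follows. This is an illustrative Python version, not the code under discussion; the function names are hypothetical, and the bound is assumed to use the natural logarithm:

```python
import math
from typing import Dict, Iterable, Tuple


def lossy_count(stream: Iterable[str],
                bucket_width: int) -> Dict[str, Tuple[int, int]]:
    """Approximate frequency counting with pruning every bucket_width items.

    Returns lexeme -> (count, delta), where delta bounds how many
    occurrences may have been missed before the entry was inserted.
    """
    if bucket_width < 1:
        raise ValueError("bucket_width must be at least 1")
    table: Dict[str, Tuple[int, int]] = {}
    for n, lexeme in enumerate(stream, start=1):
        # Id of the bucket that item n falls into: ceil(n / w).
        bucket = (n + bucket_width - 1) // bucket_width
        if lexeme in table:
            count, delta = table[lexeme]
            table[lexeme] = (count + 1, delta)
        else:
            # A new entry may have been seen (and pruned) in earlier buckets.
            table[lexeme] = (1, bucket - 1)
        if n % bucket_width == 0:
            # Bucket boundary: drop entries that cannot be frequent enough.
            stale = [k for k, (c, d) in table.items() if c + d <= bucket]
            for k in stale:
                del table[k]
    return table


def max_table_entries(bucket_width: int, lexemes_scanned: int) -> float:
    """Worst-case table size log(e*N)/e, with e taken as 1/bucket_width."""
    e = 1.0 / bucket_width
    return math.log(e * lexemes_scanned) / e
```

With w = 1000 and N = 1,000,000 the bound evaluates to about 6.9 w, which is the "a few times w" order of magnitude mentioned above.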

The existing compute_minimal_stats code uses a table size of twice the
target number of values, so setting w to maybe a half or a third of the
target number would reproduce the current space usage.  I don't see a
problem with letting it get a little bigger though, especially since we
can expect that the lexemes aren't very long.  (compute_minimal_stats
can't assume that for arbitrary data types...)
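
The arithmetic behind "a half or a third" can be spelled out with a small sketch. It assumes the lossy-counting table holds some multiple k of w entries; k is a hypothetical parameter standing in for "a few times w", with 7 being the proven cap mentioned earlier:

```python
def current_table_size(target: int) -> int:
    """compute_minimal_stats tracks twice the target number of values."""
    return 2 * target


def lc_table_size(target: int, divisor: int, entries_per_w: float) -> float:
    """Estimated lossy-counting table size when w = target / divisor.

    entries_per_w is an assumed multiplier k (table size is about k * w).
    """
    bucket_width = target / divisor
    return entries_per_w * bucket_width


if __name__ == "__main__":
    target = 1000
    print("current:", current_table_size(target))
    for divisor in (1, 2, 3):
        for k in (4, 6, 7):
            size = lc_table_size(target, divisor, k)
            print("w = target/%d, k = %d: %.0f entries" % (divisor, k, size))
```

If k is 4, w = target/2 gives exactly the current 2 x target entries; if k is 6, w = target/3 does. Using w = target itself would make the table two to three and a half times the current size for k between 4 and 7.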

			regards, tom lane
