Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Performance > Understanding h...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 1 of 8 Topic 4036 of 4156
Post > Topic >>

Understanding histograms

by len@[EMAIL PROTECTED] (Len Shapiro) Apr 29, 2008 at 09:56 PM

I hope I am posting to the right list.
I am running Postgresql 8.1.9 and don't understand the behavior of
histograms for data items not in the MVC list.  I teach databases and
want to use Postgres as an example.  I will appreciate any help that
anyone can provide.

Here is the data I am using. I am interested only in the "rank" attribute.

CREATE TABLE Sailors (
   sid Integer NOT NULL,
   sname varchar(20),
   rank integer,
   age real,
   PRIMARY KEY (sid));

I insert 30 sailor rows:

INSERT INTO Sailors VALUES (3, 'Andrew', 10,  30.0);
INSERT INTO Sailors VALUES (17, 'Bart',  5,  30.2);
INSERT INTO Sailors VALUES (29, 'Beth',    3,  30.4);
INSERT INTO Sailors VALUES (28, 'Bryant',    3,  30.6);
INSERT INTO Sailors VALUES (4, 'Cynthia',  9,  30.8);
INSERT INTO Sailors VALUES (16, 'David',  9,  30.9);
INSERT INTO Sailors VALUES (27, 'Fei',    3,  31.0);
INSERT INTO Sailors VALUES (12, 'James',    3,  32.0);
INSERT INTO Sailors VALUES (30, 'Janice',    3,  33.0);
INSERT INTO Sailors VALUES (2, 'Jim', 8,  34.5);
INSERT INTO Sailors VALUES (15, 'Jingke', 10,  35.0);
INSERT INTO Sailors VALUES (26, 'Jonathan',9,  36.0);
INSERT INTO Sailors VALUES (24, 'Kal',    3,  36.6);
INSERT INTO Sailors VALUES (14, 'Karen', 8,  37.8);
INSERT INTO Sailors VALUES (8, 'Karla',7,  39.0);
INSERT INTO Sailors VALUES (25, 'Kristen', 10, 39.5);
INSERT INTO Sailors VALUES (19, 'Len',   8,  40.0);
INSERT INTO Sailors VALUES (7, 'Lois',   8,  41.0);
INSERT INTO Sailors VALUES (13, 'Mark', 7,  43.0);
INSERT INTO Sailors VALUES (18, 'Melanie', 1,  44.0);
INSERT INTO Sailors VALUES (5, 'Niru',  5,  46.0);
INSERT INTO Sailors VALUES (23, 'Pavel',    3,  48.0);
INSERT INTO Sailors VALUES (1, 'Sergio', 7,  50.0);
INSERT INTO Sailors VALUES (6, 'Suhui', 1,  51.0);
INSERT INTO Sailors VALUES (22, 'Suresh',9,  52.0);
INSERT INTO Sailors VALUES (20, 'Tim',7,  54.0);
INSERT INTO Sailors VALUES (21, 'Tom', 10,  56.0);
INSERT INTO Sailors VALUES (11, 'Warren',    3,  58.0);
INSERT INTO Sailors VALUES (10, 'WuChang',9,  59.0);
INSERT INTO Sailors VALUES (9, 'WuChi', 10,  60.0);

after analyzing, I access the pg_stats table with

SELECT  n_distinct, most_common_vals,
        most_common_freqs,   histogram_bounds
FROM pg_stats WHERE tablename = 'sailors' AND attname = 'rank';

and I get:

n_distinct      most_common_vals        most_common_freqs 
histogram_bounds
-0.233333
        {3,9,10,7,8}    {0.266667,0.166667,0.166667,0.133333,0.133333} 
{1,5}

I have two questions.  I'd appreciate any info you can provide,
including pointers to the source code.

1. Why does Postgres come up with a negative n_distinct?  It
apparently thinks that the number of rank values will increase as the
number of sailors increases.  What/where is the algorithm that decides
that?

2. The most_common_vals and their frequencies make sense.  They say
that the values {3,9,10,7,8} occur a total of 26 times, so other
values occur a total of 4 times.  The other, less common, values are 1
and 5, each occuring twice, so the histogram {1,5} is appropriate.
If I run the query
EXPLAIN SELECT * from sailors where rank = const;
for any const not in the MVC list, I get the plan

Seq Scan on sailors  (cost=0.00..1.38 rows=2 width=21)
  Filter: (rank = const)

The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
sense to me for other values of const not in the MVC list.
For example, if I run the query
EXPLAIN SELECT * from sailors where rank = -1000;
Postgres still gives an estimate of "row=2".
Can someone please explain?

Thanks,

Len Shapiro
****tland State University

-- 
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
 




 8 Posts in Topic:
Understanding histograms
len@[EMAIL PROTECTED] (L  2008-04-29 21:56:32 
Re: Understanding histograms
tgl@[EMAIL PROTECTED] (T  2008-04-30 01:19:51 
Re: Understanding histograms
lenshap@[EMAIL PROTECTED]  2008-04-29 23:32:18 
Re: Understanding histograms
tgl@[EMAIL PROTECTED] (T  2008-04-30 10:43:11 
Re: Understanding histograms
pgsql@[EMAIL PROTECTED]   2008-04-30 15:47:02 
Re: Understanding histograms
tgl@[EMAIL PROTECTED] (T  2008-04-30 19:17:44 
Re: Understanding histograms
stark@[EMAIL PROTECTED]   2008-04-30 20:53:44 
Re: Understanding histograms
tgl@[EMAIL PROTECTED] (T  2008-05-01 00:41:07 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan13V112 Sun Jul 6 21:59:10 CDT 2008.