On Wed, 16 Apr 2008 23:14:11 +0200, samantha mahindrakar=20=20
<sam.mahindrakar@[EMAIL PROTECTED]
> wrote:
> Hi....
> Iam finding the following query is working a bit slow:
> EXECUTE '(SELECT ARRAY(SELECT DISTINCT date_part(''day'',=20=20
> measurement_start)
> FROM ' || gettablestring(dates)|| '
> WHERE lane_id IN (' || lanesidarr || ')))'
> INTO temparr;
>
> This function is trying to find all the days in a prticular month
> whihc has data for the particular lane and put it in an array...which
> can be used later.
> gettablestring(dates) returns the partition name from which the data
> needs to be extracted. These partitions have index on the
> measurement_start field.
> lanesidarr is a lane number. The partition has an index on this field
to.
> Could anyone give me some hints???/
OK so I guess you have one partition per month since there is no month
in=
=20=20
your WHERE.
If this is a table which hasn't got much write activity (probably the=20=
=20
case for last month's partition, for instance), CLUSTER it on
something=20=
=20
appropriate that you use often in queries, like lane_id here.
And you can use SELECT foo GROUP BY foo, this will use a hash, it
is=20=20
faster than a sort.
Example :
CREATE TABLE blop AS SELECT '2008-01-01'::TIMESTAMP + ((n%30)*'1=20=20
DAY'::INTERVAL) AS t FROM generate_series(1,100000) AS n;
ALTER TABLE blop ADD d DATE NULL;
UPDATE blop SET d=3Dt;
VACUUM FULL ANALYZE blop;
-- Now blop contains 100K timestamps and 100K dates from the month 2008-01
EXPLAIN ANALYZE SELECT DISTINCT EXTRACT( DAY from t ) FROM blop;
QUERY PLAN
---------------------------------------------------------------------------=
----------------------------------------------
Unique (cost=3D10051.82..10551.82 rows=3D30 width=3D8) (actual=20=20
time=3D221.740..289.801 rows=3D30 loops=3D1)
-> Sort (cost=3D10051.82..10301.82 rows=3D100000 width=3D8) (actual=
=20=20
time=3D221.737..250.911 rows=3D100000 loops=3D1)
Sort Key: (date_part('day'::text, t))
Sort Method: quicksort Memory: 5955kB
-> Seq Scan on blop (cost=3D0.00..1747.00 rows=3D100000 width=
=3D8)=20=20
(actual time=3D0.021..115.254 rows=3D100000 loops=3D1)
Total runtime: 290.237 ms
(6 lignes)
Temps : 290,768 ms
EXPLAIN ANALYZE SELECT EXTRACT( DAY from t ) AS day FROM blop GROUP BY
day;
QUERY PLAN
---------------------------------------------------------------------------=
----------------------------------------
HashAggregate (cost=3D1997.00..1997.38 rows=3D30 width=3D8) (actual=20=
=20
time=3D198.375..198.390 rows=3D30 loops=3D1)
-> Seq Scan on blop (cost=3D0.00..1747.00 rows=3D100000 width=3D8)
(a=
ctual=20=20
time=3D0.021..129.779 rows=3D100000 loops=3D1)
Total runtime: 198.437 ms
(3 lignes)
Temps : 198,894 ms
=3D=3D> Hash is faster than Sort
EXPLAIN ANALYZE SELECT d FROM blop GROUP BY d;
QUERY PLAN
---------------------------------------------------------------------------=
---------------------------------------
HashAggregate (cost=3D1747.00..1747.30 rows=3D30 width=3D4) (actual=20=
=20
time=3D101.829..101.842 rows=3D30 loops=3D1)
-> Seq Scan on blop (cost=3D0.00..1497.00 rows=3D100000 width=3D4)
(a=
ctual=20=20
time=3D0.012..33.428 rows=3D100000 loops=3D1)
Total runtime: 101.905 ms
(3 lignes)
Temps : 102,516 ms
=3D=3D> Not computing the EXTRACT is faster obviously
(actually EXPLAIN ANALYZE adds some overhead, the query really takes 60
ms)
If you have an index lane_id, measurement_date, you can always do :
for day in 1..31:
find 1 row with which has this day
reutrn the days you found
--=20
Sent via pgsql-performance mailing list (pgsql-performance@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


|