Question: Postgresql 10.1 - two similar queries, 200 times difference

Question

Postgresql 10.1 - two similar queries, 200 times difference

Answers 3
Added at 2017-11-30 20:11
Tags
Question

Two queries. First is 200 times longest than second. Why? PostgreSql 10.1. Metro and Sel - views on the same table.

EXPLAIN ANALYZE
SELECT *
FROM (
       SELECT
         metro.id    AS id,
         metro.title AS name,
         metro.c1
       FROM metro
         LEFT JOIN sel
           ON metro.id = sel.metrosku
       WHERE sel.id IS NULL) t
WHERE t.c1 = 'продукты'
LIMIT 100;

EXPLAIN ANALYZE
WITH t AS (SELECT
             metro.id    AS id,
             metro.title AS name,
             metro.c1
           FROM metro
             LEFT JOIN sel
               ON metro.id = sel.metrosku
           WHERE sel.id IS NULL)
SELECT *
FROM t
WHERE t.c1 = 'продукты'
LIMIT 100;

Query 1:

"QUERY PLAN" Limit  (cost=0.00..34190.48 rows=1 width=96) (actual time=532.298..86938.359 rows=100 loops=1)
->  Nested Loop Left Join  (cost=0.00..34190.48 rows=1 width=96) (actual time=532.298..86938.274 rows=100 loops=1) Join Filter: (lower((original.info ->> 'SKU'::text)) = "substring"(((original_1.info -> 'Images'::text) ->> '0'::text), '/(\d+)'::text)) Rows Removed by Join Filter: 3555434 Filter: (lower((original_1.info ->> 'SKU'::text)) IS NULL) Rows Removed by Filter: 99
->  Seq Scan on original  (cost=0.00..17432.97 rows=1 width=1185) (actual time=0.038..2.962 rows=199 loops=1) Filter: (((competitor)::text = 'metrocc'::text) AND ((info ->> 'Type'::text) = 'Item'::text) AND (lower(((info -> 'Catalog'::text) ->> '0'::text)) = 'продукты'::text)) Rows Removed by Filter: 63
->  Seq Scan on original original_1  (cost=0.00..16754.80 rows=90 width=1185) (actual time=0.484..169.594 rows=17867 loops=199) Filter: (((competitor)::text = 'sel'::text) AND ((info ->> 'Type'::text) = 'Item'::text)) Rows Removed by Filter: 49950 Planning time: 0.471 ms Execution time: 86938.450 ms

Query 2:

"QUERY PLAN"
Limit  (cost=33521.79..33521.82 rows=1 width=96) (actual time=425.243..443.735 rows=100 loops=1)
CTE t
->  Hash Left Join  (cost=16755.92..33521.79 rows=1 width=96) (actual time=425.239..443.574 rows=140 loops=1)
Hash Cond: (lower((original.info ->> 'SKU'::text)) = "substring"(((original_1.info -> 'Images'::text) ->> '0'::text), '/(\d+)'::text))
Filter: (lower((original_1.info ->> 'SKU'::text)) IS NULL)
Rows Removed by Filter: 82
->  Seq Scan on original  (cost=0.00..16754.80 rows=144 width=1185) (actual time=0.022..7.077 rows=1638 loops=1)
Filter: (((competitor)::text = 'metrocc'::text) AND ((info ->> 'Type'::text) = 'Item'::text))
Rows Removed by Filter: 54
->  Hash  (cost=16754.80..16754.80 rows=90 width=1185) (actual time=424.723..424.723 rows=16215 loops=1)
Buckets: 4096 (originally 1024)  Batches: 8 (originally 1)  Memory Usage: 4066kB
->  Seq Scan on original original_1  (cost=0.00..16754.80 rows=90 width=1185) (actual time=0.612..175.330 rows=17867 loops=1)
Filter: (((competitor)::text = 'sel'::text) AND ((info ->> 'Type'::text) = 'Item'::text))
Rows Removed by Filter: 49950
->  CTE Scan on t  (cost=0.00..0.02 rows=1 width=96) (actual time=425.242..443.716 rows=100 loops=1)
Filter: (c1 = 'продукты'::text)
Rows Removed by Filter: 40
Planning time: 0.451 ms
Execution time: 449.512 ms
Answers to

Postgresql 10.1 - two similar queries, 200 times difference

nr: #1 dodano: 2017-11-30 20:11

It's right on the EXPLAIN PLAN itself.

First query cost is 0.00..34190.48. So the cost for returning the first rows is close to 0. Since you only want the first 100 rows, it makes sense that it runs way faster than the second one.

nr: #2 dodano: 2017-11-30 20:11

Result of your EXPLAIN PLAN:
Query 1 (Sub query): Planning time: 0.471 ms Execution time: 86938.450 ms
Query 2 (CTE): Planning time: 0.451 ms Execution time: 449.512 ms

Based on your result, the CTE form is faster than the subquery form, in this case the planner is making poor optimization decisions.

You can try use EXPLAIN (BUFFERS,ANALYZE) the detail might help.

nr: #3 dodano: 2017-11-30 20:11

In postgresql, if a cte's output is used, it is first materialized, then referred to. There is no predicate pushdown.

This is a known behaviour and documented here:

https://blog.2ndquadrant.com/postgresql-ctes-are-optimization-fences/

and as others have pointed out, is evident in the explain itself.

Source Show
◀ Wstecz