Question: Why are Postgres lookups on jsonb columns so slow?

Question

Why are Postgres lookups on jsonb columns so slow?

Answers 0
Added at 2016-12-07 17:12
Tags
Question

I have a table targeting that has a column marital_status of type text[] and another column data of type jsonb. The content of these two columns is the same, just in a different format (it's just for demonstration purposes). Example data:

 id |      marital_status      |                        data                       
----+--------------------------+---------------------------------------------------
  1 | null                     | {}
  2 | {widowed}                | {"marital_status": ["widowed"]}
  3 | {never_married,divorced} | {"marital_status": ["never_married", "divorced"]}
...

There are more than 690K records in the table in random combination.

Lookup on text[] column

EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]

Without index

Usually takes < 900ms without creating any indices:

Seq Scan on targeting  (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms

With index

With index it usually takes < 200ms (75% improvement):

CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");

Result:

Index Only Scan using targeting_marital_status_idx on targeting  (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)"
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
  Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms

Lookup on jsonb column

EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb

Without index

Usually takes < 5,700ms without creating any indices (more than 6 times slower!):

Seq Scan on targeting  (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
  Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms

With index

With index it usually takers < 3,700ms (35% improvement):

CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));

Result:

Bitmap Heap Scan on targeting  (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
  Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Index Recheck: 201080
  Heap Blocks: exact=33723 lossy=53028
  ->  Bitmap Index Scan on targeting_data_marital_status_idx  (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)"
        Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms

Questions

  • Why is the text[] column so much more performant, even when not using indices?
  • Why does adding indices to the jsonb column only increase the performance by 35%?
  • Is there a more perfomant way of doing the lookup on the jsonb column?
Answers
Source Show
◀ Wstecz