Question: What is the best alternative for the following situation?

Question

What is the best alternative for the following situation?

Answers 2
Added at 2016-12-22 14:12
Tags
Question

I'm using a JSONB field in my Postgresql database to store the following document. I own thousands of documents. I need to create reports with this data, but the search is very slow.

If I need to create a report stating the new users of a month, I need to go through the entire document comparing if the user is in one month and not in another.

Message document:

[{"recipient":1,"user":4,"created_at":"2016-11-10","content":"Duis aliquam convallis nunc.","is_sender_user":true},
{"recipient":1,"user":18,"created_at":"2016-12-10","content":"Proin eu mi.","is_sender_user":false},
{"recipient":1,"user":4,"created_at":"2016-11-20","content":"In hac habitasse platea dictumstm.","is_sender_user":true},
{"recipient":1,"user":20,"created_at":"2016-12-14","content":"Donec ut dolor.","is_sender_user":true},
{"recipient":1,"user":13,"created_at":"2016-12-06","content":"Nulla mollis molestie lorem. Quisque ut erat. Curabitur gravida nisi at nibh.","is_sender_user":true}]

It would be better to create a User table and create a JSONB messages field to store your messages. Or the way it is I can create my report using JSONB queries?

Answers to

What is the best alternative for the following situation?

nr: #1 dodano: 2016-12-22 16:12

Your message documents describe a relationship between users: a sender transmits content to a recipient. A sender may send many messages, a recipient may receive many messages. This is best represented in a relational structure, with a users table and a messages table having foreign key constraints for the sender and recipient.

It's possible to just heave everything into a JSONB field like you're doing, but there are some major disadvantages: query performance suffers, although as Samuil Petrov mentioned this can be ameliorated with indexing; but more importantly, there's nothing preventing a message from having an invalid user or recipient id. Using a schemaless JSONB field can simplify development while you're still hashing out what you need to store, but once you know what you need, it should be enforced by your schema.

nr: #2 dodano: 2016-12-22 17:12

As Samuil Petrov mentioned you can create index on jsonb field, i suggest creating index on month part of created_at and user

create INDEX td002_si3 ON testData002 (substring(doc->>'created',0,8),(doc->>'user'));

with this the query

SELECT 
      substring(doc ->> 'created', 0, 8) AS m,
      ARRAY_AGG(DISTINCT doc ->> 'user')          AS users
    FROM testData002
    GROUP BY substring(doc ->> 'created', 0, 8)

will give you the monthly users from an index scan

GroupAggregate  (cost=0.28..381.52 rows=3485 width=50)
  Group Key: ""substring""((doc ->> 'created'::text), 0, 8)
  ->  Index Scan using td002_si3 on testdata002  (cost=0.28..294.28 rows=3500 width=50)

test data generated with

create table testData002 as 
     select row_number() OVER () as id
           ,jsonb_build_object('created',dt::DATE
                              ,'user',(random()*1000)::INT) as doc 
       from generate_series(1,10),generate_series('2016-01-01'::TIMESTAMP,'2016-12-15'::TIMESTAMP,'1 day'::INTERVAL) as dt;
Source Show
◀ Wstecz