Question: How to speed up SQL query with date GROUP BY?


How to speed up SQL query with date GROUP BY?

Answers 2
Added at 2017-04-22 20:04

I have a normal SQLite database table called table1 with 7 columns and of course a rowid. The first column is an custom_id number, the second is date in format YYYY-MM-DD and other 5 are real number data columns. There are about 10M rows in the database, and custom_id and date columns have indices.

What I want to do is to speed up the following query:

SELECT date,max(data1) AS maximum
FROM table1
WHERE custom_id = '1123' AND data1 <> 'NaN'
GROUP BY strftime('%Y-%m', date)

I want to find the maximum correct (not NaN) data1 value for the custom_id 1123 for each year-month-combination. The code above works actually fine, but the query lasts 10 seconds in the first run, but the second time it takes under 1 second, which is OK for me. I run the query in my home PC Apache server with PHP. I think Apache uses some caching which explains the difference.

But the question is, how to speed up the first time run performance? I have many other custom_id:s to query, not all can be cached! Do I need more indices? Another kind of query?

Answers to

How to speed up SQL query with date GROUP BY?

nr: #1 dodano: 2017-04-22 20:04

I am guessing this is what you intend:

SELECT strftime('%Y-%m', date), max(data1) AS maximum
FROM table1
WHERE custom_id = 1123 AND data <> 'NaN'
GROUP BY strftime('%Y-%m', date)

Start with an index on table1(custom_id, date).

nr: #2 dodano: 2017-04-22 21:04

We are going to create an index that will support the following operations:

  1. Retrieve the records of a specific customer
  2. aggregate by month

Creating the following index is not possible since strftime is not a deterministic function

create index table1_ix on table1 (custom_id,strftime('%Y-%m', date));

non-deterministic functions prohibited in index expressions

So instead of strftime('%Y-%m', date) we are going to use substr(date,1,7)

create index table1_ix on table1 (custom_id,substr(date,1,7));

The query should be changed accordingly

select      substr(date,1,7), max(data1) as maximum
from        table1
where       custom_id = '1123' 
        and data1 <> 'NaN'
group by    substr(date,1,7)
Source Show
◀ Wstecz