Question: Gather stats on partition table

Question

Gather stats on partition table

Answers 2
Added at 2017-01-01 13:01
Tags
Question

I have table ABC interval partition by each day. Each partition is used in queries for that specific day. Even if i schedule job @nyt each day to gather stats then queries which are using that table before stats gather will not use optimal plan.

Answers
nr: #1 dodano: 2017-01-01 20:01

Gathering optimizer statistics on a partitioned table is not a trivial task with some caveats. Particularly on a daily partitioned schema it could be not the best solution to gather partitions statistics once per day.

To demonstrate it, let assume we have not daily schema but yearly partitions of transaction data. The question is, would it be OK to gather statistics say on 1. January (or 1. June or 31. December)? The answer is definitively NO, as in the first case the partition would be considered (nearly) empty, in the latter cases the statistic would be realistic, but they were gathered too late.

With this in mind there are IMO three possible approaches to handle it

1) do not gather statistics at all (and use dynamic sampling)

2) gather partition statistics repeatedly (say each hour)

3) do not gather statistics, but set them so, that the queries perform fine

The best option depends on your data and access pattern, so I only consider some details to the implementation of those options.

Sample Data

Lets generate a table with one full and one nearly empty daily partitions.

The table has a local index on the GROUP_ID column. The purpose of the exercise is to get a FULL TABLE SCAN while accessing the small partition and an INDEX ACCESS while accessing the big partition.

CREATE TABLE  mytab
   (    id number not null,
      group_id number,
      trans_date date,
      pad varchar2(4000))
PARTITION BY RANGE (trans_date)
INTERVAL (NUMTODSINTERVAL(1,'DAY'))
(
   PARTITION part_01 values LESS THAN (TO_DATE('31-12-2016','DD-MM-YYYY'))
);

create index mytab_idx1 on mytab(id) local;
create index mytab_idx2 on mytab(group_id) local;

-- full day partition
insert into mytab (id, group_id, trans_date, pad)
select rownum id, trunc(rownum/1000) group_id, to_date('31122016','ddmmyyyy'), lpad('x',3000,'x') from dual 
connect by  level <= 100000;
commit;

-- nearly empty day partition
insert into mytab (id, group_id, trans_date, pad)
select rownum id, trunc(rownum/1000) group_id, to_date('01012017','ddmmyyyy'), lpad('x',3000,'x') from dual
connect by  level <= 1000; 
commit;

Dynamic Sampling

If the target object have no statistics at all, Oracle performs dynamic sampling (aka dynamic statistics) With a little overhead Oracle calculates the statistics while parsing the statement. So it can't be stale.

Accessing the nearly empty partition Oracle proper choose FULL TABLE SCAN

EXPLAIN PLAN  SET STATEMENT_ID = 'jara1' into   plan_table  FOR
select   * from mytab 
where trans_date = TO_DATE('01-01-2017','DD-MM-YYYY') and group_id = 0;

SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL')); 

Plan hash value: 4018216072

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |   958 |  1905K|   274   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|       |   958 |  1905K|   274   (0)| 00:00:01 |     3 |     3 |
|*  2 |   TABLE ACCESS FULL    | MYTAB |   958 |  1905K|   274   (0)| 00:00:01 |     3 |     3 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANS_DATE"=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
              AND "GROUP_ID"=0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

... while accessing the full partition INDEX ACCESS is used

EXPLAIN PLAN  SET STATEMENT_ID = 'jara1' into   plan_table  FOR
select * from mytab 
where trans_date = TO_DATE('31-12-2016','DD-MM-YYYY') and group_id = 0;

SELECT * FROM table(DBMS_XPLAN.DISPLAY('plan_table', 'jara1','ALL'));

Plan hash value: 984912596

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |            |  1608 |  3198K|  9021   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |            |  1608 |  3198K|  9021   (1)| 00:00:01 |     2 |     2 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| MYTAB      |  1608 |  3198K|  9021   (1)| 00:00:01 |     2 |     2 |
|*  3 |    INDEX RANGE SCAN                        | MYTAB_IDX2 |  1608 |       |  2880   (1)| 00:00:01 |     2 |     2 |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TRANS_DATE"=TO_DATE(' 2016-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   3 - access("GROUP_ID"=0)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

So we see that the dynamic sampling works fine, choosing the proper access method.

Gather Partition Statistics Frequently

Repeating the gathering job mitigates the problem, that the partition is growing constantly.

The period depends on the transaction rate.

Example of gathering statistics for one partition only

 exec dbms_stats.gather_table_stats(OWNNAME=>user,TABNAME=>'MYTAB', PARTNAME=>'SYS_P10030',   CASCADE=> TRUE); 

The worst case that must be avoided is that *the statistics pointing that the partition is empty, but (in the meantime) the partition is heavy populated.

Set Statistics

This approach assumes, that the "correct" access path for the queries is known. In our example we may access a nearly empty partition with a FULL TABLE SCAN, but the index access is fine for such partition as well. So we can set partition statistics so, that always an INDEX ACCESS will be done.

One possible (very simple) schema is to copy the statistics from the previous day.

This call copies the statistics from partition SYS_P10029 to partition SYS_P10030

 exec DBMS_STATS.COPY_TABLE_STATS (OWNNAME=>user,TABNAME=>'MYTAB',srcpartname=>'SYS_P10029',dstpartname=> 'SYS_P10030');    

So in other word, immediately after the creation of the partition the statistics are initiated as for the full populated partition.

nr: #2 dodano: 2017-01-01 22:01

In my application I run this procedure once a day by scheduler job. It collects statistics for most recent partition.

PROCEDURE GatherIndexStats IS

    CURSOR IndexPartition(indName IN VARCHAR2) IS
    SELECT INDEX_NAME, PARTITION_NAME
    FROM USER_IND_STATISTICS i
        JOIN USER_TAB_PARTITIONS t USING (TABLE_NAME, PARTITION_NAME)
    WHERE TABLE_NAME = 'ABC'
        AND i.LAST_ANALYZED IS NULL
        AND OBJECT_TYPE = 'PARTITION'
        AND INDEX_NAME = indName
    ORDER BY INDEX_NAME, PARTITION_NAME DESC
    OFFSET 1 ROW FETCH FIRST 2 ROW ONLY;

BEGIN

    FOR aIndex IN (SELECT INDEX_NAME FROM USER_INDEXES WHERE TABLE_NAME = 'ABC') LOOP
        FOR aInd IN IndexPartition(aIndex.INDEX_NAME) LOOP
            DBMS_STATS.GATHER_INDEX_STATS(USER, aInd.INDEX_NAME, aInd.PARTITION_NAME);
        END LOOP;
    END LOOP;   

END GatherIndexStats;

In my application I need to get only Index statistics rather than full table stats. In case you like to get Index and table stats, use procedure below:

PROCEDURE GatherTableStats IS

    CURSOR TablePartition IS
    SELECT INDEX_NAME, PARTITION_NAME
    FROM USER_TAB_STATISTICS i
        JOIN USER_TAB_PARTITIONS t USING (TABLE_NAME, PARTITION_NAME)
    WHERE TABLE_NAME = 'ABC'
        AND i.LAST_ANALYZED IS NULL
        AND OBJECT_TYPE = 'PARTITION'
    ORDER BY PARTITION_NAME DESC
    OFFSET 1 ROW FETCH FIRST 2 ROW ONLY;

BEGIN

    FOR aPart IN TablePartition LOOP
        DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABC', aPart.PARTITION_NAME);
    END LOOP;   

END GatherTableStats;
Source Show
◀ Wstecz