Question: Improve MERGE performance when using big tables

Added at 2016-12-27 20:12

Context

We have a model in which each element has an element kind and zero or more features. Each feature belongs to exactly one element and has a feature name.

This is modeled as the following tables:

ELEMENTS
  elem_id int not null -- PK
  elem_elki_id int not null -- FK to ELEMENT_KINDS
  -- more columns with element data

ELEMENT_KINDS
  elki_id int not null -- PK
  -- more columns with element kind data

FEATURES
  feat_id int not null -- PK
  feat_elem_id int not null -- FK to ELEMENTS
  feat_fena_id int not null -- FK to FEATURE_NAMES
  -- more columns with feature data

FEATURE_NAMES
  fena_id int not null -- PK
  -- more columns with feature name data


Requirement

There is a new requirement to replace the feature names table with a feature kinds table. There is one (and only one) feature kind for each (element kind, feature name) pair.

The changes to the model were to add a new column and create a new table:

ALTER TABLE features ADD feat_feki_id int null;

CREATE TABLE FEATURE_KINDS
(
  feki_id int not null, -- PK
  feki_elki_id int not null, -- FK to ELEMENT_KINDS
  feki_fena_id int null, -- FK* to FEATURE_NAMES
  -- more columns with feature kinds data
);

*feki_fena_id is actually a temporary column showing which feature name was used to create each feature kind. After feat_feki_id is populated, feki_fena_id should be discarded along with feat_fena_id and the feature names table.
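The statement that populated FEATURE_KINDS is not shown here; conceptually it inserts one row per distinct (element kind, feature name) pair, roughly like this sketch (feature_kinds_seq is a hypothetical sequence name):

INSERT INTO feature_kinds (feki_id, feki_elki_id, feki_fena_id)
SELECT feature_kinds_seq.NEXTVAL, elem_elki_id, feat_fena_id
FROM
(
    SELECT DISTINCT e.elem_elki_id, f.feat_fena_id
    FROM features f
    INNER JOIN elements e ON e.elem_id = f.feat_elem_id
);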


Problem

After successfully populating the feature kinds table, we are trying to populate the feat_feki_id column using the following query:

MERGE INTO features F
USING
(
    SELECT *
    FROM elements
    INNER JOIN feature_kinds
    ON elem_elki_id = feki_elki_id
) EFK
ON
(
    F.feat_elem_id = EFK.elem_id AND
    F.feat_fena_id = EFK.feki_fena_id
)
WHEN MATCHED THEN
UPDATE SET F.feat_feki_id = EFK.feki_id;

This works in small-scale scenarios with test data, but in production we have ~20 million elements and ~2000 feature kinds, and the query runs for about an hour before throwing ORA-30036: unable to extend segment by 8 in undo tablespace 'UNDOTBS1'.


Question

Is there any way I could improve the performance of the MERGE so that it works? (Maybe I'm lacking some indexes?)

Is there an alternative way to fill in the feat_feki_id column? (We have already tried an UPDATE instead of the MERGE, with similar results.)
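For reference, the UPDATE variant was equivalent to a correlated update along these lines (a reconstruction, not the exact statement we ran; rows without a match are simply set to NULL, which matches the starting state of the column):

UPDATE features f
SET f.feat_feki_id =
(
    SELECT fk.feki_id
    FROM feature_kinds fk
    INNER JOIN elements e ON e.elem_elki_id = fk.feki_elki_id
    WHERE e.elem_id = f.feat_elem_id
    AND fk.feki_fena_id = f.feat_fena_id
);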

Answer #1, added 2016-12-27 22:12

It's not clear whether there is something wrong going on or whether your undo segments are just too small. Can you do the following statement without getting an ORA-30036?

UPDATE features f SET f.feat_feki_id = 12345;

If that doesn't work, you just need to increase the size of your undo tablespace. Kludges are available to do the update in chunks (one is sketched below), but you really shouldn't have to do that.
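If it is just an undo sizing issue, enlarging the tablespace may be all you need. A minimal sketch (the datafile path and size are hypothetical; adjust them to your environment):

ALTER DATABASE DATAFILE '/u01/oradata/ORCL/undotbs01.dbf' RESIZE 16G;

If you do resort to chunking, a minimal PL/SQL sketch looks like this. Assumptions: the 50,000-row batch size is arbitrary, and the correlated subquery mirrors the join from the question; committing after each batch keeps the undo generated by any single transaction bounded:

BEGIN
    LOOP
        UPDATE features f
        SET f.feat_feki_id =
        (
            SELECT fk.feki_id
            FROM feature_kinds fk
            INNER JOIN elements e ON e.elem_elki_id = fk.feki_elki_id
            WHERE e.elem_id = f.feat_elem_id
            AND fk.feki_fena_id = f.feat_fena_id
        )
        WHERE f.feat_feki_id IS NULL
        -- skip rows with no matching feature kind, so the loop terminates
        AND EXISTS
        (
            SELECT 1
            FROM feature_kinds fk
            INNER JOIN elements e ON e.elem_elki_id = fk.feki_elki_id
            WHERE e.elem_id = f.feat_elem_id
            AND fk.feki_fena_id = f.feat_fena_id
        )
        AND ROWNUM <= 50000;
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
    END LOOP;
    COMMIT;
END;
/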

Assuming it's NOT a simple UNDO size issue, one thing you might do is make sure that your MERGE (or UPDATE) is updating rows in the order they appear in your table. Otherwise, you could be revisiting the same blocks over and over, really hurting performance and increasing UNDO usage. I encountered this in a similar operation I had to do a few years ago and I was shocked when I finally figured it out.

To avoid the problem I had, you would want something like this:

MERGE INTO features F
USING
(
    SELECT f.feat_id, fk.feki_id
    FROM features f
    INNER JOIN elements e ON e.elem_id = f.feat_elem_id
    INNER JOIN feature_kinds fk ON fk.feki_elki_id = e.elem_elki_id and fk.feki_fena_id = f.feat_fena_id
    -- Order by the ROWID of the table you are updating to ensure you are not revisiting the same block over and over
    ORDER BY f.rowid
) EFK
ON
(
    F.feat_id = EFK.feat_id
)
WHEN MATCHED THEN
UPDATE SET F.feat_feki_id = EFK.feki_id;

I may have gotten your data model wrong, but the key point is to include the FEATURES table in the MERGE query and ORDER BY features.rowid to ensure that the updates happen in row order.
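As for the "maybe I'm lacking some indexes" part: for a one-off statement that visits every row, full scans with hash joins usually beat indexed nested loops, so missing indexes are unlikely to be the root cause here. If you want to test indexed access anyway, the join columns are the natural candidates (index names are illustrative; check what already exists first):

CREATE INDEX features_elem_fena_ix ON features (feat_elem_id, feat_fena_id);
CREATE INDEX feature_kinds_elki_fena_ix ON feature_kinds (feki_elki_id, feki_fena_id);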
