Coskan’s Approach to Oracle

August 15, 2011

What happens, when you did not do a load test before release upgrades ?

Filed under: Performance — coskan @ 11:55 pm

This is what happened on 11.2.0.2 upgrade from 10.2.0.5

Picture is taken from production (tool used is for this picture is my recent favourite Database Time Monitor from Dominic Giles – creator of swingbench) today, which was upgraded to 11.2.0.2 on last saturday.

What happens here is I think self-explanatory, our system and underlying I/O is hammered with direct path reads.

Explaining this will not satisfy the customer so question is Why it is happening ?

1- It is obvious that upgrade was not tested good enough

2- Enough research about behaviour change has not been made for 10g to 11g migration

All these are political answers, real technical answer is below

Once I checked the system there were roughly 9 distinct sqlids running with many sessions,  all waiting on the same event.

The sqls were more or less  centralised on querying Users table with filtering email where there is no index and there has never been an index.  Why wasn’t system suffering from this lack of index when we are on 10GR2?

It is because 11G is now using direct path reads whenever it can for full table scans  (for more information check the references at the bottom of this table). This means when we run this query and if the table is eligible for direct path reads it will use direct path reads from disk. When you query with 2-3 sessions this is ok but when you query with 20 sessions then it becomes disaster like here and your disk run queue gets bigger and bigger

What was happening on 10GR2? Because Oracle was using scattered reads and cache when it can we were doing more logical I/O without hammering disk subsystem and this contention never revealed and because this application is third party  no index was created by us even there was unnecessary logical I/O.

Because we only functionality and vendor certifies the production on 11G we just upgraded the database.

Let me replicate the issue with simple test case on 11G by first with default direct path read behaviour and then reverting back to pre 11G behaviour by setting 10949 event at system level (ref: Dion Cho)

Test Case

—Running 20 sessions of simple select each for 10 times. Total 200 runs. (Idea taken from -Andrey S. Nikolaev -)


create table test_direct as select * from dba_source;

---single_thread.sql --- 20 lines of the code below
select  * from test_directpath2 where name like 'TEST%';

---multi_runner.sql
set echo off
set verify off
set head off:
define parallelism="&1"

spool many_threads.sql
select 'host sqlplus hr/hr @single_thread.sql &'
from dba_objects where rownum <= ¶llelism;
spool off
@many_threads.sql

---Running
@multi_runner 20

Default 11G


Pre-10G behaviour

during the issue-Only slight increase during the caching then all LIO

After the issue happened – It lasted only 1 minute

As you can see we have a sudden CPU increased due to high number of logical reads in short time but no long term I/O load,  once oracle caches it burns CPU instead waiting on I/O. Meanwhile I think DB was in maintenance window so resource manager kicked in to avoid hammering CPU–resmgr:cpu quantum event.

Here are the other comparisons between two runs

Session waits – 11G

HR@ORACOS> @swact

    SID USERN STATE   EVENT                                          SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3 P1TRANSL                                   BLCKNG_SID BLCKNG_SID_S BLCKNG_INST
------- ----- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- ------------------------- ------------------------------------------ ---------- ------------ -----------
     15 HR    WAITING direct path read                                606            0 ba40q4n8w1myy    0 file number= 10           first dba= 493120         block cnt= 64                                                                   UNKNOWN
    134 HR    WAITING direct path read                                444            0 ba40q4n8w1myy    0 file number= 10           first dba= 493314         block cnt= 62                                                                   UNKNOWN
     75 HR    WAITING direct path read                                483            0 ba40q4n8w1myy    0 file number= 10           first dba= 494656         block cnt= 64                                                                   UNKNOWN
    200 HR    WAITING direct path read                                491            0 ba40q4n8w1myy    0 file number= 10           first dba= 494912         block cnt= 64                                                                   UNKNOWN
    196 HR    WAITING direct path read                                493            0 ba40q4n8w1myy    0 file number= 10           first dba= 494912         block cnt= 64                                                                   UNKNOWN
     12 HR    WAITING direct path read                                504            0 ba40q4n8w1myy    0 file number= 10           first dba= 495234         block cnt= 62                                                                   UNKNOWN
    132 HR    WAITING direct path read                                507            0 ba40q4n8w1myy    0 file number= 10           first dba= 495234         block cnt= 62                                                                   UNKNOWN
     13 HR    WAITING direct path read                                515            0 ba40q4n8w1myy    0 file number= 10           first dba= 495617         block cnt= 63                                                                   UNKNOWN
    201 HR    WAITING direct path read                                532            0 ba40q4n8w1myy    0 file number= 10           first dba= 496192         block cnt= 64                                                                   UNKNOWN
    137 HR    WAITING direct path read                                547            0 ba40q4n8w1myy    0 file number= 10           first dba= 496704         block cnt= 64                                                                   UNKNOWN
    136 HR    WAITING direct path read                                453            0 ba40q4n8w1myy    0 file number= 13           first dba= 100032         block cnt= 64                                                                   UNKNOWN
    202 HR    WAITING direct path read                                464            0 ba40q4n8w1myy    0 file number= 13           first dba= 100354         block cnt= 62                                                                   UNKNOWN
    135 HR    WAITING direct path read                                472            0 ba40q4n8w1myy    0 file number= 13           first dba= 100610         block cnt= 62                                                                   UNKNOWN
    199 HR    WAITING direct path read                                485            0 ba40q4n8w1myy    0 file number= 13           first dba= 101056         block cnt= 64                                                                   UNKNOWN
     71 HR    WAITING direct path read                                502            0 ba40q4n8w1myy    0 file number= 13           first dba= 101506         block cnt= 62                                                                   UNKNOWN
     16 HR    WAITING direct path read                                505            0 ba40q4n8w1myy    0 file number= 13           first dba= 101696         block cnt= 64                                                                   UNKNOWN
     73 HR    WAITING direct path read                                521            0 ba40q4n8w1myy    0 file number= 13           first dba= 102208         block cnt= 64                                                                   UNKNOWN
    133 HR    WAITING direct path read                                591            0 ba40q4n8w1myy    0 file number= 13           first dba= 98946          block cnt= 62                                                                   UNKNOWN
     74 HR    WAITING direct path read                                432            0 ba40q4n8w1myy    0 file number= 13           first dba= 99330          block cnt= 62                                                                   UNKNOWN
     72 HR    WAITING direct path read                                608            0 ba40q4n8w1myy    0 file number= 13           first dba= 99520          block cnt= 64                                                                   UNKNOWN
    192 HR    WORKING On CPU / runqueue                                43            0 90sp00nt0sspn    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000054435000

21 rows selected.

Session waits – Pre 11G


HR@ORACOS> @swact

    SID USERN STATE   EVENT                                          SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3 P1TRANSL                                   BLCKNG_SID BLCKNG_SID_S BLCKNG_INST
------- ----- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- ------------------------- ------------------------------------------ ---------- ------------ -----------
    136       WAITING jobq slave wait                                  10            0                  0 0                         0                         0                                                                               NO HOLDER
     20       WAITING jobq slave wait                                  10            0                  0 0                         0                         0                                                                               NO HOLDER
     23 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     25 HR    WAITING resmgr:cpu quantum                               60            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     22 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    210 HR    WAITING resmgr:cpu quantum                               50            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     14 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    208 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    206 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    144 HR    WAITING resmgr:cpu quantum                               58            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    141 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    140 HR    WAITING resmgr:cpu quantum                               50            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     74 HR    WAITING resmgr:cpu quantum                               55            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     81 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     85 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    132 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    137 HR    WAITING resmgr:cpu quantum                               57            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     73 HR    WORKING On CPU / runqueue                               506            0 90sp00nt0sspn    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000054435000

     80 HR    WORKING On CPU / runqueue                                56            0 ba40q4n8w1myy    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000062657100

    199 HR    WORKING On CPU / runqueue                                52            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
     82 HR    WORKING On CPU / runqueue                                57            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
     13 HR    WORKING On CPU / runqueue                                56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
    205 HR    WORKING On CPU / runqueue                                54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT

23 rows selected.

SQL Stats -Pre-11G


HR@ORACOS> @sqlid ba40q4n8w1myy

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ba40q4n8w1myy  297848798 select  * from test_directpath2 where name like 'TEST%'

       INS  CH#   UEX  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS
---------- ----  ---- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ----------
         1    0     0 3436878245     2147        200          1        200        200              0      52000  5333438.4  26.667192    2103470    2102605

SQL Stats -Pre-11G  –Check Elapsed times and PIO difference


HR@ORACOS> @sqlid ba40q4n8w1myy

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ba40q4n8w1myy  297848798 select  * from test_directpath2 where name like 'TEST%'

       INS  CH#  UEX  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS
---------- ---- ---- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ----------
         1    0    0 3436878245     2147        200          2        200        200              0      16120 102802.028  .51401014    2104800      10513

HR@ORACOS>

Snapper -11G


HR@ORACOS> @sn 10 15
Sampling...

-- Session Snapper v3.10 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

----------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
     15, HR        , STAT, session logical reads                   ,          3338,      333.8,    ===>check the difference
     15, HR        , STAT, user I/O wait time                      ,          1040,        104,
     15, HR        , STAT, non-idle wait time                      ,          1039,      103.9,
     15, HR        , STAT, non-idle wait count                     ,            53,        5.3,
     15, HR        , STAT, physical read total IO requests         ,            54,        5.4,
     15, HR        , STAT, physical read total multi block requests,            54,        5.4,
     15, HR        , STAT, physical read total bytes               ,      27869184,      2.79M,    ===>check the difference
     15, HR        , STAT, cell physical IO interconnect bytes     ,      27869184,      2.79M,
     15, HR        , STAT, consistent gets                         ,          3338,      333.8,
     15, HR        , STAT, consistent gets direct                  ,          3338,      333.8,
     15, HR        , STAT, physical reads                          ,          3338,      333.8,
     15, HR        , STAT, physical reads direct                   ,          3338,      333.8,
     15, HR        , STAT, physical read IO requests               ,            53,        5.3,
     15, HR        , STAT, physical read bytes                     ,      27344896,      2.73M,
     15, HR        , STAT, file io wait time                       ,          1010,        101,
     15, HR        , STAT, Number of read IOs issued               ,            53,        5.3,
     15, HR        , STAT, no work - consistent read gets          ,          3338,      333.8,
     15, HR        , STAT, table scan rows gotten                  ,        164188,     16.42k,
     15, HR        , STAT, table scan blocks gotten                ,          3338,      333.8,
     15, HR        , TIME, DB CPU                                  ,         70000,        7ms,      .7%, |@         |
     15, HR        , TIME, sql execute elapsed time                ,      10107298,      1.01s,   101.1%, |@@@@@@@@@@|
     15, HR        , TIME, DB time                                 ,      10107298,      1.01s,   101.1%, |@@@@@@@@@@|
     15, HR        , WAIT, direct path read                        ,       9535205,   953.52ms,    95.4%, |@@@@@@@@@@|
--  End of Stats snap 1, end=2011-08-16 00:02:14, seconds=10

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    99% | ba40q4n8w1myy   | direct path read          | User I/O
     1% | ba40q4n8w1myy   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-16 00:02:14, seconds=10, samples_taken=99

Snapper – Pre-11G  –check logical I/O per second 


HR@ORACOS> @sn 30 192
Sampling...

-- Session Snapper v3.10 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

----------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
    192, HR        , STAT, opened cursors cumulative               ,            77,       2.57,
    192, HR        , STAT, opened cursors current                  ,            -1,       -.03,
    192, HR        , STAT, user calls                              ,           116,       3.87,
    192, HR        , STAT, pinned cursors current                  ,             1,        .03,
    192, HR        , STAT, session logical reads                   ,        406695,     13.56k,	  ===>check the difference
    192, HR        , STAT, CPU used when call started              ,           300,         10,
    192, HR        , STAT, CPU used by this session                ,           300,         10,
    192, HR        , STAT, DB time                                 ,          1926,       64.2,
    192, HR        , STAT, concurrency wait time                   ,             6,         .2,
    192, HR        , STAT, scheduler wait time                     ,          1471,      49.03,
    192, HR        , STAT, non-idle wait time                      ,          1479,       49.3,
    192, HR        , STAT, non-idle wait count                     ,           201,        6.7,
    192, HR        , STAT, enqueue requests                        ,             1,        .03,
    192, HR        , STAT, enqueue releases                        ,             1,        .03,
    192, HR        , STAT, consistent gets                         ,        406695,     13.56k,
    192, HR        , STAT, consistent gets from cache              ,        406695,     13.56k,
    192, HR        , STAT, consistent gets from cache (fastpath)   ,        406695,     13.56k,
    192, HR        , STAT, calls to kcmgcs                         ,           426,       14.2,
    192, HR        , STAT, calls to get snapshot scn: kcmgss       ,            39,        1.3,
    192, HR        , STAT, no work - consistent read gets          ,        406269,     13.54k,
    192, HR        , STAT, table scans (long tables)               ,            39,        1.3,
    192, HR        , STAT, table scan rows gotten                  ,      24488652,    816.29k,
    192, HR        , STAT, table scan blocks gotten                ,        406269,     13.54k,
    192, HR        , STAT, session cursor cache hits               ,            76,       2.53,
    192, HR        , STAT, parse count (total)                     ,            77,       2.57,
    192, HR        , STAT, execute count                           ,            77,       2.57,
    192, HR        , STAT, bytes sent via SQL*Net to client        ,         24169,     805.63,
    192, HR        , STAT, bytes received via SQL*Net from client  ,         19066,     635.53,
    192, HR        , STAT, SQL*Net roundtrips to/from client       ,            76,       2.53,
    192, HR        , TIME, parse time elapsed                      ,          2510,    83.67us,      .0%, |          |
    192, HR        , TIME, PL/SQL execution elapsed time           ,          4570,   152.33us,      .0%, |          |
    192, HR        , TIME, DB CPU                                  ,       3160000,   105.33ms,    10.5%, |@@        |
    192, HR        , TIME, sql execute elapsed time                ,      19257156,   641.91ms,    64.2%, |@@@@@@@   |
    192, HR        , TIME, DB time                                 ,      19757917,    658.6ms,    65.9%, |@@@@@@@   |
    192, HR        , WAIT, Disk file operations I/O                ,            38,     1.27us,      .0%, |          |
    192, HR        , WAIT, latch: cache buffers chains             ,         68244,     2.27ms,      .2%, |          |
    192, HR        , WAIT, resmgr:cpu quantum                      ,      14801390,   493.38ms,    49.3%, |@@@@@     |
    192, HR        , WAIT, SQL*Net message to client               ,           309,     10.3us,      .0%, |          |
    192, HR        , WAIT, SQL*Net message from client             ,       9924100,    330.8ms,    33.1%, |@@@@      |
    192, HR        , WAIT, events in waitclass Other               ,         11340,      378us,      .0%, |          |
--  End of Stats snap 1, end=2011-08-16 00:20:32, seconds=30

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    43% | ba40q4n8w1myy   | resmgr:cpu quantum        | Scheduler
    20% | ba40q4n8w1myy   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-16 00:20:32, seconds=30, samples_taken=87

I/O Stats – 11G–high runque size

coskan@coskan-lucid:~$ sar -d 1
Linux 2.6.32-33-generic (coskan-lucid) 	16/08/11 	_x86_64_	(4 CPU)

00:00:55          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:00:56       dev8-0    132.00 127200.00    104.00    964.42     21.18    130.15      7.58    100.00
00:00:56      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:56      dev8-32    105.00   1680.00     34.00     16.32      0.68      6.38      6.00     63.00
00:00:56      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:56      dev8-64     84.00   3240.00   8400.00    138.57     79.66    109.88      9.64     81.00

00:00:56          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:00:57       dev8-0    131.00 128320.00     64.00    980.03     22.76    136.87      7.63    100.00
00:00:57      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:57      dev8-32    158.00   2640.00      0.00     16.71      0.79      5.00      5.00     79.00
00:00:57      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:57      dev8-64     56.00      0.00  13440.00    240.00    144.52   1049.64     17.86    100.00

I/O Stats – PRe-11G  —no runque


00:11:57          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:11:58       dev8-0      2.00      0.00     32.00     16.00      0.03     15.00     15.00      3.00
00:11:58      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:58      dev8-32     10.00    160.00     64.00     22.40      0.04      4.00      4.00      4.00
00:11:58      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:58      dev8-64    275.00   2464.00   4736.00     26.18    144.05    193.24      3.64    100.00

00:11:58          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:11:59       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-64    440.00   3344.00   2984.00     14.38    146.16    295.02      2.27    100.00

There are two  solutions in this case one is  creating an index to avoid full table scan or disabling direct path read at system level which is not right at all.
Moral Of the story is TEST YOUR SYSTEM WITH SAME/SIMILAR LOAD BEFORE YOU UPGRADE !!!!!

Being certified on a platform won’t mean it will run fine
If you havent read already, these are reference posts about direct path reads on 11GR2

http://blog.tanelpoder.com/2011/08/11/full-scans-direct-path-reads-and-ora-8103-error-hacking-session-video-here-plus-itunes-podcast-address-2/

http://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/

http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

http://oraclue.com/2009/07/17/direct-path-reads-and-serial-table-scans-in-11g/

http://oracledoug.com/serendipity/index.php?/archives/1321-11g-and-direct-path-reads.html

August 13, 2011

dbms_xplan.display_sql_plan_baseline can also lie !

Filed under: Bugs, CBO, Performance, Plan Stability — coskan @ 6:40 pm

This blog post is about another buggy/funny/misleading behaviour which you MUST know if you are using baselines ( especially in a non-controlled fashion by let oracle manage them
) It looks like a long post but it is actually an easy read since nothing rocket science in the post.

All story started when one of our batches was not using the plan stored in the sql plan baseline repository.
In these cases app support check if the baseline is used by checking v$sql.sql_plan_baseline value.

If it is not used then they check the enabled, accepted baselines by using the statement below as in documentation

dbms_xplan.display_sql_plan_baseline(sql_handle=>’&sql_handle’,plan_name=>’&sql_plan’, format=>’BASIC’));

Once they find the plan_hash_value they are looking for they try to fix it and expect Oracle use the baseline which it should use in theory.

In our case it didn’t. When I checked the plan baseline was created in september 2010 and never actually used since 14 January 2011.
Fun part is some occasions same plan hash value in the baseline is being generated but Oracle is not reporting that baseline is used. But that’s the whole point of baselines having a constant plan hash value. Why oracle is not using the plan baseline even it generates the same plan with same plan hash value

We raised the issue with Oracle and oracle come back with couple of request

1- 10053 trace
2- plan baseline trace( alter session set events ‘trace [SQL_Plan_Management.*]‘;)

We provided the requests and they said if there was a change in the table structure at some point which is causing the plan can’t be reproduced. App team confirmed that there was a structural change to drop global index and recreate it as local on actually the date when the baseline was last used.
The problem is we changed a global index to local index on 14 January 2011 with an index name change, but display_sql_plan_baseline shows us a plan which uses a local index with the current name.This cannot be possible since the baseline is create on September 2010. So what was wrong?

Support could not managed to give me a decent answer or accept that it is a bug so it is time for me to create a test case and reveal the truth why this issue is happening.


HR@ORACOS> ---simple table creation
HR@ORACOS> create table baseline_test as select * from dba_objects;

Table created.

HR@ORACOS> ---index creation
create unique index baseline_test_idx_1
  2  on baseline_test(object_id);

Index created.

HR@ORACOS> ---gather table stats
HR@ORACOS> @gts baseline_test

PL/SQL procedure successfully completed.

HR@ORACOS> HR@ORACOS>
HR@ORACOS>
HR@ORACOS> ---start session level baseline capturing process
HR@ORACOS> alter session set optimizer_capture_sql_plan_baselines=TRUE;

Session altered.

HR@ORACOS> ---run the query below several times for plan capturing happens
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3566649846

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DATA_OBJECT_ID"=10)
   2 - access("OBJECT_ID"=5)

Note
-----
   - SQL plan baseline SQL_PLAN_g64u8gfpwnfr028052575 used for this statement

25 rows selected.

HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3566649846            0 SQL_PLAN_g64u8gfpwnfr028052575 SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000

Now it is time to check the plan output


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3566649846

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |
-----------------------------------------------------------

21 rows selected.

Since we have a baseline now we can change the table index structure to replicate the issue


HR@ORACOS> ---now drop the index
HR@ORACOS> drop index baseline_test_idx_1
  2  ;

Index dropped.

HR@ORACOS> create unique index baseline_test_idx_2 on baseline_test(data_object_id,object_id);

Index created.

HR@ORACOS> ---Run it couple of times for baseline generation

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

20 rows selected.

As you can see baseline no baseline is used. Lets check if a baseline captured ?


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000

Two baselines are captured in the system, one is accepted which is the one initially created before the index recreation but the other one is not accepted ??

Now it is time to see what the plans of these baselines

HR@ORACOS> --ACCEPTED PLAN
HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

HR@ORACOS> ---NON ACCEPTED NEWLY GENERATED PLAN

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

They are same ???? How come the plan can be same with the one which has different index names ??
If you are a careful reader you might have spotted The plan_id is different with same plan hash value

Lets check with ADVANCED option instead BASIC


HR@ORACOS> ----ACCEPTED BASELINE

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

Did you spot something ? Look carefully because I did not look carefully and wasted 1 day. Once you know how to get the higher detail of information better you should look carefully


HR@ORACOS>---NON ACCEPTED NEWLY GENERATED PLAN
hR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

If you haven’t spotted already, the problem with the accepted plan output is the difference between OUTLINE DATA FROM SMB and OUTLINE DATA sections.
There are two outline sections per baseline output. one is OUTLINE DATA FROM SMB and other is OUTLINE DATA

Now you can do line by line comparison and see the difference.

Outline Data from SMB:

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(‘11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(‘11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

Outline Data
————-

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”DATA_OBJECT_ID”
“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(‘11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(‘11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

What happens here is that Oracle only stores the outline information and when you run the display_sql_plan_baseline
Oracle tries to generate the plan output from the outline but problem starts there because it tries to use the index which is not existent on the system anymore so it can’t reproduce the plan and instead of giving any warning or error it generates a plan for you by lying you.

To confirm what we said lets drop the secondary index and see what dbms_xplan.display_sql_plan_baseline will generate for us for the same baseline


HR@ORACOS> ---dropping index
HR@ORACOS>
HR@ORACOS> drop index baseline_test_idx_2;

Index dropped.

------FIRST BASELINE CREATED

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

As you can see now it generates a new plan for us for the same baseline :) What a liar

What is the output from the valid but not accepted baseline, once we dropped the index it was using


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

Same issue happened again.

Lets run the same sql to see if it generates and accepts a baseline with full table scan


HR@ORACOS> ---Lets run the sql again to force a plan
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 2027741861

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |   217 (100)|          |
|*  1 |  TABLE ACCESS FULL| BASELINE_TEST |     1 |    32 |   217   (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

19 rows selected.

The baseline is generated as below but it is still not accepted.


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr018837053 YES     NO       NO    13-AUG-11 15.59.15.000000

I assume You already understood the issue. This is a proper buggy/dodgy behaviour for me.

Now lets try some workaround to see how we can get over from this situation.

There are 2 ways to get around from this problem and all of them needs DBA input. I have tested all and they are working but you can go ahead and test them. I don’t want to make this blog post longer than it is so I’m only listing the options.

1- Evolve the baseline for the sql_handle so Oracle accepts the plans (will keep the invalid ones in the system)

2- Drop the invalid baselines and evolve the handle (this way you will get rid of the invalid ones)

3- Drop all baselines related with the objects altered and allow capturing to the trick for you. (Less headache than second but you may lose the good plans- still would be my approach for releases )

Even Oracle warns in documentation “SQL plan baselines cannot help in cases where an event has caused irreversible execution plan changes, such as dropping an index.” and also ” if a change in the system (such as a dropped index) causes all accepted plans to become non-reproducible, then the optimizer selects the best-cost plan. “
The did not give much information about the behaviour.

From my point of view there are two major problems which needs to be addressed for this specific table structure change case

1- Instead of showing a plan which is not same as baseline, display_sql_plan_baseline must report an error that stored baseline is not reproduce-able

2- Baselines are not invalidated there must be an invalidation flag which makes the baseline invalidated at very first time CBO can’t reproduce the plan via baseline and set enabled and accepted flag to no. Since the purging of the plans is 53 weeks by
default for non used baselines this is a needed enhancement. If a plan is accepted and not produceable with smaller cost it avoids other plans accepted.

Another problem with baselines is also covered in this clear post by Maxym Kharchenko which can be related with my first point (that very same post series from Maxym  is one of the reasons I did not continue my series since author nearly covered all -but I still have something to write) What happens if both objects are under two schemas like some development groups uses for different testing phases. One drops the index one did not and you have system level capturing on. ?

I think Oracle must address these issues by introducing another complexity for sql signatures which involves schema owner or something similar.

Long story short you might be easily fooled by baselines with the current structure. If I were you I will always check the plans with at least OUTLINE option or go for ADVANCED.

Confession:n I am always using advanced option but as usual I did not pay that much attention to the output at first point :( That was my biggest mistake there which cost 1 day for analysis. knowing and using an option but not paying attention to the output of the option.)

All tests are done on 11.2.0.1

Bonus material

For the ones who likes tracefiles this issue is shown as this

=======TRACEFILE ======

------- START SPM Plan Dump -------
SPM: failed to reproduce the plan using the following info:
  parse_schema name        : HR
  plan_baseline signature  : 17515423071774325472
  plan_baseline plan_id    : 671425909
  plan_baseline hintset    :
    hint num  1 len 37 text: OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
SPM: generated non-matching plan:
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id  | Operation          | Name         | Rows  | Bytes | Cost  | Time      |
------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |              |       |       |   217 |           |
| 1   |  TABLE ACCESS FULL | BASELINE_TEST|     1 |    32 |   217 |  00:00:02 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

Content of other_xml column
===========================
  db_version     : 11.2.0.1
  parse_schema   : HR
  plan_hash      : 2027741861
  plan_hash_2    : 411267155
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      DB_VERSION('11.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "BASELINE_TEST"@"SEL$1")
    END_OUTLINE_DATA
  */

------- END SPM Plan Dump -------
SPM: couldn't reproduce any enabled+accepted plan so using the cost-based plan, planId = 411267155

Theme: Silver is the New Black. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 206 other followers