Coskan’s Approach to Oracle

November 11, 2011

Where is the sql_id of active session?

Filed under: Bugs, Performance, Plan Stability, Tips — coskan @ 3:39 pm

In this post I will give information on quick ways to find the sql_id’s of sessions for which you can’t see any entry when you look at v$session.

I usually see this behavior when sql is burning CPU in a nested loop and after a long time of running sql_id column suddenly not updated anymore. When I tweeted about it , ,Tanel Poder answered (If I a recall it right damn twitter does not have a proper search and history) something like ” this happens when oracle does too many recursive calls and loses track of the sql_id” (Tanel, if you are reading, please correct me if I am wrong about what you said)

On Metalink there are two notes I could find

Reason for Apparently Incorrect or Null Sql_id in v$sql [ID 406452.1]

This is caused by an architectural change in the 10g release: from 10.2 onwards, the v$sql columns (sql_id, sql_address and so on) will only show the top level user cursor,  if any (ie it will not display the recursive cursors),.
When using a remote procedure call, such as when a rpc call from a client (eg forms client) requests that a plsql object is executed on the server, there is no top level sql and, consequently, there is no sql_id in v$session.

BUG:5146994 - CANNOT FIND RECURSIVE SQL IN V$SQLAREA FOR A RUNNING SNAPSHOT REFRESH JOB
BUG 5724661 Abstract: WRONG VALUES FOR SQL_ID IN V$SESSION

On Null SQL_ID For Active Sessions In V$Session [ID 1343503.1]

From 10.2 onwards sql_id will only show the top level user cursor ( if any). 

Sometimes,Session does a plsql rpc call ie a client ( eg forms client) can request a plsql object to be executed at the server. In this case there is no top level sql and as such we don't see the sql_id in v$session.

Currently, oracle don't store the current sql thru any v$ views. The only way to find that is thru event 10046.
This is covered in Unpublished bug 5528670: "NULL VALUES FOR SQL_ID, SQL_HASH_VALUE, SQL_ADDRESS AND OTHERS IN V$SESSION", Which was closed as "Not a Bug".

When I have this issues I sometimes get it even on a sql which I run from sql_plus without a pl/sql call so I think limitation on these notes is a bit misleading or I am interpreting the information wrong. Anyway finding the reasons needs too much tracing research bla bla bla which you will never have time to go through when you hit the issue on a production system so best I cut the chase and start giving information how to find the sql_id

Notes says there is no v$ view which is a a bit misleading and notes also say 10046 is the only way to find it which is also another documentation error ignoring method 2 below.

On 10G there are 2 ways that I know will usualls work one of which needs tuning and diagnostics pack license other does not

On 11G there is one more way which again needs tuning and diagnostics pack license.

Method 1- using ASH/AWR data + v$sql combination (10g-11g license needed)

1a- Find the latest not null sql_id column for the given session and serial in v$active_session_history

1b- If you cant find it at first step, due to the reason sql_id information already lost in v$active_session_history, then you need to check similar way using dba_hist_active_sess_history

2- Once you find the sql_id,plan_hash_value double check it on v$sql if number on users_executing column is matching in current v$session then you have what you looking for

—Method-1 on Action

DATABASE1> @swact

    SID USERN STATE   EVENT                        SEQ# SEC_IN_STATE         P1         P2         P3  SQL_ID         CHN
------- ----- ------- ---------------------- ---------- ------------ ---------- ---------- ----------  ------------- ----
   1352 XXWEB WAITING db file scattered read      40906            0         67    2638509         32                       --->sql_id empty
    294 XXWEB WAITING db file scattered read      16727            0         82     234468         32                       --->sql_id empy
   1263 COSKA WORKING On CPU / runqueue              44            0 1413697536          1          0  f5cqythck1gfz    1

DATABASE1> @last_sql 1352 %

SAMPLE_TIME                           SESSION_ID SESSION_SERIAL# SQL_ID		PLAN_HASH_VALUE
------------------------------------- ---------- --------------- -------------  ---------------
31-JAN-11 11.00.35.524                      1352              72 3p1b93fq81x0f  2118159443	   ----> very likely our sql
31-JAN-11 11.00.34.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.33.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.32.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.31.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.30.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.29.504                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.28.494                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.27.494                      1352              72 3p1b93fq81x0f  2118159443

9 rows selected.

DATABASE1> @last_sql 294

SAMPLE_TIME                            SESSION_ID SESSION_SERIAL# SQL_ID	PLAN_HASH_VALUE
-------------------------------------- ---------- --------------- ------------- ---------------
31-JAN-11 10.50.40.071                        294              39 3p1b93fq81x0f 2118159443	  ----> very likely our sql
31-JAN-11 10.50.39.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.38.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.37.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.36.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.35.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.34.051                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.33.051                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.32.051                        294              39 3p1b93fq81x0f 2118159443

9 rows selected.

DATABASE1>-----LETS CHECK THE SQL

DATABASE1> @sqlid 3p1b93fq81x0f

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------------------------------------------------------------
3p1b93fq81x0f 2894132238 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

 CH#   PLAN_HASH OPT_COST EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS USERS_EXECUTING
----  ---------- -------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- --------------
   0  2118159443   956322          3          1              0 861533.028    5876120 1958.70667   21851873   21739212          2

Elapsed times + users_executing columns saying that these sqls has been running for a very long time and they are %99.999 the ones we are looking for

There are tiny challenges with this approach but not that much challeging.

I used it very much in RAC env with a system where multiple users and child cursors are also running
so if you are careful enough to compare apples with apples (same instance, same plan hash value,right session serial, longer elapsed time cursor) you are very unlikely to be mislead

code used for last_sql is like below (modified a bit for rac aware)


select inst_id,sample_time,session_id,session_serial#,sql_id from (
select inst_id,sample_time,session_id,session_serial#,sql_id from
gv$active_session_history
where
sql_id is not null
 and session_id=&1 and inst_id=&2
 order by 1 desc
) where rownum

Method 2- Oradebug dump error stack (10g-11g no license needed)- Most reliable and certain way it is %100 accurate and good part is it does not need any input from me since Tanel Poder already covered how to go through error stack on his post how to read errorstack output

I tried 10046 and never seen any sql_id information written for the 10 mins period I gathered the trace so I’m not %100 sure if it needs whole call to be finished before it writes the sql_id information. I personally won’t use 10046 for this case.

Method-3- SQL Monitoring (11g onwards-license needed). This is my favourite and its damn quick and %99.9 it works if the sql was running long enough to get monitored and then lost the track. It is damn simple like this piece of code

select distinct inst_id,sid,status,sql_id,sql_plan_hash_value,sql_child_address,sql_exec_id
 from gv$sql_plan_monitor
 where sid in (&1) and status='EXECUTING';

—Method 3 On Action

DATABASE1> @swact

    SID USERN STATE   EVENT                        SEQ# SEC_IN_STATE         P1         P2         P3  SQL_ID         CHN
------- ----- ------- ---------------------- ---------- ------------ ---------- ---------- ----------  ------------- ----
   1352 XXWEB WAITING db file scattered read      40906            0         67    2638509         32                       --->sql_id empty
    294 XXWEB WAITING db file scattered read      16727            0         82     234468         32                       --->sql_id empy
   1263 COSKA WORKING On CPU / runqueue              44            0 1413697536          1          0  f5cqythck1gfz    1

---Check sql_monitoring
SQL> @mon_sql 1352,294

INST_ID  SID STATUS              SQL_ID        SQL_PLAN_HASH_VALUE SQL_CHILD_ADDRES SQL_EXEC_ID
-------	---- ------------------- ------------- ------------------- ---------------- -----------
1        1352 EXECUTING           3p1b93fq81x0f          2118159443 00000002D88378E8 16777216
1        1352 EXECUTING           2kgnch9h5hbkz          1874321290 00000002D6223418 16777216  ---paren pl/sql
1         294 EXECUTING           3p1b93fq81x0f          2118159443 00000002D88378E8 16777217
1         294 EXECUTING           2kgnch9h5hbkz          1874321290 00000002D6223418 16777217

The second sql_id 2kgnch9h5hbkz is from pl_sql parent call which was calling 3p1b93fq81x0f so they both get monitored by v$sql_monitor

Apart from all these again if you have diagnostics pack license you can always use OEM/Grid Control to check back the sessions history but comparing with querying v$sql_monitor it is as slow as driving an electric car when you have ferrari.

As you can see there are more ways then listed if your session lost track of the sql_ids. If you have more ways (especially with tracing -like dump sql if there is an option like that) I would really like to hear.

Update after first comment

Method-4 Using v$open_cursor

First comment on which Michael Dinh mentioned he is using toad open cursors option to hunt down the sqls but he was not sure for the sqlplus.
I had a chance to to check with one of our systems. Looks like v$open_cursor is giving us information which not as clear as the other methods but still can be very
useful for standart edition or users without diagnostics pack and also can be easier than error stack which usually needs a change request on PRD systems.

SQL> @sw 207,980

    SID USERN STATE   EVENT                    SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3
------- ----- ------- ------------------ ---------- ------------ ------------- ---- ------------------------- ------------------------- -------------------------
    207 XXXXX WORKING On CPU / runqueue       62033            3                    cellhash#=203CB5D5         diskhash#= 349428727      bytes= 16384
    980 XXXXX WORKING On CPU / runqueue         759            1                    cellhash#=D84BEC46         diskhash#=56AC941E       bytes= 16384

SQL> @mon_sql 207,980

   INST_ID        SID STATUS              SQL_ID        SQL_PLAN_HASH_VALUE SQL_CHILD_ADDRES SQL_EXEC_ID
---------- ---------- ------------------- ------------- ------------------- ---------------- -----------
         2        980 EXECUTING           91uc7cvw2rqab          4166729024 000000040D4560F8    33554458
         2        207 EXECUTING           91uc7cvw2rqab          4166729024 000000040D4560F8    33554456

SQL> select * from v$open_cursor where sid in (207,980) order by 2;

SADDR                   SID USER_NAME  ADDRESS          HASH_VALUE SQL_ID        SQL_TEXT LAST_SQL_ SQL_EXEC_ID CURSOR_TYPE
---------------- ---------- ---------- ---------------- ---------- ------------- -------- --------- ----------- -----------------------------------
00000004508BA640        207 XXXXXXX    00000003A72340A0 2292456266 gw771824a86ua Begin PK              33554456 OPEN
00000004508BA640        207 XXXXXXX    00000003E6F0C230 2181403846 5bvw0ka10b566 Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    000000040E81D128 4111302712 6k03snbuhv01s Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000003EA88E4F0 1535077424 68u7tx5dryv1h Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004592E3690  914163366 4vs91dcv7u1p6 insert i                       OPEN-RECURSIVE
00000004508BA640        207 XXXXXXX    0000000411FC64C0 1407509578 8nwyam59y9t2a Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004117E1340 2399958818 62h75aq7hsxt2 Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004125C3DC0 4163623243 91uc7cvw2rqab INSERT I              33554456 OPEN-PL/SQL
0000000452AA1BB8        980 XXXXXXX    00000004125C3DC0 4163623243 91uc7cvw2rqab INSERT I              33554458 OPEN-PL/SQL
0000000452AA1BB8        980 XXXXXXX    00000004592E3690  914163366 4vs91dcv7u1p6 insert i                       OPEN-RECURSIVE
0000000452AA1BB8        980 XXXXXXX    000000040E81D128 4111302712 6k03snbuhv01s Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    0000000411FC64C0 1407509578 8nwyam59y9t2a Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003E6F0C230 2181403846 5bvw0ka10b566 Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000004117E1340 2399958818 62h75aq7hsxt2 Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003EA88E4F0 1535077424 68u7tx5dryv1h Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003A72340A0 2292456266 gw771824a86ua Begin PK              33554458 OPEN

Once we spot the sqls we need to check all of them one by one on v$sql where if users_executing column (UEX) populated

SQL> @sqlid gw771824a86ua   ---->bingo with two users and high ELAPSED TIMES. this is a pl/sql package calling the sql below

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------------------------------------------------------------------------------------
gw771824a86ua 2292456266 xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

INS  CH#  UEX  PLAN_HASH   OPT_COST        CPU_MS     ELA_MS    AVG_ELA OFFLOAD IO_SAVED_% IO_ELIG_GB IO_INTER_GB       LIOS       PIOS
--- ---- ---- ---------- ----------  - ---------- ---------- ---------- ------- ---------- ---------- ----------- ---------- ----------
  2    0    2          0          0     412784696  414481976  16579.279 No             .00        .00      58.007 2653518789    3248794
  3    0    0          0          0    3215639.15 3307820.05 1653.91002 No             .00        .00       3.212 1480568056     184192

----
---- Tried for all other SQLS but thet do not have users_executing column populated
----

SQL> @sqlid 91uc7cvw2rqab   -->two users and high ELAPSED TIMES on 4th child

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- -----------------------------------------------------------------------------------------------------------------------------------
91uc7cvw2rqab 4163623243 xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

INS  CH#  UEX  PLAN_HASH   OPT_COST       CPU_MS     ELA_MS    AVG_ELA OFFLOAD IO_SAVED_% IO_ELIG_GB IO_INTER_GB       LIOS       PIOS
--- ---- ---- ---------- ----------   ---------- ---------- ---------- ------- ---------- ---------- ----------- ---------- ----------
  2    0    0 4166729024      13081   91245882.5 91690487.2 18338.0974 No             .00        .00      11.644 3739546208     657923
  2    1    0 4166729024      13081    120056449  120491925 15061.4907 No             .00        .00      15.889 1250311783     883494
  2    2    0 4166729024      13081   64616814.8 64899824.1 12979.9648 No             .00        .00      10.564  424413836     587066
  2    3    0 4166729024      13081   58282307.7 58454183.2 19484.7277 No             .00        .00       5.442 1809526416     303954
  2    4    2 4166729024      13081   78886102.5 79233558.3 19808.3896 No             .00        .00      13.555 4163654789     756589

Thanks Michael for the information so we have 1 more method which does not need license

August 13, 2011

dbms_xplan.display_sql_plan_baseline can also lie !

Filed under: Bugs, CBO, Performance, Plan Stability — coskan @ 6:40 pm

This blog post is about another buggy/funny/misleading behaviour which you MUST know if you are using baselines ( especially in a non-controlled fashion by let oracle manage them
) It looks like a long post but it is actually an easy read since nothing rocket science in the post.

All story started when one of our batches was not using the plan stored in the sql plan baseline repository.
In these cases app support check if the baseline is used by checking v$sql.sql_plan_baseline value.

If it is not used then they check the enabled, accepted baselines by using the statement below as in documentation

dbms_xplan.display_sql_plan_baseline(sql_handle=>’&sql_handle’,plan_name=>’&sql_plan’, format=>’BASIC’));

Once they find the plan_hash_value they are looking for they try to fix it and expect Oracle use the baseline which it should use in theory.

In our case it didn’t. When I checked the plan baseline was created in september 2010 and never actually used since 14 January 2011.
Fun part is some occasions same plan hash value in the baseline is being generated but Oracle is not reporting that baseline is used. But that’s the whole point of baselines having a constant plan hash value. Why oracle is not using the plan baseline even it generates the same plan with same plan hash value

We raised the issue with Oracle and oracle come back with couple of request

1- 10053 trace
2- plan baseline trace( alter session set events ‘trace [SQL_Plan_Management.*]‘;)

We provided the requests and they said if there was a change in the table structure at some point which is causing the plan can’t be reproduced. App team confirmed that there was a structural change to drop global index and recreate it as local on actually the date when the baseline was last used.
The problem is we changed a global index to local index on 14 January 2011 with an index name change, but display_sql_plan_baseline shows us a plan which uses a local index with the current name.This cannot be possible since the baseline is create on September 2010. So what was wrong?

Support could not managed to give me a decent answer or accept that it is a bug so it is time for me to create a test case and reveal the truth why this issue is happening.


HR@ORACOS> ---simple table creation
HR@ORACOS> create table baseline_test as select * from dba_objects;

Table created.

HR@ORACOS> ---index creation
create unique index baseline_test_idx_1
  2  on baseline_test(object_id);

Index created.

HR@ORACOS> ---gather table stats
HR@ORACOS> @gts baseline_test

PL/SQL procedure successfully completed.

HR@ORACOS> HR@ORACOS>
HR@ORACOS>
HR@ORACOS> ---start session level baseline capturing process
HR@ORACOS> alter session set optimizer_capture_sql_plan_baselines=TRUE;

Session altered.

HR@ORACOS> ---run the query below several times for plan capturing happens
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3566649846

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DATA_OBJECT_ID"=10)
   2 - access("OBJECT_ID"=5)

Note
-----
   - SQL plan baseline SQL_PLAN_g64u8gfpwnfr028052575 used for this statement

25 rows selected.

HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3566649846            0 SQL_PLAN_g64u8gfpwnfr028052575 SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000

Now it is time to check the plan output


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3566649846

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |
-----------------------------------------------------------

21 rows selected.

Since we have a baseline now we can change the table index structure to replicate the issue


HR@ORACOS> ---now drop the index
HR@ORACOS> drop index baseline_test_idx_1
  2  ;

Index dropped.

HR@ORACOS> create unique index baseline_test_idx_2 on baseline_test(data_object_id,object_id);

Index created.

HR@ORACOS> ---Run it couple of times for baseline generation

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

20 rows selected.

As you can see baseline no baseline is used. Lets check if a baseline captured ?


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000

Two baselines are captured in the system, one is accepted which is the one initially created before the index recreation but the other one is not accepted ??

Now it is time to see what the plans of these baselines

HR@ORACOS> --ACCEPTED PLAN
HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

HR@ORACOS> ---NON ACCEPTED NEWLY GENERATED PLAN

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

They are same ???? How come the plan can be same with the one which has different index names ??
If you are a careful reader you might have spotted The plan_id is different with same plan hash value

Lets check with ADVANCED option instead BASIC


HR@ORACOS> ----ACCEPTED BASELINE

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

Did you spot something ? Look carefully because I did not look carefully and wasted 1 day. Once you know how to get the higher detail of information better you should look carefully


HR@ORACOS>---NON ACCEPTED NEWLY GENERATED PLAN
hR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

If you haven’t spotted already, the problem with the accepted plan output is the difference between OUTLINE DATA FROM SMB and OUTLINE DATA sections.
There are two outline sections per baseline output. one is OUTLINE DATA FROM SMB and other is OUTLINE DATA

Now you can do line by line comparison and see the difference.

Outline Data from SMB:

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(’11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(’11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

Outline Data
————-

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”DATA_OBJECT_ID”
“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(’11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(’11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

What happens here is that Oracle only stores the outline information and when you run the display_sql_plan_baseline
Oracle tries to generate the plan output from the outline but problem starts there because it tries to use the index which is not existent on the system anymore so it can’t reproduce the plan and instead of giving any warning or error it generates a plan for you by lying you.

To confirm what we said lets drop the secondary index and see what dbms_xplan.display_sql_plan_baseline will generate for us for the same baseline


HR@ORACOS> ---dropping index
HR@ORACOS>
HR@ORACOS> drop index baseline_test_idx_2;

Index dropped.

------FIRST BASELINE CREATED

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

As you can see now it generates a new plan for us for the same baseline :) What a liar

What is the output from the valid but not accepted baseline, once we dropped the index it was using


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

Same issue happened again.

Lets run the same sql to see if it generates and accepts a baseline with full table scan


HR@ORACOS> ---Lets run the sql again to force a plan
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 2027741861

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |   217 (100)|          |
|*  1 |  TABLE ACCESS FULL| BASELINE_TEST |     1 |    32 |   217   (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

19 rows selected.

The baseline is generated as below but it is still not accepted.


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr018837053 YES     NO       NO    13-AUG-11 15.59.15.000000

I assume You already understood the issue. This is a proper buggy/dodgy behaviour for me.

Now lets try some workaround to see how we can get over from this situation.

There are 2 ways to get around from this problem and all of them needs DBA input. I have tested all and they are working but you can go ahead and test them. I don’t want to make this blog post longer than it is so I’m only listing the options.

1- Evolve the baseline for the sql_handle so Oracle accepts the plans (will keep the invalid ones in the system)

2- Drop the invalid baselines and evolve the handle (this way you will get rid of the invalid ones)

3- Drop all baselines related with the objects altered and allow capturing to the trick for you. (Less headache than second but you may lose the good plans- still would be my approach for releases )

Even Oracle warns in documentation “SQL plan baselines cannot help in cases where an event has caused irreversible execution plan changes, such as dropping an index.” and also ” if a change in the system (such as a dropped index) causes all accepted plans to become non-reproducible, then the optimizer selects the best-cost plan. “
The did not give much information about the behaviour.

From my point of view there are two major problems which needs to be addressed for this specific table structure change case

1- Instead of showing a plan which is not same as baseline, display_sql_plan_baseline must report an error that stored baseline is not reproduce-able

2- Baselines are not invalidated there must be an invalidation flag which makes the baseline invalidated at very first time CBO can’t reproduce the plan via baseline and set enabled and accepted flag to no. Since the purging of the plans is 53 weeks by
default for non used baselines this is a needed enhancement. If a plan is accepted and not produceable with smaller cost it avoids other plans accepted.

Another problem with baselines is also covered in this clear post by Maxym Kharchenko which can be related with my first point (that very same post series from Maxym  is one of the reasons I did not continue my series since author nearly covered all -but I still have something to write) What happens if both objects are under two schemas like some development groups uses for different testing phases. One drops the index one did not and you have system level capturing on. ?

I think Oracle must address these issues by introducing another complexity for sql signatures which involves schema owner or something similar.

Long story short you might be easily fooled by baselines with the current structure. If I were you I will always check the plans with at least OUTLINE option or go for ADVANCED.

Confession:n I am always using advanced option but as usual I did not pay that much attention to the output at first point :( That was my biggest mistake there which cost 1 day for analysis. knowing and using an option but not paying attention to the output of the option.)

All tests are done on 11.2.0.1

Bonus material

For the ones who likes tracefiles this issue is shown as this

=======TRACEFILE ======

------- START SPM Plan Dump -------
SPM: failed to reproduce the plan using the following info:
  parse_schema name        : HR
  plan_baseline signature  : 17515423071774325472
  plan_baseline plan_id    : 671425909
  plan_baseline hintset    :
    hint num  1 len 37 text: OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
SPM: generated non-matching plan:
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id  | Operation          | Name         | Rows  | Bytes | Cost  | Time      |
------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |              |       |       |   217 |           |
| 1   |  TABLE ACCESS FULL | BASELINE_TEST|     1 |    32 |   217 |  00:00:02 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

Content of other_xml column
===========================
  db_version     : 11.2.0.1
  parse_schema   : HR
  plan_hash      : 2027741861
  plan_hash_2    : 411267155
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      DB_VERSION('11.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "BASELINE_TEST"@"SEL$1")
    END_OUTLINE_DATA
  */

------- END SPM Plan Dump -------
SPM: couldn't reproduce any enabled+accepted plan so using the cost-based plan, planId = 411267155

March 4, 2011

Plan stability through Upgrade to 11G-Why is my plan changed?-extra nested loop

Filed under: Bugs, Performance, Plan Stability — coskan @ 1:30 pm

After you upgrade to 11G from lower versions you might realize that some of your plans involving nested loops are changed and you started to see extra nested loops.
According to documentation this is because of the “new” implementation for Nested Loop Joins.

Document says
Oracle Database 11g introduces a new implementation for nested loop joins. As a result, execution plans that include nested loops might appear different
than they did in previous releases of Oracle Database. Both the new implementation and the original implementation for nested loop
joins are possible in Oracle Database 11g. So, when analyzing execution plans, it is important to understand that the number of NESTED LOOPS join row sources
might be different.

Oracle Database 11g introduces a new implementation for nested loop joins to reduce overall latency for physical I/O.
When an index or a table block is not in the buffer cache and is needed to process the join, a physical I/O is required.
Oracle Database 11g can batch multiple physical I/O requests and process them using a vector I/O instead of processing them one at a time.

As part of the new implementation for nested loop joins, two NESTED LOOPS join row sources might appear in the execution plan where only one
would have appeared in prior releases. In such cases, Oracle Database allocates one NESTED LOOPS join row source to join the values from the table
on the outer side of the join with the index on the inner side. A second row source is allocated to join the result of the first join,
which includes the rowids stored in the index, with the table on the inner side of the join.

Here are the plan differences between new implementation and old one.

Query of the plan

select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1
where t1.id=t2.id
and t2.id>650000;

for 10.2.0.4


HR@ORACOS> @ofe
Enter value for ofe: 10.2.0.4

Session altered.

HR@ORACOS> @test_e

Explained.

HR@ORACOS> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1708821971

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   350K|    78M|   703K  (1)| 00:58:40 |
|   1 |  NESTED LOOPS                |           |   350K|    78M|   703K  (1)| 00:58:40 |
|*  2 |   TABLE ACCESS FULL          | T2        |   350K|    39M|  3446   (1)| 00:00:18 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1        |     1 |   118 |     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T2"."ID">650000)
   4 - access("T1"."ID"="T2"."ID")
       filter("T1"."ID">650000)

18 rows selected.

HR@ORACOS>

For 11.2.0.1


HR@ORACOS> @ofe
Enter value for ofe: 11.2.0.1

Session altered.

HR@ORACOS> @test_e

Explained.

HR@ORACOS> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3217821761

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   350K|    78M|   703K  (1)| 00:58:40 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |   350K|    78M|   703K  (1)| 00:58:40 |
|*  3 |    TABLE ACCESS FULL         | T2        |   350K|    39M|  3446   (1)| 00:00:18 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1        |     1 |   118 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."ID">650000)
   4 - access("T1"."ID"="T2"."ID")
       filter("T1"."ID">650000)

19 rows selected.

According to me, this change brought extra complexity to read the plans. NLJ plans prior to 11G are way easier to read and bear in mind this example is just a simple 2 table join.
Once you query has got crazy number of joins it gets harder and harder to read. One thing you can spot is that the cost is “same” which is very good for plan change analysis.
According to my experience, %99 of same cost plan changes are because of this implementation and this made me exclude same cost-different plan_hash_value plans from comparisons.

On the last post, I already gave some information about the controlling hidden CBO parameter _nlj_batching_enabled.
Better I repeat here, this parameter has 2 values, 0 to disable the behaviour, 1 (which is default) to enable the behavior and this parameter is there since 11GR1.

When I interpret the sentence “new implementation for nested loop joins to reduce overall latency for physical I/O”. I expect some improvement with runtime like the effect of bulk collect or changing the array size of your sessions. But is my assumption true ?
I have doubts about this definition because in reality I do not feel any difference as interpreted from the official definition.
Nothing got faster in our production environment related with this change.

To prove my experience, I build a test case which you can find in the bottom of the page.

My test environment is 11.2.0.1 on Ubuntu with 1.5G memory target on a machine with 4 cores.

For this test I have 3 tables all of which have ID columns and have an index on ID columns

T1=100 million rows (I think I was too generous)
T2=1 million rows
T3=50K rows -derived from T2

First query is for small data set which I can do buffer cache read after the initial run to test non phsical I/O performance

select /*+ use_nl(t3) full(t3) gather_plan_statistics*//*test_999550*/ * from t3,t1
where t1.id=t3.id;

Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=0

49999 rows selected.

Elapsed: 00:00:01.51

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      52395  consistent gets
       1842  physical reads
          0  redo size
    1781536  bytes sent via SQL*Net to client
       1612  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49999  rows processed

Plan hash value: 692324935

------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      0 |        |      0 |00:00:00.01 |       0 |      0 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |      1 |  49999 |00:00:00.83 |   52395 |   1842 |
|   2 |   NESTED LOOPS              |           |      1 |  50143 |  99999 |00:00:04.65 |    2396 |    978 |
|   3 |    TABLE ACCESS FULL        | T3        |      1 |  49999 |  49999 |00:00:00.11 |     949 |    849 |
|*  4 |    INDEX UNIQUE SCAN        | T1_IDX_ID |  49999 |      1 |  49999 |00:00:00.16 |    1447 |    129 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."ID"="T3"."ID")

Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=1

select /*+ use_nl(t3) full(t3) gather_plan_statistics*//*test_999550*/ * from t3,t1
  2  where t1.id=t3.id;

49999 rows selected.

Elapsed: 00:00:01.40

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      52395  consistent gets
       1842  physical reads
          0  redo size
    1781536  bytes sent via SQL*Net to client
       1612  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49999  rows processed

Plan hash value: 198103700

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  49999 |00:00:00.74 |   52395 |   1842 |
|   1 |  NESTED LOOPS                |           |      1 |        |  49999 |00:00:00.74 |   52395 |   1842 |
|   2 |   NESTED LOOPS               |           |      1 |  50143 |  49999 |00:00:00.27 |    2396 |    978 |
|   3 |    TABLE ACCESS FULL         | T3        |      1 |  49999 |  49999 |00:00:00.07 |     949 |    849 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |  49999 |      1 |  49999 |00:00:00.15 |    1447 |    129 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1        |  49999 |      1 |  49999 |00:00:00.22 |   49999 |    864 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."ID"="T3"."ID")

Second query is for bigger data set which always do some physical read for T2

select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1
where t1.id=t2.id
and t2.id>650000;

Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=0


select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1
where t1.id=t2.id
and t2.id>650000;
  2    3

350000 rows selected.

Elapsed: 00:00:10.11

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     377819  consistent gets
      23736  physical reads
          0  redo size
   12606669  bytes sent via SQL*Net to client
       8212  bytes received via SQL*Net from client
        701  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     350000  rows processed

Plan hash value: 1010637246

------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |      0 |        |      0 |00:00:00.01 |       0 |      0 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1        |      1 |      1 |    350K|00:00:04.91 |     377K|  23736 |
|   2 |   NESTED LOOPS              |           |      1 |    350K|    700K|00:01:46.31 |   27819 |  17752 |
|*  3 |    TABLE ACCESS FULL        | T2        |      1 |    350K|    350K|00:00:01.39 |   17643 |  16951 |
|*  4 |    INDEX UNIQUE SCAN        | T1_IDX_ID |    350K|      1 |    350K|00:00:01.05 |   10176 |    801 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."ID">650000)
   4 - access("T1"."ID"="T2"."ID")
       filter("T1"."ID">650000)

Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=1

350000 rows selected.

Elapsed: 00:00:10.00

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     377819  consistent gets
      23736  physical reads
          0  redo size
   12606669  bytes sent via SQL*Net to client
       8212  bytes received via SQL*Net from client
        701  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     350000  rows processed

Plan hash value: 3217821761

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    350K|00:00:04.67 |     377K|  23736 |
|   1 |  NESTED LOOPS                |           |      1 |        |    350K|00:00:04.67 |     377K|  23736 |
|   2 |   NESTED LOOPS               |           |      1 |    350K|    350K|00:00:02.54 |   27819 |  17752 |
|*  3 |    TABLE ACCESS FULL         | T2        |      1 |    350K|    350K|00:00:01.02 |   17643 |  16951 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |    350K|      1 |    350K|00:00:01.01 |   10176 |    801 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1        |    350K|      1 |    350K|00:00:01.47 |     350K|   5984 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."ID">650000)
   4 - access("T1"."ID"="T2"."ID")
       filter("T1"."ID">650000)

On the last post I already gave some information about the controlling parameter _nlj_batching_enabled so I change only this parameter to do my test and “keep OFE 11.2.0.1″

I ran these queries 4*50 times

1- _nlj_batching_enabled=1 (11G behaviour), flush the buffer cache twice and flush the cursor before first run, and run the statement 50 times.
2- _nlj_batching_enabled=1 (11G behaviour), flush the cursor at first run, flush the buffer cache twice before “each run” and run the statement 50 times to force phsical read all the time
3- _nlj_batching_enabled=0 (pre 11G behaviour), flush the buffer cache twice and flush the cursor before first run, and run the statement 50 times.
4- _nlj_batching_enabled=0 (pre 11G behaviour), flush the cursor at first run, flush the buffer cache twice before “each run” and run the statement 50 times to force phsical read all the time

After each run I snap the statistics from v$sql for the related sql and then do the comparison afterwards

Here are the results -( AVG1 includes first run for the step 1 and 3 which does both phsical and buffer cache read first run is not included on other calculations)

New behavior was faster in my Small dataset test.
I only consider median and for buffer cache reads it 10 milisecond less for CPU and 25 miliseconds less for Total ELA.
In Percentage we can say ~%5 for CPU and %6 for Total ELA
When it comes to phsical reads again 10 milisecond less for CPU and 11 miliseconds less for Total ELA
In percentage improvement is ~%4 for CPU and ~%1.5 for Total ELA

I was initially expecting bigger dataset will be again faster because we will do more phsical read but
This time new behaviour is slower for big dataset
For buffer cache reads 20 miliseconds slower for CPU and ~3 miliseconds slower for total ELA
In percentage degredation is ~%1.5 for CPU and ~%0.06 for Total ELA
For phsical reads 40 miliseconds slower for CPU and 7.5 miliseconds slower for total ELA
In percentage degredation is ~%2.5 for CPU and ~%0.14 for Total ELA

When I interpret these results I have the gut feeling that this is not like what is written in the documentation.
It looks like response time wise behavior is nearly same. From this output one  can say either CBO engineers wanted to make it harder to read the plans or maybe they wanted to be more honest to show what was always there.
Better optimizer group write a blog post about it rather than me :)

There is a bug note which also supports the idea that its been there since 9i

Bug 9919556: OFN DB: CUSTOMER INTERFACE PROGRAM TAKING LONGER TIME AFTER 11G R2 UPGRADE

There are also some other bugs related with dumps and wrong results reported for this behavior which is inconsistent with the idea that its been there for a long time.
Bugs disappeared when behavior is disabled and revert back to original so it loooks like has some effect somewhere but not necesarrily on performance as it can be interpreted??

Now its time for bonus material, probably careful eyes already spotted that the plan with _nlj_batching_enabled=0 is different than plan would be generated on 10.2.04 and also
there is a problem with runtime A-time values on 3th step of the plans when _nlj_batching_enabled=0 in 11.2.0.1. They are basically reported way higher than actual elapsed time.

First let me show you how actual pre 11G plan and reported A-Time values are

(to get this plan I flush buffer cache twice,flush the sql from shared pool, set ofe to 10.2.0.4 instead of _nlj_batching_enabled=0 on OFE=11.2.0.1)

Small Dataset


HR@ORACOS> @test

49999 rows selected.

Elapsed: 00:00:01.44

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      52395  consistent gets
       1842  physical reads
          0  redo size
    1781536  bytes sent via SQL*Net to client
       1613  bytes received via SQL*Net from client
        101  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      49999  rows processed

Plan hash value: 225503291

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |  49999 |00:00:00.74 |   52395 |   1842 |
|   1 |  NESTED LOOPS                |           |      1 |  50143 |  49999 |00:00:00.74 |   52395 |   1842 |
|   2 |   TABLE ACCESS FULL          | T3        |      1 |  49999 |  49999 |00:00:00.05 |     949 |    849 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1        |  49999 |      1 |  49999 |00:00:00.44 |   51446 |    993 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |  49999 |      1 |  49999 |00:00:00.15 |    1447 |    129 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."ID"="T3"."ID")

Big Dataset


HR@ORACOS> @test

350000 rows selected.

Elapsed: 00:00:10.25

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
     377819  consistent gets
      23736  physical reads
          0  redo size
   12606669  bytes sent via SQL*Net to client
       8213  bytes received via SQL*Net from client
        701  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     350000  rows processed

Plan hash value: 1708821971

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |    350K|00:00:04.45 |     377K|  23736 |
|   1 |  NESTED LOOPS                |           |      1 |    350K|    350K|00:00:04.45 |     377K|  23736 |
|*  2 |   TABLE ACCESS FULL          | T2        |      1 |    350K|    350K|00:00:01.03 |   17643 |  16951 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T1        |    350K|      1 |    350K|00:00:02.81 |     360K|   6785 |
|*  4 |    INDEX UNIQUE SCAN         | T1_IDX_ID |    350K|      1 |    350K|00:00:01.02 |   10176 |    801 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("T2"."ID">650000)
   4 - access("T1"."ID"="T2"."ID")
       filter("T1"."ID">650000)

As you can see timings are now more accurate and Nested Loop step is now changed its location in the plan.
I don’t know why A-time is reported wrong but thanks to the methods I described in earlier methods I can say, What is causing the plan change difference like below


HR@ORACOS> alter session set "_nlj_batching_enabled"=0;  ===> to revert back pre 11G behavior

Session altered.

HR@ORACOS>
HR@ORACOS> @build_optimizer_bugfix_test_e                ====> to find out which bugfix may have caused
Enter value for optimizer_feature_version: 11.1.0.6      ====> I alread found out it changed after 11.1.0.6
HR@ORACOS> @optimizer_bugfix_test_e

PLAN_HASH_VALUE          BUGFIX
-------------------- ----------
225503291               5585313
692324935               6439032
                        4279274
                        4370840
                        4386734
                        4507997
			....
			....
			....

39 rows selected.

HR@ORACOS> @bugfixg
Enter value for bugno: 5585313

     BUGNO      VALUE SQL_FEATURE               DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE
---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- -
   5585313          1 QKSFM_EXECUTION_5585313   enable index prefetching for NLJ with unique scan as inner acces 11.1.0.6

Since I know what is causing the plan changes for nested loop now its time for me to go ahead and test if OFE=10.2.0.4 is better for the tests I run above.

Here are the comparison between OFE 11.2.0.1 and OFE 10.2.0.4 (instead of setting _nlj_batching_enabled=0 while OFE is still 11.2.0.1 )

This time 10.2.0.4 results were better for both small and big dataset
for small dataset
For buffer cache reads, 10 miliseconds slower for CPU and ~3 miliseconds slower for total ELA
In percentage degredation is ~%7 for CPU and ~%0.06 for Total ELA
For phsical reads 10 miliseconds slower for CPU and ~19 miliseconds slower for total ELA
In percentage degredation is ~%7 for CPU and ~%8 for Total ELA

for big dataset
For buffer cache reads 70 miliseconds slower for CPU and ~65 miliseconds slower for total ELA
In percentage degredation is ~%5 for CPU and ~%1.3 for Total ELA
For phsical reads 80 miliseconds slower for CPU and 120 miliseconds slower for total ELA
In percentage degredation is ~%4.8 for CPU and ~%2.2 for Total ELA

As a last word I can’t say my tests are best way to measure, however results are not looking consistent with what is written in the official documentation.
I think this is a new feature that will effect your plans after you upgrade your DB to 11G, but it will not necessarily make any perceivable difference to your response times.
In my humble opinion as long as the cost of your queries are same with the new implementation in place,
you can ignore the existence of it and never bother to change the parameter unless support says so.

Please feel free to comment if you have anything to say on this one and if you spot anything wrong in my analysis.

 

Source Code;
Test script
Full Excell Sheet for time comparisons

<–Previous Post in This Series —————- Next Post in this series–>

February 15, 2011

Plan Stability Through Upgrade-Why is my plan changed?-bugfixes-2

Filed under: Bugs, CBO, Performance, Plan Stability — coskan @ 12:01 am

This post is a bonus post where things can get complicated with my approach for finding bugfixes , if you are already upgraded.

On a database where we upgraded from 10.2.0.4 to 11.2.0.1 , We had a sql reported to have changed its response time on 11.2.0.1 .(At this point I need to say I’m sorry that I can’t share the SQL and PLANS because of data privacy. Tried to edit the sql and plans but query is massive . )

Interestingly when I run the optimizer_features_test for the versions between 10.2.0.4 to 11.2.0.1
only change was extra nested loop. Cost is same and plan generated for 10.2.0.4 was not same as pre-upgrade version

Runtime statistics .


SQL> @sqlid b290wxr1j2a2b

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
b290wxr1j2a2b 3272681547 SELECT   /*+gather_plan_statistics*//*test*/q.rowkey, i.NAME instrument,                                  NVL ((q.enddate - to_date('01Oct10',

1 row selected.

 CH# PARENT_HANDLE    OBJECT_HANDLE     PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS      SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
   0 00000002BCC6C048 00000002BCC6A0A0   83708128     2022          1          1          1          2            121 1255771.09 1323564.07 1323.56407   62856727      12621          1               0
   1 00000002BCC6C048 00000002BCC2A3C8 1845562615      889          1          1          1          2            121   1847.719   2327.507   2.327507       2889        306          1               0
   2 00000002BCC6C048 00000002BCC24D78   83708128     2022          2          1          2          4            242 2310945.68 2318713.43 1159.35672  125714199        103          2               0
   3 00000002BCC6C048 00000002BCBCEFD0 3790468002     2022          2          1          2          4            242 2349948.75 2361862.35 1180.93118  125714497        424          2               0
   4 00000002BCC6C048 00000002BDDAF630  427381853     2022          2          1          2          4            242 2308517.05 2319381.53 1159.69077  125713318         26          2               0
   5 00000002BCC6C048 00000002DA577EB8  427381853     2022          2          1          2          4            242 2294214.23 2300615.11 1150.30755  125713315          0          2               0

6 rows selected.
Child-Version Match is like below

Child	OFE
=================
0-1	11.2.0.1
2	11.1.0.7
3	11.1.0.6
4	10.2.0.5
5	10.2.0.4

Child number 1 on 11.2.0.1 runs with cardinality feedback and uses right plan for OR with extra nested loop make up and drops the response time from 1323 seconds to 2 seconds
Other

So what is changed and how can I catch it if changing OFE is not working ? Its pretty obvious that we have a stats problem but this query involves 10 different table
and its not always easy to gather stats on tables where all other queries are running fine on. I needed to find a solution.

When I check the query it has an OR and the new plans basically do not generate a plan for OR. I modified the query to use union all instead of OR I can get the same pre-upgrade plan which is interesting
because stats are still same so how come Oracle generates right plan when I use union all and wrong plan when I use OR ? There must be something causing this behaviour.

Since Changing OFE did not change anything in terms of response time we may look into the situation where 8.0 bugfix suddenly appeared on 11.2.0.1 but were not active on 10.2.0.4

How can we spot the bugfix ? You can use the bugfix builder that generates script to test bugfixes for all version below the given version

Script is build_optimizer_bugfix_test_below_e: Same thing but creates a script for all the bugfixes for the versions below the given version

Source code of the script

undefine optimizer_feature_version
set heading off
set feedback off
define optimizer_feature=&quot;&amp;optimizer_feature_version&quot;
set term off
spool optimizer_bugfix_test_e.sql
select 'set feedback off' from dual;
select 'set heading off' from dual;
select 'set term off' from dual;
select 'set echo off' from dual;
select 'spool optimizer_bugfix_test_results_e.log' from dual;
select 'drop table  bugfix_table purge;' from dual;
select 'create  table bugfix_table (bugfix number,plan_hash_value varchar2(20));' from dual;
select 'alter session set optimizer_features_enable=''&amp;optimizer_feature'';' from dual;
select   'alter session set &quot;_fix_control&quot;='''||bugno||decode(value,1,':OFF''',':ON''')||'; '
||chr(10)||'@test_e.sql'||chr(10)||
'insert into bugfix_table values ('||bugno||',(select substr(plan_table_output,18,12) from (select * from table(dbms_xplan.display)) where plan_table_output  like ''Plan hash value:%''));'||chr(10)||
'alter session set &quot;_fix_control&quot;='''||bugno||decode(value,1,':ON''',':OFF''')||'; '
from v$system_fix_control
where
to_number(replace(optimizer_feature_enable,'.',''))&lt;to_number(replace('&amp;optimizer_feature','.',''))
and value in (1,0) order by bugno;
select 'set term on' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'break on plan_hash_value' from dual;
select 'select plan_hash_value,bugfix from bugfix_table order by 1;' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'spool off' from dual;
set feedback on
set heading on
set term on
spool off

When I run it


SQL&gt; @build_optimizer_bugfix_test_below_e
Enter value for optimizer_feature_version: 10.2.0.4
SQL&gt; @optimizer_bugfix_test_e

PLAN_HASH_VALUE          BUGFIX
-------------------- ----------
1186576438              5129233
                        3120429
427381853               1403283
                        2320291
                        2324795
                        2492766
                        2660592
                        2663857
			.......
			.......
76 rows selected.

Two bugfixes gives us the plan we are looking for

SQL&gt; @bugfixg
Enter value for bugno: 5129233

     BUGNO      VALUE SQL_FEATURE               DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- ---------- ----------
   5129233          1 QKSFM_CBO_5129233         peek at any foldable exprssion during pruning                    10.2.0.3                           0          1

1 row selected.

SQL&gt; @bugfixg
Enter value for bugno: 3120429

     BUGNO      VALUE SQL_FEATURE               DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- ---------- ----------
   3120429          1 QKSFM_ACCESS_PATH_3120429 account for join key sparsity in computing NL index access cost  10.1.0.3                           0          1

1 row selected.

how can we sure which one is our problem ? Both of them enables us to run with the right plan
At this point only option is to go and check on a 10.2.0.4 database to see what these fixes were


SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> @bugfixg
Enter value for bugno: 5129233

     BUGNO      VALUE DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ---------------------------------------------------------------- ------------------------- ---------- ----------
   5129233          1 peek at any foldable exprssion during pruning                    10.2.0.3                           0          1

SQL> @bugfixg
Enter value for bugno: 3120429

no rows selected

As you can see 5129233 is again enabled on 10.2.0.4 but bugfix 3120429 does not exist and not enabled/disabled on a database running on 10.2.0.4 binaries.

Even both of them are solving the issue when we are running on 11.2.0.1, in my opinion controlling bugfix is 3120429.
Thanks to the reasonable developer who accepted to change the code using union all instead of or we did not need to enable disable a bugfix.

Since we find two bugs lets test them (I add comments to force new parse instead of flushing the sql for fresh runtime stats so sqlids are different )

When bugfix 3120429 is disabled

SQL> @sqlid 3798wumm6mwsq

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
3798wumm6mwsq 3865703190 SELECT   /*+gather_plan_statistics*//*test_2*/q.rowkey, i.NAME instrument,                                  NVL ((q.enddate - to_date('01Oct10',

1 row selected.

 CH# PARENT_HANDLE    OBJECT_HANDLE     PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS      SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
   0 00000002DC1DABC8 00000002DC968950 4098949197      531          2          1          2          2              0     1305.8     1332.1     .66605        207          3          2               0
   1 00000002DC1DABC8 00000002DC6182A0 4098949197      531          2          1          2          2              0   1328.799   1336.494    .668247        207          0          2               0
   2 00000002DC1DABC8 00000002DC76E5E8 3700866780      451          2          1          2          2              0   1353.794   1369.258    .684629        214          0          2               0
   3 00000002DC1DABC8 00000002DCCAAFF8 2856793267      451          2          1          2          2              0   1244.811   1251.411   .6257055        214          0          2               0
   4 00000002DC1DABC8 00000002A5F40D20 1186576438      481          2          1          2          2              0    898.864    906.962    .453481        190          0          2               0

When bugfix 5129233 is disabled


SQL> @sqlid bh6a9xtq27hba

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
bh6a9xtq27hba 1814282602 SELECT   /*+gather_plan_statistics*//*test_3*/q.rowkey, i.NAME instrument,                                  NVL ((q.enddate - to_date('01Oct10',

1 row selected.

 CH# PARENT_HANDLE    OBJECT_HANDLE     PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS      SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
   0 00000002DC6F9150 00000002DC955228  986578726      519          2          1          2          2              0    1177.82   1183.933   .5919665        183          0          2               0
   1 00000002DC6F9150 00000002A5F400F0  986578726      519          2          1          2          2              0   1261.808   1268.259   .6341295        183          0          2               0
   2 00000002DC6F9150 00000002DCBB2DE0 3700866780      445          2          1          2          2              0   1506.771   1533.008    .766504        214          0          2               0
   3 00000002DC6F9150 00000002DCBB29C0 2856793267      445          2          1          2          2              0   1305.801   1317.385   .6586925        214          0          2               0
   4 00000002DC6F9150 00000002DC81E000 1186576438      474          2          1          2          2              0    940.858    945.413   .4727065        190          0          2               0

On 11.2.0.1 and 11.1.0.7 plan is changed depending on the bugfix but for 10.2.0.4, 10.2.0.5 and 11.1.0.6 plan is constant for changing different bugfixes.
You can also see previous cardinality feedback plan is also slower then bugfix disabled version.

We can dig further but its not the point of this post and Since I managed to take the response time back to where it was from 1323 seconds to 0.5 seconds I can stop here.

As you can see we still spot a difference after upgrade even time travelling between releases by altering OFE did not help us at first point.

This post was to explain complicated issue and things you can face with my approach.

I also find it very usefull to have v$system_fix_control table snapped before you upgrade/patch and compared afterwards, so you won’t get shocked with newly added previos versions related bugs.

Scripts Used;

build_optimizer_bugfix_test_below_e

<-Previos Post in Series ——————————-  Next Post in Series–>

February 14, 2011

Plan Stability Through Upgrade-Why is my plan changed?-bugfixes-1

Filed under: Bugs, CBO, Performance, Plan Stability — coskan @ 12:00 pm

Before I start the actual post let me clarify something, these post series will be about using practical ways like I did on this post to understand
what is the cause of the post upgrade performance changes. I am trying to be more practical than scientific because of the nature of the production support DBA job.
When you are a production DBA, you deal with many issues/databases at the same time and time spent for a solution is very important for you to be able to move to the second problem.
Normally if you are a scientist, oracle enthusiastic then you get 10053 trace try to understand the data try to find out why optimizer did something on Xth step but in reality when it comes to upgrade issues
this is really time consuming and not very practical. You really need time/experience/knowledge to understand a 10053 trace especially for massive queries which has gazillion joins in it.
You can still understand what is going on from prediction and runtime statistics of a plan but it is not the answer you can give to the developers.
If you go and explain them all the things you spot in execution plan output and 10053 trace All they will say to you is “DB is upgraded and upgrade sucks because previous release was better why are we upgrading”
so you need a bit bigger proof to make them accept the issue and workarounds. I also want to add that, my priority for these series is not focusing on content of sqls
but giving you the practical technique to stabilize their plans during upgrades.

Now back to post, You followed the steps in first post and you found out that your plan is changed, now it is time to find out why it is changed
when the only optimizer version is changed. One of the possible reasons is bugfixes in the optimizer.

a-Brief introduction to bugfixes.
Information below is taken from MOS Note – Init.ora Parameter “_FIX_CONTROL” [Hidden] Reference Note [ID 567171.1];

Support and Historical Notes for "_FIX_CONTROL"

  <strong>_FIX_CONTROL</strong> is a special hidden parameter introduced in 10.2.0.2 which can be used to enable or disable certain bug fixes.
  This is typically used in the optimizer area to allow specific fixes to be turned on or off as some fixes can have a significant
  effect on execution plans. Often the default setting (ON or OFF) for a fix is determined by the
  value of the OPTIMIZER_FEATURES_ENABLE parameter.  A full list of bug numbers which are  controlled by fix control can be found in the V$ views:
    - GV$SYSTEM_FIX_CONTROL
    - GV$SESSION_FIX_CONTROL

  The general format for this parameter is the form: _FIX_CONTROL='bug_number:ON|OFF'

  eg:  alter session set "_fix_control"='4969880:ON';
      or in the init.ora
       _fix_control='4969880:ON'

  Multiple values can be specified:
  eg:	alter session set "_fix_control"='4969880:ON','5483301:off';

  Note that fixes that use _FIX_CONTROL in one release may not use it in other releases so any settings should be reviewed when
  upgrading / downgrading or applying a patch set. Also some fix controls have numeric settings rather than ON or OFF.

In summary in every release of Oracle Database (( including patches) ) there are bugfixes and since 10.2.0.2 their availability can be controlled by
this “undocumented” parameter. Some of these bugfixes are on by default some of them are not depending on the optimizer version+oracle binary(check extra test case at the bottom). you are running.
Bugfixes gives us fine grained control. If the optimizer feature you are running causing problems on some queries because of a bugfix then you can still stay on
the version you want but disable the fixes, having said that you still need aproval from ORacle Support if it is safe or not to enable disable bug.

Here are the number of bugfixes by releases taken from 11.2.0.2 base release

SQL&gt; select optimizer_feature_enable,count(*) from v$system_fix_control group by optimizer_feature_enable order by 1 asc;

OPTIMIZER_FEATURE_ENABLE    COUNT(*)
------------------------- ----------
10.1.0                             3
10.1.0.3                           1
10.1.0.5                           2
10.2.0.1                           7
10.2.0.2                          12
10.2.0.3                          11
10.2.0.4                          73
10.2.0.5                         111
11.1.0.6                          39
11.1.0.7                          19
11.2.0.1                          65
11.2.0.2                         105
8.0.0                             52
8.1.6                              1
8.1.7                              2
9.2.0                              8
9.2.0.8                            3
37

And this is from 10.2.0.4 PSU 2

SQL&gt; select optimizer_feature_enable,count(*) from v$system_fix_control group by optimizer_feature_enable order by 1 asc;

OPTIMIZER_FEATURE_ENABLE    COUNT(*)
------------------------- ----------
10.1.0                             1
10.1.0.5                           1
10.2.0.1                           4
10.2.0.2                          13
10.2.0.3                          10
10.2.0.4                          68
9.2.0                              2
9.2.0.8                            7
9

There are couple of things you can spot here. Despite being higher version Oracle sometimes waits to put a bugfix of lower versions.
There are bugfixes from 10.1.0.3 and 8.0.0 and 8.1.6 8.1.7 which are on 11.2.0.2 relase but not on 10.2.0.4 release which is interesting.
In general do not think that bugfixes are added gradually. Oracle can put/make switchable a bugfix from a very old release to a brand new release. There is one more thing to bear in mind not %100 of bugfixes works with value 1 or 0. There are couple of them with values over 1 probably related with thresholds. In this post I will not pay attention to them.

 

After this brief information now it is time for the actual reason of this post.
As you recall, I shared an optimizer_features_test builder on the first post and now with the same logic I have 2 more builder scripts
to build your bugfix test for versions. After you identify under which OFE your plan behaviour changed, all you need to do is to pass
that version level to the scripts below and let them create test cases by enabling/disabling the bugfixes spesific to that version.

build_optimizer_bugfix_test.sql: This is for runtime bugfix test_case generator. All it does is create optimizer_bugfix_test.sql by using v$system_fix_control_table.
It gets each bugfix from that table for given OFE version and enable/disable them depending on their default value then run the test.sql and disable/enable the bugfix back again.
So we have a controlled environment where we test 1 bugfix at a time. It captures every plan_hash_value into a table and select them once all bugfix numbers tested.
If you are lucky enough or in other words if the problem is caused by a bugfix Oracle will revert the plan back to the original once you are changing that bugfixes value.
There is a tiny little problem with this approach, you cannot run 600 second query for 103 bugfixes. This is mostly for very quick queries.
The code of the sql is like below


undefine optimizer_feature_version
set heading off
set feedback off
define optimizer_feature="&optimizer_feature_version"
set term off
spool optimizer_bugfix_test.sql
select 'set feedback off' from dual;
select 'set heading off' from dual;
select 'set term off' from dual;
select 'set echo off' from dual;
select 'spool optimizer_bugfix_test_results.log' from dual;
select 'drop table  bugfix_table purge;' from dual;
select 'create  table bugfix_gtt (bugfix number,plan_hash_value varchar2(20));' from dual;
select 'alter session set optimizer_features_enable=''&optimizer_feature'';' from dual;
select   'alter session set "_fix_control"='''||bugno||decode(value,1,':OFF''',':ON''')||'; '
||chr(10)||'@test.sql'||chr(10)||
'insert into bugfix_table values ('||bugno||',(select substr(plan_table_output,18,12) from (select * from table(dbms_xplan.display_cursor)) where plan_table_output  like ''Plan hash value:%''));'||chr(10)||
'alter session set "_fix_control"='''||bugno||decode(value,1,':ON''',':OFF''')||'; '
from v$system_fix_control
where optimizer_feature_enable='&optimizer_feature' and value in (1,0) order by bugno;
select 'set term on' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'break on plan_hash_value' from dual;
select 'select plan_hash_value,bugfix from bugfix_table order by 1;' from dual;
select 'spool off' from dual;
spool off
set feedback on
set heading on
set term on

If you are sure your plan change behaviour also taking place when you explain the plan, better you run the builder below.

build_optimizer_bugfix_test_e.sql: This is same idea like above only difference is, this time it generates optimizer_features_test_e.sql which explains plans by runnint test_e.sql for every bugfix.
Instead of dbms_xplan.display_cursor I use dbms_xplan.display. Output of the optimizer_features_test_e.sql is same as optimizer_features_test.sql

source code of the builder

undefine optimizer_feature_version
set heading off
set feedback off
define optimizer_feature="&optimizer_feature_version"
set term off
spool optimizer_bugfix_test_e.sql
select 'set feedback off' from dual;
select 'set heading off' from dual;
select 'set term off' from dual;
select 'set echo off' from dual;
select 'spool optimizer_bugfix_test_results_e.log' from dual;
select 'drop table  bugfix_table purge;' from dual;
select 'create  table bugfix_gtt (bugfix number,plan_hash_value varchar2(20));' from dual;
select 'alter session set optimizer_features_enable=''&optimizer_feature'';' from dual;
select   'alter session set "_fix_control"='''||bugno||decode(value,1,':OFF''',':ON''')||'; '
||chr(10)||'@test_e.sql'||chr(10)||
'insert into bugfix_table values ('||bugno||',(select substr(plan_table_output,18,12) from (select * from table(dbms_xplan.display)) where plan_table_output  like ''Plan hash value:%''));'||chr(10)||
'alter session set "_fix_control"='''||bugno||decode(value,1,':ON''',':OFF''')||'; '
from v$system_fix_control
where optimizer_feature_enable='&optimizer_feature' and value in (1,0) order by bugno;
select 'set term on' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'break on plan_hash_value' from dual;
select 'select plan_hash_value,bugfix from bugfix_table order by 1;' from dual;
select 'set feedback on' from dual;
select 'set heading on' from dual;
select 'spool off' from dual;
set feedback on
set heading on
set term on
spool off

Both of the queries drop and create the table that they insert plan and bugfix numbers. You may want to implement it different way.

Let me show you how they work in two different examples for two of the queries.

This example is for a sql which started to run slow after upgrade from 11.2.0.1 to 11.2.0.2.

first check if the plan is changed


SQL> @sqlhist 1m7wy721qytx3

   SNAP_ID BEGIN_INTERVAL_TIME            SQL_ID          CHILD_NO PLAN_HASH_VALUE
---------- ------------------------------ ------------- ---------- ---------------
     25077 19-JAN-11 04.00.56.483         1m7wy721qytx3          0      3561051786
     25078 19-JAN-11 04.31.00.422         1m7wy721qytx3          0      3561051786
     25269 23-JAN-11 04.00.47.349         1m7wy721qytx3          0      2266344041 =>upgrade
     25317 24-JAN-11 04.00.53.276         1m7wy721qytx3          0      2266344041
     25365 25-JAN-11 04.00.43.525         1m7wy721qytx3          0      2266344041
     25366 25-JAN-11 04.30.47.456         1m7wy721qytx3          0      2266344041
     25413 26-JAN-11 04.00.22.604         1m7wy721qytx3          0      2266344041
     25414 26-JAN-11 04.30.25.833         1m7wy721qytx3          0      2266344041
     25461 27-JAN-11 04.00.18.101         1m7wy721qytx3          0      2266344041
     25462 27-JAN-11 04.30.24.777         1m7wy721qytx3          0      2266344041
     25509 28-JAN-11 04.00.05.631         1m7wy721qytx3          0      2266344041
     25510 28-JAN-11 04.30.10.558         1m7wy721qytx3          0      2266344041
     25557 29-JAN-11 04.00.41.775         1m7wy721qytx3          0      2266344041
     25558 29-JAN-11 04.30.46.453         1m7wy721qytx3          0      2266344041

Plan hash value: 3561051786

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                  |                      |       |       |  1082K(100)|          |       |       |
|   1 |  MERGE                           | TABLE_XXXXXXXXXXX    |       |       |            |          |       |       |
|   2 |   VIEW                           |                      |       |       |            |          |       |       |
|   3 |    NESTED LOOPS OUTER            |                      |     1 |   286 |  1082K  (1)| 01:44:15 |       |       |
|   4 |     VIEW                         |                      |     1 |    78 |  1082K  (1)| 01:44:15 |       |       |
|   5 |      SORT GROUP BY               |                      |     1 |    75 |  1082K  (1)| 01:44:15 |       |       |
|   6 |       FILTER                     |                      |       |       |            |          |       |       |
|   7 |        NESTED LOOPS              |                      |   237 | 17775 |  1082K  (1)| 01:44:15 |       |       |
|   8 |         HASH JOIN                |                      |   237 | 15168 |  1081K  (1)| 01:44:12 |       |       |
|   9 |          INDEX RANGE SCAN        | INDEX_TABLE_3        |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|  10 |          PARTITION RANGE ITERATOR|                      |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|  11 |           INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX    |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|  12 |         INDEX RANGE SCAN         | INDEX_TABLE_2        |     1 |    11 |     2   (0)| 00:00:01 |       |       |
|  13 |     TABLE ACCESS BY INDEX ROWID  | TABLE_XXXXXXXXXXX    |     1 |   208 |     2   (0)| 00:00:01 |       |       |
|  14 |      INDEX UNIQUE SCAN           | PK_TABLE_XXXXXXXXXXX |     1 |       |     1   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------------------

Plan hash value: 2266344041

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT                  |                      |       |       |  1081K(100)|          |       |       |
|   1 |  MERGE                           | TABLE_XXXXXXXXXXX    |       |       |            |          |       |       |
|   2 |   VIEW                           |                      |       |       |            |          |       |       |
|   3 |    NESTED LOOPS OUTER            |                      |     1 |   282 |  1081K  (1)| 01:44:11 |       |       |
|   4 |     VIEW                         |                      |     1 |    78 |  1081K  (1)| 01:44:11 |       |       |
|   5 |      SORT GROUP BY               |                      |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   6 |       FILTER                     |                      |       |       |            |          |       |       |
|   7 |        NESTED LOOPS              |                      |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   8 |         NESTED LOOPS             |                      |     1 |    64 |  1081K  (1)| 01:44:11 |       |       |
|   9 |          INDEX RANGE SCAN        | INDEX_TABLE_3        |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|  10 |          PARTITION RANGE ITERATOR|                      |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|  11 |           INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX    |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|  12 |         INDEX RANGE SCAN         | INDEX_TABLE_2        |     1 |    11 |     2   (0)| 00:00:01 |       |       |
|  13 |     TABLE ACCESS BY INDEX ROWID  | TABLE_XXXXXXXXXXX    |     1 |   204 |     2   (0)| 00:00:01 |       |       |
|  14 |      INDEX UNIQUE SCAN           | PK_TABLE_XXXXXXXXXXX |     1 |       |     1   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------------------

Check the 8th step on the plans which makes the diffrence. Oracle suddenly started to choose nested loop instead of hash and estimated rows also drop down to 1.

Lets build the test case

SQL> @build_bind_vars_h
Enter SQL ID ==> 1m7wy721qytx3
Enter Plan Hash Value ==> 2266344041

variable B1 NUMBER

begin

:B1 := 14544;

end;

/

MERGE INTO MXS.TABLE_XXXXXXXXXXX TP USING ( select /* test 1m7wy721qytx3 */ /*+gather_plan_statistics*/TABLE_1.ID,
TABLE_1.SOURCEID, NVL (MAX (CLOSETIMESTAMP), MAX (STARTTIMESTAMP)) LASTTIMESTAMP, MAX (CLOSE) KEEP (DENSE_RANK LAST ORDER BY
STARTTIMESTAMP) AS LAST, MAX (CLOSESPREAD) KEEP (DENSE_RANK LAST ORDER BY STARTTIMESTAMP) AS LASTSPREAD, MAX (CLOSEUPFRONT) KEEP (DENSE_RANK
LAST ORDER BY STARTTIMESTAMP) AS LASTUPFRONT FROM USR1.TABLE_1, USR1.TABLE_2, USR1.TABLE_3 WHERE TABLE_1.ID = TABLE_2.TABLE_2.D
AND TABLE_2.MARKETID =TABLE_3.MARKETID AND TABLE_2.MARKETID IN :B1 AND TABLE_1.SOURCEID =TABLE_3.SOURCEID AND DURATION =
NUMTODSINTERVAL (1, 'DAY') AND STARTTIMESTAMP < TRUNC (SYSDATE, 'YEAR') - 1 AND STARTTIMESTAMP > TRUNC (SYSDATE, 'YEAR') - 7 GROUP BY
TABLE_1.ID, TABLE_1.SOURCEID) TI ON (TP.TABLE_2.D = TI.TABLE_2.D AND TP.SOURCEID = TI.SOURCEID) WHEN MATCHED THEN UPDATE
SET TP.LASTYEARCLOSE = TI.LAST, TP.LASTYEARCLOSETIMESTAMP = TI.LASTTIMESTAMP, TP.LASTYEARCLOSESPREAD = TI.LASTSPREAD,
TP.LASTYEARCLOSEUPFRONT = TI.LASTUPFRONT WHEN NOT MATCHED THEN INSERT (TP.TABLE_2.D, TP.SOURCEID, TP.LASTYEARCLOSE,
TP.LASTYEARCLOSETIMESTAMP, TP.LASTYEARCLOSESPREAD, TP.LASTYEARCLOSEUPFRONT) VALUES (TI.TABLE_2.D, TI.SOURCEID, TI.LAST, TI.LASTTIMESTAMP,
TI.LASTSPREAD, TI.LASTUPFRONT);

Since this is a merge statement I needed to remove the merge part.

test.sql is like below

variable B1 NUMBER

begin

:B1 := 14544;

end;

/
select /* test 1m7wy721qytx3 */ /*+gather_plan_statistics*/TABLE_1.ID,
TABLE_1.SOURCEID, NVL (MAX (CLOSETIMESTAMP), MAX (STARTTIMESTAMP)) LASTTIMESTAMP, MAX (CLOSE) KEEP (DENSE_RANK LAST ORDER BY
STARTTIMESTAMP) AS LAST, MAX (CLOSESPREAD) KEEP (DENSE_RANK LAST ORDER BY STARTTIMESTAMP) AS LASTSPREAD, MAX (CLOSEUPFRONT) KEEP (DENSE_RANK
LAST ORDER BY STARTTIMESTAMP) AS LASTUPFRONT FROM USR1.TABLE_1, USR1.TABLE_2, USR1.TABLE_3 WHERE TABLE_1.ID = TABLE_2.TABLE_2.D
AND TABLE_2.MARKETID =TABLE_3.MARKETID AND TABLE_2.MARKETID IN :B1 AND TABLE_1.SOURCEID =TABLE_3.SOURCEID AND DURATION =
NUMTODSINTERVAL (1, 'DAY') AND STARTTIMESTAMP < TRUNC (SYSDATE, 'YEAR') - 1 AND STARTTIMESTAMP > TRUNC (SYSDATE, 'YEAR') - 7 GROUP BY
TABLE_1.ID, TABLE_1.SOURCEID

test_e.sql is like below


explain plan for select /* test 1m7wy721qytx3 */ /*+gather_plan_statistics*/TABLE_1.ID,
TABLE_1.SOURCEID, NVL (MAX (CLOSETIMESTAMP), MAX (STARTTIMESTAMP)) LASTTIMESTAMP, MAX (CLOSE) KEEP (DENSE_RANK LAST ORDER BY
STARTTIMESTAMP) AS LAST, MAX (CLOSESPREAD) KEEP (DENSE_RANK LAST ORDER BY STARTTIMESTAMP) AS LASTSPREAD, MAX (CLOSEUPFRONT) KEEP (DENSE_RANK
LAST ORDER BY STARTTIMESTAMP) AS LASTUPFRONT FROM USR1.TABLE_1, USR1.TABLE_2, USR1.TABLE_3 WHERE TABLE_1.ID = TABLE_2.TABLE_2.D
AND TABLE_2.MARKETID =TABLE_3.MARKETID AND TABLE_2.MARKETID IN 14544 AND TABLE_1.SOURCEID =TABLE_3.SOURCEID AND DURATION =
NUMTODSINTERVAL (1, 'DAY') AND STARTTIMESTAMP < TRUNC (SYSDATE, 'YEAR') - 1 AND STARTTIMESTAMP > TRUNC (SYSDATE, 'YEAR') - 7 GROUP BY
TABLE_1.ID, TABLE_1.SOURCEID;

build the optimizer test to see if its OFE related

SQL> @build_optimizer_features_test
Enter value for base_optimizer_version: 11.2.0.1

run the test for different OFE settings

SQL> @optimizer_features_test

SQL> alter session set optimizer_features_enable='11.2.0.1';

Session altered.

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  c5y47k1zmrb22, child number 0
-------------------------------------
-------------------------------------

Plan hash value: 2561977778

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |      1 |        |    320K|00:21:18.17 |      25M|     16M|       |       |          |
|   1 |  SORT GROUP BY               |                   |      1 |      1 |    320K|00:21:18.17 |      25M|     16M|    73M|  6303K|   65M (0)|
|*  2 |   FILTER                     |                   |      1 |        |   1579K|01:37:13.89 |      25M|     16M|       |       |          |
|   3 |    NESTED LOOPS              |                   |      1 |      1 |   1579K|01:37:11.10 |      25M|     16M|       |       |          |
|   4 |     NESTED LOOPS             |                   |      1 |      1 |   5446K|00:15:20.03 |      16M|     16M|       |       |          |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|   6 |      PARTITION RANGE ITERATOR|                   |      1 |      1 |   5446K|00:15:10.67 |      16M|     16M|       |       |          |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |      1 |      1 |   5446K|00:15:01.04 |      16M|     16M|       |       |          |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |   5446K|      1 |   1579K|00:05:19.24 |    9518K|  34458 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   5 - access("MARKET"."MARKETID"=:B1)
   7 - filter(("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7
              AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND "TABLE_2.NTERVAL"."TABLE_2.D">0 AND "TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID"))
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=:B1)

##########Second Run Cardinality feedback kicked still same plan with different cardinality on the last step

Plan hash value: 2561977778

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |      1 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   1 |  SORT GROUP BY               |                   |      1 |      1 |      0 |00:00:00.01 |       0 |      0 |    73M|  6303K|   65M (0)|
|*  2 |   FILTER                     |                   |      1 |        |   1444K|00:22:15.31 |      24M|     15M|       |       |          |
|   3 |    NESTED LOOPS              |                   |      1 |      1 |   1444K|00:22:12.87 |      24M|     15M|       |       |          |
|   4 |     NESTED LOOPS             |                   |      1 |      1 |   4894K|00:10:27.91 |      15M|     15M|       |       |          |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|   6 |      PARTITION RANGE ITERATOR|                   |      1 |      1 |   4894K|00:10:19.78 |      15M|     15M|       |       |          |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |      1 |      1 |   4894K|00:10:11.63 |      15M|     15M|       |       |          |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |   4894K|      1 |   1444K|00:00:53.00 |    8678K|   2859 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   5 - access("MARKET"."MARKETID"=:B1)
   7 - filter(("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7
              AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND "TABLE_2.NTERVAL"."TABLE_2.D">0 AND "TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID"))
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=:B1)

Note
-----
   - cardinality feedback used for this statement

SQL> alter session set optimizer_features_enable='11.2.0.1';

Session altered.

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  c5y47k1zmrb22, child number 2
-------------------------------------

Plan hash value: 2179327633

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |      1 |        |    320K|00:21:24.30 |      25M|     16M|       |       |          |
|   1 |  SORT GROUP BY               |                   |      1 |      1 |    320K|00:21:24.30 |      25M|     16M|    73M|  6303K|   65M (0)|
|*  2 |   FILTER                     |                   |      1 |        |   1579K|00:20:19.75 |      25M|     16M|       |       |          |
|   3 |    NESTED LOOPS              |                   |      1 |    234 |   1579K|00:20:17.00 |      25M|     16M|       |       |          |
|*  4 |     HASH JOIN                |                   |      1 |    234 |   5446K|00:09:47.69 |      16M|     16M|  1452K|  1452K|  372K (0)|
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|   6 |      PARTITION RANGE ITERATOR|                   |      1 |   1948 |     20M|00:21:13.17 |      16M|     16M|       |       |          |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |      1 |   1948 |     20M|00:20:39.18 |      16M|     16M|       |       |          |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |   5446K|      1 |   1579K|00:00:51.19 |    9518K|   3216 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   4 - access("TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   5 - access("MARKET"."MARKETID"=:B1)
   7 - filter(("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7
              AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND "TABLE_2.NTERVAL"."TABLE_2.D">0))
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=:B1)

##########Second Run Cardinality feedback kicked still same plan with different cardinality on the last step

Plan hash value: 2179327633

------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |      1 |        |    320K|00:17:51.32 |      25M|     16M|       |       |          |
|   1 |  SORT GROUP BY               |                   |      1 |      1 |    320K|00:17:51.32 |      25M|     16M|    73M|  6303K|   65M (0)|
|*  2 |   FILTER                     |                   |      1 |        |   1579K|00:19:42.92 |      25M|     16M|       |       |          |
|   3 |    NESTED LOOPS              |                   |      1 |    234 |   1579K|00:19:40.24 |      25M|     16M|       |       |          |
|*  4 |     HASH JOIN                |                   |      1 |    234 |   5446K|00:10:22.24 |      16M|     16M|  1452K|  1452K|  419K (0)|
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |
|   6 |      PARTITION RANGE ITERATOR|                   |      1 |   1948 |     20M|00:21:33.34 |      16M|     16M|       |       |          |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |      1 |   1948 |     20M|00:20:59.31 |      16M|     16M|       |       |          |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |   5446K|      1 |   1579K|00:00:54.86 |    9518K|   3161 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   4 - access("TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   5 - access("MARKET"."MARKETID"=:B1)
   7 - filter(("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7
              AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND "TABLE_2.NTERVAL"."TABLE_2.D">0))
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=:B1)

Note
-----
   - cardinality feedback used for this statement

As you see cardinality feedback kicked two times with same plan generated with different cardinalities.
When you check the new 11.2.0.2 plan, when it uses nested loop cardinality suddenly drop to 1 which is a bit suspicios.

Runtime sql statistics

SQL> @sqlid c5y47k1zmrb22

 CH# PARENT_HANDLE    OBJECT_HANDLE     PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS      SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
   0 000000075E5B37D0 00000009C5E1FEC8 2561977778  1081582          1          1          1        642         320375 633798.646 2034600.72 2034.60072   25789778   16298873          1               0
   1 000000075E5B37D0 00000009022773E0 2561977778  1081582          1          1          1        643         320375 607056.710 1424804.27 1424.80427   25136188   16034118          1               0
   2 000000075E5B37D0 0000000731957330 2179327633  1082314          1          1          1        642         320375 516728.448 1284256.36 1284.25636   25789785   16266663          1               0
   3 000000075E5B37D0 00000008735DB1A8 2179327633  1082314          1          1          1        642         320375 512717.052 1315325.55 1315.32555   25789675   16266877          1               0

Clearly 11.2.0.1 has got better response times. Because the runtime is too long better try our chance if the behaviour is same for explain plan with literals as well

Build the explain plan test case

SQL> @build_optimizer_features_test_e
Enter value for base_optimizer_version: 11.2.0.1

Lets run the test

SQL> @optimizer_features_test_e
SQL> alter session set optimizer_features_enable='11.2.0.2';

Session altered.

SQL> set echo off

Explained.

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2561977778

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   1 |  SORT GROUP BY               |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|*  2 |   FILTER                     |                   |       |       |            |          |       |       |
|   3 |    NESTED LOOPS              |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   4 |     NESTED LOOPS             |                   |     1 |    64 |  1081K  (1)| 01:44:11 |       |       |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|   6 |      PARTITION RANGE ITERATOR|                   |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |     1 |    11 |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   5 - access("MARKET"."MARKETID"=14544)
   7 - filter("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND
              "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7 AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND
              "TABLE_2.NTERVAL"."TABLE_2.D">0 AND "TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=14544)

25 rows selected.

SQL> alter session set optimizer_features_enable='11.2.0.1';

Session altered.

SQL> set echo off

Explained.

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2179327633

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    75 |  1082K  (1)| 01:44:15 |       |       |
|   1 |  SORT GROUP BY               |                   |     1 |    75 |  1082K  (1)| 01:44:15 |       |       |
|*  2 |   FILTER                     |                   |       |       |            |          |       |       |
|   3 |    NESTED LOOPS              |                   |   234 | 17550 |  1082K  (1)| 01:44:15 |       |       |
|*  4 |     HASH JOIN                |                   |   234 | 14976 |  1081K  (1)| 01:44:12 |       |       |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|   6 |      PARTITION RANGE ITERATOR|                   |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |     1 |    11 |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   4 - access("TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   5 - access("MARKET"."MARKETID"=14544)
   7 - filter("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND
              "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7 AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND
              "TABLE_2.NTERVAL"."TABLE_2.D">0)
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=14544)

26 rows selected.

Plans behaviour is same so there is no need for me to run it for every bugfix better I use explain plan testing

Its time to build the bugfix test

SQL> @build_optimizer_bugfix_test_e
Enter value for optimizer_feature_version: 11.2.0.2

Time to rock

SQL> @optimizer_bugfix_test_e		=>output is truncated and there is break on plan_hash_value

PLAN_HASH_VALUE          BUGFIX
-------------------- ----------
2179327633              7641601
2561977778              4926618
                        5982893
                        6086930
                        6236862
                        6408301
                        6472966
			.......
			.......
			.......

103 rows selected.

As you see bugfix 7641601 is the guilty guy for us and we found this info under 1 minute. Once we disable it it reverts back to the 11.2.0.1 plan

Lets see what this bug is. Bugfixg is a script where I check bugfix definition from v$system_fix_control

SQL> @bugfixg
Enter value for bugno: 7641601

     BUGNO      VALUE SQL_FEATURE               DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- ---------- ----------
   7641601          1 QKSFM_CARDINALITY_7641601 treat a single value column as having a frequency histogram      11.2.0.2                           0          1

This one looks like a new very interesting bugfix which needs more research from Jonathan Lewis, I don’t feel that much comfortable to understand how it works :)

One more thing to prove how enabling disabling the bugfix works

SQL> alter session set "_fix_control"='7641601:OFF';

Session altered.

SQL> @test_e

Explained.

SQL> @e

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2179327633

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    75 |  1082K  (1)| 01:44:15 |       |       |
|   1 |  SORT GROUP BY               |                   |     1 |    75 |  1082K  (1)| 01:44:15 |       |       |
|*  2 |   FILTER                     |                   |       |       |            |          |       |       |
|   3 |    NESTED LOOPS              |                   |   234 | 17550 |  1082K  (1)| 01:44:15 |       |       |
|*  4 |     HASH JOIN                |                   |   234 | 14976 |  1081K  (1)| 01:44:12 |       |       |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|   6 |      PARTITION RANGE ITERATOR|                   |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |  1948 |   104K|  1081K  (1)| 01:44:12 |   KEY |   KEY |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |     1 |    11 |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   4 - access("TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   5 - access("MARKET"."MARKETID"=14544)
   7 - filter("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND
              "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7 AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND
              "TABLE_2.NTERVAL"."TABLE_2.D">0)
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=14544)

26 rows selected.

SQL> alter session set "_fix_control"='7641601:ON';

Session altered.

SQL> @test_e

Explained.

SQL> @e

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2561977778

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   1 |  SORT GROUP BY               |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|*  2 |   FILTER                     |                   |       |       |            |          |       |       |
|   3 |    NESTED LOOPS              |                   |     1 |    75 |  1081K  (1)| 01:44:11 |       |       |
|   4 |     NESTED LOOPS             |                   |     1 |    64 |  1081K  (1)| 01:44:11 |       |       |
|*  5 |      INDEX RANGE SCAN        | INDEX_TABLE_3     |     1 |     9 |     2   (0)| 00:00:01 |       |       |
|   6 |      PARTITION RANGE ITERATOR|                   |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|*  7 |       INDEX FAST FULL SCAN   | PK_TABLE_1_XXXXXX |     1 |    55 |  1081K  (1)| 01:44:11 |   KEY |   KEY |
|*  8 |     INDEX RANGE SCAN         | INDEX_TABLE_2     |     1 |    11 |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TRUNC(SYSDATE@!,'fmyear')-1>TRUNC(SYSDATE@!,'fmyear')-7)
   5 - access("MARKET"."MARKETID"=14544)
   7 - filter("DURATION"=INTERVAL'+000000001 00:00:00.000000000' DAY(9) TO SECOND(9) AND
              "STARTTIMESTAMP">TRUNC(SYSDATE@!,'fmyear')-7 AND "STARTTIMESTAMP"<TRUNC(SYSDATE@!,'fmyear')-1 AND
              "TABLE_2.NTERVAL"."TABLE_2.D">0 AND "TABLE_2.NTERVAL"."SOURCEID"="MARKET"."SOURCEID")
   8 - access("TABLE_2.NTERVAL"."TABLE_2.D"="TABLE_2.."TABLE_2.D" AND "TABLE_2.."MARKETID"=14544)

25 rows selected.

I think research for this particular sql is now finished. We found out the problem. We have enough info to stabilize this sql. Rest is checking the effect of statistics with
dynamic sampling and also raising the issue with Oracle which I will also explain later.

I want to share another example with the bugfix I mentioned in this post.

I will skip the initial steps and paste only the output. The query normally runs under 3 seconds in worst condition and has got many binds so I would like to use
build_optimizer_bugfix_test instead of explain plan.

Different optimizer behavior

SQL> @optimizer_features_test
SQL> alter session set optimizer_features_enable='11.2.0.2';

Session altered.

SQL> set echo off
timing for: time_for_ofe_11.2.0.2
Elapsed: 00:00:03.01

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  64xn5ccrd1f9n, child number 0
-------------------------------------

Plan hash value: 2063817604

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                             |      1 |        |     50 |00:00:00.09 |    5308 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE_1_XXXXXXXXXXXX        |      1 |     12 |     50 |00:00:00.09 |    5308 |
|*  2 |   INDEX SKIP SCAN           | IDX_TABLE_1_XXXXXXXXXXXX_05 |      1 |     12 |     50 |00:00:00.45 |    5272 |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("TABLE_1_XXXXXXXXXXXX"."SOURCEID"=:N51)
       filter(("TABLE_1_XXXXXXXXXXXX"."SOURCEID"=:N51 AND
              INTERNAL_FUNCTION("TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID")))

32 rows selected.

Elapsed: 00:00:00.04
SQL> alter session set optimizer_features_enable='11.2.0.1';

Session altered.

Elapsed: 00:00:00.01
Session altered.

Elapsed: 00:00:00.01
SQL> set echo off
timing for: time_for_ofe_11.2.0.1
Elapsed: 00:00:00.11

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  64xn5ccrd1f9n, child number 1
-------------------------------------

Plan hash value: 1927123218

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |      1 |        |     50 |00:00:00.01 |     113 |
|   1 |  INLIST ITERATOR             |                         |      1 |        |     50 |00:00:00.01 |     113 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TABLE_1_XXXXXXXXXXXX    |     50 |     12 |     50 |00:00:00.01 |     113 |
|*  3 |    INDEX UNIQUE SCAN         | PK_TABLE_1_XXXXXXXXXXXX |     50 |     12 |     50 |00:00:00.01 |      78 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access((("TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N1 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N2 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N3 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N4 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N5 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N6 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N7 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N8 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N9 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N10 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N11 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N12 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N13 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N14 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N15 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N16 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N17 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N18 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N19 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N20 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N21 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N22 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N23 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N24 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N25 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N26 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N27 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N28 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N29 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N30 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N31 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N32 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N33 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N34 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N35 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N36 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N37 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N38 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N39 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N40 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N41 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N42 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N43 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N44 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N45 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N46 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N47 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N48 OR
              "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N49 OR "TABLE_1_XXXXXXXXXXXX"."TABLE_1_XXXXXXXXXXXX_ID"=:N50)) AND
              "TABLE_1_XXXXXXXXXXXX"."SOURCEID"=:N51)

SQL> @build_optimizer_bugfix_test
Enter value for optimizer_feature_version: 11.2.0.2
SQL> @optimizer_bugfix_test

PLAN_HASH_VALUE          BUGFIX
-------------------- ----------
1927123218              8893626
2063817604              4926618
                        5982893
                        6086930
                        6236862
                        6408301
                        6472966
                        6754080
                        7215982
                        7277732
                        7284269
                        7641601
			.......
			.......
			.......
			.......

103 rows selected.

In under 5 minutes I found out which bugfix is causing the issue

SQL> @bugfixg
Enter value for bugno: 8893626

     BUGNO      VALUE SQL_FEATURE               DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT
---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- ---------- ----------
   8893626          1 QKSFM_ACCESS_PATH_8893626 apply index filter selectivity during skip scan costing          11.2.0.2                           0          1

In summary it is not black magic to spot what is causing the plan behaviour change when it comes to bugfixes.
Once you find out you are ready to take control of your plans again. Once you have this valuable information you can get faster responses from Oracle Support on your issue as well.
You will also look cool to your developers when

You can ask, what if my plan is not changing only on 1 bugfix? In that case I would check for the plan I am looking for
and if it exists I still assume thats my bugfix if it does not exist I will follow the upcoming steps.

I will post one more example for the issue I explained above when very old bugfixes suddenly appear on new version.

References Used:
Init.ora Parameter “_FIX_CONTROL” [Hidden] Reference Note [ID 567171.1];

Scripts Used:
All in one zip file

build_optimizer_bugfix_test.sql
build_optimizer_bugfix_test_e.sql
bugfixg.sql

<-Previous Post in Series ——————————-   Next Post in Series–>

January 24, 2011

Analysing Temp usage on 11GR2 when Temp space is not released

Filed under: Bugs — coskan @ 2:47 pm

We had an interesting issue on our 3 node cluster about the temp usage. One night one of the critical batch operation which never used high temp suddenly hit “ORA-01652: unable to extend temp segment by string in tablespace string tips”. Once we check the system we saw some “inactive sessions” were using very high amount of temp like below which were allocating most of the temp space causing batch operations fail even with very small temp request.

—The picture of temp usage right after the issue

PROD1> @temp_usageg

#####################################################################
#######################GLOBAL TEMP USAGE#############################
#####################################################################

   INST_ID TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
---------- --------------- ----------- ----------- ----------
         3 TEMP              22528.000       7.000      22521
         2 TEMP              22528.000      56.000      22472
         1 TEMP              22528.000   19102.000       3426

#####################################################################
#######################GLOBAL TEMP USERS-over 10MB ##################
#####################################################################

   INST_ID SID_SERIAL USERNAME     OSUSER     SPID   SERVICE            MB_USED TABLESPACE
---------- ---------- ------------ ---------- ------ -------------------------- ---------------
         1 620,4322   USER_1      osusr1   21712  WEB_USER_SERVICE         16.000 TEMP
         1 41,35454   USER_1      osusr1   20334  WEB_USER_SERVICE         19.000 TEMP
         1 529,21034  USER_1      osusr1   5776   WEB_USER_SERVICE         20.000 TEMP
         1 538,54672  USER_1      osusr1   30383  WEB_USER_SERVICE         27.000 TEMP
         1 1258,36144 USER_1      osusr1   5745   WEB_USER_SERVICE         31.000 TEMP
         1 1351,56539 USER_2       root      29910  WEB_USER_SERVICE         67.000 TEMP
         1 1113,59480 USER_2       root      11397  WEB_USER_SERVICE         78.000 TEMP
         1 30,4102    USER_2       root      15244  WEB_USER_SERVICE         78.000 TEMP
         1 892,14168  USER_2       root      19971  WEB_USER_SERVICE         79.000 TEMP
         1 699,56288  USER_1      osusr1   11240  WEB_USER_SERVICE        668.000 TEMP
         1 1501,55691 USER_1      osusr1   11236  WEB_USER_SERVICE        685.000 TEMP
         1 1396,7154  USER_1      osusr1   12526  WEB_USER_SERVICE        808.000 TEMP
         1 233,46662  USER_1      osusr1   11334  WEB_USER_SERVICE        810.000 TEMP
         1 1004,14127 USER_1      osusr1   24287  WEB_USER_SERVICE        832.000 TEMP
         1 1196,820   USER_1      osusr1   12530  WEB_USER_SERVICE        868.000 TEMP
         1 1121,1990  USER_1      osusr1   24290  WEB_USER_SERVICE        893.000 TEMP
         1 297,2940   USER_1      osusr1   8910   WEB_USER_SERVICE        903.000 TEMP
         1 294,39051  USER_1      osusr1   15116  WEB_USER_SERVICE        924.000 TEMP
         1 37,46739   USER_1      osusr1   15118  WEB_USER_SERVICE        932.000 TEMP
         1 414,31439  USER_1      osusr1   8877   WEB_USER_SERVICE        935.000 TEMP
         1 913,40598  USER_1      osusr1   9710   WEB_USER_SERVICE       1064.000 TEMP
         1 710,44883  USER_1      osusr1   9703   WEB_USER_SERVICE       1115.000 TEMP
         1 147,16701  USER_1      osusr1   9295   WEB_USER_SERVICE       1137.000 TEMP
         1 1446,25613 USER_1      osusr1   12372  WEB_USER_SERVICE       1139.000 TEMP
         1 721,5152   USER_1      osusr1   12728  WEB_USER_SERVICE       1183.000 TEMP
         1 1295,31011 USER_1      osusr1   32078  WEB_USER_SERVICE       1208.000 TEMP
         1 14,60566   USER_1      osusr1   8236   WEB_USER_SERVICE       1209.000 TEMP
         1 29,64998   USER_1      osusr1   9292   WEB_USER_SERVICE       1318.000 TEMP

28 rows selected.

PROD1> check active sessions

PROD1> @swact

    SID STATE   EVENT                                     SEQ# SEC_IN_STATE         P1         P2         P3 SQL_ID         CHN
------- ------- ---------------------------------------- ----- ------------ ---------- ---------- ---------- ------------- ----
   1082 WAITING Streams AQ: waiting for messages in the  34587            1       8560 2.2692E+10          5
    390 WAITING wait for unread message on broadcast cha 52505            0 2.2570E+10 2.3036E+10          0 2b064ybzkwf1y    0
    888 WORKING On CPU / runqueue                        10058            0 1413697536          1          0 dgcp1b9y4h89a    0

PROD1> --none of the sessions were active

According to my Oracle knowledge, this should not has happened once the operation finishes (commit rollback) ,Oracle should has released the space. All of these sessions were all specific to one product group and service. Question was if this was a permanent problem which we never realized or something was going wrong ?
I checked back my old spool files for temporary checks (thanks to Tanel Poder who thought us how to spool every action- check init.sql from his script library) and saw that sessions with same username and service name were also holding big temp back in september and November like below. (These checks were probably done when we hit temp issue on node-1 or when I was checking the script so I did not pay attention if those sessions were active or not (this can be counted as invalid dba exception for me :))

From november – before 11.2.0.1 upgrade from 10.2.0.4


PROD3> @temp_usageg

#####################################################################
#######################GLOBAL TEMP USAGE#############################
#####################################################################

   INST_ID TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
---------- --------------- ----------- ----------- ----------
         1 TEMP              20480.000     119.000      20361
         2 TEMP              20480.000        .000      20480
         3 TEMP              20480.000    8029.000      12451 ==>right node

#####################################################################
#######################GLOBAL TEMP USERS-over 10MB ##################
#####################################################################

   INST_ID SID_SERIAL USERNAME   OSUSER     SPID   PROGRAM             MB_USED TABLESPACE
---------- ---------- ---------- ---------- ------ --------------------------- ------------

         3 484,26686  USER_1       osusr1   6798    WEB_USER_SERVICE        109.000 TEMP
         3 898,53562  USER_1       osusr1   7272    WEB_USER_SERVICE        127.000 TEMP
         3 986,27409  USER_1       osusr1   5644    WEB_USER_SERVICE        147.000 TEMP
         3 771,4399   USER_1       osusr1   23613   WEB_USER_SERVICE        148.000 TEMP
         3 946,18709  USER_1       osusr1   12065   WEB_USER_SERVICE        163.000 TEMP
         3 378,20452  USER_1       osusr1   5646    WEB_USER_SERVICE        163.000 TEMP
         3 1007,23777 USER_1       osusr1   23413   WEB_USER_SERVICE        200.000 TEMP
         3 391,37179  USER_1       osusr1   9245    WEB_USER_SERVICE        701.000 TEMP
	 3 768,23493  USER_1       osusr1   12829   WEB_USER_SERVICE        904.000 TEMP
         3 607,13902  USER_1       osusr1   26307   WEB_USER_SERVICE       1000.000 TEMP
         3 475,32196  USER_1       osusr1   8460    WEB_USER_SERVICE       1010.000 TEMP
         3 593,58443  USER_1       osusr1   31126   WEB_USER_SERVICE       1063.000 TEMP
         3 879,51688  USER_1       osusr1   12831   WEB_USER_SERVICE       1113.000 TEMP
         3 664,40877  USER_1       osusr1   8470    WEB_USER_SERVICE       1136.000 TEMP

18 rows selected.

From October – before 11.2.0.1 upgrade from 10.2.0.4


PROD3> @temp_usage

TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
--------------- ----------- ----------- ----------
TEMP              20480.000    7899.000      12581

#####################################################################
#######################GLOBAL TEMP USERS-over 10MB ##################
#####################################################################

SID_SERIAL USERNAME        OSUSER          SPID    PROGRAM             MB_USED TABLESPACE
---------- --------------- --------------- ------- --------------------------- -----------

463,42814  USER_1            osusr1        1447     WEB_USER_SERVICE         17.000 TEMP
1008,59864 USER_1            osusr1        16932    WEB_USER_SERVICE         17.000 TEMP
594,37070  USER_1            osusr1        6451     WEB_USER_SERVICE         20.000 TEMP
512,40923  USER_1            osusr1        16934    WEB_USER_SERVICE         25.000 TEMP
789,37294  USER_1            osusr1        9483     WEB_USER_SERVICE        731.000 TEMP
893,63174  USER_1            osusr1        6018     WEB_USER_SERVICE        815.000 TEMP
583,54040  USER_1            osusr1        23859    WEB_USER_SERVICE        933.000 TEMP
675,37736  USER_1            osusr1        9487     WEB_USER_SERVICE        984.000 TEMP
737,58648  USER_1            osusr1        25274    WEB_USER_SERVICE       1035.000 TEMP
831,46843  USER_1            osusr1        25272    WEB_USER_SERVICE       1062.000 TEMP
973,64549  USER_1            osusr1        10270    WEB_USER_SERVICE       1109.000 TEMP
774,18708  USER_1            osusr1        6625     WEB_USER_SERVICE       1133.000 TEMP

As you can see USER_1 from WEB_USER_SERVICE on the 3th (preferred) node was always using high temp.

Next question is why we did not have this issue before on Node-1? Once we check in detail we saw services of this product group were running on the wrong node (relocated due to a failure just before change freeze) which is our main OLTP and batch node. We already knew the issue (thanks to our service check script) but it was change freeze time and we did not have chance to move service to the right nodes. When I check with the product developer, I have been told they usually reboot their web servers every 3 weeks for maintenance. This was the obvious answer to the question why we did not have the issue before. We never run this service on the OLTP node and we never let web servers hit the limits even when they are running on their own node.

Since we gave answer to our curiosity questions, now it was time for the real question “Why was Oracle not releasing the temp space back ?” On 11GR2 we have a nice column called temp_space_allocated in V$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY tables and I am lucky that we have diagnostics pack license.

Let see what the trend is and see if some sqls are the main cause. The output below is taken when the issue happened.


SQL> select sample_time,event,sql_id,is_sqlid_current,temp_space_allocated/1024/1024 TEMP_MB
 1 from v$active_session_history where session_id=29;

SAMPLE_TIME             EVENT                     SQL_ID        I TEMP_MB
----------------------- ------------------------- ------------- - -----
04-JAN-11 10.18.09.591                            akgmrn2bww2p5 Y  1319
04-JAN-11 09.50.31.125  db file sequential read   73adfqhtsd909 Y  1319
04-JAN-11 09.46.29.488                            0gaghgkkb4qyc Y  1318
04-JAN-11 08.58.14.930  db file sequential read   73adfqhtsd909 Y  1318
04-JAN-11 08.13.09.845                            0gaghgkkb4qyc Y  1318
04-JAN-11 07.46.01.586  db file sequential read   ds7zkf3vhtms0 Y  1318
04-JAN-11 07.33.05.592  row cache lock            4tcbsv32h1xh9 Y  1317
04-JAN-11 06.49.49.772  db file sequential read   9s48k84m7kt36 Y  1317
04-JAN-11 06.49.48.772  db file sequential read   734fasjm6rfu8 Y  1316
04-JAN-11 06.49.47.762  db file sequential read   734fasjm6rfu8 Y  1316
04-JAN-11 06.49.45.762  db file sequential read   9s48k84m7kt36 Y  1315
04-JAN-11 06.49.44.762  log file sync                           Y  1314
04-JAN-11 06.18.09.894                            0gaghgkkb4qyc Y  1314
04-JAN-11 06.03.21.508                            akgmrn2bww2p5 Y  1314
04-JAN-11 04.18.20.232                            11chc5v8rkusd Y  1313
04-JAN-11 02.28.14.858                            73adfqhtsd909 Y  1312
04-JAN-11 02.08.13.726                            54hw5asg19hpt Y  1312
04-JAN-11 01.00.25.381                            akgmrn2bww2p5 Y  1312
04-JAN-11 00.58.15.042  db file sequential read   9s48k84m7kt36 Y  1312
04-JAN-11 00.00.02.040  direct path read temp     9tgtnsr31xsv5 Y  1312
03-JAN-11 23.52.33.856                            ds7zkf3vhtms0 Y  1301
03-JAN-11 22.33.14.607                            g7sy7a6ynn9r1 Y  1295
03-JAN-11 22.30.07.101  direct path read          9tgtnsr31xsv5 Y  1295

Because we did not sort the issue yet I also have the current picture of our system with a better script which accepts session sid and serial and uses lag function for the temp_space_allocated differences between sample times. (output is truncated)


PROD3> set echo on
PROD3> @temp_hist 603 8375
DBA_HIST_ACTIVE_SESS_HISTORY

PROD3>  select sample_time,session_id,session_serial#,sql_id,temp_space_allocated/1024/1024 temp_mb,
  2   temp_space_allocated/1024/1024-lag(temp_space_allocated/1024/1024,1,0) over (order by sample_time) as temp_diff
  3   from dba_hist_active_sess_history
  4   --from v$active_session_history
  5   where
  6   session_id=&1
  7   and session_serial#=&2
  8   order by sample_time asc
  9   /

SAMPLE_TIME               SESSION_ID SESSION_SERIAL# SQL_ID          TEMP_MB  TEMP_DIFF
------------------------- ---------- --------------- ------------- --------- ----------
22-JAN-11 13.00.51.214           603            8375                       1          1
22-JAN-11 15.14.29.430           603            8375 29kz43xq2s5xc         1          0
22-JAN-11 15.14.39.450           603            8375 734fasjm6rfu8         3          2
22-JAN-11 15.14.49.470           603            8375 734fasjm6rfu8         8          5
22-JAN-11 15.14.59.490           603            8375 734fasjm6rfu8        13          5
22-JAN-11 15.15.09.520           603            8375 9s48k84m7kt36        18          5
22-JAN-11 15.15.19.540           603            8375 734fasjm6rfu8        23          5
22-JAN-11 15.15.29.567           603            8375 9s48k84m7kt36        29          6
22-JAN-11 16.35.50.566           603            8375 bx5zxp1a5wb55        29          0
22-JAN-11 19.45.45.831           603            8375 73adfqhtsd909        32          3
22-JAN-11 21.40.40.613           603            8375 g7sy7a6ynn9r1        32          0
23-JAN-11 06.30.35.544           603            8375 ds7zkf3vhtms0        33          1
23-JAN-11 15.18.37.539           603            8375 9d618ka6h2hvz        35          2
23-JAN-11 15.18.47.583           603            8375 2du821ps20a4c        35          0
23-JAN-11 15.18.57.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.19.07.623           603            8375                      39          4
23-JAN-11 15.19.17.643           603            8375 734fasjm6rfu8        44          5
23-JAN-11 15.19.27.663           603            8375 ds7zkf3vhtms0        49          5
23-JAN-11 15.19.37.683           603            8375 734fasjm6rfu8        55          6
23-JAN-11 15.19.47.713           603            8375 9s48k84m7kt36        61          6
23-JAN-11 15.19.57.733           603            8375 7902sa7n37sv9        62          1
23-JAN-11 22.00.39.071           603            8375 basj9jchbmu4q        63          1
24-JAN-11 05.00.45.895           603            8375 apmkt8bvrpyk0        70          7
24-JAN-11 13.00.11.180           603            8375 9tgtnsr31xsv5        86         16

24 rows selected.

ACTIVE_SESS_HIST

PROD3>  select sample_time,session_id,session_serial#,sql_id,temp_space_allocated/1024/1024 temp_mb,
  2   temp_space_allocated/1024/1024-lag(temp_space_allocated/1024/1024,1,0) over (order by sample_time) as temp_diff
  3   --from dba_hist_active_sess_history
  4   from v$active_session_history
  5   where
  6   session_id=&1
  7   and session_serial#=&2
  8   order by sample_time asc
  9   /

SAMPLE_TIME               SESSION_ID SESSION_SERIAL# SQL_ID          TEMP_MB  TEMP_DIFF
------------------------- ---------- --------------- ------------- --------- ----------
23-JAN-11 05.30.39.251           603            8375                      33         33
23-JAN-11 06.00.35.429           603            8375 73adfqhtsd909        33          0
23-JAN-11 06.10.45.871           603            8375 g7sy7a6ynn9r1        33          0
23-JAN-11 06.30.35.544           603            8375 ds7zkf3vhtms0        33          0
23-JAN-11 06.40.50.971           603            8375                      33          0
23-JAN-11 07.00.40.613           603            8375 g7sy7a6ynn9r1        33          0
23-JAN-11 07.01.21.713           603            8375 ds7zkf3vhtms0        33          0
23-JAN-11 07.10.45.941           603            8375 1rsg7wva9pyxd        33          0
23-JAN-11 08.00.45.805           603            8375 73adfqhtsd909        33          0
23-JAN-11 08.15.35.931           603            8375 ds7zkf3vhtms0        33          0
23-JAN-11 09.15.39.009           603            8375                      34          1
23-JAN-11 14.15.50.763           603            8375 g7sy7a6ynn9r1        34          0
23-JAN-11 14.20.35.388           603            8375 g7sy7a6ynn9r1        34          0
23-JAN-11 14.45.45.800           603            8375 4awcvxgvdauy6        35          1
23-JAN-11 15.15.39.123           603            8375 6ubavp8u2k7xa        35          0
23-JAN-11 15.18.36.539           603            8375 d042b4z3hnra4        35          0
23-JAN-11 15.18.37.539           603            8375 9d618ka6h2hvz        35          0
23-JAN-11 15.18.38.539           603            8375 4vwy3jp3wmmz5        35          0
23-JAN-11 15.18.39.539           603            8375 4vwy3jp3wmmz5        35          0
23-JAN-11 15.18.40.549           603            8375 5jf1jgqnp993z        35          0
23-JAN-11 15.18.41.549           603            8375 5jf1jgqnp993z        35          0
23-JAN-11 15.18.42.549           603            8375 37fxhfs1p44p6        35          0
23-JAN-11 15.18.43.549           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.44.559           603            8375 2du821ps20a4c        35          0
23-JAN-11 15.18.45.573           603            8375 5jf1jgqnp993z        35          0
23-JAN-11 15.18.46.573           603            8375 5jf1jgqnp993z        35          0
23-JAN-11 15.18.47.583           603            8375 2du821ps20a4c        35          0
23-JAN-11 15.18.48.583           603            8375 67kpwgzpd4qam        35          0
23-JAN-11 15.18.49.583           603            8375 5jf1jgqnp993z        35          0
23-JAN-11 15.18.50.583           603            8375 7htn6x6s336f7        35          0
23-JAN-11 15.18.51.583           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.52.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.53.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.54.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.55.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.56.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.57.593           603            8375 29kz43xq2s5xc        35          0
23-JAN-11 15.18.58.603           603            8375 2du821ps20a4c        35          0
23-JAN-11 15.18.59.603           603            8375 9s48k84m7kt36        35          0
23-JAN-11 15.19.00.603           603            8375 9s48k84m7kt36        36          1

.....
.....
.....
129 rows selected.

PROD3>

Bear in mind relying on DBA_HIST data, for detecting the sqls which caused big jumps, will not be that realistic because of the sample frequency so it is better to focus on v$active_session_history.

When we look at the v$active_session_history it like there is no obvious reason. There are some sqlids which certainly increase the temp usage but not big jumps.I did one more check before I raise the question to Oracle support and checked what type of segments these TEMP segments are ? To do this I tweaked the temp_usage.sql and add segtype column to the scripts.


PROD1> @temp_usageg

#####################################################################
#######################GLOBAL TEMP USAGE#############################
#####################################################################

   INST_ID TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
---------- --------------- ----------- ----------- ----------
         3 TEMP              22528.000       7.000      22521
         2 TEMP              22528.000      56.000      22472
         1 TEMP              22528.000   19102.000       3426

#####################################################################
#######################GLOBAL TEMP USERS-over 10MB###################
#####################################################################

   INST_ID SID_SERIAL USERNAME        OSUSER          SPID   SERVICE		   SEGTYPE	MB_USED TABLESPACE
---------- ---------- --------------- --------------- ------ ---------------------------------- ---------------
         1 620,4322   USER_1         osusr1        21712  WEB_USER_SERVICE       LOB_DATA     16.000 TEMP
         1 41,35454   USER_1         osusr1        20334  WEB_USER_SERVICE       LOB_DATA     19.000 TEMP
         1 529,21034  USER_1         osusr1        5776   WEB_USER_SERVICE       LOB_DATA     20.000 TEMP
         1 538,54672  USER_1         osusr1        30383  WEB_USER_SERVICE       LOB_DATA     27.000 TEMP
         1 1258,36144 USER_1         osusr1        5745   WEB_USER_SERVICE       LOB_DATA     31.000 TEMP
         1 1351,56539 USER_2          root           29910  WEB_USER_SERVICE       LOB_DATA     67.000 TEMP
         1 1113,59480 USER_2          root           11397  WEB_USER_SERVICE       LOB_DATA     78.000 TEMP
         1 30,4102    USER_2          root           15244  WEB_USER_SERVICE       LOB_DATA     78.000 TEMP
         1 892,14168  USER_2          root           19971  WEB_USER_SERVICE       LOB_DATA     79.000 TEMP
         1 699,56288  USER_1         osusr1        11240  WEB_USER_SERVICE       LOB_DATA    668.000 TEMP
         1 1501,55691 USER_1         osusr1        11236  WEB_USER_SERVICE       LOB_DATA    685.000 TEMP
         1 1396,7154  USER_1         osusr1        12526  WEB_USER_SERVICE       LOB_DATA    808.000 TEMP
         1 233,46662  USER_1         osusr1        11334  WEB_USER_SERVICE       LOB_DATA    810.000 TEMP
         1 1004,14127 USER_1         osusr1        24287  WEB_USER_SERVICE       LOB_DATA    832.000 TEMP
         1 1196,820   USER_1         osusr1        12530  WEB_USER_SERVICE       LOB_DATA    868.000 TEMP
         1 1121,1990  USER_1         osusr1        24290  WEB_USER_SERVICE       LOB_DATA    893.000 TEMP
         1 297,2940   USER_1         osusr1        8910   WEB_USER_SERVICE       LOB_DATA    903.000 TEMP
         1 294,39051  USER_1         osusr1        15116  WEB_USER_SERVICE       LOB_DATA    924.000 TEMP
         1 37,46739   USER_1         osusr1        15118  WEB_USER_SERVICE       LOB_DATA    932.000 TEMP
         1 414,31439  USER_1         osusr1        8877   WEB_USER_SERVICE       LOB_DATA    935.000 TEMP
         1 913,40598  USER_1         osusr1        9710   WEB_USER_SERVICE       LOB_DATA   1064.000 TEMP
         1 710,44883  USER_1         osusr1        9703   WEB_USER_SERVICE       LOB_DATA   1115.000 TEMP
         1 147,16701  USER_1         osusr1        9295   WEB_USER_SERVICE       LOB_DATA   1137.000 TEMP
         1 1446,25613 USER_1         osusr1        12372  WEB_USER_SERVICE       LOB_DATA   1139.000 TEMP
         1 721,5152   USER_1         osusr1        12728  WEB_USER_SERVICE       LOB_DATA   1183.000 TEMP
         1 1295,31011 USER_1         osusr1        32078  WEB_USER_SERVICE       LOB_DATA   1208.000 TEMP
         1 14,60566   USER_1         osusr1        8236   WEB_USER_SERVICE       LOB_DATA   1209.000 TEMP
         1 29,64998   USER_1         osusr1        9292   WEB_USER_SERVICE       LOB_DATA   1318.000 TEMP

28 rows selected.

My initial thought was temp tablespaces but I was wrong, it looks like all temp allocators which do not release the space are LOB segments. I also checked the sqls in the ash trend and it reveals sql_id’s which has got lob operations (some of them are packages). Double checked with developer and got the answer that they use dbms_lob.

Now it is time to look for bug, First hit on metalink “How to Release the Temp LOB Space and Avoid Hitting ORA-1652 [ID 802897.1]“. It says issue is for releases 8.1.5.0 to 11.1.0.7 but we are on 11.2.0.1. This document says there are 2 workarounds for versions above 10.2.0.4, first workaround is basically use
dbms_lob.freetemporary(a) but still drop the connection which is not a workaround at all.

Second solution is using the event 60025 to force session to freed space. The event information is like below

Cause
Temp LOB segments used for temporary LOBs are deleted only on session exit which may lead to large amounts of memory being held across multiple sessions.

Action
Setting this event will cause temporary LOB segments to be freed when there are no active temporary LOBs in the session. Setting this event will have a significant performance impact as it can cause temporary lob segments to be allocated and deleted many times during a session rather than once per session. Use this event only when temporary LOB segment memory use is an issue.

When I try the event it works (not immediately but when you call the allocation again) like below. I raised a call with Oracle before we implement the event change (probably with logon trigger) and still working with them to see if there is a one-off patch for the issue. If Oracle could not come up with one-off patch we will use the event.


---Before the session runs

14:23:00 SYS@ORACOS> @temp_usage

#####################################################################
#######################LOCAL TEMP USAGE#############################
#####################################################################

TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
--------------- ----------- ----------- ----------
TEMP               2697.000        .000       2697

#####################################################################
#######################LOCAL TEMP USERS#############################
#####################################################################

no rows selected

---Session runs the dbms_lob

HR@ORACOS> @temp_test
14:23:10 HR@ORACOS>
14:23:10 HR@ORACOS> DECLARE
14:23:10   2  a clob;
14:23:10   3  BEGIN
14:23:10   4  dbms_lob.createtemporary(a, TRUE,dbms_lob.call);
14:23:10   5  END;
14:23:10   6  /

PL/SQL procedure successfully completed.

14:23:10 HR@ORACOS>
14:23:10 HR@ORACOS> commit;

Commit complete.

14:23:10 HR@ORACOS>
14:23:10 HR@ORACOS> ---check the temp usage on another session
14:23:10 HR@ORACOS> exec dbms_lock.sleep(15);

---monitoring session

14:23:01 SYS@ORACOS> @temp_usage

#####################################################################
#######################LOCAL TEMP USAGE#############################
#####################################################################

TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
--------------- ----------- ----------- ----------
TEMP               2697.000       1.000       2696

#####################################################################
#######################LOCAL TEMP USERS#############################
#####################################################################

SID_SERIAL USERNAME          OSUSER          SPID  SERVICE_NAME        MB_USED TABLESPACE      STATEMENTS
---------- ----------------- --------------- --------------------- ----------- --------------- ----------
200,7236   HR                coskan          22101 SYS$USERS             1.000 TEMP                     1

---session activates the event

14:23:25 HR@ORACOS> ---enable the event
14:23:25 HR@ORACOS> alter session set events '60025 trace name context forever';

Session altered.

14:23:25 HR@ORACOS>
14:23:25 HR@ORACOS> ---check the temp usage on another session
14:23:25 HR@ORACOS> exec dbms_lock.sleep(15);

---monitoring session

14:23:15 SYS@ORACOS> @temp_usage

#####################################################################
#######################LOCAL TEMP USAGE#############################
#####################################################################

TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
--------------- ----------- ----------- ----------
TEMP               2697.000       1.000       2696

#####################################################################
#######################LOCAL TEMP USERS#############################
#####################################################################

SID_SERIAL USERNAME          OSUSER          SPID   SERVICE_NAME        MB_USED TABLESPACE      STATEMENTS
---------- ----------------- --------------- ------ --------------- ----------- --------------- ----------
200,7236   HR                coskan          22101  SYS$USERS             1.000 TEMP                     1

---session activates the event and run the dbms_lob again

14:23:40 HR@ORACOS> ---re run the procedure
14:23:40 HR@ORACOS> DECLARE
14:23:40   2  a clob;
14:23:40   3  BEGIN
14:23:40   4  dbms_lob.createtemporary(a, TRUE,dbms_lob.call);
14:23:40   5  END;
14:23:40   6  /

PL/SQL procedure successfully completed.

14:23:40 HR@ORACOS>
14:23:40 HR@ORACOS> commit;

Commit complete.

14:23:40 HR@ORACOS>
14:23:40 HR@ORACOS> ---check the temp usage on another session
14:23:40 HR@ORACOS> exec dbms_lock.sleep(15);

--monitoring session
14:23:29 SYS@ORACOS> @temp_usage

#####################################################################
#######################LOCAL TEMP USAGE#############################
#####################################################################

TABLESPACE         MB_TOTAL     MB_USED    MB_FREE
--------------- ----------- ----------- ----------
TEMP               2697.000        .000       2697

#####################################################################
#######################LOCAL TEMP USERS#############################
#####################################################################

no rows selected

no rows selected

14:23:44 SYS@ORACOS>

In summary With this post I walk through how you can diagnose a temp issue and how new temp_allocated column of v$session and history tables can be very helpful during your diagnose.
Hope it will help some of you guys.

Scripts I used for this post

temp_usageg.sql for global temp usage info in RAC
temp_usage.sql for global temp usage info
temp_history.sqlto get historic information for a spesific sid,serial

Information about bug for sqlid column of V$TEMPSEG_USAGE / v$sort_usage by Kerry Osborne

Update 08/10/2012

Related info on another blog is also here for usage of WMSYS.WM_CONCAT and temp allocation

 

http://asanga-pradeep.blogspot.co.uk/2012/10/use-of-wmconcat-can-exhaust-temporary.html

January 17, 2011

11.2.0.2 performance stories

Filed under: Bugs, Performance — coskan @ 12:52 pm

As you might remember from the previous post this Saturday we upgraded one of our databases from 11.2.0.1 to 11.2.0.2. The performance of this DB was not always stable but it was running fine before the upgrade.

As usual we came on Monday and start post upgrade firefighting. server seems to use more CPU then it was using. When I checked the queries which were on CPU, they were nearly all changed their plans (I have awr option to check the plan changes) and interestingly Oracle suddenly decided not to use unique primary keys but other indexes with skip scan option.

What could be done to understand the problem ?

1- build a test case
2- run the test case with 11.2.0.1 and 11.2.0.2 optimizer_features_enable
3- If the behaviour is different then find the bottom of the problem by digging which optimizer_fix could have caused the problem ?

Building a test case is very easy thanks to Kerry Osborne and his amazingly useful (I have no idea how many times it made my day) build_bind_vars script. Normally building a test case for a sql which has 100 bind variables is a nightmare but with that script its just a second.

Once I build the test case I run the sql with different optimizer_features_enable settings and found out that sql goes back to its previous good plan when we run it with 11.2.0.1 optimizer settings

Since we found out what caused the problem in general, we can either move forward to 3th step to find bottom of the problem or change optimizer_features_enable to 11.2.0.1 system wide. We chose the first option to see if it is our main problem and it gave us stability across the system.

Was this acceptable workaround ? Not for me or for business. We upgraded the database and now why not using the 11.2.0.2 features. We needed to find out what fix actually caused.

To find out the actual problem we have two brilliant view called v$system_fix_control and v$session_fix_control.

Looks like we have 105 options to check

SQL> select optimizer_feature_enable,count(*) from v$system_fix_control group by optimizer_feature_enable order by 1 asc;

OPTIMIZER_FEATURE_ENABLE    COUNT(*)
------------------------- ----------
10.1.0                             3
10.1.0.3                           1
10.1.0.5                           2
10.2.0.1                           7
10.2.0.2                          12
10.2.0.3                          11
10.2.0.4                          73
10.2.0.5                         111
11.1.0.6                          39
11.1.0.7                          19
11.2.0.1                          65
11.2.0.2                         105
8.0.0                             52
8.1.6                              1
8.1.7                              2
9.2.0                              8
9.2.0.8                            3
                                  37

Lets see what they are

SQL>  select * from v$system_fix_control where optimizer_feature_enable='11.2.0.2';

     BUGNO      VALUE SQL_FEATURE                        DESCRIPTION                                                      OPTIMIZER_FEATURE
---------- ---------- ---------------------------------- ---------------------------------------------------------------- -----------------
   8602840          1 QKSFM_CBO_8602840                  kkeRangeJoinSel should avoid using collapsed min/max column stat 11.2.0.2
   8725296          1 QKSFM_OR_EXPAND_8725296            try non-driver predicate only if advanced check requested        11.2.0.2
   8628970          1 QKSFM_CBO_8628970                  group  predicates like col op const  of 1 NDV col                11.2.0.2
   6754080          1 QKSFM_ACCESS_PATH_6754080          enable expression replacement thru views                         11.2.0.2
   8767442          1 QKSFM_DYNAMIC_SAMPLING_8767442     compute NDV for all columns in a multi-column join key using DS  11.2.0.2
   8760135          1 QKSFM_CBO_8760135                  allow compile-time peeking of CURRENT_DATE / CURRENT_TIMESTAMP   11.2.0.2
   8644935          1 QKSFM_PQ_8644935                   new parallelization of concat                                    11.2.0.2
   8352378          1 QKSFM_CBO_8352378                  Allow virtual columns in OR expansion index access               11.2.0.2
   8685327          1 QKSFM_ACCESS_PATH_8685327          Heuristic to enable full index scan used if table & index analyz 11.2.0.2
   8763472          1 QKSFM_PARTITION_8763472            enable parallel nested loop PWJ for ref-part tables              11.2.0.2
   8773324          1 QKSFM_CBO_8773324                  better recognition of index only plans with virtual columns      11.2.0.2
   8813674          1 QKSFM_TRANSFORMATION_8813674       relax restriction for non-nullness check                         11.2.0.2
   8629716          1 QKSFM_CBO_8629716                  enable early replacement for virtual columns                     11.2.0.2
   7277732          1 QKSFM_CBO_7277732                  allow skip scan costing for NL with non-join predicate           11.2.0.2
   8692170          1 QKSFM_TRANSFORMATION_8692170       do not apply JF if it prevents JPPD                              11.2.0.2
   8900973          1 QKSFM_TRANSFORMATION_8900973       relax restriction on inequality join for group-by placement (GBP 11.2.0.2
   8919133          1 QKSFM_CBO_8919133                  correct nested OR costing bug                                    11.2.0.2
   8551880          1 QKSFM_CBO_8551880                  Rewrite decode predicate to join                                 11.2.0.2
   8901237          1 QKSFM_TRANSFORMATION_8901237       correct search of state-space for group-by placement (GBP)       11.2.0.2
   6236862          1 QKSFM_FIRST_ROWS_6236862           account for partition-extended names in first rows optimization  11.2.0.2
   8528517          1 QKSFM_TRANSFORMATION_8528517       early query edit checks use weak typecheck expression comparison 11.2.0.2
   7215982          1 QKSFM_UNNEST_7215982               unnest subquery embedded inside an expression                    11.2.0.2
   8214022          1 QKSFM_UNNEST_8214022               perform additional CBQT phase for subquery unnesting             11.2.0.2
   8595392          1 QKSFM_CBO_8595392                  reject join predicate pushdown if parallel access path selected  11.2.0.2
   8890233          1 QKSFM_PQ_8890233                   enable pushing bloom filter through NLJ                          11.2.0.2
   8999317          1 QKSFM_UNNEST_8999317               correct cost comparison for the additional phase for JPPD        11.2.0.2
   8986163          1 QKSFM_PQ_8986163                   align partition-wise gby DOP requirements with partition-wise jo 11.2.0.2
   8855396          1 QKSFM_ACCESS_PATH_8855396          sanity check for skip scan costing                               11.2.0.2
   8800514         20 QKSFM_JPPD_8800514                 threshold of table count to disable extended JPPD                11.2.0.2
   9007859          1 QKSFM_CBO_9007859                  Context index to be picked on XMLType Columns                    11.2.0.2
   9053879          1 QKSFM_CBO_9053879                  search all query blocks for replacement candidates               11.2.0.2
   6086930          1 QKSFM_ACCESS_PATH_6086930          correct skip scan selectivity evaluation for BETWEEN predicate   11.2.0.2
   7641601          1 QKSFM_CARDINALITY_7641601          treat a single value column as having a frequency histogram      11.2.0.2
   9052506          1 QKSFM_UNNEST_9052506               Allow NAAJ for UPDATE and DELETE                                 11.2.0.2
   9047975          1 QKSFM_CBO_9047975                  improve selectivity for truncated character strings              11.2.0.2
   8893626          0 QKSFM_ACCESS_PATH_8893626          apply index filter selectivity during skip scan costing          11.2.0.2
   9111170          1 QKSFM_TRANSFORMATION_9111170       allow subquery to appear on left side when generating inline vie 11.2.0.2
   8971829          1 QKSFM_ACCESS_PATH_8971829          set KKEGIXSL for unpartitioned global index                      11.2.0.2
   9125151          1 QKSFM_STATS_9125151                produce more accurate stats for olap when stats has not been run 11.2.0.2
   9106224          1 QKSFM_PQ_9106224                   allow range distribution in create of global partitioned index   11.2.0.2
   9185228          1 QKSFM_UNNEST_9185228               move correlated filters from subquery to outer query             11.2.0.2
   9206747          1 QKSFM_CBO_9206747                  share metadata for virtual columns when making copy              11.2.0.2
   9088510          1 QKSFM_TRANSFORMATION_9088510       compress predicate tree before cost-based query transformation   11.2.0.2
   9143856          1 QKSFM_TRANSFORMATION_9143856       uncorrelated OR-ed unary predicates are OK for unnesting         11.2.0.2
   8949971          1 QKSFM_STATS_8949971                make stats for leaf level data appear very large in COTs         11.2.0.2
   9148171          1 QKSFM_TRANSFORMATION_9148171       allow distinct elim & distinct aggr transform for non-select stm 11.2.0.2
   8706652          1 QKSFM_CBO_8706652                  fix for 7449971 not complete                                     11.2.0.2
   9245114          1 QKSFM_TABLE_ELIM_9245114           eliminate redundant join predicates in join elimination          11.2.0.2
   9011016          1 QKSFM_ACCESS_PATH_9011016          use 1/NDV+1/NROWS for col1 LIKE col2 selectivities (index driver 11.2.0.2
   9265681          1 QKSFM_CARDINALITY_9265681          sanity check for derived ndv/cdn for inner join on range         11.2.0.2
   7284269          1 QKSFM_CURSOR_SHARING_7284269       extended cursor sharing for like predicates                      11.2.0.2
   9272549          1 QKSFM_DYNAMIC_SAMPLING_9272549     do not sample columns which have statistics                      11.2.0.2
   8531463          1 QKSFM_ACCESS_PATH_8531463          cost cutoff for bitmap OR chains based on best table access cost 11.2.0.2
   9263333          1 QKSFM_CBO_9263333                  generate transitive predicates for virtual column expressions    11.2.0.2
   8896955          1 QKSFM_TRANSFORMATION_8896955       interleave TE with ST and JF                                     11.2.0.2
   9041934          1 QKSFM_EXECUTION_9041934            use selected measures for AW LOOP OPTIMIZED looping strategy     11.2.0.2
   9344709          1 QKSFM_PQ_9344709                   disable parallel execution for a qb with a constant false pred   11.2.0.2
   9024933          1 QKSFM_JPPD_9024933                 Do not allow Old JPPD for OJ view with window function           11.2.0.2
   9033718          1 QKSFM_TABLE_EXPANSION_9033718      remove restriction on bind variables for table expansion         11.2.0.2
   9081848          1 QKSFM_CBO_9081848                  don't use fake index stats as extended stats                     11.2.0.2
   5982893          1 QKSFM_SQL_CODE_GENERATOR_5982893   compact row vector of colocated join                             11.2.0.2
   9287401          1 QKSFM_TRANSFORMATION_9287401       full outer join to outer join conversion                         11.2.0.2
   8590021          1 QKSFM_CBO_8590021                  using col stats for pred NVL()  const selectivity estimation 11.2.0.2
   9340120          1 QKSFM_CBO_9340120                  derive stats for sys generated UA view selectivity estimation    11.2.0.2
   9355794          1 QKSFM_CBO_9355794                  clear sort merge joins paths noted in apafjo                     11.2.0.2
   9385634          1 QKSFM_EXECUTION_9385634            always return error for null end point expression                11.2.0.2
   9069046          1 QKSFM_CBO_9069046                  amend histogram column tracking for multicolumn stats            11.2.0.2
   9239337          1 QKSFM_CBO_9239337                  eliminate unreferenced subqueries after view merging             11.2.0.2
   9298010          1 QKSFM_PARTITION_9298010            enable pruning for partitioned IOT rowid predicates              11.2.0.2
   8836806          1 QKSFM_CBO_8836806                  push rownum predicate into sortable domain index                 11.2.0.2
   9344055          1 QKSFM_CBO_9344055                  Control the memory used during query optimization                11.2.0.2
   9203723          1 QKSFM_SQL_CODE_GENERATOR_9203723   allow bloom pruning and bloom filtering on the same join         11.2.0.2
   9443476          1 QKSFM_CBO_9443476                  set OPNF2NOSEL bit for auto generated virtual column predicates  11.2.0.2
   9195582          1 QKSFM_ACCESS_PATH_9195582          leaf blocks as upper limit for skip scan blocks                  11.2.0.2
   9433490          1 QKSFM_CBO_9433490                  enable VC replacement for nest operator                          11.2.0.2
   9303766          1 QKSFM_ACCESS_PATH_9303766          use 1/NDV+1/NROWS for col1 LIKE col2 selectivities (table access 11.2.0.2
   9437283          1 QKSFM_CBO_9437283                  Do not consider NLJ from JPPD when checking for FCP              11.2.0.2
   9116214          1 QKSFM_ACCESS_PATH_9116214          index filter ordering                                            11.2.0.2
   9456688          1 QKSFM_ACCESS_PATH_9456688          account for to_number/to_char cost after temp conversion         11.2.0.2
   9342979          1 QKSFM_CURSOR_SHARING_9342979       disable cardinality feedback for old JPPD                        11.2.0.2
   9465425          1 QKSFM_CURSOR_SHARING_9465425       correct hint for index join from cardinality feedback            11.2.0.2
   9092442          1 QKSFM_DML_9092442                  equi-partition load                                              11.2.0.2
   4926618          1 QKSFM_EXECUTION_4926618            do not use hash unique for subqueries in update-set expression   11.2.0.2
   8792846          1 QKSFM_STAR_TRANS_8792846           use _optimizer_star_trans_min_ratio even if ST not hinted        11.2.0.2
   9474259          1 QKSFM_CBO_9474259                  use cdn sanity check when unique colgroup on both sides of join  11.2.0.2
   6472966          1 QKSFM_PARTITION_6472966            load and use statistics for hash sub-partitions                  11.2.0.2
   6408301          1 QKSFM_DYNAMIC_SAMPLING_6408301     use recursive idx selectivity for partitioned table as well      11.2.0.2
   8500130          1 QKSFM_EXECUTION_8500130            enable rownum optimization without partition pushup              11.2.0.2
   9584723          1 QKSFM_INDEX_JOIN_9584723           enable functional indexes for index join                         11.2.0.2
   9593680          1 QKSFM_CBO_9593680                  fix typo in cost computation of subquery filters                 11.2.0.2
   9309281          1 QKSFM_JPPD_9309281                 outer join JPPD allowed for function-based index access path     11.2.0.2
   8693158          1 QKSFM_JPPD_8693158                 consider pushing if only shared predicates to push               11.2.0.2
   9381638          1 QKSFM_PQ_9381638                   Treat CP differently from NLJ for parallel optimizations         11.2.0.2
   9383967          1 QKSFM_COMPILATION_9383967          Allow unique (ie. select distinct) pushdown                      11.2.0.2
   7711900          1 QKSFM_CBQT_7711900                 copy query block text position fields in copy service            11.2.0.2
   9218587          1 QKSFM_CARDINALITY_9218587          Don't use column density for selectivity with 1-bucket histogram 11.2.0.2
   9728438          1 QKSFM_CBO_9728438                  don't go parallel if no operations are expensive enough          11.2.0.2
   9577300          1 QKSFM_CBO_9577300                  Improve range join selectivity for predicates with round         11.2.0.2
   9171113          1 QKSFM_CBO_9171113                  consider parallel cost for partition bitmap table access by rowi 11.2.0.2
   8973745          1 QKSFM_SQL_PLAN_MANAGEMENT_8973745  auto-capture only if literal replaced SQL parses recursively     11.2.0.2
   9102474          1 QKSFM_CBO_9102474                  use IO calibrate statistics to estimate time from cost           11.2.0.2
   9243499          1 QKSFM_CBO_9243499                  relax conditions for logical antijoin                            11.2.0.2
   9912503          1 QKSFM_TRANSFORMATION_9912503       Remove having clause subquery at all levels                      11.2.0.2
   9153459          1 QKSFM_TABLE_ELIM_9153459           allow loading of rely constraints for all statement types        11.2.0.2
   9762592          3 QKSFM_PQ_9762592                   fold bloom filter when offload to storage                        1

Lets shoot in the dark

SQL>  select * from v$system_fix_control where optimizer_feature_enable='11.2.0.2' and description like '%skip%';

     BUGNO      VALUE SQL_FEATURE                                                      DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE
---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------
   7277732          1 QKSFM_CBO_7277732                                                allow skip scan costing for NL with non-join predicate           11.2.0.2
   8855396          1 QKSFM_ACCESS_PATH_8855396                                        sanity check for skip scan costing                               11.2.0.2
   6086930          1 QKSFM_ACCESS_PATH_6086930                                        correct skip scan selectivity evaluation for BETWEEN predicate   11.2.0.2
   8893626          1 QKSFM_ACCESS_PATH_8893626                                        apply index filter selectivity during skip scan costing          11.2.0.2
   9195582          1 QKSFM_ACCESS_PATH_9195582                                        leaf blocks as upper limit for skip scan blocks                  11.2.0.2

Lets focus on those fixes

In the stack below test4.sql has got the sql causing problem and what I do is altering the session to disable the fix in question and keep it disabled while my current session is under 11.2.0.2 . (For data security I could not add the sql and table names. )

SQL> alter session set optimizer_features_enable='11.2.0.2';

Session altered.

SQL> alter session set "_fix_control"='8855396:OFF';

Session altered.

SQL> @test4.sql

Plan hash value: 2840796242

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                             |      1 |        |      0 |00:00:01.09 |    5404 |
|   1 |  NESTED LOOPS OUTER           |                             |      1 |      1 |      0 |00:00:01.09 |    5404 |
|   2 |   NESTED LOOPS                |                             |      1 |      1 |      0 |00:00:01.09 |    5404 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TABLE_XXXXXXXXXXXXXX        |      1 |      1 |      0 |00:00:01.09 |    5404 |
|*  4 |     INDEX SKIP SCAN           | IDX_TABLE_XXXXXXXXXXXXXX_02 |      1 |      1 |     50 |00:00:00.07 |    5317 |
|   5 |    TABLE ACCESS BY INDEX ROWID| TABLE_YYYYYYYYYYYYYYYYYY    |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  6 |     INDEX UNIQUE SCAN         | PK_ZZZZZZZZZZZZZZZZZZZ      |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |   TABLE ACCESS BY INDEX ROWID | TABLE_TTTTTTTTTTTT          |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  8 |    INDEX UNIQUE SCAN          | PK_TABLE_TTTTTTTTTTTT       |      0 |      1 |      0 |00:00:00.01 |       0 |
-----------------------------------------------------------------------------------------------------------------------

48 rows selected.

SQL>
SQL> alter session set "_fix_control"='6086930:OFF';

Session altered.

SQL> @test4.sql

PL/SQL procedure successfully completed.

Plan hash value: 2840796242

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                             |      1 |        |      0 |00:00:01.11 |    5404 |
|   1 |  NESTED LOOPS OUTER           |                             |      1 |      1 |      0 |00:00:01.11 |    5404 |
|   2 |   NESTED LOOPS                |                             |      1 |      1 |      0 |00:00:01.11 |    5404 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| TABLE_XXXXXXXXXXXXXX        |      1 |      1 |      0 |00:00:01.11 |    5404 |
|*  4 |     INDEX SKIP SCAN           | IDX_TABLE_XXXXXXXXXXXXXX_02 |      1 |      1 |     50 |00:00:00.07 |    5317 |
|   5 |    TABLE ACCESS BY INDEX ROWID| TABLE_YYYYYYYYYYYYYYYYYY    |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  6 |     INDEX UNIQUE SCAN         | PK_ZZZZZZZZZZZZZZZZZZZ      |      0 |      1 |      0 |00:00:00.01 |       0 |
|   7 |   TABLE ACCESS BY INDEX ROWID | TABLE_TTTTTTTTTTTT          |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  8 |    INDEX UNIQUE SCAN          | PK_TABLE_TTTTTTTTTTTT       |      0 |      1 |      0 |00:00:00.01 |       0 |
-----------------------------------------------------------------------------------------------------------------------

SQL>
SQL> alter session set "_fix_control"='8893626:OFF';

Session altered.

SQL> @test4.sql

PL/SQL procedure successfully completed.

Plan hash value: 3290908244

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                          |      1 |        |      0 |00:00:00.08 |     178 |     15 |
|   1 |  NESTED LOOPS OUTER            |                          |      1 |      1 |      0 |00:00:00.08 |     178 |     15 |
|   2 |   NESTED LOOPS                 |                          |      1 |      1 |      0 |00:00:00.08 |     178 |     15 |
|   3 |    INLIST ITERATOR             |                          |      1 |        |      0 |00:00:00.08 |     178 |     15 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| TABLE_XXXXXXXXXXXXXX     |     50 |      1 |      0 |00:00:00.08 |     178 |     15 |
|*  5 |      INDEX UNIQUE SCAN         | PK_TABLE_XXXXXXXXXXXXXX  |     50 |     50 |     50 |00:00:00.08 |      91 |     15 |
|   6 |    TABLE ACCESS BY INDEX ROWID | TABLE_YYYYYYYYYYYYYYYYYY |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|*  7 |     INDEX UNIQUE SCAN          | PK_ZZZZZZZZZZZZZZZZZZZ   |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   8 |   TABLE ACCESS BY INDEX ROWID  | TABLE_TTTTTTTTTTTT       |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|*  9 |    INDEX UNIQUE SCAN           | PK_TABLE_TTTTTTTTTTTT    |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------------------------------------

SQL>
SQL> alter session set "_fix_control"='9195582:OFF';

Session altered.

SQL> @test4.sql

---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                          |      1 |        |      0 |00:00:00.01 |     178 |
|   1 |  NESTED LOOPS OUTER            |                          |      1 |      1 |      0 |00:00:00.01 |     178 |
|   2 |   NESTED LOOPS                 |                          |      1 |      1 |      0 |00:00:00.01 |     178 |
|   3 |    INLIST ITERATOR             |                          |      1 |        |      0 |00:00:00.01 |     178 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| TABLE_XXXXXXXXXXXXXX     |     50 |      1 |      0 |00:00:00.01 |     178 |
|*  5 |      INDEX UNIQUE SCAN         | PK_TABLE_XXXXXXXXXXXXXX  |     50 |     50 |     50 |00:00:00.01 |      91 |
|   6 |    TABLE ACCESS BY INDEX ROWID | TABLE_YYYYYYYYYYYYYYYYYY |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  7 |     INDEX UNIQUE SCAN          | PK_ZZZZZZZZZZZZZZZZZZZ   |      0 |      1 |      0 |00:00:00.01 |       0 |
|   8 |   TABLE ACCESS BY INDEX ROWID  | TABLE_TTTTTTTTTTTT       |      0 |      1 |      0 |00:00:00.01 |       0 |
|*  9 |    INDEX UNIQUE SCAN           | PK_TABLE_TTTTTTTTTTTT    |      0 |      1 |      0 |00:00:00.01 |       0 |
---------------------------------------------------------------------------------------------------------------------

As you can see we hit the jackpot when we disabled the fix 8893626. Our Table called TABLE_XXXXXXXXXXXXXX started to use PK_TABLE_XXXXXXXXXXXXXX instead of IDX_TABLE_XXXXXXXXXXXXXX_02 and ran faster.

this was a quick teaser post I will talk about this technique (you may think dodgy) further in the future

I may also add test case if I have time.

FootNote: This fix was for immediate action. Gathering stats is not helping us so we needed to apply it. I am not the big fun of these changes and as you see they are undocumented please do it with asking Oracle as we are doing at the moment.

For more about the bug

Bug 8893626 – Index Skip Scan with selective index filter predicate has high cost [ID 8893626.8]

For more about the optimizer_fix_control
Init.ora Parameter “_FIX_CONTROL” [Hidden] Reference Note [ID 567171.1]

http://el-caro.blogspot.com/2007/06/fix-control.html

http://jonathanlewis.wordpress.com/2009/12/22/optimizer-features/

Update – 24/01/11 - We now have an official bug for our issue
Bug 11672675: INDEX SKIP SCAN IS USED INAPPROPR. AFTER UPGR. FROM 11.2.0.1 -> 11.2.0.2

Beware of Deferred Segment during 11.2.0.2 upgrade from 11.2.0.1

Filed under: Bugs — coskan @ 12:38 am

We recently upgraded one of our biggest databases to 11.2.0.2 from 11.2.0.1 due to the library cache mutex related bugs on 11.2.0.1. Since I wasn’t the one who did the upgrade I have nothing to say about the process but I was the on call DBA on the next day after the upgrade.

Users started to report the ORA-00600 for the statement below

delete from TABLE_1 where id=49;

ERROR at line 1:
ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:objnotfound], [1002300], [], [], [], [], [], [], [], [], [], []

Checked for the error and found 3 bugs (9774817 10322768 10373381) related with the issue and not just none of them seems to have a workaround but all of them also says restore is the option which was the last choice for us (database server financial markets and we were close to the opening of markets ). It looks like we had some issues with deferred segments.

The table we try to delete from has got segments so it should not be the problem however it has got on delete trigger doing a merge on other tables and looks like one of those tables were causing the issue.

We first raised the Severity 1 ticket. while the engineer was reviewing the SR we decided to trace all the delete operation and see what is actually failing and once again 10046 trace revealed everything

When the error raised last operation was below

select /*+ all_rows */ count(1) from TABLE_2 where id=49

We run the statement manually, issue repeated again and when I check if the table has got any segment in dba_segments table it did not return anything.

Updated the SR with the new information. Support engineer asked us to try to regenerate the objects but he wasn’t hopeful. He suggested probably we needed to restore the db to before upgrade state.

I wasn’t as hopeful as ORacle support so to see which tables are affected I ran the output of the query below

select 'select count(*) from '||owner||'.'||table_name||' ;' from dba_tables where segment_created = 'NO';

All of the segmentless tables were raising the ORA-00600 so we decided to re-create them.

After recreation of all of the tables and their indexes I ran the same output again and none of the tables raised the error. Our workaround worked fine and we got away from restoring the DB and 2 more standbys which would be a nightmare just for tables which never used before. Interesting part is that we did not see the error on our QA db which I am not so sure what was the difference in terms of upgrade.

Hope you won’t hit this problem but better you check your deferred segments before and after 11.2.0.2 upgrade.

Note: I also check the LOB segments but we did not have any so no action taken for them.

select count(*) from dba_lobs where segment_created = 'NO';

November 22, 2009

Crash recovery cannot be done automatically – [kcratr1_lastbwr]

Filed under: Bugs — coskan @ 2:30 pm

I hit the error below during startup of my 10.2.0.1 RAC installation running on Solaris X86-64 on VMWare

ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],[], [], []

Basically none of the instances start-up with srvctl command and log this error to alert log.

Tried to run manual startup of instances but no chance same error.

Quick search on MOS (yes it was working :) ) take me to the note 393984.1 (no bug number)

It says this is a bug on Enterprise Edition – Version: 10.2.0.1 to 11.1.0.6.0

Solution is manual recovery.

SQL> startup;
        ORACLE instance started.

Total System Global Area  369098752 bytes
Fixed Size                  2020864 bytes
Variable Size             138414592 bytes
Database Buffers          226492416 bytes
Redo Buffers                2170880 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [],
[], [], []

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

September 14, 2009

Simple blank matters

Filed under: Basics, Bugs — coskan @ 1:16 pm

I saw this interesting bug on Metalink headlines and I thought it will be nice to post it here.

This is the heading of the bug ‘ADDING COLUMN WITH DEFAULT NULL LEADS TO UNNECESSARY FULL TABLE UPDATE ‘ numbered 8840491.

Normally, if you add a column with a default null value, there won’t be an update on the table itself (I think after 10G because trace for 9i shows update on table), running at the background, but if you put some spaces to make your alter statement look better, it is where the problem begins.

What I mean is these 4 statements are different during runtime due to the bug.

alter table test add (id number default null);

alter table test add ( id number default null);

alter table test add (id number default null );

alter table test add ( id number default null );

Lets see how do they differ.  Actual bug is reported on 10.2.0.4  and I am able reproduce it on 11.1.0.6


SQL> set timing on
SQL> create table test (id number not null);

Table created.

Elapsed: 00:00:00.54
SQL> insert into test
  2  select rownum from dual connect by level<=1000000;

1000000 rows created.

Elapsed: 00:00:01.65

SQL> alter session set tracefile_identifier=null_bug_test;

Session altered.

Elapsed: 00:00:00.00
SQL> exec dbms_session.session_trace_enable;

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.09
SQL> --no space before after ()
SQL> alter table test add (id2 number default null)  ;

Table altered.

Elapsed: 00:00:00.09
SQL> --space after (
SQL> alter table test add ( id3 number default null)  ;

Table altered.

Elapsed: 00:00:00.01
SQL> --space before )
SQL> alter table test add (id4 number default null )  ;

Table altered.

Elapsed: 00:01:12.90
SQL> --space before and after ()
SQL> alter table test add ( id5 number default null ) ;

Table altered.

Elapsed: 00:01:34.27
SQL> -- spaces everywhere but not after before ()
SQL> alter table test add (id6  number   default   null)  ;

Table altered.

Elapsed: 00:00:00.04
SQL>

As you see from the timings some of them are longer than expected. Lets find them in tracefile

These are the update statements from the trace file.

=====================
PARSING IN CURSOR #1 len=29 dep=1 uid=82 oct=6 lid=82 tim=21689660840 hv=720540867 ad='30679e28' sqlid='7nb3cf4pg5563'
update "TEST" set "ID4"=null 
END OF STMT
PARSE #1:c=15625,e=17207,p=0,cr=106,cu=1,mis=1,r=0,dep=1,og=1,tim=21689660835
=====================
PARSING IN CURSOR #5 len=29 dep=1 uid=82 oct=6 lid=82 tim=21762749484 hv=1088193227 ad='30679058' sqlid='f4g28hd0dt0qb'
update "TEST" set "ID5"=null 
END OF STMT
PARSE #5:c=46875,e=146504,p=294,cr=100,cu=0,mis=1,r=0,dep=1,og=1,tim=21762749480

As you see space before ) and space before and after () causing full table update and guess the damage of this tiny hidden change on over 10 million row huge table.

Next time, if you wait your not null new column getting added longer than it needs on 10G>, check the syntax you might be hitting this bug.

Older Posts »

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 193 other followers