Coskan’s Approach to Oracle

August 2, 2013

oratop from MOS

Filed under: Basics, Linux, Monitoring, Performance, Tips — coskan @ 10:12 am

This is just a pointer post to a tool which I discovered today. It is ORATOP, to monitor databases real time like top of unix flavours. It is pretty much like MOATS   (tool from Tanel Poder and Adrian Billington and RAC version by  Jagjeet Singh),   with a bit better output (at least for my taste)

 

It doesn’t need anything to be installed which is the best part of the tool.

oratop – utility for near real-time monitoring of databases, RAC and Single Instance (Doc ID 1500864.1)

 

Sample output is like below. It can be a good addition to your performance tuning toolkits.

 

oratop

April 19, 2013

Direct Path Reads Again

Filed under: Performance, Tips — coskan @ 7:20 pm

Note: This post is about diagnosing real life scenario caused by  direct path decision change on 11.2.0.3 which already covered  by Tanel Poder on his awesome Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven) post
If you already read and know how to approach the scenario you can easily skip this post and wait another year till I write another post while I’m working at the bank.

Story

Number of Oracle database in our estate is a high number and they are on shared storage arrays which means if one of them is naughty on I/O it has a potential impact on others.
To be proactive, from Time to time I run bunch of queries on all of the databases in a loop to check what is the wait profiles from dba_hist_active_sess_history and v$active_session_history
to see if we can optimize some of the work and offload those unnecessary I/O to unused CPU by utilizing more memory or getting rid of direct path reads when necessary.

Query is really simple and it works for my needs.
All I do is on the first query get the count(*) from ASH grouped by event. By this way I can get the overall wait profile within the ASH.
For the second query get count(*) for wait_class I/O grouped by sql_id ,sql_plan_hash_value,force_matching_signature,event. By this sql I can get the top sql_id for I/O waits (not top elapsed which is all matters for end users)

I’m not saying this is the right approach but practically it works for the purpose.

SQL: RR columns are ratio to report

prompt ************************************
prompt **** ASH OVERALL WAIT PROFILE
prompt ************************************
SELECT MIN(sample_time) min_ash_available,sysdate-MIN(sample_time) available duration FROM v$active_session_history;

select * from (
select NVL(event,'CPU') event,count(*),
round((ratio_to_report(sum(1)) over ()*100),1) rr
from gv$active_session_history
WHERE user_id<>0
AND sample_time<trunc(SYSDATE+1) AND sample_time>trunc(sysdate-1)
group by event
order by 2 desc
) where rownum<10;

prompt ************************************
prompt **** ASH I/O by SQL_ID
prompt ************************************

COLUMN force_matching_signature FOR 999999999999999999999999999
select * from (
select
sql_id ,sql_plan_hash_value,force_matching_signature,
NVL(event,'CPU') Event,
count(*),
round((ratio_to_report(sum(1)) over ()*100),1) rr
from gv$active_session_history
where
1=1
AND wait_class LIKE '%I/O'
--AND event IS null
and user_id<>0
AND sql_id IS NOT NULL
group by
sql_id,sql_plan_hash_value,event,force_matching_signature
order by 5 desc
) where rownum<30;          

For one of the databases which is not looked after by me the output was like this


************************************
**** ASH OVERALL
************************************

EVENT                                           COUNT(*)     RR
--------------------------------------------- ---------- ------
CPU                                               109648   63.1
direct path read                                   35914   20.7
log file sync                                      26136   15.1
enq: KO - fast object checkpoint                    1089     .6
db file sequential read                              263     .2
control file sequential read                         188     .1
log buffer space                                     149     .1
asynch descriptor resize                             112     .1
enq: HW - contention                                  31     .0

************************************
**** ASH I/O
************************************

SQL_ID          SQL_PLAN_HASH_VALUE     FORCE_MATCHING_SIGNATURE EVENT                                           COUNT(*)     RR
--------------- ------------------- ---------------------------- --------------------------------------------- ---------- ------
804t23t68dbmw            1520416513         10621052756171684880 direct path read                                   11689   32.1
9wthcvamy608f            1764147946         10621052756171684880 direct path read                                   11059   30.4
7scfmpxgv5mu5            3641059178          6835008959050889988 direct path read                                    6448   17.7
804t23t68dbmw            2550353655         10621052756171684880 direct path read                                    3060    8.4
9wthcvamy608f            2550353655         10621052756171684880 direct path read                                    2787    7.7
9wthcvamy608f            3119703268         10621052756171684880 direct path read                                     777    2.1
1q21k2wqgw8m3            3586661573            29420080321854211 control file sequential read                         135     .4

From this output, we can say, for the period I checked, overall sessions are %63 on CPU and %20 on direct path read. If we checke the ASH I/O we can see same sqls with different plans (804t23t68dbmw,9wthcvamy608f) are contributing in total 77% of that %20 direct path read.   Then next step check how much I/O per run the sql (804t23t68dbmw) is doing

 SQL> l
  1  select
  2     plan_hash_value plan_hash,
  3     executions,
  4     PX_SERVERS_EXECUTIONS pxe,
  5     cpu_time/1000 cpu_ms,
  6     elapsed_time/1000 ela_ms,
  7     (elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_ela,
  8     IO_INTERCONNECT_BYTES/1024/1024/1024 io_inter_gb,
  9     PHYSICAL_READ_BYTES/1024/1024/1024 PHYSICAL_READ_GB,
 10     PHYSICAL_READ_BYTES/1024/1024/decode(nvl(executions,0),0,1,executions) PIO_MB_PE,
 11     buffer_gets/decode(nvl(executions,0),0,1,executions) LIOS_PE,
 12     disk_reads/decode(nvl(executions,0),0,1,executions) PIOS_PE
 13  from
 14     gv$sql
 15  where
 16     sql_id = ('&1')
 17  order BY
 18     inst_id,
 19     sql_id,
 20     hash_value,
 21*    child_number
SQL>

 PLAN_HASH EXECUTIONS        PXE     CPU_MS     ELA_MS    AVG_ELA      IO_INTER_GB PHYSICAL_READ_GB        PIO_MB_PE       LIOS_PE       PIOS_PE
---------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------------- ---------------- ------------- -------------
2550353655         45          0   88018.62 147748.384 3.28329742             9.37             9.37           213.14      27402.60      27282.00
2550353655      17142          0 16035460.2   34342974 2.00344032          2208.88          2208.71           131.94      16973.79      16901.79

131MB per run (PIO_MB_PE) is looking small at first, but If you check the total reads and executions it looks a bit scary (Note: I use I/O_INTER_GB as we have exadata environments as well, in this case it is not relevant) I checked how many times in a day it runs and how much total being read

SQL> l
  1  with ds as (
  2  select min(snap_id) min_snap,max(snap_id) max_snap from
  3  dba_hist_snapshot
  4  where begin_interval_time between trunc(sysdate-1) and trunc(sysdate))
  5  select sum(executions_delta) total_execution_yesterday,sum(executions_delta)*&_tmp_pio_pe/1024/1024 total_pio_pd_TB
  6  from dba_hist_sqlstat,ds
  7  where snap_id  between ds.min_snap and ds.max_snap
  8* and sql_id='&1'

TOTAL_EXECUTION_YESTERDAY TOTAL_PIO_PD_TB
------------------------- ---------------
                    18689      2.39794088

As you can see daily read is  2.3 TB with 18K executions read per day from a single query reads innocently 131MB per execution. Table size is not that big

SQL> select sum(bytes)/1024/1024 total_mb,sum(blocks) total_blocks 
from dba_segments where segment_name='PROBLEM_TABLE';

  TOTAL_MB TOTAL_BLOCKS
---------- ------------
       109        15360

SGA is way above to go via direct path read

SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
sga_target                           big integer 9G

It looks like our table (all partitions) is in the cache. 15.2K blocks over total 15.3K blocks

SQL> select count(distinct block#) from v$bh
  2  where objd in (select data_object_id from dba_objects where object_name='PROBLEM_TABLE' and object_type like 'TABLE%');

COUNT(DISTINCTBLOCK#)
---------------------
                15260

So, table Size is small to fit in cache also Table is already in cache then Why we are scanning via direct path read ?

To know the answer you need to be a good blog follower like me and immediately recall
Tanels awesome post “Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven),
which is new change on 11.2.0.3 where our version is.

To make sure this behaviour change  is the case, I run the test with _direct_read_decision_statistics_driven=TRUE (default) and then FALSE

SQL>alter session set "_direct_read_decision_statistics_driven"=TRUE;

SQL>@mon_sqlid2 bcfqr520321mf

DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'BCFQR520321MF')
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /*+monitor*//*coskan2*/.......... from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ'

Global Stats
=========================================================================================
| Elapsed |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes |
=========================================================================================
|    4.68 |    1.17 |     2.76 |        0.23 |     0.53 |     2 |  21228 |  171 | 164MB |
=========================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2550353655)
=================================================================================================================================================================================================
| Id |                   Operation                   |         Name          |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity |   Activity Detail    |
|    |                                               |                       | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |     (# samples)      |
=================================================================================================================================================================================================
|  0 | SELECT STATEMENT                              |                       |         |       |         1 |     +5 |     1 |       16 |      |       |       |          |                      |
|  1 |   VIEW                                        | VIEW_XXXXXXXXXXXXXXX  |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |       |          |                      |
|  2 |    SORT ORDER BY                              |                       |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |  4096 |          |                      |
|  3 |     HASH GROUP BY                             |                       |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |    1M |          |                      |
|  4 |      VIEW                                     |                       |       3 | 25250 |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  5 |       UNION-ALL                               |                       |         |       |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  6 |        FILTER                                 |                       |         |       |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  7 |         HASH GROUP BY                         |                       |       2 | 19143 |         2 |     +4 |     1 |     289K |      |       |   62M |          |                      |
|  8 |          HASH JOIN                            |                       |    990K | 19090 |         4 |     +2 |     1 |     289K |      |       |   30M |    25.00 | Cpu (1)              |
|  9 |           PART JOIN FILTER CREATE             | :BF0000               |    988K |  6099 |         3 |     +2 |     1 |     289K |      |       |       |          |                      |
| 10 |            PARTITION LIST SINGLE              |                       |    988K |  6099 |         3 |     +2 |     1 |     289K |      |       |       |          |                      |
| 11 |             TABLE ACCESS FULL                 | PROBLEM_TABLE         |    988K |  6099 |         4 |     +1 |     1 |     289K |   57 |  55MB |       |    25.00 | direct path read (1) |
| 12 |           PARTITION LIST SINGLE               |                       |    988K |  6102 |         2 |     +4 |     1 |     289K |      |       |       |          |                      |
| 13 |            TABLE ACCESS FULL                  | PROBLEM_TABLE         |    988K |  6102 |         3 |     +3 |     1 |     289K |   57 |  55MB |       |    50.00 | direct path read (2) |
| 14 |        FILTER                                 |                       |         |       |           |        |     1 |          |      |       |       |          |                      |
| 15 |         HASH GROUP BY                         |                       |       1 |  6107 |           |        |     1 |          |      |       |       |          |                      |
| 16 |          FILTER                               |                       |         |       |           |        |     1 |          |      |       |       |          |                      |
| 17 |           NESTED LOOPS                        |                       |         |       |           |        |     1 |          |      |       |       |          |                      |
| 18 |            NESTED LOOPS                       |                       |       1 |  6102 |           |        |     1 |          |      |       |       |          |                      |
| 19 |             PARTITION LIST SINGLE             |                       |       1 |  6099 |           |        |     1 |          |      |       |       |          |                      |
| 20 |              TABLE ACCESS FULL                | PROBLEM_TABLE         |       1 |  6099 |           |        |     1 |          |   57 |  55MB |       |          |                      |
| 21 |             PARTITION LIST SINGLE             |                       |       1 |     2 |           |        |       |          |      |       |       |          |                      |
| 22 |              INDEX RANGE SCAN                 | PROBLEM_TABLE    _IX3 |       1 |     2 |           |        |       |          |      |       |       |          |                      |
| 23 |            TABLE ACCESS BY LOCAL INDEX ROWID  | PROBLEM_TABLE         |       1 |     3 |           |        |       |          |      |       |       |          |                      |
| 24 |           SORT AGGREGATE                      |                       |       1 |       |           |        |       |          |      |       |       |          |                      |
| 25 |            PARTITION LIST SINGLE              |                       |       1 |     4 |           |        |       |          |      |       |       |          |                      |
| 26 |             TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE         |       1 |     4 |           |        |       |          |      |       |       |          |                      |
| 27 |              INDEX RANGE SCAN                 | PROBLEM_TABLE    _IX  |       1 |     3 |           |        |       |          |      |       |       |          |                      |
=================================================================================================================================================================================================

This time we read 164MB and if you check line 11,13 and 20
where we access PROBLEM_TABLE via full partition scan, you can see all the direct path reads are because of that access.

When we revert the parameter to pre 11.2.0.3 behaviour.


SQL>alter session set "_direct_read_decision_statistics_driven"=false;

Session altered.

SQL>@mon_sqlid2 fds2rw8vuutcn

DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'FDS2RW8VUUTCN')
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /*+monitor*//*coskan*/........................
from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ'

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    0.93 |    0.93 |     0.00 |     2 |  21043 |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2550353655)
=============================================================================================================================================================================
| Id |                   Operation                   |         Name          |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   |  Mem  | Activity | Activity Detail |
|    |                                               |                       | (Estim) |       | Active(s) | Active |       | (Actual) | (Max) |   (%)    |   (# samples)   |
=============================================================================================================================================================================
|  0 | SELECT STATEMENT                              |                       |         |       |         1 |     +1 |     1 |       16 |       |          |                 |
|  1 |   VIEW                                        | VIEW_XXXXXXXXXXXXXXX  |       3 | 25252 |         1 |     +1 |     1 |       16 |       |          |                 |
|  2 |    SORT ORDER BY                              |                       |       3 | 25252 |         1 |     +1 |     1 |       16 |  4096 |          |                 |
|  3 |     HASH GROUP BY                             |                       |       3 | 25252 |         1 |     +1 |     1 |       16 |    1M |          |                 |
|  4 |      VIEW                                     |                       |       3 | 25250 |         1 |     +1 |     1 |     287K |       |          |                 |
|  5 |       UNION-ALL                               |                       |         |       |         1 |     +1 |     1 |     287K |       |          |                 |
|  6 |        FILTER                                 |                       |         |       |         1 |     +1 |     1 |     287K |       |          |                 |
|  7 |         HASH GROUP BY                         |                       |       2 | 19143 |         1 |     +1 |     1 |     287K |   62M |   100.00 | Cpu (1)         |
|  8 |          HASH JOIN                            |                       |    990K | 19090 |         1 |     +1 |     1 |     287K |   31M |          |                 |
|  9 |           PART JOIN FILTER CREATE             | :BF0000               |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 10 |            PARTITION LIST SINGLE              |                       |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 11 |             TABLE ACCESS FULL                 | PROBLEM_TABLE         |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 12 |           PARTITION LIST SINGLE               |                       |    988K |  6102 |         1 |     +1 |     1 |     287K |       |          |                 |
| 13 |            TABLE ACCESS FULL                  | PROBLEM_TABLE         |    988K |  6102 |         1 |     +1 |     1 |     287K |       |          |                 |
| 14 |        FILTER                                 |                       |         |       |           |        |     1 |          |       |          |                 |
| 15 |         HASH GROUP BY                         |                       |       1 |  6107 |           |        |     1 |          |       |          |                 |
| 16 |          FILTER                               |                       |         |       |           |        |     1 |          |       |          |                 |
| 17 |           NESTED LOOPS                        |                       |         |       |           |        |     1 |          |       |          |                 |
| 18 |            NESTED LOOPS                       |                       |       1 |  6102 |           |        |     1 |          |       |          |                 |
| 19 |             PARTITION LIST SINGLE             |                       |       1 |  6099 |           |        |     1 |          |       |          |                 |
| 20 |              TABLE ACCESS FULL                | PROBLEM_TABLE         |       1 |  6099 |           |        |     1 |          |       |          |                 |
| 21 |             PARTITION LIST SINGLE             |                       |       1 |     2 |           |        |       |          |       |          |                 |
| 22 |              INDEX RANGE SCAN                 | PROBLEM_TABLE_IX3 		 |       1 |     2 |           |        |       |          |       |          |                 |
| 23 |            TABLE ACCESS BY LOCAL INDEX ROWID  | PROBLEM_TABLE         |       1 |     3 |           |        |       |          |       |          |                 |
| 24 |           SORT AGGREGATE                      |                       |       1 |       |           |        |       |          |       |          |                 |
| 25 |            PARTITION LIST SINGLE              |                       |       1 |     4 |           |        |       |          |       |          |                 |
| 26 |             TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE         |       1 |     4 |           |        |       |          |       |          |                 |
| 27 |              INDEX RANGE SCAN                 | PROBLEM_TABLE_IX      |       1 |     3 |           |        |       |          |       |          |                 |
=============================================================================================================================================================================

Suddenly, all reads are from cache and runtime dropped from 4.5 to 0.98 seconds

To make myself more sure, I checked the stats why this is happening.

Statistics think that it has 2 million rows and 55K blocks.

SQL>select num_rows,blocks,last_analyzed from dba_tables where table_name='PROBLEM_TABLE';

  NUM_ROWS     BLOCKS LAST_ANALYZED
---------- ---------- -------------------
   2086124      55981 18/04/2013 23:42:56

In reality oracle has 600K rows and 15K blocks

SQL>select count(*) from ptd_owner.PROBLEM_TABLE    ;

  COUNT(*)
----------
    611960

SQL>select sum(blocks) from dba_segments where segment_name='PROBLEM_TABLE    ';

SUM(BLOCKS)
-----------
      15360

Our problem is definitely that When the direct path read decision driven from statistics (which are not representative) it goes wrong access path.

I did not stop here and check the all_tab_modifications to see what happened to table since last analyze


####################################################
################TABLE MODIFICATIONS#################
####################################################

TABLE_OWNER   TABLE_NAME      PARTITION_NAME  INSERTS          UPDATES          DELETES TIMESTAMP TRUNCATED DROP_SEGMENTS
------------- --------------- ----------------------- ---------------- ---------------- --------- --------- -------------
PTD_OWNER     PROBLEM_TABLE                         0                0          2086124 18-APR-13 NO              0
PTD_OWNER     PROBLEM_TABLE   XXXXXXXXXXXX          0                0           267418 18-APR-13 YES             0
PTD_OWNER     PROBLEM_TABLE   YYYYYYY               0                0           830434 18-APR-13 YES             0
PTD_OWNER     PROBLEM_TABLE   ZZZ                   0                0           988272 18-APR-13 YES             0

It looks like table partitions are truncated after the nightly stats gathering and starting to get load till the next truncate.

When we check with business, they also confirmed the behaviour.

So what is next,

As the filter is the partition key which needs full partition access, indexing is not a solution.
To increase sga, will help but then why would we, if the table can fit to the cache most of the time (till it grows to a certain point during the day)
To change the parameter session wide, is not possible via trigger
To change the parameter in the whole database is also not something we want to implement

Last option is to have representative statistics in the database gathered when the table is full in half way and then lock them. (or set the stats manually like Tanel)
By this way we can avoid using undocumented parameters, we can still get buffered reads till a certain point throughout the day (oracle may decide to change the path if the table gets bigger in size)

Well thats about it. Finally I manage to write a blog after 11 months right before a year ended (banking is killing productivity and increasing spam comments)

I have to admit, when the scripts are ready, it takes about 10 mins to sort the problem but it took 2 hours to share it , so I hope you enjoyed it.

Morals of the story are check your systems regularly,  read blogs and god bless  Tanel:)

May 31, 2012

Plan stability through Upgrade to 11G-Why is my plan changed?-Auto Adjusted Dynamic Sampling

Filed under: CBO, Performance, Plan Stability, Upgrade — Tags: — coskan @ 7:59 pm

Todays story is about, how dynamic sampling can effect your plan stability during upgrade and how you can recover from it as long as you know what your are doing.

A little bit of background, I’m currently in the middle of migrate and upgrade cycle for 12TB warehouse from 10.2.0.3 Solaris to 11.2.0.3 (yet) Linux.
Plan stability on this system is maintained by locking the main driving table statistics and hinting the queries in case CBO does not pick the right driving table.
The system uses temporary tables a lot for staging operations and they rely on dynamic sampling Level 2 which 2 is enough. They also use parallelism for the heavy batch operations. Good part of the system is, since they use partitioning heavily they can simulate the close to real workload on cut down version of the database. What we are doing is getting the cut down version 10.2.0.3 export from solaris imported into 11.2.0.3 linux and run the batches for the close of business date with same memory and other settings. Initial load test done on OFE(optimizer_features_enable)=11.2.0.3 and we saw many parallel queries started to suffer. Our aim was not to check what is slow just collect the necessary data so we did not tune anything. Then we did another test on OFE=10.2.0.3 and saw that 10.2.0.3 did way better for parallel queries but it also suffer for other issues which is not subject of the current post. We have many other runs with different options for OFE and new stats but they will also be covered later if I come up something interesting. Todays story is why our parallel queries suddenly went bad on OFE=11.2.0.3

After this background now I can start the actual story.

Once I check the plans I saw something interesting. Oracle suddenly started to use dynamic sampling 7 without any manual setting. I did ask it on twitter and thankfully Timur Akhmadeev pointed me the right direction on the manual (RTFM rocks)

About where the dynamic sampling is used

Oracle 10GR2 documentation says

14.5.6 Estimating Statistics with Dynamic Sampling

This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter.
For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher.
The default value is 2. See "Dynamic Sampling Levels" for information about the sampling levels that can be set.

Oracle 11GR2 documentation says

13.6.2.2 When the Optimizer Uses Dynamic Sampling

During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors,
including whether the statements use parallel processing or serial processing.

For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use.
The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive,
so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting
unless set to a nondefault value, in which case the value is honored.

For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and
is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time
 could have a huge impact on their performance.

Did you spot the difference ? on 10GR2 the decision of the level completely depends on the optimizer_dynamic_sampling parameter (system,session,hint level),
however on 11GR2 for parallel statements optimizer “automatically” decides whether to use dynamic sampling and which level to use.

Since 2005 where my Oracle career starts, I learned one thing, When there is something “automagically” adjusted and decided, there is always big possibility for less stability.

Below is the difference in a simple case where I create a table, do not gather statistics, keep optimizer_dynamic_sampling=2 and run explain plan with parallel hint.


HR@ORACOSL> create table ds_test
as
with generator as (
select--+ materialize
rownum id
from dual
connect by
level  explain plan for select  * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4280396642

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |   185M|    15G|   469K  (1)| 01:33:57 |
|   1 |  TABLE ACCESS FULL| DS_TEST |   185M|    15G|   469K  (1)| 01:33:57 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)

17 rows selected.

-----WITH PARALLELISM

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    92M|  7654M|   260K  (1)| 00:52:02 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=6)
   - automatic DOP: skipped because of IO calibrate statistics are missing

22 rows selected.

-----WITH OFE=10.2.0.3

HR@ORACOSL> alter session set optimizer_features_enable='10.2.0.3';

Session altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:08:41 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

22 rows selected.

As you can see Oracle adjusted my default dynamic sampling level to 6 on OFE 11.2.0.3 when the statement is parallel and keeps it as Level 2 when statement is serial or OFE is lower than 11.2.0.1. Since this is a simple case there is no effect but when you have complex joins which you are sure about the plan you can suddenly start to see changes and got heavily hit by the feature. I can assure, I had the issue with tables that has got statistics as well so it is not the case for tables that need dynamic sampling.

Since I introduced you with the feature, next question is how can we control the feature?. I think optimizer guys did a mistake here and did not give any control via an optimizer parameter. How do I know it ? If you follow what I have done in the optimizer features post you can see there is no controlling on this feature via a parameter. So we only have optimizer bugfixes as controller.
Lets check the bugfixes related with dynamic sampling.

For this I use the sql below for bugfix.sql

column sql_feaure for a34
column ofe for a10
column description for a64

select bugno,optimizer_feature_enable ofe ,sql_feature,description,value from v$system_fix_control where (sql_feature like upper('%&1%') or upper(description) like '%&1%')
order by 2 desc nulls last,1 asc;

Results for calling it with search key as ‘dynamic’

HR@ORACOSL> @bugfix dynamic

     BUGNO OFE        SQL_FEATURE                                                      DESCRIPTION                                                           VALUE
---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
  12399886 11.2.0.3   QKSFM_DYNAMIC_SAMPLING_12399886                                  update the index clustering factor (DS) if statistics do not exi          1
   6408301 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_6408301                                   use recursive idx selectivity for partitioned table as well               1
   8767442 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_8767442                                   compute NDV for all columns in a multi-column join key using DS           1
   9272549 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_9272549                                   do not sample columns which have statistics                               1
   7452863 11.2.0.1   QKSFM_DYNAMIC_SAMPLING_7452863                                   adjust DS level for large parallel tables based on the size               1
   6766962 11.1.0.7   QKSFM_DYNAMIC_SAMPLING_6766962                                   sample size in DS is at most equal to the partition sample size           1
   6956212 10.2.0.5   QKSFM_DYNAMIC_SAMPLING_6956212                                   allow dynamic sampling when OR-expansion is used                          1
   7592673 10.2.0.5   QKSFM_DYNAMIC_SAMPLING_7592673                                   do not use dyn sampling index selectivity if not all keys sample          1
   6608941            QKSFM_DYNAMIC_SAMPLING_6608941                                   allow dynamic sampling on fixed tables                                    0
   6708183            QKSFM_DYNAMIC_SAMPLING_6708183                                   allow dynamic sampling on table functions                                 0

10 rows selected.

If you look carefully to the ones which are introduced on 11.2.0.* you can see the 7452863 with description adjust DS level for large parallel tables based on the size.

Now time to try to disable the feature ( alter session set “_fix_control”=’7452863:OFF';) and see what the effect is

HR@ORACOSL> alter session set optimizer_features_enable='11.2.0.3';

Session altered.

HR@ORACOSL> alter session set "_fix_control"='7452863:OFF';

Session altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:52:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: skipped because of IO calibrate statistics are missing

22 rows selected.

As you can see dynamic sampling is back to the default level 2. You can use this fix control till oracle comes up with a better idea for your parallel queries where you want plan stability and you are happy with your defauly dynamic sampling level.

Attention I must also warn the a bit of unexpected behaviour that this automagical adjustment is working a bit weird. When I set the dynamic sampling to level higher than 2 Oracle accepts the new level as adjusted value like below

Update: After reading the blog posts Dominic shared in comments, I ‘m now sure feature is only active with default optimizer_dynamic_sampling=2

HR@ORACOSL> alter system set optimizer_dynamic_sampling=3;

System altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:52:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=3)
   - automatic DOP: skipped because of IO calibrate statistics are missing

10053 trace has got the info below when it adjust dynamic_sampling to 6 when the default is 2

Dynamic sampling level auto-adjusted from 6 to 6  

*** 2012-05-31 20:23:02.408
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 6).
** Dynamic sampling updated table stats.: blocks=1726021

I still don’t get why it says from 6 to 6 normally it should be from 2 to 6 ()

and info below when we set it to level 3 either session or system level

*************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
Compilation Environment Dump
sqlstat_enabled                     = true
optimizer_dynamic_sampling          = 3
statistics_level                    = all
parallel_query_default_dop          = 2
parallel_hinted                     = default
_optimizer_use_feedback             = false
Bug Fix Control Environment

*** 2012-05-31 20:25:34.191
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 3).
** Dynamic sampling updated table stats.: blocks=1726021

With this behaviour I started to think it only kicks in when dynamic sampling is level 2 but I need to confirm it. I will update the post if I get something different than my final thought.

For more information you can check the MOS note below

Different Level for Dynamic Sampling used than the Specified Level [ID 1102413.1]

Bugs:
Bug 11841491 : POOR PERFORMANCE WITH FIX 7452863
Bug 10623519 : SQL3 – INCORRECT CARDINALITY ESTIMATES IN 11G WITH MULTIPLE FILTERS ON SAME TABLE
Bug 14059307 : AUTO-ADJUST IMPOSED DYNAMIC SAMPLING INACCURATE FOR LARGE PARTITIONED TABLE
Bug 7832981 : WITH FIX 7676927 MERGE SQL TAKES LONG AND CONSUMES LOT OF TEMP SPACE

Update:

Additional more scientific info is in the blog posts below (Thanks to Dominic’s comment )

Randolf Geist touches on it here:
http://allthingsoracle.com/dynamic-sampling-ii-controlling-the-activity-part-2/

Also Jean Pierre-Dijcks here:

https://blogs.oracle.com/datawarehousing/entry/explaining_explain_plan_note

April 11, 2012

When dbms_xplan.display_sql_plan_baseline fails to show the plan

Filed under: Performance, Plan Stability, Tips — coskan @ 8:45 pm

Last week my colleague (Paul Reed – my new content provider with issues on his databases :) ), had a problem where he needed to identify which baseline was using index
on a plan where collection table (like SELECT column FROM TABLE(:B1) ) was used.

When he tried DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE he got “ERROR: An error has happened, error code: 22905″ which was related with the bug below.

DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE returns "ERROR: An error has happened, error code: 22905" [ID 1418186.1]

When he asked me, if there is a workaround, I was under the impression that dba_sql_plan_baselines has got the outline information, so I asked him to check if he can get the outline info.
He checked and make us realize the truth that cbo developers did not make outline information available to the public within dba_sql_plan_baselines table (I’m still thinking why not ??)

We checked the other options (dba_hist_active_sess_history.SQL_PLAN_OPERATION) and found out what we needed but this was not enough for me.
While I was thinking how we can get the outline, my collegue managed to get the answer accidentally which is the main reason, I’m writing this blog post.

To keep the baseline safe he was backing up them to staging table with DBMS_SPM.pack_stgtab_baseline procedure
and when he select from the staging table he saw that comp_data column is actually the outline information.

Here is how he did it with a little bit of polishing.
(I will be using the baseline generated on this post - dbms_xplan.display_sql_plan_baseline works for this but not important for the post purposes!!)

 

DBA_SQL_PLAN_BASELINE content

HR@ORACOS> @printtab 'select * from dba_sql_plan_baselines where plan_name="SQL_PLAN_13w748wknkcwd8576eb1f"'
SIGNATURE                     : 1292784087274697613
SQL_HANDLE                    : SYS_SQL_11f0e4472549338d
SQL_TEXT                      : select count(*), max(col2) from t1 where flag = :n
PLAN_NAME                     : SQL_PLAN_13w748wknkcwd8576eb1f
CREATOR                       : HR
ORIGIN                        : MANUAL-LOAD
PARSING_SCHEMA_NAME           : HR
DESCRIPTION                   :
VERSION                       : 11.2.0.1.0
CREATED                       : 23-MAR-12 10.23.23.000000
LAST_MODIFIED                 : 23-MAR-12 10.23.23.000000
LAST_EXECUTED                 : 23-MAR-12 10.24.56.000000
LAST_VERIFIED                 :
ENABLED                       : YES
ACCEPTED                      : YES
FIXED                         : NO
AUTOPURGE                     : YES
OPTIMIZER_COST                : 2
MODULE                        : SQL*Plus
ACTION                        :
EXECUTIONS                    : 3
ELAPSED_TIME                  : 14772
CPU_TIME                      : 0
BUFFER_GETS                   : 9
DISK_READS                    : 0
DIRECT_WRITES                 : 0
ROWS_PROCESSED                : 3
FETCHES                       : 3
END_OF_FETCH_COUNT            : 3
-----------------
HR@ORACOS>

As you can see baseline is there but no outline info available to the public.

Lets create a staging table then pack the baseline.

Staging Table Creation

BEGIN
  DBMS_SPM.CREATE_STGTAB_BASELINE(
    table_name      => 'SPM_STAGING',
    table_owner     => 'HR'
);
END;
/

PL/SQL procedure successfully completed.

Packing the baseline

SET SERVEROUTPUT ON
DECLARE
l_plans_packed PLS_INTEGER;
BEGIN
l_plans_packed := DBMS_SPM.pack_stgtab_baseline(
table_name =&gt; 'SPM_STAGING',
table_owner =&gt; 'HR',
sql_handle=&gt;'SYS_SQL_11f0e4472549338d',
plan_name=&gt;'SQL_PLAN_13w748wknkcwd8576eb1f'
);
DBMS_OUTPUT.put_line('Plans Packed: ' || l_plans_packed);
END;
/

Plans Packed: 1

PL/SQL procedure successfully completed.

Lets see what is inside the SPM_STAGING table where we packed the data into

 

Staging Table Content 

R@ORACOS> HR@ORACOS> @printtab 'select * from spm_staging'
VERSION                       : 2
SIGNATURE                     : 1292784087274697613
SQL_HANDLE                    : SYS_SQL_11f0e4472549338d
OBJ_NAME                      : SQL_PLAN_13w748wknkcwd8576eb1f
OBJ_TYPE                      : SQL_PLAN_BASELINE
PLAN_ID                       : 2239163167
SQL_TEXT                      : select count(*), max(col2) from t1 where flag = :n
CREATOR                       : HR
ORIGIN                        : MANUAL-LOAD
DESCRIPTION                   :
DB_VERSION                    : 11.2.0.1.0
CREATED                       : 23-MAR-12 10.23.23.000000
LAST_MODIFIED                 : 23-MAR-12 10.23.23.000000
LAST_EXECUTED                 : 23-MAR-12 10.24.56.000000
LAST_VERIFIED                 :
STATUS                        : 11
OPTIMIZER_COST                : 2
MODULE                        : SQL*Plus
ACTION                        :
EXECUTIONS                    : 3
ELAPSED_TIME                  : 14772
CPU_TIME                      : 0
BUFFER_GETS                   : 9
DISK_READS                    : 0
DIRECT_WRITES                 : 0
ROWS_PROCESSED                : 3
FETCHES                       : 3
END_OF_FETCH_COUNT            : 3
CATEGORY                      : DEFAULT
SQLFLAGS                      : 0
TASK_ID                       :
TASK_EXEC_NAME                :
TASK_OBJ_ID                   :
TASK_FND_ID                   :
TASK_REC_ID                   :
INUSE_FEATURES                : 1
PARSE_CPU_TIME                :
PRIORITY                      :
OPTIMIZER_ENV                 :
E289FB89E126A80034011000AEF9C3E2CFFA331056414555519521105555551545545558591555449665851D5511058555555155515122555415A0EA0E5551454265455454449081566E001696A35615551403025415505AE126A800050402000002000000100000000100002000000208D007000000001000243303000101000038F8000400009299010000000800E126A80065646
40202643202320000020003020A0A05050A140002000032F4010000500A0A0A0A64E803000064E08003FFFF00000804020000080032
BIND_DATA                     : BEDA0B2001004F6C4EAD000101F001200369025931
PARSING_SCHEMA_NAME           : HR
COMP_DATA                     :
<![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]><![CDATA[OPTIMIZER_FEATURES_ENABLE('11.2.0.1')]]><![CDATA[DB_VERSION('11.2.0.1')]]><![CDATA[ALL_ROWS]]><![CDATA[OUTLINE_LEAF(@"SEL$1")]]><![CDATA[INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."FLAG"))]]>
-----------------

PL/SQL procedure successfully completed.

As you can see there are more columns inside the staging table and more importantly COMP_DATA column has got the outline information.

This way outline data is a bit hard to read. You can paste the information to an xml parser or you can use the query below to get a better output.

 COMP_DATA content  (outline information)

select
    extractvalue(value(d), '/hint') as outline_hints
    from
    xmltable('/outline_data/hint'
    passing (
    select
    xmltype(comp_data) as xmlval
    from
    spm_staging
   )    ) d;

OUTLINE_HINTS
--------------------------------------------------------
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
DB_VERSION('11.2.0.1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."FLAG"))

It is simple for this query but It can be hard to guess the plan from this output what you can do is paste the outline information the the sql as hint like below.
Only line you need to remove is IGNORE_OPTIM_EMBEDDED_HINTS to avoid ignoring hints.

var n varchar2(2)
exec :n := 'N1'
select
/*+
OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
DB_VERSION('11.2.0.1')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."FLAG"))
*/
count(*), max(col2) from t1 where flag = :n;

PL/SQL procedure successfully completed.

HR@ORACOS>   2    3    4    5    6    7    8    9

  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Generated plan with outline

HR@ORACOS>  select * from table(dbms_xplan.display_cursor(null,null,'BASIC +OUTLINE'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EXPLAINED SQL STATEMENT:
------------------------
select /*+ OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
DB_VERSION('11.2.0.1') ALL_ROWS OUTLINE_LEAF(@"SEL$1")
INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."FLAG")) */ count(*),
max(col2) from t1 where flag = :n

Plan hash value: 3625400295

---------------------------------------------
| Id  | Operation                    | Name |
---------------------------------------------
|   0 | SELECT STATEMENT             |      |
|   1 |  SORT AGGREGATE              |      |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |
|   3 |    INDEX RANGE SCAN          | I1   |
---------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      DB_VERSION('11.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."FLAG"))
      END_OUTLINE_DATA
  */

As you can see with this method we can find see outline for the baseline and with using that outline the plan inside the baseline (%99 of the time-sql should be same and also underlying objects)

You can use this method (till oracle provides outlines) when you hit the bugs below as well.
As long as you have a baseline it can be used. I also think about couple of other usages but I will keep them for future posts so you can stay tuned.

Bug 8366552: ORA-6502 WHEN TRYING TO DISPLAY PLANS FROM SQL BASELINE
Bug 12335250: DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE CAUSES ORA-979 IF CURSOR_SHARING=FORCE/SIMI
Bug 12833789: DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE RAISING ERROR 22905
Bug 10637509: DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE RECEIVES ORA-6502

Bug 10254875: DBMS_XPLAN.DISPLAY_SQL_PLAN_BASELINE RETURNS: ERROR CODE: 1435

Bonus Material for Geeks and SYSDBA access owners

Same information is also available with the sql below –this sql is multi purpose so you can use it for sql_patches and sql_profiles


select * from (
SELECT
  sa.signature,
  st.sql_handle,
  so.name obj_name,
  decode(sa.obj_type, 1,'SQL_PROFILE', decode(sa.obj_type, 2,'SQL_PLAN_BASELINE',decode(sa.obj_type, 3,'SQL_PATCH', null))) obj_type,
  sa.plan_id,
  st.sql_text,
  sa.creator,
  dbms_smb_internal.map_origin_base_to_view(sa.origin, sa.obj_type)
                         origin,
  sa.description,
  sa.version db_version,
  sa.created,
  sa.last_modified,
  so.last_executed,
  sa.last_verified,
  so.flags status,
  sa.optimizer_cost,
  sa.module,
  sa.action,
  sa.executions,
  sa.elapsed_time,
  sa.cpu_time,
  sa.buffer_gets,
  sa.disk_reads,
  sa.direct_writes,
  sa.rows_processed,
  sa.fetches,
  sa.end_of_fetch_count,
  sa.category,
  s.flags sqlflags,
  sa.task_id,
  sa.task_exec_name,
  sa.task_obj_id,
  sa.task_fnd_id,
  sa.task_rec_id,
  s.inuse_features,
  sa.parse_cpu_time,
  sa.priority,
  sa.optimizer_env,
  sa.bind_data,
  sa.parsing_schema_name,
  sd.comp_data
FROM
  sql$ s,
  sqlobj$auxdata sa,
  sqlobj$ so,
  sql$text st,
  sqlobj$data sd
WHERE
  sa.signature = s.signature and
  sa.signature = so.signature and
  sa.category = so.category and
  sa.obj_type = so.obj_type and
  sa.plan_id = so.plan_id and
  sa.signature = st.signature and
  sa.signature = sd.signature and
  sa.category = sd.category and
  sa.obj_type = sd.obj_type and
  sa.plan_id = sd.plan_id) v
 WHERE
 obj_name='SQL_PLAN_13w748wknkcwd8576eb1f'        ----PLAN NAME
 and category is not null
 and sql_handle = 'SYS_SQL_11f0e4472549338d'      -----PLAN HANDLE
 and obj_type = 'SQL_PLAN_BASELINE'
 and sql_text is not null

March 23, 2012

Adaptive Cursor Sharing with SQL Plan Baselines – Bind Sensitiveness

Filed under: Performance, Plan Stability — coskan @ 11:16 am

Today I saw a post from my friend Dominic Brooks (orastory) whom I work together at the same place for different environments (Wish he is working on the environments I look after together we could have made a great performance team :(). His post is about how Adaptive Cursor Sharing and SQL Plan Baselines work together . Post looks  long but easy to read and enlightening about the issue with one small addition/correction/observation needed. I think he covered well enough but finished his tests a bit soon. I will not repeat what he has done but I need to cover the additional observation/correction part. After the introduction, half way through wrote a section “How then do ACS and SQL Plan Baselines work together?” (wish he uses some marking for subtopics – thats why we have bold and headings Dom). At the end of that section he says “We’re using one of our baseline plans but we’ve lost our bind-awareness.” which is not complete  truth according to  my tests on 11.2.0.1.

I assume table is created, index is created, stats are gathered, two sql_plan_baseline is generated and cursor is flushed as below

HR@ORACOS> select sql_id
    ,      child_number
    ,      is_bind_aware
    ,      is_bind_sensitive
    ,      is_shareable
    ,      to_char(exact_matching_signature) sig
    ,      executions
   ,      plan_hash_value,
   sql_plan_baseline
   from   v$sql
  2    3    4    5    6    7    8    9   10   11    where  sql_id = '731b98a8u0knf';

no rows selected

HR@ORACOS> select to_char(signature) signature, sql_handle, plan_name, enabled, accepted
    from   dba_sql_plan_baselines
  2    3     where   signature = 1292784087274697613;

SIGNATURE                                SQL_HANDLE                     PLAN_NAME                      ENA ACC
---------------------------------------- ------------------------------ ------------------------------ --- ---
1292784087274697613                      SYS_SQL_11f0e4472549338d       SQL_PLAN_13w748wknkcwd616acf47 YES YES
1292784087274697613                      SYS_SQL_11f0e4472549338d       SQL_PLAN_13w748wknkcwd8576eb1f YES YES

First Run — Full table scan —To make the post short I named v$sql query as sqlstat

var n varchar2(2)
exec :n := 'N1'
select count(*), max(col2) from t1 where flag = :n;
HR@ORACOS>
PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 0
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   203 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 49298 |  1444K|   203   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

23 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y Y 1292784087274697613                               1      3724264953 SQL_PLAN_13w748wknkcwd616acf47

Statement is not bind aware which is expected but is bind sensitive and sharable and using baseline

Second Run — Index Scan

  exec :n := 'Y1'
select count(*), max(col2) from t1 where flag = :n;

PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
         1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 0
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   203 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 49298 |  1444K|   203   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

23 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y Y 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47

Statement is not bind aware which is “normal” as Optimizer group already wrote about in their optimizer blog and cursor is still sharable.
Optimizer group said

"Oracle monitors the behavior of the queries, and determined that the different bind values caused the data volumes
manipulated by the query to be significantly different. Based on this difference, Oracle "adapts" its behavior so that
the same plan is not always shared for this query.
Hence on the next execution, a new plan is generated based on the current bind value."

Lets see if it is like that when baselines are in place

Third Run — Index Scan

  exec :n := 'Y1'
select count(*), max(col2) from t1 where flag = :n;

PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
         1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 1
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3625400295

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |      |     1 |    30 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |     1 |    30 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I1   |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd8576eb1f used for this statement

24 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y Y 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               1      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f

Now on the second run of index favouring bind, new cursor is generated and bind aware also using the index scan baseline which is correct behaviour according to the optimizer groups statement.

Fourth Run — Table Scan

exec :n := 'N1'
select count(*), max(col2) from t1 where flag = :n;

PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 2
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   203 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 49298 |  1444K|   203   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

23 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y N 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               1      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f
731b98a8u0knf            2 Y Y Y 1292784087274697613                               1      3724264953 SQL_PLAN_13w748wknkcwd616acf47

Now on the second run of table scan again new cursor is generated and again bind aware also using the right full table scan baseline. Please also look at the first cursor which is not sharable anymore.

Fifth Run — Index Scan

   exec :n := 'Y1'
select count(*), max(col2) from t1 where flag = :n;

PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
         1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 1
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3625400295

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |      |     1 |    30 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |     1 |    30 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I1   |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd8576eb1f used for this statement

24 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y N 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               2      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f
731b98a8u0knf            2 Y Y Y 1292784087274697613                               1      3724264953 SQL_PLAN_13w748wknkcwd616acf47

ORacle uses correct child cursor which is first child and hasn’t got any of is bind awareness or sensitivity changed

Sixth Run — Table Scan

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 2
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   203 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 49298 |  1444K|   203   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

23 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y N 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               2      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f
731b98a8u0knf            2 Y Y Y 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47

ORacle uses correct child cursor which is second child and hasn’t got any of is bind awareness or sensitivity changed

Bonus Run with different Bind N2 which should again favour full scan

exec :n := 'N2'
select count(*), max(col2) from t1 where flag = :n;
select * from table(dbms_xplan.display_cursor);

@sqlstat

PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
     49999 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS>
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 2
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |   203 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    30 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 49298 |  1444K|   203   (1)| 00:00:02 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd616acf47 used for this statement

23 rows selected.

HR@ORACOS> HR@ORACOS>
SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y N 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               2      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f
731b98a8u0knf            2 Y Y Y 1292784087274697613                               3      3724264953 SQL_PLAN_13w748wknkcwd616acf47

Oracle is doing the right thing by using the child no 1 with right baseline

Bonus Run with different Bind Y2 which should again favour Index Scan

   exec :n := 'Y2'
select count(*), max(col2) from t1 where flag = :n;
select * from table(dbms_xplan.display_cursor);
HR@ORACOS>
PL/SQL procedure successfully completed.

HR@ORACOS>
  COUNT(*) MAX(COL2)
---------- --------------------------------------------------
         1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

HR@ORACOS>
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  731b98a8u0knf, child number 1
-------------------------------------
select count(*), max(col2) from t1 where flag = :n

Plan hash value: 3625400295

-------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE              |      |     1 |    30 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |     1 |    30 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | I1   |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("FLAG"=:N)

Note
-----
   - SQL plan baseline SQL_PLAN_13w748wknkcwd8576eb1f used for this statement

24 rows selected.

HR@ORACOS> @sqlstat

SQL_ID        CHILD_NUMBER I I I SIG                                      EXECUTIONS PLAN_HASH_VALUE SQL_PLAN_BASELINE
------------- ------------ - - - ---------------------------------------- ---------- --------------- ------------------------------
731b98a8u0knf            0 N Y N 1292784087274697613                               2      3724264953 SQL_PLAN_13w748wknkcwd616acf47
731b98a8u0knf            1 Y Y Y 1292784087274697613                               3      3625400295 SQL_PLAN_13w748wknkcwd8576eb1f
731b98a8u0knf            2 Y Y Y 1292784087274697613                               3      3724264953 SQL_PLAN_13w748wknkcwd616acf47

Oracle is doing the right thing by using the second child no 2 with right baseline.

Long story short I personally think they work perfectly fine together but I also wish if Oracle gives option to hold this runtime monitoring info in SYSAUX for env where people can accommodate more data in SYSAUX. This will save a lot of time for the initial loads.

To do a fair comparison I needed to do the tests on 11.2.0.1 but Don’t have 11.2.0.2 right now but I’m assuming it should be the same (I think I lost my appetite of installing new oracle version just for fun of it :) )

January 13, 2012

Session Snapper on Duty 2 – (how to support your answers, with proof)

Filed under: Basics, Performance, Tips — coskan @ 10:51 pm

Warning : This long blog is for educational and quality purposes and especially targeting Junior DBAs and DBAs who does not know how to use the Session Snapper
(greatest Oracle Tuning Tool yet) by Tanel Poder . If you feel comfortable with snapper and you think your relationship with your
clients just rocks then there is no need to read the post but you may still give it a go :)

Story started with a long email thread forward by a  developer who is asking question below for a system I am responsible but not feeling perfectly comfortable about how things really work.

Hi Josh (note to reader: Coskan is pronounced as Joshkahn so I use Josh at office)
Any ideas on how we can work with dbas to find out why our database is only committing at a range of 
2-30 sec for 100 commits instead of the ideal 100 commits a sec for a third party process?

I answered

From my point of view 

I will be requesting 

1-	Time interval when this process run
2-	If you have SID information when this process run 
3-	If it gives same results on UAT as well then can we run the operation while tracing is on (which need Change Request-hassle on production)

Then we can contione to understand what can be the problem

Developer replied

It runs 7/24 and UAT testing is not possible since we can't point the feeds to UAT but username is USER_XXX 
if it helps.

Long story short all I understood was in simplest words  they want to be able to process more data in same amount of time,  so I replied with a long mail below

BEGINNING OF THE EMAIL

I did some sampling on your sessions to understand what is happening and I think I can come up with rough conclusions about improvement suggestions.

Lets walk through on what is/maybe happening

This is 500 second sampling of one of your sessions

---Some important Statistics
-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
     71, USER_XXX  , STAT, user commits                                              ,        326,        .65,
     71, USER_XXX  , STAT, session logical reads                                     ,      1.36M,      2.71k,

—Wait profile

     71, USER_XXX  , WAIT, buffer busy waits                                         ,     1.18ms,     2.36us,      .0%, |          |
     71, USER_XXX  , WAIT, log file sync                                             ,     36.12s,    72.24ms,     7.2%, |@         |
     71, USER_XXX  , WAIT, db file sequential read                                   ,    276.17s,   552.34ms,    55.2%, |@@@@@@    |
     71, USER_XXX  , WAIT, flashback buf free by RVWR                                ,   403.28ms,   806.55us,      .1%, |          |
     71, USER_XXX  , WAIT, undo segment extension                                    ,    72.93ms,   145.86us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message to client                                 ,     2.23ms,     4.47us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message from client                               ,     178.2s,    356.4ms,    35.6%, |@@@@      |
     71, USER_XXX  , WAIT, SQL*Net more data from client                             ,   710.26ms,     1.42ms,      .1%, |          |

—Wait profile breakdown per sql

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
    20% | ggjkmtv8cxxh3   | db file sequential read             | User I/O
    19% | 64djbhqv5v93u   | db file sequential read             | User I/O
    13% | 3b8gt3dqtn345   | db file sequential read             | User I/O
     8% |                 | log file sync                       | Commit
     2% | 8zt375nzk26rh   | db file sequential read             | User I/O
     1% | 8ztz7p814q9u2   | db file sequential read             | User I/O
     1% | ggjkmtv8cxxh3   | ON CPU                              | ON CPU
     1% | 64djbhqv5v93u   | ON CPU                              | ON CPU
     0% | 64djbhqv5v93u   | flashback buf free by RVWR          | Configuration
     0% | 64djbhqv5v93u   | SQL*Net more data from client       | Network

--  End of ASH snap 1, end=2012-01-11 21:51:46, seconds=500, samples_taken=499

When we check the stats, you have done 326 commits in 500 seconds which makes 0.65 commit per second but if we take 180 second of idle events (SQL*Net message to client + SQL*Net message from client) out, it is 326 commits in 320 (500-180) seconds active time which is roughly 1 commit per second.

When we look at the wait profile we can see 276 seconds of “db file sequential read” (physical read) over 320 second ofnon-iddle period and 36  second  “log file sync” which we will come later

If we focus on the “db file sequential read” we can see it is %86 (276/320) of the active time and there are 3 main contributer sqls for this case.

SQLs are below

SQL> @sqlid2 ggjkmtv8cxxh3

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ---------------------------------------------------------------------------------------------------------------
ggjkmtv8cxxh3 3503224323 DELETE FROM TABLE_T WHERE ESMP = :1

1 row selected.

  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
4033870845        4          0         21      20219           .0        392.7       .26172575          1198.3            28.1


Plan hash value: 4033870845

-----------------------------------------------------------------------------------------------
| Id  | Operation         | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                           |       |       |     4 (100)|          |
|   1 |  DELETE           | TABLE_T                   |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| PK_TABLE_T                |     2 |    50 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
SQL> @sqlid2 64djbhqv5v93u

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
64djbhqv5v93u 3059590266 some complex update statement which does not need to be shared

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
1856144614        3         26          2     473152           .0         85.4      .270458035           454.4            25.4

Plan hash value: 1856144614

-------------------------------------------------------------------------------------------
| Id  | Operation          | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |                      |       |       |     3 (100)|          |
|   1 |  UPDATE            | TABLE_XXXXXXX        |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| PK_TABLE_XXXXXXX     |     1 |   192 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

SQL> @sqlid2 3b8gt3dqtn345

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
3b8gt3dqtn345 1838812293 some complex update statement which does not need to be shared

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
--------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
861798698        3          0          7      22308           .0         97.5      .113096364           382.6            10.9

Plan hash value: 861798698

------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |                           |       |       |     3 (100)|          |
|   1 |  UPDATE            | TABLE_ZZZZ                |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| PK_TABLE_ZZZZ             |     1 |    69 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

There is nothing much we can do about index unique scans other than changing the table structure but I think there is room for improve on index range scan for sql_id ggjkmtv8cxxh3 for delete statement which is waitig on db file sequential read for  %31 (500*%20 / 320 ) of the active time.

The reason the delete is doing index range scan is because we are trying to access 3 column composite index PK_TABLE_T by only using 1 column which is COL1. If business logic lets you also give other 2 columns as filter as well we will be doing much throughput with less LIO and PIO

TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME
-------------------- ------------------------------ ------------------------------ ---- ------------------------------
APP_OWNER            TABLE_T                        PK_TABLE_T                        1 COL1
                                                                                      2 COL2
                                                                                      3 COL3

While I was doing sampling on above session at the same time I also did sampling on another session which is also having similar behaviour with different sqls like below.

—Some important Statistics

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
    192, USER_XXX  , STAT, user commits                                              ,        343,        .69,
    192, USER_XXX  , STAT, session logical reads                                     ,      2.78M,      5.56k,

—Wait profile

  192, USER_XXX  , WAIT, buffer busy waits                                         ,      702us,      1.4us,      .0%, |          |
  192, USER_XXX  , WAIT, log file switch completion                                ,    94.34ms,   188.68us,      .0%, |          |
  192, USER_XXX  , WAIT, log file sync                                             ,     28.26s,    56.53ms,     5.7%, |@         |
  192, USER_XXX  , WAIT, db file sequential read                                   ,     364.3s,    728.6ms,    72.9%, |@@@@@@@@  |
  192, USER_XXX  , WAIT, flashback buf free by RVWR                                ,   348.79ms,   697.57us,      .1%, |          |
  192, USER_XXX  , WAIT, undo segment extension                                    ,    492.4ms,    984.8us,      .1%, |          |
  192, USER_XXX  , WAIT, SQL*Net message to client                                 ,     1.03ms,     2.07us,      .0%, |          |
  192, USER_XXX  , WAIT, SQL*Net message from client                               ,     94.02s,   188.05ms,    18.8%, |@@        |
  192, USER_XXX  , WAIT, SQL*Net more data from client                             ,   368.33ms,   736.67us,      .1%, |          |

—Wait profile breakdown per sql

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
    37% | dkh6b1nhnhfcu   | db file sequential read             | User I/O
    32% | 29zz9n1b9sppr   | db file sequential read             | User I/O
     7% |                 | log file sync                       | Commit
     1% | dkh6b1nhnhfcu   | ON CPU                              | ON CPU
     1% | 29zz9n1b9sppr   | ON CPU                              | ON CPU
     0% | 29zz9n1b9sppr   | undo segment extension              | Configuration
     0% | 29zz9n1b9sppr   | flashback buf free by RVWR          | Configuration

--  End of ASH snap 1, end=2012-01-11 21:52:17, seconds=500, samples_taken=499

This time 94 seconds idle with 406 (500-94) second processing time which gives us 0.84 (343/406) commit per second and 392 seconds of wait
(364 “db file sequential read”+28 “log file sync”) wait.

When we look at the sqls below this time we can see 160 (500*%32) of 364 seconds (%43 of active time) is being spent on again index range scan (sqlid 29zz9n1b9sppr) on delete statement again due to the lack of composite index column filtering.

I again believe we can improve the access if we are able to give all columns as filter and maintain index unique scan.


TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME                    DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
APP_OWNER            TABLE_Z                        PK_TABLE_Z                      1    COL1
                                                                                    2    COL2
                                                                                    3    COL3

SQL> @sqlid2 dkh6b1nhnhfcu

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
dkh6b1nhnhfcu  558381466 INSERT INTO APP_OWNER.TABLE_Z where bla bla bla bla

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
--------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
        0        1        778         15     102566           .0       1065.3      .177031281          5673.4            30.4

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL |      |       |
-------------------------------------------------

SQL> @sqlid2 29zz9n1b9sppr

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- -------------------------------------------------------------------------------------------------------
29zz9n1b9sppr 1453086391 DELETE FROM APP_OWNER.TABLE_Z WHERE ESMP = :1

1 row selected.

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
3229062859        4          0         30      21308           .0       1136.6      .642183193          2081.5            68.1

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                  |       |       |     4 (100)|          |
|   1 |  DELETE           | TABLE_Z          |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| PK_TABLE_Z       |     5 |   135 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

I think I managed to make it clear about the index range scan (for more information see the bonus material below), Now Lets see what the log file sync is all about and why we are seeing it .
<p

Log File sync event has got definition below in Oracle Documentation.

log file sync

When a user session commits, the session’s redo information needs to be flushed to the redo logfile. The user session will post the LGWR to write the log buffer to the redo log file. When the LGWR has finished writing, it will post the user session.

Wait Time: The wait time includes the writing of the log buffer and the post.

Parameter Description
buffer# The number of the physical buffer in the redo log buffer that needs to be synchronized

Potential causes

Wait Event General Area Possible Causes Look for / Examine
log file sync I/O, over- committing Slow disks that store the online logs

Un-batched commits

Check the disks that house the online redo logs for resource contention. Check the number of transactions (commits + rollbacks) each second, from V$SYSSTAT.

First rule of tuning log file sync is doing less commit which means instead of commiting every dml or every single operation like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
commit;
end loop;
end;
/

Doing commits in batch like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
end loop;
commit;
end;
/

To prove how you are doing commits, I just get before sql statistics then get sampling of the session then again get after sql statistics and to see if number of user commits statistics during sampling interval are matching with number of increase in the sql executions

—-note to blog reader: this is, how I did,  no rocket science  (ones you arm yourself with right tools) just run the block as whole

select sql_id,executions from v$sql where sql_id in ('29zz9n1b9sppr','ggjkmtv8cxxh3');
@snapper ash=sql_id+event,stats,gather=stw,sinclude=user%commits 20 1 71,192
select sql_id,executions from v$sql where sql_id in ('29zz9n1b9sppr','ggjkmtv8cxxh3');

Results are

SQL> @test1

SQL_ID        EXECUTIONS
------------- ----------
29zz9n1b9sppr      28897
ggjkmtv8cxxh3      28211

Sampling SID 71,192 with interval 100 seconds, taking 1 snapshots...

-- Session Snapper v3.53 by Tanel Poder ( http://blog.tanelpoder.com )

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
     71, USER_XXX  , STAT, user commits                                              ,         12,        .12,
     71, USER_XXX  , TIME, repeated bind elapsed time                                ,       59us,      .59us,      .0%, |          |
     71, USER_XXX  , TIME, parse time elapsed                                        ,      127us,     1.27us,      .0%, |          |
     71, USER_XXX  , TIME, DB CPU                                                    ,   278.96ms,     2.79ms,      .3%, |          |
     71, USER_XXX  , TIME, sql execute elapsed time                                  ,     36.32s,   363.21ms,    36.3%, |@@@@      |
     71, USER_XXX  , TIME, DB time                                                   ,     40.75s,   407.51ms,    40.8%, |@@@@@     |
     71, USER_XXX  , WAIT, log file sync                                             ,      4.42s,    44.22ms,     4.4%, |@         |
     71, USER_XXX  , WAIT, db file sequential read                                   ,     35.04s,   350.43ms,    35.0%, |@@@@      |
     71, USER_XXX  , WAIT, SQL*Net message to client                                 ,      242us,     2.42us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message from client                               ,     60.11s,   601.05ms,    60.1%, |@@@@@@    |
     71, USER_XXX  , WAIT, SQL*Net more data from client                             ,     3.94ms,    39.44us,      .0%, |          |
    192, USER_XXX  , STAT, user commits                                              ,         12,        .12,
    192, USER_XXX  , TIME, DB CPU                                                    ,   207.97ms,     2.08ms,      .2%, |          |
    192, USER_XXX  , TIME, sql execute elapsed time                                  ,     30.74s,   307.37ms,    30.7%, |@@@@      |
    192, USER_XXX  , TIME, DB time                                                   ,     35.14s,    351.4ms,    35.1%, |@@@@      |
    192, USER_XXX  , WAIT, log file sync                                             ,       4.4s,    43.98ms,     4.4%, |@         |
    192, USER_XXX  , WAIT, db file sequential read                                   ,     29.95s,   299.51ms,    30.0%, |@@@       |
    192, USER_XXX  , WAIT, SQL*Net message to client                                 ,      113us,     1.13us,      .0%, |          |
    192, USER_XXX  , WAIT, SQL*Net message from client                               ,     64.55s,   645.48ms,    64.5%, |@@@@@@@   |
    192, USER_XXX  , WAIT, SQL*Net more data from client                             ,    19.36ms,   193.62us,      .0%, |          |
--  End of Stats snap 1, end=2012-01-12 22:27:59, seconds=100

---------------------------------------------------------------
Active% | SQL_ID          | EVENT
---------------------------------------------------------------
    28% | 29zz9n1b9sppr   | db file sequential read             |
    12% | ggjkmtv8cxxh3   | db file sequential read             |
    11% | 3b8gt3dqtn345   | db file sequential read             |
     9% |                 | log file sync                       |
     9% | 64djbhqv5v93u   | db file sequential read             |
     4% | dkh6b1nhnhfcu   | db file sequential read             |
     1% | 8zt375nzk26rh   | db file sequential read             |
     1% | b513mwk4mf9rt   | db file sequential read             |
     1% | 0h01fy07y1c1p   | db file sequential read             |

--  End of ASH snap 1, end=2012-01-12 22:27:59, seconds=100, samples_taken=99

PL/SQL procedure successfully completed.

SQL_ID        EXECUTIONS
------------- ----------
29zz9n1b9sppr      28909
ggjkmtv8cxxh3      28223

As you can see, both sql increase by 12  (after-before execution times) and both sql show user commits statistics delta as 12 which means  there is a commit per execution that leads log file sync event.

After proving that the app is not doing batch commits,  I wonder if this is something avoidable or not,  in terms of business logic. Doing batch commits  will definitely help to increase your overall throughput plus whole system health.

So in my humble opinion we have two options without me seeing any code logic.

1- Providing unique index access for the delete operations by giving all columns of the index
2- Decreasing the number of commits and try to do batch commit.

For you to test all my actions, I have added a a simple synthetic test case attached to understand how these implementations can help us.

Cases tested.

1-Nonbatch commit range scan
2-Nonbatch commit unique scan

3-Batch commit range scan
4-Batch commit unique scan

Timings for the test are as follows for deleting 100K row sets as you can see

Elapsed Improvement
nonbatch-commit-range 84 0%
nonbatch-commit-unique 47 44%
batch-commit-range 55 34%
batch-commit-unique 35 58%

END OF THE EMAIL


Note to blog reader : (In case they are still reading), I wish you found something interesting about how I used the right tool right time
and how I answered the development questions without talking bollocks but numbers and proofs. One thing I need to admit there were only 2 sessions running this operation which made my life easy.

If you wonder the duration of whole diagnose: It took 15 mins in total to find out what is going on and 1 hour to write the mail but that 1 hour save at least 3-4 unnecessary email ping pong just because nobody can prove their fact with numbers.

Quick Tip: If you are sending any output to anybody use Courier New character set and make everybodies eyes happy.

Session Snapper and its usage can be found at Tanels website
Test case can be found  here

November 18, 2011

On demand AWR Report Generator, for Many Node Cluster

Filed under: Basics, Performance, RAC — coskan @ 11:34 am

update ( 10/10/2012 ): I have changed the script to make work better with hourly requests at it was causing issues. Thanks to Dominic Brooks for the trick
update ( 11/10/2012 ): Added ADDM and Global report generation

Since I got turbo boost (over 1000 hits a day-wooow) from Tanel Poder the Great, better I write something to deserve the fame. This one is a real quick one which made my life easier.

On my second week at current company, I had to cover Exadata for 2 weeks which were completely nightmare due to some problems. I had to work with Oracle Support too much and most of the time they ask AWR reports from 8 nodes for x hour duration. First day I was not armed with anything automatic, so tried to do it manually but it was proper nightmare for 8 nodes. Then I said this is complete rubbish way of wasting my time and I need to automate the process otherwise my life will be miserable. Finally I came up with something after couple of attempts

The script below is the result of this automation attempts.

update ( 11/10/2012 ): –not equal sign may have been corrupted during paste for (next_snap_id”0 AND previous_snap_id”0)

-- Name:    gen_snap.sql
--
-- Purpose: Automating ADDM and AWR and Global RAC report generation 

--Usage:  copy the script your script directory. Edit the parameters section then  run it ,  
---it will generate gen_awr.sql  script . Call that script for generating awr reports
---
--- ALWAYS CHECK gen_awr.sql script before you run !!!!!!!!! 
---
-- @gen_snap.sql  
-- @gen_awr 

-- Versioning:
-- Version    Modified by	Date		What Modified
-- -------    ---------------	----------	--------------
-- 1.0	      Coskan Gundogar	11/10/2012	Initial 


------------PARAMETERS - BEGIN -------------------------------

define _interval=2				   --1 for half hour 2 for 1 hour (30 mins snapshots) / 1 for  hour 2 for 2 hour (60 mins snapshots)
define _database_name='DATABASE_NAME'		 ---- database name 
define _begin_interval_date='111012 07:00:00'	 ----begin date DDMMYY HH24:MI:SS
define _end_interval_date='111012 10:00:00'      ----end date   DDMMYY HH24:MI:SS
define _folder='C:\awr\DATABASE_NAME\111012'  ---report output location
define _option=2 ---1 without ADDM 2 with ADDM 
define _global=1 ---0 if you don't want Global Reports for RAC - It will not generate report for Single Instance with any setting 

------------PARAMETERS - END-------------------------------


SET termout OFF 
host mkdir &_folder
SET termout ON
--host dir &_folder
prompt 
prompt 
prompt GEN_AWR.SQL script generation started
prompt 
SET termout OFF 
SET heading off
SET feedback OFF 
SET echo OFF 
SET linesize 155
SET verify OFF 

spool gen_awr.SQL
SELECT 'set veri off;'||CHR(10)||
'set feedback off;' ||CHR(10)||
'set linesize 1500;' ||CHR(10)||
'prompt '||CHR(10)||
'prompt '||CHR(10)||
'prompt REPORT GENERATION STARTED'||CHR(10)||
'prompt '||CHR(10)||
'SET TIME OFF'||CHR(10)||
'set termout off;'||CHR(10)||
'set long 1000000 pagesize 0 longchunksize 1000;'||CHR(10)||
'COLUMN get_clob format a80'||CHR(10)
FROM dual;
with driver as (
select *
from (
select rownum rn,dbid,snap_id,previous_snap_id,next_snap_id,begin_interval_time  from (
SELECT * FROM (
select instance_number,dbid,snap_id,
LEAD(snap_id, &_interval, 0) OVER (PARTITION BY instance_number ORDER BY snap_id DESC NULLS LAST) previous_snap_id,
LAG(snap_id, &_interval, 0) OVER (PARTITION BY instance_number ORDER BY snap_id DESC NULLS LAST) next_snap_id,
begin_interval_time 
from dba_hist_snapshot 
where dbid=(select dbid from v$database where name='&_database_name')
AND instance_number= (select min(inst_id) from gV$database)
AND begin_interval_time>=TO_DATE('&_begin_interval_date','DDMMYY HH24:MI:SS')-1/24
AND begin_interval_time<TO_DATE('&_end_interval_date','DDMMYY HH24:MI:SS')+1/24
) d WHERE next_snap_id0 AND previous_snap_id0 ---not equal sign may have been corrupted during paste
ORDER BY 3
)  s ) where (
CASE WHEN &_INTERVAL = 2 THEN mod(rn,2)  
WHEN &_INTERVAL=1 THEN 1
END)=1
)
SELECT  
CASE &_option
WHEN 1 THEN 
'spool &_folder\awrrptto_&_database_name'||CHR(95)||TO_CHAR(instance_number)||'_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'.html'||CHR(10)|| CHR(10)|| 
'SELECT * FROM TABLE(dbms_workload_repository.awr_report_html('||TO_CHAR(dbid)||','||TO_CHAR(instance_number)||','||TO_CHAR(snap_id)||','||TO_CHAR(next_snap_id)||',8));'||CHR(10)
||CHR(10)||'spool off'
WHEN 2 THEN 
'spool &_folder\awrrptto_&_database_name'||CHR(95)||TO_CHAR(instance_number)||'_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'.html'||CHR(10)|| CHR(10)|| 
'SELECT * FROM TABLE(dbms_workload_repository.awr_report_html('||TO_CHAR(dbid)||','||TO_CHAR(instance_number)||','||TO_CHAR(snap_id)||','||TO_CHAR(next_snap_id)||',8));'||CHR(10)
||CHR(10)||'spool off'||CHR(10)||
'BEGIN'||CHR(10)||
'DBMS_ADVISOR.create_task (advisor_name      => ''ADDM'',task_name=> ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'');'||CHR(10)||
'DBMS_ADVISOR.set_task_parameter (task_name => ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'',parameter => ''START_SNAPSHOT'',value=> '||TO_CHAR(snap_id)||');'||CHR(10)||
'DBMS_ADVISOR.set_task_parameter (task_name => ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'',parameter => ''END_SNAPSHOT'', value=> '||TO_CHAR(next_snap_id)||');'||CHR(10)||
'DBMS_ADVISOR.set_task_parameter (task_name => ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'',parameter => ''INSTANCE'', value=> '||TO_CHAR(instance_number)||');'||CHR(10)||
'DBMS_ADVISOR.set_task_parameter (task_name => ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'',parameter => ''DB_ID'', value=> '||TO_CHAR(dbid)||');'||CHR(10)||
'DBMS_ADVISOR.execute_task(task_name => ''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'');'||CHR(10)||
'END;'||CHR(10)||
'/'||CHR(10)||
'spool &_folder\addm_&_database_name'||CHR(95)||TO_CHAR(instance_number)||'_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT.txt'||CHR(10)||
'SELECT DBMS_ADVISOR.get_task_report(''ADDM_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'_AWR_SNAPSHOT'') AS report'||CHR(10)||
'FROM   dual;'||CHR(10)||
'spool off'||CHR(10)
END
FROM (
select ai.inst_id instance_number,d.dbid,d.snap_id,d.previous_snap_id,d.next_snap_id,d.begin_interval_time from driver d,
gv$database ai
order by 1,3);




SELECT 
CASE &_global 
WHEN 1 then
'set timing off space 1 flush on pause off numwidth 10;' ||CHR(10)||
'set echo off feedback off   newpage 1 recsep off;' ||CHR(10)||
'set trimspool on trimout on define "&" concat "." serveroutput on;'||CHR(10)||
'set underline on;'||CHR(10)||
'set heading off;'||CHR(10)||
'set pagesize 50000;'||CHR(10)||
'set echo off;'||CHR(10)||
'set feedback off;'||CHR(10)||
'set linesize 8000;'||CHR(10)||
'set veri off;'||CHR(10)
WHEN 0 THEN '' 
END 
FROM dual;
with driver as (
select *
from (
select rownum rn,dbid,snap_id,previous_snap_id,next_snap_id,begin_interval_time  from (
SELECT * FROM (
select instance_number,dbid,snap_id,
LEAD(snap_id, &_interval, 0) OVER (PARTITION BY instance_number ORDER BY snap_id DESC NULLS LAST) previous_snap_id,
LAG(snap_id, &_interval, 0) OVER (PARTITION BY instance_number ORDER BY snap_id DESC NULLS LAST) next_snap_id,
begin_interval_time 
from dba_hist_snapshot 
where dbid=(select dbid from v$database where name='&_database_name')
AND instance_number= (select min(inst_number) from gV$active_instances)
AND begin_interval_time>=TO_DATE('&_begin_interval_date','DDMMYY HH24:MI:SS')-&_interval/24
AND begin_interval_time<TO_DATE('&_end_interval_date','DDMMYY HH24:MI:SS')+&_interval/24
) d WHERE next_snap_id0 AND previous_snap_id0   --not equal sign may have been corrupted during paste
 ORDER BY 3
)  s ) where (
CASE WHEN &_INTERVAL = 2 THEN mod(rn,2)  
WHEN &_INTERVAL=1 THEN 1
END)=1
)
SELECT  
CASE &_global 
WHEN 1 then
'spool &_folder\global_awrrptto_&_database_name'||CHR(95)||'_'||TO_CHAR(snap_id)||'_'||TO_CHAR(next_snap_id)||'.html'||CHR(10)|| CHR(10)|| 
'SELECT * FROM TABLE(dbms_workload_repository.awr_global_report_html('||TO_CHAR(dbid)||','''','||TO_CHAR(snap_id)||','||TO_CHAR(next_snap_id)||',8));'||CHR(10)
||CHR(10)||'spool off' 
WHEN 0 THEN '' 
END 
FROM driver;



SELECT
'set termout on' ||CHR(10)||
'SET TIME ON' ||CHR(10)||
'prompt '||CHR(10)||
'prompt '||CHR(10)||
'prompt REPORT GENERATION FINISHED'||CHR(10)||
'prompt '||CHR(10)||
'SET TIME OFF'||CHR(10)
FROM dual;
SET termout ON 

spool off

prompt 
prompt 
prompt GEN_AWR.SQL script generation finished
prompt 
prompt 



!!!To save it click on the view source link on the sourcode upright corner

Nothing rocket science, just using dbms_workload_repository.awr_report_html with some junior level analytic.

All you have to do is give the date, interval (only tested for half hour or one hour intervals)and folder to be saved then rest will be done by the script.

It will create a gen_awr.sql script which you will run and go have a coffee then when you come back your awr reports are ready to be sent.

so the sequence is

@gen_snap  ---assume you save it with this name
@gen_awr 
--coffee break
--zip+upload

I would like to automate it further by doing the same for rac comparison or group of instance but I think I am too lazy or maybe busy with some other non-automatable rubbish :( (I also had some issues with rac comparison report formatting with the DBMS_WORKLOAD_REPOSITORY package so I gave up on that one)

Hope this tiny script save your time as much as it saves mine.

November 11, 2011

Where is the sql_id of active session?

Filed under: Bugs, Performance, Plan Stability, Tips — coskan @ 3:39 pm

In this post I will give information on quick ways to find the sql_id’s of sessions for which you can’t see any entry when you look at v$session.

I usually see this behavior when sql is burning CPU in a nested loop and after a long time of running sql_id column suddenly not updated anymore. When I tweeted about it , ,Tanel Poder answered (If I a recall it right damn twitter does not have a proper search and history) something like ” this happens when oracle does too many recursive calls and loses track of the sql_id” (Tanel, if you are reading, please correct me if I am wrong about what you said)

On Metalink there are two notes I could find

Reason for Apparently Incorrect or Null Sql_id in v$sql [ID 406452.1]

This is caused by an architectural change in the 10g release: from 10.2 onwards, the v$sql columns (sql_id, sql_address and so on) will only show the top level user cursor,  if any (ie it will not display the recursive cursors),.
When using a remote procedure call, such as when a rpc call from a client (eg forms client) requests that a plsql object is executed on the server, there is no top level sql and, consequently, there is no sql_id in v$session.

BUG:5146994 - CANNOT FIND RECURSIVE SQL IN V$SQLAREA FOR A RUNNING SNAPSHOT REFRESH JOB
BUG 5724661 Abstract: WRONG VALUES FOR SQL_ID IN V$SESSION

On Null SQL_ID For Active Sessions In V$Session [ID 1343503.1]

From 10.2 onwards sql_id will only show the top level user cursor ( if any). 

Sometimes,Session does a plsql rpc call ie a client ( eg forms client) can request a plsql object to be executed at the server. In this case there is no top level sql and as such we don't see the sql_id in v$session.

Currently, oracle don't store the current sql thru any v$ views. The only way to find that is thru event 10046.
This is covered in Unpublished bug 5528670: "NULL VALUES FOR SQL_ID, SQL_HASH_VALUE, SQL_ADDRESS AND OTHERS IN V$SESSION", Which was closed as "Not a Bug".

When I have this issues I sometimes get it even on a sql which I run from sql_plus without a pl/sql call so I think limitation on these notes is a bit misleading or I am interpreting the information wrong. Anyway finding the reasons needs too much tracing research bla bla bla which you will never have time to go through when you hit the issue on a production system so best I cut the chase and start giving information how to find the sql_id

Notes says there is no v$ view which is a a bit misleading and notes also say 10046 is the only way to find it which is also another documentation error ignoring method 2 below.

On 10G there are 2 ways that I know will usualls work one of which needs tuning and diagnostics pack license other does not

On 11G there is one more way which again needs tuning and diagnostics pack license.

Method 1- using ASH/AWR data + v$sql combination (10g-11g license needed)

1a- Find the latest not null sql_id column for the given session and serial in v$active_session_history

1b- If you cant find it at first step, due to the reason sql_id information already lost in v$active_session_history, then you need to check similar way using dba_hist_active_sess_history

2- Once you find the sql_id,plan_hash_value double check it on v$sql if number on users_executing column is matching in current v$session then you have what you looking for

—Method-1 on Action

DATABASE1> @swact

    SID USERN STATE   EVENT                        SEQ# SEC_IN_STATE         P1         P2         P3  SQL_ID         CHN
------- ----- ------- ---------------------- ---------- ------------ ---------- ---------- ----------  ------------- ----
   1352 XXWEB WAITING db file scattered read      40906            0         67    2638509         32                       --->sql_id empty
    294 XXWEB WAITING db file scattered read      16727            0         82     234468         32                       --->sql_id empy
   1263 COSKA WORKING On CPU / runqueue              44            0 1413697536          1          0  f5cqythck1gfz    1

DATABASE1> @last_sql 1352 %

SAMPLE_TIME                           SESSION_ID SESSION_SERIAL# SQL_ID		PLAN_HASH_VALUE
------------------------------------- ---------- --------------- -------------  ---------------
31-JAN-11 11.00.35.524                      1352              72 3p1b93fq81x0f  2118159443	   ----> very likely our sql
31-JAN-11 11.00.34.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.33.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.32.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.31.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.30.514                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.29.504                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.28.494                      1352              72 3p1b93fq81x0f  2118159443
31-JAN-11 11.00.27.494                      1352              72 3p1b93fq81x0f  2118159443

9 rows selected.

DATABASE1> @last_sql 294

SAMPLE_TIME                            SESSION_ID SESSION_SERIAL# SQL_ID	PLAN_HASH_VALUE
-------------------------------------- ---------- --------------- ------------- ---------------
31-JAN-11 10.50.40.071                        294              39 3p1b93fq81x0f 2118159443	  ----> very likely our sql
31-JAN-11 10.50.39.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.38.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.37.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.36.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.35.061                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.34.051                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.33.051                        294              39 3p1b93fq81x0f 2118159443
31-JAN-11 10.50.32.051                        294              39 3p1b93fq81x0f 2118159443

9 rows selected.

DATABASE1>-----LETS CHECK THE SQL

DATABASE1> @sqlid 3p1b93fq81x0f

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------------------------------------------------------------
3p1b93fq81x0f 2894132238 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

 CH#   PLAN_HASH OPT_COST EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS USERS_EXECUTING
----  ---------- -------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ---------- --------------
   0  2118159443   956322          3          1              0 861533.028    5876120 1958.70667   21851873   21739212          2

Elapsed times + users_executing columns saying that these sqls has been running for a very long time and they are %99.999 the ones we are looking for

There are tiny challenges with this approach but not that much challeging.

I used it very much in RAC env with a system where multiple users and child cursors are also running
so if you are careful enough to compare apples with apples (same instance, same plan hash value,right session serial, longer elapsed time cursor) you are very unlikely to be mislead

code used for last_sql is like below (modified a bit for rac aware)


select inst_id,sample_time,session_id,session_serial#,sql_id from (
select inst_id,sample_time,session_id,session_serial#,sql_id from
gv$active_session_history
where
sql_id is not null
 and session_id=&1 and inst_id=&2
 order by 1 desc
) where rownum

Method 2- Oradebug dump error stack (10g-11g no license needed)- Most reliable and certain way it is %100 accurate and good part is it does not need any input from me since Tanel Poder already covered how to go through error stack on his post how to read errorstack output

I tried 10046 and never seen any sql_id information written for the 10 mins period I gathered the trace so I’m not %100 sure if it needs whole call to be finished before it writes the sql_id information. I personally won’t use 10046 for this case.

Method-3- SQL Monitoring (11g onwards-license needed). This is my favourite and its damn quick and %99.9 it works if the sql was running long enough to get monitored and then lost the track. It is damn simple like this piece of code

select distinct inst_id,sid,status,sql_id,sql_plan_hash_value,sql_child_address,sql_exec_id
 from gv$sql_plan_monitor
 where sid in (&1) and status='EXECUTING';

—Method 3 On Action

DATABASE1> @swact

    SID USERN STATE   EVENT                        SEQ# SEC_IN_STATE         P1         P2         P3  SQL_ID         CHN
------- ----- ------- ---------------------- ---------- ------------ ---------- ---------- ----------  ------------- ----
   1352 XXWEB WAITING db file scattered read      40906            0         67    2638509         32                       --->sql_id empty
    294 XXWEB WAITING db file scattered read      16727            0         82     234468         32                       --->sql_id empy
   1263 COSKA WORKING On CPU / runqueue              44            0 1413697536          1          0  f5cqythck1gfz    1

---Check sql_monitoring
SQL> @mon_sql 1352,294

INST_ID  SID STATUS              SQL_ID        SQL_PLAN_HASH_VALUE SQL_CHILD_ADDRES SQL_EXEC_ID
-------	---- ------------------- ------------- ------------------- ---------------- -----------
1        1352 EXECUTING           3p1b93fq81x0f          2118159443 00000002D88378E8 16777216
1        1352 EXECUTING           2kgnch9h5hbkz          1874321290 00000002D6223418 16777216  ---paren pl/sql
1         294 EXECUTING           3p1b93fq81x0f          2118159443 00000002D88378E8 16777217
1         294 EXECUTING           2kgnch9h5hbkz          1874321290 00000002D6223418 16777217

The second sql_id 2kgnch9h5hbkz is from pl_sql parent call which was calling 3p1b93fq81x0f so they both get monitored by v$sql_monitor

Apart from all these again if you have diagnostics pack license you can always use OEM/Grid Control to check back the sessions history but comparing with querying v$sql_monitor it is as slow as driving an electric car when you have ferrari.

As you can see there are more ways then listed if your session lost track of the sql_ids. If you have more ways (especially with tracing -like dump sql if there is an option like that) I would really like to hear.

Update after first comment

Method-4 Using v$open_cursor

First comment on which Michael Dinh mentioned he is using toad open cursors option to hunt down the sqls but he was not sure for the sqlplus.
I had a chance to to check with one of our systems. Looks like v$open_cursor is giving us information which not as clear as the other methods but still can be very
useful for standart edition or users without diagnostics pack and also can be easier than error stack which usually needs a change request on PRD systems.

SQL> @sw 207,980

    SID USERN STATE   EVENT                    SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3
------- ----- ------- ------------------ ---------- ------------ ------------- ---- ------------------------- ------------------------- -------------------------
    207 XXXXX WORKING On CPU / runqueue       62033            3                    cellhash#=203CB5D5         diskhash#= 349428727      bytes= 16384
    980 XXXXX WORKING On CPU / runqueue         759            1                    cellhash#=D84BEC46         diskhash#=56AC941E       bytes= 16384

SQL> @mon_sql 207,980

   INST_ID        SID STATUS              SQL_ID        SQL_PLAN_HASH_VALUE SQL_CHILD_ADDRES SQL_EXEC_ID
---------- ---------- ------------------- ------------- ------------------- ---------------- -----------
         2        980 EXECUTING           91uc7cvw2rqab          4166729024 000000040D4560F8    33554458
         2        207 EXECUTING           91uc7cvw2rqab          4166729024 000000040D4560F8    33554456

SQL> select * from v$open_cursor where sid in (207,980) order by 2;

SADDR                   SID USER_NAME  ADDRESS          HASH_VALUE SQL_ID        SQL_TEXT LAST_SQL_ SQL_EXEC_ID CURSOR_TYPE
---------------- ---------- ---------- ---------------- ---------- ------------- -------- --------- ----------- -----------------------------------
00000004508BA640        207 XXXXXXX    00000003A72340A0 2292456266 gw771824a86ua Begin PK              33554456 OPEN
00000004508BA640        207 XXXXXXX    00000003E6F0C230 2181403846 5bvw0ka10b566 Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    000000040E81D128 4111302712 6k03snbuhv01s Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000003EA88E4F0 1535077424 68u7tx5dryv1h Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004592E3690  914163366 4vs91dcv7u1p6 insert i                       OPEN-RECURSIVE
00000004508BA640        207 XXXXXXX    0000000411FC64C0 1407509578 8nwyam59y9t2a Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004117E1340 2399958818 62h75aq7hsxt2 Begin PK                       OPEN
00000004508BA640        207 XXXXXXX    00000004125C3DC0 4163623243 91uc7cvw2rqab INSERT I              33554456 OPEN-PL/SQL
0000000452AA1BB8        980 XXXXXXX    00000004125C3DC0 4163623243 91uc7cvw2rqab INSERT I              33554458 OPEN-PL/SQL
0000000452AA1BB8        980 XXXXXXX    00000004592E3690  914163366 4vs91dcv7u1p6 insert i                       OPEN-RECURSIVE
0000000452AA1BB8        980 XXXXXXX    000000040E81D128 4111302712 6k03snbuhv01s Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    0000000411FC64C0 1407509578 8nwyam59y9t2a Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003E6F0C230 2181403846 5bvw0ka10b566 Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000004117E1340 2399958818 62h75aq7hsxt2 Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003EA88E4F0 1535077424 68u7tx5dryv1h Begin PK                       OPEN
0000000452AA1BB8        980 XXXXXXX    00000003A72340A0 2292456266 gw771824a86ua Begin PK              33554458 OPEN

Once we spot the sqls we need to check all of them one by one on v$sql where if users_executing column (UEX) populated

SQL> @sqlid gw771824a86ua   ---->bingo with two users and high ELAPSED TIMES. this is a pl/sql package calling the sql below

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- --------------------------------------------------------------------------------------------------------------------------------
gw771824a86ua 2292456266 xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

INS  CH#  UEX  PLAN_HASH   OPT_COST        CPU_MS     ELA_MS    AVG_ELA OFFLOAD IO_SAVED_% IO_ELIG_GB IO_INTER_GB       LIOS       PIOS
--- ---- ---- ---------- ----------  - ---------- ---------- ---------- ------- ---------- ---------- ----------- ---------- ----------
  2    0    2          0          0     412784696  414481976  16579.279 No             .00        .00      58.007 2653518789    3248794
  3    0    0          0          0    3215639.15 3307820.05 1653.91002 No             .00        .00       3.212 1480568056     184192

----
---- Tried for all other SQLS but thet do not have users_executing column populated
----

SQL> @sqlid 91uc7cvw2rqab   -->two users and high ELAPSED TIMES on 4th child

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- -----------------------------------------------------------------------------------------------------------------------------------
91uc7cvw2rqab 4163623243 xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

INS  CH#  UEX  PLAN_HASH   OPT_COST       CPU_MS     ELA_MS    AVG_ELA OFFLOAD IO_SAVED_% IO_ELIG_GB IO_INTER_GB       LIOS       PIOS
--- ---- ---- ---------- ----------   ---------- ---------- ---------- ------- ---------- ---------- ----------- ---------- ----------
  2    0    0 4166729024      13081   91245882.5 91690487.2 18338.0974 No             .00        .00      11.644 3739546208     657923
  2    1    0 4166729024      13081    120056449  120491925 15061.4907 No             .00        .00      15.889 1250311783     883494
  2    2    0 4166729024      13081   64616814.8 64899824.1 12979.9648 No             .00        .00      10.564  424413836     587066
  2    3    0 4166729024      13081   58282307.7 58454183.2 19484.7277 No             .00        .00       5.442 1809526416     303954
  2    4    2 4166729024      13081   78886102.5 79233558.3 19808.3896 No             .00        .00      13.555 4163654789     756589

Thanks Michael for the information so we have 1 more method which does not need license

August 15, 2011

What happens, when you did not do a load test before release upgrades ?

Filed under: Performance — coskan @ 11:55 pm

This is what happened on 11.2.0.2 upgrade from 10.2.0.5

Picture is taken from production (tool used is for this picture is my recent favourite Database Time Monitor from Dominic Giles – creator of swingbench) today, which was upgraded to 11.2.0.2 on last saturday.

What happens here is I think self-explanatory, our system and underlying I/O is hammered with direct path reads.

Explaining this will not satisfy the customer so question is Why it is happening ?

1- It is obvious that upgrade was not tested good enough

2- Enough research about behaviour change has not been made for 10g to 11g migration

All these are political answers, real technical answer is below

Once I checked the system there were roughly 9 distinct sqlids running with many sessions,  all waiting on the same event.

The sqls were more or less  centralised on querying Users table with filtering email where there is no index and there has never been an index.  Why wasn’t system suffering from this lack of index when we are on 10GR2?

It is because 11G is now using direct path reads whenever it can for full table scans  (for more information check the references at the bottom of this table). This means when we run this query and if the table is eligible for direct path reads it will use direct path reads from disk. When you query with 2-3 sessions this is ok but when you query with 20 sessions then it becomes disaster like here and your disk run queue gets bigger and bigger

What was happening on 10GR2? Because Oracle was using scattered reads and cache when it can we were doing more logical I/O without hammering disk subsystem and this contention never revealed and because this application is third party  no index was created by us even there was unnecessary logical I/O.

Because we only functionality and vendor certifies the production on 11G we just upgraded the database.

Let me replicate the issue with simple test case on 11G by first with default direct path read behaviour and then reverting back to pre 11G behaviour by setting 10949 event at system level (ref: Dion Cho)

Test Case

—Running 20 sessions of simple select each for 10 times. Total 200 runs. (Idea taken from -Andrey S. Nikolaev -)


create table test_direct as select * from dba_source;

---single_thread.sql --- 20 lines of the code below
select  * from test_directpath2 where name like 'TEST%';

---multi_runner.sql
set echo off
set verify off
set head off:
define parallelism="&1"

spool many_threads.sql
select 'host sqlplus hr/hr @single_thread.sql &'
from dba_objects where rownum <= ¶llelism;
spool off
@many_threads.sql

---Running
@multi_runner 20

Default 11G


Pre-10G behaviour

during the issue-Only slight increase during the caching then all LIO

After the issue happened – It lasted only 1 minute

As you can see we have a sudden CPU increased due to high number of logical reads in short time but no long term I/O load,  once oracle caches it burns CPU instead waiting on I/O. Meanwhile I think DB was in maintenance window so resource manager kicked in to avoid hammering CPU–resmgr:cpu quantum event.

Here are the other comparisons between two runs

Session waits – 11G

HR@ORACOS> @swact

    SID USERN STATE   EVENT                                          SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3 P1TRANSL                                   BLCKNG_SID BLCKNG_SID_S BLCKNG_INST
------- ----- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- ------------------------- ------------------------------------------ ---------- ------------ -----------
     15 HR    WAITING direct path read                                606            0 ba40q4n8w1myy    0 file number= 10           first dba= 493120         block cnt= 64                                                                   UNKNOWN
    134 HR    WAITING direct path read                                444            0 ba40q4n8w1myy    0 file number= 10           first dba= 493314         block cnt= 62                                                                   UNKNOWN
     75 HR    WAITING direct path read                                483            0 ba40q4n8w1myy    0 file number= 10           first dba= 494656         block cnt= 64                                                                   UNKNOWN
    200 HR    WAITING direct path read                                491            0 ba40q4n8w1myy    0 file number= 10           first dba= 494912         block cnt= 64                                                                   UNKNOWN
    196 HR    WAITING direct path read                                493            0 ba40q4n8w1myy    0 file number= 10           first dba= 494912         block cnt= 64                                                                   UNKNOWN
     12 HR    WAITING direct path read                                504            0 ba40q4n8w1myy    0 file number= 10           first dba= 495234         block cnt= 62                                                                   UNKNOWN
    132 HR    WAITING direct path read                                507            0 ba40q4n8w1myy    0 file number= 10           first dba= 495234         block cnt= 62                                                                   UNKNOWN
     13 HR    WAITING direct path read                                515            0 ba40q4n8w1myy    0 file number= 10           first dba= 495617         block cnt= 63                                                                   UNKNOWN
    201 HR    WAITING direct path read                                532            0 ba40q4n8w1myy    0 file number= 10           first dba= 496192         block cnt= 64                                                                   UNKNOWN
    137 HR    WAITING direct path read                                547            0 ba40q4n8w1myy    0 file number= 10           first dba= 496704         block cnt= 64                                                                   UNKNOWN
    136 HR    WAITING direct path read                                453            0 ba40q4n8w1myy    0 file number= 13           first dba= 100032         block cnt= 64                                                                   UNKNOWN
    202 HR    WAITING direct path read                                464            0 ba40q4n8w1myy    0 file number= 13           first dba= 100354         block cnt= 62                                                                   UNKNOWN
    135 HR    WAITING direct path read                                472            0 ba40q4n8w1myy    0 file number= 13           first dba= 100610         block cnt= 62                                                                   UNKNOWN
    199 HR    WAITING direct path read                                485            0 ba40q4n8w1myy    0 file number= 13           first dba= 101056         block cnt= 64                                                                   UNKNOWN
     71 HR    WAITING direct path read                                502            0 ba40q4n8w1myy    0 file number= 13           first dba= 101506         block cnt= 62                                                                   UNKNOWN
     16 HR    WAITING direct path read                                505            0 ba40q4n8w1myy    0 file number= 13           first dba= 101696         block cnt= 64                                                                   UNKNOWN
     73 HR    WAITING direct path read                                521            0 ba40q4n8w1myy    0 file number= 13           first dba= 102208         block cnt= 64                                                                   UNKNOWN
    133 HR    WAITING direct path read                                591            0 ba40q4n8w1myy    0 file number= 13           first dba= 98946          block cnt= 62                                                                   UNKNOWN
     74 HR    WAITING direct path read                                432            0 ba40q4n8w1myy    0 file number= 13           first dba= 99330          block cnt= 62                                                                   UNKNOWN
     72 HR    WAITING direct path read                                608            0 ba40q4n8w1myy    0 file number= 13           first dba= 99520          block cnt= 64                                                                   UNKNOWN
    192 HR    WORKING On CPU / runqueue                                43            0 90sp00nt0sspn    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000054435000

21 rows selected.

Session waits – Pre 11G


HR@ORACOS> @swact

    SID USERN STATE   EVENT                                          SEQ# SEC_IN_STATE SQL_ID         CHN                        P1                        P2                        P3 P1TRANSL                                   BLCKNG_SID BLCKNG_SID_S BLCKNG_INST
------- ----- ------- ---------------------------------------- ---------- ------------ ------------- ---- ------------------------- ------------------------- ------------------------- ------------------------------------------ ---------- ------------ -----------
    136       WAITING jobq slave wait                                  10            0                  0 0                         0                         0                                                                               NO HOLDER
     20       WAITING jobq slave wait                                  10            0                  0 0                         0                         0                                                                               NO HOLDER
     23 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     25 HR    WAITING resmgr:cpu quantum                               60            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     22 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    210 HR    WAITING resmgr:cpu quantum                               50            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     14 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    208 HR    WAITING resmgr:cpu quantum                               53            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    206 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    144 HR    WAITING resmgr:cpu quantum                               58            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    141 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    140 HR    WAITING resmgr:cpu quantum                               50            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     74 HR    WAITING resmgr:cpu quantum                               55            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     81 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     85 HR    WAITING resmgr:cpu quantum                               56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    132 HR    WAITING resmgr:cpu quantum                               54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
    137 HR    WAITING resmgr:cpu quantum                               57            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             UNKNOWN
     73 HR    WORKING On CPU / runqueue                               506            0 90sp00nt0sspn    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000054435000

     80 HR    WORKING On CPU / runqueue                                56            0 ba40q4n8w1myy    0 driver id=                #bytes= 1                 0                                                                               NOT IN WAIT
                                                                                                          0x0000000062657100

    199 HR    WORKING On CPU / runqueue                                52            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
     82 HR    WORKING On CPU / runqueue                                57            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
     13 HR    WORKING On CPU / runqueue                                56            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT
    205 HR    WORKING On CPU / runqueue                                54            0 ba40q4n8w1myy    0 location= 3               = 0                       = 0                                                                             NOT IN WAIT

23 rows selected.

SQL Stats -Pre-11G


HR@ORACOS> @sqlid ba40q4n8w1myy

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ba40q4n8w1myy  297848798 select  * from test_directpath2 where name like 'TEST%'

       INS  CH#   UEX  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS
---------- ----  ---- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ----------
         1    0     0 3436878245     2147        200          1        200        200              0      52000  5333438.4  26.667192    2103470    2102605

SQL Stats -Pre-11G  –Check Elapsed times and PIO difference


HR@ORACOS> @sqlid ba40q4n8w1myy

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
ba40q4n8w1myy  297848798 select  * from test_directpath2 where name like 'TEST%'

       INS  CH#  UEX  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED     CPU_MS     ELA_MS    AVG_ELA       LIOS       PIOS
---------- ---- ---- ---------- -------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- ---------- ---------- ----------
         1    0    0 3436878245     2147        200          2        200        200              0      16120 102802.028  .51401014    2104800      10513

HR@ORACOS>

Snapper -11G


HR@ORACOS> @sn 10 15
Sampling...

-- Session Snapper v3.10 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

----------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
     15, HR        , STAT, session logical reads                   ,          3338,      333.8,    ===>check the difference
     15, HR        , STAT, user I/O wait time                      ,          1040,        104,
     15, HR        , STAT, non-idle wait time                      ,          1039,      103.9,
     15, HR        , STAT, non-idle wait count                     ,            53,        5.3,
     15, HR        , STAT, physical read total IO requests         ,            54,        5.4,
     15, HR        , STAT, physical read total multi block requests,            54,        5.4,
     15, HR        , STAT, physical read total bytes               ,      27869184,      2.79M,    ===>check the difference
     15, HR        , STAT, cell physical IO interconnect bytes     ,      27869184,      2.79M,
     15, HR        , STAT, consistent gets                         ,          3338,      333.8,
     15, HR        , STAT, consistent gets direct                  ,          3338,      333.8,
     15, HR        , STAT, physical reads                          ,          3338,      333.8,
     15, HR        , STAT, physical reads direct                   ,          3338,      333.8,
     15, HR        , STAT, physical read IO requests               ,            53,        5.3,
     15, HR        , STAT, physical read bytes                     ,      27344896,      2.73M,
     15, HR        , STAT, file io wait time                       ,          1010,        101,
     15, HR        , STAT, Number of read IOs issued               ,            53,        5.3,
     15, HR        , STAT, no work - consistent read gets          ,          3338,      333.8,
     15, HR        , STAT, table scan rows gotten                  ,        164188,     16.42k,
     15, HR        , STAT, table scan blocks gotten                ,          3338,      333.8,
     15, HR        , TIME, DB CPU                                  ,         70000,        7ms,      .7%, |@         |
     15, HR        , TIME, sql execute elapsed time                ,      10107298,      1.01s,   101.1%, |@@@@@@@@@@|
     15, HR        , TIME, DB time                                 ,      10107298,      1.01s,   101.1%, |@@@@@@@@@@|
     15, HR        , WAIT, direct path read                        ,       9535205,   953.52ms,    95.4%, |@@@@@@@@@@|
--  End of Stats snap 1, end=2011-08-16 00:02:14, seconds=10

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    99% | ba40q4n8w1myy   | direct path read          | User I/O
     1% | ba40q4n8w1myy   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-16 00:02:14, seconds=10, samples_taken=99

Snapper – Pre-11G  –check logical I/O per second 


HR@ORACOS> @sn 30 192
Sampling...

-- Session Snapper v3.10 by Tanel Poder @ E2SN ( http://tech.e2sn.com )

----------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                               ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
    192, HR        , STAT, opened cursors cumulative               ,            77,       2.57,
    192, HR        , STAT, opened cursors current                  ,            -1,       -.03,
    192, HR        , STAT, user calls                              ,           116,       3.87,
    192, HR        , STAT, pinned cursors current                  ,             1,        .03,
    192, HR        , STAT, session logical reads                   ,        406695,     13.56k,	  ===>check the difference
    192, HR        , STAT, CPU used when call started              ,           300,         10,
    192, HR        , STAT, CPU used by this session                ,           300,         10,
    192, HR        , STAT, DB time                                 ,          1926,       64.2,
    192, HR        , STAT, concurrency wait time                   ,             6,         .2,
    192, HR        , STAT, scheduler wait time                     ,          1471,      49.03,
    192, HR        , STAT, non-idle wait time                      ,          1479,       49.3,
    192, HR        , STAT, non-idle wait count                     ,           201,        6.7,
    192, HR        , STAT, enqueue requests                        ,             1,        .03,
    192, HR        , STAT, enqueue releases                        ,             1,        .03,
    192, HR        , STAT, consistent gets                         ,        406695,     13.56k,
    192, HR        , STAT, consistent gets from cache              ,        406695,     13.56k,
    192, HR        , STAT, consistent gets from cache (fastpath)   ,        406695,     13.56k,
    192, HR        , STAT, calls to kcmgcs                         ,           426,       14.2,
    192, HR        , STAT, calls to get snapshot scn: kcmgss       ,            39,        1.3,
    192, HR        , STAT, no work - consistent read gets          ,        406269,     13.54k,
    192, HR        , STAT, table scans (long tables)               ,            39,        1.3,
    192, HR        , STAT, table scan rows gotten                  ,      24488652,    816.29k,
    192, HR        , STAT, table scan blocks gotten                ,        406269,     13.54k,
    192, HR        , STAT, session cursor cache hits               ,            76,       2.53,
    192, HR        , STAT, parse count (total)                     ,            77,       2.57,
    192, HR        , STAT, execute count                           ,            77,       2.57,
    192, HR        , STAT, bytes sent via SQL*Net to client        ,         24169,     805.63,
    192, HR        , STAT, bytes received via SQL*Net from client  ,         19066,     635.53,
    192, HR        , STAT, SQL*Net roundtrips to/from client       ,            76,       2.53,
    192, HR        , TIME, parse time elapsed                      ,          2510,    83.67us,      .0%, |          |
    192, HR        , TIME, PL/SQL execution elapsed time           ,          4570,   152.33us,      .0%, |          |
    192, HR        , TIME, DB CPU                                  ,       3160000,   105.33ms,    10.5%, |@@        |
    192, HR        , TIME, sql execute elapsed time                ,      19257156,   641.91ms,    64.2%, |@@@@@@@   |
    192, HR        , TIME, DB time                                 ,      19757917,    658.6ms,    65.9%, |@@@@@@@   |
    192, HR        , WAIT, Disk file operations I/O                ,            38,     1.27us,      .0%, |          |
    192, HR        , WAIT, latch: cache buffers chains             ,         68244,     2.27ms,      .2%, |          |
    192, HR        , WAIT, resmgr:cpu quantum                      ,      14801390,   493.38ms,    49.3%, |@@@@@     |
    192, HR        , WAIT, SQL*Net message to client               ,           309,     10.3us,      .0%, |          |
    192, HR        , WAIT, SQL*Net message from client             ,       9924100,    330.8ms,    33.1%, |@@@@      |
    192, HR        , WAIT, events in waitclass Other               ,         11340,      378us,      .0%, |          |
--  End of Stats snap 1, end=2011-08-16 00:20:32, seconds=30

-----------------------------------------------------------------------
Active% | SQL_ID          | EVENT                     | WAIT_CLASS
-----------------------------------------------------------------------
    43% | ba40q4n8w1myy   | resmgr:cpu quantum        | Scheduler
    20% | ba40q4n8w1myy   | ON CPU                    | ON CPU

--  End of ASH snap 1, end=2011-08-16 00:20:32, seconds=30, samples_taken=87

I/O Stats – 11G–high runque size

coskan@coskan-lucid:~$ sar -d 1
Linux 2.6.32-33-generic (coskan-lucid) 	16/08/11 	_x86_64_	(4 CPU)

00:00:55          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:00:56       dev8-0    132.00 127200.00    104.00    964.42     21.18    130.15      7.58    100.00
00:00:56      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:56      dev8-32    105.00   1680.00     34.00     16.32      0.68      6.38      6.00     63.00
00:00:56      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:56      dev8-64     84.00   3240.00   8400.00    138.57     79.66    109.88      9.64     81.00

00:00:56          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:00:57       dev8-0    131.00 128320.00     64.00    980.03     22.76    136.87      7.63    100.00
00:00:57      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:57      dev8-32    158.00   2640.00      0.00     16.71      0.79      5.00      5.00     79.00
00:00:57      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:00:57      dev8-64     56.00      0.00  13440.00    240.00    144.52   1049.64     17.86    100.00

I/O Stats – PRe-11G  —no runque


00:11:57          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:11:58       dev8-0      2.00      0.00     32.00     16.00      0.03     15.00     15.00      3.00
00:11:58      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:58      dev8-32     10.00    160.00     64.00     22.40      0.04      4.00      4.00      4.00
00:11:58      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:58      dev8-64    275.00   2464.00   4736.00     26.18    144.05    193.24      3.64    100.00

00:11:58          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
00:11:59       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-32      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-48      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
00:11:59      dev8-64    440.00   3344.00   2984.00     14.38    146.16    295.02      2.27    100.00

There are two  solutions in this case one is  creating an index to avoid full table scan or disabling direct path read at system level which is not right at all.
Moral Of the story is TEST YOUR SYSTEM WITH SAME/SIMILAR LOAD BEFORE YOU UPGRADE !!!!!

Being certified on a platform won’t mean it will run fine
If you havent read already, these are reference posts about direct path reads on 11GR2

http://blog.tanelpoder.com/2011/08/11/full-scans-direct-path-reads-and-ora-8103-error-hacking-session-video-here-plus-itunes-podcast-address-2/

http://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/

http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

http://oraclue.com/2009/07/17/direct-path-reads-and-serial-table-scans-in-11g/

http://oracledoug.com/serendipity/index.php?/archives/1321-11g-and-direct-path-reads.html

August 13, 2011

dbms_xplan.display_sql_plan_baseline can also lie !

Filed under: Bugs, CBO, Performance, Plan Stability — coskan @ 6:40 pm

This blog post is about another buggy/funny/misleading behaviour which you MUST know if you are using baselines ( especially in a non-controlled fashion by let oracle manage them
) It looks like a long post but it is actually an easy read since nothing rocket science in the post.

All story started when one of our batches was not using the plan stored in the sql plan baseline repository.
In these cases app support check if the baseline is used by checking v$sql.sql_plan_baseline value.

If it is not used then they check the enabled, accepted baselines by using the statement below as in documentation

dbms_xplan.display_sql_plan_baseline(sql_handle=>’&sql_handle’,plan_name=>’&sql_plan’, format=>’BASIC’));

Once they find the plan_hash_value they are looking for they try to fix it and expect Oracle use the baseline which it should use in theory.

In our case it didn’t. When I checked the plan baseline was created in september 2010 and never actually used since 14 January 2011.
Fun part is some occasions same plan hash value in the baseline is being generated but Oracle is not reporting that baseline is used. But that’s the whole point of baselines having a constant plan hash value. Why oracle is not using the plan baseline even it generates the same plan with same plan hash value

We raised the issue with Oracle and oracle come back with couple of request

1- 10053 trace
2- plan baseline trace( alter session set events ‘trace [SQL_Plan_Management.*]‘;)

We provided the requests and they said if there was a change in the table structure at some point which is causing the plan can’t be reproduced. App team confirmed that there was a structural change to drop global index and recreate it as local on actually the date when the baseline was last used.
The problem is we changed a global index to local index on 14 January 2011 with an index name change, but display_sql_plan_baseline shows us a plan which uses a local index with the current name.This cannot be possible since the baseline is create on September 2010. So what was wrong?

Support could not managed to give me a decent answer or accept that it is a bug so it is time for me to create a test case and reveal the truth why this issue is happening.


HR@ORACOS> ---simple table creation
HR@ORACOS> create table baseline_test as select * from dba_objects;

Table created.

HR@ORACOS> ---index creation
create unique index baseline_test_idx_1
  2  on baseline_test(object_id);

Index created.

HR@ORACOS> ---gather table stats
HR@ORACOS> @gts baseline_test

PL/SQL procedure successfully completed.

HR@ORACOS> HR@ORACOS>
HR@ORACOS>
HR@ORACOS> ---start session level baseline capturing process
HR@ORACOS> alter session set optimizer_capture_sql_plan_baselines=TRUE;

Session altered.

HR@ORACOS> ---run the query below several times for plan capturing happens
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3566649846

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DATA_OBJECT_ID"=10)
   2 - access("OBJECT_ID"=5)

Note
-----
   - SQL plan baseline SQL_PLAN_g64u8gfpwnfr028052575 used for this statement

25 rows selected.

HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3566649846            0 SQL_PLAN_g64u8gfpwnfr028052575 SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000

Now it is time to check the plan output


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3566649846

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_1 |
-----------------------------------------------------------

21 rows selected.

Since we have a baseline now we can change the table index structure to replicate the issue


HR@ORACOS> ---now drop the index
HR@ORACOS> drop index baseline_test_idx_1
  2  ;

Index dropped.

HR@ORACOS> create unique index baseline_test_idx_2 on baseline_test(data_object_id,object_id);

Index created.

HR@ORACOS> ---Run it couple of times for baseline generation

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |       |       |     2 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

20 rows selected.

As you can see baseline no baseline is used. Lets check if a baseline captured ?


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      3304552195            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000

Two baselines are captured in the system, one is accepted which is the one initially created before the index recreation but the other one is not accepted ??

Now it is time to see what the plans of these baselines

HR@ORACOS> --ACCEPTED PLAN
HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

HR@ORACOS> ---NON ACCEPTED NEWLY GENERATED PLAN

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 3304552195

-----------------------------------------------------------
| Id  | Operation                   | Name                |
-----------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |
|   2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |
-----------------------------------------------------------

21 rows selected.

They are same ???? How come the plan can be same with the one which has different index names ??
If you are a careful reader you might have spotted The plan_id is different with same plan hash value

Lets check with ADVANCED option instead BASIC


HR@ORACOS> ----ACCEPTED BASELINE

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

Did you spot something ? Look carefully because I did not look carefully and wasted 1 day. Once you know how to get the higher detail of information better you should look carefully


HR@ORACOS>---NON ACCEPTED NEWLY GENERATED PLAN
hR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'ADVANCED'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Outline Data from SMB:

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */
--------------------------------------------------------------------------------

Plan hash value: 3304552195

---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     1 |    32 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| BASELINE_TEST       |     1 |    32 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BASELINE_TEST_IDX_2 |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / BASELINE_TEST@SEL$1
   2 - SEL$1 / BASELINE_TEST@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      INDEX_RS_ASC(@"SEL$1" "BASELINE_TEST"@"SEL$1" ("BASELINE_TEST"."DATA_OBJECT_ID"
              "BASELINE_TEST"."OBJECT_ID"))
      OUTLINE_LEAF(@"SEL$1")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATA_OBJECT_ID"=10 AND "OBJECT_ID"=5)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - "OBJECT_NAME"[VARCHAR2,128]
   2 - "BASELINE_TEST".ROWID[ROWID,10]

68 rows selected.

If you haven’t spotted already, the problem with the accepted plan output is the difference between OUTLINE DATA FROM SMB and OUTLINE DATA sections.
There are two outline sections per baseline output. one is OUTLINE DATA FROM SMB and other is OUTLINE DATA

Now you can do line by line comparison and see the difference.

Outline Data from SMB:

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(‘11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(‘11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

Outline Data
————-

/*+
BEGIN_OUTLINE_DATA
INDEX_RS_ASC(@”SEL$1″ “BASELINE_TEST”@”SEL$1″ (“BASELINE_TEST”.”DATA_OBJECT_ID”
“BASELINE_TEST”.”OBJECT_ID”))
OUTLINE_LEAF(@”SEL$1″)
ALL_ROWS
DB_VERSION(‘11.2.0.1′)
OPTIMIZER_FEATURES_ENABLE(‘11.2.0.1′)
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/

What happens here is that Oracle only stores the outline information and when you run the display_sql_plan_baseline
Oracle tries to generate the plan output from the outline but problem starts there because it tries to use the index which is not existent on the system anymore so it can’t reproduce the plan and instead of giving any warning or error it generates a plan for you by lying you.

To confirm what we said lets drop the secondary index and see what dbms_xplan.display_sql_plan_baseline will generate for us for the same baseline


HR@ORACOS> ---dropping index
HR@ORACOS>
HR@ORACOS> drop index baseline_test_idx_2;

Index dropped.

------FIRST BASELINE CREATED

HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr09db5a957

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr09db5a957         Plan id: 2645928279
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

As you can see now it generates a new plan for us for the same baseline :) What a liar

What is the output from the valid but not accepted baseline, once we dropped the index it was using


HR@ORACOS> select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>'&sql_handle',plan_name=>'&sql_plan', format=>'BASIC'));
Enter value for sql_handle: SYS_SQL_f313487babca3ae0
Enter value for sql_plan: SQL_PLAN_g64u8gfpwnfr028052575

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL handle: SYS_SQL_f313487babca3ae0
SQL text: select object_name from baseline_test  where object_id=5 and
          data_object_id=10
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
Plan name: SQL_PLAN_g64u8gfpwnfr028052575         Plan id: 671425909
Enabled: YES     Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
--------------------------------------------------------------------------------

Plan hash value: 2027741861

-------------------------------------------
| Id  | Operation         | Name          |
-------------------------------------------
|   0 | SELECT STATEMENT  |               |
|   1 |  TABLE ACCESS FULL| BASELINE_TEST |
-------------------------------------------

20 rows selected.

Same issue happened again.

Lets run the same sql to see if it generates and accepts a baseline with full table scan


HR@ORACOS> ---Lets run the sql again to force a plan
HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> select object_name from baseline_test  where object_id=5 and data_object_id=10;

no rows selected

HR@ORACOS> @e1

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  910wg4cc9fbwn, child number 0
-------------------------------------
select object_name from baseline_test  where object_id=5 and
data_object_id=10

Plan hash value: 2027741861

-----------------------------------------------------------------------------------
| Id  | Operation         | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |               |       |       |   217 (100)|          |
|*  1 |  TABLE ACCESS FULL| BASELINE_TEST |     1 |    32 |   217   (1)| 00:00:02 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

19 rows selected.

The baseline is generated as below but it is still not accepted.


HR@ORACOS> @sql_plan_baselines2 910wg4cc9fbwn

##################
#active baselines#
##################

   INST_ID   UE SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER SQL_PLAN_BASELINE              SQL_HANDLE                     PLAN_NAME                      ENABLED ACCEPTED FIXED CREATED                   LAST_EXECUTED
---------- ---- ------------- --------------- ------------ ------------------------------ ------------------------------ ------------------------------ ------- -------- ----- ------------------------- -------------------------
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr028052575 YES     YES      NO    13-AUG-11 15.45.42.000000 13-AUG-11 15.45.42.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr09db5a957 YES     NO       NO    13-AUG-11 15.49.54.000000
         1    0 910wg4cc9fbwn      2027741861            0                                SYS_SQL_f313487babca3ae0       SQL_PLAN_g64u8gfpwnfr018837053 YES     NO       NO    13-AUG-11 15.59.15.000000

I assume You already understood the issue. This is a proper buggy/dodgy behaviour for me.

Now lets try some workaround to see how we can get over from this situation.

There are 2 ways to get around from this problem and all of them needs DBA input. I have tested all and they are working but you can go ahead and test them. I don’t want to make this blog post longer than it is so I’m only listing the options.

1- Evolve the baseline for the sql_handle so Oracle accepts the plans (will keep the invalid ones in the system)

2- Drop the invalid baselines and evolve the handle (this way you will get rid of the invalid ones)

3- Drop all baselines related with the objects altered and allow capturing to the trick for you. (Less headache than second but you may lose the good plans- still would be my approach for releases )

Even Oracle warns in documentation “SQL plan baselines cannot help in cases where an event has caused irreversible execution plan changes, such as dropping an index.” and also ” if a change in the system (such as a dropped index) causes all accepted plans to become non-reproducible, then the optimizer selects the best-cost plan. “
The did not give much information about the behaviour.

From my point of view there are two major problems which needs to be addressed for this specific table structure change case

1- Instead of showing a plan which is not same as baseline, display_sql_plan_baseline must report an error that stored baseline is not reproduce-able

2- Baselines are not invalidated there must be an invalidation flag which makes the baseline invalidated at very first time CBO can’t reproduce the plan via baseline and set enabled and accepted flag to no. Since the purging of the plans is 53 weeks by
default for non used baselines this is a needed enhancement. If a plan is accepted and not produceable with smaller cost it avoids other plans accepted.

Another problem with baselines is also covered in this clear post by Maxym Kharchenko which can be related with my first point (that very same post series from Maxym  is one of the reasons I did not continue my series since author nearly covered all -but I still have something to write) What happens if both objects are under two schemas like some development groups uses for different testing phases. One drops the index one did not and you have system level capturing on. ?

I think Oracle must address these issues by introducing another complexity for sql signatures which involves schema owner or something similar.

Long story short you might be easily fooled by baselines with the current structure. If I were you I will always check the plans with at least OUTLINE option or go for ADVANCED.

Confession:n I am always using advanced option but as usual I did not pay that much attention to the output at first point :( That was my biggest mistake there which cost 1 day for analysis. knowing and using an option but not paying attention to the output of the option.)

All tests are done on 11.2.0.1

Bonus material

For the ones who likes tracefiles this issue is shown as this

=======TRACEFILE ======

------- START SPM Plan Dump -------
SPM: failed to reproduce the plan using the following info:
  parse_schema name        : HR
  plan_baseline signature  : 17515423071774325472
  plan_baseline plan_id    : 671425909
  plan_baseline hintset    :
    hint num  1 len 37 text: OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
SPM: generated non-matching plan:
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
------------------------------------------+-----------------------------------+
| Id  | Operation          | Name         | Rows  | Bytes | Cost  | Time      |
------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |              |       |       |   217 |           |
| 1   |  TABLE ACCESS FULL | BASELINE_TEST|     1 |    32 |   217 |  00:00:02 |
------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter(("OBJECT_ID"=5 AND "DATA_OBJECT_ID"=10))

Content of other_xml column
===========================
  db_version     : 11.2.0.1
  parse_schema   : HR
  plan_hash      : 2027741861
  plan_hash_2    : 411267155
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      DB_VERSION('11.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "BASELINE_TEST"@"SEL$1")
    END_OUTLINE_DATA
  */

------- END SPM Plan Dump -------
SPM: couldn't reproduce any enabled+accepted plan so using the cost-based plan, planId = 411267155
Older Posts »

Theme: Silver is the New Black. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 203 other followers