Note: This post is about diagnosing real life scenario caused by direct path decision change on 11.2.0.3 which already covered by Tanel Poder on his awesome Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven) post
If you already read and know how to approach the scenario you can easily skip this post and wait another year till I write another post while I’m working at the bank.
Story
Number of Oracle database in our estate is a high number and they are on shared storage arrays which means if one of them is naughty on I/O it has a potential impact on others.
To be proactive, from Time to time I run bunch of queries on all of the databases in a loop to check what is the wait profiles from dba_hist_active_sess_history and v$active_session_history
to see if we can optimize some of the work and offload those unnecessary I/O to unused CPU by utilizing more memory or getting rid of direct path reads when necessary.
Query is really simple and it works for my needs.
All I do is on the first query get the count(*) from ASH grouped by event. By this way I can get the overall wait profile within the ASH.
For the second query get count(*) for wait_class I/O grouped by sql_id ,sql_plan_hash_value,force_matching_signature,event. By this sql I can get the top sql_id for I/O waits (not top elapsed which is all matters for end users)
I’m not saying this is the right approach but practically it works for the purpose.
SQL: RR columns are ratio to report
prompt ************************************ prompt **** ASH OVERALL WAIT PROFILE prompt ************************************ SELECT MIN(sample_time) min_ash_available,sysdate-MIN(sample_time) available duration FROM v$active_session_history; select * from ( select NVL(event,'CPU') event,count(*), round((ratio_to_report(sum(1)) over ()*100),1) rr from gv$active_session_history WHERE user_id<>0 AND sample_time<trunc(SYSDATE+1) AND sample_time>trunc(sysdate-1) group by event order by 2 desc ) where rownum<10; prompt ************************************ prompt **** ASH I/O by SQL_ID prompt ************************************ COLUMN force_matching_signature FOR 999999999999999999999999999 select * from ( select sql_id ,sql_plan_hash_value,force_matching_signature, NVL(event,'CPU') Event, count(*), round((ratio_to_report(sum(1)) over ()*100),1) rr from gv$active_session_history where 1=1 AND wait_class LIKE '%I/O' --AND event IS null and user_id<>0 AND sql_id IS NOT NULL group by sql_id,sql_plan_hash_value,event,force_matching_signature order by 5 desc ) where rownum<30;
For one of the databases which is not looked after by me the output was like this
************************************ **** ASH OVERALL ************************************ EVENT COUNT(*) RR --------------------------------------------- ---------- ------ CPU 109648 63.1 direct path read 35914 20.7 log file sync 26136 15.1 enq: KO - fast object checkpoint 1089 .6 db file sequential read 263 .2 control file sequential read 188 .1 log buffer space 149 .1 asynch descriptor resize 112 .1 enq: HW - contention 31 .0 ************************************ **** ASH I/O ************************************ SQL_ID SQL_PLAN_HASH_VALUE FORCE_MATCHING_SIGNATURE EVENT COUNT(*) RR --------------- ------------------- ---------------------------- --------------------------------------------- ---------- ------ 804t23t68dbmw 1520416513 10621052756171684880 direct path read 11689 32.1 9wthcvamy608f 1764147946 10621052756171684880 direct path read 11059 30.4 7scfmpxgv5mu5 3641059178 6835008959050889988 direct path read 6448 17.7 804t23t68dbmw 2550353655 10621052756171684880 direct path read 3060 8.4 9wthcvamy608f 2550353655 10621052756171684880 direct path read 2787 7.7 9wthcvamy608f 3119703268 10621052756171684880 direct path read 777 2.1 1q21k2wqgw8m3 3586661573 29420080321854211 control file sequential read 135 .4
From this output, we can say, for the period I checked, overall sessions are %63 on CPU and %20 on direct path read. If we checke the ASH I/O we can see same sqls with different plans (804t23t68dbmw,9wthcvamy608f) are contributing in total 77% of that %20 direct path read. Then next step check how much I/O per run the sql (804t23t68dbmw) is doing
SQL> l
1 select
2 plan_hash_value plan_hash,
3 executions,
4 PX_SERVERS_EXECUTIONS pxe,
5 cpu_time/1000 cpu_ms,
6 elapsed_time/1000 ela_ms,
7 (elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_ela,
8 IO_INTERCONNECT_BYTES/1024/1024/1024 io_inter_gb,
9 PHYSICAL_READ_BYTES/1024/1024/1024 PHYSICAL_READ_GB,
10 PHYSICAL_READ_BYTES/1024/1024/decode(nvl(executions,0),0,1,executions) PIO_MB_PE,
11 buffer_gets/decode(nvl(executions,0),0,1,executions) LIOS_PE,
12 disk_reads/decode(nvl(executions,0),0,1,executions) PIOS_PE
13 from
14 gv$sql
15 where
16 sql_id = ('&1')
17 order BY
18 inst_id,
19 sql_id,
20 hash_value,
21* child_number
SQL>
PLAN_HASH EXECUTIONS PXE CPU_MS ELA_MS AVG_ELA IO_INTER_GB PHYSICAL_READ_GB PIO_MB_PE LIOS_PE PIOS_PE
---------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------------- ---------------- ------------- -------------
2550353655 45 0 88018.62 147748.384 3.28329742 9.37 9.37 213.14 27402.60 27282.00
2550353655 17142 0 16035460.2 34342974 2.00344032 2208.88 2208.71 131.94 16973.79 16901.79
131MB per run (PIO_MB_PE) is looking small at first, but If you check the total reads and executions it looks a bit scary (Note: I use I/O_INTER_GB as we have exadata environments as well, in this case it is not relevant) I checked how many times in a day it runs and how much total being read
SQL> l
1 with ds as (
2 select min(snap_id) min_snap,max(snap_id) max_snap from
3 dba_hist_snapshot
4 where begin_interval_time between trunc(sysdate-1) and trunc(sysdate))
5 select sum(executions_delta) total_execution_yesterday,sum(executions_delta)*&_tmp_pio_pe/1024/1024 total_pio_pd_TB
6 from dba_hist_sqlstat,ds
7 where snap_id between ds.min_snap and ds.max_snap
8* and sql_id='&1'
TOTAL_EXECUTION_YESTERDAY TOTAL_PIO_PD_TB
------------------------- ---------------
18689 2.39794088
As you can see daily read is 2.3 TB with 18K executions read per day from a single query reads innocently 131MB per execution. Table size is not that big
SQL> select sum(bytes)/1024/1024 total_mb,sum(blocks) total_blocks
from dba_segments where segment_name='PROBLEM_TABLE';
TOTAL_MB TOTAL_BLOCKS
---------- ------------
109 15360
SGA is way above to go via direct path read
SQL> show parameter sga_target NAME TYPE VALUE ------------------------------------ ----------- ------ sga_target big integer 9G
It looks like our table (all partitions) is in the cache. 15.2K blocks over total 15.3K blocks
SQL> select count(distinct block#) from v$bh
2 where objd in (select data_object_id from dba_objects where object_name='PROBLEM_TABLE' and object_type like 'TABLE%');
COUNT(DISTINCTBLOCK#)
---------------------
15260
So, table Size is small to fit in cache also Table is already in cache then Why we are scanning via direct path read ?
To know the answer you need to be a good blog follower like me and immediately recall
Tanels awesome post “Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven),
which is new change on 11.2.0.3 where our version is.
To make sure this behaviour change is the case, I run the test with _direct_read_decision_statistics_driven=TRUE (default) and then FALSE
SQL>alter session set "_direct_read_decision_statistics_driven"=TRUE; SQL>@mon_sqlid2 bcfqr520321mf DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'BCFQR520321MF') ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL Monitoring Report SQL Text ------------------------------ select /*+monitor*//*coskan2*/.......... from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ' Global Stats ========================================================================================= | Elapsed | Cpu | IO | Application | Other | Fetch | Buffer | Read | Read | | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Calls | Gets | Reqs | Bytes | ========================================================================================= | 4.68 | 1.17 | 2.76 | 0.23 | 0.53 | 2 | 21228 | 171 | 164MB | ========================================================================================= SQL Plan Monitoring Details (Plan Hash Value=2550353655) ================================================================================================================================================================================================= | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Mem | Activity | Activity Detail | | | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | (Max) | (%) | (# samples) | ================================================================================================================================================================================================= | 0 | SELECT STATEMENT | | | | 1 | +5 | 1 | 16 | | | | | | | 1 | VIEW | VIEW_XXXXXXXXXXXXXXX | 3 | 25252 | 1 | +5 | 1 | 16 | | | | | | | 2 | SORT ORDER BY | | 3 | 25252 | 1 | +5 | 1 | 16 | | | 4096 | | | | 3 | HASH GROUP BY | | 3 | 25252 | 1 | +5 | 1 | 16 | | | 1M | | | | 4 | VIEW | | 3 | 25250 | 1 | +5 | 1 | 289K | | | | | | | 5 | UNION-ALL | | | | 1 | +5 | 1 | 289K | | | | | | | 6 | FILTER | | | | 1 | +5 | 1 | 289K | | | | | | | 7 | HASH GROUP BY | | 2 | 19143 | 2 | +4 | 1 | 289K | | | 62M | | | | 8 | HASH JOIN | | 990K | 19090 | 4 | +2 | 1 | 289K | | | 30M | 25.00 | Cpu (1) | | 9 | PART JOIN FILTER CREATE | :BF0000 | 988K | 6099 | 3 | +2 | 1 | 289K | | | | | | | 10 | PARTITION LIST SINGLE | | 988K | 6099 | 3 | +2 | 1 | 289K | | | | | | | 11 | TABLE ACCESS FULL | PROBLEM_TABLE | 988K | 6099 | 4 | +1 | 1 | 289K | 57 | 55MB | | 25.00 | direct path read (1) | | 12 | PARTITION LIST SINGLE | | 988K | 6102 | 2 | +4 | 1 | 289K | | | | | | | 13 | TABLE ACCESS FULL | PROBLEM_TABLE | 988K | 6102 | 3 | +3 | 1 | 289K | 57 | 55MB | | 50.00 | direct path read (2) | | 14 | FILTER | | | | | | 1 | | | | | | | | 15 | HASH GROUP BY | | 1 | 6107 | | | 1 | | | | | | | | 16 | FILTER | | | | | | 1 | | | | | | | | 17 | NESTED LOOPS | | | | | | 1 | | | | | | | | 18 | NESTED LOOPS | | 1 | 6102 | | | 1 | | | | | | | | 19 | PARTITION LIST SINGLE | | 1 | 6099 | | | 1 | | | | | | | | 20 | TABLE ACCESS FULL | PROBLEM_TABLE | 1 | 6099 | | | 1 | | 57 | 55MB | | | | | 21 | PARTITION LIST SINGLE | | 1 | 2 | | | | | | | | | | | 22 | INDEX RANGE SCAN | PROBLEM_TABLE _IX3 | 1 | 2 | | | | | | | | | | | 23 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE | 1 | 3 | | | | | | | | | | | 24 | SORT AGGREGATE | | 1 | | | | | | | | | | | | 25 | PARTITION LIST SINGLE | | 1 | 4 | | | | | | | | | | | 26 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE | 1 | 4 | | | | | | | | | | | 27 | INDEX RANGE SCAN | PROBLEM_TABLE _IX | 1 | 3 | | | | | | | | | | =================================================================================================================================================================================================
This time we read 164MB and if you check line 11,13 and 20
where we access PROBLEM_TABLE via full partition scan, you can see all the direct path reads are because of that access.
When we revert the parameter to pre 11.2.0.3 behaviour.
SQL>alter session set "_direct_read_decision_statistics_driven"=false; Session altered. SQL>@mon_sqlid2 fds2rw8vuutcn DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'FDS2RW8VUUTCN') ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL Monitoring Report SQL Text ------------------------------ select /*+monitor*//*coskan*/........................ from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ' Global Stats ================================================= | Elapsed | Cpu | Other | Fetch | Buffer | | Time(s) | Time(s) | Waits(s) | Calls | Gets | ================================================= | 0.93 | 0.93 | 0.00 | 2 | 21043 | ================================================= SQL Plan Monitoring Details (Plan Hash Value=2550353655) ============================================================================================================================================================================= | Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Mem | Activity | Activity Detail | | | | | (Estim) | | Active(s) | Active | | (Actual) | (Max) | (%) | (# samples) | ============================================================================================================================================================================= | 0 | SELECT STATEMENT | | | | 1 | +1 | 1 | 16 | | | | | 1 | VIEW | VIEW_XXXXXXXXXXXXXXX | 3 | 25252 | 1 | +1 | 1 | 16 | | | | | 2 | SORT ORDER BY | | 3 | 25252 | 1 | +1 | 1 | 16 | 4096 | | | | 3 | HASH GROUP BY | | 3 | 25252 | 1 | +1 | 1 | 16 | 1M | | | | 4 | VIEW | | 3 | 25250 | 1 | +1 | 1 | 287K | | | | | 5 | UNION-ALL | | | | 1 | +1 | 1 | 287K | | | | | 6 | FILTER | | | | 1 | +1 | 1 | 287K | | | | | 7 | HASH GROUP BY | | 2 | 19143 | 1 | +1 | 1 | 287K | 62M | 100.00 | Cpu (1) | | 8 | HASH JOIN | | 990K | 19090 | 1 | +1 | 1 | 287K | 31M | | | | 9 | PART JOIN FILTER CREATE | :BF0000 | 988K | 6099 | 1 | +1 | 1 | 287K | | | | | 10 | PARTITION LIST SINGLE | | 988K | 6099 | 1 | +1 | 1 | 287K | | | | | 11 | TABLE ACCESS FULL | PROBLEM_TABLE | 988K | 6099 | 1 | +1 | 1 | 287K | | | | | 12 | PARTITION LIST SINGLE | | 988K | 6102 | 1 | +1 | 1 | 287K | | | | | 13 | TABLE ACCESS FULL | PROBLEM_TABLE | 988K | 6102 | 1 | +1 | 1 | 287K | | | | | 14 | FILTER | | | | | | 1 | | | | | | 15 | HASH GROUP BY | | 1 | 6107 | | | 1 | | | | | | 16 | FILTER | | | | | | 1 | | | | | | 17 | NESTED LOOPS | | | | | | 1 | | | | | | 18 | NESTED LOOPS | | 1 | 6102 | | | 1 | | | | | | 19 | PARTITION LIST SINGLE | | 1 | 6099 | | | 1 | | | | | | 20 | TABLE ACCESS FULL | PROBLEM_TABLE | 1 | 6099 | | | 1 | | | | | | 21 | PARTITION LIST SINGLE | | 1 | 2 | | | | | | | | | 22 | INDEX RANGE SCAN | PROBLEM_TABLE_IX3 | 1 | 2 | | | | | | | | | 23 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE | 1 | 3 | | | | | | | | | 24 | SORT AGGREGATE | | 1 | | | | | | | | | | 25 | PARTITION LIST SINGLE | | 1 | 4 | | | | | | | | | 26 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE | 1 | 4 | | | | | | | | | 27 | INDEX RANGE SCAN | PROBLEM_TABLE_IX | 1 | 3 | | | | | | | | =============================================================================================================================================================================
Suddenly, all reads are from cache and runtime dropped from 4.5 to 0.98 seconds
To make myself more sure, I checked the stats why this is happening.
Statistics think that it has 2 million rows and 55K blocks.
SQL>select num_rows,blocks,last_analyzed from dba_tables where table_name='PROBLEM_TABLE'; NUM_ROWS BLOCKS LAST_ANALYZED ---------- ---------- ------------------- 2086124 55981 18/04/2013 23:42:56
In reality oracle has 600K rows and 15K blocks
SQL>select count(*) from ptd_owner.PROBLEM_TABLE ;
COUNT(*)
----------
611960
SQL>select sum(blocks) from dba_segments where segment_name='PROBLEM_TABLE ';
SUM(BLOCKS)
-----------
15360
Our problem is definitely that When the direct path read decision driven from statistics (which are not representative) it goes wrong access path.
I did not stop here and check the all_tab_modifications to see what happened to table since last analyze
#################################################### ################TABLE MODIFICATIONS################# #################################################### TABLE_OWNER TABLE_NAME PARTITION_NAME INSERTS UPDATES DELETES TIMESTAMP TRUNCATED DROP_SEGMENTS ------------- --------------- ----------------------- ---------------- ---------------- --------- --------- ------------- PTD_OWNER PROBLEM_TABLE 0 0 2086124 18-APR-13 NO 0 PTD_OWNER PROBLEM_TABLE XXXXXXXXXXXX 0 0 267418 18-APR-13 YES 0 PTD_OWNER PROBLEM_TABLE YYYYYYY 0 0 830434 18-APR-13 YES 0 PTD_OWNER PROBLEM_TABLE ZZZ 0 0 988272 18-APR-13 YES 0
It looks like table partitions are truncated after the nightly stats gathering and starting to get load till the next truncate.
When we check with business, they also confirmed the behaviour.
So what is next,
As the filter is the partition key which needs full partition access, indexing is not a solution.
To increase sga, will help but then why would we, if the table can fit to the cache most of the time (till it grows to a certain point during the day)
To change the parameter session wide, is not possible via trigger
To change the parameter in the whole database is also not something we want to implement
Last option is to have representative statistics in the database gathered when the table is full in half way and then lock them. (or set the stats manually like Tanel)
By this way we can avoid using undocumented parameters, we can still get buffered reads till a certain point throughout the day (oracle may decide to change the path if the table gets bigger in size)
Well thats about it. Finally I manage to write a blog after 11 months right before a year ended (banking is killing productivity and increasing spam comments)
I have to admit, when the scripts are ready, it takes about 10 mins to sort the problem but it took 2 hours to share it , so I hope you enjoyed it.
Morals of the story are check your systems regularly, read blogs and god bless Tanel:)
