Note: This post is about diagnosing a real-life scenario caused by the direct path read decision change in 11.2.0.3, which is already covered by Tanel Poder in his excellent post Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven).
If you have already read it and know how to approach the scenario, you can easily skip this post and wait another year until I write the next one while I'm working at the bank.
Story
We have a large number of Oracle databases in our estate and they sit on shared storage arrays, which means that if one of them is naughty on I/O it can potentially impact the others.
To be proactive, from time to time I run a bunch of queries against all of the databases in a loop to check the wait profiles from dba_hist_active_sess_history and v$active_session_history, to see if we can optimize some of the work and offload unnecessary I/O to unused CPU by utilizing more memory, or get rid of direct path reads when necessary.
The query is really simple and it works for my needs.
All I do in the first query is get count(*) from ASH grouped by event, which gives me the overall wait profile within ASH.
The second query gets count(*) for wait_class I/O grouped by sql_id, sql_plan_hash_value, force_matching_signature and event, which gives me the top sql_id for I/O waits (not top elapsed time, which is all that matters for end users).
I'm not saying this is the right approach, but in practice it works for the purpose.
SQL (the RR columns are ratio_to_report percentages):
prompt ************************************
prompt **** ASH OVERALL WAIT PROFILE
prompt ************************************

SELECT MIN(sample_time) min_ash_available,
       sysdate-MIN(sample_time) available_duration
FROM v$active_session_history;

select * from (
  select NVL(event,'CPU') event,
         count(*),
         round((ratio_to_report(sum(1)) over ()*100),1) rr
  from gv$active_session_history
  WHERE user_id<>0
  AND sample_time<trunc(SYSDATE+1)
  AND sample_time>trunc(sysdate-1)
  group by event
  order by 2 desc
) where rownum<10;

prompt ************************************
prompt **** ASH I/O by SQL_ID
prompt ************************************

COLUMN force_matching_signature FOR 999999999999999999999999999

select * from (
  select sql_id,
         sql_plan_hash_value,
         force_matching_signature,
         NVL(event,'CPU') Event,
         count(*),
         round((ratio_to_report(sum(1)) over ()*100),1) rr
  from gv$active_session_history
  where 1=1
  AND wait_class LIKE '%I/O'
  --AND event IS null
  and user_id<>0
  AND sql_id IS NOT NULL
  group by sql_id,sql_plan_hash_value,event,force_matching_signature
  order by 5 desc
) where rownum<30;
For one of the databases, which is not looked after by me, the output was like this:
************************************
**** ASH OVERALL
************************************

EVENT                                           COUNT(*)     RR
--------------------------------------------- ---------- ------
CPU                                                109648   63.1
direct path read                                    35914   20.7
log file sync                                       26136   15.1
enq: KO - fast object checkpoint                     1089     .6
db file sequential read                               263     .2
control file sequential read                          188     .1
log buffer space                                      149     .1
asynch descriptor resize                              112     .1
enq: HW - contention                                   31     .0

************************************
**** ASH I/O
************************************

SQL_ID          SQL_PLAN_HASH_VALUE     FORCE_MATCHING_SIGNATURE EVENT                            COUNT(*)     RR
--------------- ------------------- ---------------------------- ------------------------------ ---------- ------
804t23t68dbmw            1520416513         10621052756171684880 direct path read                    11689   32.1
9wthcvamy608f            1764147946         10621052756171684880 direct path read                    11059   30.4
7scfmpxgv5mu5            3641059178          6835008959050889988 direct path read                     6448   17.7
804t23t68dbmw            2550353655         10621052756171684880 direct path read                     3060    8.4
9wthcvamy608f            2550353655         10621052756171684880 direct path read                     2787    7.7
9wthcvamy608f            3119703268         10621052756171684880 direct path read                      777    2.1
1q21k2wqgw8m3            3586661573            29420080321854211 control file sequential read          135     .4
From this output we can say that, for the period I checked, sessions overall are 63% on CPU and 20% on direct path read. If we check the ASH I/O breakdown, we can see the same SQL statements with different plans (804t23t68dbmw, 9wthcvamy608f) contributing in total 77% of that 20% direct path read. The next step is to check how much I/O per run the sql (804t23t68dbmw) is doing.
SQL> l
  1  select
  2    plan_hash_value plan_hash,
  3    executions,
  4    PX_SERVERS_EXECUTIONS pxe,
  5    cpu_time/1000 cpu_ms,
  6    elapsed_time/1000 ela_ms,
  7    (elapsed_time/1000000)/decode(nvl(executions,0),0,1,executions) avg_ela,
  8    IO_INTERCONNECT_BYTES/1024/1024/1024 io_inter_gb,
  9    PHYSICAL_READ_BYTES/1024/1024/1024 PHYSICAL_READ_GB,
 10    PHYSICAL_READ_BYTES/1024/1024/decode(nvl(executions,0),0,1,executions) PIO_MB_PE,
 11    buffer_gets/decode(nvl(executions,0),0,1,executions) LIOS_PE,
 12    disk_reads/decode(nvl(executions,0),0,1,executions) PIOS_PE
 13  from
 14    gv$sql
 15  where
 16    sql_id = ('&1')
 17  order BY
 18    inst_id,
 19    sql_id,
 20    hash_value,
 21*   child_number
SQL>

 PLAN_HASH EXECUTIONS        PXE     CPU_MS     ELA_MS    AVG_ELA IO_INTER_GB PHYSICAL_READ_GB  PIO_MB_PE    LIOS_PE    PIOS_PE
---------- ---------- ---------- ---------- ---------- ---------- ----------- ---------------- ---------- ---------- ----------
2550353655         45          0   88018.62 147748.384 3.28329742        9.37             9.37     213.14   27402.60   27282.00
2550353655      17142          0 16035460.2   34342974 2.00344032     2208.88          2208.71     131.94   16973.79   16901.79
131MB per run (PIO_MB_PE) looks small at first, but if you check the total reads and executions it looks a bit scary. (Note: I include IO_INTER_GB because we have Exadata environments as well; in this case it is not relevant.) I then checked how many times a day it runs and how much is read in total.
SQL> l
  1  with ds as (
  2  select min(snap_id) min_snap,max(snap_id) max_snap from
  3  dba_hist_snapshot
  4  where begin_interval_time between trunc(sysdate-1) and trunc(sysdate))
  5  select sum(executions_delta) total_execution_yesterday,sum(executions_delta)*&_tmp_pio_pe/1024/1024 total_pio_pd_TB
  6  from dba_hist_sqlstat,ds
  7  where snap_id between ds.min_snap and ds.max_snap
  8* and sql_id='&1'

TOTAL_EXECUTION_YESTERDAY TOTAL_PIO_PD_TB
------------------------- ---------------
                    18689      2.39794088
As you can see, the daily read is almost 2.4 TB over 18K executions per day, from a single query that innocently reads 131MB per execution. The table size is not that big:
SQL> select sum(bytes)/1024/1024 total_mb, sum(blocks) total_blocks
     from dba_segments where segment_name='PROBLEM_TABLE';

  TOTAL_MB TOTAL_BLOCKS
---------- ------------
       109        15360
The SGA is way bigger than the table, so there is no obvious reason to go via direct path read:
SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
sga_target                           big integer 9G
It also looks like our table (all partitions) is already in the cache: 15.2K blocks out of a total of 15.3K blocks.
SQL> select count(distinct block#) from v$bh
  2  where objd in (select data_object_id from dba_objects where object_name='PROBLEM_TABLE' and object_type like 'TABLE%');

COUNT(DISTINCTBLOCK#)
---------------------
                15260
So, the table is small enough to fit in the cache, and it is already in the cache. Then why are we scanning it via direct path read?
To know the answer you need to be a good blog follower like me and immediately recall Tanel's excellent post "Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)", which covers a change introduced in 11.2.0.3, the version we are on: from 11.2.0.3 onwards the serial direct path read decision is driven by the optimizer statistics of the segment rather than the actual block count of the segment.
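If you want to see what these settings look like on your own system, a quick and rough way is to query the hidden parameters directly. This is just a sketch to be run as SYS; the x$ structures and the parameters themselves are undocumented, so treat the output as indicative only.

-- Inspect the hidden parameters involved in the serial direct path read decision
-- (run as SYS; undocumented, so indicative rather than authoritative)
select p.ksppinm  parameter_name,
       v.ksppstvl current_value,
       v.ksppstdf is_default
from   x$ksppi p, x$ksppcv v
where  p.indx = v.indx
and    p.ksppinm in ('_direct_read_decision_statistics_driven',
                     '_small_table_threshold',
                     '_serial_direct_read');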
To make sure this behaviour change really is the cause, I ran the test first with _direct_read_decision_statistics_driven=TRUE (the default) and then with FALSE.
SQL> alter session set "_direct_read_decision_statistics_driven"=TRUE;
SQL> @mon_sqlid2 bcfqr520321mf

DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'BCFQR520321MF')
--------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /*+monitor*//*coskan2*/.......... from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ'

Global Stats
=========================================================================================
| Elapsed |   Cpu   |    IO    | Application |  Other   | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Calls |  Gets  | Reqs | Bytes |
=========================================================================================
|    4.68 |    1.17 |     2.76 |        0.23 |     0.53 |     2 |  21228 |  171 | 164MB |
=========================================================================================

SQL Plan Monitoring Details (Plan Hash Value=2550353655)
=======================================================================================================================================================================================
| Id | Operation                         | Name                 | Rows    | Cost  | Time      | Start  | Execs | Rows     | Read | Read  | Mem   | Activity | Activity Detail      |
|    |                                   |                      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) | (%)      | (# samples)          |
=======================================================================================================================================================================================
|  0 | SELECT STATEMENT                  |                      |         |       |         1 |     +5 |     1 |       16 |      |       |       |          |                      |
|  1 | VIEW                              | VIEW_XXXXXXXXXXXXXXX |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |       |          |                      |
|  2 | SORT ORDER BY                     |                      |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |  4096 |          |                      |
|  3 | HASH GROUP BY                     |                      |       3 | 25252 |         1 |     +5 |     1 |       16 |      |       |    1M |          |                      |
|  4 | VIEW                              |                      |       3 | 25250 |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  5 | UNION-ALL                         |                      |         |       |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  6 | FILTER                            |                      |         |       |         1 |     +5 |     1 |     289K |      |       |       |          |                      |
|  7 | HASH GROUP BY                     |                      |       2 | 19143 |         2 |     +4 |     1 |     289K |      |       |   62M |          |                      |
|  8 | HASH JOIN                         |                      |    990K | 19090 |         4 |     +2 |     1 |     289K |      |       |   30M |    25.00 | Cpu (1)              |
|  9 | PART JOIN FILTER CREATE           | :BF0000              |    988K |  6099 |         3 |     +2 |     1 |     289K |      |       |       |          |                      |
| 10 | PARTITION LIST SINGLE             |                      |    988K |  6099 |         3 |     +2 |     1 |     289K |      |       |       |          |                      |
| 11 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |    988K |  6099 |         4 |     +1 |     1 |     289K |   57 |  55MB |       |    25.00 | direct path read (1) |
| 12 | PARTITION LIST SINGLE             |                      |    988K |  6102 |         2 |     +4 |     1 |     289K |      |       |       |          |                      |
| 13 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |    988K |  6102 |         3 |     +3 |     1 |     289K |   57 |  55MB |       |    50.00 | direct path read (2) |
| 14 | FILTER                            |                      |         |       |           |        |     1 |          |      |       |       |          |                      |
| 15 | HASH GROUP BY                     |                      |       1 |  6107 |           |        |     1 |          |      |       |       |          |                      |
| 16 | FILTER                            |                      |         |       |           |        |     1 |          |      |       |       |          |                      |
| 17 | NESTED LOOPS                      |                      |         |       |           |        |     1 |          |      |       |       |          |                      |
| 18 | NESTED LOOPS                      |                      |       1 |  6102 |           |        |     1 |          |      |       |       |          |                      |
| 19 | PARTITION LIST SINGLE             |                      |       1 |  6099 |           |        |     1 |          |      |       |       |          |                      |
| 20 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |       1 |  6099 |           |        |     1 |          |   57 |  55MB |       |          |                      |
| 21 | PARTITION LIST SINGLE             |                      |       1 |     2 |           |        |       |          |      |       |       |          |                      |
| 22 | INDEX RANGE SCAN                  | PROBLEM_TABLE_IX3    |       1 |     2 |           |        |       |          |      |       |       |          |                      |
| 23 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE        |       1 |     3 |           |        |       |          |      |       |       |          |                      |
| 24 | SORT AGGREGATE                    |                      |       1 |       |           |        |       |          |      |       |       |          |                      |
| 25 | PARTITION LIST SINGLE             |                      |       1 |     4 |           |        |       |          |      |       |       |          |                      |
| 26 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE        |       1 |     4 |           |        |       |          |      |       |       |          |                      |
| 27 | INDEX RANGE SCAN                  | PROBLEM_TABLE_IX     |       1 |     3 |           |        |       |          |      |       |       |          |                      |
=======================================================================================================================================================================================
This time we read 164MB, and if you check plan lines 11, 13 and 20, where we access PROBLEM_TABLE via a full partition scan, you can see that all the direct path reads come from that access.
Now let's revert the parameter to the pre-11.2.0.3 behaviour:
SQL> alter session set "_direct_read_decision_statistics_driven"=false;

Session altered.

SQL> @mon_sqlid2 fds2rw8vuutcn

DBMS_SQLTUNE.REPORT_SQL_MONITOR(SQL_ID=>'FDS2RW8VUUTCN')
--------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
select /*+monitor*//*coskan*/........................ from VIEW_XXXXXXXXXXXXXXX where COLUMN='ZZZ'

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    0.93 |    0.93 |     0.00 |     2 |  21043 |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2550353655)
===========================================================================================================================================================
| Id | Operation                         | Name                 | Rows    | Cost  | Time      | Start  | Execs | Rows     | Mem   | Activity | Activity Detail |
|    |                                   |                      | (Estim) |       | Active(s) | Active |       | (Actual) | (Max) | (%)      | (# samples)     |
===========================================================================================================================================================
|  0 | SELECT STATEMENT                  |                      |         |       |         1 |     +1 |     1 |       16 |       |          |                 |
|  1 | VIEW                              | VIEW_XXXXXXXXXXXXXXX |       3 | 25252 |         1 |     +1 |     1 |       16 |       |          |                 |
|  2 | SORT ORDER BY                     |                      |       3 | 25252 |         1 |     +1 |     1 |       16 |  4096 |          |                 |
|  3 | HASH GROUP BY                     |                      |       3 | 25252 |         1 |     +1 |     1 |       16 |    1M |          |                 |
|  4 | VIEW                              |                      |       3 | 25250 |         1 |     +1 |     1 |     287K |       |          |                 |
|  5 | UNION-ALL                         |                      |         |       |         1 |     +1 |     1 |     287K |       |          |                 |
|  6 | FILTER                            |                      |         |       |         1 |     +1 |     1 |     287K |       |          |                 |
|  7 | HASH GROUP BY                     |                      |       2 | 19143 |         1 |     +1 |     1 |     287K |   62M |   100.00 | Cpu (1)         |
|  8 | HASH JOIN                         |                      |    990K | 19090 |         1 |     +1 |     1 |     287K |   31M |          |                 |
|  9 | PART JOIN FILTER CREATE           | :BF0000              |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 10 | PARTITION LIST SINGLE             |                      |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 11 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |    988K |  6099 |         1 |     +1 |     1 |     287K |       |          |                 |
| 12 | PARTITION LIST SINGLE             |                      |    988K |  6102 |         1 |     +1 |     1 |     287K |       |          |                 |
| 13 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |    988K |  6102 |         1 |     +1 |     1 |     287K |       |          |                 |
| 14 | FILTER                            |                      |         |       |           |        |     1 |          |       |          |                 |
| 15 | HASH GROUP BY                     |                      |       1 |  6107 |           |        |     1 |          |       |          |                 |
| 16 | FILTER                            |                      |         |       |           |        |     1 |          |       |          |                 |
| 17 | NESTED LOOPS                      |                      |         |       |           |        |     1 |          |       |          |                 |
| 18 | NESTED LOOPS                      |                      |       1 |  6102 |           |        |     1 |          |       |          |                 |
| 19 | PARTITION LIST SINGLE             |                      |       1 |  6099 |           |        |     1 |          |       |          |                 |
| 20 | TABLE ACCESS FULL                 | PROBLEM_TABLE        |       1 |  6099 |           |        |     1 |          |       |          |                 |
| 21 | PARTITION LIST SINGLE             |                      |       1 |     2 |           |        |       |          |       |          |                 |
| 22 | INDEX RANGE SCAN                  | PROBLEM_TABLE_IX3    |       1 |     2 |           |        |       |          |       |          |                 |
| 23 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE        |       1 |     3 |           |        |       |          |       |          |                 |
| 24 | SORT AGGREGATE                    |                      |       1 |       |           |        |       |          |       |          |                 |
| 25 | PARTITION LIST SINGLE             |                      |       1 |     4 |           |        |       |          |       |          |                 |
| 26 | TABLE ACCESS BY LOCAL INDEX ROWID | PROBLEM_TABLE        |       1 |     4 |           |        |       |          |       |          |                 |
| 27 | INDEX RANGE SCAN                  | PROBLEM_TABLE_IX     |       1 |     3 |           |        |       |          |       |          |                 |
===========================================================================================================================================================
Suddenly, all reads come from the cache and the runtime drops from 4.68 to 0.93 seconds.
To be more sure, I checked the statistics to understand why this is happening.
The statistics say the table has 2 million rows and 55K blocks:
SQL> select num_rows, blocks, last_analyzed from dba_tables where table_name='PROBLEM_TABLE';

  NUM_ROWS     BLOCKS LAST_ANALYZED
---------- ---------- -------------------
   2086124      55981 18/04/2013 23:42:56
In reality the table has 600K rows and 15K blocks:
SQL> select count(*) from ptd_owner.PROBLEM_TABLE;

  COUNT(*)
----------
    611960

SQL> select sum(blocks) from dba_segments where segment_name='PROBLEM_TABLE';

SUM(BLOCKS)
-----------
      15360
Our problem is definitely that when the direct path read decision is driven by statistics that are not representative, Oracle chooses the wrong read path.
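Since the offending scans hit individual partitions, it is also worth comparing the partition-level statistics against the actual segment sizes. A rough sketch, purely illustrative, using the owner and table names from above:

-- Compare stats-based block counts (what the 11.2.0.3+ direct path decision uses)
-- with the actual segment block counts, per partition
select s.partition_name,
       s.num_rows,
       s.blocks        stats_blocks,
       seg.blocks      segment_blocks,
       s.last_analyzed
from   dba_tab_statistics s
join   dba_segments seg
  on   seg.owner          = s.owner
 and   seg.segment_name   = s.table_name
 and   seg.partition_name = s.partition_name
where  s.owner      = 'PTD_OWNER'
and    s.table_name = 'PROBLEM_TABLE'
order by s.partition_name;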
I did not stop there and checked all_tab_modifications to see what had happened to the table since the last analyze:
####################################################
################TABLE MODIFICATIONS#################
####################################################

TABLE_OWNER   TABLE_NAME      PARTITION_NAME     INSERTS    UPDATES    DELETES TIMESTAMP TRUNCATED DROP_SEGMENTS
------------- --------------- --------------- ---------- ---------- ---------- --------- --------- -------------
PTD_OWNER     PROBLEM_TABLE                            0          0    2086124 18-APR-13 NO                    0
PTD_OWNER     PROBLEM_TABLE   XXXXXXXXXXXX             0          0     267418 18-APR-13 YES                   0
PTD_OWNER     PROBLEM_TABLE   YYYYYYY                  0          0     830434 18-APR-13 YES                   0
PTD_OWNER     PROBLEM_TABLE   ZZZ                      0          0     988272 18-APR-13 YES                   0
It looks like the table partitions are truncated after the nightly stats gathering and then loaded throughout the day until the next truncate.
When we checked with the business, they confirmed this behaviour.
So what next?
As the filter is on the partition key and needs full partition access, indexing is not a solution.
Increasing the SGA would help, but why would we, when the table already fits in the cache most of the time (until it grows past a certain point during the day)?
Changing the parameter at session level via a trigger is not possible for us.
Changing the parameter for the whole database is also not something we want to implement.
The last option is to gather representative statistics while the table is about half full and then lock them (or set the stats manually like Tanel does); see the sketch below.
This way we avoid undocumented parameters and still get buffered reads up to a certain point throughout the day (Oracle may decide to change the read path if the table grows big enough).
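A rough sketch of that last option (owner and table names as above; the gather options are illustrative and would need adjusting for the environment, so treat this as a sketch rather than the exact job we scheduled):

-- Gather stats while the partitions hold a representative amount of data,
-- then lock them so the nightly job cannot overwrite them with
-- post-truncate (near-empty) figures
begin
  dbms_stats.gather_table_stats(
    ownname       => 'PTD_OWNER',
    tabname       => 'PROBLEM_TABLE',
    cascade       => true,
    no_invalidate => false);

  dbms_stats.lock_table_stats(
    ownname => 'PTD_OWNER',
    tabname => 'PROBLEM_TABLE');
end;
/

-- Alternative, closer to Tanel's approach: set the figures manually to the
-- real segment size (the numbers below are the ones measured above, shown
-- for illustration) and lock them afterwards in the same way.
-- exec dbms_stats.set_table_stats('PTD_OWNER','PROBLEM_TABLE', numrows=>611960, numblks=>15360, no_invalidate=>false);

Once the stats are locked, the automatic stats job skips the table, so they would need a manual refresh (and re-lock) if the daily data volume changes significantly.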
Well, that's about it. I finally managed to write a blog post after 11 months, right before a full year passed (banking is killing my productivity and increasing the spam comments).
I have to admit, when the scripts are ready it takes about 10 minutes to sort out the problem, but it took 2 hours to share it, so I hope you enjoyed it.
Morals of the story: check your systems regularly, read blogs, and god bless Tanel :)
Nice post, Thanks.
Comment by Amos — April 19, 2013 @ 7:59 pm
Another option can be creating a SQL profile/patch with index_stats/table_stats hints; I've tested it recently: http://orasql.org/2013/03/18/example-of-controlling-direct-path-reads-decision-through-sql-profile-hints/
Comment by Sayan Malakshinov — April 19, 2013 @ 8:32 pm
I read your post yesterday; it is indeed another method, but I usually keep profiles as a last resort due to the maintenance overhead, especially when the developers don't know anything about them or don't hand them over to the next guy.
Thanks for sharing the link
Comment by coskan — April 19, 2013 @ 8:44 pm
You are absolutely right! It was just a test.
Comment by Sayan Malakshinov — April 19, 2013 @ 9:08 pm
Nice case study, dude.
Blog more.
Comment by Dom Brooks — April 20, 2013 @ 7:36 am
Hi Coskan,
Nice post …Thanks for sharing. I believe you are missing here “sum(executions_delta)*&_tmp_pio_pe/1024/1024 total_pio_pd_TB”
Comment by Anand — April 23, 2013 @ 10:55 am
Actually they are one single script (that one and the one above it), so _tmp_pio_pe was picked up as a new value from the query above.
Comment by coskan — April 23, 2013 @ 11:03 am
Hi Coskan,
That is a nice post.
The presence of the enq: KO – fast object checkpoint wait event in this case indicates that the direct path read is preceded by a DBWR process which writes dirty buffers (of the table to be read) from the buffer cache to the disk. This is to ensure reading clean blocks physically. And it explains the partition truncate your application is doing.
This enq: KO – fast object checkpoint represents an extra penalty to be added to the direct path read.
I completely agree that having representative statistics is the way to go here. But what if you are using Exadata and you want smart scans to kick in? They will not unless the reads go via direct path. And this is why sometimes (on Exadata) you may have to go the other way: change the statistics to make Oracle (not the CBO) think that the object is big enough that it will bypass the buffer cache and hence go for a direct path read. I saw Tanel Poder hacking the number of blocks in one of his Exadata hacking sessions to allow smart scans to kick in.
Best regards
Mohamed
Comment by hourim — May 6, 2013 @ 8:36 am
Hi Mohamed,
Thanks for your informative comment and pointing out the fast object checkpoint and truncate relationship.
I also think you are absolutely right about Exadata. The feature is really good and needed for Exadata (possibly it was implemented for Exadata and added to normal systems as a bonus). Having said that, when you consider the number of executions and the amount of work done for such a small amount of activity, it is still sub-optimal on Exadata as well, which is where Oracle gets it wrong: they should be checking the number of executions rather than blindly assuming it is a query that pollutes the cache. In my opinion, for this many executions even smart scan is unnecessary, as it is not faster than cached reads, and why would you make your storage cells work like crazy just because Oracle did not think through the side effects of their new features?
Comment by coskan — May 6, 2013 @ 5:37 pm
Hi Coskan,
Nice post. I think there is a typo in "When we revert the parameter to pre 11.2.0.3 behaviour"; shouldn't it be 11.2.0.2?
Comment by Aveek — September 7, 2013 @ 4:46 pm