Todays story is about, how dynamic sampling can effect your plan stability during upgrade and how you can recover from it as long as you know what your are doing.
A little bit of background, I’m currently in the middle of migrate and upgrade cycle for 12TB warehouse from 10.2.0.3 Solaris to 11.2.0.3 (yet) Linux.
Plan stability on this system is maintained by locking the main driving table statistics and hinting the queries in case CBO does not pick the right driving table.
The system uses temporary tables a lot for staging operations and they rely on dynamic sampling Level 2 which 2 is enough. They also use parallelism for the heavy batch operations. Good part of the system is, since they use partitioning heavily they can simulate the close to real workload on cut down version of the database. What we are doing is getting the cut down version 10.2.0.3 export from solaris imported into 11.2.0.3 linux and run the batches for the close of business date with same memory and other settings. Initial load test done on OFE(optimizer_features_enable)=11.2.0.3 and we saw many parallel queries started to suffer. Our aim was not to check what is slow just collect the necessary data so we did not tune anything. Then we did another test on OFE=10.2.0.3 and saw that 10.2.0.3 did way better for parallel queries but it also suffer for other issues which is not subject of the current post. We have many other runs with different options for OFE and new stats but they will also be covered later if I come up something interesting. Todays story is why our parallel queries suddenly went bad on OFE=11.2.0.3
After this background now I can start the actual story.
Once I check the plans I saw something interesting. Oracle suddenly started to use dynamic sampling 7 without any manual setting. I did ask it on twitter and thankfully Timur Akhmadeev pointed me the right direction on the manual (RTFM rocks)
About where the dynamic sampling is used
Oracle 10GR2 documentation says
14.5.6 Estimating Statistics with Dynamic Sampling This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter. For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher. The default value is 2. See "Dynamic Sampling Levels" for information about the sampling levels that can be set.
Oracle 11GR2 documentation says
13.6.2.2 When the Optimizer Uses Dynamic Sampling During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors, including whether the statements use parallel processing or serial processing. For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use. The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive, so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting unless set to a nondefault value, in which case the value is honored. For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time could have a huge impact on their performance.
Did you spot the difference ? on 10GR2 the decision of the level completely depends on the optimizer_dynamic_sampling parameter (system,session,hint level),
however on 11GR2 for parallel statements optimizer “automatically” decides whether to use dynamic sampling and which level to use.
Since 2005 where my Oracle career starts, I learned one thing, When there is something “automagically” adjusted and decided, there is always big possibility for less stability.
Below is the difference in a simple case where I create a table, do not gather statistics, keep optimizer_dynamic_sampling=2 and run explain plan with parallel hint.
HR@ORACOSL> create table ds_test as with generator as ( select--+ materialize rownum id from dual connect by level explain plan for select * from ds_test; Explained. HR@ORACOSL> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 4280396642 ----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 185M| 15G| 469K (1)| 01:33:57 | | 1 | TABLE ACCESS FULL| DS_TEST | 185M| 15G| 469K (1)| 01:33:57 | ----------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 / DS_TEST@SEL$1 Note ----- - dynamic sampling used for this statement (level=2) 17 rows selected. -----WITH PARALLELISM HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test; Explained. HR@ORACOSL> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 791174141 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 92M| 7654M| 260K (1)| 00:52:02 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM)| :TQ10000 | 92M| 7654M| 260K (1)| 00:52:02 | Q1,00 | P->S | QC (RAND) | | 3 | PX BLOCK ITERATOR | | 92M| 7654M| 260K (1)| 00:52:02 | Q1,00 | PCWC | | | 4 | TABLE ACCESS FULL| DS_TEST | 92M| 7654M| 260K (1)| 00:52:02 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 4 - SEL$1 / DS_TEST@SEL$1 Note ----- - dynamic sampling used for this statement (level=6) - automatic DOP: skipped because of IO calibrate statistics are missing 22 rows selected. -----WITH OFE=10.2.0.3 HR@ORACOSL> alter session set optimizer_features_enable='10.2.0.3'; Session altered. HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test; Explained. HR@ORACOSL> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 791174141 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 108M| 8965M| 260K (1)| 00:08:41 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM)| :TQ10000 | 108M| 8965M| 260K (1)| 00:08:41 | Q1,00 | P->S | QC (RAND) | | 3 | PX BLOCK ITERATOR | | 108M| 8965M| 260K (1)| 00:08:41 | Q1,00 | PCWC | | | 4 | TABLE ACCESS FULL| DS_TEST | 108M| 8965M| 260K (1)| 00:08:41 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 4 - SEL$1 / DS_TEST@SEL$1 Note ----- - dynamic sampling used for this statement (level=2) - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit 22 rows selected.
As you can see Oracle adjusted my default dynamic sampling level to 6 on OFE 11.2.0.3 when the statement is parallel and keeps it as Level 2 when statement is serial or OFE is lower than 11.2.0.1. Since this is a simple case there is no effect but when you have complex joins which you are sure about the plan you can suddenly start to see changes and got heavily hit by the feature. I can assure, I had the issue with tables that has got statistics as well so it is not the case for tables that need dynamic sampling.
Since I introduced you with the feature, next question is how can we control the feature?. I think optimizer guys did a mistake here and did not give any control via an optimizer parameter. How do I know it ? If you follow what I have done in the optimizer features post you can see there is no controlling on this feature via a parameter. So we only have optimizer bugfixes as controller.
Lets check the bugfixes related with dynamic sampling.
For this I use the sql below for bugfix.sql
column sql_feaure for a34 column ofe for a10 column description for a64 select bugno,optimizer_feature_enable ofe ,sql_feature,description,value from v$system_fix_control where (sql_feature like upper('%&1%') or upper(description) like '%&1%') order by 2 desc nulls last,1 asc;
Results for calling it with search key as ‘dynamic’
HR@ORACOSL> @bugfix dynamic BUGNO OFE SQL_FEATURE DESCRIPTION VALUE ---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ---------- 12399886 11.2.0.3 QKSFM_DYNAMIC_SAMPLING_12399886 update the index clustering factor (DS) if statistics do not exi 1 6408301 11.2.0.2 QKSFM_DYNAMIC_SAMPLING_6408301 use recursive idx selectivity for partitioned table as well 1 8767442 11.2.0.2 QKSFM_DYNAMIC_SAMPLING_8767442 compute NDV for all columns in a multi-column join key using DS 1 9272549 11.2.0.2 QKSFM_DYNAMIC_SAMPLING_9272549 do not sample columns which have statistics 1 7452863 11.2.0.1 QKSFM_DYNAMIC_SAMPLING_7452863 adjust DS level for large parallel tables based on the size 1 6766962 11.1.0.7 QKSFM_DYNAMIC_SAMPLING_6766962 sample size in DS is at most equal to the partition sample size 1 6956212 10.2.0.5 QKSFM_DYNAMIC_SAMPLING_6956212 allow dynamic sampling when OR-expansion is used 1 7592673 10.2.0.5 QKSFM_DYNAMIC_SAMPLING_7592673 do not use dyn sampling index selectivity if not all keys sample 1 6608941 QKSFM_DYNAMIC_SAMPLING_6608941 allow dynamic sampling on fixed tables 0 6708183 QKSFM_DYNAMIC_SAMPLING_6708183 allow dynamic sampling on table functions 0 10 rows selected.
If you look carefully to the ones which are introduced on 11.2.0.* you can see the 7452863 with description adjust DS level for large parallel tables based on the size.
Now time to try to disable the feature ( alter session set “_fix_control”=’7452863:OFF’;) and see what the effect is
HR@ORACOSL> alter session set optimizer_features_enable='11.2.0.3'; Session altered. HR@ORACOSL> alter session set "_fix_control"='7452863:OFF'; Session altered. HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test; Explained. HR@ORACOSL> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 791174141 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 108M| 8965M| 260K (1)| 00:52:03 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM)| :TQ10000 | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | P->S | QC (RAND) | | 3 | PX BLOCK ITERATOR | | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | PCWC | | | 4 | TABLE ACCESS FULL| DS_TEST | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 4 - SEL$1 / DS_TEST@SEL$1 Note ----- - dynamic sampling used for this statement (level=2) - automatic DOP: skipped because of IO calibrate statistics are missing 22 rows selected.
As you can see dynamic sampling is back to the default level 2. You can use this fix control till oracle comes up with a better idea for your parallel queries where you want plan stability and you are happy with your defauly dynamic sampling level.
Attention I must also warn the a bit of unexpected behaviour that this automagical adjustment is working a bit weird. When I set the dynamic sampling to level higher than 2 Oracle accepts the new level as adjusted value like below
Update: After reading the blog posts Dominic shared in comments, I ‘m now sure feature is only active with default optimizer_dynamic_sampling=2
HR@ORACOSL> alter system set optimizer_dynamic_sampling=3; System altered. HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test; Explained. HR@ORACOSL> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 791174141 -------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 108M| 8965M| 260K (1)| 00:52:03 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM)| :TQ10000 | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | P->S | QC (RAND) | | 3 | PX BLOCK ITERATOR | | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | PCWC | | | 4 | TABLE ACCESS FULL| DS_TEST | 108M| 8965M| 260K (1)| 00:52:03 | Q1,00 | PCWP | | -------------------------------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$1 4 - SEL$1 / DS_TEST@SEL$1 Note ----- - dynamic sampling used for this statement (level=3) - automatic DOP: skipped because of IO calibrate statistics are missing
10053 trace has got the info below when it adjust dynamic_sampling to 6 when the default is 2
Dynamic sampling level auto-adjusted from 6 to 6 *** 2012-05-31 20:23:02.408 ** Performing dynamic sampling initial checks. ** ** Dynamic sampling initial checks returning TRUE (level = 6). ** Dynamic sampling updated table stats.: blocks=1726021
I still don’t get why it says from 6 to 6 normally it should be from 2 to 6 ()
and info below when we set it to level 3 either session or system level
************************************* PARAMETERS WITH ALTERED VALUES ****************************** Compilation Environment Dump sqlstat_enabled = true optimizer_dynamic_sampling = 3 statistics_level = all parallel_query_default_dop = 2 parallel_hinted = default _optimizer_use_feedback = false Bug Fix Control Environment *** 2012-05-31 20:25:34.191 ** Performing dynamic sampling initial checks. ** ** Dynamic sampling initial checks returning TRUE (level = 3). ** Dynamic sampling updated table stats.: blocks=1726021
With this behaviour I started to think it only kicks in when dynamic sampling is level 2 but I need to confirm it. I will update the post if I get something different than my final thought.
For more information you can check the MOS note below
Different Level for Dynamic Sampling used than the Specified Level [ID 1102413.1]
Bugs:
Bug 11841491 : POOR PERFORMANCE WITH FIX 7452863
Bug 10623519 : SQL3 – INCORRECT CARDINALITY ESTIMATES IN 11G WITH MULTIPLE FILTERS ON SAME TABLE
Bug 14059307 : AUTO-ADJUST IMPOSED DYNAMIC SAMPLING INACCURATE FOR LARGE PARTITIONED TABLE
Bug 7832981 : WITH FIX 7676927 MERGE SQL TAKES LONG AND CONSUMES LOT OF TEMP SPACE
Update:
Additional more scientific info is in the blog posts below (Thanks to Dominic’s comment )
Randolf Geist touches on it here:
http://allthingsoracle.com/dynamic-sampling-ii-controlling-the-activity-part-2/
Also Jean Pierre-Dijcks here:
https://blogs.oracle.com/datawarehousing/entry/explaining_explain_plan_note