Coskan’s Approach to Oracle

May 31, 2012

Plan stability through Upgrade to 11G-Why is my plan changed?-Auto Adjusted Dynamic Sampling

Filed under: CBO, Performance, Plan Stability, Upgrade — Tags: — coskan @ 7:59 pm

Todays story is about, how dynamic sampling can effect your plan stability during upgrade and how you can recover from it as long as you know what your are doing.

A little bit of background, I’m currently in the middle of migrate and upgrade cycle for 12TB warehouse from 10.2.0.3 Solaris to 11.2.0.3 (yet) Linux.
Plan stability on this system is maintained by locking the main driving table statistics and hinting the queries in case CBO does not pick the right driving table.
The system uses temporary tables a lot for staging operations and they rely on dynamic sampling Level 2 which 2 is enough. They also use parallelism for the heavy batch operations. Good part of the system is, since they use partitioning heavily they can simulate the close to real workload on cut down version of the database. What we are doing is getting the cut down version 10.2.0.3 export from solaris imported into 11.2.0.3 linux and run the batches for the close of business date with same memory and other settings. Initial load test done on OFE(optimizer_features_enable)=11.2.0.3 and we saw many parallel queries started to suffer. Our aim was not to check what is slow just collect the necessary data so we did not tune anything. Then we did another test on OFE=10.2.0.3 and saw that 10.2.0.3 did way better for parallel queries but it also suffer for other issues which is not subject of the current post. We have many other runs with different options for OFE and new stats but they will also be covered later if I come up something interesting. Todays story is why our parallel queries suddenly went bad on OFE=11.2.0.3

After this background now I can start the actual story.

Once I check the plans I saw something interesting. Oracle suddenly started to use dynamic sampling 7 without any manual setting. I did ask it on twitter and thankfully Timur Akhmadeev pointed me the right direction on the manual (RTFM rocks)

About where the dynamic sampling is used

Oracle 10GR2 documentation says

14.5.6 Estimating Statistics with Dynamic Sampling

This dynamic sampling feature is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter.
For dynamic sampling to automatically gather the necessary statistics, this parameter should be set to a value of 2 or higher.
The default value is 2. See "Dynamic Sampling Levels" for information about the sampling levels that can be set.

Oracle 11GR2 documentation says

13.6.2.2 When the Optimizer Uses Dynamic Sampling

During compilation, the optimizer decides whether to use dynamic sampling based on a number of factors,
including whether the statements use parallel processing or serial processing.

For parallel statements, the optimizer automatically decides whether to use dynamic sampling and which level to use.
The decision depends on the size of the tables and the complexity of the predicates. The optimizer expects parallel statements to be resource-intensive,
so the additional overhead at compile time is worth it to ensure the best plan. The database ignores the OPTIMIZER_DYNAMIC_SAMPLING setting
unless set to a nondefault value, in which case the value is honored.

For serially processed SQL statements, the dynamic sampling level depends on the value of the OPTIMIZER_DYNAMIC_SAMPLING parameter and
is not triggered automatically by the optimizer. Serial statements are typically short-running, so that any overhead at compile time
 could have a huge impact on their performance.

Did you spot the difference ? on 10GR2 the decision of the level completely depends on the optimizer_dynamic_sampling parameter (system,session,hint level),
however on 11GR2 for parallel statements optimizer “automatically” decides whether to use dynamic sampling and which level to use.

Since 2005 where my Oracle career starts, I learned one thing, When there is something “automagically” adjusted and decided, there is always big possibility for less stability.

Below is the difference in a simple case where I create a table, do not gather statistics, keep optimizer_dynamic_sampling=2 and run explain plan with parallel hint.


HR@ORACOSL> create table ds_test
as
with generator as (
select--+ materialize
rownum id
from dual
connect by
level  explain plan for select  * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4280396642

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |   185M|    15G|   469K  (1)| 01:33:57 |
|   1 |  TABLE ACCESS FULL| DS_TEST |   185M|    15G|   469K  (1)| 01:33:57 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)

17 rows selected.

-----WITH PARALLELISM

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    92M|  7654M|   260K  (1)| 00:52:02 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |    92M|  7654M|   260K  (1)| 00:52:02 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=6)
   - automatic DOP: skipped because of IO calibrate statistics are missing

22 rows selected.

-----WITH OFE=10.2.0.3

HR@ORACOSL> alter session set optimizer_features_enable='10.2.0.3';

Session altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:08:41 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:08:41 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

22 rows selected.

As you can see Oracle adjusted my default dynamic sampling level to 6 on OFE 11.2.0.3 when the statement is parallel and keeps it as Level 2 when statement is serial or OFE is lower than 11.2.0.1. Since this is a simple case there is no effect but when you have complex joins which you are sure about the plan you can suddenly start to see changes and got heavily hit by the feature. I can assure, I had the issue with tables that has got statistics as well so it is not the case for tables that need dynamic sampling.

Since I introduced you with the feature, next question is how can we control the feature?. I think optimizer guys did a mistake here and did not give any control via an optimizer parameter. How do I know it ? If you follow what I have done in the optimizer features post you can see there is no controlling on this feature via a parameter. So we only have optimizer bugfixes as controller.
Lets check the bugfixes related with dynamic sampling.

For this I use the sql below for bugfix.sql

column sql_feaure for a34
column ofe for a10
column description for a64

select bugno,optimizer_feature_enable ofe ,sql_feature,description,value from v$system_fix_control where (sql_feature like upper('%&1%') or upper(description) like '%&1%')
order by 2 desc nulls last,1 asc;

Results for calling it with search key as ‘dynamic’

HR@ORACOSL> @bugfix dynamic

     BUGNO OFE        SQL_FEATURE                                                      DESCRIPTION                                                           VALUE
---------- ---------- ---------------------------------------------------------------- ---------------------------------------------------------------- ----------
  12399886 11.2.0.3   QKSFM_DYNAMIC_SAMPLING_12399886                                  update the index clustering factor (DS) if statistics do not exi          1
   6408301 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_6408301                                   use recursive idx selectivity for partitioned table as well               1
   8767442 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_8767442                                   compute NDV for all columns in a multi-column join key using DS           1
   9272549 11.2.0.2   QKSFM_DYNAMIC_SAMPLING_9272549                                   do not sample columns which have statistics                               1
   7452863 11.2.0.1   QKSFM_DYNAMIC_SAMPLING_7452863                                   adjust DS level for large parallel tables based on the size               1
   6766962 11.1.0.7   QKSFM_DYNAMIC_SAMPLING_6766962                                   sample size in DS is at most equal to the partition sample size           1
   6956212 10.2.0.5   QKSFM_DYNAMIC_SAMPLING_6956212                                   allow dynamic sampling when OR-expansion is used                          1
   7592673 10.2.0.5   QKSFM_DYNAMIC_SAMPLING_7592673                                   do not use dyn sampling index selectivity if not all keys sample          1
   6608941            QKSFM_DYNAMIC_SAMPLING_6608941                                   allow dynamic sampling on fixed tables                                    0
   6708183            QKSFM_DYNAMIC_SAMPLING_6708183                                   allow dynamic sampling on table functions                                 0

10 rows selected.

If you look carefully to the ones which are introduced on 11.2.0.* you can see the 7452863 with description adjust DS level for large parallel tables based on the size.

Now time to try to disable the feature ( alter session set “_fix_control”=’7452863:OFF';) and see what the effect is

HR@ORACOSL> alter session set optimizer_features_enable='11.2.0.3';

Session altered.

HR@ORACOSL> alter session set "_fix_control"='7452863:OFF';

Session altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:52:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=2)
   - automatic DOP: skipped because of IO calibrate statistics are missing

22 rows selected.

As you can see dynamic sampling is back to the default level 2. You can use this fix control till oracle comes up with a better idea for your parallel queries where you want plan stability and you are happy with your defauly dynamic sampling level.

Attention I must also warn the a bit of unexpected behaviour that this automagical adjustment is working a bit weird. When I set the dynamic sampling to level higher than 2 Oracle accepts the new level as adjusted value like below

Update: After reading the blog posts Dominic shared in comments, I ‘m now sure feature is only active with default optimizer_dynamic_sampling=2

HR@ORACOSL> alter system set optimizer_dynamic_sampling=3;

System altered.

HR@ORACOSL> explain plan for select /*+parallel*/ * from ds_test;

Explained.

HR@ORACOSL> @e

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 791174141

--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   108M|  8965M|   260K  (1)| 00:52:03 |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS FULL| DS_TEST  |   108M|  8965M|   260K  (1)| 00:52:03 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   4 - SEL$1 / DS_TEST@SEL$1

Note
-----
   - dynamic sampling used for this statement (level=3)
   - automatic DOP: skipped because of IO calibrate statistics are missing

10053 trace has got the info below when it adjust dynamic_sampling to 6 when the default is 2

Dynamic sampling level auto-adjusted from 6 to 6  

*** 2012-05-31 20:23:02.408
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 6).
** Dynamic sampling updated table stats.: blocks=1726021

I still don’t get why it says from 6 to 6 normally it should be from 2 to 6 ()

and info below when we set it to level 3 either session or system level

*************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
Compilation Environment Dump
sqlstat_enabled                     = true
optimizer_dynamic_sampling          = 3
statistics_level                    = all
parallel_query_default_dop          = 2
parallel_hinted                     = default
_optimizer_use_feedback             = false
Bug Fix Control Environment

*** 2012-05-31 20:25:34.191
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 3).
** Dynamic sampling updated table stats.: blocks=1726021

With this behaviour I started to think it only kicks in when dynamic sampling is level 2 but I need to confirm it. I will update the post if I get something different than my final thought.

For more information you can check the MOS note below

Different Level for Dynamic Sampling used than the Specified Level [ID 1102413.1]

Bugs:
Bug 11841491 : POOR PERFORMANCE WITH FIX 7452863
Bug 10623519 : SQL3 – INCORRECT CARDINALITY ESTIMATES IN 11G WITH MULTIPLE FILTERS ON SAME TABLE
Bug 14059307 : AUTO-ADJUST IMPOSED DYNAMIC SAMPLING INACCURATE FOR LARGE PARTITIONED TABLE
Bug 7832981 : WITH FIX 7676927 MERGE SQL TAKES LONG AND CONSUMES LOT OF TEMP SPACE

Update:

Additional more scientific info is in the blog posts below (Thanks to Dominic’s comment )

Randolf Geist touches on it here:
http://allthingsoracle.com/dynamic-sampling-ii-controlling-the-activity-part-2/

Also Jean Pierre-Dijcks here:

https://blogs.oracle.com/datawarehousing/entry/explaining_explain_plan_note

The Silver is the New Black Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 199 other followers