After you upgrade to 11G from lower versions you might realize that some of your plans involving nested loops are changed and you started to see extra nested loops.
According to documentation this is because of the “new” implementation for Nested Loop Joins.
Document says
Oracle Database 11g introduces a new implementation for nested loop joins. As a result, execution plans that include nested loops might appear different
than they did in previous releases of Oracle Database. Both the new implementation and the original implementation for nested loop
joins are possible in Oracle Database 11g. So, when analyzing execution plans, it is important to understand that the number of NESTED LOOPS join row sources
might be different.
Oracle Database 11g introduces a new implementation for nested loop joins to reduce overall latency for physical I/O.
When an index or a table block is not in the buffer cache and is needed to process the join, a physical I/O is required.
Oracle Database 11g can batch multiple physical I/O requests and process them using a vector I/O instead of processing them one at a time.
As part of the new implementation for nested loop joins, two NESTED LOOPS join row sources might appear in the execution plan where only one
would have appeared in prior releases. In such cases, Oracle Database allocates one NESTED LOOPS join row source to join the values from the table
on the outer side of the join with the index on the inner side. A second row source is allocated to join the result of the first join,
which includes the rowids stored in the index, with the table on the inner side of the join.
Here are the plan differences between new implementation and old one.
Query of the plan
select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1 where t1.id=t2.id and t2.id>650000;
for 10.2.0.4
HR@ORACOS> @ofe Enter value for ofe: 10.2.0.4 Session altered. HR@ORACOS> @test_e Explained. HR@ORACOS> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 1708821971 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 350K| 78M| 703K (1)| 00:58:40 | | 1 | NESTED LOOPS | | 350K| 78M| 703K (1)| 00:58:40 | |* 2 | TABLE ACCESS FULL | T2 | 350K| 39M| 3446 (1)| 00:00:18 | | 3 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 118 | 2 (0)| 00:00:01 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 1 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T2"."ID">650000) 4 - access("T1"."ID"="T2"."ID") filter("T1"."ID">650000) 18 rows selected. HR@ORACOS>
For 11.2.0.1
HR@ORACOS> @ofe Enter value for ofe: 11.2.0.1 Session altered. HR@ORACOS> @test_e Explained. HR@ORACOS> @e PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 3217821761 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 350K| 78M| 703K (1)| 00:58:40 | | 1 | NESTED LOOPS | | | | | | | 2 | NESTED LOOPS | | 350K| 78M| 703K (1)| 00:58:40 | |* 3 | TABLE ACCESS FULL | T2 | 350K| 39M| 3446 (1)| 00:00:18 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 1 | | 1 (0)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 118 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T2"."ID">650000) 4 - access("T1"."ID"="T2"."ID") filter("T1"."ID">650000) 19 rows selected.
According to me, this change brought extra complexity to read the plans. NLJ plans prior to 11G are way easier to read and bear in mind this example is just a simple 2 table join.
Once you query has got crazy number of joins it gets harder and harder to read. One thing you can spot is that the cost is “same” which is very good for plan change analysis.
According to my experience, %99 of same cost plan changes are because of this implementation and this made me exclude same cost-different plan_hash_value plans from comparisons.
On the last post, I already gave some information about the controlling hidden CBO parameter _nlj_batching_enabled.
Better I repeat here, this parameter has 2 values, 0 to disable the behaviour, 1 (which is default) to enable the behavior and this parameter is there since 11GR1.
When I interpret the sentence “new implementation for nested loop joins to reduce overall latency for physical I/O”. I expect some improvement with runtime like the effect of bulk collect or changing the array size of your sessions. But is my assumption true ?
I have doubts about this definition because in reality I do not feel any difference as interpreted from the official definition.
Nothing got faster in our production environment related with this change.
To prove my experience, I build a test case which you can find in the bottom of the page.
My test environment is 11.2.0.1 on Ubuntu with 1.5G memory target on a machine with 4 cores.
For this test I have 3 tables all of which have ID columns and have an index on ID columns
T1=100 million rows (I think I was too generous)
T2=1 million rows
T3=50K rows -derived from T2
First query is for small data set which I can do buffer cache read after the initial run to test non phsical I/O performance
select /*+ use_nl(t3) full(t3) gather_plan_statistics*//*test_999550*/ * from t3,t1 where t1.id=t3.id;
Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=0
49999 rows selected. Elapsed: 00:00:01.51 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 52395 consistent gets 1842 physical reads 0 redo size 1781536 bytes sent via SQL*Net to client 1612 bytes received via SQL*Net from client 101 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 49999 rows processed Plan hash value: 692324935 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | 0 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 49999 |00:00:00.83 | 52395 | 1842 | | 2 | NESTED LOOPS | | 1 | 50143 | 99999 |00:00:04.65 | 2396 | 978 | | 3 | TABLE ACCESS FULL | T3 | 1 | 49999 | 49999 |00:00:00.11 | 949 | 849 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 49999 | 1 | 49999 |00:00:00.16 | 1447 | 129 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."ID"="T3"."ID")
Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=1
select /*+ use_nl(t3) full(t3) gather_plan_statistics*//*test_999550*/ * from t3,t1 2 where t1.id=t3.id; 49999 rows selected. Elapsed: 00:00:01.40 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 52395 consistent gets 1842 physical reads 0 redo size 1781536 bytes sent via SQL*Net to client 1612 bytes received via SQL*Net from client 101 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 49999 rows processed Plan hash value: 198103700 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 49999 |00:00:00.74 | 52395 | 1842 | | 1 | NESTED LOOPS | | 1 | | 49999 |00:00:00.74 | 52395 | 1842 | | 2 | NESTED LOOPS | | 1 | 50143 | 49999 |00:00:00.27 | 2396 | 978 | | 3 | TABLE ACCESS FULL | T3 | 1 | 49999 | 49999 |00:00:00.07 | 949 | 849 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 49999 | 1 | 49999 |00:00:00.15 | 1447 | 129 | | 5 | TABLE ACCESS BY INDEX ROWID| T1 | 49999 | 1 | 49999 |00:00:00.22 | 49999 | 864 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."ID"="T3"."ID")
Second query is for bigger data set which always do some physical read for T2
select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1 where t1.id=t2.id and t2.id>650000;
Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=0
select /*+ use_nl(t2) full(t2) gather_plan_statistics*//*test_650*/ * from t2,t1 where t1.id=t2.id and t2.id>650000; 2 3 350000 rows selected. Elapsed: 00:00:10.11 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 377819 consistent gets 23736 physical reads 0 redo size 12606669 bytes sent via SQL*Net to client 8212 bytes received via SQL*Net from client 701 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 350000 rows processed Plan hash value: 1010637246 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 0 | | 0 |00:00:00.01 | 0 | 0 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 350K|00:00:04.91 | 377K| 23736 | | 2 | NESTED LOOPS | | 1 | 350K| 700K|00:01:46.31 | 27819 | 17752 | |* 3 | TABLE ACCESS FULL | T2 | 1 | 350K| 350K|00:00:01.39 | 17643 | 16951 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 350K| 1 | 350K|00:00:01.05 | 10176 | 801 | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T2"."ID">650000) 4 - access("T1"."ID"="T2"."ID") filter("T1"."ID">650000)
Runtime stats and plan statistics for this sql in 11.2.0.1 with _nlj_batching_enabled=1
350000 rows selected. Elapsed: 00:00:10.00 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 377819 consistent gets 23736 physical reads 0 redo size 12606669 bytes sent via SQL*Net to client 8212 bytes received via SQL*Net from client 701 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 350000 rows processed Plan hash value: 3217821761 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 350K|00:00:04.67 | 377K| 23736 | | 1 | NESTED LOOPS | | 1 | | 350K|00:00:04.67 | 377K| 23736 | | 2 | NESTED LOOPS | | 1 | 350K| 350K|00:00:02.54 | 27819 | 17752 | |* 3 | TABLE ACCESS FULL | T2 | 1 | 350K| 350K|00:00:01.02 | 17643 | 16951 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 350K| 1 | 350K|00:00:01.01 | 10176 | 801 | | 5 | TABLE ACCESS BY INDEX ROWID| T1 | 350K| 1 | 350K|00:00:01.47 | 350K| 5984 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("T2"."ID">650000) 4 - access("T1"."ID"="T2"."ID") filter("T1"."ID">650000)
On the last post I already gave some information about the controlling parameter _nlj_batching_enabled so I change only this parameter to do my test and “keep OFE 11.2.0.1”
I ran these queries 4*50 times
1- _nlj_batching_enabled=1 (11G behaviour), flush the buffer cache twice and flush the cursor before first run, and run the statement 50 times.
2- _nlj_batching_enabled=1 (11G behaviour), flush the cursor at first run, flush the buffer cache twice before “each run” and run the statement 50 times to force phsical read all the time
3- _nlj_batching_enabled=0 (pre 11G behaviour), flush the buffer cache twice and flush the cursor before first run, and run the statement 50 times.
4- _nlj_batching_enabled=0 (pre 11G behaviour), flush the cursor at first run, flush the buffer cache twice before “each run” and run the statement 50 times to force phsical read all the time
After each run I snap the statistics from v$sql for the related sql and then do the comparison afterwards
Here are the results -( AVG1 includes first run for the step 1 and 3 which does both phsical and buffer cache read first run is not included on other calculations)
New behavior was faster in my Small dataset test.
I only consider median and for buffer cache reads it 10 milisecond less for CPU and 25 miliseconds less for Total ELA.
In Percentage we can say ~%5 for CPU and %6 for Total ELA
When it comes to phsical reads again 10 milisecond less for CPU and 11 miliseconds less for Total ELA
In percentage improvement is ~%4 for CPU and ~%1.5 for Total ELA
I was initially expecting bigger dataset will be again faster because we will do more phsical read but
This time new behaviour is slower for big dataset
For buffer cache reads 20 miliseconds slower for CPU and ~3 miliseconds slower for total ELA
In percentage degredation is ~%1.5 for CPU and ~%0.06 for Total ELA
For phsical reads 40 miliseconds slower for CPU and 7.5 miliseconds slower for total ELA
In percentage degredation is ~%2.5 for CPU and ~%0.14 for Total ELA
When I interpret these results I have the gut feeling that this is not like what is written in the documentation.
It looks like response time wise behavior is nearly same. From this output one can say either CBO engineers wanted to make it harder to read the plans or maybe they wanted to be more honest to show what was always there.
Better optimizer group write a blog post about it rather than me 🙂
There is a bug note which also supports the idea that its been there since 9i
Bug 9919556: OFN DB: CUSTOMER INTERFACE PROGRAM TAKING LONGER TIME AFTER 11G R2 UPGRADE
There are also some other bugs related with dumps and wrong results reported for this behavior which is inconsistent with the idea that its been there for a long time.
Bugs disappeared when behavior is disabled and revert back to original so it loooks like has some effect somewhere but not necesarrily on performance as it can be interpreted??
- Bug 10330043: WRONG RESULTS AFTER UPGRADE TO 11.1.0.7
- Bug 8658757: ORA-07445: EXCEPTION ENCOUNTERED: CORE DUMP [KDUGETPART()+532] [SIGBUS]
- Bug 7378322 – OERI [6704] on SELECT FOR UPDATE with partition iterator [ID 7378322.8]
- ORA-600 [6704] on delete from partitioned table [ID 868212.1]
- Bug 9563951 – Intermittent Wrong Results with nested loops and expressions [ID 9563951.8]
- Bug 7662620 – A dump can occur in ldxite with NLJ batching [ID 7662620.8]
- ORA-7445 [ldxite()] [ID 781295.1]
Now its time for bonus material, probably careful eyes already spotted that the plan with _nlj_batching_enabled=0 is different than plan would be generated on 10.2.04 and also
there is a problem with runtime A-time values on 3th step of the plans when _nlj_batching_enabled=0 in 11.2.0.1. They are basically reported way higher than actual elapsed time.
First let me show you how actual pre 11G plan and reported A-Time values are
(to get this plan I flush buffer cache twice,flush the sql from shared pool, set ofe to 10.2.0.4 instead of _nlj_batching_enabled=0 on OFE=11.2.0.1)
Small Dataset
HR@ORACOS> @test 49999 rows selected. Elapsed: 00:00:01.44 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 52395 consistent gets 1842 physical reads 0 redo size 1781536 bytes sent via SQL*Net to client 1613 bytes received via SQL*Net from client 101 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 49999 rows processed Plan hash value: 225503291 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 49999 |00:00:00.74 | 52395 | 1842 | | 1 | NESTED LOOPS | | 1 | 50143 | 49999 |00:00:00.74 | 52395 | 1842 | | 2 | TABLE ACCESS FULL | T3 | 1 | 49999 | 49999 |00:00:00.05 | 949 | 849 | | 3 | TABLE ACCESS BY INDEX ROWID| T1 | 49999 | 1 | 49999 |00:00:00.44 | 51446 | 993 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 49999 | 1 | 49999 |00:00:00.15 | 1447 | 129 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("T1"."ID"="T3"."ID")
Big Dataset
HR@ORACOS> @test 350000 rows selected. Elapsed: 00:00:10.25 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 377819 consistent gets 23736 physical reads 0 redo size 12606669 bytes sent via SQL*Net to client 8213 bytes received via SQL*Net from client 701 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 350000 rows processed Plan hash value: 1708821971 ------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | ------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 350K|00:00:04.45 | 377K| 23736 | | 1 | NESTED LOOPS | | 1 | 350K| 350K|00:00:04.45 | 377K| 23736 | |* 2 | TABLE ACCESS FULL | T2 | 1 | 350K| 350K|00:00:01.03 | 17643 | 16951 | | 3 | TABLE ACCESS BY INDEX ROWID| T1 | 350K| 1 | 350K|00:00:02.81 | 360K| 6785 | |* 4 | INDEX UNIQUE SCAN | T1_IDX_ID | 350K| 1 | 350K|00:00:01.02 | 10176 | 801 | ------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("T2"."ID">650000) 4 - access("T1"."ID"="T2"."ID") filter("T1"."ID">650000)
As you can see timings are now more accurate and Nested Loop step is now changed its location in the plan.
I don’t know why A-time is reported wrong but thanks to the methods I described in earlier methods I can say, What is causing the plan change difference like below
HR@ORACOS> alter session set "_nlj_batching_enabled"=0; ===> to revert back pre 11G behavior Session altered. HR@ORACOS> HR@ORACOS> @build_optimizer_bugfix_test_e ====> to find out which bugfix may have caused Enter value for optimizer_feature_version: 11.1.0.6 ====> I alread found out it changed after 11.1.0.6 HR@ORACOS> @optimizer_bugfix_test_e PLAN_HASH_VALUE BUGFIX -------------------- ---------- 225503291 5585313 692324935 6439032 4279274 4370840 4386734 4507997 .... .... .... 39 rows selected. HR@ORACOS> @bugfixg Enter value for bugno: 5585313 BUGNO VALUE SQL_FEATURE DESCRIPTION OPTIMIZER_FEATURE_ENABLE ---------- ---------- ------------------------- ---------------------------------------------------------------- ------------------------- - 5585313 1 QKSFM_EXECUTION_5585313 enable index prefetching for NLJ with unique scan as inner acces 11.1.0.6
Since I know what is causing the plan changes for nested loop now its time for me to go ahead and test if OFE=10.2.0.4 is better for the tests I run above.
Here are the comparison between OFE 11.2.0.1 and OFE 10.2.0.4 (instead of setting _nlj_batching_enabled=0 while OFE is still 11.2.0.1 )
This time 10.2.0.4 results were better for both small and big dataset
for small dataset
For buffer cache reads, 10 miliseconds slower for CPU and ~3 miliseconds slower for total ELA
In percentage degredation is ~%7 for CPU and ~%0.06 for Total ELA
For phsical reads 10 miliseconds slower for CPU and ~19 miliseconds slower for total ELA
In percentage degredation is ~%7 for CPU and ~%8 for Total ELA
for big dataset
For buffer cache reads 70 miliseconds slower for CPU and ~65 miliseconds slower for total ELA
In percentage degredation is ~%5 for CPU and ~%1.3 for Total ELA
For phsical reads 80 miliseconds slower for CPU and 120 miliseconds slower for total ELA
In percentage degredation is ~%4.8 for CPU and ~%2.2 for Total ELA
As a last word I can’t say my tests are best way to measure, however results are not looking consistent with what is written in the official documentation.
I think this is a new feature that will effect your plans after you upgrade your DB to 11G, but it will not necessarily make any perceivable difference to your response times.
In my humble opinion as long as the cost of your queries are same with the new implementation in place,
you can ignore the existence of it and never bother to change the parameter unless support says so.
Please feel free to comment if you have anything to say on this one and if you spot anything wrong in my analysis.
Source Code;
Test script
Full Excell Sheet for time comparisons
<–Previous Post in This Series —————- Next Post in this series–>