Coskan’s Approach to Oracle

January 13, 2012

Session Snapper on Duty 2 – (how to support your answers, with proof)

Filed under: Basics, Performance, Tips — coskan @ 10:51 pm

Warning: This long post is for educational purposes, especially targeting junior DBAs and DBAs who do not know how to use the Session Snapper
(the greatest Oracle tuning tool yet) by Tanel Poder. If you feel comfortable with snapper and you think your relationship with your
clients just rocks, then there is no need to read the post, but you may still give it a go 🙂

The story started with a long email thread forwarded by a developer, asking the question below about a system that I am responsible for but whose inner workings I am not perfectly comfortable with.

Hi Josh (note to the reader: Coskan is pronounced like Joshkahn, so I use Josh at the office)
Any ideas on how we can work with dbas to find out why our database is only committing at a range of 
2-30 sec for 100 commits instead of the ideal 100 commits a sec for a third party process?

I answered

From my point of view 

I will be requesting 

1-	The time interval when this process runs
2-	The SID information, if you have it, for when this process runs
3-	Whether it gives the same results on UAT as well, so that we can run the operation there while tracing is on (tracing on production needs a Change Request, which is a hassle)

Then we can continue working out what the problem might be.

Developer replied

It runs 7/24 and UAT testing is not possible since we can't point the feeds to UAT but username is USER_XXX 
if it helps.

Long story short, all I understood was that, in the simplest terms, they want to be able to process more data in the same amount of time, so I replied with the long mail below.

BEGINNING OF THE EMAIL

I did some sampling on your sessions to understand what is happening, and I think I can come up with some rough conclusions and improvement suggestions.

Let's walk through what is (or may be) happening.

This is a 500-second sampling of one of your sessions.

---Some important Statistics
-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
     71, USER_XXX  , STAT, user commits                                              ,        326,        .65,
     71, USER_XXX  , STAT, session logical reads                                     ,      1.36M,      2.71k,

---Wait profile

     71, USER_XXX  , WAIT, buffer busy waits                                         ,     1.18ms,     2.36us,      .0%, |          |
     71, USER_XXX  , WAIT, log file sync                                             ,     36.12s,    72.24ms,     7.2%, |@         |
     71, USER_XXX  , WAIT, db file sequential read                                   ,    276.17s,   552.34ms,    55.2%, |@@@@@@    |
     71, USER_XXX  , WAIT, flashback buf free by RVWR                                ,   403.28ms,   806.55us,      .1%, |          |
     71, USER_XXX  , WAIT, undo segment extension                                    ,    72.93ms,   145.86us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message to client                                 ,     2.23ms,     4.47us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message from client                               ,     178.2s,    356.4ms,    35.6%, |@@@@      |
     71, USER_XXX  , WAIT, SQL*Net more data from client                             ,   710.26ms,     1.42ms,      .1%, |          |

---Wait profile breakdown per sql

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
    20% | ggjkmtv8cxxh3   | db file sequential read             | User I/O
    19% | 64djbhqv5v93u   | db file sequential read             | User I/O
    13% | 3b8gt3dqtn345   | db file sequential read             | User I/O
     8% |                 | log file sync                       | Commit
     2% | 8zt375nzk26rh   | db file sequential read             | User I/O
     1% | 8ztz7p814q9u2   | db file sequential read             | User I/O
     1% | ggjkmtv8cxxh3   | ON CPU                              | ON CPU
     1% | 64djbhqv5v93u   | ON CPU                              | ON CPU
     0% | 64djbhqv5v93u   | flashback buf free by RVWR          | Configuration
     0% | 64djbhqv5v93u   | SQL*Net more data from client       | Network

--  End of ASH snap 1, end=2012-01-11 21:51:46, seconds=500, samples_taken=499

When we check the stats, you have done 326 commits in 500 seconds, which makes 0.65 commits per second, but if we take the 180 seconds of idle events (SQL*Net message to client + SQL*Net message from client) out, it is 326 commits in 320 (500-180) seconds of active time, which is roughly 1 commit per second.

When we look at the wait profile we can see 276 seconds of “db file sequential read” (physical reads) over the 320-second non-idle period, plus 36 seconds of “log file sync”, which we will come back to later.

If we focus on the “db file sequential read” we can see it is 86% (276/320) of the active time, and there are 3 main contributor SQLs for this case.

SQLs are below

SQL> @sqlid2 ggjkmtv8cxxh3

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ---------------------------------------------------------------------------------------------------------------
ggjkmtv8cxxh3 3503224323 DELETE FROM TABLE_T WHERE ESMP = :1

1 row selected.

  PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
4033870845        4          0         21      20219           .0        392.7       .26172575          1198.3            28.1


Plan hash value: 4033870845

-----------------------------------------------------------------------------------------------
| Id  | Operation         | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                           |       |       |     4 (100)|          |
|   1 |  DELETE           | TABLE_T                   |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| PK_TABLE_T                |     2 |    50 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
SQL> @sqlid2 64djbhqv5v93u

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
64djbhqv5v93u 3059590266 some complex update statement which does not need to be shared

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
1856144614        3         26          2     473152           .0         85.4      .270458035           454.4            25.4

Plan hash value: 1856144614

-------------------------------------------------------------------------------------------
| Id  | Operation          | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |                      |       |       |     3 (100)|          |
|   1 |  UPDATE            | TABLE_XXXXXXX        |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| PK_TABLE_XXXXXXX     |     1 |   192 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

SQL> @sqlid2 3b8gt3dqtn345

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
3b8gt3dqtn345 1838812293 some complex update statement which does not need to be shared

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
--------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
861798698        3          0          7      22308           .0         97.5      .113096364           382.6            10.9

Plan hash value: 861798698

------------------------------------------------------------------------------------------------
| Id  | Operation          | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |                           |       |       |     3 (100)|          |
|   1 |  UPDATE            | TABLE_ZZZZ                |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| PK_TABLE_ZZZZ             |     1 |    69 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

There is not much we can do about the index unique scans, other than changing the table structure, but I think there is room for improvement on the index range scan of sql_id ggjkmtv8cxxh3 (the delete statement), which is waiting on db file sequential read for 31% (500*20% / 320) of the active time.

The reason the delete is doing an index range scan is that we are accessing the 3-column composite index PK_TABLE_T using only one column, COL1. If the business logic lets you supply the other 2 columns as filters as well, we will get much more throughput with less LIO and PIO.

TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME
-------------------- ------------------------------ ------------------------------ ---- ------------------------------
APP_OWNER            TABLE_T                        PK_TABLE_T                        1 COL1
                                                                                      2 COL2
                                                                                      3 COL3
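
To make that concrete, here is a minimal sketch of the difference, using the COL1/COL2/COL3 placeholder names from the index listing above (the real column and bind names belong to the application):

-- current form: only the leading index column is supplied, so the
-- delete has to do an INDEX RANGE SCAN on PK_TABLE_T and may visit
-- many index and table blocks per execution
DELETE FROM TABLE_T WHERE COL1 = :1;

-- suggested form: all three columns of the primary key are supplied,
-- which allows an INDEX UNIQUE SCAN (a single probe per row)
DELETE FROM TABLE_T WHERE COL1 = :1 AND COL2 = :2 AND COL3 = :3;

Whether the unique form is possible depends entirely on whether the calling code already knows the COL2 and COL3 values at delete time.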

While I was sampling the session above, I also did sampling on another session at the same time, which shows similar behaviour with different SQLs, as below.

---Some important Statistics

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
    192, USER_XXX  , STAT, user commits                                              ,        343,        .69,
    192, USER_XXX  , STAT, session logical reads                                     ,      2.78M,      5.56k,

---Wait profile

  192, USER_XXX  , WAIT, buffer busy waits                                         ,      702us,      1.4us,      .0%, |          |
  192, USER_XXX  , WAIT, log file switch completion                                ,    94.34ms,   188.68us,      .0%, |          |
  192, USER_XXX  , WAIT, log file sync                                             ,     28.26s,    56.53ms,     5.7%, |@         |
  192, USER_XXX  , WAIT, db file sequential read                                   ,     364.3s,    728.6ms,    72.9%, |@@@@@@@@  |
  192, USER_XXX  , WAIT, flashback buf free by RVWR                                ,   348.79ms,   697.57us,      .1%, |          |
  192, USER_XXX  , WAIT, undo segment extension                                    ,    492.4ms,    984.8us,      .1%, |          |
  192, USER_XXX  , WAIT, SQL*Net message to client                                 ,     1.03ms,     2.07us,      .0%, |          |
  192, USER_XXX  , WAIT, SQL*Net message from client                               ,     94.02s,   188.05ms,    18.8%, |@@        |
  192, USER_XXX  , WAIT, SQL*Net more data from client                             ,   368.33ms,   736.67us,      .1%, |          |

---Wait profile breakdown per sql

---------------------------------------------------------------------------------
Active% | SQL_ID          | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------
    37% | dkh6b1nhnhfcu   | db file sequential read             | User I/O
    32% | 29zz9n1b9sppr   | db file sequential read             | User I/O
     7% |                 | log file sync                       | Commit
     1% | dkh6b1nhnhfcu   | ON CPU                              | ON CPU
     1% | 29zz9n1b9sppr   | ON CPU                              | ON CPU
     0% | 29zz9n1b9sppr   | undo segment extension              | Configuration
     0% | 29zz9n1b9sppr   | flashback buf free by RVWR          | Configuration

--  End of ASH snap 1, end=2012-01-11 21:52:17, seconds=500, samples_taken=499

This time it is 94 seconds idle with 406 (500-94) seconds of processing time, which gives us 0.84 (343/406) commits per second and 392 seconds of wait
(364 “db file sequential read” + 28 “log file sync”).

When we look at the SQLs below, this time we can see that 160 (500*32%) of the 364 seconds (43% of the active time) is being spent on an index range scan again (sql_id 29zz9n1b9sppr), on a delete statement, again due to the lack of composite index column filtering.

I again believe we can improve the access if we are able to supply all the columns as filters and get an index unique scan.


TABLE_OWNER          TABLE_NAME                     INDEX_NAME                     POS# COLUMN_NAME                    DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
APP_OWNER            TABLE_Z                        PK_TABLE_Z                      1    COL1
                                                                                    2    COL2
                                                                                    3    COL3

SQL> @sqlid2 dkh6b1nhnhfcu

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------
dkh6b1nhnhfcu  558381466 INSERT INTO APP_OWNER.TABLE_Z where bla bla bla bla

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
--------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
        0        1        778         15     102566           .0       1065.3      .177031281          5673.4            30.4

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL |      |       |
-------------------------------------------------

SQL> @sqlid2 29zz9n1b9sppr

SQL_ID        HASH_VALUE SQL_TEXT
------------- ---------- -------------------------------------------------------------------------------------------------------
29zz9n1b9sppr 1453086391 DELETE FROM APP_OWNER.TABLE_Z WHERE ESMP = :1

1 row selected.

PLAN_HASH OPT_COST     PARSES   H_PARSES EXECUTIONS AVG_FETCH_PE  AVG_ROWS_PE AVG_ELA_SECONDS      AVG_LIO_PE      AVG_PIO_PE
---------- -------- ---------- ---------- ---------- ------------ ------------ --------------- --------------- ---------------
3229062859        4          0         30      21308           .0       1136.6      .642183193          2081.5            68.1

--------------------------------------------------------------------------------------
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | DELETE STATEMENT  |                  |       |       |     4 (100)|          |
|   1 |  DELETE           | TABLE_Z          |       |       |            |          |
|*  2 |   INDEX RANGE SCAN| PK_TABLE_Z       |     5 |   135 |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

I think I managed to make the index range scan part clear (for more information see the bonus material below). Now let's see what the log file sync is all about and why we are seeing it.

The log file sync event has the definition below in the Oracle documentation.

log file sync

When a user session commits, the session’s redo information needs to be flushed to the redo logfile. The user session will post the LGWR to write the log buffer to the redo log file. When the LGWR has finished writing, it will post the user session.

Wait Time: The wait time includes the writing of the log buffer and the post.

Parameter Description
buffer# The number of the physical buffer in the redo log buffer that needs to be synchronized

Potential causes

Wait Event: log file sync
General Area: I/O, over-committing
Possible Causes: slow disks that store the online logs; un-batched commits
Look for / Examine: Check the disks that house the online redo logs for resource contention. Check the number of transactions (commits + rollbacks) each second, from V$SYSSTAT.
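
As a quick sanity check on that last point, the commit/rollback rate can be eyeballed straight from V$SYSSTAT; a minimal sketch (sample it twice and divide the delta by the interval, since these counters are cumulative from instance startup):

-- system-wide commit/rollback counters; take two samples N seconds apart
select name, value
  from v$sysstat
 where name in ('user commits', 'user rollbacks');

In our case the snapper output above already gives the per-session version of the same numbers, so the system-wide view was not needed.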

The first rule of tuning log file sync is doing fewer commits, which means that instead of committing after every DML or every single operation, like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
commit;
end loop;
end;
/

you do the commits in batch, like

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
end loop;
commit;
end;
/
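
If committing only once at the very end is not acceptable for the business logic (restartability, undo size and so on), a commonly used middle ground, which is not something I tested here, is committing every N rows rather than every row. A rough sketch (the 1000 is an arbitrary batch size, not a recommendation):

begin
for i in 300001..600000 loop
delete from t1 where id1=to_char(i);
-- commit once per 1000 rows instead of once per row
if mod(i, 1000) = 0 then
commit;
end if;
end loop;
commit; -- pick up the remaining rows of the last, partial batch
end;
/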

To prove how you are doing commits, I simply get the SQL execution statistics before, then sample the sessions, then get the SQL execution statistics after, to see whether the number of user commits during the sampling interval matches the increase in SQL executions.

----note to blog reader: this is how I did it, no rocket science (once you arm yourself with the right tools), just run the statements below as a whole

select sql_id,executions from v$sql where sql_id in ('29zz9n1b9sppr','ggjkmtv8cxxh3');
@snapper ash=sql_id+event,stats,gather=stw,sinclude=user%commits 20 1 71,192
select sql_id,executions from v$sql where sql_id in ('29zz9n1b9sppr','ggjkmtv8cxxh3');

Results are

SQL> @test1

SQL_ID        EXECUTIONS
------------- ----------
29zz9n1b9sppr      28897
ggjkmtv8cxxh3      28211

Sampling SID 71,192 with interval 100 seconds, taking 1 snapshots...

-- Session Snapper v3.53 by Tanel Poder ( http://blog.tanelpoder.com )

-------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                                                 ,     HDELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
     71, USER_XXX  , STAT, user commits                                              ,         12,        .12,
     71, USER_XXX  , TIME, repeated bind elapsed time                                ,       59us,      .59us,      .0%, |          |
     71, USER_XXX  , TIME, parse time elapsed                                        ,      127us,     1.27us,      .0%, |          |
     71, USER_XXX  , TIME, DB CPU                                                    ,   278.96ms,     2.79ms,      .3%, |          |
     71, USER_XXX  , TIME, sql execute elapsed time                                  ,     36.32s,   363.21ms,    36.3%, |@@@@      |
     71, USER_XXX  , TIME, DB time                                                   ,     40.75s,   407.51ms,    40.8%, |@@@@@     |
     71, USER_XXX  , WAIT, log file sync                                             ,      4.42s,    44.22ms,     4.4%, |@         |
     71, USER_XXX  , WAIT, db file sequential read                                   ,     35.04s,   350.43ms,    35.0%, |@@@@      |
     71, USER_XXX  , WAIT, SQL*Net message to client                                 ,      242us,     2.42us,      .0%, |          |
     71, USER_XXX  , WAIT, SQL*Net message from client                               ,     60.11s,   601.05ms,    60.1%, |@@@@@@    |
     71, USER_XXX  , WAIT, SQL*Net more data from client                             ,     3.94ms,    39.44us,      .0%, |          |
    192, USER_XXX  , STAT, user commits                                              ,         12,        .12,
    192, USER_XXX  , TIME, DB CPU                                                    ,   207.97ms,     2.08ms,      .2%, |          |
    192, USER_XXX  , TIME, sql execute elapsed time                                  ,     30.74s,   307.37ms,    30.7%, |@@@@      |
    192, USER_XXX  , TIME, DB time                                                   ,     35.14s,    351.4ms,    35.1%, |@@@@      |
    192, USER_XXX  , WAIT, log file sync                                             ,       4.4s,    43.98ms,     4.4%, |@         |
    192, USER_XXX  , WAIT, db file sequential read                                   ,     29.95s,   299.51ms,    30.0%, |@@@       |
    192, USER_XXX  , WAIT, SQL*Net message to client                                 ,      113us,     1.13us,      .0%, |          |
    192, USER_XXX  , WAIT, SQL*Net message from client                               ,     64.55s,   645.48ms,    64.5%, |@@@@@@@   |
    192, USER_XXX  , WAIT, SQL*Net more data from client                             ,    19.36ms,   193.62us,      .0%, |          |
--  End of Stats snap 1, end=2012-01-12 22:27:59, seconds=100

---------------------------------------------------------------
Active% | SQL_ID          | EVENT
---------------------------------------------------------------
    28% | 29zz9n1b9sppr   | db file sequential read             |
    12% | ggjkmtv8cxxh3   | db file sequential read             |
    11% | 3b8gt3dqtn345   | db file sequential read             |
     9% |                 | log file sync                       |
     9% | 64djbhqv5v93u   | db file sequential read             |
     4% | dkh6b1nhnhfcu   | db file sequential read             |
     1% | 8zt375nzk26rh   | db file sequential read             |
     1% | b513mwk4mf9rt   | db file sequential read             |
     1% | 0h01fy07y1c1p   | db file sequential read             |

--  End of ASH snap 1, end=2012-01-12 22:27:59, seconds=100, samples_taken=99

PL/SQL procedure successfully completed.

SQL_ID        EXECUTIONS
------------- ----------
29zz9n1b9sppr      28909
ggjkmtv8cxxh3      28223

As you can see, both SQLs increased their executions by 12 (after minus before), and both sessions show a user commits delta of 12, which means there is a commit per execution, and that is what leads to the log file sync event.

After proving that the app is not doing batch commits, I wonder whether this is avoidable or not in terms of the business logic. Doing batch commits will definitely help to increase your overall throughput, plus the overall system health.

So in my humble opinion we have two options, without me having seen any of the code logic.

1- Providing unique index access for the delete operations by giving all columns of the index
2-	Decreasing the number of commits and trying to do batch commits.

For you to test everything I have claimed, I have attached a simple synthetic test case so you can see how these changes can help us.

Cases tested.

1-Nonbatch commit range scan
2-Nonbatch commit unique scan

3-Batch commit range scan
4-Batch commit unique scan

Timings for the test, deleting sets of 100K rows, are as follows:

Case                     Elapsed (s)   Improvement
nonbatch-commit-range             84            0%
nonbatch-commit-unique            47           44%
batch-commit-range                55           34%
batch-commit-unique               35           58%
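
The attached test case has the full details; below is only a simplified sketch of the two extreme cases, so the labels above are unambiguous (the id2/id3 column names and the 3-column primary key on t1 are made up here for illustration):

-- case 1: nonbatch-commit-range: only the leading PK column, commit per row
begin
for i in 1..100000 loop
delete from t1 where id1 = to_char(i);
commit;
end loop;
end;
/

-- case 4: batch-commit-unique: all PK columns, a single commit at the end
begin
for i in 1..100000 loop
delete from t1 where id1 = to_char(i) and id2 = i and id3 = i;
end loop;
commit;
end;
/

Cases 2 and 3 are simply the remaining combinations of commit style and access path.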

END OF THE EMAIL


Note to blog reader (in case you are still reading): I hope you found something interesting about how I used the right tool at the right time
and how I answered the development questions without talking bollocks but numbers and proofs. One thing I need to admit: there were only 2 sessions running this operation, which made my life easy.

If you wonder about the duration of the whole diagnosis: it took 15 minutes in total to find out what was going on and 1 hour to write the mail, but that 1 hour saved at least 3-4 rounds of unnecessary email ping pong that happen just because nobody can prove their point with numbers.

Quick Tip: If you are sending any output to anybody, use a fixed-width font like Courier New and make everybody's eyes happy.

Session Snapper and its usage can be found at Tanel's website.
The test case can be found here.


4 Comments »

  1. Enjoyed the post. Favourite part: “…how I answered the development questions without talking bollocks but numbers and proofs”

    Comment by ezuall — January 18, 2012 @ 10:07 am

  2. Awesome post, I am still trying to learn snapper (awesome tool). But I had one question for you.
    When you sampled this session for 500 seconds, what was your snapper input? Was it something like below?

    @snapper ash=sql_id+event+wait_class,stat,gather=tsw 500 1 SID

    or did you take 500 snapshots of 1 second?

    @snapper ash=sql_id+event+wait_class,stat,gather=tsw 1 500 SID

    Either way, for those 500 seconds, does snapper go to sleep for most of that time or does it work on CPU or something?

    Comment by irfan — January 30, 2012 @ 2:26 pm

    • Thank you for your comments

      I used

      @snapper ash,stats 500 1 SID

      which is 1 snapshot of 500 seconds

      Snapper does not sleep. It usually gathers 1 snap per second. If you check the code you can see the whole idea behind it.

      Comment by coskan — February 2, 2012 @ 10:44 pm

  3. Hi Coskan,

    I found your methodological explanation of this commit issue to be a perfect example of what Cary Millsap said: “Why guess when you can know?”. I like this blog article very much, even though committing once, outside the loop, is the thing I would have suggested immediately if I had been asked to help solve this issue.

    Best Regards

    Comment by hourim — March 24, 2012 @ 4:15 pm

