Coskan’s Approach to Oracle

January 27, 2010

Working with statspack-part-1a-Diagnosis

Filed under: Basics, Performance — coskan @ 6:24 pm

Part 1b—->

Note: Post is a bit long and might be boring, if you are already good with statspack, but there is a question at the end if you would like to answer

Yesterday, I decided to look for something to troubleshoot without waiting for any user complaints so I can write something useful here :)
Primary reason for this practice was to get used to interpret statspack reports to find possible bottlenecks
secondary reason was because I was bored about being underutilized.
I always feel comfortable troubleshooting session wide problems, non-historic performance problems or historic problems with Diagnostics Pack,
but I am not that experienced on statspack.
In my current environment I don’t have Diagnostic Pack option so I have to make good friendship with statspack and
the best way to be a good friend is to spend more time with your new friend as I am going to explain how.

Characteristics of the DB I will work on:
daily load is between 08:00 and 19:00
export at 21:00 running for roughly 1 hour
DB backup start 23:00

There are couple of ways to spot a possible bottlenecks without getting individual reports.

—First one is from Tanel Poder and his famous Perfsheet. This tool is already configured and ready to be used with statspack.
The query he uses to generate the report is the same idea behind the statspack delta views of  Tim Gorman. Lets start

first screen is login screen (you need to activate the macros).

Inputs for statspack report is
Data Source- TNS Entry of the DB
Username/Password- username with statspack schema objects read privilege
Which Instance-instance name (lower case or modify the related sql in Queries sheet with upper function )
Which Query-Statspack CPU+Wait+Stats Hourly
From Date-start date
To Date-end date
then press show me. You will get warnings during the graph generation saying PivotTable report will not fit on the sheet … Just Press OK till the warnings end
Your graph might be empty on Office 2007. Just go to the Report Filter and select WAIT from EVENT_TYPES to see any abnormality on wait events.

First Graph is WAIT graph like below. As you see, I was able to spot a possible problem on 20/01/2010 at first try.

Lets make it more visible by adding top 10 filter to Legend Fields Section.

Another option is focus on the date where the abnormality is happened. Lets select only 20th. FYI top 10 filter is still on

Without the top 10 filter. Graph is slightly different but still giving what we need.

Lets Focus on only time interval between 18 and 23

As you see there is an abnormal increase on row cache lock logfile switch and buffer busy

Lets do double-check with system stats. By selecting all hours and available dates and events your graph will be a bit misleading because of very high metrics like below

What can we do ? DB time is something we want to decrease when troubleshooting, so lets focus on DB Time by filtering Legend Fields on DB Time

We have 2 peaks, which is a bit confusing for the initial assumption of problem timing. Lets focus on 20th again

There are two peaks  between 13-15 and 21-23. We have an interesting situation here in terms of DB time. From wait event perspective abnormality is between 20-22 but DB time perspective we have 1 more time slot between 13-15 .

At this point  I quit Perfsheet. I spotted what I needed for initial problem chase now I can go to other options before, gathering the report. (these options are way more easier than statspack report generation thats why I did not directly go to the statspack)

Second option is from Tim Gorman himself. sp_systime which I find very useful in terms of overall response time analysis. ( i needed to change the script to focus between the dates instead of days before sysdate option)

Here is the ouput

First Part is overall response time for that day.


SQL> @sp_systime2
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-5

                                                                        Total             Cum
Service Seconds % of % of
Day or Wait Name Spent Total Total
------------ -------- ----------------------------------- ------------------- ------- -------
19-JAN       Service  SQL execution                                115,326.24   78.78   78.78
             Wait     db file sequential read                        7,697.33    5.26   84.04
             Wait     db file scattered read                         6,213.01    4.24   88.29
             Service  Recursive SQL execution                        4,638.75    3.17   91.46
Wait Backup: sbtwrite2 2,671.85 1.83 93.28
Wait Backup: sbtbackup 2,081.53 1.42 94.70
             Wait     log file parallel write                        1,423.58    0.97   95.67
             Service  Parsing SQL                                    1,109.49    0.76   96.43
             Wait     db file parallel write                           913.09    0.62   97.06
             Wait     log file sync                                    761.78    0.52   97.58
Wait Datapump dump file I/O 631.13 0.43 98.01
             Wait     Log archive I/O                                  549.08    0.38   98.38

20-JAN       Service  Recursive SQL execution                      255,597.67   65.42   65.42
             Wait     row cache lock                                48,098.17   12.31   77.74
Wait log file switch (checkpoint incompl 41,613.30 10.65 88.39
             Wait     buffer busy waits                             13,474.25    3.45   91.84
             Wait     db file sequential read                        9,788.51    2.51   94.34
             Wait     db file scattered read                         6,831.08    1.75   96.09
Wait Backup: sbtwrite2 3,512.51 0.90 96.99
             Wait     log file parallel write                        1,897.38    0.49   97.48
Wait Datapump dump file I/O 1,674.20 0.43 97.90
             Wait     db file parallel write                         1,389.77    0.36   98.26
             Wait     log file sync                                  1,073.14    0.27   98.53
             Service  Parsing SQL                                      964.58    0.25   98.78

21-JAN       Service  SQL execution                                 33,045.26   44.91   44.91
             Service  Recursive SQL execution                       12,785.72   17.38   62.28
             Wait     db file sequential read                        8,296.99   11.28   73.56
             Wait     db file scattered read                         6,731.96    9.15   82.71
Wait Backup: sbtwrite2 3,340.23 4.54 87.25
             Wait     log file parallel write                        1,856.99    2.52   89.77
             Wait     db file parallel write                         1,359.19    1.85   91.62
             Wait     log file sync                                  1,131.31    1.54   93.16
             Service  Parsing SQL                                    1,011.46    1.37   94.53
Wait enq: TX - row lock contention 861.24 1.17 95.70
             Wait     Log archive I/O                                  686.01    0.93   96.63
Wait Backup: sbtbackup 402.52 0.55 97.18

If you check the top 5 you will easily spot the row cache lock, log file switch and buffer busy waits are different from the other days

Second part of the report is hourly based response time for the days. I truncated the output for 8 hours but the full is here ***

 12:00 Service  SQL execution                                  6,268.92   76.66   76.66
       Wait     db file sequential read                          634.26    7.76   84.42
       Wait     db file scattered read                           344.02    4.21   88.63
       Service  Recursive SQL execution                          230.23    2.82   91.44
       Wait     log file parallel write                          147.58    1.80   93.25
       Wait     log file sync                                    129.94    1.59   94.84
 13:00 Service  SQL execution                                  6,942.33   83.97   83.97
       Wait     db file sequential read                          375.15    4.54   88.51
       Wait     db file scattered read                           207.74    2.51   91.02
       Service  Recursive SQL execution                          200.86    2.43   93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
       Wait     log file parallel write                           94.08    1.14   96.56
 14:00 Service  Recursive SQL execution                      241,820.91   99.35   99.35
       Wait     db file scattered read                           574.84    0.24   99.59
       Wait     db file sequential read                          515.74    0.21   99.80
       Wait     log file parallel write                           97.09    0.04   99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
       Service  Parsing SQL                                       81.25    0.03   99.91
 15:00 Service  SQL execution                                  3,620.32   62.09   62.09
       Service  Recursive SQL execution                        1,210.42   20.76   82.86
       Wait     db file sequential read                          328.60    5.64   88.49
       Wait     db file scattered read                           294.73    5.06   93.55
       Service  Parsing SQL                                       73.42    1.26   94.81
Wait enq: TX - row lock contention 72.03 1.24 96.04
 16:00 Service  SQL execution                                  3,667.78   70.86   70.86
       Wait     db file sequential read                          429.87    8.30   79.16
       Service  Recursive SQL execution                          371.24    7.17   86.33
       Wait     db file scattered read                           215.28    4.16   90.49
Wait enq: TX - row lock contention 144.67 2.79 93.29
       Wait     log file parallel write                          100.81    1.95   95.24
 17:00 Service  SQL execution                                  2,904.54   70.63   70.63
       Wait     db file sequential read                          397.42    9.66   80.29
       Service  Recursive SQL execution                          202.13    4.92   85.21
       Wait     db file scattered read                           138.29    3.36   88.57
       Wait     log file sync                                     94.87    2.31   90.88
       Wait     log file parallel write                           90.94    2.21   93.09
 18:00 Service  SQL execution                                  1,791.86   65.17   65.17
       Wait     db file sequential read                          302.02   10.98   76.15
       Service  Recursive SQL execution                          176.04    6.40   82.56
       Wait     db file scattered read                           151.83    5.52   88.08
       Wait     log file parallel write                           80.44    2.93   91.00
       Wait     log file sync                                     63.46    2.31   93.31
 19:00 Service  SQL execution                                  2,261.88   44.86   44.86
       Wait     db file sequential read                          758.41   15.04   59.91
       Wait     db file scattered read                           540.35   10.72   70.62
       Wait     db file parallel write                           272.36    5.40   76.03
       Wait     log file parallel write                          269.47    5.34   81.37
       Wait     log file sequential read                         154.58    3.07   84.44
 20:00 Service  SQL execution                                  3,517.30   73.29   73.29
       Wait     db file sequential read                          467.34    9.74   83.03
       Wait     db file scattered read                           251.47    5.24   88.27
       Wait     log file parallel write                          161.71    3.37   91.64
       Wait     db file parallel write                            92.73    1.93   93.57
       Wait     Log archive I/O                                   90.24    1.88   95.45
 21:00 Service  SQL execution                                  2,875.04   45.72   45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
       Wait     db file sequential read                          873.68   13.89   83.53
       Wait     db file scattered read                           417.53    6.64   90.17
       Wait     log file parallel write                          130.15    2.07   92.24
       Wait     db file parallel write                           106.51    1.69   93.93
 22:00 Wait     row cache lock                                48,097.61   41.83   41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
       Wait     buffer busy waits                             13,454.89   11.70   89.70
       Service  Recursive SQL execution                        8,958.04    7.79   97.49
       Wait     db file sequential read                        1,485.98    1.29   98.79
       Wait     db file scattered read                           856.65    0.75   99.53
23:00 Wait Backup: sbtwrite2 2,801.73 76.03 76.03
Wait Backup: sbtbackup 352.22 9.56 85.59
       Wait     db file scattered read                           140.16    3.80   89.39
Wait Backup: sbtclose2 93.25 2.53 91.92
       Wait     db file sequential read                           78.52    2.13   94.05
       Service  SQL execution                                     65.77    1.78   95.84

What do we see here ? The graph told us there might be a possible problem between 13-15 and 21-23.

First; focus on 13-15

13:00 Service  SQL execution                                  6,942.33   83.97   83.97
       Wait     db file sequential read                          375.15    4.54   88.51
       Wait     db file scattered read                           207.74    2.51   91.02
       Service  Recursive SQL execution                          200.86    2.43   93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
       Wait     log file parallel write                           94.08    1.14   96.56
 14:00 Service  Recursive SQL execution                      241,820.91   99.35   99.35
       Wait     db file scattered read                           574.84    0.24   99.59
       Wait     db file sequential read                          515.74    0.21   99.80
       Wait     log file parallel write                           97.09    0.04   99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
       Service  Parsing SQL                                       81.25    0.03   99.91

Second ; focus on 21-23

21:00 Service  SQL execution                                  2,875.04   45.72   45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
       Wait     db file sequential read                          873.68   13.89   83.53
       Wait     db file scattered read                           417.53    6.64   90.17
       Wait     log file parallel write                          130.15    2.07   92.24
       Wait     db file parallel write                           106.51    1.69   93.93
 22:00 Wait     row cache lock                                48,097.61   41.83   41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
       Wait     buffer busy waits                             13,454.89   11.70   89.70
       Service  Recursive SQL execution                        8,958.04    7.79   97.49
       Wait     db file sequential read                        1,485.98    1.29   98.79
       Wait     db file scattered read                           856.65    0.75   99.53

Nothing common but tells lots of things. First one spent %99 of response time on Recursive SQL Execution which means no wait but on CPU doing something which we don't know yet.
Second one spent %41 of its time on row cache lock and %36 on log file switch. Now we are ready for the statspack to see what can we find but
before that I want to show 2 more scripts whose output is can be obtained by Perfsheet as well (like DB time)

First script is Tim Gormans sp_evtrends scripts which shows trends of events. There are two problems; sometimes unnecessarily high number of * character  shown on the screen but comparing it with the seconds waited it can easily be ignored. Another thing is that, if you focus on 1 day sometimes * character graph might be misleading as it is with excel graphs so it is better to focus on 5-10 days

Scripts asks for db_name, begin date, end date and statistics to analyze

SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6

Some useful database statistics to search upon:

Name
------------------------------------------------------------
ARCH random i/o
ARCH sequential i/o
ARCH wait for archivelog lock
	....
	....
	....

Daily trends for "row cache lock"...

What statistic do you want to analyze? row cache lock

Day    Statistic Name                         Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN row cache lock                                0.00

20-JAN row cache lock                          480,981.73 ************************************************************

Daily/hourly trends for "row cache lock"...

Day    Hour   Statistic Name                         Secs Waited Percentage of total over all hours for each day
------ ------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN 00:00  row cache lock                                0.00
       01:00  row cache lock                                0.00
       02:00  row cache lock                                0.00
       03:00  row cache lock                                0.00
       04:00  row cache lock                                0.00
       05:00  row cache lock                                0.00
       06:00  row cache lock                                0.00
       07:00  row cache lock                                0.00
       08:00  row cache lock                                0.00
       09:00  row cache lock                                0.00
       10:00  row cache lock                                0.00
       11:00  row cache lock                                0.00
       12:00  row cache lock                                0.00
       13:00  row cache lock                                0.00
       14:00  row cache lock                                0.00
       15:00  row cache lock                                0.00
       16:00  row cache lock                                0.00
       17:00  row cache lock                                0.00
       18:00  row cache lock                                0.00
       19:00  row cache lock                                0.00
       20:00  row cache lock                                0.00
       21:00  row cache lock                                0.00
       22:00  row cache lock                                0.00 ************************************************************
       23:00  row cache lock                                0.00

20-JAN 00:00  row cache lock                                0.00
       01:00  row cache lock                                0.00
       02:00  row cache lock                                0.00
       03:00  row cache lock                                0.00
       04:00  row cache lock                                0.00
       05:00  row cache lock                                0.00
       06:00  row cache lock                                0.00
       07:00  row cache lock                                0.00
       08:00  row cache lock                                0.00
       09:00  row cache lock                                0.00
       10:00  row cache lock                                0.00
       11:00  row cache lock                                0.00
       12:00  row cache lock                                0.00
       13:00  row cache lock                                0.00
       14:00  row cache lock                                0.00
       15:00  row cache lock                                0.00
       16:00  row cache lock                                5.65
       17:00  row cache lock                                0.00
       18:00  row cache lock                                0.00
       19:00  row cache lock                                0.00
       20:00  row cache lock                                0.00
       21:00  row cache lock                                0.00
       22:00  row cache lock                          480,976.09 ************************************************************
       23:00  row cache lock                                0.00

As you see it shows the same number of * for 19th which is wrong when you check with time spent so abnormality can be spotted by both time spent and number of *.

Second script is modified version for sys_time_model statistics called sp_sys_time_trends.

SQL> @sp_sys_time_trends
</a>Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6

Some useful database statistics to search upon:

Name
------------------------------------------------------------
DB CPU
DB time
Java execution elapsed time
PL/SQL compilation elapsed time
PL/SQL execution elapsed time
RMAN cpu time (backup/restore)
background cpu time
background elapsed time
connection management call elapsed time
failed parse (out of shared memory) elapsed time
failed parse elapsed time
hard parse (bind mismatch) elapsed time
hard parse (sharing criteria) elapsed time
hard parse elapsed time
inbound PL/SQL rpc elapsed time
parse time elapsed
repeated bind elapsed time
sequence load elapsed time
sql execute elapsed time
What statistic do you want to analyze? DB time

Daily trends for "DB time"...

Day    Statistic Name                      VALUE Percentage of total over all days
------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN DB time                        1408469.51 **************

20-JAN DB time                        4797565.14 **********************************************

Daily/hourly trends for "DB time"...

Day    Hour   Statistic Name                      VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN 00:00  DB time                        18267.2492 *
01:00  DB time                        57907.3551 **
02:00  DB time                        64889.8266 ***
03:00  DB time                        37988.8748 **
04:00  DB time                        39093.3842 **
05:00  DB time                        39267.5194 **
06:00  DB time                        56103.5928 **
07:00  DB time                        43312.1147 **
08:00  DB time                        50542.4821 **
09:00  DB time                        62883.2236 ***
10:00  DB time                        89721.9315 ****
11:00  DB time                        86621.3671 ****
12:00  DB time                        78163.5806 ***
13:00  DB time                        64324.9537 ***
14:00  DB time                        78970.2241 ***
15:00  DB time                        78037.7742 ***
16:00  DB time                        78215.1336 ***
17:00  DB time                          69875.99 ***
18:00  DB time                        61075.9801 ***
19:00  DB time                        47658.5181 **
20:00  DB time                         41372.216 **
21:00  DB time                        57569.6918 **
22:00  DB time                        65593.3167 ***
23:00  DB time                        41013.2088 **

20-JAN 00:00  DB time                        39789.4329
01:00  DB time                        76274.4764 *
02:00  DB time                        39564.2615
03:00  DB time                        37327.7534
04:00  DB time                        38174.4725
05:00  DB time                         38335.113
06:00  DB time                        55688.7588 *
07:00  DB time                        39039.2476
08:00  DB time                        45155.1028 *
09:00  DB time                        62292.3892 *
10:00  DB time                        73688.6751 *
11:00  DB time                        86222.6032 *
12:00  DB time                        79400.9817 *
13:00  DB time                        81021.4711 *
14:00  DB time                        2498694.98 *******************************
15:00  DB time                        57824.5047 *
16:00  DB time                        50252.1229 *
17:00  DB time                        39474.4176
18:00  DB time                         25728.465
19:00  DB time                         40950.006 *
20:00  DB time                        45258.7127 *
21:00  DB time                        66186.8379 *
22:00  DB time                        1177471.46 ***************
23:00  DB time                        3748.88236

As you can spot like the excell graph we have DB time increased for 14-15 and 22-23 period

Now It is time to get the statspack report and dig further.
First I want to focus on the row cache problem if I have space I will talk about the recursive cursor problem maybe as second part.
We normally collect 30 minutes but available snaps are like below.

                          Snap
Snap Id   Snap Started    Level
8135 20 Jan 2010 21:05     7
8136 20 Jan 2010 21:35     7
8137 20 Jan 2010 22:28     7
8138 20 Jan 2010 22:58     7

If you look carefully you will see that snap at 22:00 is missing. There must be something wrong on DB which prevented statspack
from gathering but lets assume that you could not spot it for now as I could't which was a deadly mistake caused increase in trouble shooting time
I generated statspack report between 21:35 and 22:28

Here is the statspack summary (full statspack is at here. SQLs are a bit edited for keeping privacy)

STATSPACK report for

Database    DB Id    Instance     Inst Num Startup Time    Release     RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
1814528001 PROD1 1 30-Dec-09 01:59 10.2.0.4.0 NO

Host Name: MACHINE1 Num CPUs: 8 Phys Memory (MB): 32,763
~~~~

Snapshot       Snap Id     Snap Time      Sessions Curs/Sess Comment
~~~~~~~~    ---------- ------------------ -------- --------- -------------------
Begin Snap:       8136 20-Jan-10 21:35:59      128       7.2
  End Snap:       8137 20-Jan-10 22:28:50      189       5.4
   Elapsed:               52.85 (mins)

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     4,128M     4,032M  Std Block Size:         8K
           Shared Pool Size:     6,032M     6,128M      Log Buffer:    13,913K

Load Profile                            Per Second       Per Transaction
~~~~~~~~~~~~                       ---------------       ---------------
                  Redo size:            126,609.46            124,954.43
              Logical reads:             10,300.31             10,165.66
              Block changes:                904.39                892.57
             Physical reads:                375.25                370.34
            Physical writes:                 17.54                 17.31
                 User calls:                 15.88                 15.67
Parses: 9.22 9.09
Hard parses: 1.08 1.06
                      Sorts:                 60.58                 59.79
                     Logons:                  0.13                  0.13
                   Executes:                327.42                323.14
               Transactions:                  1.01

  % Blocks changed per Read:    8.78    Recursive Call %:    98.49
 Rollback per transaction %:    1.28       Rows per Sort:    28.80

Instance Efficiency Percentages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.96 Redo NoWait %: 95.11
            Buffer  Hit   %:   99.73    In-memory Sort %:  100.00
            Library Hit   %:   99.10        Soft Parse %:   88.32
         Execute to Parse %:   97.19         Latch Hit %:   99.91
Parse CPU to Parse Elapsd %: 64.77 % Non-Parse CPU: 99.51

 Shared Pool Statistics        Begin   End
                               ------  ------
             Memory Usage %:   74.42   72.87
    % SQL with executions>1:   33.97   33.40
  % Memory for SQL w/exec>1:   56.11   55.38

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
row cache lock                                  16,524      48,098   2911   45.6
log file switch (checkpoint incomplete)         68,137      41,595    610   39.5
buffer busy waits                               13,462      13,455    999   12.8
CPU time                                                     1,472           1.4
db file sequential read                        723,098         446      1     .4
          -------------------------------------------------------------
Host CPU CPUs-8
~~~~~~~~              Load Average
Begin End User System Idle WIO WCPU
                    ------- -------   ------- ------- ------- ------- --------
                                         3.69    2.12   94.19

Instance CPU
~~~~~~~~~~~~
              % of total CPU for Instance:    3.30
              % of busy  CPU for Instance:   56.81
  %DB time waiting for CPU - Resource Mgr:

Memory Statistics                       Begin          End
~~~~~~~~~~~~~~~~~                ------------ ------------
                  Host Mem (MB):     32,762.6     32,762.6
SGA use (MB): 10,240.0 10,240.0
                   PGA use (MB):        776.7        787.4
% Host Mem used for SGA+PGA: 33.6 33.7
          -------------------------------------------------------------

Time Model System Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time 101,089.5 100.0
DB CPU                                             821.2           .8
PL/SQL execution elapsed time                      620.9           .6
parse time elapsed                                  22.4           .0
hard parse elapsed time                             21.2           .0
PL/SQL compilation elapsed time                      1.7           .0
connection management call elapsed                   0.2           .0
hard parse (sharing criteria) elaps 0.1 .0
repeated bind elapsed time                           0.0           .0
hard parse (bind mismatch) elapsed                   0.0           .0
failed parse elapsed time                            0.0           .0
sequence load elapsed time                           0.0           .0
DB time                                        101,090.9
background elapsed time                          6,382.7
background cpu time                                 16.6

Wait Events DB/Inst: PROD1/prod1 Snaps: 8380-8382
-> s - second, cs - centisecond, ms - millisecond, us - microsecond
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by Total Wait Time desc, Waits desc (idle events last)

                                                                    Avg
                                                %Time Total Wait   wait    Waits
Event Waits -outs Time (s) (ms) /txn
--------------------------------- ------------ ------ ---------- ------ --------
row cache lock                          16,524     97     48,098   2911      5.1
log file switch (checkpoint incom 68,137 61 41,595 610 21.2
buffer busy waits                       13,462    100     13,455    999      4.2
..
..
..

Wait Event Histogram DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
-> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
-> % of Waits - value: .0 indicates value was <.05%, null is truly 0
-> Ordered by Event (idle events last)

                           Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
row cache lock               16K   2.9                                  .0  97.1
log file switch (checkpoin 66K 21.0 2.7 12.8 2.1 .3 61.1
buffer busy waits            13K    .0                            .0 100.0

As you see row cache locks avg wait time is nearly 3 second with 97% is over 1 second which looks like it is the problem

We already spotted row cache lock but where to look next ?

Documentation says ;

row cache lock

The session is trying to get a data dictionary lock. 

Wait Time: Wait up to 60 seconds.

Parameter	Description
cache id	The CACHE# column value in the V$ROWCACHE view
mode	See "mode"
request	The pipe timer set by the user

In Statspack report we have a section called "Dictionary Cache Stats". Lets check what we have there

Dictionary Cache Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
->"Pct Misses" should be very low (< 2% in most cases)
->"Final Usage" is the number of cache entries being used in End Snapshot

Get Pct Scan Pct Mod Final
Cache Requests Miss Reqs Miss Reqs Usage
------------------------- ------------ ------ ------- ----- -------- ----------
dc_awr_control                      18    0.0       0              0          1
dc_database_links                  107    0.0       0              0          1
dc_files                            13    0.0       0              0         13
dc_global_oids                  21,571    0.1       0              2        255
dc_histogram_data               73,396    2.2       0              0      3,904
dc_histogram_defs              279,541    1.4       0              0      7,652
dc_object_grants                   102   11.8       0              0        124
dc_object_ids                1,050,311    0.1       0              5      2,879
dc_objects                      14,052    1.3       0             31      2,605
dc_profiles                         74    0.0       0              0          1
dc_rollback_segments             4,147    0.0       0             64        300
dc_segments                     50,349    0.2       0            121      2,415
dc_sequences                        10   10.0       0             10         13
dc_table_scns                        1  100.0       0              0          0
dc_tablespace_quotas               289    0.0       0              0          9
dc_tablespaces                  66,671    0.0       0              0         11
dc_usernames                    18,888    0.1       0              0         53
dc_users                        99,804    0.0       0              0         71
outstanding_alerts                  26   15.4       0              4         24
          -------------------------------------------------------------

From my old shared pool post I remember that Pct Miss is important about dictionary cache tuning.
I did the comparison with the normal times and the only thing I spot was increase in Pct miss for
dc_object_grants and dc_sequences with less get request but without specific session activity all
I can do is this much for this event. so I moved to logfile switch event to see if I can find something

Definition **** log file switch (checkpoint incomplete)
The log file switch wait indicates that the process is waiting for the log file switch to complete,
but the log file switch is not possible because the checkpoint process for that log file
has not completed. You may see this event when the redo log files are sized too small.
Wait Parameters Wait parameters are not used for log file switch (checkpoint incomplete).
Wait Time One second

This is an event we see occasionally but not with the redo size we generated for that time interval. Lets see how many log switches has been done for that day

SQL> @gen_redo

DAY            COUNT#       MIN#       MAX# DAILY_AVG_MB
---------- ---------- ---------- ---------- ------------
2010-01-26        790      60361      61150        50560
2010-01-25        722      59639      60360        46208
2010-01-24        469      59170      59638        30016
2010-01-23        291      58879      59169        18624
2010-01-22        651      58228      58878        41664
2010-01-21        394      57834      58227        25216
2010-01-20        473      57361      57833        30272

10 rows selected.

Session altered.

DAY        00    01    02    03    04    05    06    07    08    09    10    11    12    13    14    15    16    17    18    19    20    21    22    23
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
20/01/2010    0    50     0     0     0     1    56     0     1     9    17    43    37    27     8     8    14    12     9   101    57    15     7     1
21/01/2010    0    52     0     1     0     0    56     0     1     6    13    12    12    12    17    10     4    11     6   100    58    18     4     1
22/01/2010    0    51     0     0     1     0    58     0     1     3    11    27    14    18    82    25    17    60   104   100    57    18     4     0
23/01/2010    1    51     2     1     0     0    54     1     0     0     0     0     1     0     0     0     1     0     0   100    57    18     3     1
24/01/2010    0    52     1     3     3     3    67     8     7     1     0     1     0     0   136    53    16     1     0    76    20    18     1     2
25/01/2010    0    53     0     0     0     1    59     1     1   120    45    89    40     4    80    17    16    78     6    90    11     8     3     0
26/01/2010    0    53    12    39    13     0    57     0     1    40    28    63    38   142     4    99    59     5    17    82    16    19     2     1

8 rows selected.

looks like nothing specific happened number of log switches looks innocent.

What is next buffer busy looks like the third one with little effect but still can give something ?

Lets check the trends

SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate

Daily trends for "buffer busy waits"...

Day    Statistic Name                         Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN buffer busy waits                           127.95

20-JAN buffer busy waits                       134,742.54 **********************************************************

21-JAN buffer busy waits                           205.92

22-JAN buffer busy waits                           553.43

23-JAN buffer busy waits                             0.31

24-JAN buffer busy waits                            32.49

25-JAN buffer busy waits                         1,449.90 *

26-JAN buffer busy waits                         1,131.49

Daily/hourly trends for "DB time"... (truncated)

Day    Hour   Statistic Name                      VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------

20-JAN 00:00  buffer busy waits                             0.00
       01:00  buffer busy waits                             0.00
       02:00  buffer busy waits                             0.00
       03:00  buffer busy waits                             0.00
       04:00  buffer busy waits                             0.00
       05:00  buffer busy waits                             0.00
       06:00  buffer busy waits                             0.00
       07:00  buffer busy waits                             0.00
       08:00  buffer busy waits                             0.00
       09:00  buffer busy waits                             4.86
       10:00  buffer busy waits                             6.03
       11:00  buffer busy waits                            43.35
       12:00  buffer busy waits                            24.02
       13:00  buffer busy waits                            20.27
       14:00  buffer busy waits                            16.45
       15:00  buffer busy waits                            24.45
       16:00  buffer busy waits                            27.31
       17:00  buffer busy waits                            23.86
       18:00  buffer busy waits                             2.78
       19:00  buffer busy waits                             0.24
       20:00  buffer busy waits                             0.00
       21:00  buffer busy waits                             0.00
       22:00  buffer busy waits                       134,548.91 ************************************************************
       23:00  buffer busy waits                             0.00

Looks like normally our DB doesn't spend time on this event which is good.

We have a nice section in statspack report for buffer busy wait events


Segments by Buffer Busy Waits DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> End Segment Buffer Busy Waits Threshold:       100

                                                                    Buffer
Subobject Obj. Busy Pct
Owner Tablespace Object Name Name Type Waits Total
---------- ---------- -------------------- ------------ ----- ------------ -----
APP_USER USERS ULG TABLE 9,560 100.0
APP_USER USERS PK_ULG INDEX 1 .0
          -------------------------------------------------------------

All of the waits happened on the same object. Before checking I checked trend with another Tim Gorman script sp_buffer_busy (again edited for time interval and output only segments with buffer_busy waits >0)

I won't put the output but output says this table showed up only 3 more times with wait total less then 30, in 10 days period. If you check the actual statspack for sqls
ULG table is not shown on any of them which is a bit strange again.

Maybe I can find something else, if I get half an hour period report for

		 	Snap
Snap Id   Snap Started    Level
8137 20 Jan 2010 22:28     7
8138 20 Jan 2010 22:58     7

From top 5 wait events

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                        613,887       1,040      2   42.0
db file scattered read                         266,197         853      3   34.5
CPU time                                                       418          16.9
db file parallel write                          45,244          86      2    3.5
log file parallel write                          4,449          26      6    1.1
          -------------------------------------------------------------

Our problem disappeared from top 5 list and DB time=16,656.2 which is very small comparing to previous value 101,090.9.
problem is gone but what has actually happened ?

I am pretty sure that there wasn't any change on that day. backup ran successfully after 23:00 and export finished before 9:30 so what can this be ?

I know the answer but whoever comes up with a good idea please be my guest to comment. If you are good enough, I promise to buy beer.

Answer will be here as an another post on Friday. If I have time I will also show what happened for the other DB time peak on same day.

Note:
With this post I did not re-invent the wheel but try to show you the available free tools you can use and modify for your own fast troubleshooting. As Tanel says right tool right time. These guys share these tools to help you , my advice is use it if you need the.
My script library consist of many scripts from Tanel Poder, Riyaj Shamsudeen ,Tim Gorman, Kerry Osborne, Dion Cho, Craig Shallahamer and from many others plus my own.
Thanks you very much to all guys who share their tools with us.

How much time I spent for diagnosis? Everything on this page took 10 minutes total to run. Rest is your experience and skills to search on metalink and internet.
for me coming to last point took only 1.5 hour.

List of scripts I used;

About these ads

15 Comments »

  1. Hi!

    Cool. What amaze me all the time is your approach and wide knowledge. You looks to me like “Google for Oracle”-only the finest collections.
    What to say then your blog should be part of any Oracle seminar additional literature!
    Cheers,
    Damir Vadas
    http://damir-vadas.blogspot.com

    Comment by Damir Vadas — January 28, 2010 @ 12:09 am

  2. Thank you for your kind words Damir. I am not an Oracle scientist and I don’t really going after internals, but at least i know what to use and how to use research on internals from other people.

    Comment by coskan — January 28, 2010 @ 10:16 am

  3. [...] } In part 1a I was trying to find a problem to troubleshoot and I spotted 2 possible problems and drilled down [...]

    Pingback by Working with statspack-part-1b-Solution/Answer/another Question « Coskan’s Approach to Oracle — January 29, 2010 @ 7:07 pm

  4. Hello, I follow your blog has some time and I believe that it contributes greatly to professional with performance issue in Oracle.

    I’ve been developing for some time a tool named Mandela to diagnose performance problems. It need not be installed or create any object in the database analyzed. It only needs read access to the tables of STATSPACK or AWR.

    The site is still under development, but if you want can download the Mandela and follow the launch of the latest versions.

    What makes Mandela nice? Well, he can show into timeline all statistical system, redo log files switches, statistics I/O (tablespace, and datafile filesystem).

    But what it does well done is to decompose the wait events (two clicks, three at most) and allow drilldown for the top sql by metrics (buffer gets, diskreads, cpu time, elapsed time, etc.) and now for the wait event. That’s right … from a chart of wait events you right click mouse and see the statements SQL that waited for “db file sequential read” or “CPU”, for example.

    If you’re interested (and available) to see how Mandela can help you gain productivity, access http://www.trevis.com.br/projetos/mandela/download

    If you want to chat or ask questions about the Mandela my contacts is as follows:

    [] s,
    Rafael Trevisan
    http://www.trevis.com.br
    EMail / MSN / GTalk: rafael at trevis.com.br

    Comment by Rafael Trevisan — February 2, 2010 @ 1:15 am

    • Hi Rafael,
      Thanks for stopping by.

      I see you can also speak English so it can be very nice if you can create en English version of your web page. I am downloading the product right now. When i have time to use it I will definitely give feedback and discuss with you about the product.

      Thank you again for contributing and your efforts

      Comment by coskan — February 2, 2010 @ 1:27 am

      • Hello, the site is still under development (including the English version), and the tool is “outing of the oven”.
        :-)

        Comment by Rafael Trevisan — February 2, 2010 @ 3:17 am

  5. I work as a Performance Analyst adressing solutions to performance problems or doing Top-down analysis searching for pro-active recommendations.
    I spent like 80% of my working-day at Mandela. It made me able to perform 2 complex or 4 simple analysis per day.
    I don’t spend more time at sqlplus and at excel generating endless graphs just to see what’s the problem with the DB. Now I can just click 3 or 4 times to see how my Database performed based on the AWR or Statspack repository.
    I recommend for juniors DBAs who doesn’t know the performance metrics and to seniors DBAs who are tired of spend so much time at excel.
    The best part is that it can even build your powerpoint presentation from your analysis.
    Just try it and you will definitely agree with me.

    Comment by Vinicius Hoffmann — February 2, 2010 @ 1:38 am

    • This is very interesting that after all the books and all my researches on the internet this is the first time I have heard about this tool. I think Rafael needs to work on marketing bit of the tool a bit more (no results on the web http://tiny.cc/6lhSW ). I think it is known in Brazil but not worldwide. If it does what it shows on the screenshots and what you say it is very promising.

      The only problem for me is that I am using linux at home so I can try it at work tomorrow. I tried to use wine but no chance :( As I said I will try and if I like it as much as you I will definitely share my experience. thank you very much for making me aware of this tool.

      Comment by coskan — February 2, 2010 @ 1:47 am

  6. the tool is pretty new, but it’s awesome.
    It’s user interface is fully in English so you can try it with no worries
    The setup is simple and you don’t need a tnsnames working as the software uses EZ direct connect (Ex: user/password@hostname:port)
    If you have any doubts about its usability try the
    Forum or Rafael e-mail or my email (hoffmann1337@hotmail.com)

    Comment by Vinicius Hoffmann — February 2, 2010 @ 3:07 am

  7. Terrific tool. It does make the performance analisys faster and easier. The appearance is way self-intuitive, so it work as a guide througth the performance tracking.

    I strongly recommend this tool for everybody who works with performance troubleshooting.

    regards

    Comment by Cristiano Reame — February 2, 2010 @ 12:23 pm

  8. Can anyone, maybe Rafael, explain what the prerequisites are for the tool, I get a “NET access violation 009B8701″ when trying to connect to a database.

    Thanks,
    Marcus

    Comment by Marcus Mönnig — February 3, 2010 @ 1:44 pm

    • Hello Marcus, I’m sorry for that information is not yet on the site, soon
      site are much more full of interesting information.

      In fact there is no special requirements for the tool. If
      you can connect to Oracle via SQLPlus or other tool such
      SQLDeveloper, Toad, SQLNavigator should also be able to Mandela.

      Can you tell me which OS and database (including version) is
      trying to connect?

      If you prefer, add me on MSN or GTalk I try to help you more quickly.

      rafael@trevis.com.br (both)

      []s,
      Rafael Trevisan.

      Comment by Rafael Trevisan — February 3, 2010 @ 1:50 pm

  9. [...] Gundogar was also in a deductive frame of mind. Here is his Working with statspack-part-1a-Diagnosis, featuring both a challenge and purty pictures [...]

    Pingback by Log Buffer #176: a Carnival of the Vanities for DBAs | The Pythian Blog — February 6, 2010 @ 6:56 am

  10. performance tuning…

    [...]Working with statspack-part-1a-Diagnosis « Coskan’s Approach to Oracle[...]…

    Trackback by performance tuning — December 10, 2011 @ 12:30 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 199 other followers

%d bloggers like this: