Part 1b ->
Note: This post is a bit long and might be boring if you are already good with statspack, but there is a question at the end if you would like to answer it.
Yesterday, I decided to look for something to troubleshoot without waiting for any user complaints, so I could write something useful here 🙂
The primary reason for this exercise was to get used to interpreting statspack reports to find possible bottlenecks;
the secondary reason was that I was bored of being underutilized.
I always feel comfortable troubleshooting session-wide problems, non-historic performance problems, or historic problems with the Diagnostics Pack,
but I am not that experienced with statspack.
In my current environment I don't have the Diagnostics Pack option, so I have to become good friends with statspack, and
the best way to become a good friend is to spend more time with your new friend, as I am about to demonstrate.
Characteristics of the DB I will work on:
daily load is between 08:00 and 19:00
export at 21:00, running for roughly 1 hour
DB backup starts at 23:00
There are a couple of ways to spot possible bottlenecks without generating individual reports.
The first one is from Tanel Poder and his famous Perfsheet. This tool comes preconfigured and ready to be used with statspack.
The query he uses to generate the report follows the same idea as Tim Gorman's statspack delta views. Let's start:
the first screen is the login screen (you need to enable macros).
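The delta idea behind both tools can be sketched roughly like this (a minimal sketch against the standard PERFSTAT schema, assuming the STATS$SYSTEM_EVENT and STATS$SNAPSHOT tables; the real Perfsheet and delta-view queries are more elaborate):

```sql
-- Statspack counters are cumulative, so per-interval figures come from
-- subtracting each snapshot's value from the previous one (LAG).
SELECT s.snap_time,
       e.event,
       e.time_waited_micro
         - LAG(e.time_waited_micro) OVER
             (PARTITION BY e.dbid, e.instance_number, e.event
              ORDER BY e.snap_id) AS waited_micro_delta
FROM   perfstat.stats$system_event e
JOIN   perfstat.stats$snapshot s
       ON  s.snap_id         = e.snap_id
       AND s.dbid            = e.dbid
       AND s.instance_number = e.instance_number
ORDER  BY s.snap_time, waited_micro_delta DESC NULLS LAST;
```

Note that an instance restart resets the cumulative counters, so the real scripts also discard intervals that span a restart.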

Inputs for the statspack report are:
Data Source: TNS entry of the DB
Username/Password: a username with read privileges on the statspack schema objects
Which Instance: instance name (lower case, or modify the related SQL in the Queries sheet with the UPPER function)
Which Query: Statspack CPU+Wait+Stats Hourly
From Date: start date
To Date: end date
Then press "Show me". You will get warnings during graph generation saying the PivotTable report will not fit on the sheet; just press OK until the warnings end.
Your graph might be empty in Office 2007; just go to the Report Filter and select WAIT from EVENT_TYPES to see any abnormality in wait events.
The first graph is the WAIT graph, shown below. As you can see, I was able to spot a possible problem on 20/01/2010 on the first try.

Let's make it more visible by adding a top-10 filter to the Legend Fields section.

Another option is to focus on the date where the abnormality happened. Let's select only the 20th. FYI, the top-10 filter is still on.

Without the top-10 filter the graph is slightly different, but it still shows what we need.
Let's focus only on the time interval between 18:00 and 23:00.

As you can see, there is an abnormal increase in row cache lock, log file switch and buffer busy waits.
Let's double-check with system statistics. If you select all hours, dates and events, your graph will be a bit misleading because of very high metrics, like below.
What can we do? DB time is what we want to decrease when troubleshooting, so let's focus on it by filtering Legend Fields on DB Time.

We have two peaks, which is a bit confusing given the initial assumption about the problem timing. Let's focus on the 20th again.

There are two peaks, between 13:00-15:00 and 21:00-23:00. We have an interesting situation in terms of DB time: from the wait event perspective the abnormality is between 20:00 and 22:00, but from the DB time perspective there is one more slot, between 13:00 and 15:00.
At this point I quit Perfsheet. I had spotted what I needed for the initial problem chase; now I can go through the other options before gathering the report. (These options are much easier than statspack report generation, which is why I did not go directly to statspack.)
The second option is from Tim Gorman himself: sp_systime, which I find very useful for overall response time analysis. (I needed to change the script to work between explicit dates instead of the "days before sysdate" option.)
Here is the output.
The first part is the overall response time for each day.
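The date-range change boils down to swapping the date predicate; something like this hypothetical fragment (the actual column aliases and prompt names in sp_systime differ):

```sql
-- Before: last N days relative to sysdate
--   WHERE s.snap_time >= TRUNC(SYSDATE) - &&days_back
-- After: explicit begin/end dates supplied at the prompt
WHERE s.snap_time >= TO_DATE('&&begin_date', 'DD-MON-YYYY')
  AND s.snap_time <  TO_DATE('&&end_date',   'DD-MON-YYYY') + 1
```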
SQL> @sp_systime2
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-5
Total Cum
Service Seconds % of % of
Day or Wait Name Spent Total Total
------------ -------- ----------------------------------- ------------------- ------- -------
19-JAN Service SQL execution 115,326.24 78.78 78.78
Wait db file sequential read 7,697.33 5.26 84.04
Wait db file scattered read 6,213.01 4.24 88.29
Service Recursive SQL execution 4,638.75 3.17 91.46
Wait Backup: sbtwrite2 2,671.85 1.83 93.28
Wait Backup: sbtbackup 2,081.53 1.42 94.70
Wait log file parallel write 1,423.58 0.97 95.67
Service Parsing SQL 1,109.49 0.76 96.43
Wait db file parallel write 913.09 0.62 97.06
Wait log file sync 761.78 0.52 97.58
Wait Datapump dump file I/O 631.13 0.43 98.01
Wait Log archive I/O 549.08 0.38 98.38
20-JAN Service Recursive SQL execution 255,597.67 65.42 65.42
Wait row cache lock 48,098.17 12.31 77.74
Wait log file switch (checkpoint incompl 41,613.30 10.65 88.39
Wait buffer busy waits 13,474.25 3.45 91.84
Wait db file sequential read 9,788.51 2.51 94.34
Wait db file scattered read 6,831.08 1.75 96.09
Wait Backup: sbtwrite2 3,512.51 0.90 96.99
Wait log file parallel write 1,897.38 0.49 97.48
Wait Datapump dump file I/O 1,674.20 0.43 97.90
Wait db file parallel write 1,389.77 0.36 98.26
Wait log file sync 1,073.14 0.27 98.53
Service Parsing SQL 964.58 0.25 98.78
21-JAN Service SQL execution 33,045.26 44.91 44.91
Service Recursive SQL execution 12,785.72 17.38 62.28
Wait db file sequential read 8,296.99 11.28 73.56
Wait db file scattered read 6,731.96 9.15 82.71
Wait Backup: sbtwrite2 3,340.23 4.54 87.25
Wait log file parallel write 1,856.99 2.52 89.77
Wait db file parallel write 1,359.19 1.85 91.62
Wait log file sync 1,131.31 1.54 93.16
Service Parsing SQL 1,011.46 1.37 94.53
Wait enq: TX - row lock contention 861.24 1.17 95.70
Wait Log archive I/O 686.01 0.93 96.63
Wait Backup: sbtbackup 402.52 0.55 97.18
If you check the top 5, you will easily spot that row cache lock, log file switch and buffer busy waits are different from the other days.
The second part of the report is the hourly response time for each day. I truncated the output, but the full version is here ***
12:00 Service SQL execution 6,268.92 76.66 76.66
Wait db file sequential read 634.26 7.76 84.42
Wait db file scattered read 344.02 4.21 88.63
Service Recursive SQL execution 230.23 2.82 91.44
Wait log file parallel write 147.58 1.80 93.25
Wait log file sync 129.94 1.59 94.84
13:00 Service SQL execution 6,942.33 83.97 83.97
Wait db file sequential read 375.15 4.54 88.51
Wait db file scattered read 207.74 2.51 91.02
Service Recursive SQL execution 200.86 2.43 93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
Wait log file parallel write 94.08 1.14 96.56
14:00 Service Recursive SQL execution 241,820.91 99.35 99.35
Wait db file scattered read 574.84 0.24 99.59
Wait db file sequential read 515.74 0.21 99.80
Wait log file parallel write 97.09 0.04 99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
Service Parsing SQL 81.25 0.03 99.91
15:00 Service SQL execution 3,620.32 62.09 62.09
Service Recursive SQL execution 1,210.42 20.76 82.86
Wait db file sequential read 328.60 5.64 88.49
Wait db file scattered read 294.73 5.06 93.55
Service Parsing SQL 73.42 1.26 94.81
Wait enq: TX - row lock contention 72.03 1.24 96.04
16:00 Service SQL execution 3,667.78 70.86 70.86
Wait db file sequential read 429.87 8.30 79.16
Service Recursive SQL execution 371.24 7.17 86.33
Wait db file scattered read 215.28 4.16 90.49
Wait enq: TX - row lock contention 144.67 2.79 93.29
Wait log file parallel write 100.81 1.95 95.24
17:00 Service SQL execution 2,904.54 70.63 70.63
Wait db file sequential read 397.42 9.66 80.29
Service Recursive SQL execution 202.13 4.92 85.21
Wait db file scattered read 138.29 3.36 88.57
Wait log file sync 94.87 2.31 90.88
Wait log file parallel write 90.94 2.21 93.09
18:00 Service SQL execution 1,791.86 65.17 65.17
Wait db file sequential read 302.02 10.98 76.15
Service Recursive SQL execution 176.04 6.40 82.56
Wait db file scattered read 151.83 5.52 88.08
Wait log file parallel write 80.44 2.93 91.00
Wait log file sync 63.46 2.31 93.31
19:00 Service SQL execution 2,261.88 44.86 44.86
Wait db file sequential read 758.41 15.04 59.91
Wait db file scattered read 540.35 10.72 70.62
Wait db file parallel write 272.36 5.40 76.03
Wait log file parallel write 269.47 5.34 81.37
Wait log file sequential read 154.58 3.07 84.44
20:00 Service SQL execution 3,517.30 73.29 73.29
Wait db file sequential read 467.34 9.74 83.03
Wait db file scattered read 251.47 5.24 88.27
Wait log file parallel write 161.71 3.37 91.64
Wait db file parallel write 92.73 1.93 93.57
Wait Log archive I/O 90.24 1.88 95.45
21:00 Service SQL execution 2,875.04 45.72 45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
Wait db file sequential read 873.68 13.89 83.53
Wait db file scattered read 417.53 6.64 90.17
Wait log file parallel write 130.15 2.07 92.24
Wait db file parallel write 106.51 1.69 93.93
22:00 Wait row cache lock 48,097.61 41.83 41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
Wait buffer busy waits 13,454.89 11.70 89.70
Service Recursive SQL execution 8,958.04 7.79 97.49
Wait db file sequential read 1,485.98 1.29 98.79
Wait db file scattered read 856.65 0.75 99.53
23:00 Wait Backup: sbtwrite2 2,801.73 76.03 76.03
Wait Backup: sbtbackup 352.22 9.56 85.59
Wait db file scattered read 140.16 3.80 89.39
Wait Backup: sbtclose2 93.25 2.53 91.92
Wait db file sequential read 78.52 2.13 94.05
Service SQL execution 65.77 1.78 95.84
What do we see here? The graph told us there might be a possible problem between 13:00-15:00 and 21:00-23:00.
First, focus on 13:00-15:00:
13:00 Service SQL execution 6,942.33 83.97 83.97
Wait db file sequential read 375.15 4.54 88.51
Wait db file scattered read 207.74 2.51 91.02
Service Recursive SQL execution 200.86 2.43 93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
Wait log file parallel write 94.08 1.14 96.56
14:00 Service Recursive SQL execution 241,820.91 99.35 99.35
Wait db file scattered read 574.84 0.24 99.59
Wait db file sequential read 515.74 0.21 99.80
Wait log file parallel write 97.09 0.04 99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
Service Parsing SQL 81.25 0.03 99.91
Second, focus on 21:00-23:00:
21:00 Service SQL execution 2,875.04 45.72 45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
Wait db file sequential read 873.68 13.89 83.53
Wait db file scattered read 417.53 6.64 90.17
Wait log file parallel write 130.15 2.07 92.24
Wait db file parallel write 106.51 1.69 93.93
22:00 Wait row cache lock 48,097.61 41.83 41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
Wait buffer busy waits 13,454.89 11.70 89.70
Service Recursive SQL execution 8,958.04 7.79 97.49
Wait db file sequential read 1,485.98 1.29 98.79
Wait db file scattered read 856.65 0.75 99.53
They have nothing in common, but they tell us a lot. The first slot spent 99% of its response time on Recursive SQL execution, which means no waits: it was on CPU doing something we don't know about yet.
The second slot spent 41% of its time on row cache lock and 36% on log file switch. Now we are ready for statspack to see what we can find, but
before that I want to show two more scripts whose output can also be obtained with Perfsheet (like DB time).
The first script is Tim Gorman's sp_evtrends, which shows trends of events. There are two caveats: sometimes an unnecessarily high number of * characters is shown on the screen, but compared with the seconds waited this can easily be ignored; and if you focus on a single day the * graph can be misleading (as it can be with Excel graphs), so it is better to look at 5-10 days.
The script asks for db_name, begin date, end date and the statistic to analyze.
SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6
Some useful database statistics to search upon:
Name
------------------------------------------------------------
ARCH random i/o
ARCH sequential i/o
ARCH wait for archivelog lock
....
....
....
Daily trends for "row cache lock"...
What statistic do you want to analyze? row cache lock
Day Statistic Name Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN row cache lock 0.00
20-JAN row cache lock 480,981.73 ************************************************************
Daily/hourly trends for "row cache lock"...
Day Hour Statistic Name Secs Waited Percentage of total over all hours for each day
------ ------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN 00:00 row cache lock 0.00
01:00 row cache lock 0.00
02:00 row cache lock 0.00
03:00 row cache lock 0.00
04:00 row cache lock 0.00
05:00 row cache lock 0.00
06:00 row cache lock 0.00
07:00 row cache lock 0.00
08:00 row cache lock 0.00
09:00 row cache lock 0.00
10:00 row cache lock 0.00
11:00 row cache lock 0.00
12:00 row cache lock 0.00
13:00 row cache lock 0.00
14:00 row cache lock 0.00
15:00 row cache lock 0.00
16:00 row cache lock 0.00
17:00 row cache lock 0.00
18:00 row cache lock 0.00
19:00 row cache lock 0.00
20:00 row cache lock 0.00
21:00 row cache lock 0.00
22:00 row cache lock 0.00 ************************************************************
23:00 row cache lock 0.00
20-JAN 00:00 row cache lock 0.00
01:00 row cache lock 0.00
02:00 row cache lock 0.00
03:00 row cache lock 0.00
04:00 row cache lock 0.00
05:00 row cache lock 0.00
06:00 row cache lock 0.00
07:00 row cache lock 0.00
08:00 row cache lock 0.00
09:00 row cache lock 0.00
10:00 row cache lock 0.00
11:00 row cache lock 0.00
12:00 row cache lock 0.00
13:00 row cache lock 0.00
14:00 row cache lock 0.00
15:00 row cache lock 0.00
16:00 row cache lock 5.65
17:00 row cache lock 0.00
18:00 row cache lock 0.00
19:00 row cache lock 0.00
20:00 row cache lock 0.00
21:00 row cache lock 0.00
22:00 row cache lock 480,976.09 ************************************************************
23:00 row cache lock 0.00
As you can see, it shows the same number of * characters for the 19th, which is wrong when checked against the time spent, so an abnormality should be confirmed by both the time spent and the number of * characters.
The second script is a version modified for sys_time_model statistics, called sp_sys_time_trends.
SQL> @sp_sys_time_trends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6
Some useful database statistics to search upon:
Name
------------------------------------------------------------
DB CPU
DB time
Java execution elapsed time
PL/SQL compilation elapsed time
PL/SQL execution elapsed time
RMAN cpu time (backup/restore)
background cpu time
background elapsed time
connection management call elapsed time
failed parse (out of shared memory) elapsed time
failed parse elapsed time
hard parse (bind mismatch) elapsed time
hard parse (sharing criteria) elapsed time
hard parse elapsed time
inbound PL/SQL rpc elapsed time
parse time elapsed
repeated bind elapsed time
sequence load elapsed time
sql execute elapsed time
What statistic do you want to analyze? DB time
Daily trends for "DB time"...
Day Statistic Name VALUE Percentage of total over all days
------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN DB time 1408469.51 **************
20-JAN DB time 4797565.14 **********************************************
Daily/hourly trends for "DB time"...
Day Hour Statistic Name VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN 00:00 DB time 18267.2492 *
01:00 DB time 57907.3551 **
02:00 DB time 64889.8266 ***
03:00 DB time 37988.8748 **
04:00 DB time 39093.3842 **
05:00 DB time 39267.5194 **
06:00 DB time 56103.5928 **
07:00 DB time 43312.1147 **
08:00 DB time 50542.4821 **
09:00 DB time 62883.2236 ***
10:00 DB time 89721.9315 ****
11:00 DB time 86621.3671 ****
12:00 DB time 78163.5806 ***
13:00 DB time 64324.9537 ***
14:00 DB time 78970.2241 ***
15:00 DB time 78037.7742 ***
16:00 DB time 78215.1336 ***
17:00 DB time 69875.99 ***
18:00 DB time 61075.9801 ***
19:00 DB time 47658.5181 **
20:00 DB time 41372.216 **
21:00 DB time 57569.6918 **
22:00 DB time 65593.3167 ***
23:00 DB time 41013.2088 **
20-JAN 00:00 DB time 39789.4329
01:00 DB time 76274.4764 *
02:00 DB time 39564.2615
03:00 DB time 37327.7534
04:00 DB time 38174.4725
05:00 DB time 38335.113
06:00 DB time 55688.7588 *
07:00 DB time 39039.2476
08:00 DB time 45155.1028 *
09:00 DB time 62292.3892 *
10:00 DB time 73688.6751 *
11:00 DB time 86222.6032 *
12:00 DB time 79400.9817 *
13:00 DB time 81021.4711 *
14:00 DB time 2498694.98 *******************************
15:00 DB time 57824.5047 *
16:00 DB time 50252.1229 *
17:00 DB time 39474.4176
18:00 DB time 25728.465
19:00 DB time 40950.006 *
20:00 DB time 45258.7127 *
21:00 DB time 66186.8379 *
22:00 DB time 1177471.46 ***************
23:00 DB time 3748.88236
As you can see, just like in the Excel graph, DB time increased during the 14:00-15:00 and 22:00-23:00 periods.
Now it is time to get the statspack report and dig further.
First I want to focus on the row cache problem; if I have space, I may cover the recursive cursor problem in a second part.
We normally take snapshots every 30 minutes, but the available snaps are as below.
Snap
Snap Id Snap Started Level
8135 20 Jan 2010 21:05 7
8136 20 Jan 2010 21:35 7
8137 20 Jan 2010 22:28 7
8138 20 Jan 2010 22:58 7
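A gap like this can be spotted with a quick query against the snapshot table (a sketch, assuming the standard PERFSTAT schema):

```sql
-- List snapshots in the suspect window; an interval larger than the
-- usual 30 minutes reveals the missing 22:00 snapshot.
SELECT snap_id,
       TO_CHAR(snap_time, 'DD Mon YYYY HH24:MI') AS snap_started,
       snap_level
FROM   perfstat.stats$snapshot
WHERE  snap_time BETWEEN TO_DATE('20/01/2010 21:00', 'DD/MM/YYYY HH24:MI')
                     AND TO_DATE('20/01/2010 23:59', 'DD/MM/YYYY HH24:MI')
ORDER  BY snap_id;
```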
If you look carefully, you will see that the snap at 22:00 is missing. There must have been something wrong on the DB that prevented statspack
from gathering it, but let's assume for now that you could not spot it, as I couldn't, which was a deadly mistake that increased the troubleshooting time.
I generated a statspack report between 21:35 and 22:28.
Here is the statspack summary (the full statspack report is here; the SQLs are slightly edited to preserve privacy).
STATSPACK report for
Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
1814528001 PROD1 1 30-Dec-09 01:59 10.2.0.4.0 NO
Host Name: MACHINE1 Num CPUs: 8 Phys Memory (MB): 32,763
~~~~
Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- -------------------
Begin Snap: 8136 20-Jan-10 21:35:59 128 7.2
End Snap: 8137 20-Jan-10 22:28:50 189 5.4
Elapsed: 52.85 (mins)
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 4,128M 4,032M Std Block Size: 8K
Shared Pool Size: 6,032M 6,128M Log Buffer: 13,913K
Load Profile Per Second Per Transaction
~~~~~~~~~~~~ --------------- ---------------
Redo size: 126,609.46 124,954.43
Logical reads: 10,300.31 10,165.66
Block changes: 904.39 892.57
Physical reads: 375.25 370.34
Physical writes: 17.54 17.31
User calls: 15.88 15.67
Parses: 9.22 9.09
Hard parses: 1.08 1.06
Sorts: 60.58 59.79
Logons: 0.13 0.13
Executes: 327.42 323.14
Transactions: 1.01
% Blocks changed per Read: 8.78 Recursive Call %: 98.49
Rollback per transaction %: 1.28 Rows per Sort: 28.80
Instance Efficiency Percentages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.96 Redo NoWait %: 95.11
Buffer Hit %: 99.73 In-memory Sort %: 100.00
Library Hit %: 99.10 Soft Parse %: 88.32
Execute to Parse %: 97.19 Latch Hit %: 99.91
Parse CPU to Parse Elapsd %: 64.77 % Non-Parse CPU: 99.51
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 74.42 72.87
% SQL with executions>1: 33.97 33.40
% Memory for SQL w/exec>1: 56.11 55.38
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
row cache lock 16,524 48,098 2911 45.6
log file switch (checkpoint incomplete) 68,137 41,595 610 39.5
buffer busy waits 13,462 13,455 999 12.8
CPU time 1,472 1.4
db file sequential read 723,098 446 1 .4
-------------------------------------------------------------
Host CPU CPUs-8
~~~~~~~~ Load Average
Begin End User System Idle WIO WCPU
------- ------- ------- ------- ------- ------- --------
3.69 2.12 94.19
Instance CPU
~~~~~~~~~~~~
% of total CPU for Instance: 3.30
% of busy CPU for Instance: 56.81
%DB time waiting for CPU - Resource Mgr:
Memory Statistics Begin End
~~~~~~~~~~~~~~~~~ ------------ ------------
Host Mem (MB): 32,762.6 32,762.6
SGA use (MB): 10,240.0 10,240.0
PGA use (MB): 776.7 787.4
% Host Mem used for SGA+PGA: 33.6 33.7
-------------------------------------------------------------
Time Model System Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Ordered by % of DB time desc, Statistic name
Statistic Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time 101,089.5 100.0
DB CPU 821.2 .8
PL/SQL execution elapsed time 620.9 .6
parse time elapsed 22.4 .0
hard parse elapsed time 21.2 .0
PL/SQL compilation elapsed time 1.7 .0
connection management call elapsed 0.2 .0
hard parse (sharing criteria) elaps 0.1 .0
repeated bind elapsed time 0.0 .0
hard parse (bind mismatch) elapsed 0.0 .0
failed parse elapsed time 0.0 .0
sequence load elapsed time 0.0 .0
DB time 101,090.9
background elapsed time 6,382.7
background cpu time 16.6
Wait Events DB/Inst: PROD1/prod1 Snaps: 8380-8382
-> s - second, cs - centisecond, ms - millisecond, us - microsecond
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by Total Wait Time desc, Waits desc (idle events last)
Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
--------------------------------- ------------ ------ ---------- ------ --------
row cache lock 16,524 97 48,098 2911 5.1
log file switch (checkpoint incom 68,137 61 41,595 610 21.2
buffer busy waits 13,462 100 13,455 999 4.2
..
..
..
Wait Event Histogram DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
-> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
-> % of Waits - value: .0 indicates value was <.05%, null is truly 0
-> Ordered by Event (idle events last)
Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
row cache lock 16K 2.9 .0 97.1
log file switch (checkpoin 66K 21.0 2.7 12.8 2.1 .3 61.1
buffer busy waits 13K .0 .0 100.0
As you can see, the row cache lock's average wait time is nearly 3 seconds, with 97% of waits over 1 second, which makes it look like the problem.
We have already spotted the row cache lock, but where do we look next?
The documentation says:
row cache lock
The session is trying to get a data dictionary lock.
Wait Time: Wait up to 60 seconds.
Parameter Description
cache id The CACHE# column value in the V$ROWCACHE view
mode See "mode"
request The pipe timer set by the user
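When the waits are happening live, the cache id parameter (P1) can be decoded against V$ROWCACHE to see which dictionary cache is contended (a sketch; this is session-level data that statspack cannot show after the fact):

```sql
-- Map the P1 (cache id) of active 'row cache lock' waits to a cache name.
SELECT DISTINCT s.sid, s.p1 AS cache_id, r.parameter AS cache_name
FROM   v$session  s
JOIN   v$rowcache r ON r.cache# = s.p1
WHERE  s.event = 'row cache lock';
```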
In the statspack report we have a section called "Dictionary Cache Stats". Let's check what we have there.
Dictionary Cache Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
->"Pct Misses" should be very low (< 2% in most cases)
->"Final Usage" is the number of cache entries being used in End Snapshot
Get Pct Scan Pct Mod Final
Cache Requests Miss Reqs Miss Reqs Usage
------------------------- ------------ ------ ------- ----- -------- ----------
dc_awr_control 18 0.0 0 0 1
dc_database_links 107 0.0 0 0 1
dc_files 13 0.0 0 0 13
dc_global_oids 21,571 0.1 0 2 255
dc_histogram_data 73,396 2.2 0 0 3,904
dc_histogram_defs 279,541 1.4 0 0 7,652
dc_object_grants 102 11.8 0 0 124
dc_object_ids 1,050,311 0.1 0 5 2,879
dc_objects 14,052 1.3 0 31 2,605
dc_profiles 74 0.0 0 0 1
dc_rollback_segments 4,147 0.0 0 64 300
dc_segments 50,349 0.2 0 121 2,415
dc_sequences 10 10.0 0 10 13
dc_table_scns 1 100.0 0 0 0
dc_tablespace_quotas 289 0.0 0 0 9
dc_tablespaces 66,671 0.0 0 0 11
dc_usernames 18,888 0.1 0 0 53
dc_users 99,804 0.0 0 0 71
outstanding_alerts 26 15.4 0 4 24
-------------------------------------------------------------
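These interval ratios can also be cross-checked live against V$ROWCACHE (a sketch; note that GETS and GETMISSES there are cumulative since instance startup, unlike the statspack interval figures):

```sql
-- Current dictionary cache miss ratios, highest first.
SELECT parameter,
       gets,
       getmisses,
       ROUND(100 * getmisses / NULLIF(gets, 0), 2) AS pct_miss
FROM   v$rowcache
WHERE  gets > 0
ORDER  BY pct_miss DESC;
```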
From my old shared pool post I remember that Pct Miss is the important figure for dictionary cache tuning.
I compared this with normal times, and the only thing I spotted was an increase in Pct Miss for
dc_object_grants and dc_sequences with fewer get requests, but without specific session activity this
is as far as I can take this event. So I moved on to the log file switch event to see if I could find something.
Definition **** log file switch (checkpoint incomplete)
The log file switch wait indicates that the process is waiting for the log file switch to complete,
but the log file switch is not possible because the checkpoint process for that log file
has not completed. You may see this event when the redo log files are sized too small.
Wait Parameters Wait parameters are not used for log file switch (checkpoint incomplete).
Wait Time One second
This is an event we see occasionally, but not with the redo volume we generated in that time interval. Let's see how many log switches occurred that day.
SQL> @gen_redo
DAY COUNT# MIN# MAX# DAILY_AVG_MB
---------- ---------- ---------- ---------- ------------
2010-01-26 790 60361 61150 50560
2010-01-25 722 59639 60360 46208
2010-01-24 469 59170 59638 30016
2010-01-23 291 58879 59169 18624
2010-01-22 651 58228 58878 41664
2010-01-21 394 57834 58227 25216
2010-01-20 473 57361 57833 30272
10 rows selected.
Session altered.
DAY 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
20/01/2010 0 50 0 0 0 1 56 0 1 9 17 43 37 27 8 8 14 12 9 101 57 15 7 1
21/01/2010 0 52 0 1 0 0 56 0 1 6 13 12 12 12 17 10 4 11 6 100 58 18 4 1
22/01/2010 0 51 0 0 1 0 58 0 1 3 11 27 14 18 82 25 17 60 104 100 57 18 4 0
23/01/2010 1 51 2 1 0 0 54 1 0 0 0 0 1 0 0 0 1 0 0 100 57 18 3 1
24/01/2010 0 52 1 3 3 3 67 8 7 1 0 1 0 0 136 53 16 1 0 76 20 18 1 2
25/01/2010 0 53 0 0 0 1 59 1 1 120 45 89 40 4 80 17 16 78 6 90 11 8 3 0
26/01/2010 0 53 12 39 13 0 57 0 1 40 28 63 38 142 4 99 59 5 17 82 16 19 2 1
8 rows selected.
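A per-day/per-hour switch count like the one above can be derived from V$LOG_HISTORY; a sketch (not necessarily what the gen_redo script actually does):

```sql
-- Log switches per day and hour for the last 8 days.
SELECT TO_CHAR(first_time, 'DD/MM/YYYY') AS day,
       TO_CHAR(first_time, 'HH24')       AS hour,
       COUNT(*)                          AS switches
FROM   v$log_history
WHERE  first_time >= TRUNC(SYSDATE) - 8
GROUP  BY TO_CHAR(first_time, 'DD/MM/YYYY'), TO_CHAR(first_time, 'HH24')
ORDER  BY day, hour;
```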
It looks like nothing specific happened; the number of log switches looks innocent.
What is next? Buffer busy waits look like the third event, with less impact, but they might still tell us something.
Let's check the trends.
SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate
Daily trends for "buffer busy waits"...
Day Statistic Name Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN buffer busy waits 127.95
20-JAN buffer busy waits 134,742.54 **********************************************************
21-JAN buffer busy waits 205.92
22-JAN buffer busy waits 553.43
23-JAN buffer busy waits 0.31
24-JAN buffer busy waits 32.49
25-JAN buffer busy waits 1,449.90 *
26-JAN buffer busy waits 1,131.49
Daily/hourly trends for "buffer busy waits"... (truncated)
Day Hour Statistic Name VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------
20-JAN 00:00 buffer busy waits 0.00
01:00 buffer busy waits 0.00
02:00 buffer busy waits 0.00
03:00 buffer busy waits 0.00
04:00 buffer busy waits 0.00
05:00 buffer busy waits 0.00
06:00 buffer busy waits 0.00
07:00 buffer busy waits 0.00
08:00 buffer busy waits 0.00
09:00 buffer busy waits 4.86
10:00 buffer busy waits 6.03
11:00 buffer busy waits 43.35
12:00 buffer busy waits 24.02
13:00 buffer busy waits 20.27
14:00 buffer busy waits 16.45
15:00 buffer busy waits 24.45
16:00 buffer busy waits 27.31
17:00 buffer busy waits 23.86
18:00 buffer busy waits 2.78
19:00 buffer busy waits 0.24
20:00 buffer busy waits 0.00
21:00 buffer busy waits 0.00
22:00 buffer busy waits 134,548.91 ************************************************************
23:00 buffer busy waits 0.00
It looks like our DB normally doesn't spend time on this event, which is good.
We have a nice section in the statspack report for buffer busy waits:
Segments by Buffer Busy Waits DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> End Segment Buffer Busy Waits Threshold: 100
Buffer
Subobject Obj. Busy Pct
Owner Tablespace Object Name Name Type Waits Total
---------- ---------- -------------------- ------------ ----- ------------ -----
APP_USER USERS ULG TABLE 9,560 100.0
APP_USER USERS PK_ULG INDEX 1 .0
-------------------------------------------------------------
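The same per-segment picture is available live from V$SEGMENT_STATISTICS (a sketch; values there are cumulative since instance startup):

```sql
-- Segments with the most buffer busy waits since startup.
SELECT owner, object_name, object_type, value AS buffer_busy_waits
FROM   v$segment_statistics
WHERE  statistic_name = 'buffer busy waits'
  AND  value > 0
ORDER  BY value DESC;
```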
All of the waits happened on the same object. Before going further, I checked the trend with another Tim Gorman script, sp_buffer_busy (again edited for an explicit time interval, and to output only segments with buffer busy waits > 0).
I won't paste the output, but it shows this table appeared only 3 more times, with wait totals below 30, over a 10-day period. If you check the actual statspack report for SQLs,
the ULG table does not show up in any of them, which is again a bit strange.
Maybe I can find something else if I get the half-hour report for
Snap
Snap Id Snap Started Level
8137 20 Jan 2010 22:28 7
8138 20 Jan 2010 22:58 7
From the top 5 timed events:
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read 613,887 1,040 2 42.0
db file scattered read 266,197 853 3 34.5
CPU time 418 16.9
db file parallel write 45,244 86 2 3.5
log file parallel write 4,449 26 6 1.1
-------------------------------------------------------------
Our problem disappeared from the top-5 list, and DB time is 16,656.2 seconds, which is very small compared to the previous value of 101,090.9.
The problem is gone, but what actually happened?
I am pretty sure there wasn't any change that day. The backup ran successfully after 23:00 and the export finished before 9:30, so what can this be?
I know the answer, but if you come up with a good idea, please be my guest and comment. If you are good enough, I promise to buy you a beer.
The answer will appear here as another post on Friday. If I have time, I will also show what happened during the other DB time peak on the same day.
Note:
With this post I did not reinvent the wheel, but tried to show the free tools you can use and modify for your own fast troubleshooting. As Tanel says: right tool, right time. These guys share their tools to help you; my advice is to use them if you need them.
My script library consists of many scripts from Tanel Poder, Riyaj Shamsudeen, Tim Gorman, Kerry Osborne, Dion Cho, Craig Shallahamer and many others, plus my own.
Thank you very much to everyone who shares their tools with us.
How much time did I spend on diagnosis? Everything on this page took 10 minutes total to run; the rest is your experience and skill at searching Metalink and the internet.
For me, getting to this point took only 1.5 hours.
List of scripts I used: