Note: Post is a bit long and might be boring, if you are already good with statspack, but there is a question at the end if you would like to answer
Yesterday, I decided to look for something to troubleshoot without waiting for any user complaints so I can write something useful here 🙂
Primary reason for this practice was to get used to interpret statspack reports to find possible bottlenecks
secondary reason was because I was bored about being underutilized.
I always feel comfortable troubleshooting session wide problems, non-historic performance problems or historic problems with Diagnostics Pack,
but I am not that experienced on statspack.
In my current environment I don’t have Diagnostic Pack option so I have to make good friendship with statspack and
the best way to be a good friend is to spend more time with your new friend as I am going to explain how.
Characteristics of the DB I will work on:
daily load is between 08:00 and 19:00
export at 21:00 running for roughly 1 hour
DB backup start 23:00
There are couple of ways to spot a possible bottlenecks without getting individual reports.
—First one is from Tanel Poder and his famous Perfsheet. This tool is already configured and ready to be used with statspack.
The query he uses to generate the report is the same idea behind the statspack delta views of Tim Gorman. Lets start
first screen is login screen (you need to activate the macros).
Inputs for statspack report is
Data Source- TNS Entry of the DB
Username/Password- username with statspack schema objects read privilege
Which Instance-instance name (lower case or modify the related sql in Queries sheet with upper function )
Which Query-Statspack CPU+Wait+Stats Hourly
From Date-start date
To Date-end date
then press show me. You will get warnings during the graph generation saying PivotTable report will not fit on the sheet … Just Press OK till the warnings end
Your graph might be empty on Office 2007. Just go to the Report Filter and select WAIT from EVENT_TYPES to see any abnormality on wait events.
First Graph is WAIT graph like below. As you see, I was able to spot a possible problem on 20/01/2010 at first try.
Lets make it more visible by adding top 10 filter to Legend Fields Section.
Another option is focus on the date where the abnormality is happened. Lets select only 20th. FYI top 10 filter is still on
Without the top 10 filter. Graph is slightly different but still giving what we need.
Lets Focus on only time interval between 18 and 23
As you see there is an abnormal increase on row cache lock logfile switch and buffer busy
Lets do double-check with system stats. By selecting all hours and available dates and events your graph will be a bit misleading because of very high metrics like below
What can we do ? DB time is something we want to decrease when troubleshooting, so lets focus on DB Time by filtering Legend Fields on DB Time
We have 2 peaks, which is a bit confusing for the initial assumption of problem timing. Lets focus on 20th again
There are two peaks between 13-15 and 21-23. We have an interesting situation here in terms of DB time. From wait event perspective abnormality is between 20-22 but DB time perspective we have 1 more time slot between 13-15 .
At this point I quit Perfsheet. I spotted what I needed for initial problem chase now I can go to other options before, gathering the report. (these options are way more easier than statspack report generation thats why I did not directly go to the statspack)
Second option is from Tim Gorman himself. sp_systime which I find very useful in terms of overall response time analysis. ( i needed to change the script to focus between the dates instead of days before sysdate option)
Here is the ouput
First Part is overall response time for that day.
SQL> @sp_systime2 Please enter the ORACLE_SID value: PROD1 Please enter the report begin date: sysdate-8 Please enter the report end date:sysdate-5 Total Cum Service Seconds % of % of Day or Wait Name Spent Total Total ------------ -------- ----------------------------------- ------------------- ------- ------- 19-JAN Service SQL execution 115,326.24 78.78 78.78 Wait db file sequential read 7,697.33 5.26 84.04 Wait db file scattered read 6,213.01 4.24 88.29 Service Recursive SQL execution 4,638.75 3.17 91.46 Wait Backup: sbtwrite2 2,671.85 1.83 93.28 Wait Backup: sbtbackup 2,081.53 1.42 94.70 Wait log file parallel write 1,423.58 0.97 95.67 Service Parsing SQL 1,109.49 0.76 96.43 Wait db file parallel write 913.09 0.62 97.06 Wait log file sync 761.78 0.52 97.58 Wait Datapump dump file I/O 631.13 0.43 98.01 Wait Log archive I/O 549.08 0.38 98.38 20-JAN Service Recursive SQL execution 255,597.67 65.42 65.42 Wait row cache lock 48,098.17 12.31 77.74 Wait log file switch (checkpoint incompl 41,613.30 10.65 88.39 Wait buffer busy waits 13,474.25 3.45 91.84 Wait db file sequential read 9,788.51 2.51 94.34 Wait db file scattered read 6,831.08 1.75 96.09 Wait Backup: sbtwrite2 3,512.51 0.90 96.99 Wait log file parallel write 1,897.38 0.49 97.48 Wait Datapump dump file I/O 1,674.20 0.43 97.90 Wait db file parallel write 1,389.77 0.36 98.26 Wait log file sync 1,073.14 0.27 98.53 Service Parsing SQL 964.58 0.25 98.78 21-JAN Service SQL execution 33,045.26 44.91 44.91 Service Recursive SQL execution 12,785.72 17.38 62.28 Wait db file sequential read 8,296.99 11.28 73.56 Wait db file scattered read 6,731.96 9.15 82.71 Wait Backup: sbtwrite2 3,340.23 4.54 87.25 Wait log file parallel write 1,856.99 2.52 89.77 Wait db file parallel write 1,359.19 1.85 91.62 Wait log file sync 1,131.31 1.54 93.16 Service Parsing SQL 1,011.46 1.37 94.53 Wait enq: TX - row lock contention 861.24 1.17 95.70 Wait Log archive I/O 686.01 0.93 96.63 Wait Backup: sbtbackup 402.52 0.55 97.18
If you check the top 5 you will easily spot the row cache lock, log file switch and buffer busy waits are different from the other days
Second part of the report is hourly based response time for the days. I truncated the output for 8 hours but the full is here ***
12:00 Service SQL execution 6,268.92 76.66 76.66 Wait db file sequential read 634.26 7.76 84.42 Wait db file scattered read 344.02 4.21 88.63 Service Recursive SQL execution 230.23 2.82 91.44 Wait log file parallel write 147.58 1.80 93.25 Wait log file sync 129.94 1.59 94.84 13:00 Service SQL execution 6,942.33 83.97 83.97 Wait db file sequential read 375.15 4.54 88.51 Wait db file scattered read 207.74 2.51 91.02 Service Recursive SQL execution 200.86 2.43 93.45 Wait enq: TX - row lock contention 162.83 1.97 95.42 Wait log file parallel write 94.08 1.14 96.56 14:00 Service Recursive SQL execution 241,820.91 99.35 99.35 Wait db file scattered read 574.84 0.24 99.59 Wait db file sequential read 515.74 0.21 99.80 Wait log file parallel write 97.09 0.04 99.84 Wait enq: TX - row lock contention 87.19 0.04 99.88 Service Parsing SQL 81.25 0.03 99.91 15:00 Service SQL execution 3,620.32 62.09 62.09 Service Recursive SQL execution 1,210.42 20.76 82.86 Wait db file sequential read 328.60 5.64 88.49 Wait db file scattered read 294.73 5.06 93.55 Service Parsing SQL 73.42 1.26 94.81 Wait enq: TX - row lock contention 72.03 1.24 96.04 16:00 Service SQL execution 3,667.78 70.86 70.86 Wait db file sequential read 429.87 8.30 79.16 Service Recursive SQL execution 371.24 7.17 86.33 Wait db file scattered read 215.28 4.16 90.49 Wait enq: TX - row lock contention 144.67 2.79 93.29 Wait log file parallel write 100.81 1.95 95.24 17:00 Service SQL execution 2,904.54 70.63 70.63 Wait db file sequential read 397.42 9.66 80.29 Service Recursive SQL execution 202.13 4.92 85.21 Wait db file scattered read 138.29 3.36 88.57 Wait log file sync 94.87 2.31 90.88 Wait log file parallel write 90.94 2.21 93.09 18:00 Service SQL execution 1,791.86 65.17 65.17 Wait db file sequential read 302.02 10.98 76.15 Service Recursive SQL execution 176.04 6.40 82.56 Wait db file scattered read 151.83 5.52 88.08 Wait log file parallel write 80.44 2.93 91.00 Wait log file sync 63.46 2.31 93.31 19:00 Service SQL execution 2,261.88 44.86 44.86 Wait db file sequential read 758.41 15.04 59.91 Wait db file scattered read 540.35 10.72 70.62 Wait db file parallel write 272.36 5.40 76.03 Wait log file parallel write 269.47 5.34 81.37 Wait log file sequential read 154.58 3.07 84.44 20:00 Service SQL execution 3,517.30 73.29 73.29 Wait db file sequential read 467.34 9.74 83.03 Wait db file scattered read 251.47 5.24 88.27 Wait log file parallel write 161.71 3.37 91.64 Wait db file parallel write 92.73 1.93 93.57 Wait Log archive I/O 90.24 1.88 95.45 21:00 Service SQL execution 2,875.04 45.72 45.72 Wait Datapump dump file I/O 1,504.28 23.92 69.64 Wait db file sequential read 873.68 13.89 83.53 Wait db file scattered read 417.53 6.64 90.17 Wait log file parallel write 130.15 2.07 92.24 Wait db file parallel write 106.51 1.69 93.93 22:00 Wait row cache lock 48,097.61 41.83 41.83 Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00 Wait buffer busy waits 13,454.89 11.70 89.70 Service Recursive SQL execution 8,958.04 7.79 97.49 Wait db file sequential read 1,485.98 1.29 98.79 Wait db file scattered read 856.65 0.75 99.53 23:00 Wait Backup: sbtwrite2 2,801.73 76.03 76.03 Wait Backup: sbtbackup 352.22 9.56 85.59 Wait db file scattered read 140.16 3.80 89.39 Wait Backup: sbtclose2 93.25 2.53 91.92 Wait db file sequential read 78.52 2.13 94.05 Service SQL execution 65.77 1.78 95.84
What do we see here ? The graph told us there might be a possible problem between 13-15 and 21-23.
First; focus on 13-15
13:00 Service SQL execution 6,942.33 83.97 83.97 Wait db file sequential read 375.15 4.54 88.51 Wait db file scattered read 207.74 2.51 91.02 Service Recursive SQL execution 200.86 2.43 93.45 Wait enq: TX - row lock contention 162.83 1.97 95.42 Wait log file parallel write 94.08 1.14 96.56 14:00 Service Recursive SQL execution 241,820.91 99.35 99.35 Wait db file scattered read 574.84 0.24 99.59 Wait db file sequential read 515.74 0.21 99.80 Wait log file parallel write 97.09 0.04 99.84 Wait enq: TX - row lock contention 87.19 0.04 99.88 Service Parsing SQL 81.25 0.03 99.91
Second ; focus on 21-23
21:00 Service SQL execution 2,875.04 45.72 45.72 Wait Datapump dump file I/O 1,504.28 23.92 69.64 Wait db file sequential read 873.68 13.89 83.53 Wait db file scattered read 417.53 6.64 90.17 Wait log file parallel write 130.15 2.07 92.24 Wait db file parallel write 106.51 1.69 93.93 22:00 Wait row cache lock 48,097.61 41.83 41.83 Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00 Wait buffer busy waits 13,454.89 11.70 89.70 Service Recursive SQL execution 8,958.04 7.79 97.49 Wait db file sequential read 1,485.98 1.29 98.79 Wait db file scattered read 856.65 0.75 99.53
Nothing common but tells lots of things. First one spent %99 of response time on Recursive SQL Execution which means no wait but on CPU doing something which we don’t know yet.
Second one spent %41 of its time on row cache lock and %36 on log file switch. Now we are ready for the statspack to see what can we find but
before that I want to show 2 more scripts whose output is can be obtained by Perfsheet as well (like DB time)
First script is Tim Gormans sp_evtrends scripts which shows trends of events. There are two problems; sometimes unnecessarily high number of * character shown on the screen but comparing it with the seconds waited it can easily be ignored. Another thing is that, if you focus on 1 day sometimes * character graph might be misleading as it is with excel graphs so it is better to focus on 5-10 days
Scripts asks for db_name, begin date, end date and statistics to analyze
SQL> @sp_evtrends Please enter the ORACLE_SID value: PROD1 Please enter the report begin date: sysdate-8 Please enter the report end date:sysdate-6 Some useful database statistics to search upon: Name ------------------------------------------------------------ ARCH random i/o ARCH sequential i/o ARCH wait for archivelog lock .... .... .... Daily trends for "row cache lock"... What statistic do you want to analyze? row cache lock Day Statistic Name Secs Waited Percentage of total over all days ------ ------------------------------ ------------------- ------------------------------------------------------------ 19-JAN row cache lock 0.00 20-JAN row cache lock 480,981.73 ************************************************************ Daily/hourly trends for "row cache lock"... Day Hour Statistic Name Secs Waited Percentage of total over all hours for each day ------ ------ ------------------------------ ------------------- ------------------------------------------------------------ 19-JAN 00:00 row cache lock 0.00 01:00 row cache lock 0.00 02:00 row cache lock 0.00 03:00 row cache lock 0.00 04:00 row cache lock 0.00 05:00 row cache lock 0.00 06:00 row cache lock 0.00 07:00 row cache lock 0.00 08:00 row cache lock 0.00 09:00 row cache lock 0.00 10:00 row cache lock 0.00 11:00 row cache lock 0.00 12:00 row cache lock 0.00 13:00 row cache lock 0.00 14:00 row cache lock 0.00 15:00 row cache lock 0.00 16:00 row cache lock 0.00 17:00 row cache lock 0.00 18:00 row cache lock 0.00 19:00 row cache lock 0.00 20:00 row cache lock 0.00 21:00 row cache lock 0.00 22:00 row cache lock 0.00 ************************************************************ 23:00 row cache lock 0.00 20-JAN 00:00 row cache lock 0.00 01:00 row cache lock 0.00 02:00 row cache lock 0.00 03:00 row cache lock 0.00 04:00 row cache lock 0.00 05:00 row cache lock 0.00 06:00 row cache lock 0.00 07:00 row cache lock 0.00 08:00 row cache lock 0.00 09:00 row cache lock 0.00 10:00 row cache lock 0.00 11:00 row cache lock 0.00 12:00 row cache lock 0.00 13:00 row cache lock 0.00 14:00 row cache lock 0.00 15:00 row cache lock 0.00 16:00 row cache lock 5.65 17:00 row cache lock 0.00 18:00 row cache lock 0.00 19:00 row cache lock 0.00 20:00 row cache lock 0.00 21:00 row cache lock 0.00 22:00 row cache lock 480,976.09 ************************************************************ 23:00 row cache lock 0.00
As you see it shows the same number of * for 19th which is wrong when you check with time spent so abnormality can be spotted by both time spent and number of *.
Second script is modified version for sys_time_model statistics called sp_sys_time_trends.
SQL> @sp_sys_time_trends </a>Please enter the ORACLE_SID value: PROD1 Please enter the report begin date: sysdate-8 Please enter the report end date:sysdate-6 Some useful database statistics to search upon: Name ------------------------------------------------------------ DB CPU DB time Java execution elapsed time PL/SQL compilation elapsed time PL/SQL execution elapsed time RMAN cpu time (backup/restore) background cpu time background elapsed time connection management call elapsed time failed parse (out of shared memory) elapsed time failed parse elapsed time hard parse (bind mismatch) elapsed time hard parse (sharing criteria) elapsed time hard parse elapsed time inbound PL/SQL rpc elapsed time parse time elapsed repeated bind elapsed time sequence load elapsed time sql execute elapsed time What statistic do you want to analyze? DB time Daily trends for "DB time"... Day Statistic Name VALUE Percentage of total over all days ------ ------------------------------ ---------- ------------------------------------------------------------ 19-JAN DB time 1408469.51 ************** 20-JAN DB time 4797565.14 ********************************************** Daily/hourly trends for "DB time"... Day Hour Statistic Name VALUE Percentage of total over all hours for each day ------ ------ ------------------------------ ---------- ------------------------------------------------------------ 19-JAN 00:00 DB time 18267.2492 * 01:00 DB time 57907.3551 ** 02:00 DB time 64889.8266 *** 03:00 DB time 37988.8748 ** 04:00 DB time 39093.3842 ** 05:00 DB time 39267.5194 ** 06:00 DB time 56103.5928 ** 07:00 DB time 43312.1147 ** 08:00 DB time 50542.4821 ** 09:00 DB time 62883.2236 *** 10:00 DB time 89721.9315 **** 11:00 DB time 86621.3671 **** 12:00 DB time 78163.5806 *** 13:00 DB time 64324.9537 *** 14:00 DB time 78970.2241 *** 15:00 DB time 78037.7742 *** 16:00 DB time 78215.1336 *** 17:00 DB time 69875.99 *** 18:00 DB time 61075.9801 *** 19:00 DB time 47658.5181 ** 20:00 DB time 41372.216 ** 21:00 DB time 57569.6918 ** 22:00 DB time 65593.3167 *** 23:00 DB time 41013.2088 ** 20-JAN 00:00 DB time 39789.4329 01:00 DB time 76274.4764 * 02:00 DB time 39564.2615 03:00 DB time 37327.7534 04:00 DB time 38174.4725 05:00 DB time 38335.113 06:00 DB time 55688.7588 * 07:00 DB time 39039.2476 08:00 DB time 45155.1028 * 09:00 DB time 62292.3892 * 10:00 DB time 73688.6751 * 11:00 DB time 86222.6032 * 12:00 DB time 79400.9817 * 13:00 DB time 81021.4711 * 14:00 DB time 2498694.98 ******************************* 15:00 DB time 57824.5047 * 16:00 DB time 50252.1229 * 17:00 DB time 39474.4176 18:00 DB time 25728.465 19:00 DB time 40950.006 * 20:00 DB time 45258.7127 * 21:00 DB time 66186.8379 * 22:00 DB time 1177471.46 *************** 23:00 DB time 3748.88236
As you can spot like the excell graph we have DB time increased for 14-15 and 22-23 period
Now It is time to get the statspack report and dig further.
First I want to focus on the row cache problem if I have space I will talk about the recursive cursor problem maybe as second part.
We normally collect 30 minutes but available snaps are like below.
Snap Snap Id Snap Started Level 8135 20 Jan 2010 21:05 7 8136 20 Jan 2010 21:35 7 8137 20 Jan 2010 22:28 7 8138 20 Jan 2010 22:58 7
If you look carefully you will see that snap at 22:00 is missing. There must be something wrong on DB which prevented statspack
from gathering but lets assume that you could not spot it for now as I could’t which was a deadly mistake caused increase in trouble shooting time
I generated statspack report between 21:35 and 22:28
Here is the statspack summary (full statspack is at here. SQLs are a bit edited for keeping privacy)
STATSPACK report for Database DB Id Instance Inst Num Startup Time Release RAC ~~~~~~~~ ----------- ------------ -------- --------------- ----------- --- 1814528001 PROD1 1 30-Dec-09 01:59 10.2.0.4.0 NO Host Name: MACHINE1 Num CPUs: 8 Phys Memory (MB): 32,763 ~~~~ Snapshot Snap Id Snap Time Sessions Curs/Sess Comment ~~~~~~~~ ---------- ------------------ -------- --------- ------------------- Begin Snap: 8136 20-Jan-10 21:35:59 128 7.2 End Snap: 8137 20-Jan-10 22:28:50 189 5.4 Elapsed: 52.85 (mins) Cache Sizes Begin End ~~~~~~~~~~~ ---------- ---------- Buffer Cache: 4,128M 4,032M Std Block Size: 8K Shared Pool Size: 6,032M 6,128M Log Buffer: 13,913K Load Profile Per Second Per Transaction ~~~~~~~~~~~~ --------------- --------------- Redo size: 126,609.46 124,954.43 Logical reads: 10,300.31 10,165.66 Block changes: 904.39 892.57 Physical reads: 375.25 370.34 Physical writes: 17.54 17.31 User calls: 15.88 15.67 Parses: 9.22 9.09 Hard parses: 1.08 1.06 Sorts: 60.58 59.79 Logons: 0.13 0.13 Executes: 327.42 323.14 Transactions: 1.01 % Blocks changed per Read: 8.78 Recursive Call %: 98.49 Rollback per transaction %: 1.28 Rows per Sort: 28.80 Instance Efficiency Percentages ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %: 99.96 Redo NoWait %: 95.11 Buffer Hit %: 99.73 In-memory Sort %: 100.00 Library Hit %: 99.10 Soft Parse %: 88.32 Execute to Parse %: 97.19 Latch Hit %: 99.91 Parse CPU to Parse Elapsd %: 64.77 % Non-Parse CPU: 99.51 Shared Pool Statistics Begin End ------ ------ Memory Usage %: 74.42 72.87 % SQL with executions>1: 33.97 33.40 % Memory for SQL w/exec>1: 56.11 55.38 Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time ----------------------------------------- ------------ ----------- ------ ------ row cache lock 16,524 48,098 2911 45.6 log file switch (checkpoint incomplete) 68,137 41,595 610 39.5 buffer busy waits 13,462 13,455 999 12.8 CPU time 1,472 1.4 db file sequential read 723,098 446 1 .4 ------------------------------------------------------------- Host CPU CPUs-8 ~~~~~~~~ Load Average Begin End User System Idle WIO WCPU ------- ------- ------- ------- ------- ------- -------- 3.69 2.12 94.19 Instance CPU ~~~~~~~~~~~~ % of total CPU for Instance: 3.30 % of busy CPU for Instance: 56.81 %DB time waiting for CPU - Resource Mgr: Memory Statistics Begin End ~~~~~~~~~~~~~~~~~ ------------ ------------ Host Mem (MB): 32,762.6 32,762.6 SGA use (MB): 10,240.0 10,240.0 PGA use (MB): 776.7 787.4 % Host Mem used for SGA+PGA: 33.6 33.7 ------------------------------------------------------------- Time Model System Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137 -> Ordered by % of DB time desc, Statistic name Statistic Time (s) % of DB time ----------------------------------- -------------------- ------------ sql execute elapsed time 101,089.5 100.0 DB CPU 821.2 .8 PL/SQL execution elapsed time 620.9 .6 parse time elapsed 22.4 .0 hard parse elapsed time 21.2 .0 PL/SQL compilation elapsed time 1.7 .0 connection management call elapsed 0.2 .0 hard parse (sharing criteria) elaps 0.1 .0 repeated bind elapsed time 0.0 .0 hard parse (bind mismatch) elapsed 0.0 .0 failed parse elapsed time 0.0 .0 sequence load elapsed time 0.0 .0 DB time 101,090.9 background elapsed time 6,382.7 background cpu time 16.6 Wait Events DB/Inst: PROD1/prod1 Snaps: 8380-8382 -> s - second, cs - centisecond, ms - millisecond, us - microsecond -> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0 -> Only events with Total Wait Time (s) >= .001 are shown -> ordered by Total Wait Time desc, Waits desc (idle events last) Avg %Time Total Wait wait Waits Event Waits -outs Time (s) (ms) /txn --------------------------------- ------------ ------ ---------- ------ -------- row cache lock 16,524 97 48,098 2911 5.1 log file switch (checkpoint incom 68,137 61 41,595 610 21.2 buffer busy waits 13,462 100 13,455 999 4.2 .. .. .. Wait Event Histogram DB/Inst: PROD1/prod1 Snaps: 8136-8137 -> Total Waits - units: K is 1000, M is 1000000, G is 1000000000 -> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms -> % of Waits - value: .0 indicates value was <.05%, null is truly 0 -> Ordered by Event (idle events last) Total ----------------- % of Waits ------------------ Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s -------------------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- row cache lock 16K 2.9 .0 97.1 log file switch (checkpoin 66K 21.0 2.7 12.8 2.1 .3 61.1 buffer busy waits 13K .0 .0 100.0
As you see row cache locks avg wait time is nearly 3 second with 97% is over 1 second which looks like it is the problem
We already spotted row cache lock but where to look next ?
Documentation says ;
row cache lock The session is trying to get a data dictionary lock. Wait Time: Wait up to 60 seconds. Parameter Description cache id The CACHE# column value in the V$ROWCACHE view mode See "mode" request The pipe timer set by the user
In Statspack report we have a section called “Dictionary Cache Stats”. Lets check what we have there
Dictionary Cache Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137 ->"Pct Misses" should be very low (< 2% in most cases) ->"Final Usage" is the number of cache entries being used in End Snapshot Get Pct Scan Pct Mod Final Cache Requests Miss Reqs Miss Reqs Usage ------------------------- ------------ ------ ------- ----- -------- ---------- dc_awr_control 18 0.0 0 0 1 dc_database_links 107 0.0 0 0 1 dc_files 13 0.0 0 0 13 dc_global_oids 21,571 0.1 0 2 255 dc_histogram_data 73,396 2.2 0 0 3,904 dc_histogram_defs 279,541 1.4 0 0 7,652 dc_object_grants 102 11.8 0 0 124 dc_object_ids 1,050,311 0.1 0 5 2,879 dc_objects 14,052 1.3 0 31 2,605 dc_profiles 74 0.0 0 0 1 dc_rollback_segments 4,147 0.0 0 64 300 dc_segments 50,349 0.2 0 121 2,415 dc_sequences 10 10.0 0 10 13 dc_table_scns 1 100.0 0 0 0 dc_tablespace_quotas 289 0.0 0 0 9 dc_tablespaces 66,671 0.0 0 0 11 dc_usernames 18,888 0.1 0 0 53 dc_users 99,804 0.0 0 0 71 outstanding_alerts 26 15.4 0 4 24 -------------------------------------------------------------
From my old shared pool post I remember that Pct Miss is important about dictionary cache tuning.
I did the comparison with the normal times and the only thing I spot was increase in Pct miss for
dc_object_grants and dc_sequences with less get request but without specific session activity all
I can do is this much for this event. so I moved to logfile switch event to see if I can find something
Definition **** log file switch (checkpoint incomplete) The log file switch wait indicates that the process is waiting for the log file switch to complete, but the log file switch is not possible because the checkpoint process for that log file has not completed. You may see this event when the redo log files are sized too small. Wait Parameters Wait parameters are not used for log file switch (checkpoint incomplete). Wait Time One second
This is an event we see occasionally but not with the redo size we generated for that time interval. Lets see how many log switches has been done for that day
SQL> @gen_redo DAY COUNT# MIN# MAX# DAILY_AVG_MB ---------- ---------- ---------- ---------- ------------ 2010-01-26 790 60361 61150 50560 2010-01-25 722 59639 60360 46208 2010-01-24 469 59170 59638 30016 2010-01-23 291 58879 59169 18624 2010-01-22 651 58228 58878 41664 2010-01-21 394 57834 58227 25216 2010-01-20 473 57361 57833 30272 10 rows selected. Session altered. DAY 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- 20/01/2010 0 50 0 0 0 1 56 0 1 9 17 43 37 27 8 8 14 12 9 101 57 15 7 1 21/01/2010 0 52 0 1 0 0 56 0 1 6 13 12 12 12 17 10 4 11 6 100 58 18 4 1 22/01/2010 0 51 0 0 1 0 58 0 1 3 11 27 14 18 82 25 17 60 104 100 57 18 4 0 23/01/2010 1 51 2 1 0 0 54 1 0 0 0 0 1 0 0 0 1 0 0 100 57 18 3 1 24/01/2010 0 52 1 3 3 3 67 8 7 1 0 1 0 0 136 53 16 1 0 76 20 18 1 2 25/01/2010 0 53 0 0 0 1 59 1 1 120 45 89 40 4 80 17 16 78 6 90 11 8 3 0 26/01/2010 0 53 12 39 13 0 57 0 1 40 28 63 38 142 4 99 59 5 17 82 16 19 2 1 8 rows selected.
looks like nothing specific happened number of log switches looks innocent.
What is next buffer busy looks like the third one with little effect but still can give something ?
Lets check the trends
SQL> @sp_evtrends Please enter the ORACLE_SID value: PROD1 Please enter the report begin date: sysdate-8 Please enter the report end date:sysdate Daily trends for "buffer busy waits"... Day Statistic Name Secs Waited Percentage of total over all days ------ ------------------------------ ------------------- ------------------------------------------------------------ 19-JAN buffer busy waits 127.95 20-JAN buffer busy waits 134,742.54 ********************************************************** 21-JAN buffer busy waits 205.92 22-JAN buffer busy waits 553.43 23-JAN buffer busy waits 0.31 24-JAN buffer busy waits 32.49 25-JAN buffer busy waits 1,449.90 * 26-JAN buffer busy waits 1,131.49 Daily/hourly trends for "DB time"... (truncated) Day Hour Statistic Name VALUE Percentage of total over all hours for each day ------ ------ ------------------------------ ---------- ------------------------------------------------------------ 20-JAN 00:00 buffer busy waits 0.00 01:00 buffer busy waits 0.00 02:00 buffer busy waits 0.00 03:00 buffer busy waits 0.00 04:00 buffer busy waits 0.00 05:00 buffer busy waits 0.00 06:00 buffer busy waits 0.00 07:00 buffer busy waits 0.00 08:00 buffer busy waits 0.00 09:00 buffer busy waits 4.86 10:00 buffer busy waits 6.03 11:00 buffer busy waits 43.35 12:00 buffer busy waits 24.02 13:00 buffer busy waits 20.27 14:00 buffer busy waits 16.45 15:00 buffer busy waits 24.45 16:00 buffer busy waits 27.31 17:00 buffer busy waits 23.86 18:00 buffer busy waits 2.78 19:00 buffer busy waits 0.24 20:00 buffer busy waits 0.00 21:00 buffer busy waits 0.00 22:00 buffer busy waits 134,548.91 ************************************************************ 23:00 buffer busy waits 0.00
Looks like normally our DB doesn’t spend time on this event which is good.
We have a nice section in statspack report for buffer busy wait events
Segments by Buffer Busy Waits DB/Inst: PROD1/prod1 Snaps: 8136-8137 -> End Segment Buffer Busy Waits Threshold: 100 Buffer Subobject Obj. Busy Pct Owner Tablespace Object Name Name Type Waits Total ---------- ---------- -------------------- ------------ ----- ------------ ----- APP_USER USERS ULG TABLE 9,560 100.0 APP_USER USERS PK_ULG INDEX 1 .0 -------------------------------------------------------------
All of the waits happened on the same object. Before checking I checked trend with another Tim Gorman script sp_buffer_busy (again edited for time interval and output only segments with buffer_busy waits >0)
I won’t put the output but output says this table showed up only 3 more times with wait total less then 30, in 10 days period. If you check the actual statspack for sqls
ULG table is not shown on any of them which is a bit strange again.
Maybe I can find something else, if I get half an hour period report for
Snap Snap Id Snap Started Level 8137 20 Jan 2010 22:28 7 8138 20 Jan 2010 22:58 7 From top 5 wait events Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time ----------------------------------------- ------------ ----------- ------ ------ db file sequential read 613,887 1,040 2 42.0 db file scattered read 266,197 853 3 34.5 CPU time 418 16.9 db file parallel write 45,244 86 2 3.5 log file parallel write 4,449 26 6 1.1 -------------------------------------------------------------
Our problem disappeared from top 5 list and DB time=16,656.2 which is very small comparing to previous value 101,090.9.
problem is gone but what has actually happened ?
I am pretty sure that there wasn’t any change on that day. backup ran successfully after 23:00 and export finished before 9:30 so what can this be ?
I know the answer but whoever comes up with a good idea please be my guest to comment. If you are good enough, I promise to buy beer.
Answer will be here as an another post on Friday. If I have time I will also show what happened for the other DB time peak on same day.
Note:
With this post I did not re-invent the wheel but try to show you the available free tools you can use and modify for your own fast troubleshooting. As Tanel says right tool right time. These guys share these tools to help you , my advice is use it if you need the.
My script library consist of many scripts from Tanel Poder, Riyaj Shamsudeen ,Tim Gorman, Kerry Osborne, Dion Cho, Craig Shallahamer and from many others plus my own.
Thanks you very much to all guys who share their tools with us.
How much time I spent for diagnosis? Everything on this page took 10 minutes total to run. Rest is your experience and skills to search on metalink and internet.
for me coming to last point took only 1.5 hour.
List of scripts I used;
Hi!
Cool. What amaze me all the time is your approach and wide knowledge. You looks to me like “Google for Oracle”-only the finest collections.
What to say then your blog should be part of any Oracle seminar additional literature!
Cheers,
Damir Vadas
http://damir-vadas.blogspot.com
Comment by Damir Vadas — January 28, 2010 @ 12:09 am
Thank you for your kind words Damir. I am not an Oracle scientist and I don’t really going after internals, but at least i know what to use and how to use research on internals from other people.
Comment by coskan — January 28, 2010 @ 10:16 am
[…] } In part 1a I was trying to find a problem to troubleshoot and I spotted 2 possible problems and drilled down […]
Pingback by Working with statspack-part-1b-Solution/Answer/another Question « Coskan’s Approach to Oracle — January 29, 2010 @ 7:07 pm
Hello, I follow your blog has some time and I believe that it contributes greatly to professional with performance issue in Oracle.
I’ve been developing for some time a tool named Mandela to diagnose performance problems. It need not be installed or create any object in the database analyzed. It only needs read access to the tables of STATSPACK or AWR.
The site is still under development, but if you want can download the Mandela and follow the launch of the latest versions.
What makes Mandela nice? Well, he can show into timeline all statistical system, redo log files switches, statistics I/O (tablespace, and datafile filesystem).
But what it does well done is to decompose the wait events (two clicks, three at most) and allow drilldown for the top sql by metrics (buffer gets, diskreads, cpu time, elapsed time, etc.) and now for the wait event. That’s right … from a chart of wait events you right click mouse and see the statements SQL that waited for “db file sequential read” or “CPU”, for example.
If you’re interested (and available) to see how Mandela can help you gain productivity, access http://www.trevis.com.br/projetos/mandela/download
If you want to chat or ask questions about the Mandela my contacts is as follows:
[] s,
Rafael Trevisan
http://www.trevis.com.br
EMail / MSN / GTalk: rafael at trevis.com.br
Comment by Rafael Trevisan — February 2, 2010 @ 1:15 am
Hi Rafael,
Thanks for stopping by.
I see you can also speak English so it can be very nice if you can create en English version of your web page. I am downloading the product right now. When i have time to use it I will definitely give feedback and discuss with you about the product.
Thank you again for contributing and your efforts
Comment by coskan — February 2, 2010 @ 1:27 am
Hello, the site is still under development (including the English version), and the tool is “outing of the oven”.
🙂
Comment by Rafael Trevisan — February 2, 2010 @ 3:17 am
I work as a Performance Analyst adressing solutions to performance problems or doing Top-down analysis searching for pro-active recommendations.
I spent like 80% of my working-day at Mandela. It made me able to perform 2 complex or 4 simple analysis per day.
I don’t spend more time at sqlplus and at excel generating endless graphs just to see what’s the problem with the DB. Now I can just click 3 or 4 times to see how my Database performed based on the AWR or Statspack repository.
I recommend for juniors DBAs who doesn’t know the performance metrics and to seniors DBAs who are tired of spend so much time at excel.
The best part is that it can even build your powerpoint presentation from your analysis.
Just try it and you will definitely agree with me.
Comment by Vinicius Hoffmann — February 2, 2010 @ 1:38 am
This is very interesting that after all the books and all my researches on the internet this is the first time I have heard about this tool. I think Rafael needs to work on marketing bit of the tool a bit more (no results on the web http://tiny.cc/6lhSW ). I think it is known in Brazil but not worldwide. If it does what it shows on the screenshots and what you say it is very promising.
The only problem for me is that I am using linux at home so I can try it at work tomorrow. I tried to use wine but no chance 😦 As I said I will try and if I like it as much as you I will definitely share my experience. thank you very much for making me aware of this tool.
Comment by coskan — February 2, 2010 @ 1:47 am
the tool is pretty new, but it’s awesome.
It’s user interface is fully in English so you can try it with no worries
The setup is simple and you don’t need a tnsnames working as the software uses EZ direct connect (Ex: user/password@hostname:port)
If you have any doubts about its usability try the
Forum or Rafael e-mail or my email (hoffmann1337@hotmail.com)
Comment by Vinicius Hoffmann — February 2, 2010 @ 3:07 am
Terrific tool. It does make the performance analisys faster and easier. The appearance is way self-intuitive, so it work as a guide througth the performance tracking.
I strongly recommend this tool for everybody who works with performance troubleshooting.
regards
Comment by Cristiano Reame — February 2, 2010 @ 12:23 pm
[…] https://coskan.wordpress.com/2010/01/27/working-with-statspack-part-1a-diagnosis/ […]
Pingback by Oracle Peformance Visualization… | Tanel Poder's blog: Core IT for Geeks and Pros — February 3, 2010 @ 9:47 am
Can anyone, maybe Rafael, explain what the prerequisites are for the tool, I get a “NET access violation 009B8701” when trying to connect to a database.
Thanks,
Marcus
Comment by Marcus Mönnig — February 3, 2010 @ 1:44 pm
Hello Marcus, I’m sorry for that information is not yet on the site, soon
site are much more full of interesting information.
In fact there is no special requirements for the tool. If
you can connect to Oracle via SQLPlus or other tool such
SQLDeveloper, Toad, SQLNavigator should also be able to Mandela.
Can you tell me which OS and database (including version) is
trying to connect?
If you prefer, add me on MSN or GTalk I try to help you more quickly.
rafael@trevis.com.br (both)
[]s,
Rafael Trevisan.
Comment by Rafael Trevisan — February 3, 2010 @ 1:50 pm
[…] Gundogar was also in a deductive frame of mind. Here is his Working with statspack-part-1a-Diagnosis, featuring both a challenge and purty pictures […]
Pingback by Log Buffer #176: a Carnival of the Vanities for DBAs | The Pythian Blog — February 6, 2010 @ 6:56 am
performance tuning…
[…]Working with statspack-part-1a-Diagnosis « Coskan’s Approach to Oracle[…]…
Trackback by performance tuning — December 10, 2011 @ 12:30 pm
I have noticed you don’t monetize your blog, don’t waste your traffic,
you can earn extra cash every month because you’ve got hi quality content.
If you want to know how to make extra money, search for: best adsense alternative Wrastain’s
tools
Comment by 86Rodney — August 19, 2017 @ 3:10 pm
Richard carpenter net worth
Working with statspack-part-1a-Diagnosis | Coskan’s Approach to Oracle
Trackback by commercial cleaning — September 28, 2018 @ 12:24 pm