Coskan’s Approach to Oracle

March 10, 2010

Blogroll Report 05/02/2009 – 12/02/2010

Filed under: Blogroll Report — coskan @ 2:27 am

<—- Blogroll Report 29/01/2010 – 05/02/2010

1-How to simulate Oracle errors with simulator events

Dennis Yurichev-Oracle RDBMS internal self-testing features

2-How to decode dates in histograms?

Martin Widlake-Format of Date Histograms

3-How to decide the value of datafile resize option ?

Jonathan Lewis-Shrink Tablespace

4-All about network wait events

Kyle Hailey-Oracle: Network Waits

5-When does manual logfile switch trigger complete checkpoint ?

Lua Donghua-ALTER SYSTEM SWITCH LOGFILE and CHECKPOINT

6-When does “alter system archive log current” trigger checkpoint?

Lua Donghua-ALTER SYSTEM ARCHIVE LOG CURRENT and CHECKPOINT

7-How to secure java in Oracle and how to test public revocation?

Paul Wright-Securing Java In Oracle and DBMS_JVM_EXP_PERMS

8-Difference between case and decode in terms of trimming

Timo Raitalaakso-Equality -comparing text

9-How to upgrade ASM 10.2 to 11.2 single instance

Martin Bach-Upgrade ASM 10.2 to 11.2 single instance

10-How to use resumable operations in Oracle

Stewart Bryson-Data Warehouse Fault Tolerance Part 1: Resuming

11-How to restart your ETL process with restore points ?

Stewart Bryson-Data Warehouse Fault Tolerance Part 2: Restarting

12-How to use incremental backup for no logging operations for recovery in DW environments?

Stewart Bryson-Data Warehouse Fault Tolerance Part 2: Restoring

13-Beginners guide to index block dumps and treedumps Part-1

Richard Foote-Index Block Dumps and Index Tree Dumps Part I: (Knock On Wood)

14-NVL and COALESCE performance comparison?

Scott Wesley-Short-circuit Evaluations – moving away from NVL

Scott Wesley-One more COALESCE vs NVL example to finish the week

15-How to drill down and detail sys_time_model data vs2?

Charles Hooper-Working with Oracle’s Time Model Data 3

16-How to put gather_plan_statistics hint without changing the code by sql profiles?

Kerry Osborne-gather_plan_statistics

17-How to check/extract unsuccessful connection attempts in listener_log ?

Martin Bach-Check for non-successful connection attempts in listener.log

18-Why does Oracle use undo when there is already redo logs? (Must read)

Jonathan Lewis-Why Undo?

19-How to configure ASYNC-IO on HP/UX?

John Hallas-OER 27064: cannot perform async I/O to file – HPUX

20-How to convert cross platform single instance to RAC with RCONFIG?

Syed Jaffar Hussain-My experience of converting a cross platform (AIX to HP) single instance database to RAC using RCONFIG tool

21-How to backup Oracle to cloud storage ?

Alisher Yuldashev -Let Your Oracle Backups Be Up in the Clouds

22-Fundemantals of Performance Tuning (Must read)

Cary Milsap-Thinking Clearly About Performance

23-Solutions to Permission problems with fuse and DBFS

Kevin Closson-Oracle Database 11g Database
File System (DBFS). Common DBFS Problems and Solutions. Part II.

24-How to send execution plan to excell and show execution plans on web

Charles Hooper-Automated DBMS_XPLAN, Trace, and Send to Excel

25-Weird Transitive Closure behaviour in execution plan which causes ignorance of hints

Jonathan Lewis-Ignoring Hints-2

26-How to do initial server pool configuration before database creation in 11GR2?

Martin Bach-Server Pool experiments in RAC 11.2

27-Problems with usage of Append_Values and solutions

Gary Myers-Append Values and how not to break the database

28-How does Subquery factoring work?

Tim Hartley-Getting Recursive

Blogroll Report 12/02/2010 – 19/02/2010–>

March 5, 2010

Blogroll Report 29/01/2009 – 05/02/2010

Filed under: Blogroll Report — coskan @ 9:59 am

<—- Blogroll Report 22/01/2010 – 29/01/2010


1-Oracle 0 day exploit
Miladin Modrakovic-Oracle Exploit Published 11g R2
2-Faulty quotes about Non Default Block Sizes
Charles Hooper-Faulty Quotes 5 – Block Sizes
3-Common mistake done after adding and index to see how it performs
Hemant K Chitale-Some Common Errors – 1 – using COUNT(*)
4-RAC One Node and connection failover handling
Marcin Przepiorowski-Oracle RAC One Node – part 3
5-How to set PCTFREE ?
Anurag Tibrewal-PCTFREE
6-How to use average active sessions and available CPU for tuning ?
Kyle Hailey-Tuning the Application, Database and Hardware
7-How to get workload character of your Database from AWR data?
Karl Arao-Workload characterization using DBA_HIST tables and kSar
8-Flash Cache and Flash drive performance comparisons
Guy Harrison-Flash tablespace vs. DB Flash Cache
9-How to use NMAP for network diagnosing
Frits Hoogland-Must have network diag tool NMAP
10-How to discover device names for LUNs when LUN number is known?
Saurabh Sood-How To Discover Disk Name When LUN Number is Known (OEL5)
11-Parallel queries and sort performance behaviour
Kellyn Pedersen-When Parallel is Wrong
12-How to use CATALOG command to catalog missing tape backups?
Marcin Przepiorowski-RMAN and SBT_TAPE backup pieces – how to catalog it
13-How to audit sys into an OS file owned by root?
Uwe Hesse-How to audit sys into an OS file owned by root
14-How to use dbms_parallel_execute package in 11GR2?
Tim Hall- DBMS_PARALLEL_EXECUTE
15-How to share read-only tablespaces between databases (comments***)?
Uwe Hesse-Sharing READ ONLY Tablespaces between Databases
16-Documentation bugs during pre-requisite checks on Solaris
Luo Donghua-Documentation Bug for Oracle database installation on Solaris
17-How to report if there was an unrecoverable operation happened?
Catherine Chow-Oracle: What is an Unrecoverable Data File?
18-How to use latch classes in case you really want to change the spin count and cause minimal effect ?
Craig Shallahamer – Gotta low latch class
19-All about gathering execution plans
Charles Hooper-Execution Plans – What is the Plan, and Where Do I Find It?
Charles Hooper-Execution Plans – What is the Plan, and Where Do I Find It 2?
Charles Hooper-Execution Plans – What is the Plan, and Where Do I Find It 3?
20-Faulty quotes about %100 CPU utilization ? (comments)
Charles Hooper-Faulty Quotes 6 – CPU Utilization
21-SQL Tuning Advisor generated SQL Profiles and manual sql profile (comments)
Kerry Osborne-Single Hint SQL Profiles
Kerry Osborne-SQL Tuning Advisor Profiles
22-How to do pivoting pre-11G?
Marc Billette-Easy Pivot Query Result in pre-11g Oracle

Blogroll Report 05/02/2010 – 12/02/2010 –>

February 21, 2010

Blogroll Report 22/01/2009 – 29/01/2010

Filed under: Blogroll Report — coskan @ 3:24 am

<—- Blogroll Report 15/01/2010 – 22/01/2010

1-How is upcoming code ignored after select into returns no rows?
Timo Raitalaakso-Select Into

2-How does format command have an impact on SAN?
hpdba-Don’t Blame the Database, part three

3-How to sort “Transport endpoint is not connected” error with DBFS?
Kevin Closson-Oracle Database 11g Database File System (DBFS). Common DBFS Problems and Solutions. Part I.

4-Internals of goldengate extract process when ASM is used
Alex Fatkulin-Oracle GoldenGate Extract Internals, Part II

5-How to sort ‘PLAN_TABLE’ is old version problem
Jonathan Lewis-old plan table

6-How to configure DNS for SCAN in 11GR2 for personal tests ? (easier zone configuration with H2N )
Harald van Bredeerode-How to setup a private DNS for your virtual cluster

7-How to use column aliases with SQL queries? (comments)
Jonathan Lewis-Aliases

8-PGA aggregate target and swap relationship
Krishna Manoharan-ora-4030 errors and swap sizing on DataWarehousing platforms

9-How to customise getid for PL/SQL?
Mike Smithers-Customising gedit for PL/SQL on Ubuntu 9.10

10-PSU dependancy checking can cause problems in 11G
John Hallas-PSU dependancy checking with ASM now enforced in 11G

11-How to syncronize out of sync tables with logical standby.
Anand Prakash-Logical Standby Out of SYNC

12-There is 1 working Oracle TNS Listener among nearly 69,000 random IP addresses.
Dennis Yurichev-Random Oracle hosts statistics

13-How to export resultset to CSV with function?
Damir Vadas-SQL based export data to file

14-How to read 10046 raw trace file?
Charles Hooper-10046 Extended SQL Trace Interpretation 2

15-How to move OCR and Moving disk into ASM in 11GR2?
Martin Bach-How to move OCR and voting disks into ASM

16-Direct path reads with non parallel queries in 11G and 10949 event relationship
Craig Shallahamer-The mystery surrounding 11g and direct reads

17-How to chance priority and time quanta of oracle processes?
Krishna Manoharan-Understanding CPU time as oracle wait event

18-Notes on various Oracle Parameters
Charles Hooper-Notes about various Oracle Parameters

19-How does CBO behave with different PX Distribution Methods ?
Christian Antognini-Does the Query Optimizer Cost PX Distribution Methods?

20-How to set up your own free ASH ?
Kyle Hailey-ASH simulation Scripts (SASH and SASHMON)

21-Redo related wait events
Kyle Hailey-Oracle Redo log waits

22-How to use windowing aggregate functions for avoiding self join?
Luo Donghua-Performance Tuning using Windowing Aggregate Functions to avoid table self-join

23-Information about policy managed database in 11GR2
Martin Bach-Policy managed databases

24-ASSM oddities with insert after uncommitted delete
Charles Hooper-Database using ASSM Tablespace Exhibits Slow Insert Performance After an Uncommitted Delete

25-Tips and tricks about some Oracle topics
Charles Hooper- Neat Tricks

26-Different CBO behaviours for functions in SQLs
Randolf Geist-When your projection is not cost-free

Blogroll Report 29/01/2010 – 05/02/2010 –>

February 12, 2010

Blogroll Report 15/01/2009 – 22/01/2010

Filed under: Blogroll Report — coskan @ 6:59 pm

<—- Blogroll Report 08/01/2010 – 15/01/2010

1-How to install grid control 10.2.0.5 on OEL?
Tim Hall-Grid Control (10.2.0.5) on OEL 4.8 x86_64…

2-How to get around from “Cannot open your terminal ‘/dev/pts/” with screen command ?
Martin Bach-screen fails with “Cannot open your terminal ‘/dev/pts/x’ – please check.”

3-How to use case instead of if ?
Alex Nuijten-The Case for the Case Statement

4-How to sort “clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk” on RAC?
Murali Vallath-OCR Repair

5-How to solve Data Pump “ORA-39021: Database compatibility version XXXX is not supported.” problem?
Mike Rothouse-Oracle 11g Data Pump

6-StaticConnectIdentifier for dataguard environments in 11G
Asanga Pradeep-VIP, Dataguard Broker & RAC-

7-How does session PGA usage behaves when session gives error?
Charles Hooper-PGA Memory – The Developer’s Secret Weapon for Stealing All of the Memory in the Server

8-How to get around from ORA-14074 with exchange partition ?
Marko Sutic-ORA-14074 when adding partition at the beginning of a table

9-How to use bind variables and arrays with Python for best Oracle performance ?
Guy Harrison-Best practices with Python and Oracle

10-How does “coalesce_sq” hint works in 11GR2?
Timur Akhmedeev-Coalesce_SQ

11-How does in memory parallel execution works in 11GR2?
Mark Rittman-In-Memory Parallel Execution in Oracle Database 11gR2

12-Fundamentals of table compression for data warehousing ?
Greg Rahn-The Core Performance Fundamentals Of Oracle Data Warehousing – Table Compression

13-How to visualize live Oracle Performance data in scrolling graphs with excel?
Charles Hooper- Excel – Scrolling Oracle Performance Charts

14-How to gather current temp usage of active sessions and workaround for v$sort_usage sqlid column?
Kerry Osborne-Temp Usage

15-Syslogd prevents Oracle HAS fail to start
Surachart Opun-Oracle HAS startup failed, After stop syslogd

16-How to improve performance with pipelined functions?
Adrian Billington-improving performance with pipelined table functions

17-Internals of goldengate extract process
Alext Fatkulin-Oracle GoldenGate Extract Internals, Part I

18-Clusterware Upgrade Story from 10.2 to 11.2
Martin Bach-Upgrade Clusterware 10.2 to 11.2

19-How to use logon triggers for preventing access of certain programs ?
Damir Vadas-Trace Analyzer “Directory alias does not exist in DBA_DIRECTORIES”

20-Controlfile sequential reads caused by subquery factoring
Timur Akhadeev -Control File Sequential Read

21-Why does Full table scans do not read blocks up to high water mark sometimes?
Charles Hooper-How Many Topics can be Packed into a Short OTN Thread?

22-How to set up Data Guard with Grid Control for 11GR2 DB?
Tim Hall- Oracle Database 11g Release 2 Data Guard Setup using Oracle Grid Control

23-How to sort -ORA-01102: cannot mount database in EXCLUSIVE mode?
Muhammed Arju-ORA-01102: cannot mount database in EXCLUSIVE mode

Blogroll Report 22/01/2010 – 29/01/2010 –>

February 2, 2010

Blogroll Report 08/01/2009 – 15/01/2010

Filed under: Blogroll Report — coskan @ 6:57 pm

<—- Blogroll Report 01/01/2010 – 08/01/2010

1-How to use v$sesstat when wait interface is not enough?
Tanel Poder-Beyond Oracle Wait Interface – Part 2

2-How to get Oracle process info from OS on Windows?
Charles Hooper-Drilling into Session Detail from the Operating System – On the Windows Platform

Charles Hooper-Drilling into Session Detail from the Operating System – On the Windows Platform 2

3-How does force_match work with SQL_PROFILEs?
Jonathan Lewis-force_match

4-How to report bar chart of frequency histograms in sqlplus ?
Guy Harrison-Histograms of histograms

5-How to get around from _pga_max_size limitation?
Kelly Pedersen-When PGA Size is Not Enough

6-How to drill down and detail sys_time_model data ? –Highly Recommended
Charles Hooper- Working with Oracle’s Time Model Data

Charles Hooper-Working with Oracle’s Time Model Data 2

7-Diagnosing CBC Latches and their causes and possible solutions
Kyle Hailey-Oracle Waits – latch : cache buffer chains

8-Oracle and Netapp best practices
Neil Gerren-Oracle on NetApp White Papers

9-How to setup dataguard with OEM Grid Control?
Porus Homi Havewala-Easy Disaster Proof Production

10-How does execution plan gathering can be wrong?
Charles Hooper-Explain Plan Lies, Autotrace Lies, TKPROF Lies, What is the Plan?

11-How to drop partitions from interval partitioned table?
Harald van Breederode-dropping inverval partitions

12-How does Join Elimination works in 11G for unnecessary self joins?
Christian Antognini-Join Elimination

13-How to recover from “ORA-00752: recovery detected a lost write of a data block” on Standby in 11G?
Mladin Modrakovic-ORA-00752 on standby and DB_ULTRA_SAFE parameter

14-How to troubleshoot buffer busy waits?
Kyle Hailey-Oracle Wait: Buffer Busy Wait

15-CPU cost and execution time relationship
Richard Foote-The CPU Costing Model: A Few Thoughts Part V (Reality)

16-Global Index behaviour with DDL on empty partitions?
Hemant K Chitale-DDL on Empty Partitions — are Global Indexes made UNUSABLE ?

17-How does RELY clause works with foreign key constraints?
Timo Raitalaakso-Rely constraint validated mess

18-How to hide sql query from spool file ?
Martin Bach-hide SQL query in spool file

19-How to use DBBS API’s ?
Tim Hall-Oracle Database File System (DBFS) PL/SQL APIs

20-Introduction to Oracle Analytic Functions
Adrian Billington-introduction to 8i analytic functions

21-How does not null constraint effect updates with subqueries?
Charles Hooper-The Effects of Potential NULL Values in Row Sources During Updates Using an IN Subquery

22-How to rename cloned ASM diskgroups in 10GR2 – 11GR2 renamedg alternative?
Martin Bach-Rename cloned diskgroup in Oracle 10.2

23-Possible reason begin poor performing first_rows_n ?
Jonathan Lewis- first_rows_N again

24-Side effects of setting db_file_multiblock_read_count on 10GR2 and above
Kerry Osborne-Autotuned DB_FILE_MULTIBLOCK_READ_COUNT

25-How to do what if analysis for index clustering factor?
Randolf Geist-CLUSTERING_FACTOR What-If Analysis

26-11GR2 edition based redefinition detailed analysis
Christian Antognini-Edition Based Redefinition

Blogroll Report 15/01/2010 – 22/01/2010 –>

January 29, 2010

Working with statspack-part-1b-Solution/Answer/another Question

Filed under: Diary, Performance — coskan @ 6:58 pm

<—-Part 1a

In part 1a I was trying to find a problem to troubleshoot and I spotted 2 possible problems and drilled down into one of them and stuck. Below is the list of what I did till I stuck and started to spend time on doing more research than it actually needs because I forgot the most important and easy step which I will explain in a moment.

What I did so far,

  • found the possible bottleneck on graph
  • cross-checked the bottleneck with DB time
  • checked the statspack and found “row cache lock” “log file switch (checkpoint incomplete)”, “buffer busy” waits as queue time
  • checked if Dictionary Cache statistics gives clue – No luck -at least for me
  • checked if DB did more redo than redo size can handle- redo activity was way lower than actual office hours
  • checked the segments for buffer busy waits and found only 1 segment caused all the wait but buffer busy waits and row cache lock are not related are they ?

What else I have in my hand ? I have a skipped snapshot slot. One of the statspack snaps could not be gathered.

What should I have done, before going and reading and learning more (which wasn’t a waste of time for learning but will be a waste of time in pressure situation) about those top events ?

Answer was I should have checked the alert log especially when my information is minimum with this historic problems. I was on call that week and I “assumed” that nothing was wrong with DB because I did not get called. Assumptions without proofs can be big problem and big time waster for your troubleshooting practices. After 4 hours of searching on the net for causes of row cache lock (which is not a common event), I decided to check alert log because of that skipped snapshot. What I found ?

Wed Jan 20 21:47:09 2010
Errors in file j:\data\oracle\trace\prod01\bdump\prod01_arc1_7976.trc:
ora-19816: Message 19816 not found; No message file for product=RDBMS, facility=ORA; arguments: [db_recovery_file_dest]
ora-27044: Message 27044 not found; No message file for product=RDBMS, facility=ORA
OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 112) There is not enough space on the disk.
....
....
....
....
Wed Jan 20 22:28:48 2010
kcrrdmx: Successful archiving of previously failed ORL
Archiver process freed from errors. No longer stopped
Wed Jan 20 22:28:49 2010

As you see, I found that the database was hanged because the disk was full . Why was it full ? Looks like a mistake maybe not. It might be because we do exports on the same disk with db_file_recovery_area and probably One of my team mates took extra export during the day and forgot to delete it. With the redo amount generated that day it is unlikely that we can fill the logical limit and we did not get any %85 warning on that day on the alert log. Why did not I get called, because the issue lasted 45 minutes and probably technical guys did not check Big Brother monitoring that time:) I have no clue how it solved

I asked myself what is it to do with “row cache lock” why not “log file switch (archiving needed)”

Definition of the expected wait event fits with the alert log so it should be that event or should it ?

log file switch (archiving needed)

Waiting for a log switch because the log that the LGWR will be switching into has not been archived yet. Check the alert file to make sure that archiving has not stopped due to a failed archive write. To speed archiving, consider adding more archive processes or putting the archive files on striped disks.

Wait Time: 1 second

Parameters: None

Let’s check how it was;

SQL> @sp_evtrends
Please enter the ORACLE_SID value: prod01
Please enter the report begin date: sysdate-11
Please enter the report end date:sysdate-8

Some useful database statistics to search upon:
------------------------------------------------------------
        ARCH random i/o
        ARCH sequential i/o
	....
	.....
	.....
What statistic do you want to analyze? log file switch (archiving needed)

...
...
...
Daily trends for "log file switch (archiving needed)"...

Day    Hour   Statistic Name                         Secs Waited Percentage of total over all hours for each day
------ ------ ------------------------------ ------------------- ------------------------------------------------
17:00  log file switch (archiving needed)                0.00
18:00  log file switch (archiving needed)                0.00
19:00  log file switch (archiving needed)                0.00
20:00  log file switch (archiving needed)                0.00
21:00  log file switch (archiving needed)                0.00
22:00  log file switch (archiving needed)                0.00
23:00  log file switch (archiving needed)                0.00

How come ? Alert log says I was getting “log file switch (archiving needed)” but why my system event doesn’t show it? Lets check cumulative wait on this event since server started (12/30/2009)

SQL> select * from v$system_event where event like 'log file switch (archiving needed)';

EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO   EVENT_ID WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS
---------------------------------------------------------------- ----------- -------------- ----------- ------------ ----------------- ---------- ------------- ----------- ---------------
log file switch (archiving needed)                                    132442          80465     8062918        60.88        8.0629E+10  681532028    3290255840           2 Configuration

It says my DB waited on this event before but when ?

SQL> @sp_event_delta2 "log file switch (archiving needed)"

   SNAP_ID SNAP_TIME         EVENT                                                            TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED_MICRO
---------- ----------------- ---------------------------------------------------------------- ----------- -------------- -----------------
      7517 20100108 06:07:52 log file switch (archiving needed)                                        27              5           7022081
      7766 20100113 06:18:58 log file switch (archiving needed)                                         4              0            194101
      7831 20100114 13:49:06 log file switch (archiving needed)                                        16              0           5062268
      7892 20100115 21:05:16 log file switch (archiving needed)                                    132395          80460        8.0617E+10

I have 4 more day and time slots for this event and looks like on 15/01/2010 I had a similar issue. Lets check further for that day.

SQL> @sp_systime2
Please enter the ORACLE_SID value: prod01
Please enter the report begin date: sysdate-15
Please enter the report end date:sysdate-13

                                                                        Total             Cum
             Service                                                  Seconds    % of    % of
Day          or Wait  Name                                              Spent   Total   Total
------------ -------- ----------------------------------- ------------------- ------- -------

14-JAN       Service  SQL execution                                 32,871.29   39.24   39.24
             Wait     db file sequential read                       14,875.81   17.76   56.99
             Service  Recursive SQL execution                       13,203.42   15.76   72.75
             Wait     log file parallel write                        3,381.76    4.04   76.79
             Wait     db file scattered read                         3,069.28    3.66   80.45
             Wait     log file sync                                  2,651.77    3.17   83.62
             Wait     db file parallel write                         2,330.67    2.78   86.40
             Wait     Backup: sbtwrite2                              2,323.22    2.77   89.17
             Wait     enq: TX - row lock contention                  1,515.07    1.81   90.98
             Wait     Log archive I/O                                1,387.98    1.66   92.64
             Service  Parsing SQL                                    1,138.64    1.36   94.00
             Wait     log file sequential read                         814.92    0.97   94.97

15-JAN       Wait     log file switch (archiving needed)            80,616.90   31.14   31.14
             Wait     buffer busy waits                             41,401.25   15.99   47.13
             Wait     row cache lock                                39,873.16   15.40   62.53
             Service  SQL execution                                 37,210.97   14.37   76.90
             Wait     db file sequential read                       15,267.63    5.90   82.79
             Service  Recursive SQL execution                       13,166.82    5.09   87.88
             Wait     db file scattered read                         5,490.75    2.12   90.00
             Wait     log file parallel write                        5,157.94    1.99   91.99
             Wait     log file sync                                  3,793.75    1.47   93.46
             Wait     db file parallel write                         3,447.89    1.33   94.79
             Wait     enq: TX - row lock contention                  3,175.70    1.23   96.02
             Wait     Log archive I/O                                2,231.86    0.86   96.88

....
....
...
                                                                              Total             Cum
                   Service                                                  Seconds    % of    % of
Day          Hour  or Wait  Name                                              Spent   Total   Total
------------ ----- -------- ----------------------------------- ------------------- ------- -------
15-JAN      19:00 Service  SQL execution                                  2,193.28   34.66   34.66
                   Wait     db file sequential read                        1,423.59   22.50   57.16
                   Wait     log file parallel write                          543.65    8.59   65.75
                   Wait     db file parallel write                           445.41    7.04   72.79
                   Wait     log file sequential read                         344.37    5.44   78.24
                   Wait     db file scattered read                           304.99    4.82   83.06
         *** 21:00 Wait     log file switch (archiving needed)            80,616.90   48.82   48.82
                   Wait     buffer busy waits                             41,336.81   25.03   73.86
                   Wait     row cache lock                                39,873.16   24.15   98.01
                   Service  SQL execution                                  1,637.31    0.99   99.00
                   Wait     db file sequential read                          359.08    0.22   99.22
                   Service  Recursive SQL execution                          325.17    0.20   99.41
             22:00 Service  SQL execution                                  3,763.74   53.76   53.76
                   Wait     db file sequential read                        1,475.86   21.08   74.84
                   Wait     db file scattered read                           947.84   13.54   88.38
                   Wait     db file parallel write                           313.39    4.48   92.86
                   Service  Recursive SQL execution                          273.72    3.91   96.77

Between 20-21 (attention!! there is no 20:00 data which means problem is not between 21-22 but 20-21)
log file switch (archiving needed) is 48 percent of our response time. Which is very expectable if there was an archiver hang
problem on that day.

Lets cross-check if there was a snapshot at 20

SQL> @sp_stats_snap sysdate-14 sysdate-13

SNAP_ID SNAP_TIME
---------- -----------------
...
...
      7890 20100115 19:19:18
      7891 20100115 19:49:18
      7892 20100115 21:05:16  ----where is 20:00
      7893 20100115 21:35:18
      7894 20100115 22:05:18
      7895 20100115 22:35:18
      7896 20100115 23:05:18
      7897 20100115 23:35:18

The reason our report shows the waits between 21-22 because query checks the time when the snapshot gathered so it assumes those happened at 21,
Tim wrote the queries for hourly interval so it sometimes mislead you in this kind of situations be carefull and pay more attention.

Alert log also says the problem started around 20:05 and ended at 21.

Fri Jan 15 20:00:45 2010
Errors in file j:\data\oracle\trace\prod01\bdump\prod01_arc1_7976.trc:
ORA-19816: Message 19816 not found; No message file for product=RDBMS, facility=ORA; arguments: [db_recovery_file_dest]
ORA-27044: Message 27044 not found; No message file for product=RDBMS, facility=ORA
OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 112) There is not enough space on the disk.
...
...
...
...
kcrrdmx: Successful archiving of previously failed ORL
Archiver process freed from errors. No longer stopped
Fri Jan 15 21:05:15 2010

So for the exactly same issue (maybe not ) our DB did not cry for “log file switch (archiving needed)” why ?

I am thinking about possible answers but I need to test meanwhile maybe you want to share your ideas
before I wrote the 3th part for my answer?

Today I kept my promise about giving the reason but I have to close an answering post with another question
which is the best part of troubleshooting, it never ends.

Moral of the story, “Check the alert log”

January 27, 2010

Working with statspack-part-1a-Diagnosis

Filed under: Basics, Performance — coskan @ 6:24 pm

Part 1b—->

Note: Post is a bit long and might be boring, if you are already good with statspack, but there is a question at the end if you would like to answer

Yesterday, I decided to look for something to troubleshoot without waiting for any user complaints so I can write something useful here :)
Primary reason for this practice was to get used to interpret statspack reports to find possible bottlenecks
secondary reason was because I was bored about being underutilized.
I always feel comfortable troubleshooting session wide problems, non-historic performance problems or historic problems with Diagnostics Pack,
but I am not that experienced on statspack.
In my current environment I don’t have Diagnostic Pack option so I have to make good friendship with statspack and
the best way to be a good friend is to spend more time with your new friend as I am going to explain how.

Characteristics of the DB I will work on:
daily load is between 08:00 and 19:00
export at 21:00 running for roughly 1 hour
DB backup start 23:00

There are couple of ways to spot a possible bottlenecks without getting individual reports.

—First one is from Tanel Poder and his famous Perfsheet. This tool is already configured and ready to be used with statspack.
The query he uses to generate the report is the same idea behind the statspack delta views of  Tim Gorman. Lets start

first screen is login screen (you need to activate the macros).

Inputs for statspack report is
Data Source- TNS Entry of the DB
Username/Password- username with statspack schema objects read privilege
Which Instance-instance name (lower case or modify the related sql in Queries sheet with upper function )
Which Query-Statspack CPU+Wait+Stats Hourly
From Date-start date
To Date-end date
then press show me. You will get warnings during the graph generation saying PivotTable report will not fit on the sheet … Just Press OK till the warnings end
Your graph might be empty on Office 2007. Just go to the Report Filter and select WAIT from EVENT_TYPES to see any abnormality on wait events.

First Graph is WAIT graph like below. As you see, I was able to spot a possible problem on 20/01/2010 at first try.

Lets make it more visible by adding top 10 filter to Legend Fields Section.

Another option is focus on the date where the abnormality is happened. Lets select only 20th. FYI top 10 filter is still on

Without the top 10 filter. Graph is slightly different but still giving what we need.

Lets Focus on only time interval between 18 and 23

As you see there is an abnormal increase on row cache lock logfile switch and buffer busy

Lets do double-check with system stats. By selecting all hours and available dates and events your graph will be a bit misleading because of very high metrics like below

What can we do ? DB time is something we want to decrease when troubleshooting, so lets focus on DB Time by filtering Legend Fields on DB Time

We have 2 peaks, which is a bit confusing for the initial assumption of problem timing. Lets focus on 20th again

There are two peaks  between 13-15 and 21-23. We have an interesting situation here in terms of DB time. From wait event perspective abnormality is between 20-22 but DB time perspective we have 1 more time slot between 13-15 .

At this point  I quit Perfsheet. I spotted what I needed for initial problem chase now I can go to other options before, gathering the report. (these options are way more easier than statspack report generation thats why I did not directly go to the statspack)

Second option is from Tim Gorman himself. sp_systime which I find very useful in terms of overall response time analysis. ( i needed to change the script to focus between the dates instead of days before sysdate option)

Here is the ouput

First Part is overall response time for that day.


SQL> @sp_systime2
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-5

                                                                        Total             Cum
Service Seconds % of % of
Day or Wait Name Spent Total Total
------------ -------- ----------------------------------- ------------------- ------- -------
19-JAN       Service  SQL execution                                115,326.24   78.78   78.78
             Wait     db file sequential read                        7,697.33    5.26   84.04
             Wait     db file scattered read                         6,213.01    4.24   88.29
             Service  Recursive SQL execution                        4,638.75    3.17   91.46
Wait Backup: sbtwrite2 2,671.85 1.83 93.28
Wait Backup: sbtbackup 2,081.53 1.42 94.70
             Wait     log file parallel write                        1,423.58    0.97   95.67
             Service  Parsing SQL                                    1,109.49    0.76   96.43
             Wait     db file parallel write                           913.09    0.62   97.06
             Wait     log file sync                                    761.78    0.52   97.58
Wait Datapump dump file I/O 631.13 0.43 98.01
             Wait     Log archive I/O                                  549.08    0.38   98.38

20-JAN       Service  Recursive SQL execution                      255,597.67   65.42   65.42
             Wait     row cache lock                                48,098.17   12.31   77.74
Wait log file switch (checkpoint incompl 41,613.30 10.65 88.39
             Wait     buffer busy waits                             13,474.25    3.45   91.84
             Wait     db file sequential read                        9,788.51    2.51   94.34
             Wait     db file scattered read                         6,831.08    1.75   96.09
Wait Backup: sbtwrite2 3,512.51 0.90 96.99
             Wait     log file parallel write                        1,897.38    0.49   97.48
Wait Datapump dump file I/O 1,674.20 0.43 97.90
             Wait     db file parallel write                         1,389.77    0.36   98.26
             Wait     log file sync                                  1,073.14    0.27   98.53
             Service  Parsing SQL                                      964.58    0.25   98.78

21-JAN       Service  SQL execution                                 33,045.26   44.91   44.91
             Service  Recursive SQL execution                       12,785.72   17.38   62.28
             Wait     db file sequential read                        8,296.99   11.28   73.56
             Wait     db file scattered read                         6,731.96    9.15   82.71
Wait Backup: sbtwrite2 3,340.23 4.54 87.25
             Wait     log file parallel write                        1,856.99    2.52   89.77
             Wait     db file parallel write                         1,359.19    1.85   91.62
             Wait     log file sync                                  1,131.31    1.54   93.16
             Service  Parsing SQL                                    1,011.46    1.37   94.53
Wait enq: TX - row lock contention 861.24 1.17 95.70
             Wait     Log archive I/O                                  686.01    0.93   96.63
Wait Backup: sbtbackup 402.52 0.55 97.18

If you check the top 5 you will easily spot the row cache lock, log file switch and buffer busy waits are different from the other days

Second part of the report is hourly based response time for the days. I truncated the output for 8 hours but the full is here ***

 12:00 Service  SQL execution                                  6,268.92   76.66   76.66
       Wait     db file sequential read                          634.26    7.76   84.42
       Wait     db file scattered read                           344.02    4.21   88.63
       Service  Recursive SQL execution                          230.23    2.82   91.44
       Wait     log file parallel write                          147.58    1.80   93.25
       Wait     log file sync                                    129.94    1.59   94.84
 13:00 Service  SQL execution                                  6,942.33   83.97   83.97
       Wait     db file sequential read                          375.15    4.54   88.51
       Wait     db file scattered read                           207.74    2.51   91.02
       Service  Recursive SQL execution                          200.86    2.43   93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
       Wait     log file parallel write                           94.08    1.14   96.56
 14:00 Service  Recursive SQL execution                      241,820.91   99.35   99.35
       Wait     db file scattered read                           574.84    0.24   99.59
       Wait     db file sequential read                          515.74    0.21   99.80
       Wait     log file parallel write                           97.09    0.04   99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
       Service  Parsing SQL                                       81.25    0.03   99.91
 15:00 Service  SQL execution                                  3,620.32   62.09   62.09
       Service  Recursive SQL execution                        1,210.42   20.76   82.86
       Wait     db file sequential read                          328.60    5.64   88.49
       Wait     db file scattered read                           294.73    5.06   93.55
       Service  Parsing SQL                                       73.42    1.26   94.81
Wait enq: TX - row lock contention 72.03 1.24 96.04
 16:00 Service  SQL execution                                  3,667.78   70.86   70.86
       Wait     db file sequential read                          429.87    8.30   79.16
       Service  Recursive SQL execution                          371.24    7.17   86.33
       Wait     db file scattered read                           215.28    4.16   90.49
Wait enq: TX - row lock contention 144.67 2.79 93.29
       Wait     log file parallel write                          100.81    1.95   95.24
 17:00 Service  SQL execution                                  2,904.54   70.63   70.63
       Wait     db file sequential read                          397.42    9.66   80.29
       Service  Recursive SQL execution                          202.13    4.92   85.21
       Wait     db file scattered read                           138.29    3.36   88.57
       Wait     log file sync                                     94.87    2.31   90.88
       Wait     log file parallel write                           90.94    2.21   93.09
 18:00 Service  SQL execution                                  1,791.86   65.17   65.17
       Wait     db file sequential read                          302.02   10.98   76.15
       Service  Recursive SQL execution                          176.04    6.40   82.56
       Wait     db file scattered read                           151.83    5.52   88.08
       Wait     log file parallel write                           80.44    2.93   91.00
       Wait     log file sync                                     63.46    2.31   93.31
 19:00 Service  SQL execution                                  2,261.88   44.86   44.86
       Wait     db file sequential read                          758.41   15.04   59.91
       Wait     db file scattered read                           540.35   10.72   70.62
       Wait     db file parallel write                           272.36    5.40   76.03
       Wait     log file parallel write                          269.47    5.34   81.37
       Wait     log file sequential read                         154.58    3.07   84.44
 20:00 Service  SQL execution                                  3,517.30   73.29   73.29
       Wait     db file sequential read                          467.34    9.74   83.03
       Wait     db file scattered read                           251.47    5.24   88.27
       Wait     log file parallel write                          161.71    3.37   91.64
       Wait     db file parallel write                            92.73    1.93   93.57
       Wait     Log archive I/O                                   90.24    1.88   95.45
 21:00 Service  SQL execution                                  2,875.04   45.72   45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
       Wait     db file sequential read                          873.68   13.89   83.53
       Wait     db file scattered read                           417.53    6.64   90.17
       Wait     log file parallel write                          130.15    2.07   92.24
       Wait     db file parallel write                           106.51    1.69   93.93
 22:00 Wait     row cache lock                                48,097.61   41.83   41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
       Wait     buffer busy waits                             13,454.89   11.70   89.70
       Service  Recursive SQL execution                        8,958.04    7.79   97.49
       Wait     db file sequential read                        1,485.98    1.29   98.79
       Wait     db file scattered read                           856.65    0.75   99.53
23:00 Wait Backup: sbtwrite2 2,801.73 76.03 76.03
Wait Backup: sbtbackup 352.22 9.56 85.59
       Wait     db file scattered read                           140.16    3.80   89.39
Wait Backup: sbtclose2 93.25 2.53 91.92
       Wait     db file sequential read                           78.52    2.13   94.05
       Service  SQL execution                                     65.77    1.78   95.84

What do we see here ? The graph told us there might be a possible problem between 13-15 and 21-23.

First; focus on 13-15

13:00 Service  SQL execution                                  6,942.33   83.97   83.97
       Wait     db file sequential read                          375.15    4.54   88.51
       Wait     db file scattered read                           207.74    2.51   91.02
       Service  Recursive SQL execution                          200.86    2.43   93.45
Wait enq: TX - row lock contention 162.83 1.97 95.42
       Wait     log file parallel write                           94.08    1.14   96.56
 14:00 Service  Recursive SQL execution                      241,820.91   99.35   99.35
       Wait     db file scattered read                           574.84    0.24   99.59
       Wait     db file sequential read                          515.74    0.21   99.80
       Wait     log file parallel write                           97.09    0.04   99.84
Wait enq: TX - row lock contention 87.19 0.04 99.88
       Service  Parsing SQL                                       81.25    0.03   99.91

Second ; focus on 21-23

21:00 Service  SQL execution                                  2,875.04   45.72   45.72
Wait Datapump dump file I/O 1,504.28 23.92 69.64
       Wait     db file sequential read                          873.68   13.89   83.53
       Wait     db file scattered read                           417.53    6.64   90.17
       Wait     log file parallel write                          130.15    2.07   92.24
       Wait     db file parallel write                           106.51    1.69   93.93
 22:00 Wait     row cache lock                                48,097.61   41.83   41.83
Wait log file switch (checkpoint incompl 41,594.75 36.17 78.00
       Wait     buffer busy waits                             13,454.89   11.70   89.70
       Service  Recursive SQL execution                        8,958.04    7.79   97.49
       Wait     db file sequential read                        1,485.98    1.29   98.79
       Wait     db file scattered read                           856.65    0.75   99.53

Nothing common but tells lots of things. First one spent %99 of response time on Recursive SQL Execution which means no wait but on CPU doing something which we don’t know yet.
Second one spent %41 of its time on row cache lock and %36 on log file switch. Now we are ready for the statspack to see what can we find but
before that I want to show 2 more scripts whose output is can be obtained by Perfsheet as well (like DB time)

First script is Tim Gormans sp_evtrends scripts which shows trends of events. There are two problems; sometimes unnecessarily high number of * character  shown on the screen but comparing it with the seconds waited it can easily be ignored. Another thing is that, if you focus on 1 day sometimes * character graph might be misleading as it is with excel graphs so it is better to focus on 5-10 days

Scripts asks for db_name, begin date, end date and statistics to analyze

SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6

Some useful database statistics to search upon:

Name
------------------------------------------------------------
ARCH random i/o
ARCH sequential i/o
ARCH wait for archivelog lock
	....
	....
	....

Daily trends for "row cache lock"...

What statistic do you want to analyze? row cache lock

Day    Statistic Name                         Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN row cache lock                                0.00

20-JAN row cache lock                          480,981.73 ************************************************************

Daily/hourly trends for "row cache lock"...

Day    Hour   Statistic Name                         Secs Waited Percentage of total over all hours for each day
------ ------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN 00:00  row cache lock                                0.00
       01:00  row cache lock                                0.00
       02:00  row cache lock                                0.00
       03:00  row cache lock                                0.00
       04:00  row cache lock                                0.00
       05:00  row cache lock                                0.00
       06:00  row cache lock                                0.00
       07:00  row cache lock                                0.00
       08:00  row cache lock                                0.00
       09:00  row cache lock                                0.00
       10:00  row cache lock                                0.00
       11:00  row cache lock                                0.00
       12:00  row cache lock                                0.00
       13:00  row cache lock                                0.00
       14:00  row cache lock                                0.00
       15:00  row cache lock                                0.00
       16:00  row cache lock                                0.00
       17:00  row cache lock                                0.00
       18:00  row cache lock                                0.00
       19:00  row cache lock                                0.00
       20:00  row cache lock                                0.00
       21:00  row cache lock                                0.00
       22:00  row cache lock                                0.00 ************************************************************
       23:00  row cache lock                                0.00

20-JAN 00:00  row cache lock                                0.00
       01:00  row cache lock                                0.00
       02:00  row cache lock                                0.00
       03:00  row cache lock                                0.00
       04:00  row cache lock                                0.00
       05:00  row cache lock                                0.00
       06:00  row cache lock                                0.00
       07:00  row cache lock                                0.00
       08:00  row cache lock                                0.00
       09:00  row cache lock                                0.00
       10:00  row cache lock                                0.00
       11:00  row cache lock                                0.00
       12:00  row cache lock                                0.00
       13:00  row cache lock                                0.00
       14:00  row cache lock                                0.00
       15:00  row cache lock                                0.00
       16:00  row cache lock                                5.65
       17:00  row cache lock                                0.00
       18:00  row cache lock                                0.00
       19:00  row cache lock                                0.00
       20:00  row cache lock                                0.00
       21:00  row cache lock                                0.00
       22:00  row cache lock                          480,976.09 ************************************************************
       23:00  row cache lock                                0.00

As you see it shows the same number of * for 19th which is wrong when you check with time spent so abnormality can be spotted by both time spent and number of *.

Second script is modified version for sys_time_model statistics called sp_sys_time_trends.

SQL> @sp_sys_time_trends
</a>Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate-6

Some useful database statistics to search upon:

Name
------------------------------------------------------------
DB CPU
DB time
Java execution elapsed time
PL/SQL compilation elapsed time
PL/SQL execution elapsed time
RMAN cpu time (backup/restore)
background cpu time
background elapsed time
connection management call elapsed time
failed parse (out of shared memory) elapsed time
failed parse elapsed time
hard parse (bind mismatch) elapsed time
hard parse (sharing criteria) elapsed time
hard parse elapsed time
inbound PL/SQL rpc elapsed time
parse time elapsed
repeated bind elapsed time
sequence load elapsed time
sql execute elapsed time
What statistic do you want to analyze? DB time

Daily trends for "DB time"...

Day    Statistic Name                      VALUE Percentage of total over all days
------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN DB time                        1408469.51 **************

20-JAN DB time                        4797565.14 **********************************************

Daily/hourly trends for "DB time"...

Day    Hour   Statistic Name                      VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------
19-JAN 00:00  DB time                        18267.2492 *
01:00  DB time                        57907.3551 **
02:00  DB time                        64889.8266 ***
03:00  DB time                        37988.8748 **
04:00  DB time                        39093.3842 **
05:00  DB time                        39267.5194 **
06:00  DB time                        56103.5928 **
07:00  DB time                        43312.1147 **
08:00  DB time                        50542.4821 **
09:00  DB time                        62883.2236 ***
10:00  DB time                        89721.9315 ****
11:00  DB time                        86621.3671 ****
12:00  DB time                        78163.5806 ***
13:00  DB time                        64324.9537 ***
14:00  DB time                        78970.2241 ***
15:00  DB time                        78037.7742 ***
16:00  DB time                        78215.1336 ***
17:00  DB time                          69875.99 ***
18:00  DB time                        61075.9801 ***
19:00  DB time                        47658.5181 **
20:00  DB time                         41372.216 **
21:00  DB time                        57569.6918 **
22:00  DB time                        65593.3167 ***
23:00  DB time                        41013.2088 **

20-JAN 00:00  DB time                        39789.4329
01:00  DB time                        76274.4764 *
02:00  DB time                        39564.2615
03:00  DB time                        37327.7534
04:00  DB time                        38174.4725
05:00  DB time                         38335.113
06:00  DB time                        55688.7588 *
07:00  DB time                        39039.2476
08:00  DB time                        45155.1028 *
09:00  DB time                        62292.3892 *
10:00  DB time                        73688.6751 *
11:00  DB time                        86222.6032 *
12:00  DB time                        79400.9817 *
13:00  DB time                        81021.4711 *
14:00  DB time                        2498694.98 *******************************
15:00  DB time                        57824.5047 *
16:00  DB time                        50252.1229 *
17:00  DB time                        39474.4176
18:00  DB time                         25728.465
19:00  DB time                         40950.006 *
20:00  DB time                        45258.7127 *
21:00  DB time                        66186.8379 *
22:00  DB time                        1177471.46 ***************
23:00  DB time                        3748.88236

As you can spot like the excell graph we have DB time increased for 14-15 and 22-23 period

Now It is time to get the statspack report and dig further.
First I want to focus on the row cache problem if I have space I will talk about the recursive cursor problem maybe as second part.
We normally collect 30 minutes but available snaps are like below.

                          Snap
Snap Id   Snap Started    Level
8135 20 Jan 2010 21:05     7
8136 20 Jan 2010 21:35     7
8137 20 Jan 2010 22:28     7
8138 20 Jan 2010 22:58     7

If you look carefully you will see that snap at 22:00 is missing. There must be something wrong on DB which prevented statspack
from gathering but lets assume that you could not spot it for now as I could’t which was a deadly mistake caused increase in trouble shooting time
I generated statspack report between 21:35 and 22:28

Here is the statspack summary (full statspack is at here. SQLs are a bit edited for keeping privacy)

STATSPACK report for

Database    DB Id    Instance     Inst Num Startup Time    Release     RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
1814528001 PROD1 1 30-Dec-09 01:59 10.2.0.4.0 NO

Host Name: MACHINE1 Num CPUs: 8 Phys Memory (MB): 32,763
~~~~

Snapshot       Snap Id     Snap Time      Sessions Curs/Sess Comment
~~~~~~~~    ---------- ------------------ -------- --------- -------------------
Begin Snap:       8136 20-Jan-10 21:35:59      128       7.2
  End Snap:       8137 20-Jan-10 22:28:50      189       5.4
   Elapsed:               52.85 (mins)

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     4,128M     4,032M  Std Block Size:         8K
           Shared Pool Size:     6,032M     6,128M      Log Buffer:    13,913K

Load Profile                            Per Second       Per Transaction
~~~~~~~~~~~~                       ---------------       ---------------
                  Redo size:            126,609.46            124,954.43
              Logical reads:             10,300.31             10,165.66
              Block changes:                904.39                892.57
             Physical reads:                375.25                370.34
            Physical writes:                 17.54                 17.31
                 User calls:                 15.88                 15.67
Parses: 9.22 9.09
Hard parses: 1.08 1.06
                      Sorts:                 60.58                 59.79
                     Logons:                  0.13                  0.13
                   Executes:                327.42                323.14
               Transactions:                  1.01

  % Blocks changed per Read:    8.78    Recursive Call %:    98.49
 Rollback per transaction %:    1.28       Rows per Sort:    28.80

Instance Efficiency Percentages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.96 Redo NoWait %: 95.11
            Buffer  Hit   %:   99.73    In-memory Sort %:  100.00
            Library Hit   %:   99.10        Soft Parse %:   88.32
         Execute to Parse %:   97.19         Latch Hit %:   99.91
Parse CPU to Parse Elapsd %: 64.77 % Non-Parse CPU: 99.51

 Shared Pool Statistics        Begin   End
                               ------  ------
             Memory Usage %:   74.42   72.87
    % SQL with executions>1:   33.97   33.40
  % Memory for SQL w/exec>1:   56.11   55.38

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
row cache lock                                  16,524      48,098   2911   45.6
log file switch (checkpoint incomplete)         68,137      41,595    610   39.5
buffer busy waits                               13,462      13,455    999   12.8
CPU time                                                     1,472           1.4
db file sequential read                        723,098         446      1     .4
          -------------------------------------------------------------
Host CPU CPUs-8
~~~~~~~~              Load Average
Begin End User System Idle WIO WCPU
                    ------- -------   ------- ------- ------- ------- --------
                                         3.69    2.12   94.19

Instance CPU
~~~~~~~~~~~~
              % of total CPU for Instance:    3.30
              % of busy  CPU for Instance:   56.81
  %DB time waiting for CPU - Resource Mgr:

Memory Statistics                       Begin          End
~~~~~~~~~~~~~~~~~                ------------ ------------
                  Host Mem (MB):     32,762.6     32,762.6
SGA use (MB): 10,240.0 10,240.0
                   PGA use (MB):        776.7        787.4
% Host Mem used for SGA+PGA: 33.6 33.7
          -------------------------------------------------------------

Time Model System Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % of DB time
----------------------------------- -------------------- ------------
sql execute elapsed time 101,089.5 100.0
DB CPU                                             821.2           .8
PL/SQL execution elapsed time                      620.9           .6
parse time elapsed                                  22.4           .0
hard parse elapsed time                             21.2           .0
PL/SQL compilation elapsed time                      1.7           .0
connection management call elapsed                   0.2           .0
hard parse (sharing criteria) elaps 0.1 .0
repeated bind elapsed time                           0.0           .0
hard parse (bind mismatch) elapsed                   0.0           .0
failed parse elapsed time                            0.0           .0
sequence load elapsed time                           0.0           .0
DB time                                        101,090.9
background elapsed time                          6,382.7
background cpu time                                 16.6

Wait Events DB/Inst: PROD1/prod1 Snaps: 8380-8382
-> s - second, cs - centisecond, ms - millisecond, us - microsecond
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by Total Wait Time desc, Waits desc (idle events last)

                                                                    Avg
                                                %Time Total Wait   wait    Waits
Event Waits -outs Time (s) (ms) /txn
--------------------------------- ------------ ------ ---------- ------ --------
row cache lock                          16,524     97     48,098   2911      5.1
log file switch (checkpoint incom 68,137 61 41,595 610 21.2
buffer busy waits                       13,462    100     13,455    999      4.2
..
..
..

Wait Event Histogram DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> Total Waits - units: K is 1000, M is 1000000, G is 1000000000
-> % of Waits - column heading: <=1s is truly <1024ms, >1s is truly >=1024ms
-> % of Waits - value: .0 indicates value was <.05%, null is truly 0
-> Ordered by Event (idle events last)

                           Total ----------------- % of Waits ------------------
Event Waits <1ms <2ms <4ms <8ms <16ms <32ms <=1s >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
row cache lock               16K   2.9                                  .0  97.1
log file switch (checkpoin 66K 21.0 2.7 12.8 2.1 .3 61.1
buffer busy waits            13K    .0                            .0 100.0

As you see row cache locks avg wait time is nearly 3 second with 97% is over 1 second which looks like it is the problem

We already spotted row cache lock but where to look next ?

Documentation says ;

row cache lock

The session is trying to get a data dictionary lock. 

Wait Time: Wait up to 60 seconds.

Parameter	Description
cache id	The CACHE# column value in the V$ROWCACHE view
mode	See "mode"
request	The pipe timer set by the user

In Statspack report we have a section called “Dictionary Cache Stats”. Lets check what we have there

Dictionary Cache Stats DB/Inst: PROD1/prod1 Snaps: 8136-8137
->"Pct Misses" should be very low (< 2% in most cases)
->"Final Usage" is the number of cache entries being used in End Snapshot

Get Pct Scan Pct Mod Final
Cache Requests Miss Reqs Miss Reqs Usage
------------------------- ------------ ------ ------- ----- -------- ----------
dc_awr_control                      18    0.0       0              0          1
dc_database_links                  107    0.0       0              0          1
dc_files                            13    0.0       0              0         13
dc_global_oids                  21,571    0.1       0              2        255
dc_histogram_data               73,396    2.2       0              0      3,904
dc_histogram_defs              279,541    1.4       0              0      7,652
dc_object_grants                   102   11.8       0              0        124
dc_object_ids                1,050,311    0.1       0              5      2,879
dc_objects                      14,052    1.3       0             31      2,605
dc_profiles                         74    0.0       0              0          1
dc_rollback_segments             4,147    0.0       0             64        300
dc_segments                     50,349    0.2       0            121      2,415
dc_sequences                        10   10.0       0             10         13
dc_table_scns                        1  100.0       0              0          0
dc_tablespace_quotas               289    0.0       0              0          9
dc_tablespaces                  66,671    0.0       0              0         11
dc_usernames                    18,888    0.1       0              0         53
dc_users                        99,804    0.0       0              0         71
outstanding_alerts                  26   15.4       0              4         24
          -------------------------------------------------------------

From my old shared pool post I remember that Pct Miss is important about dictionary cache tuning.
I did the comparison with the normal times and the only thing I spot was increase in Pct miss for
dc_object_grants and dc_sequences with less get request but without specific session activity all
I can do is this much for this event. so I moved to logfile switch event to see if I can find something

Definition **** log file switch (checkpoint incomplete)
The log file switch wait indicates that the process is waiting for the log file switch to complete,
but the log file switch is not possible because the checkpoint process for that log file
has not completed. You may see this event when the redo log files are sized too small.
Wait Parameters Wait parameters are not used for log file switch (checkpoint incomplete).
Wait Time One second

This is an event we see occasionally but not with the redo size we generated for that time interval. Lets see how many log switches has been done for that day

SQL> @gen_redo

DAY            COUNT#       MIN#       MAX# DAILY_AVG_MB
---------- ---------- ---------- ---------- ------------
2010-01-26        790      60361      61150        50560
2010-01-25        722      59639      60360        46208
2010-01-24        469      59170      59638        30016
2010-01-23        291      58879      59169        18624
2010-01-22        651      58228      58878        41664
2010-01-21        394      57834      58227        25216
2010-01-20        473      57361      57833        30272

10 rows selected.

Session altered.

DAY        00    01    02    03    04    05    06    07    08    09    10    11    12    13    14    15    16    17    18    19    20    21    22    23
---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
20/01/2010    0    50     0     0     0     1    56     0     1     9    17    43    37    27     8     8    14    12     9   101    57    15     7     1
21/01/2010    0    52     0     1     0     0    56     0     1     6    13    12    12    12    17    10     4    11     6   100    58    18     4     1
22/01/2010    0    51     0     0     1     0    58     0     1     3    11    27    14    18    82    25    17    60   104   100    57    18     4     0
23/01/2010    1    51     2     1     0     0    54     1     0     0     0     0     1     0     0     0     1     0     0   100    57    18     3     1
24/01/2010    0    52     1     3     3     3    67     8     7     1     0     1     0     0   136    53    16     1     0    76    20    18     1     2
25/01/2010    0    53     0     0     0     1    59     1     1   120    45    89    40     4    80    17    16    78     6    90    11     8     3     0
26/01/2010    0    53    12    39    13     0    57     0     1    40    28    63    38   142     4    99    59     5    17    82    16    19     2     1

8 rows selected.

looks like nothing specific happened number of log switches looks innocent.

What is next buffer busy looks like the third one with little effect but still can give something ?

Lets check the trends

SQL> @sp_evtrends
Please enter the ORACLE_SID value: PROD1
Please enter the report begin date: sysdate-8
Please enter the report end date:sysdate

Daily trends for "buffer busy waits"...

Day    Statistic Name                         Secs Waited Percentage of total over all days
------ ------------------------------ ------------------- ------------------------------------------------------------
19-JAN buffer busy waits                           127.95

20-JAN buffer busy waits                       134,742.54 **********************************************************

21-JAN buffer busy waits                           205.92

22-JAN buffer busy waits                           553.43

23-JAN buffer busy waits                             0.31

24-JAN buffer busy waits                            32.49

25-JAN buffer busy waits                         1,449.90 *

26-JAN buffer busy waits                         1,131.49

Daily/hourly trends for "DB time"... (truncated)

Day    Hour   Statistic Name                      VALUE Percentage of total over all hours for each day
------ ------ ------------------------------ ---------- ------------------------------------------------------------

20-JAN 00:00  buffer busy waits                             0.00
       01:00  buffer busy waits                             0.00
       02:00  buffer busy waits                             0.00
       03:00  buffer busy waits                             0.00
       04:00  buffer busy waits                             0.00
       05:00  buffer busy waits                             0.00
       06:00  buffer busy waits                             0.00
       07:00  buffer busy waits                             0.00
       08:00  buffer busy waits                             0.00
       09:00  buffer busy waits                             4.86
       10:00  buffer busy waits                             6.03
       11:00  buffer busy waits                            43.35
       12:00  buffer busy waits                            24.02
       13:00  buffer busy waits                            20.27
       14:00  buffer busy waits                            16.45
       15:00  buffer busy waits                            24.45
       16:00  buffer busy waits                            27.31
       17:00  buffer busy waits                            23.86
       18:00  buffer busy waits                             2.78
       19:00  buffer busy waits                             0.24
       20:00  buffer busy waits                             0.00
       21:00  buffer busy waits                             0.00
       22:00  buffer busy waits                       134,548.91 ************************************************************
       23:00  buffer busy waits                             0.00

Looks like normally our DB doesn’t spend time on this event which is good.

We have a nice section in statspack report for buffer busy wait events


Segments by Buffer Busy Waits DB/Inst: PROD1/prod1 Snaps: 8136-8137
-> End Segment Buffer Busy Waits Threshold:       100

                                                                    Buffer
Subobject Obj. Busy Pct
Owner Tablespace Object Name Name Type Waits Total
---------- ---------- -------------------- ------------ ----- ------------ -----
APP_USER USERS ULG TABLE 9,560 100.0
APP_USER USERS PK_ULG INDEX 1 .0
          -------------------------------------------------------------

All of the waits happened on the same object. Before checking I checked trend with another Tim Gorman script sp_buffer_busy (again edited for time interval and output only segments with buffer_busy waits >0)

I won’t put the output but output says this table showed up only 3 more times with wait total less then 30, in 10 days period. If you check the actual statspack for sqls
ULG table is not shown on any of them which is a bit strange again.

Maybe I can find something else, if I get half an hour period report for

		 	Snap
Snap Id   Snap Started    Level
8137 20 Jan 2010 22:28     7
8138 20 Jan 2010 22:58     7

From top 5 wait events

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                        613,887       1,040      2   42.0
db file scattered read                         266,197         853      3   34.5
CPU time                                                       418          16.9
db file parallel write                          45,244          86      2    3.5
log file parallel write                          4,449          26      6    1.1
          -------------------------------------------------------------

Our problem disappeared from top 5 list and DB time=16,656.2 which is very small comparing to previous value 101,090.9.
problem is gone but what has actually happened ?

I am pretty sure that there wasn’t any change on that day. backup ran successfully after 23:00 and export finished before 9:30 so what can this be ?

I know the answer but whoever comes up with a good idea please be my guest to comment. If you are good enough, I promise to buy beer.

Answer will be here as an another post on Friday. If I have time I will also show what happened for the other DB time peak on same day.

Note:
With this post I did not re-invent the wheel but try to show you the available free tools you can use and modify for your own fast troubleshooting. As Tanel says right tool right time. These guys share these tools to help you , my advice is use it if you need the.
My script library consist of many scripts from Tanel Poder, Riyaj Shamsudeen ,Tim Gorman, Kerry Osborne, Dion Cho, Craig Shallahamer and from many others plus my own.
Thanks you very much to all guys who share their tools with us.

How much time I spent for diagnosis? Everything on this page took 10 minutes total to run. Rest is your experience and skills to search on metalink and internet.
for me coming to last point took only 1.5 hour.

List of scripts I used;

January 20, 2010

Blogroll Report 01/01/2009 – 08/01/2010

Filed under: Blogroll Report — coskan @ 3:45 pm

<—- Blogroll Report 25/12/2009 – 01/01/2010

1-How to sort getting “ORA-00942: table or view does not exist” with procedures of users with DBA roles ?
Damir Vadas-ORA-00942 and “AUTHID CURRENT_USER”

2-How to secure hashed passwords?
John Hallas-The need to ensure that hashed password values are safe

3-How to workaround “ERROR-400Data will be rejected for upload from agent max size limit for direct load exceeded”?
Kellyn Pedersen-ERROR-400Data will be rejected for upload from agent ‘https://:/emd/main/’, max size limit for direct load exceeded [6827354/5242880]

4-Numeric Segment names in DBA_SEGMENTS table
Anand Prakash-Segment name – Numeric

5-How to trace/debug oracle data guard?
Miladin Modrakovic-Tracing Oracle Data Guard

6-How to tune via VST diagrams and manual tuning by NO_MERGE hint example?
Kyle Hailey-Visually Identifying SQL Optimizer Weaknesses

7-Flashback can be disabled automatically
John Hallas-Flashback disabled automatically – a minor rant

8-How to configure Oracle Database File System-DBFS on Linux?
Tim Hall-Oracle Database File System (DBFS) in Oracle Database 11g Release 2

9-CPU Speed and CPU Costing Model Relationship
Richard Foote-The CPU Costing Model: A Few Thoughts Part IV (Map of the Problematique)

10-How to get list of available command types ?
Martin Widlake-More on COMMAND_TYPE values

11-How to copy stats between objects ?
Jonathan Lewis-Copy Stats

12-How to force Oracle to scan only single datafile for manual parallelization ?
Jonathan Lewis-Pseudo Parallel

13-Locking mode behaviour change in 11G for unindexed foreign key columns
Charles Hooper-Deadlock on Oracle 11g but Not on 10g

14-Tests on using date or number for date datatype and different optimizer behaviours
Charles Hooper-DATE Datatype Or NUMBER Datatype – Which Should be Used?

Blogroll Report 08/01/2010 – 15/01/2010 –>

January 8, 2010

Blogroll Report 25/12/2009 – 01/01/2010

Filed under: Blogroll Report — coskan @ 6:50 pm

<—- Blogroll Report 18/12/2009 – 25/12/2009

1-How do function based indexes using date functions behave for different NLS session settings?
Gary Myers-Happy New Year and fun with dates

2-How does CPU count initialization parameter work?
Lua Donghua-CPU_COUNT initialization parameter

3-Pocket Reference Guide for ASM
Charles Kim-ASM Pocket Reference Guide

4-Pocket Reference Guide for RAC
Charles Kim-Real Application Clusters (RAC) Pocket Reference Guide

5-How many rows are sorted and how much memory used by your query?
Jonathan Lewis-Short Sorts

6-How much memory used per program ?
Ronny Egner-How much memory is being used per program

7-Why does streams appyl process perform well in 11GR2 (oracle internals)?
Alex Fatkulin-Oracle Streams Apply Process changes in 11GR2

8-How to toubleshoot “ORA-00304: requested INSTANCE_NUMBER is busy” in a non-RAC environment?
Arup Nanda-Instance_Number is busy Message during Standby Instance Startup

9-Tom Kyte’s thoughts about Edition Based Redefinition and deferred segment creation
Thomas Kyte-A Closer Look at the New Edition

10-How to work with hybrid columnar compression in Exadata V2?
Arup Nanda-Compressing Columns

11-How to relocate grid infrastructure installation in 11GR2?
Rene Kundersma-Relocating Grid Infrastructure

12-All about grouping
Rob Van Wijk-Paper abput grouping

13-How to use dbms_metadata on objects other than dual?
Jonathan Lewis-DBMS_METADATA

14-How can system stats collection or manually setting mbrc cause performance problems ?
Charles Hooper-High Value for MBRC Causes High BCHR, High CPU Usage, and Slow Performance

15-How to get peeked binds from OTHER_XML column of V$SQL_PLAN?
Dominic Brooks-SQL_PLAN – other_xml

16-How to track down plan changes with AWR?
Dominic Brooks-Plans gone AWRy

17-How to solve CRS is not installed on any of the nodes, reported by cluvf, problem?
Amit Bansal-Cluvfy Reports CRS not Installed on Nodes

18-Data pump tips for parallel and compress
Rajeev Ramdas Thottathil-Datapump export and import – parallel and compress

19-Performance of IGNORE_ROW_ON_DUPKEY_INDEX in 11GR2
Guy Harrison-The 11GR2 IGNORE_ROW_ON_DUPKEY_INDEX hint

20-How to install 11GR2 RAC on Linux Using NFS?
Tim Hall-Oracle Database 11g Release 2 RAC On Linux Using NFS

21-How to configure DNS for SCAN in 11GR2 for personal tests?
Tim Hall-DNS Configuration for the SCAN used with Oracle RAC Database 11g Release 2

22-How to use Analytical Functions for certain complex problems ?
Charles Hooper-SQL – DENSE_RANK, PERCENT_RANK, and COUNT Analytical Functions

Charles Hooper-SQL – COUNT Analytical Function, GROUP BY, HAVING

Charles Hooper-SQL – RANK, MAX Analytical Functions, DECODE, SIGN

23-How does CBO behave for deleting rows with exists or in and which one is better?
Charles Hooper-Which is Most Efficient when Deleting Rows: EXISTS, IN, or a VIEW

24-Possible workaround for “ORA-15077:could not locate ASM instance serving a required diskgroup”
David Marcos-PRKP-1001 : Error starting instance …. followed by CRS-0215

Blogroll Report 01/01/2010 – 08/01/2010 –>

Object without object_id

Filed under: Diary — coskan @ 4:44 pm

This is probably not new to most of you who checked the definition of DBA_OBJECTS view, but it was new to me, and I wanted to share.

Normally in documentation DBA_OBJECTS is referenced to ALL_OBJECTS view as all other DBA_XXX views, and OBJECT_ID column in ALL_OBJECTS view is NOT_NULL like below, but DBA_OBJECTS does not have any information about nullability

What I was trying was creating a test table created with CTAS from dba_objects  and tried to create a primary key on OBJECT_ID but I couldn’t because of having nulls. When I checked I found that DB_LINKS don’t have object_id.

 
 
SQL> desc all_objects;
           Name                            Null?    Type
           ------------------------------- -------- ------------------
    1      OWNER                           NOT NULL VARCHAR2(30)
    2      OBJECT_NAME                     NOT NULL VARCHAR2(30)
    3      SUBOBJECT_NAME                           VARCHAR2(30)
    4      OBJECT_ID                       NOT NULL NUMBER
    5      DATA_OBJECT_ID                           NUMBER
    6      OBJECT_TYPE                              VARCHAR2(19)
    7      CREATED                         NOT NULL DATE
    8      LAST_DDL_TIME                   NOT NULL DATE
    9      TIMESTAMP                                VARCHAR2(19)
   10      STATUS                                   VARCHAR2(7)
   11      TEMPORARY                                VARCHAR2(1)
   12      GENERATED                                VARCHAR2(1)
   13      SECONDARY                                VARCHAR2(1)
   14      NAMESPACE                       NOT NULL NUMBER
   15      EDITION_NAME                             VARCHAR2(30)

SQL> desc dba_objects
           Name                            Null?    Type
           ------------------------------- -------- ------------------
    1      OWNER                                    VARCHAR2(30)
    2      OBJECT_NAME                              VARCHAR2(128)
    3      SUBOBJECT_NAME                           VARCHAR2(30)
    4      OBJECT_ID                                NUMBER
    5      DATA_OBJECT_ID                           NUMBER
    6      OBJECT_TYPE                              VARCHAR2(19)
    7      CREATED                                  DATE
    8      LAST_DDL_TIME                            DATE
    9      TIMESTAMP                                VARCHAR2(19)
   10      STATUS                                   VARCHAR2(7)
   11      TEMPORARY                                VARCHAR2(1)
   12      GENERATED                                VARCHAR2(1)
   13      SECONDARY                                VARCHAR2(1)
   14      NAMESPACE                                NUMBER
   15      EDITION_NAME                             VARCHAR2(30)
 

When I check the definition of DBA_OBJECTS, I saw that DB_LINKS are added with a union to the view without an object_id

 
.....
.....
.....
union all
select u.name, l.name, NULL, to_number(null), to_number(null),
       'DATABASE LINK',
       l.ctime, to_date(null), NULL, 'VALID','N','N', 'N', NULL, NULL
from sys.link$ l, sys.user$ u
where l.owner# = u.user#;
 

I check ALL_OBJECTS again and could not find any DB_LINK listed in  ALL_OBJECTS. In my understanding DB_LINKS are not even objects, at least they don’t deserve to have an OBJECT_ID from Oracle’s point.

« Newer PostsOlder Posts »

Theme: Silver is the New Black. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 203 other followers