Coskan’s Approach to Oracle

December 29, 2010

How to monitor services on 11gR2?

Filed under: Monitoring, RAC — coskan @ 8:10 pm

It is very important for the stability of our system that our services run on their preferred nodes (as long as all nodes are available), yet on our 11gR2 grid we sometimes see services move between nodes without any good reason. It was a big shame that we discovered this only after the business noticed it, so we decided to add a check on service availability. At first I thought about querying the database, but Oracle does not keep the preferred node information in the database itself 😦, so the remaining option was to parse the srvctl output, which meant some coding. Time was too limited to write it from scratch at my coding speed, so I asked the question on Oracle-L and got a very good response from Yong Huang. His Perl script ran well on 10g and 11gR1 but did not give the expected results on our 11gR2 cluster, so I modified it a bit to handle the 11gR2 output (did I mention I'm better at modifying code than writing it 🙂).

Below is the actual Perl script that checks service availability.
Nothing rocket science here: it first gets the service names from crs_stat (I know it is deprecated, please do not mention it), then for each service it compares the output of srvctl config service (the preferred instances) with srvctl status service (the instances the service is actually running on), and finally mails all problem services in a single mail.
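To make those three steps concrete, here is a minimal sketch of the same parsing and comparison logic written as pure Perl functions that work on captured text, so it can be tried without a cluster. The sub names (service_names, preferred_instances, running_instances, check_service) are hypothetical, and the sample lines are assumptions modelled on what the script's awk, grep and regex expect: crs_stat resource names like NAME=ora.<db>.<service>.svc, a "Preferred instances: ..." line from srvctl config service, and "... is running on instance(s) ..." from srvctl status service.

```perl
use strict;
use warnings;

# Step 1: service names for one database from crs_stat text.
# Mirrors: awk -F. '/^NAME.*<db>.*\.svc$/ {print $3}' | sort | uniq
sub service_names {
    my ($crs_stat_text, $db) = @_;
    my %seen;
    for my $line (split /\n/, $crs_stat_text) {
        next unless $line =~ /^NAME.*\Q$db\E.*\.svc$/;
        my $name = (split /\./, $line)[2];    # third dot-separated field, like awk $3
        $seen{$name} = 1 if defined $name;
    }
    return sort keys %seen;
}

# Step 2: preferred list from "srvctl config service" text.
# Mirrors: grep -e Preferred | awk '{print $3}' (first matching line only).
sub preferred_instances {
    my ($config_text) = @_;
    for my $line (split /\n/, $config_text) {
        next unless $line =~ /Preferred/;
        my @fields = split ' ', $line;        # awk-style whitespace split
        return defined $fields[2] ? $fields[2] : "";
    }
    return "";
}

# Step 3: running list from "srvctl status service" text; undef if not running.
sub running_instances {
    my ($status_text) = @_;
    return $status_text =~ /is running on instance\(s\) (.*)$/m ? $1 : undef;
}

# Classify one service the same way the script does.
sub check_service {
    my ($config_text, $status_text) = @_;
    my $running = running_instances($status_text);
    return "CRITICAL" unless defined $running;
    return preferred_instances($config_text) eq $running ? "OK" : "WARNING";
}

# Example with made-up sample text (assumed formats, see above).
my $crs = "NAME=ora.orcl.oltp.svc\nTYPE=application\nNAME=ora.orcl.batch.svc\nNAME=ora.orcl.db\n";
my @names = service_names($crs, "orcl");
print join(",", @names), "\n";    # prints batch,oltp
my $state = check_service("Preferred instances: orcl1\nAvailable instances: orcl2\n",
                          "Service oltp is running on instance(s) orcl2\n");
print "$state\n";                 # prints WARNING
```

Keeping the parsing separate from the srvctl calls like this makes it easy to feed in real output captured on a new Grid version and see which step stops matching.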

We call the Perl script from a bash script (bottom of the page). Everything could be moved into Perl, but I am lazy, and I find the current way easier to move between servers because only the tiny bash script needs to be modified. I configured it to run every hour, and the job is done.

I hope it helps somebody. Many thanks to Yong for sharing his work with me.

-- Actual script

#!/usr/bin/perl
# Modifier = Coskan Gundogar (original script is from YONG HUANG)
# Purpose = check whether services are running on their preferred instances
# Maintenance needs:
#   CRS_HOME is hardcoded and must be changed when the CRS home changes
#   The script must be called by a wrapper that sets ORACLE_SID and ORACLE_HOME
use strict;
use warnings;

my $HOST = `hostname -a` || "";
chomp $HOST;    # strip the newline so it does not break the mail subject
my $RECIPIENTEMAIL = 'yourdbagroup@yourdomain.com';
$ENV{PATH} = "/usr/bin:/bin";
my $CRS_HOME = "/u01/crs/oracle/product/11.2.0/grid/bin";    # grid bin directory
my $ORASID  = $ENV{'ORACLE_SID'}  or die "ORACLE_SID is not set\n";
my $ORAHOME = $ENV{'ORACLE_HOME'} or die "ORACLE_HOME is not set\n";
my $msg_run  = "";    # services running on the wrong instances
my $msg_none = "";    # services not running at all
my $MSG_SUB  = "";

# Get all service names (domain stripped), 11.2 style: resources are named
# ora.<db>.<service>.svc, so the third dot-separated field is the service.
# "\\." keeps the backslash so awk matches a literal dot before "svc".
my $services = `$CRS_HOME/crs_stat | awk -F. '/^NAME.*$ORASID.*\\.svc\$/ {print \$3}' | sort | uniq`;

foreach my $service (split /\n/, $services)
{
    # Preferred instance list as configured ("Preferred instances: inst1,inst2")
    my $prefinst = `$ORAHOME/bin/srvctl config service -d $ORASID -s $service | grep -e Preferred | awk '{print \$3}'`;
    $prefinst =~ s/\s+$//;
    my $statusline = `$ORAHOME/bin/srvctl status service -d $ORASID -s $service`;

    if ($statusline =~ /is running on instance\(s\) (.*)$/)
    {
        my $runinst = $1;
        if ($prefinst ne $runinst)  # service is running on the wrong node(s)
        {
            $msg_run .= "\nService \"$service\" preferred instance list differs from service running instance list:\n Preferred : $prefinst\n Running on: $runinst\n";
            # do not downgrade the subject if a CRITICAL problem was already found
            $MSG_SUB = "PRD-WARNING: $HOST DB: $ORASID Service Availability Problem"
                unless $MSG_SUB =~ /^PRD-CRITICAL/;
        }
    }
    else  # this service is not running at all
    {
        $msg_none .= "\n" . $statusline . "\n";
        $MSG_SUB = "PRD-CRITICAL: $HOST DB: $ORASID Service Availability Problem";
    }
}

if ($msg_run ne "" or $msg_none ne "")
{
    my $msg_all = $msg_run . "\n" . $msg_none;
    # Pipe the body straight to mail(1) instead of a shell here-document,
    # so "$" or backticks in the srvctl output are never expanded by a shell.
    open(my $mail, '|-', 'mail', '-s', $MSG_SUB, $RECIPIENTEMAIL)
        or die "Cannot run mail: $!\n";
    print $mail $msg_all;
    close($mail) or warn "mail exited with status $?\n";
}

-- Calling it from a shell script

#!/bin/bash
export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_2
export ORACLE_SID=DATABASE_NAME
export PATH=$PATH_BCK:$ORACLE_HOME/bin

/usr/bin/perl /home/oracle/bin/check_services.pl

Note: One problem with the script is ordering. When you set the preferred and available node lists in a non-ordered or reverse order, srvctl config shows them in the same order you gave them. Because the script compares the srvctl config and srvctl status outputs as plain strings, it then reports that the service is not running on the preferred nodes even when it is. I am sure this could be sorted out, but I prefer to define my services in the right order, so I did not change the script to handle it.
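If you would rather not depend on the definition order, one option is to compare the two lists as sets instead of strings. Below is a minimal sketch, assuming both values are comma-separated instance names as in the srvctl output the script parses; normalise_list and same_instances are hypothetical helpers, not part of the original script. In the main script, `if ($prefinst ne $runinst)` would then become `if (!same_instances($prefinst, $runinst))`.

```perl
use strict;
use warnings;

# Normalise a comma-separated instance list: trim blanks, drop empty entries, sort.
sub normalise_list {
    my ($list) = @_;
    $list = "" unless defined $list;
    my @names;
    for my $raw (split /,/, $list) {
        my $name = $raw;
        $name =~ s/^\s+//;
        $name =~ s/\s+$//;
        push @names, $name if length $name;
    }
    return join ",", sort @names;
}

# True when both lists name the same instances, in any order.
sub same_instances {
    my ($preferred, $running) = @_;
    return normalise_list($preferred) eq normalise_list($running);
}

my $verdict = same_instances("orcl2,orcl1", "orcl1,orcl2") ? "same" : "different";
print "$verdict\n";    # prints same
$verdict = same_instances("orcl1", "orcl1,orcl2") ? "same" : "different";
print "$verdict\n";    # prints different
```

Sorting both sides keeps the check a simple string comparison, so the rest of the script and the mail text stay the same.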

4 Comments

  1. […] change freeze) which is our main OLTP and batch node. We already knew the issue (thanks to our service check script) but it was change freeze time and we did not have chance to move service to the right nodes. When […]

    Pingback by Analysing Temp usage on 11GR2 – Temp space is not released « Coskan’s Approach to Oracle — January 24, 2011 @ 2:47 pm

  2. Hi Coskan,

    Thanks for sharing. I am trying to check "srvctl config database" across multiple hosts and, if the database is not registered, perform the registration from a central server.
    Any idea how to do this? I really appreciate your input and time.

    Thanks
    Bala

    Comment by Bala — February 6, 2013 @ 2:50 am

  3. I am not able to run this script after upgrading the cluster from 11gR2 to 12c.

    Comment by Anand sharma — June 11, 2014 @ 4:46 pm

  4. After upgrading the cluster from 11gR2 to 12c, I am not able to run this script. Please suggest.

    Comment by Anand sharma — June 11, 2014 @ 4:47 pm



