Coskan’s Approach to Oracle

December 29, 2010

How to monitor services on 11GR2 ?

Filed under: Monitoring, RAC — coskan @ 8:10 pm

It is very important to have our services run on the preferred nodes for the stability of our system (as long as all nodes are available)  and we sometimes see some of our services move between nodes without any good reason on 11GR2 grid. It was a big shame we discover it after business realizes the situation so we decided to have a check on service availability. At first I think about using DB to check the info but oracle does not keep the preferred node information in the DB itself :( remaining option is to use srvctl output which means coding needed. Time was limited to write it from scratch with  my coding coding speed so  I asked the question on Oracle-L and get a very good response from Yong Huang . His script was written in perl and running well on 10G and 11GR1 but not giving the expected results on our 11GR2 cluster,  for this reason, I modified the script a bit to use 11GR2 output (did I mention I’m better modifying codes than writing them :) ).

Below is the actual perl script to check the service availability.
Nothing rocket science here , It first gets the service names from crs_stat (I know it is deprecated please do not mention ) and in a for loop it checks the output if srvctl config and srvctl status for the services and than mails the problem services in a single mail.

We are calling perl script in a bash script (bottom of the page) which could also be optimized to run everything in the perl and I am again lazy not doing it and I find current way easier to move it between servers. only tiny bash script needs to be modified. I configure it to run every hour and job is done.

I hope it helps somebody. Thank you very much to Yong for sharing his work with me.

—Actual Script

#!/usr/bin/perl -w
# Modifier = Coskan Gundogar (original script is from YONG HUANG)
# Purpose = checking if services are running on the right node or not
# Maintenance needs
# CRS_HOME is hardcoded. Needs to be changed when CRS home changes
# Script needs to be called by a script which set env variables

$HOST=`hostname -a`;
$RECIPIENTEMAIL = 'yourdbagroup@yourdomain.com';
$ENV{PATH}="/usr/bin:/bin";
$CRS_HOME="/u01/crs/oracle/product/11.2.0/grid/bin/";
$ORASID=$ENV{'ORACLE_SID'};
$ORAHOME=$ENV{'ORACLE_HOME'};
$msg_run="";
$msg_none="";
$msg_all="";
$MSG_SUB = "";

                $_=`$CRS_HOME/crs_stat | awk -F. '/^NAME.*$ORASID.*\.svc\$/ {print \$3}' | sort | uniq`; #Get all service names (domain stripped) 11.2G

@line = split /\n/;

foreach (@line)
{
                 $service = "$_"; #Our domain names are guaranteed to be this.
                 $prefinst=`$ORAHOME/bin/srvctl config service -d $ORASID -s $service | grep -e Preferred | awk '{print \$3}'`;
                 $prefinst=~ s/\s+$//;
                 $statusline = `$ORAHOME/bin/srvctl status service -d $ORASID -s $service`;

                 if ($statusline =~ /is running on instance\(s\) (.*)$/)
                 {
                 $runinst = $1;
                 if ($prefinst ne $runinst)  # if service is running on the wrong node
                   {
                     $msg_avail = "Service \"$service \" preferred instance list differs from service running instance list: \n Preferred : $prefinst \n Running on: $runinst\n";
			         $msg_run=$msg_run . "\n" . $msg_avail;
		             $MSG_SUB = 'PRD-WARNING: ' . $HOST . ' DB: ' . $ORASID . ' Service Availability Problem'
		  }
                 }
                 else #This service is not even running.
                 {
		   			$msg_none=$msg_none . "\n" . $statusline . "\n";
					$MSG_SUB = "PRD-CRITICAL: " . $HOST . " DB: " . $ORASID . " Service Availability Problem"
		 }
                 @prefinst = ();
                 @runinst = ();
}
$msg_all=$msg_run . "\n" . $msg_none;

if ($msg_run ne "" or $msg_none ne "") {
system "mail -s \"$MSG_SUB\"  $RECIPIENTEMAIL <<EOF
$msg_all
EOF";
}
;

–Calling in an sh

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_2
export ORACLE_SID=DATABASE_NAME
export PATH=$PATH_BCK:$ORACLE_HOME/bin

/usr/bin/perl /home/oracle/bin/check_services.pl

Not : Problem with the actual script is when you set preferred and available node list in a non-ordered or reverse ordered way oracle uses the same order you gave in the output of srvctl config .Because The script compares srvctl config and srvctl status outputs it shows nodes are not running on the preferred node. I am sure this could be sorted but I prefer to set my services in the right order so I did not change anything to sort this problem.

December 6, 2010

Alert Log Monitoring script via ADRCI

Filed under: How To, Tips — coskan @ 4:47 pm

Before I start to write about the blog series in my mind which will be the base to my first presentation again in my mind, I would like to share simple alert log monitoring with adrci.

At first I was completely against Diagnostics Dest idea but after a bit of searching and learning new tips tricks of adrci command like I think I am a big fan of diagnostics dest. Below is the script I wrote to monitor alert log hourly and with a daily summary. Script is self explanatory, the only thing it does is gets ADRCI homes put them in a bash array and grep the ORA-, TNS- errors for the last hour or last day. It is not rocket since and it is based on a simple command which can also be added to your profiles as an alias.

--last day
adrci exec="set home ${adrci_home}; show alert -p \\\"message_text like '%ORA-%' and originating_timestamp > systimestamp-1\\\"" 
--last hour 
adrci exec="set home ${adrci_home}; show alert -p \\\"message_text like '%ORA-%' and originating_timestamp > systimestamp-1/24\\\"" 

I am pretty sure this could be shortened but I did not bother to shorten it so I leave it to your imagination:)

I did not add listener checks to hourly checks but to daily summary, because there can be many which will cause you to get mail every hour.

Good part of using ADRCI and hourly checks is you do not need to edit the file to get rid of the error because you will never see that alert again in the next hour slot.

Script will send the output to the files and read those files back to the mail content and I am sure there may be better way to do this.

Complete script

#################################################
###### ALERT LOG CHECKING VIA ADRCI #############
#################################################
# Author : Coskan Gundogar
# Version Date: 02/12/2010
# Usage :  
# To run for last 24 hours - ./check_alert.sh D
# To run for last hour - ./check_alert.sh 
# Edit variables below for moving between servers 
# Changes To the Script 
#

export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/crs/oracle/product/11.2.0/grid/
export PATH=$PATH:$ORACLE_HOME/bin
DAILY_LOG=/home/oracle/bin/alert_log_check_daily.txt
HOURLY_LOG=/home/oracle/bin/alert_log_check_hourly.txt
MAIL_SUBJ="PRD:WARNING HOST: " 
MAIL_RECIPIENT="your_dba_group@your_company.com"

HOST_NAME=`hostname -a`


if [ "$1" = "D" ]
then

		############################################
		###############DAILY CHECKS ################
		############DBMS AND ASM CHECK##############
		############################################

		adrci_homes=( $(adrci exec="show homes" | grep -e rdbms -e asm))

		echo '####################################################' > $DAILY_LOG 
		echo '####### ALERT LOG OUTPUT FOR LAST 24 HOURS #########' >> $DAILY_LOG
		echo '####################################################' >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 

		for adrci_home in ${adrci_homes[@]}
		do 
			echo $adrci_home' Alert Log' >> $DAILY_LOG 
			adrci exec="set home ${adrci_home}; show alert -p \\\"message_text like '%ORA-%' and originating_timestamp > systimestamp-1\\\"" -term >> $DAILY_LOG 
		done


		############################################
		############## DAILY CHECKS ################
		############# LISTENER  CHECK###############
		############################################


		adrci_lsnr_homes=( $(adrci exec="show homes" | grep -e tnslsnr))

		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo '####################################################' >> $DAILY_LOG 
		echo '###### LISTENER LOG OUTPUT FOR LAST 24 Hours #######' >> $DAILY_LOG
		echo '####################################################' >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 
		echo ''  >> $DAILY_LOG 


		for adrci_lsnr_home in ${adrci_lsnr_homes[@]}
		do 
			echo $adrci_lsnr_home' Listener Log' >> $DAILY_LOG 
			adrci exec="set home ${adrci_lsnr_home}; show alert -p \\\"message_text like '%TNS-%' and originating_timestamp > systimestamp-1\\\""  -term >> $DAILY_LOG 
		done
		

		num_errors=`grep -c -e 'TNS' -e 'ORA' $DAILY_LOG`
		if [ $num_errors != 0 ]
		then 
		MAIL_SUBJ=$MAIL_SUBJ$HOST_NAME" Errors Found in Daily Alert Summary"
		mailx -s "$MAIL_SUBJ" $MAIL_RECIPIENT  $HOURLY_LOG 
		echo ''  >> $HOURLY_LOG 
		echo ''  >> $HOURLY_LOG 
		echo '####################################################' >> $HOURLY_LOG 
		echo '######### ALERT LOG OUTPUT FOR LAST HOUR ###########' >> $HOURLY_LOG
		echo '####################################################' >> $HOURLY_LOG 
		echo ''  >> $HOURLY_LOG 
		echo ''  >> $HOURLY_LOG 
		echo ''  >> $HOURLY_LOG 

		for adrci_home in ${adrci_homes[@]}
		do 
			echo $adrci_home' Alert Log' >> $HOURLY_LOG 
			adrci exec="set home ${adrci_home}; show alert -p \\\"message_text like '%ORA-%' and originating_timestamp > systimestamp-1/24\\\"" -term >> $HOURLY_LOG 
		done


		num_errors=`grep -c -e 'TNS' -e 'ORA' $HOURLY_LOG`
		if [ $num_errors != 0 ]
		then 
		MAIL_SUBJ=$MAIL_SUBJ$HOST_NAME" Errors Found in Hourly Alert Summary"
		mailx -s "$MAIL_SUBJ" $MAIL_RECIPIENT < $HOURLY_LOG
		fi

fi

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 203 other followers