Coskan’s Approach to Oracle

December 7, 2009

root.sh failed after ASM disk creation for 11GR2 Grid Infrastructure

Filed under: RAC — coskan @ 4:51 pm

Update 22/12/2009: After the first comment on this post, I now know there is an easier way to deal with the problem.

How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation [ID 942166.1] (its last update is later than my post, which might be related :) )

Basically:

Step 1: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force” on all nodes, except the last one.

Step 2: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode” on the last node. This command also zeroes out the OCR and voting disk.
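The two steps from the note can be sketched for a two-node cluster like the one in this post (solarac1, solarac2). Running the commands over ssh as root and the exact GRID_HOME path are assumptions; adapt them to your own environment.

```shell
# Sketch of MOS note 942166.1 for a two-node cluster.
# GRID_HOME and the ssh-as-root invocation are assumptions.
GRID_HOME=/u01/app/11.2.0/grid

# Step 1: deconfigure every node EXCEPT the last one, as root.
ssh root@solarac1 "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force"

# Step 2: on the last node, -lastnode also zeroes the OCR and voting disk.
ssh root@solarac2 "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode"
```

After both steps complete, root.sh can be re-run from a clean state.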


For the last 3 days I have been busy installing Oracle RAC on Solaris 10 x64 on VMware. I am planning to write detailed documentation, but beforehand I want to write up an issue I managed to solve during the installation.

During the grid infrastructure installation everything went fine until I ran the root.sh script for cluster configuration. The script failed with the error stack below (I truncated the part that worked):

# /u01/app/11.2.0/grid/root.sh
....
....
....
ASM created and started successfully.

DiskGroup DATA created successfully.

Errors in file :
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
PROT-1: Failed to initialize ocrconfig
Command return code of 255 (65280) from command: /u01/grid/11.2.0/bin/ocrconfig -upgrade grid oinstall
Failed to create Oracle Cluster Registry configuration, rc 255
CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource "ora.crsd -init" failed
Failed to stop CRSD
CRS-2673: Attempting to stop 'ora.asm' on 'solarac2'
CRS-2677: Stop of 'ora.asm' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'solarac2'
CRS-2677: Stop of 'ora.ctssd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'solarac2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'solarac2'
CRS-2677: Stop of 'ora.cssd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'solarac2'
CRS-2677: Stop of 'ora.gpnpd' on 'solarac2' succeeded
CRS-2679: Attempting to clean 'ora.gpnpd' on 'solarac2'
CRS-2681: Clean of 'ora.gpnpd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'solarac2'
CRS-2677: Stop of 'ora.gipcd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'solarac2'
CRS-2677: Stop of 'ora.mdnsd' on 'solarac2' succeeded
Initial cluster configuration failed.  See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details
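ORA-15081 usually means the ASM instance could not write to one of the candidate devices, so a first check worth doing (a hypothetical diagnostic, not something from the original run) is to confirm the grid owner can actually access the raw devices. The device paths below are the ones used later in this install; the expected owner/group is an assumption based on a typical grid install.

```shell
# Hypothetical first check for ORA-15081: the grid software owner must be
# able to read and write the raw devices handed to ASM.
for d in /dev/rdsk/c1t1d0s1 /dev/rdsk/c1t2d0s1 \
         /dev/rdsk/c1t3d0s1 /dev/rdsk/c1t4d0s1
do
    # Expect something like grid:oinstall with mode 660 on each device.
    ls -lL "$d"
done
```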

I tried to run root.sh again, which I should not have done because the documentation says not to. (I have to confess I did not read the installation document carefully.)

This time the error stack was different:

# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
.........
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2009-12-06 22:57:05: Parsing the host name
2009-12-06 22:57:05: Checking for super user privileges
2009-12-06 22:57:05: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.

As you can see, it did not allow me to re-run it. I needed a way to deconfigure the existing configuration, and after a quick search I found the way in the official documentation.

According to the doc, all I needed to do was run the command below and then re-run root.sh:

/crs/install/rootcrs.pl -deconfig

Here is what happened when I ran the deconfigure:

2009-12-07 00:35:17: Parsing the host name
2009-12-07 00:35:17: Checking for super user privileges
2009-12-07 00:35:17: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
Oracle Clusterware stack is not active on this node
Restart the clusterware stack (use /u01/grid/11.2.0/bin/crsctl start crs) and retry
Failed to verify resources

Still not working. I tried the -force option, and it seemed to deconfigure successfully (maybe :) )

# /u01/grid/11.2.0/crs/install/rootcrs.pl -deconfig -force
2009-12-07 00:39:13: Parsing the host name
2009-12-07 00:39:13: Checking for super user privileges
2009-12-07 00:39:13: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd

CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

It said it deconfigured successfully, but when I ran root.sh again I got this:

Disk Group DATA already exists. Cannot be created again

Configuration of ASM failed, see logs for details
Did not succssfully configure and start ASM
CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource "ora.crsd -init" failed
Failed to stop CRSD
CRS-2500: Cannot stop resource 'ora.asm' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.asm -init
Stop of resource "ora.asm -init" failed
Failed to stop ASM
CRS-2673: Attempting to stop 'ora.ctssd' on 'solarac1'
CRS-2677: Stop of 'ora.ctssd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'solarac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'solarac1'
CRS-2677: Stop of 'ora.cssd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'solarac1'
CRS-2677: Stop of 'ora.gpnpd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'solarac1'
CRS-2677: Stop of 'ora.gipcd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'solarac1'
CRS-2677: Stop of 'ora.mdnsd' on 'solarac1' succeeded
Initial cluster configuration failed.  See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details

The mentioned logfile says:

2009-12-07 00:43:26: Executing as grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3
d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26: Running as user grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c
1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26:   Invoking "/u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/d
ev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM" as user "grid"
2009-12-07 00:43:30: Configuration of ASM failed, see logs for details

Basically, root.sh configures ASM with the asmca command. The asmca utility does not have a drop diskgroup option, which makes it unusable in this situation. (There is a deleteasm option, but it does not help here because it needs a working ASM instance, which was not available after the failed root.sh.)
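For completeness: had the local ASM instance still been up, the diskgroup could have been dropped cleanly from SQL*Plus instead of wiping disk headers. This is a hypothetical alternative that did not apply here, since the failed root.sh left no running ASM instance; the SID and home path are assumptions.

```shell
# Hypothetical clean alternative, usable only when the local ASM instance
# is running (which was NOT the case after the failed root.sh).
export ORACLE_SID=+ASM1                 # assumed ASM SID on node 1
export ORACLE_HOME=/u01/grid/11.2.0     # grid home from this install

$ORACLE_HOME/bin/sqlplus -S / as sysasm <<'EOF'
DROP DISKGROUP DATA INCLUDING CONTENTS;
EXIT;
EOF
```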

I did not want to delete the whole CRS installation, so I needed a way to remove the diskgroup information from the ASM disks.

All I needed was the dd command to wipe the disk header information from the devices.

I had 4 disks presented for that diskgroup, so I used dd on all of them. (I am not sure whether I actually needed only the first device; I need to check Julian Dyke's invaluable presentation about ASM internals.)

# dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1024K count=100
dd: bad numeric argument: "1024K"
bash-3.00# dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t1d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t3d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t4d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
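The pattern above (zero the start of each device so ASM no longer recognizes the diskgroup header) is easy to get wrong against the wrong device, so it is worth rehearsing on a scratch file first. The sketch below uses a hypothetical temporary file as a stand-in for a raw device and verifies the wiped region is all zero bytes; conv=notrunc is needed on a regular file so dd writes in place.

```shell
# Practice the header-wipe pattern on a scratch file, NOT a real device.
# /tmp/fake_asm_disk is a hypothetical stand-in for /dev/rdsk/c1t1d0s1.
SCRATCH=/tmp/fake_asm_disk

# Simulate a disk that starts with a non-zero ASM-style header tag.
printf 'ORCLDISKDATA0000' > "$SCRATCH"

# Overwrite the first 16 KB with zeros, in place.
dd if=/dev/zero of="$SCRATCH" bs=1024 count=16 conv=notrunc 2>/dev/null

# Verify: stripping NUL bytes from an all-zero file leaves nothing.
[ -z "$(tr -d '\0' < "$SCRATCH")" ] && echo "wipe ok" || echo "wipe failed"

rm -f "$SCRATCH"
```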

After this wipe I re-ran the deconfigure script and then root.sh. Everything worked without any problem. The story will continue with how to install 11gR2 RAC on Solaris 10 on VMware (give me a bit more time to finish).

Footnote: A similar issue is reported on Metalink for Linux (ML 955550.1).

Sources used:
Oracle® Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Solaris Operating System

How to use Files in place of Real Disk Devices for ASM – (Solaris) by Jeff Hunter

How to rerun root.sh during initial installation of GRID Infrastructure, by RACHELP
