Update 22/12/2009: After the first comment on this post, I now know that there is an easier way to deal with the problem.
How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation [ID 942166.1] (its last update is later than my post, so it might be related). Basically:
Step 1: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force” on all nodes except the last one.
Step 2: As root, run “$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode” on the last node. This command will also zero out the OCR and voting disk (VD).
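The two steps from the note can be sketched as a small dry-run helper. It only prints the command each node would need and runs nothing. The grid home and node names in the usage line are taken from the logs in this post, and treating solarac2 as the last node is purely an assumption for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the deconfig command for each node, in order.
# Nothing is executed; the grid home and node names are placeholders.
print_deconfig_plan() {
    grid_home=$1
    shift
    total=$#
    i=0
    for node in "$@"; do
        i=$((i + 1))
        cmd="$grid_home/crs/install/rootcrs.pl -verbose -deconfig -force"
        # Only the final node in the list gets -lastnode (Step 2 of the note).
        if [ "$i" -eq "$total" ]; then
            cmd="$cmd -lastnode"
        fi
        printf '%s: %s\n' "$node" "$cmd"
    done
}

# Assumed node order; the last argument is treated as the last node.
print_deconfig_plan /u01/grid/11.2.0 solarac1 solarac2
```

Each printed command would still have to be run as root on the node named in front of it.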
For the last 3 days I have been busy installing Oracle RAC on Solaris 10 x64 on VMware. I am planning to write detailed documentation, but first I want to describe an issue that I managed to solve during the installation.
During the grid infrastructure installation everything went fine until I ran the root.sh script for cluster configuration. The script failed with the error stack below (I truncated the part that worked):
# /u01/app/11.2.0/grid/root.sh
....
....
....
ASM created and started successfully.
DiskGroup DATA created successfully.
Errors in file :
ORA-27091: unable to queue I/O
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at line 4
PROT-1: Failed to initialize ocrconfig
Command return code of 255 (65280) from command: /u01/grid/11.2.0/bin/ocrconfig -upgrade grid oinstall
Failed to create Oracle Cluster Registry configuration, rc 255
CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource "ora.crsd -init" failed
Failed to stop CRSD
CRS-2673: Attempting to stop 'ora.asm' on 'solarac2'
CRS-2677: Stop of 'ora.asm' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'solarac2'
CRS-2677: Stop of 'ora.ctssd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'solarac2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'solarac2'
CRS-2677: Stop of 'ora.cssd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'solarac2'
CRS-2677: Stop of 'ora.gpnpd' on 'solarac2' succeeded
CRS-2679: Attempting to clean 'ora.gpnpd' on 'solarac2'
CRS-2681: Clean of 'ora.gpnpd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'solarac2'
CRS-2677: Stop of 'ora.gipcd' on 'solarac2' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'solarac2'
CRS-2677: Stop of 'ora.mdnsd' on 'solarac2' succeeded
Initial cluster configuration failed. See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details
I tried to run root.sh again, which I shouldn’t have done because the documentation says not to. (I have to confess that I did not read the installation document carefully.)
This time the error stack was different:
# /u01/app/11.2.0/grid/root.sh
Running Oracle 11g root.sh script...
.........
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2009-12-06 22:57:05: Parsing the host name
2009-12-06 22:57:05: Checking for super user privileges
2009-12-06 22:57:05: User has super user privileges
Using configuration parameter file: /u01/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.
As you can see, it didn’t allow me to re-run it. I needed to find a way to deconfigure the existing configuration. After a quick search of the official documentation I found a way.
According to the documentation, all I needed to do was run the command below and then re-run root.sh:
/crs/install/rootcrs.pl -deconfig
Here is what happened when I ran the deconfigure command:
2009-12-07 00:35:17: Parsing the host name
2009-12-07 00:35:17: Checking for super user privileges
2009-12-07 00:35:17: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
Oracle Clusterware stack is not active on this node
Restart the clusterware stack (use /u01/grid/11.2.0/bin/crsctl start crs) and retry
Failed to verify resources
It still wasn’t working. I tried the force option, and it seemed to deconfigure successfully (maybe):
# /u01/grid/11.2.0/crs/install/rootcrs.pl -deconfig -force
2009-12-07 00:39:13: Parsing the host name
2009-12-07 00:39:13: Checking for super user privileges
2009-12-07 00:39:13: User has super user privileges
Using configuration parameter file: /u01/grid/11.2.0/crs/install/crsconfig_params
PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
PRCR-1068 : Failed to query resources
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.gsd is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.ons is registered
Cannot communicate with crsd
PRCR-1070 : Failed to check if resource ora.eons is registered
Cannot communicate with crsd
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node
It says it deconfigured successfully, but when I ran root.sh again I got this:
Disk Group DATA already exists. Cannot be created again
Configuration of ASM failed, see logs for details
Did not succssfully configure and start ASM
CRS-2500: Cannot stop resource 'ora.crsd' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.crsd -init
Stop of resource "ora.crsd -init" failed
Failed to stop CRSD
CRS-2500: Cannot stop resource 'ora.asm' as it is not running
CRS-4000: Command Stop failed, or completed with errors.
Command return code of 1 (256) from command: /u01/grid/11.2.0/bin/crsctl stop resource ora.asm -init
Stop of resource "ora.asm -init" failed
Failed to stop ASM
CRS-2673: Attempting to stop 'ora.ctssd' on 'solarac1'
CRS-2677: Stop of 'ora.ctssd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'solarac1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'solarac1'
CRS-2677: Stop of 'ora.cssd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'solarac1'
CRS-2677: Stop of 'ora.gpnpd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'solarac1'
CRS-2677: Stop of 'ora.gipcd' on 'solarac1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'solarac1'
CRS-2677: Stop of 'ora.mdnsd' on 'solarac1' succeeded
Initial cluster configuration failed. See /u01/grid/11.2.0/cfgtoollogs/crsconfig/rootcrs_solarac2.log for details
The log file mentioned above says:
2009-12-07 00:43:26: Executing as grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26: Running as user grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM
2009-12-07 00:43:26: Invoking "/u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM" as user "grid"
2009-12-07 00:43:30: Configuration of ASM failed, see logs for details
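The log lines above already name every device that was handed to asmca. A minimal sketch for pulling that list out of such a line, one device per line, could look like this. The function name is made up for illustration, and the pattern assumes the -diskList value contains no spaces.

```shell
#!/bin/sh
# Read log lines on stdin, pull out the comma-separated value that follows
# -diskList, and print each device once, one per line.
extract_disk_list() {
    sed -n 's/.*-diskList \([^ ]*\).*/\1/p' | tr ',' '\n' | sort -u
}

# Sample input: the first asmca line from the log shown above.
log_line='2009-12-07 00:43:26: Executing as grid: /u01/grid/11.2.0/bin/asmca -silent -diskGroupName DATA -diskList /dev/rdsk/c1t1d0s1,/dev/rdsk/c1t2d0s1,/dev/rdsk/c1t3d0s1,/dev/rdsk/c1t4d0s1 -redundancy EXTERNAL -configureLocalASM'
printf '%s\n' "$log_line" | extract_disk_list
```

Feeding the whole log file through the function works the same way, because sort -u collapses the repeated asmca entries.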
Basically, root.sh configures ASM with the asmca command. The asmca utility does not have a drop diskgroup option, which makes it unusable in this situation. (There is a deleteasm option, but it does not help here because it needs a working ASM instance, which wasn’t available after the failed root.sh.)
I didn’t want to delete the whole CRS installation, so I needed a way to remove the disk group information from the ASM disks.
All I needed was the dd command to remove the disk header information from the devices.
I had 4 disks presented for that disk group, so I used the dd command on all of them. (I am not sure; maybe I only needed the first device. I need to check Julian Dyke’s invaluable presentation about ASM Internals.)
# dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1024K count=100
dd: bad numeric argument: "1024K"
bash-3.00# dd if=/dev/zero of=/dev/rdsk/c1t2d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t1d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t3d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
# dd if=/dev/zero of=/dev/rdsk/c1t4d0s1 bs=1k count=1000000
1000000+0 records in
1000000+0 records out
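The first attempt in the session failed because that dd rejected the uppercase "1024K" block size, while the lowercase "1k" form was accepted. To try the same zeroing technique safely, here is a small sketch that runs against a throwaway file instead of a raw device. The function name, the scratch file path and the sizes are assumptions for the demo only, and it says nothing about how much of a real ASM disk has to be overwritten.

```shell
#!/bin/sh
# Zero the first N kilobytes of a target without truncating the rest.
# Sketch only: point it at a scratch file, never at a disk you still need.
zero_head() {
    target=$1
    kilobytes=$2
    # bs=1k is the lowercase form that worked in the session above;
    # conv=notrunc keeps the remainder of a regular file in place.
    dd if=/dev/zero of="$target" bs=1k count="$kilobytes" conv=notrunc 2>/dev/null
}

# Demo: build a 2 MB file full of 'x' bytes, then zero its first 1 MB.
scratch=${TMPDIR:-/tmp}/asm_header_demo.$$
dd if=/dev/zero bs=1k count=2048 2>/dev/null | tr '\0' 'x' > "$scratch"
zero_head "$scratch" 1024
# Count the non-zero bytes left in the first 1024 KB (expected: 0).
dd if="$scratch" bs=1k count=1024 2>/dev/null | tr -d '\0' | wc -c
rm -f "$scratch"
```

On a raw device conv=notrunc should make no practical difference; it matters here only because the demo target is a regular file.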
After this deletion I re-ran the deconfigure script and then re-ran root.sh. Everything worked without any problem at all. The story will continue with How to install 11gR2 RAC on Solaris 10 on VMware (give me a bit more time to finish).
Footnote: A similar issue is reported on Metalink for Linux (ML 955550.1).
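Put together, the order that worked for me was: wipe the disk headers, deconfigure with the force option, then run root.sh again. The sketch below only prints that sequence for a given grid home and disk list so it can be reviewed before anything is run. The function name is hypothetical and the arguments in the usage line are examples.

```shell
#!/bin/sh
# Print (do not run) the recovery sequence for one node:
# 1) zero each disk, 2) forced deconfig, 3) root.sh again.
print_recovery_plan() {
    grid_home=$1
    shift
    for disk in "$@"; do
        echo "dd if=/dev/zero of=$disk bs=1k count=1000000"
    done
    echo "$grid_home/crs/install/rootcrs.pl -deconfig -force"
    echo "$grid_home/root.sh"
}

# Example arguments only; substitute your own grid home and devices.
print_recovery_plan /u01/grid/11.2.0 /dev/rdsk/c1t1d0s1 /dev/rdsk/c1t2d0s1
```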
Sources used
Oracle® Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Solaris Operating System
How to use Files in place of Real Disk Devices for ASM – (Solaris) by Jeff Hunter
How to rerun root.sh during initial installation of GRID Infrastructure. by RACHELP
