Four months ago, while we were archiving our database (a 2-node RAC), the archiving queries kept failing with ORA-01555 errors.
I decided to increase the undo_retention parameter with the command
alter system set undo_retention=36000 scope=both;
Then the DB hung. One node was still accepting “sqlplus / as sysdba”, while the
other node (the one where I had altered the DB) was not.
srvctl stop database -d DB_NAME
srvctl stop instance -d DB_NAME -i INSTANCE_NAME
commands did not work for the unresponsive node.
I took system state dumps from the working node and killed the other node's processes from the OS.
After a CRS restart, everything worked fine.
I did not understand why this happened until I got the response to my Metalink TAR.
This was related to bug 4220405 (https://metalink.oracle.com/metalink/plsql/showdoc?db=Bug&id=4220405),
"ALTER SYSTEM SET UNDO_RETENTION=<N> HANGS RAC INSTANCES", which has been closed with base bug 3023661.
The problem is caused by a deadlock between CKPT and a PZ99 slave. The internal algorithm that queries the
current value of undo_retention on each instance and modifies it is flawed: it performs an unnecessary gv$ query and
lob$ update when an spfile is used. It is a coding problem. The bug has been fixed in 10.2, and it is not backportable to 10.1.0.x because of code structure changes.
The way to set UNDO_RETENTION on RAC is one instance at a time, as below:
alter system set undo_retention=1800 sid='RAC1';
alter system set undo_retention=1800 sid='RAC2';
Be careful when changing undo_retention on RAC.
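As a rough sketch (not part of the Metalink response), the queries below could be run on each node to check the result and to pick a sensible value. They assume a local sqlplus session as sysdba on the instance being checked. They use the local v$ views rather than gv$ views, which is my own cautious choice and not something the bug text requires.

```sql
-- Run on each instance separately (local v$ views only).

-- 1) Confirm the value this instance is actually using.
select name, value
  from v$parameter
 where name = 'undo_retention';

-- 2) Rough sizing hint: longest-running query (in seconds) and the
--    number of ORA-01555 errors recorded in the undo statistics window.
select max(maxquerylen) as longest_query_sec,
       sum(ssolderrcnt) as ora_1555_count
  from v$undostat;
```

If longest_query_sec is well above the current undo_retention, that is a hint the parameter (and the undo tablespace) may be too small for the archiving queries.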