_____________________________________________________________________________________________________________________
Error Description
The cluster is up and running on one node, but starting the cluster on the second node fails with the following error messages.
The crsctl check crs command reports the following errors:
[root@Node2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
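A check like this can be scripted. The following is a minimal sketch, not an Oracle-provided tool: the function name crs_stack_ok is hypothetical, and it only looks for the three failure codes seen in the session above (CRS-4535, CRS-4530, CRS-4534), so other failure messages would go unnoticed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "crsctl check crs" output on stdin and
# return non-zero if any of the communication-failure codes from the
# session above is present.
crs_stack_ok() {
    # grep -q exits 0 when a failure code is found; "!" inverts that,
    # so the function succeeds only when none of the codes appear.
    ! grep -Eq 'CRS-(4535|4530|4534)'
}
```

Assuming crsctl is on the PATH, it could be used as: crsctl check crs | crs_stack_ok || echo "CRS stack is not fully online on this node"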
Starting the HAIP resource with crsctl start res ora.cluster_interconnect.haip -init also failed, with CRS-2674: Start of 'ora.cssd' on 'Node2' failed.
[root@Node2 ~]# crsctl start res ora.cluster_interconnect.haip -init
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2674: Start of 'ora.cssd' on 'Node2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'Node2'
CRS-2681: Clean of 'ora.cssd' on 'Node2' succeeded
CRS-4000: Command Start failed, or completed with errors.
Solution Description
When I checked the ocssd.trc file, I noticed several repeated lines reporting that Node1 "has a disk HB, but no network HB". In other words, Node2 could see Node1's heartbeat on the shared voting disk but received no heartbeat over the private interconnect, which pointed to a network problem. When I pinged the interconnect/private IPs between the RAC nodes, they did not respond.
Log/Trace File: /u01/app/grid/diag/crs/Node2/crs/trace/ocssd.trc
2016-09-23 08:22:06.378508 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:06.636082 : CSSD:2620225280: clssnmPollingThread: state(1) clusterState(0) exit
2016-09-23 08:22:06.636092 : CSSD:2620225280: clssscExit: removeNode() already called
2016-09-23 08:22:06.636095 : CSSD:2620225280: clssscExit: abort already set 0
2016-09-23 08:22:06.742376 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966100, LATS 16055504, lastSeqNo 966099, uniqueness 1474627759, timestamp 1474633343/1310023344
2016-09-23 08:22:07.378630 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:07.743076 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966101, LATS 16056504, lastSeqNo 966100, uniqueness 1474627759, timestamp 1474633344/1310024344
2016-09-23 08:22:08.378747 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:08.743849 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966102, LATS 16057504, lastSeqNo 966101, uniqueness 1474627759, timestamp 1474633345/1310025344
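Rather than scrolling through the trace by eye, the repeated message can be counted with grep. This is a rough sketch; the function name count_missing_network_hb is made up for illustration, and the search string is simply the phrase that appears in the trace above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many lines in a CSSD trace file report
# a disk heartbeat without a matching network heartbeat.
count_missing_network_hb() {
    local trace_file="$1"
    # grep -c prints the number of matching lines. It exits 1 when the
    # count is 0, so "|| true" keeps the function from failing in
    # scripts that run with "set -e".
    grep -c 'has a disk HB, but no network HB' "$trace_file" || true
}
```

For the trace file in this post: count_missing_network_hb /u01/app/grid/diag/crs/Node2/crs/trace/ocssd.trc. A count that keeps growing while CSSD is trying to start suggests the interconnect is worth checking first.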
I tried to ping the interconnect IP between node 1 and node 2, and it was not reachable; the cause turned out to be a VLAN problem. The network team fixed it at my request, and that resolved the issue.
[root@Node2 ~]# ping Node1priv.abnsayrate.net
PING Node1priv.abnsayrate.net (10.188.60.61) 56(84) bytes of data.
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=1 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=2 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=3 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=4 Destination Host Unreachable
^C
--- Node1priv.abnsayrate.net ping statistics ---
7 packets transmitted, 0 received, +4 errors, 100% packet loss, time 6000ms
pipe 4
[root@Node2 ~]# ^C
[root@Node2 ~]# ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
From 10.188.60.61 icmp_seq=1 Destination Host Unreachable
From 10.188.60.61 icmp_seq=2 Destination Host Unreachable
From 10.188.60.61 icmp_seq=3 Destination Host Unreachable
From 10.188.60.61 icmp_seq=4 Destination Host Unreachable
^C^C
--- 10.188.60.61 ping statistics ---
6 packets transmitted, 0 received, +4 errors, 100% packet loss, time 5000ms
pipe 4
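The same reachability test can be wrapped in a small function so it can run before each attempt to start the cluster. This is only a sketch: interconnect_reachable is a hypothetical name, and the -c and -W options assume the Linux iputils ping used in the session above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given private-interconnect
# address or hostname answers ping.
interconnect_reachable() {
    local host="$1"
    # -c 3: send three probes, -W 2: wait up to 2 seconds for each reply
    # (Linux iputils options). ping exits non-zero when no reply arrives.
    ping -c 3 -W 2 "$host" > /dev/null 2>&1
}
```

For example, on Node2: interconnect_reachable 10.188.60.61 || echo "Node1 private IP is unreachable, check the interconnect network". ICMP reachability alone does not prove the interconnect is healthy, but a failure here is a strong hint that the problem is below the clusterware.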
After the network issue was fixed, I was able to ping the IP and then start the cluster.
[oracle@Node2 ~]$ ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
64 bytes from 10.188.60.61: icmp_seq=1 ttl=64 time=0.116 ms
64 bytes from 10.188.60.61: icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from 10.188.60.61: icmp_seq=3 ttl=64 time=0.122 ms
^C
--- 10.188.60.61 ping statistics ---
Start the cluster
[root@Node2 ~]# crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'Node2'
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2676: Start of 'ora.crf' on 'Node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'Node2'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'Node2'
CRS-2676: Start of 'ora.ctssd' on 'Node2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'Node2'
CRS-2676: Start of 'ora.asm' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'Node2'
CRS-2676: Start of 'ora.storage' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'Node2'
CRS-2676: Start of 'ora.crsd' on 'Node2' succeeded
_____________________________________________________________________________________________________________________