I was working fairly late last night, trying to get a new install working. I ran into a bug where the permissions on ohasd were incorrect after patching GI. I went to a working server to see what the permissions should be, built my chown and chmod statements, and pasted them into my terminal window. Unfortunately, I pasted them into the wrong terminal, and I had also managed to copy the wrong permissions. I changed the ownership of ohasd on the first node of my production RAC cluster. Apparently those permissions are really important, because the whole node went down.

A little bit of panic set in, and I wasn’t sure what I had done. I didn’t realize I had pasted the permission statements into the wrong window, and the error messages weren’t very helpful.

[root@node1 bin]# ./crsctl stop crs -f
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Stop failed, or completed with errors.
[root@node1 bin]# ps -ef | grep d.bin
root 33615 1 0 16:53 ? 00:00:00 /u01/app/ reboot
root 53203 1 0 16:57 ? 00:00:00 /u01/app/ reboot
root 79744 1 0 17:07 ? 00:00:00 /u01/app/ reboot
root 105598 103980 0 17:17 pts/2 00:00:00 grep d.bin
[root@node1 bin]# kill -9 33615 53203 79744
[root@node1 bin]# ps -ef | grep d.bin
root 106623 103980 0 17:17 pts/2 00:00:00 grep d.bin
[root@node1 bin]# date
Thu Jan 5 17:18:02 EST 2017
[root@node1 bin]# ./crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.

I was getting nothing in the logs.

I figured it must be a permission issue, but I wasn’t quite sure what to reset them to.
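In hindsight, two small habits would have saved me here: recording the existing permissions before touching anything, and checking which host I was on before running the change. Here is a minimal sketch of both, assuming Linux with GNU coreutils (for `stat -c`). The function names `snapshot_perms` and `require_host` are made up for illustration, and the host name and paths in the usage comment are placeholders, not the real ones from my cluster.

```shell
# Hypothetical helper: print octal mode, owner:group, and name for each file,
# so the current state can be saved and compared or restored later.
# Assumes GNU stat (%a = octal mode, %U = owner, %G = group, %n = file name).
snapshot_perms() {
    stat -c '%a %U:%G %n' "$@"
}

# Hypothetical guard: return non-zero unless this shell is on the expected host.
# uname -n prints the node name; it may be a fully qualified name on some systems.
require_host() {
    expected="$1"
    actual="$(uname -n)"
    if [ "$actual" != "$expected" ]; then
        echo "refusing to run: on $actual, expected $expected" >&2
        return 1
    fi
}

# Usage (placeholder host and paths):
#   require_host newnode1 && snapshot_perms /path/to/file > /tmp/perms.before
```

Chaining the chown and chmod statements behind `require_host ... &&` means a paste into the wrong window fails loudly instead of changing anything, and the saved snapshot answers the question of what to reset the permissions to.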

Apparently I am not the first person to do this, because Oracle has a support document for fixing it:

How to check and fix file permissions on Grid Infrastructure environment (Doc ID 1931142.1)

I ran

./rootcrs.pl -init

rebooted the node, and all was right with the world, except for my ego being kind of damaged from making such a silly mistake.

