The "watchdog" pipe is created by the ProcD when it starts up, and is only ever deleted by Condor when the ProcD shuts down.

Is it possible that something outside of Condor is deleting the pipe? We have seen problems like this before with programs like tmpwatch (although I guess it's doubtful that tmpwatch is running over your /home/condor/hosts/wolf10/log/ directory).

Come to think of it, /home/condor/hosts/wolf10/log sounds like it could be on NFS. It's perfectly fine to have your LOG directory on NFS, but in that case you must also have a separate, local LOCK directory (which is where things like the ProcD's pipes are stored). Please make sure that your LOCK setting refers to a directory on local disk.
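
If it helps, here is a rough way to check whether a given directory sits on NFS. This is only a sketch, not anything that ships with Condor: it assumes a Linux machine where /proc/mounts is readable, the function names are made up for illustration, and it ignores mount points containing escaped spaces. You can get the directory to test from "condor_config_val LOCK".

```python
import os

def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount containing `path`.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    Returns None if no mount point matches.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains `path` if they are equal or if `path`
        # lies underneath the mount point.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest (most specific) mount point; on a tie,
            # the later line wins because it was mounted on top.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type

def is_network_fs(fs_type):
    """True for NFS-style filesystem types such as 'nfs' and 'nfs4'."""
    return fs_type is not None and fs_type.startswith("nfs")

def check_lock_dir(lock_dir):
    """Print whether `lock_dir` looks local, using this host's /proc/mounts."""
    with open("/proc/mounts") as f:
        fs_type = fs_type_for(os.path.realpath(lock_dir), f.read())
    where = "on NFS" if is_network_fs(fs_type) else "not on NFS"
    print("%s is %s (filesystem type: %s)" % (lock_dir, where, fs_type))
```

Calling check_lock_dir() on the affected node with the value of your LOCK setting should tell you whether the pipes are being created on a network mount.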


Fernando Rannou wrote:
I'm getting the following error in one of the StarterLogs:
1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
1/28 11:20:04 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
1/28 11:20:04 ShutdownFast all jobs.
Clearly the "pipe" files are not there. What should I do?
We restarted Condor on all nodes, but the files did not appear.

This has happened on a couple of nodes. All other nodes do have the
watchdog file:

prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog