
Re: [Condor-users] watchdog pipe file missing



Hi Greg,
I have some more information.

1. After executing condor_restart -startd, the watchdog
files are still not created. There is only the StartLog,
which shows an error:
1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host <192.168.10.10:32851> for command 60005 (DC_OFF_GRACEFUL), access level ADMINISTRATOR


However, the node still shows up in condor_status. Is that expected?

2. When I submit my first job, I get this error in StarterLog.slot1:

1/29 10:25:37 About to exec /bin/date --universal
1/29 10:25:37 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
1/29 10:25:37 ShutdownFast all jobs.


Thanks for your patience, Greg
Fernando
On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
Fernando Rannou wrote:
> Thanks Greg
>
> but then, what should I do to create the file
> in the meantime?
>
> Fernando

I'm pretty sure that restarting the StartD (condor_restart -startd) on
each machine that is missing the file should do the trick.

Greg
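
To see which machines are missing the file, one option is to check for the two named pipes directly. The sketch below is only an illustration and not part of Condor: the helper name `missing_procd_pipes` is made up, and the pipe names and directory are simply the ones that appear in the log lines and directory listing in this thread.

```python
import os
import stat

# Pipe names as they appear in the directory listing quoted in this thread.
PIPE_NAMES = ("procd_pipe.STARTD", "procd_pipe.STARTD.watchdog")


def missing_procd_pipes(directory, names=PIPE_NAMES):
    """Return the names from `names` that are not named pipes in `directory`.

    Hypothetical helper for illustration; it is not a Condor tool.
    """
    missing = []
    for name in names:
        path = os.path.join(directory, name)
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            # Same condition the Starter reports as
            # "No such file or directory (2)".
            missing.append(name)
            continue
        if not stat.S_ISFIFO(mode):
            # Something exists at that path, but it is not a FIFO.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Directory taken from the StarterLog error in this thread; adjust per host.
    print(missing_procd_pipes("/home/condor/hosts/wolf10/log"))
```

An empty list means both pipes are present in that directory; any name in the list is a pipe the Starter would fail to open there.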

> On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>
>     Fernando,
>
>     The "watchdog" pipe is created by the ProcD when it starts up, and is
>     only ever deleted by Condor when the ProcD shuts down.
>
>     Is it possible that something outside of Condor is deleting the pipe? We
>     have seen problems like this before with programs like tmpwatch
>     (although I guess it's doubtful that tmpwatch is running over your
>     /home/condor/hosts/wolf10/log/ directory).
>
>     Come to think of it, /home/condor/hosts/wolf10/log sounds like it could
>     be on NFS. It's perfectly fine to have your LOG directory on NFS, but in
>     that case you need a separate local LOCK directory (where things like
>     the ProcD's pipes are stored). Please make sure that your LOCK setting
>     refers to a local directory.
>
>     Thanks,
>
>     Greg Quinn
>     Condor Team
>
>     Fernando Rannou wrote:
>      > Hello,
>      > I'm getting the following error in one of the StarterLog files:
>      > ------------------------
>      > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
>      > 1/28 11:20:04 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
>      > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
>      > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
>      > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
>      > 1/28 11:20:04 ShutdownFast all jobs.
>      > --------------------------
>      > Clearly the "pipe" files are not there. What should I do?
>      > We restarted Condor on all nodes, but the files did not appear.
>      >
>      > This has happened on a couple of nodes. All other nodes do have the
>      > watchdog file:
>      >
>      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
>      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog
>      > -
>      > Thanks
>      >
>      > Fernando
>     _______________________________________________
>     Condor-users mailing list
>     To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>     <mailto:condor-users-request@xxxxxxxxxxx> with a
>     subject: Unsubscribe
>     You can also unsubscribe by visiting
>     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>     The archives can be found at:
>     https://lists.cs.wisc.edu/archive/condor-users/