
Re: [Condor-users] watchdog pipe file missing



Hello,

Fernando Rannou wrote:

1. after executing condor_restart -startd the watchdog
files are not created. There is only the StartLog
which shows an error:
1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host <192.168.10.10:32851> for command 60005 (DC_OFF_GRACEFUL), access level ADMINISTRATOR

The StartD was never actually restarted, since your condor_restart command was denied permission. The HOSTALLOW_ADMINISTRATOR setting determines the machines from which you can issue a condor_restart. Your HOSTALLOW_ADMINISTRATOR setting is probably at its default, which includes only the central manager. So you could:

1) Issue all the needed condor_restart commands from the central manager
   using the form "condor_restart -startd <hostname>"

2) Loosen your HOSTALLOW_ADMINISTRATOR setting if the security
   implications of doing so don't concern you. For example, setting

   HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)

   would give someone logged into any host in your pool the ability to
   send administrative commands to the Condor daemons running on that
   host.
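
Before changing anything, it may help to see exactly which hosts and commands are being refused. Here is a minimal sketch in Python that pulls those fields out of StartLog lines. It assumes the line format shown in the log excerpt above; the function name denied_commands and the dictionary keys are hypothetical, not part of Condor.

```python
import re

# Pattern based on the PERMISSION DENIED line quoted in this thread.
# Assumption: other Condor versions may word this line differently.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host <(?P<host>[^:>\s]+):(?P<port>\d+)"
    r".*? for command (?P<number>\d+) \((?P<name>\w+)\), access level (?P<level>\w+)"
)


def denied_commands(lines):
    """Return one dict per log line that records a denied command."""
    found = []
    for line in lines:
        match = DENIED.search(line)
        if match:
            found.append(match.groupdict())
    return found
```

Feeding it the lines of a StartLog (for example via open(path)) gives a list you can group by host, which shows whether the denied requests come from a machine you would want to add to HOSTALLOW_ADMINISTRATOR.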

However, the node still shows on condor_status ??

Right, the StartD never exited and is still reporting itself to the Collector.

2. when I submit my first job, I get this error on StarterLog.slot1

1/29 10:25:37 About to exec /bin/date --universal
1/29 10:25:37 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
1/29 10:25:37 ShutdownFast all jobs.

Sure - same error as before since the StartD hasn't been restarted.
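
Once a restart does go through, one way to confirm that the ProcD pipes are back is to check that each path exists and is a named pipe. A minimal sketch in Python; the function name pipe_status is hypothetical, and the file names to check are the ones from your listing (procd_pipe.STARTD and procd_pipe.STARTD.watchdog).

```python
import os
import stat


def pipe_status(path):
    """Classify path as 'missing', 'fifo', or 'not-a-fifo'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    # Named pipes show up with a leading 'p' in ls -l output.
    return "fifo" if stat.S_ISFIFO(mode) else "not-a-fifo"
```

For the failing node, pipe_status("/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog") should stop reporting "missing" after the StartD has really been restarted.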

Later,

Greg Quinn
Condor Team

Thanks for your patience, Greg
Fernando
On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:

    Fernando Rannou wrote:
     > Thanks Greg
     >
     > but then, what should I do to create the file
     > in the meantime?
     >
     > Fernando

    I'm pretty sure that restarting the StartD (condor_restart -startd) on
    each machine that is missing the file should do the trick.

    Greg

     > On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
     >
     >     Fernando,
     >
     >     The "watchdog" pipe is created by the ProcD when it starts up, and is
     >     only ever deleted by Condor when the ProcD shuts down.
     >
     >     Is it possible that something outside of Condor is deleting the pipe? We
     >     have seen problems like this before with programs like tmpwatch
     >     (although I guess it's doubtful that tmpwatch is running over your
     >     /home/condor/hosts/wolf10/log/ directory).
     >
     >     Come to think of it, /home/condor/hosts/wolf10/log sounds like it could
     >     be on NFS. It's perfectly fine to have your LOG directory on NFS, but it
     >     is in that case required to have a separate local LOCK directory (where
     >     things like the ProcD's pipes are stored). Please make sure that your
     >     LOCK setting refers to a local directory.
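
One way to check whether a given directory is on NFS is to look up its mount point in the system mount table. A minimal sketch in Python, assuming a Linux node where /proc/mounts is available; the function name fs_type_for and the NETWORK_FS set are hypothetical, and mount points containing spaces (escaped in /proc/mounts) are not handled.

```python
import os

# Filesystem types treated as "not local" for this check (an assumption;
# extend as needed for your site).
NETWORK_FS = {"nfs", "nfs4"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Compare with trailing slashes so /home does not match /homework.
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On a node, something like fs_type_for(lock_dir, open("/proc/mounts").read()) in NETWORK_FS would flag a LOCK directory that is not local, where lock_dir is whatever your LOCK setting expands to on that machine (condor_config_val LOCK should print it).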
     >
     >     Thanks,
     >
     >     Greg Quinn
     >     Condor Team
     >
     >     Fernando Rannou wrote:
     >      > Hello,
     >      > I'm getting the following error in one of the StarterLog files:
     >      > ------------------------
     >      > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
     >      > 1/28 11:20:04 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
     >      > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
     >      > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
     >      > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
     >      > 1/28 11:20:04 ShutdownFast all jobs.
     >      > --------------------------
     >      > Clearly the "pipe" files are not there. What should I do?
     >      > We restarted condor on all nodes but the files did not appear.
     >      >
     >      > This has happened on a couple of nodes. All other nodes do have the
     >      > watchdog file:
     >      >
     >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
     >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog
     >      > -
     >      > Thanks
     >      >
     >      > Fernando

------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/