
Re: [Condor-users] watchdog pipe file missing



Thanks for your help. Condor is running fine now.


Fernando

On Thu, Jan 29, 2009 at 7:03 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
Hi Fernando,

Fernando Rannou wrote:
> Greg, the files are now there. Great!
>
> prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD
> prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD.watchdog
>
>
> However, I got a permission denied error in StarterLog.slot1:

Is this directory on NFS? If so, I think root squash may be to blame and
you should configure Condor's LOCK setting to point to a local
directory. If this isn't NFS, then this is strange and I'll investigate
further.
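A quick way to answer "Is this directory on NFS?" on a Linux execute node is to find the filesystem type of the mount that contains the directory. The Python sketch below assumes the Linux /proc/mounts layout (device, mount point, type, options, ...) and picks the longest mount point that is a prefix of the path. It is a simplification: symlinks are not resolved and escaped characters in mount points are not decoded. The helper name fs_type_for is hypothetical, not part of Condor.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    Assumes `mounts_text` uses the Linux /proc/mounts layout:
    "device mount_point fs_type options dump pass", one mount per line.
    Simplified: symlinks are not resolved and octal escapes in mount
    points (such as "\\040" for a space) are not decoded.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # The mount contains `path` if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win, since a
            # later mount on the same directory hides the earlier one.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        # Directory taken from the StarterLog messages in this thread.
        print(fs_type_for("/home/condor/hosts/wolf10/log", f.read()))
```

A result such as nfs or nfs4 for the log directory would be consistent with the root-squash explanation; `df -T` on the directory gives the same information from the shell.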

Thanks,

Greg

> 1/29 13:11:57 About to exec
> /home/rannou/GateInstall/gate_v2.2.0/bin/Linux-g++/Gate -a ROOT_FILE
> plane24/hitsfrompixel_7_12_24 -a X_POS 4.6950 -a Y_POS 7.8250 -a Z_POS
> -34.7430 -a MATERIAL BGO -a ACTIVITY 10609300 -a SEED_INDEX 24 -a PETBox
> 0 main.mac
> 1/29 13:11:57 error opening watchdog pipe
> /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: Permission
> denied (13)
>
>
> Fer
>
> On Thu, Jan 29, 2009 at 4:21 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>
>     Hello,
>
>     Fernando Rannou wrote:
>
>      > 1. After executing condor_restart -startd, the watchdog
>      > files are not created. There is only the StartLog,
>      > which shows an error:
>      > 1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host
>      > <192.168.10.10:32851> for command 60005
>      > (DC_OFF_GRACEFUL), access level ADMINISTRATOR
>
>     The StartD was never actually restarted, since your condor_restart
>     command was denied permission. The HOSTALLOW_ADMINISTRATOR setting is
>     what determines the machines from which you can issue a condor_restart.
>     Your HOSTALLOW_ADMINISTRATOR setting is probably at its default setting,
>     which includes only the central manager. So you could:
>
>     1) Issue all the needed condor_restart commands from the central manager
>        using the form "condor_restart -startd <hostname>"
>
>     2) Loosen your HOSTALLOW_ADMINISTRATOR setting if the security
>        implications of doing so don't concern you. For example, setting
>
>        HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)
>
>        would give someone logged into any host in your pool the ability to
>        send administrative commands to the Condor daemons running on that
>        host.
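The StartLog line above records the address the refused command came from, in the form <ip:port>. As a rough illustration of why the request was rejected, the Python sketch below pulls that address out of the log line and compares the host against a comma-separated allow list using shell-style wildcards (fnmatch). This is a deliberate simplification, not Condor's matching logic: the real HOSTALLOW_ADMINISTRATOR check also resolves hostnames, expands macros such as $(CONDOR_HOST), and consults the corresponding HOSTDENY_* settings. The helper names parse_sinful and is_allowed are hypothetical.

```python
import fnmatch
import re

# Matches the "<ip:port>" address form seen in the StartLog line above.
SINFUL_RE = re.compile(r"<(?P<host>[^:>]+):(?P<port>\d+)>")


def parse_sinful(text):
    """Extract (host, port) from the first "<host:port>" in `text`, or None."""
    m = SINFUL_RE.search(text)
    if m is None:
        return None
    return m.group("host"), int(m.group("port"))


def is_allowed(host, allow_list):
    """Return True if `host` matches any comma-separated entry in `allow_list`.

    Simplified: entries are compared with shell-style wildcards only; there
    are no DNS lookups, macro expansion, subnet notation, or deny lists.
    """
    patterns = [p.strip() for p in allow_list.split(",") if p.strip()]
    return any(fnmatch.fnmatch(host, p) for p in patterns)
```

With an allow list containing only 192.168.10.1 (a hypothetical central-manager address), is_allowed("192.168.10.10", "192.168.10.1") returns False, which is the kind of refusal logged above; adding the requesting host (or a wildcard covering it) makes the check pass.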
>
>      > However, the node still shows up in condor_status?
>
>     Right, the StartD never exited and is still reporting itself to the
>     Collector.
>
>      > 2. When I submit my first job, I get this error in StarterLog.slot1:
>      >
>      > 1/29 10:25:37 About to exec /bin/date --universal
>      > 1/29 10:25:37 error opening watchdog pipe
>      > /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such
>      > file or directory (2)
>      > 1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
>      > 1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
>      > 1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file
>      > proc_family_proxy.C
>      > 1/29 10:25:37 ShutdownFast all jobs.
>
>     Sure - same error as before since the StartD hasn't been restarted.
>
>     Later,
>
>     Greg Quinn
>     Condor Team
>
>      > Thanks for your patience, Greg
>      > Fernando
>      > On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>      >
>      >     Fernando Rannou wrote:
>      >      > Thanks Greg
>      >      >
>      >      > but then, what should I do to create the file
>      >      > in the meantime?
>      >      >
>      >      > Fernando
>      >
>      >     I'm pretty sure that restarting the StartD (condor_restart -startd)
>      >     on each machine that is missing the file should do the trick.
>      >
>      >     Greg
>      >
>      >      > On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>      >      >
>      >      >     Fernando,
>      >      >
>      >      >     The "watchdog" pipe is created by the ProcD when it starts
>      >      >     up, and is only ever deleted by Condor when the ProcD shuts
>      >      >     down.
>      >      >
>      >      >     Is it possible that something outside of Condor is deleting
>      >      >     the pipe? We have seen problems like this before with
>      >      >     programs like tmpwatch (although I guess it's doubtful that
>      >      >     tmpwatch is running over your
>      >      >     /home/condor/hosts/wolf10/log/ directory).
>      >      >
>      >      >     Come to think of it, /home/condor/hosts/wolf10/log sounds
>      >      >     like it could be on NFS. It's perfectly fine to have your
>      >      >     LOG directory on NFS, but in that case you must also have a
>      >      >     separate, local LOCK directory (where things like the
>      >      >     ProcD's pipes are stored). Please make sure that your LOCK
>      >      >     setting refers to a local directory.
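Since the ProcD creates the watchdog pipe when it starts, and the ls listings in this thread show it as a named pipe (the leading p in prw-...), a simple node-side check is whether the path exists and is a FIFO. The Python sketch below (hypothetical helper check_pipe) uses os.lstat and stat.S_ISFIFO. It reports a missing file, which is what errno 2 in the StarterLog indicates, and re-raises any other error (for example EACCES if a parent directory cannot be searched) so it is not mistaken for a missing pipe.

```python
import errno
import os
import stat


def check_pipe(path):
    """Describe the state of a named pipe at `path`.

    Returns "missing", "not a fifo", or "fifo". Uses lstat so a symlink
    is reported as "not a fifo" rather than followed.
    """
    try:
        st = os.lstat(path)
    except OSError as e:
        if e.errno == errno.ENOENT:  # errno 2: "No such file or directory"
            return "missing"
        raise  # e.g. EACCES (13) if a parent directory is not searchable
    return "fifo" if stat.S_ISFIFO(st.st_mode) else "not a fifo"
```

For example, check_pipe("/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog") would return "missing" on a node in the state described in the original report.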
>      >      >
>      >      >     Thanks,
>      >      >
>      >      >     Greg Quinn
>      >      >     Condor Team
>      >      >
>      >      >     Fernando Rannou wrote:
>      >      >      > Hello,
>      >      >      > I'm getting the following error in one of the StarterLogs:
>      >      >      > ------------------------
>      >      >      > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
>      >      >      > 1/28 11:20:04 error opening watchdog pipe
>      >      >      > /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such
>      >      >      > file or directory (2)
>      >      >      > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
>      >      >      > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
>      >      >      > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file
>      >      >      > proc_family_proxy.C
>      >      >      > 1/28 11:20:04 ShutdownFast all jobs.
>      >      >      > --------------------------
>      >      >      > Clearly the "pipe" files are not there. What should I do?
>      >      >      > We restarted Condor on all nodes, but the files did not appear.
>      >      >      >
>      >      >      > This has happened on a couple of nodes. All other nodes do have
>      >      >      > the watchdog file:
>      >      >      >
>      >      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
>      >      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog
>      >      >      > -
>      >      >      > Thanks
>      >      >      >
>      >      >      > Fernando
>      >      >     _______________________________________________
>      >      >     Condor-users mailing list
>      >      >     To unsubscribe, send a message to
>      >      >     condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
>      >      >     You can also unsubscribe by visiting
>      >      >     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>      >      >
>      >      >     The archives can be found at:
>      >      >     https://lists.cs.wisc.edu/archive/condor-users/