
Re: [Condor-users] watchdog pipe file missing



Greg, the files are now there. Great!

prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD
prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD.watchdog


However, I got a "Permission denied" error in StarterLog.slot1:


1/29 13:11:57 About to exec /home/rannou/GateInstall/gate_v2.2.0/bin/Linux-g++/Gate -a ROOT_FILE plane24/hitsfrompixel_7_12_24 -a X_POS 4.6950 -a Y_POS 7.8250 -a Z_POS -34.7430 -a MATERIAL BGO -a ACTIVITY 10609300 -a SEED_INDEX 24 -a PETBox 0 main.mac
1/29 13:11:57 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: Permission denied (13)
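It may help to compare these pipes with the ones on the nodes that work. Those were quoted further down in the thread as prw-rw----, while the new ones are prw-------. A small diagnostic can report whether a path is a FIFO, its permission bits and owner, and whether the current process could open it for writing. This is a minimal Python sketch, not a Condor tool. The default path is taken from the log line above. Which account the starter actually uses when it opens the pipe is not shown in this thread.

```python
import os
import pwd
import stat
import sys


def describe_pipe(path):
    """Summarize the file at path; stat() does not open it, so a FIFO won't block."""
    try:
        st = os.stat(path)
    except OSError as exc:
        # errno 2 (ENOENT) corresponds to "No such file or directory (2)".
        return {"path": path, "exists": False, "errno": exc.errno}
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    return {
        "path": path,
        "exists": True,
        "is_fifo": stat.S_ISFIFO(st.st_mode),
        "mode": oct(stat.S_IMODE(st.st_mode)),  # prw------- -> '0o600'
        "owner": owner,
        # Answers only for the real uid/gid of this Python process.
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    default = "/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog"
    for p in sys.argv[1:] or [default]:
        print(describe_pipe(p))
```

Run it once as root and once as the account the jobs run under. Note that os.access() answers for the real uid of the process running the script, so as root it will report the pipe as writable almost regardless of the mode bits.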


Fer

On Thu, Jan 29, 2009 at 4:21 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
Hello,

Fernando Rannou wrote:

> 1. After executing condor_restart -startd, the watchdog
> files are not created. There is only the StartLog,
> which shows an error:
> 1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host
> <192.168.10.10:32851> for command 60005
> (DC_OFF_GRACEFUL), access level ADMINISTRATOR

The StartD was never actually restarted, since your condor_restart
command was denied permission. The HOSTALLOW_ADMINISTRATOR setting is
what determines the machines from which you can issue a condor_restart.
Your HOSTALLOW_ADMINISTRATOR setting is probably at its default setting,
which includes only the central manager. So you could:

1) Issue all the needed condor_restart commands from the central manager
   using the form "condor_restart -startd <hostname>"

2) Loosen your HOSTALLOW_ADMINISTRATOR setting if the security
   implications of doing so don't concern you. For example, setting

   HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)

   would give someone logged into any host in your pool the ability to
   send administrative commands to the Condor daemons running on that
   host.
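Option 1 is easy to script from the central manager. The sketch below issues "condor_restart -startd <hostname>" once per host, which is the form given above. By default it does a dry run and only prints the commands. It treats a non-zero exit status as a failure. Whether condor_restart reports a refused ADMINISTRATOR request through its exit status is an assumption, so check the StartLog as well. The default host name is only a placeholder.

```python
import subprocess
import sys


def restart_commands(hosts):
    """Build one 'condor_restart -startd <hostname>' argument list per host."""
    return [["condor_restart", "-startd", host] for host in hosts]


def run_all(hosts, dry_run=True):
    """Print (dry run) or execute each command; return hosts whose command failed."""
    failed = []
    for argv in restart_commands(hosts):
        if dry_run:
            print(" ".join(argv))
            continue
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except FileNotFoundError:
            sys.exit("condor_restart not found on PATH")
        if result.returncode != 0:
            # Assumption: a refused request shows up as a non-zero exit status.
            failed.append(argv[-1])
            sys.stderr.write(result.stdout + result.stderr)
    return failed


if __name__ == "__main__":
    args = sys.argv[1:]
    live = "--run" in args  # without --run, only print the commands
    hosts = [a for a in args if a != "--run"] or ["wolf10"]  # placeholder host
    print("failed:", run_all(hosts, dry_run=not live))
```

Here is an example invocation on the central manager: "python3 restart_startds.py --run wolf10". The script name is hypothetical. This would restart the StartD on wolf10, provided the central manager has ADMINISTRATOR access there.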

> However, the node still shows on condor_status ??

Right, the StartD never exited and is still reporting itself to the
Collector.

> 2. When I submit my first job, I get this error in StarterLog.slot1:
>
> 1/29 10:25:37 About to exec /bin/date --universal
> 1/29 10:25:37 error opening watchdog pipe
> /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file
> or directory (2)
> 1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
> 1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
> 1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file
> proc_family_proxy.C
> 1/29 10:25:37 ShutdownFast all jobs.

Sure - same error as before since the StartD hasn't been restarted.
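The number in parentheses at the end of each "error opening watchdog pipe" line appears to be the errno value from the failed open. It is 2 in this log, versus 13 in the newer StarterLog.slot1 excerpt at the top of the thread. A small Python helper can translate those numbers into symbolic names with the standard errno module. The regular expression is an assumption about the log format, based only on the lines quoted here.

```python
import errno
import os
import re

# Assumed log shape, based only on the lines quoted in this thread:
# "...: <strerror text> (<errno number>)" at the very end of the line.
TRAILING_ERRNO = re.compile(r"\((\d+)\)\s*$")


def decode_errno(log_line):
    """Return (symbolic name, strerror text) for a trailing '(N)', else None."""
    match = TRAILING_ERRNO.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


if __name__ == "__main__":
    samples = [
        "1/29 10:25:37 error opening watchdog pipe "
        "/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: "
        "No such file or directory (2)",
        "1/29 13:11:57 error opening watchdog pipe "
        "/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: "
        "Permission denied (13)",
    ]
    for line in samples:
        print(decode_errno(line))
```

On Linux the first sample decodes to ENOENT (the pipe did not exist) and the second to EACCES (the pipe exists but could not be opened with the caller's permissions).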

Later,

Greg Quinn
Condor Team

> Thanks for your patience, Greg
> Fernando
> On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>
>     Fernando Rannou wrote:
>      > Thanks Greg
>      >
>      > but then, what should I do to create the file
>      > in the meantime?
>      >
>      > Fernando
>
>     I'm pretty sure that restarting the StartD (condor_restart -startd) on
>     each machine that is missing the file should do the trick.
>
>     Greg
>
>      > On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>      >
>      >     Fernando,
>      >
>      >     The "watchdog" pipe is created by the ProcD when it starts up, and is
>      >     only ever deleted by Condor when the ProcD shuts down.
>      >
>      >     Is it possible that something outside of Condor is deleting the pipe? We
>      >     have seen problems like this before with programs like tmpwatch
>      >     (although I guess it's doubtful that tmpwatch is running over your
>      >     /home/condor/hosts/wolf10/log/ directory).
>      >
>      >     Come to think of it, /home/condor/hosts/wolf10/log sounds like it could
>      >     be on NFS. It's perfectly fine to have your LOG directory on NFS, but it
>      >     is in that case required to have a separate local LOCK directory (where
>      >     things like the ProcD's pipes are stored). Please make sure that your
>      >     LOCK setting refers to a local directory.
>      >
>      >     Thanks,
>      >
>      >     Greg Quinn
>      >     Condor Team
>      >
>      >     Fernando Rannou wrote:
>      >      > Hello,
>      >      > I'm getting the following error in one of the StarterLogs:
>      >      > ------------------------
>      >      > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
>      >      > 1/28 11:20:04 error opening watchdog pipe
>      >      > /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file
>      >      > or directory (2)
>      >      > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
>      >      > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
>      >      > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file
>      >      > proc_family_proxy.C
>      >      > 1/28 11:20:04 ShutdownFast all jobs.
>      >      > --------------------------
>      >      > Clearly the "pipe" files are not there. What should I do?
>      >      > We restarted Condor on all nodes but the files did not appear.
>      >      >
>      >      > This has happened on a couple of nodes. All other nodes do have the
>      >      > watchdog file:
>      >      >
>      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
>      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog
>      >      > -
>      >      > Thanks
>      >      >
>      >      > Fernando
>      >     _______________________________________________
>      >     Condor-users mailing list
>      >     To unsubscribe, send a message to
>     condor-users-request@xxxxxxxxxxx
>     <mailto:condor-users-request@xxxxxxxxxxx>
>      >     <mailto:condor-users-request@xxxxxxxxxxx
>     <mailto:condor-users-request@xxxxxxxxxxx>> with a
>      >     subject: Unsubscribe
>      >     You can also unsubscribe by visiting
>      >     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>      >
>      >     The archives can be found at:
>      >     https://lists.cs.wisc.edu/archive/condor-users/
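Greg's suggestion, quoted above, is that the LOCK directory holding the ProcD's pipes must be local even when LOG lives on NFS. On a Linux node this can be checked by finding which mount a directory belongs to in /proc/mounts. This is a rough Python sketch, assuming Linux. Pass it the directory your LOCK setting points to; "condor_config_val LOCK" should print it. The set of network filesystem types is an assumption and is not exhaustive.

```python
import os
import sys

# Assumed, non-exhaustive list of network filesystem types.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs", "afs"}


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts encodes a space inside a path as \040.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains path."""
    path = os.path.realpath(path)  # resolve symlinks such as /var/lock -> /run/lock
    best, best_type = "", None
    for mnt, fstype in mounts:
        prefix = mnt.rstrip("/") + "/"
        if path == mnt or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts in /proc/mounts shadow earlier ones.
            if len(mnt) >= len(best):
                best, best_type = mnt, fstype
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        table = parse_mounts(f.read())
    for d in sys.argv[1:]:
        kind = fs_type_for(d, table)
        verdict = "network fs, not suitable for LOCK" if kind in NETWORK_FS else "local"
        print(f"{d}: {kind} ({verdict})")
```

Matching on the longest mount-point prefix, rather than the first match, keeps "/" from shadowing a more specific NFS mount such as /home.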