Re: [Condor-users] watchdog pipe file missing



Hi Fernando,

Fernando Rannou wrote:
Greg the files are now there, great!

prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD
prw-------    1 root     isl             0 Jan 29 12:53 procd_pipe.STARTD.watchdog


however, I got a permission denied on StarterLog.slot1

Is this directory on NFS? If so, I think root squash may be to blame, and you should configure Condor's LOCK setting to point to a local directory. If this isn't NFS, then this is strange and I'll investigate further.
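
One rough way to answer the NFS question on a Linux node is to look the directory up in the mount table. The sketch below is only an illustration: `filesystem_type` and `is_nfs` are hypothetical helper names, not Condor tools, and the code assumes the `/proc/mounts` layout (device, mount point, filesystem type, ...). It does not resolve symlinks or unescape mount points that contain spaces.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fstype of the longest mount point containing `path`, or None."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a well-formed mount table line
        mount_point, fstype = fields[1], fields[2]
        # Match whole path components only, so /home does not match /homework.
        prefix = mount_point if mount_point == "/" else mount_point + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fstype
    return best_type


def is_nfs(path, mounts_text):
    """True if the filesystem type for `path` starts with 'nfs' (nfs, nfs4)."""
    fstype = filesystem_type(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

On the affected node this could be called as `is_nfs("/home/condor/hosts/wolf10/log", open("/proc/mounts").read())`. If it returns True, the LOCK setting is the thing to move to a local directory.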

Thanks,

Greg

1/29 13:11:57 About to exec /home/rannou/GateInstall/gate_v2.2.0/bin/Linux-g++/Gate -a ROOT_FILE plane24/hitsfrompixel_7_12_24 -a X_POS 4.6950 -a Y_POS 7.8250 -a Z_POS -34.7430 -a MATERIAL BGO -a ACTIVITY 10609300 -a SEED_INDEX 24 -a PETBox 0 main.mac
1/29 13:11:57 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: Permission denied (13)
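
The StarterLog failures in this thread differ only in the errno reported for the watchdog pipe: 2 (No such file or directory) in the first report and 13 (Permission denied) here. A minimal sketch for telling the cases apart from the account that runs the starter might look like this. `diagnose_pipe` is a hypothetical helper, not part of Condor, and checking read access by default is an assumption, since the thread does not say which mode the starter opens the pipe with.

```python
import os
import stat


def diagnose_pipe(path, access_mode=os.R_OK):
    """Classify a named pipe as missing, inaccessible, not a pipe, or ok."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"  # corresponds to errno 2
    except PermissionError:
        return "permission denied"  # errno 13 while traversing a parent dir
    if not stat.S_ISFIFO(mode):
        return "not a pipe"
    # os.access checks against the real uid/gid of the calling process.
    if not os.access(path, access_mode):
        return "permission denied"
    return "ok"
```

Running `diagnose_pipe("/home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog")` as the user that owns the starter process would show whether the pipe is absent or merely unreadable for that user.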


Fer

On Thu, Jan 29, 2009 at 4:21 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:

    Hello,

    Fernando Rannou wrote:

     > 1. after executing condor_restart -startd the watchdog
     > files are not created. There is only the StartLog
     > which shows an error:
     > 1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host
     > <192.168.10.10:32851> for command 60005
     > (DC_OFF_GRACEFUL), access level ADMINISTRATOR

    The StartD was never actually restarted, since your condor_restart
    command was denied permission. The HOSTALLOW_ADMINISTRATOR setting is
    what determines the machines from which you can issue a condor_restart.
    Your HOSTALLOW_ADMINISTRATOR setting is probably at its default setting,
    which includes only the central manager. So you could:

    1) Issue all the needed condor_restart commands from the central manager
       using the form "condor_restart -startd <hostname>"

    2) Loosen your HOSTALLOW_ADMINISTRATOR setting if the security
       implications of doing so don't concern you. For example, setting

       HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)

       would give someone logged into any host in your pool the ability to
       send administrative commands to the Condor daemons running on that
       host.
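
To see which host a denied command came from before changing the setting, the denial line in the StartLog can be picked apart. The sketch below is an illustration only: the pattern is inferred from the single line quoted in this thread, not from any documented Condor log format, and `parse_denial` is a hypothetical name.

```python
import re

# Pattern inferred from the one StartLog denial line quoted in this thread.
DENIAL_RE = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host "
    r"<(?P<host>[^:>]+):(?P<port>\d+)> "
    r"for command (?P<code>\d+) \((?P<name>\w+)\), "
    r"access level (?P<level>\w+)"
)


def parse_denial(line):
    """Return the user, host, command and access level of a denial, or None."""
    match = DENIAL_RE.search(line)
    if match is None:
        return None
    return {
        "user": match.group("user"),
        "host": match.group("host"),
        "command": match.group("name"),
        "level": match.group("level"),
    }
```

The `host` field is the machine the command was issued from, which is the one that would have to be covered by HOSTALLOW_ADMINISTRATOR.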

     > However, the node still shows on condor_status ??

    Right, the StartD never exited and is still reporting itself to the
    Collector.

     > 2. when I submit my first job, I get this error on StarterLog.slot1
     >
     > 1/29 10:25:37 About to exec /bin/date --universal
     > 1/29 10:25:37 error opening watchdog pipe
     > /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file
     > or directory (2)
     > 1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
     > 1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
     > 1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file
     > proc_family_proxy.C
     > 1/29 10:25:37 ShutdownFast all jobs.

    Sure - same error as before since the StartD hasn't been restarted.

    Later,

    Greg Quinn
    Condor Team

     > Thanks for your patience, Greg
     > Fernando
     > On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
     >
     >     Fernando Rannou wrote:
     >      > Thanks Greg
     >      >
     >      > but then, what should I do to create the file
     >      > in the meantime?
     >      >
     >      > Fernando
     >
     >     I'm pretty sure that restarting the StartD (condor_restart -startd) on
     >     each machine that is missing the file should do the trick.
     >
     >     Greg
     >
     >      > On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
     >      >
     >      >     Fernando,
     >      >
     >      >     The "watchdog" pipe is created by the ProcD when it starts up, and is
     >      >     only ever deleted by Condor when the ProcD shuts down.
     >      >
     >      >     Is it possible that something outside of Condor is deleting the pipe? We
     >      >     have seen problems like this before with programs like tmpwatch
     >      >     (although I guess it's doubtful that tmpwatch is running over your
     >      >     /home/condor/hosts/wolf10/log/ directory).
     >      >
     >      >     Come to think of it, /home/condor/hosts/wolf10/log sounds like it could
     >      >     be on NFS. It's perfectly fine to have your LOG directory on NFS, but it
     >      >     is in that case required to have a separate local LOCK directory (where
     >      >     things like the ProcD's pipes are stored). Please make sure that your
     >      >     LOCK setting refers to a local directory.
     >      >
     >      >     Thanks,
     >      >
     >      >     Greg Quinn
     >      >     Condor Team
     >      >
     >      >     Fernando Rannou wrote:
     >      >      > Hello,
     >      >      > I'm getting the following error in one of the StarterLog files:
     >      >      > ------------------------
     >      >      > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
     >      >      > 1/28 11:20:04 error opening watchdog pipe
     >      >      > /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file
     >      >      > or directory (2)
     >      >      > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
     >      >      > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
     >      >      > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file
     >      >      > proc_family_proxy.C
     >      >      > 1/28 11:20:04 ShutdownFast all jobs.
     >      >      > --------------------------
     >      >      > Clearly the "pipe" files are not there. What should I do?
     >      >      > We restarted condor on all nodes but the files did not appear.
     >      >      >
     >      >      > This has happened on a couple of nodes. All other nodes do have the
     >      >      > watchdog file:
     >      >      >
     >      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD
     >      >      > prw-rw----    1 root     isl             0 Nov  4 16:08 procd_pipe.STARTD.watchdog
     >      >      >
     >      >      > Thanks
     >      >      >
     >      >      > Fernando
     >      >     _______________________________________________
     >      >     Condor-users mailing list
     >      >     To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
     >      >     subject: Unsubscribe
     >      >     You can also unsubscribe by visiting
     >      >     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
     >      >
     >      >     The archives can be found at:
     >      >     https://lists.cs.wisc.edu/archive/condor-users/