
[HTCondor-users] job exit hook timeout

I'm running a job exit hook to trap the exit codes from my jobs. The
behavior I see is that the exit hook runs, but it only completes up to a
certain point.

This point seems to be fairly random. My script does a number of
things, but where it tends to get stuck is when I use
condor_config_val to change some of the slot-level ClassAd attributes on a
machine and then reconfig the startd.
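To make it easier to see where such a hook stops, here is a minimal sketch of the per-slot loop, assuming runtime configuration is enabled on the startd (ENABLE_RUNTIME_CONFIG), that the hook's user is authorized to use `condor_config_val -startd -rset`, and that the attribute is already listed in STARTD_ATTRS so it shows up in the slot ads. The attribute name `LAST_JOB_EXIT_CODE` and the helpers `build_plan`/`run_plan` are hypothetical. Each command gets its own timeout and a log line, so a partial run shows exactly which step it reached.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical attribute name, for illustration only.
ATTR = "LAST_JOB_EXIT_CODE"


def build_plan(slots: Sequence[int], exit_code: int) -> List[List[str]]:
    """One condor_config_val call per slot, then a single reconfig at the end."""
    plan = []
    for slot in slots:
        # Assumes runtime config is enabled and -rset is authorized for this user.
        plan.append(["condor_config_val", "-startd", "-rset",
                     f"SLOT{slot}_{ATTR} = {exit_code}"])
    plan.append(["condor_reconfig"])
    return plan


def run_plan(plan: Sequence[List[str]],
             run: Callable[..., object] = subprocess.run,
             timeout: float = 10.0,
             log: Callable[[str], None] = print) -> int:
    """Run each command with its own timeout; return how many succeeded."""
    done = 0
    for cmd in plan:
        log("running: " + " ".join(cmd))
        try:
            run(cmd, check=True, timeout=timeout)
        except (OSError, subprocess.CalledProcessError,
                subprocess.TimeoutExpired) as err:
            # Stop at the first failure so the log shows exactly where it broke.
            log(f"step {done + 1} failed: {err}")
            return done
        done += 1
    return done
```

The `run` parameter exists only so the loop can be exercised without an HTCondor install; in a real hook you would call `run_plan(build_plan(slots, code))` with the defaults.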

However, as I iterate over the slots, the script might get through slot 1,
slots 1-2, slots 1-5, etc., and most of the time it fails to run the reconfig.

My question is: is there a watchdog mechanism that kills the script if
it runs too long? The script can run for several seconds while it
updates the ClassAds, but it always seems to just stop, as if it's
being killed.
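One way to check the "as if it's being killed" theory is to have the hook record any signal it receives. This is a minimal sketch, not an HTCondor feature; `make_logger`, `install_handlers`, and a log path such as `/tmp/exit_hook_debug.log` are hypothetical. Note that SIGKILL cannot be caught, so a log that ends with neither a "received" line nor a final "finished" line would be consistent with SIGKILL (or with the script blocking), while a "received SIGTERM" line would point at something sending it a catchable signal.

```python
import os
import signal
import time
from typing import Callable

# Signals a handler can catch; SIGKILL (and SIGSTOP) cannot be caught at all.
CATCHABLE = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def make_logger(path: str) -> Callable[[str], None]:
    """Append one timestamped line per call; reopening flushes each line to disk."""
    def log(msg: str) -> None:
        with open(path, "a") as fh:
            fh.write(f"{time.time():.3f} pid={os.getpid()} {msg}\n")
    return log


def install_handlers(log: Callable[[str], None]):
    """Record any catchable signal, then exit with the conventional 128+N status."""
    def handler(signum, frame):
        log(f"received {signal.Signals(signum).name}")
        raise SystemExit(128 + signum)
    for sig in CATCHABLE:
        signal.signal(sig, handler)
    return handler
```

In the hook you would call `install_handlers(log)` first, then `log("start")`, a `log(...)` after each slot update, and `log("finished")` at the very end.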