
[HTCondor-users] Bug in starter: HOOK_JOB_EXIT terminated immediately



There's a bug in condor_starter (I'm using version 7.8.7) that affects the execution of a HOOK_JOB_EXIT: the starter terminates the hook immediately. It happens in my configuration, where the startd is configured to run only one job at a time, but it will probably always happen when only one job is running and that job terminates. In this case the starter executes the function ShutdownGraceful in condor_starter.V6.1/baseStarter.cpp.
 
This piece of code
 
    if (!jobRunning) {
        dprintf(D_FULLDEBUG,
                "Got ShutdownGraceful when no jobs running.\n");
        this->allJobsDone();
        return 1;
    }
 
is wrong: by returning 1, it reports that both the job and the hooks have terminated. Returning 1 leads to immediate termination of condor_starter, which kills all running hooks. The correct version reads:
 
    if (!jobRunning) {
        dprintf(D_FULLDEBUG,
                "Got ShutdownGraceful when no jobs running.\n");
        return (this->allJobsDone());
    }
 
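allJobsDone() returns 0 if hooks or other tasks are still running, so with this change ShutdownGraceful returns 0 in that case instead of telling the caller that the starter may exit.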
 
I applied the fix to my version of condor and can confirm that it works.