
Re: [Condor-users] How to tell if a machine is trying to shut down from a job hook?



Ian Chesal wrote:
> I've hit an odd snag in my job hook setup:
> 
> I have a machine advertising two slots. One slot is running a really long
> job. The other is empty.
> 
> I tell the machine to shut down peacefully with: condor_off -peaceful
> 
> The problem I'm seeing is that my hook fetch work script for the empty
> slot keeps getting called even though condor_off has been issued *and*
> the State of the slot is Unclaimed. So the hook keeps trying to give the
> slot work, but the slot (presumably because it's been told to shut down
> peacefully and it's just waiting for the other slot to finish with its
> work) is rejecting the work.
> 
> I'm trying to figure out how to tell, from my hook script, that my
> machine is being told to shut down.
> 
> I set up a machine running one job in one slot, then issued
> condor_off -peaceful, and captured the before-and-after machine ads that
> my fetch work hook was being passed.
> 
> It looks like after condor_off -peaceful is called the machine ad passed
> to the fetch work hook script no longer contains a Start variable and
> Requirements has been set to False.
> 
> Does that sound right?
> 
> Any other way to tell the machine is being told to shut down?
> 
> - Ian

I don't think there is an explicit, non-fragile way to detect a slot
that is shutting down.

It's possible that the startd should add an attribute to slot ads to
reflect the fact that they are shutting down. That would be less fragile
than indirectly checking for Start and Requirements.
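For the hook side, here is a minimal Python sketch of that indirect check, assuming the slot ClassAd arrives on the fetch work hook's stdin as old-style `Name = value` lines, one per line. The function names (`parse_classad`, `looks_like_shutting_down`, `main`) are hypothetical, and the test is exactly the fragile heuristic Ian observed: it only matches a missing Start attribute plus a literal `False` for Requirements, so any change in how the startd rewrites the ad during shutdown would break it.

```python
import sys


def parse_classad(text):
    """Parse old-style 'Name = value' lines into a dict keyed by lowercase name.

    ClassAd attribute names are case-insensitive, so keys are lowercased.
    Values are kept as raw strings; no expression evaluation is attempted.
    Splitting on the first '=' keeps values such as 'a == b' intact.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines and anything that is not an assignment
        ad[name.strip().lower()] = value.strip()
    return ad


def looks_like_shutting_down(ad):
    """Fragile heuristic (assumption from this thread): no Start attribute
    and Requirements is the literal False."""
    return "start" not in ad and ad.get("requirements", "").lower() == "false"


def main(stream=sys.stdin):
    """Hypothetical hook entry point: read the slot ad and decide."""
    ad = parse_classad(stream.read())
    if looks_like_shutting_down(ad):
        # Print nothing, so no job ClassAd is offered to a slot that
        # appears to be draining.
        return
    # Otherwise fetch work and print a job ClassAd as usual
    # (site-specific, not shown).
```

A real hook script would call `main()` as its entry point. If the startd ever publishes an explicit shutdown attribute, `looks_like_shutting_down` is the single place to switch over to it.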

It may even be pretty straightforward to do within Resource.cpp in the
startd.

Best,


matt