
Re: [Condor-users] Preemption/Killing state forever

Try logging on to the node in question and look for any
condor_starter processes that have either no child processes
or only zombie/hung processes as children. If you find any, you
should be able to kill those condor_starter processes without
disrupting the other daemons. Failing that, you should be able
to restart just the one node that is hung without affecting the other nodes.
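The check described above (a condor_starter with no children, or only zombie children) can be scripted. What follows is a minimal sketch, assuming a Linux worker node where each process's command name, state letter and parent pid can be read from /proc/<pid>/stat. The helper names read_procs and suspect_starters are hypothetical, not part of Condor. It only flags zombie ("Z") children. A "hung" child has no reliable state letter, so that case still needs a manual look. It also only reports candidates and never kills anything.

```python
import os
from collections import defaultdict


def read_procs(proc_root="/proc"):
    """Map pid -> (command name, state letter, parent pid) from /proc/<pid>/stat."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                data = f.read()
        except OSError:
            continue  # the process exited (or is unreadable) while we scanned
        # The command name is wrapped in parentheses and may contain spaces,
        # so split on the last ')' instead of on whitespace.
        comm = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        procs[int(entry)] = (comm, state, ppid)
    return procs


def suspect_starters(procs, name="condor_starter"):
    """Return sorted pids of `name` processes whose children are all zombies (or absent)."""
    child_states = defaultdict(list)
    for _pid, (_comm, state, ppid) in procs.items():
        child_states[ppid].append(state)
    suspects = []
    for pid, (comm, _state, _ppid) in procs.items():
        if comm != name:
            continue
        # all() of an empty list is True, so a starter with no children counts too.
        if all(s == "Z" for s in child_states.get(pid, [])):
            suspects.append(pid)
    return sorted(suspects)


if __name__ == "__main__":
    # Only reports candidates; deciding whether to kill them is left to the admin.
    for pid in suspect_starters(read_procs()):
        print(f"condor_starter pid {pid} has no live children")
```

Run it on the node itself, as root or the condor user, so every process is readable. Any pid it prints is a candidate to kill by hand. Reporting rather than killing is deliberate: a starter whose job has just exited can look the same for a moment, so it is worth a second look first.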


On Thu, 30 Sep 2010, Zoran Vitez wrote:

Hi, I've been having trouble with some nodes stuck in the Preempting/Killing
state for a long time, and I'm not sure how to go about debugging the
situation. Restarting the daemons is really not an option unless absolutely
necessary. How would you go about finding out which jobs won't
terminate, or what would you do in this situation?

Thanks for any help,

Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.