
[HTCondor-users] How can I make Condor automatically reschedule jobs running on nodes that failed because of hardware problems?



Hi, I have found that Condor does not reschedule jobs that were running on nodes that failed because of a hardware or power problem. I assume there is a way to tell Condor to do this; can anyone point me to it? Or do I have to monitor the log file and resubmit the jobs myself?
I have also found that if a node shuts down, the jobs running on it terminate abnormally with signal 9.
 
I hope the solution also applies to cluster jobs. Thanks very much!