[HTCondor-users] How to configure condor to detect node failure and reschedule jobs in 1 minute?



Hi, it seems that Condor takes a long time to decide to reschedule jobs that were running on crashed machines, and the jobs end up in the X state when I remove them manually. This is not great when there are only 4 machines in the pool.

Condor is very configurable, so how can I make it more responsive in this case?

Thanks in advance!

Kyle Qian