I've looked further into how jobs are rescheduled after an execute node dies. It appears to work, but it takes too long. If I shut down an execute node with a job running on it and then restart it, Condor takes two hours to remove the failed job (until then, Condor thinks it is still running) and reschedule it (sometimes onto the same node, which has been unclaimed since the restart). I searched the manual but can't find where this two-hour delay is configured. Can someone please point me in the right direction?

Thank you,