[HTCondor-users] Condor held jobs should retry/release after certain configured timeout automatically



Hi,

I am using Condor grid submit files to launch EC2 instances. Sometimes, when Condor tries to launch instances, it gets an InstanceLimitExceeded error from AWS. As a result, the Condor jobs go into the held state.

Is there a way to avoid this scenario? Or is there a configuration variable to retry/release held jobs after a certain time period, so that Condor tries again and checks whether the job can now run?

Please share any information related to this. That would be great.