
Re: [HTCondor-users] Condor held jobs should retry/release after certain configured timeout automatically



Hi Sridhar,

The configuration seems reasonable.  However, we'd need more context to know if it's working as expected.

1) Did you run condor_reconfig after changing the configuration?
2) Can you give an example classad of a job you think should be released under this policy?

Thanks,

Brian
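
A minimal sketch of how the attributes relevant to Brian's second question could be pulled out of a job ClassAd, assuming the ad has been dumped as plain "Attr = value" lines (the format printed by condor_q -long). The helper names and the sample ad below are hypothetical and only illustrate the idea; they are not taken from the thread.

```python
# Attributes that the release policy discussed in this thread depends on.
RELEVANT = ("JobStatus", "JobRunCount", "EnteredCurrentStatus",
            "HoldReasonCode", "HoldReason")


def parse_job_ad(text):
    """Parse 'Attr = value' lines into a dict of raw string values."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank or malformed lines
            ad[name.strip()] = value.strip()
    return ad


def policy_inputs(ad):
    """Return the relevant attributes; a missing attribute maps to None."""
    return {name: ad.get(name) for name in RELEVANT}


# Hypothetical ad text, not real output from the poster's pool.
SAMPLE_AD = """\
JobStatus = 5
EnteredCurrentStatus = 1428400000
HoldReason = "hypothetical hold reason"
"""

if __name__ == "__main__":
    for name, value in policy_inputs(parse_job_ad(SAMPLE_AD)).items():
        print(name, "=", value)
```

In the sample, JobStatus is 5 (held) and JobRunCount comes back as None, which is the kind of detail worth checking, since a policy expression can only fire if every attribute it references is present in the job ad.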

On Apr 7, 2015, at 8:40 AM, Sridhar Thumma <deadman.den@xxxxxxxxx> wrote:

Hi,

I added SYSTEM_PERIODIC_RELEASE to my configuration (/etc/condor/config.d/00personal_condor.config), but it does not seem to be releasing any jobs.

Could you please check the configuration and let me know if anything is wrong?

SYSTEM_PERIODIC_RELEASE = (JobRunCount < 5 && (time() - EnteredCurrentStatus) > 600)
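
To make the intent of the expression above concrete, here is a rough Python model of its logic. It is only a sketch, not how HTCondor evaluates ClassAds internally; the function name and parameters are hypothetical. It models a missing attribute as None, following the ClassAd rule that UNDEFINED && FALSE is FALSE while UNDEFINED && TRUE stays UNDEFINED.

```python
def should_release(job_run_count, entered_current_status, now,
                   max_runs=5, min_held_seconds=600):
    """Model (JobRunCount < 5 && (time() - EnteredCurrentStatus) > 600).

    Returns True or False, or None to stand in for UNDEFINED when
    job_run_count is missing from the job ad.
    """
    held_long_enough = (now - entered_current_status) > min_held_seconds
    if job_run_count is None:
        # UNDEFINED && FALSE is FALSE; UNDEFINED && TRUE is UNDEFINED.
        return None if held_long_enough else False
    return job_run_count < max_runs and held_long_enough


if __name__ == "__main__":
    # Held for 601 seconds with no prior runs: the policy is satisfied.
    print(should_release(0, 1000, 1601))
    # Held for exactly 600 seconds: not yet, the comparison is strict.
    print(should_release(0, 1000, 1600))
    # JobRunCount missing: the result is undefined rather than true.
    print(should_release(None, 1000, 5000))
```

Only a result of True would release the job, so a held job whose ad lacks JobRunCount would stay held under this model no matter how long it waits.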

On Tue, Feb 24, 2015 at 7:56 PM, Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx> wrote:
On Tue, Feb 24, 2015 at 7:43 AM, Sridhar Thumma <deadman.den@xxxxxxxxx> wrote:

> I am using Condor grid submit files to launch EC2 instances. Sometimes,
> when Condor tries to launch instances, it gets an
> InstanceLimitExceeded error from AWS. Because of this, the Condor jobs
> go into the held state.
>
> Is there a way to avoid this scenario?

One solution is to request a limit increase from AWS (this may or may
not be desirable from a business perspective).

> Or is there a configuration variable to retry/release held jobs after
> a certain time period, so that Condor tries again and sees whether
> they can run?
>
There are several periodic expressions that might help. For example,
periodic_release defines when a job will be released
(SYSTEM_PERIODIC_RELEASE would apply to all jobs). In this case, you
might set a job to release after 10 minutes:

  periodic_release = (CurrentTime - EnteredCurrentStatus > 600)

See the condor_submit man page [1] and the schedd configuration
settings [2] for more details:

[1] http://research.cs.wisc.edu/htcondor/manual/v8.2/condor_submit.html
[2] http://research.cs.wisc.edu/htcondor/manual/v8.2/3_3Configuration.html#SECTION004311000000000000000


Thanks,
BC

--
Ben Cotton
main: 888.292.5320

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
