
Re: [HTCondor-users] Excluding execute nodes after multiple job failures



Hi Jason,

Yes, that should cover all the issues I've been having. Thanks!

Duncan

On 6 February 2018 at 16:41, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
Hi Duncan,

Does this HTCondor wiki article help?

https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=AvoidingBlackHoles
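
For illustration only, here is a minimal Python sketch of the idea in the question: count failures per node within a time window and build a requirements expression that steers jobs away from nodes over the threshold. This is not taken from the wiki article. The class and function names, the threshold, the window length, and the host names are all hypothetical. The only HTCondor-specific pieces assumed are the `Machine` attribute and the ClassAd `=!=` operator, and how failures get reported to the tracker is left open.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Track job failures per host within a sliding time window (hypothetical helper)."""

    def __init__(self, max_failures=3, window_seconds=3600):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        # host name -> timestamps of recent failures, oldest first
        self._failures = defaultdict(deque)

    def record_failure(self, host, timestamp):
        # Assumes timestamps arrive in non-decreasing order for each host.
        self._failures[host].append(timestamp)

    def excluded_hosts(self, now):
        """Return hosts with at least max_failures failures inside the window."""
        excluded = []
        for host, stamps in self._failures.items():
            # Drop failures that have fallen out of the window.
            while stamps and now - stamps[0] > self.window_seconds:
                stamps.popleft()
            if len(stamps) >= self.max_failures:
                excluded.append(host)
        return sorted(excluded)


def requirements_expression(hosts):
    """Build a ClassAd-style expression that avoids the given hosts."""
    if not hosts:
        return "True"
    return " && ".join(f'(Machine =!= "{h}")' for h in hosts)


if __name__ == "__main__":
    tracker = FailureTracker(max_failures=3, window_seconds=600)
    for t in (0, 100, 200):
        tracker.record_failure("node01.example.org", t)
    print(requirements_expression(tracker.excluded_hosts(now=300)))
```

The resulting string could be combined with a job's existing requirements. Old failures expire from the window, so a repaired node becomes eligible again without manual cleanup.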

Jason

On Tue, Feb 6, 2018 at 3:38 PM, Duncan Meacher <duncan.meacher@xxxxxxx> wrote:
> Hi all,
>
> I'm just wondering if there is any way of excluding nodes from the pool of
> available nodes if a certain number of submitted jobs have failed on the
> node within a given time. This is something I've experienced a few times,
> either due to a node missing some packages, or an issue with the node etc.
> In these cases, jobs submitted to the offending node will fail, and then
> immediately be re-submitted to the same node. This can easily result in a
> large number of jobs being marked as failed after using all the retries.
>
> Thanks, Duncan
>
> --
> ==========================
>
> Duncan Meacher, PhD
> Postdoctoral Researcher
> Institute for Gravitation and the Cosmos
> Department of Physics
> Pennsylvania State University
> 104 Davey Lab #040
> University Park, PA 16802
> Tel: +1 814 865 3243
> ==========================
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/