Re: [HTCondor-users] Excluding execute nodes after multiple job failures



Duncan,

Sorry, I misread your question as being from the submitter side, not
the admin side. I think there's a recipe for adding a machine to
HOSTDENY_WRITE (
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToBanMachine )
once jobs have failed on it too many times, but I'm struggling to find it. If
someone else can chime in... :)
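
In the meantime, here is a rough sketch of the bookkeeping such a recipe
would need on the admin side: count failures per node inside a sliding time
window and list the nodes that cross a threshold. This is not the wiki
recipe. FailureTracker, ban_line, the node names and the thresholds are all
made up for illustration, and how failure events would be collected from the
pool is left out. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Hypothetical helper: track recent job failures per execute node."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # node name -> failure timestamps

    def _expire(self, times, now):
        # Drop failures that fall outside the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()

    def record_failure(self, node, timestamp):
        """Record one failed job on `node` at `timestamp` (in seconds)."""
        times = self._failures[node]
        times.append(timestamp)
        self._expire(times, timestamp)

    def banned_nodes(self, now):
        """Return sorted names of nodes with >= max_failures in the window."""
        banned = []
        for node, times in self._failures.items():
            self._expire(times, now)
            if len(times) >= self.max_failures:
                banned.append(node)
        return sorted(banned)


def ban_line(nodes):
    """Format a HOSTDENY_WRITE setting (assumed comma-separated host list)."""
    return "HOSTDENY_WRITE = " + ", ".join(nodes)


tracker = FailureTracker(max_failures=3, window_seconds=600)
for t in (0, 60, 120):
    tracker.record_failure("node01.example.org", t)
tracker.record_failure("node02.example.org", 100)

print(ban_line(tracker.banned_nodes(now=130)))
```

The generated line would still have to be merged with whatever deny list the
pool already has, and the daemons would need to re-read their configuration
before it took effect. Because old failures expire, a node drops off the list
again once it has been quiet for longer than the window.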

Jason

On Tue, Feb 6, 2018 at 3:41 PM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
> Hi Duncan,
>
> Does this htcondor wiki article help?
>
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=AvoidingBlackHoles
>
> Jason
>
> On Tue, Feb 6, 2018 at 3:38 PM, Duncan Meacher <duncan.meacher@xxxxxxx> wrote:
>> Hi all,
>>
>> I'm just wondering if there is any way of excluding nodes from the pool of
>> available nodes if a certain number of submitted jobs have failed on the
>> node within a given time. This is something I've experienced a few times,
>> either due to a node missing some packages, or an issue with the node etc.
>> In these cases, jobs submitted to the offending node will fail, and then
>> immediately be re-submitted to the same node. This can easily result in a
>> large number of jobs being marked as failed after using all their retries.
>>
>> Thanks, Duncan
>>
>> --
>> ==========================
>>
>> Duncan Meacher, PhD
>> Postdoctoral Researcher
>> Institute for Gravitation and the Cosmos
>> Department of Physics
>> Pennsylvania State University
>> 104 Davey Lab #040
>> University Park, PA 16802
>> Tel: +1 814 865 3243
>> ==========================