Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] startd job count limit to limit the damage of black holes

Date: Tue, 29 Aug 2017 12:22:02 -0400
From: Wayne Betts <wbetts@xxxxxxx>
Subject: Re: [HTCondor-users] startd job count limit to limit the damage of black holes

Hello Michael,

Thank you for the reply, but I don't see how it helps. Before adding each new machine to the cluster, I already set NUM_SLOTS = 1, which for my purposes has the same effect as your suggestion. It doesn't stop the machine from rapidly draining the queue if the jobs are failing immediately. The drain is slower with a single slot, but it is still rapid.

-Wayne

Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote:

> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Betts, Wayne
> Sent: Monday, August 28, 2017 4:37 PM
> To: htcondor-users@xxxxxxxxxxx
> Subject: [HTCondor-users] startd job count limit to limit the damage of
> black holes
>
>
> START = (TotalJobsStarted < 2)  # where TotalJobsStarted is the missing
> piece that I've yet to find, so am seeking your help.

You can make the startd lie to the negotiator about how many CPU cores the machine has via NUM_CPUS in the configuration, or configure the unproven system with a single static whole-machine slot instead of a partitionable slot or collection of static slots.

With that approach, you wouldn't need to alter the start expression at all.

-Michael Pelletier.
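
For reference, a minimal sketch of the two alternatives Michael describes, assuming the settings go in the local configuration file of the unproven machine. The values shown are illustrative only, and the two options are alternatives rather than settings to combine. Note that neither one caps the number of jobs started over time, which is the limit the original question asks for.

```
# Option 1: advertise a single CPU core, however many the machine really has.
NUM_CPUS = 1

# Option 2 (instead of Option 1): one static slot that owns the whole machine.
# SLOT_TYPE_1 = 100%
# SLOT_TYPE_1_PARTITIONABLE = False
# NUM_SLOTS_TYPE_1 = 1
```

Changes to the slot layout typically need a restart of the startd rather than a reconfig before they take effect.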

