Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] condor_defrag only some machines?

Date: Fri, 13 Jan 2017 15:53:13 +0000
From: Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_defrag only some machines?

Mike,

I also turned off preemption in my pools. In my first two weeks with HTCondor, jobs were sloshing back and forth because I didn't really understand the rank expressions: half the jobs would get preempted at 75% complete, then the other half would reach 75% complete and get preempted by the first half, leading to zero goodput.

However, one pool in particular continues to struggle with the large-job starvation issue. They're managing it manually at the moment, since it's a small group of users ("Fry! Pizza goin' out! COME ON!!"), but I've put some thought into the issue and have come up with a few ideas, one of which I'm hoping I can present at this year's HTCondor Week.

One thing you may consider is setting aside certain machines which outright reject non-whole-machine jobs to keep them available for the large ones. You could set a machine requirement that the job must request at least a certain number of CPUs, for example, to be allowed to match to the machine. You could apply that requirement on a schedule so the machine would go into that mode overnight or on the weekends, and thus would let the small jobs drain out peacefully without eviction. You might even have them go into that mode depending on the state of the job queue, for that matter.
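
To make the scheduled "large jobs only" idea concrete, here is a rough sketch of the decision logic in plain Python. It is only a model of the policy, not HTCondor configuration; in a real pool this kind of rule would live in the machine's START policy expression. The CPU threshold, the overnight hours, and the function names are all hypothetical placeholders, not values from this thread.

```python
from datetime import datetime

# All values below are hypothetical placeholders chosen for illustration.
MIN_CPUS_LARGE_JOB = 16   # smallest CPU request that counts as a "large" job
NIGHT_START_HOUR = 20     # large-only mode begins at 20:00
NIGHT_END_HOUR = 6        # large-only mode ends at 06:00


def in_large_only_window(now: datetime) -> bool:
    """Return True overnight (20:00 to 06:00) and all day Saturday/Sunday."""
    is_weekend = now.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
    is_night = now.hour >= NIGHT_START_HOUR or now.hour < NIGHT_END_HOUR
    return is_weekend or is_night


def machine_accepts(request_cpus: int, now: datetime) -> bool:
    """Decide whether a NEW job may start on the reserved machine.

    Only new matches are gated; nothing here evicts a running job, so
    small jobs already on the machine simply drain out on their own.
    """
    if in_large_only_window(now):
        return request_cpus >= MIN_CPUS_LARGE_JOB
    return True
```

With these placeholder values, a 1-CPU job is accepted on a weekday afternoon but turned away at night or on the weekend, while a 16-CPU job is accepted at any time. Switching on the state of the job queue instead of the clock would only change what `in_large_only_window` checks.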

	-Michael Pelletier.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Di Domenico
Sent: Friday, January 13, 2017 8:48 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] condor_defrag only some machines?

i've not.  we turned off all preemption some time ago because it was wreaking havoc with our users.  condor was fine, but the users were getting very displeased to see their jobs preempted like 2 mins before the job was to finish.  i'm sure there's probably some tweaking and user training that might correct this, but i'm not sure i can stomach that again.

