
Re: [Condor-users] Jobs being preempted with default rank settings



An update on my own question:

I think what's going on is the standard dynamic user priority. I had a number of jobs running, so the new user evicted some of my jobs to run hers.

While I could see this being a cool feature, it kind of sucks for us since we're losing 6 hours of computation when this happens.

The thing that seems broken is that there should be enough resources for everything to get matched. We have 128 unclaimed slots. My jobs take up 23, and the next user's jobs take up 44 slots.

However, instead of co-existing peacefully, her 44 slots are evicting my jobs. Has anyone seen anything like this?

The slots are all on a cluster of identical 1U's, so there shouldn't be any preferential matching.

Also, we're using really simple submit ClassAds (our only requirement is FileSystemDomain so we wind up on the cluster...)
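
For reference, here is a minimal sketch of the configuration I believe is involved, assuming the evictions really are priority-based preemption (which is what the "preempting claim based on user priority" line in the StartLog below suggests). PREEMPTION_REQUIREMENTS and MAXJOBRETIREMENTTIME are standard Condor configuration settings, but the commented-out "default" line is my recollection of the stock config rather than something copied from our pool, and the 8-hour value is only an example. I have not tested this here.

```
# --- Central manager (negotiator) ---
# Assumption: the stock config ships something like the line below, which
# allows a better-priority user to preempt a job that has run for over an hour.
# PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmitterUserPrio * 1.2

# Never preempt a running job just because another user has better priority.
PREEMPTION_REQUIREMENTS = False

# --- Execute nodes (startd), as an alternative or in addition ---
# Let a running job keep its slot for up to this many seconds before a
# preempting claim can take over. 8 hours is an arbitrary example value.
MAXJOBRETIREMENTTIME = 8 * 3600
```

After changing these, the daemons need to re-read their configuration (for example with condor_reconfig). This does not explain why jobs were preempted while unclaimed slots were available; it only keeps user priority from evicting work that is already running.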

David Dynerman wrote:
Hey there,

I was wondering if anyone's encountered problems with jobs of equal rank preempting each other.

Our Condor pool is running mostly the default configuration files. We don't have RANK set on our execute nodes (it's commented out by default).

The behavior I'm noticing is that once we have enough jobs to match multiple jobs to one node, existing running jobs are being preempted by new jobs. The odd thing is that we haven't set any preferences - these jobs are both from internal lab users, and we have all the default settings.

This is in the StartLog of the execute node that evicted a job:

7/17 14:33:27 slot2: match_info called
7/17 14:33:29 slot2: Preempting claim has correct ClaimId.
7/17 14:33:29 slot2: New claim has sufficient rank, preempting current claim.
7/17 14:33:29 slot2: State change: preempting claim based on user priority
7/17 14:33:29 slot2: State change: claim retirement ended/expired
7/17 14:33:29 slot2: Changing state and activity: Claimed/Busy -> Preempting/Vacating

Does anyone know what might be going on? Do long-running jobs by default become "evictable" after a certain timeframe?

We're running in vanilla, so evictions really suck since everything has to start over...

David