
Re: [Condor-users] negotiation weirdness

Negotiation can be subtle to debug. The Condor negotiator creates matches between schedulers and machines. A match means that a slot is claimed by a scheduler. The claim can span multiple job executions: for efficiency, when the slot finishes a job, it requests another job with the same significant attributes directly from that scheduler, without going back through the negotiator.

In this case, I suspect that the running sergey jobs are finishing and new ones are starting because the machine is still "Claimed" by the sergey scheduler. You can limit how long a claim stays in effect with the "CLAIM_WORKLIFE" setting in 6.8.*. The default is -1, which means the claim never expires, so a claimed slot will keep taking sergey jobs until *all* of them have finished. If you set it to, say, 1 second, then the first job should run to completion (presumably taking longer than 1 second), after which the claim will be released for a new match-making cycle.
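
As a rough sketch of that change, assuming it goes into the configuration file read by each execute machine (which file that is on your nodes, and the reconfig step afterwards, are my assumptions rather than anything taken from your posted config):

```
# Stop handing new jobs to an existing claim once the claim is
# older than 1 second. A job that is already running is allowed
# to finish; the slot then goes back to the negotiator.
CLAIM_WORKLIFE = 1
```

Running condor_reconfig on the execute nodes afterwards should get the new value picked up. Slots that are already claimed may only show the new behavior once their current job ends.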

Good luck. I believe this is the issue; let me know how it works out for you.



Jason A. Stowe

Phone: 607.227.9686

Cycle Computing, LLC
Enterprise Condor Support

On 10/31/07, Grant Goodyear < grant@xxxxxxxxxxxxxxxxx> wrote:
> I'm seeing somewhat strange results in job negotiation/scheduling.
> We're running a small (~60-node) condor cluster on a dozen or so windows
> boxes.  One box (crossroads) is the central manager (submit,manage), and
> the rest are all dedicated submit,execute machines with preemption
> turned off.  (The node config can be seen in
> http://www.grantgoodyear.org/~grant/condorlogs/condor_config.txt )
> When one user submits a large number of jobs, we're seeing his jobs get
> scheduled despite the fact that other users have better priorities.
> Here's a 10-minute view of what's running and the user priorities:
> Oct. 30, 10:40am
> http://www.grantgoodyear.org/~grant/condorlogs/running_200710301040.txt
> http://www.grantgoodyear.org/~grant/condorlogs/priorities_200710301040.txt
> Oct. 30, 10:50am
> http://www.grantgoodyear.org/~grant/condorlogs/running_200710301050.txt
> http://www.grantgoodyear.org/~grant/condorlogs/priorities_200710301050.txt
> We script the submission files, and use group accounting, so even though
> all jobs have the same owner, all of the jobs run from c:\sergey have
> +AccountingGroup = "sergey" set, the c:\jgalford jobs are in the
> "jgalford" group, and the c:\ljacobson job is in the "ljacobson" group.
> At 10:40, sergey has an effective priority of 9.57, jobs 52800-52877
> (submitted on crossroads) are running, and jobs 52878-53481 (crossroads)
> are waiting.  Group ljacobson has job 270 (submitted from littleboy)
> running, and nothing waiting in the queue.  His priority is 0.51, but
> since he has nothing waiting it doesn't matter.  Group jgalford has job 498
> (submitted from fatman) running, jobs 483-487 (submitted from
> greenhouse) running, and jobs 499-514 (submitted from fatman) waiting.
> The jgalford effective priority is 3.66.
> So, if I understand the way the negotiation process works, the waiting
> jobs should be sorted so that the jgalford job 499 (fatman) should be
> the next job chosen when a resource frees up, and that would be followed
> by 500 (fatman), ....
> At 10:50, sergey jobs 52800-52808 (crossroads) have finished, and now
> sergey jobs 52809-52904 (crossroads) are running.  No new jgalford
> jobs have started, despite the lower effective priority.
> I've included the crossroads log files
> ( http://www.grantgoodyear.org/~grant/condorlogs/) for this time
> period.  I'm not seeing anything in the logs that explains this
> behavior, but I'm hoping somebody else has better insight.
> I'm thoroughly confused.
> Help?
> Thanks,
> Grant Goodyear
> --
> Grant Goodyear
> web: http://www.grantgoodyear.org
> e-mail: grant@xxxxxxxxxxxxxxxxx