
RE: [Condor-users] Negotiator gets stuck



Hi all,

I think the behaviour mentioned in this thread can be changed via
condor_config. In Part 4 of the file, set 

NEGOTIATE_ALL_JOBS_IN_CLUSTER = True

This seems to work for us. From the comments for this macro in the file:

##  By default, when the schedd fails to start an idle job, it will
##  not try to start any other idle jobs in the same cluster during
##  that negotiation cycle.  This makes negotiation much more
##  efficient for large job clusters.  However, in some cases other
##  jobs in the cluster can be started even though an earlier job
##  can't.  For example, the jobs' requirements may differ, because of
##  different disk space, memory, or operating system requirements.
##  Or, machines may be willing to run only some jobs in the cluster,
##  because their requirements reference the jobs' virtual memory size
##  or other attribute.  Setting NEGOTIATE_ALL_JOBS_IN_CLUSTER to True
##  will force the schedd to try to start all idle jobs in each
##  negotiation cycle.  This will make negotiation cycles last longer,
##  but it will ensure that all jobs that can be started will be
##  started.
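
To make the trade-off in those comments concrete, here is a toy Python model of the behaviour they describe. It is only a sketch and not Condor code: the Job tuple, the negotiate_cycle function, and the memory-only matching rule are all hypothetical names and simplifications chosen for illustration.

```python
# Toy model only: NOT Condor code. All names here are hypothetical.
from collections import namedtuple

# A job is identified by cluster.proc and, in this sketch, needs only memory.
Job = namedtuple("Job", ["cluster", "proc", "memory_mb"])


def negotiate_cycle(idle_jobs, free_memory_mb, negotiate_all_jobs_in_cluster=False):
    """Return the jobs started in one simulated negotiation cycle.

    free_memory_mb lists the memory of each free slot; a job matches the
    smallest free slot that offers at least job.memory_mb.
    """
    slots = sorted(free_memory_mb)  # copy, so the caller's list is untouched
    started = []
    failed_clusters = set()
    for job in idle_jobs:
        # Default behaviour: once one job in a cluster fails to start,
        # skip the rest of that cluster for this cycle.
        if job.cluster in failed_clusters and not negotiate_all_jobs_in_cluster:
            continue
        match = next((s for s in slots if s >= job.memory_mb), None)
        if match is None:
            failed_clusters.add(job.cluster)
            continue
        slots.remove(match)
        started.append(job)
    return started


# One cluster whose first job needs more memory than any slot offers.
jobs = [Job(1, 0, 4096), Job(1, 1, 512), Job(1, 2, 512)]
slots = [1024, 1024]

default_run = negotiate_cycle(jobs, slots)
all_jobs_run = negotiate_cycle(jobs, slots, negotiate_all_jobs_in_cluster=True)
print([j.proc for j in default_run])   # []
print([j.proc for j in all_jobs_run])  # [1, 2]
```

In this sketch the default setting starts nothing, because job 1.0 cannot match and the rest of cluster 1 is skipped. With the flag set to True, jobs 1.1 and 1.2 are still tried and both start, at the cost of examining every idle job.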

Cheers,
Chris.

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Nick LeRoy
Sent: Saturday, 19 February 2005 04:23
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] Negotiator gets stuck

On Fri February 18 2005 11:01 am, Andrey Kaliazin wrote:
> Thanks Erik,
Andrey,

> I do not doubt the Negotiator's logic; in this case it is perfectly valid.
> But I can see that I did not explain the problem I have. Let me try again:
<snip>
> So this is the key point of my problem -
> Negotiator quits the cycle immediately after one communication failure.

Do you have more than one schedd?  If all your jobs are from a single schedd,
then, yes, this is exactly what will happen.  The negotiator gets the job ad
list from the collector, pulls the first job from it, tries to contact its
schedd, fails, and then ignores all jobs from that schedd for the remainder
of the cycle.  If there are no other schedds in the pool, that will
effectively end the negotiation cycle.
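
The sequence described above can be sketched as a toy Python loop. This is an illustration of the description only, not the negotiator's actual code: the function name negotiator_cycle, the (schedd, job id) pairs, and the set of reachable schedds are hypothetical stand-ins.

```python
# Toy model only: NOT the real negotiator. All names here are hypothetical.

def negotiator_cycle(job_ads, reachable_schedds):
    """Simulate one negotiation cycle.

    job_ads is an ordered list of (schedd_name, job_id) pairs, standing in
    for the job ad list obtained from the collector. reachable_schedds is
    the set of schedds that answer when contacted.
    Returns (negotiated_jobs, failed_schedds).
    """
    failed = set()
    negotiated = []
    for schedd, job_id in job_ads:
        # A schedd that failed once is ignored for the rest of the cycle.
        if schedd in failed:
            continue
        if schedd not in reachable_schedds:
            failed.add(schedd)
            continue
        negotiated.append((schedd, job_id))
    return negotiated, failed


# Single-schedd pool: one communication failure effectively ends the cycle.
single = [("schedd_a", 1), ("schedd_a", 2), ("schedd_a", 3)]
print(negotiator_cycle(single, reachable_schedds=set()))
# ([], {'schedd_a'})

# Two schedds: jobs from the reachable one are still considered.
mixed = [("schedd_a", 1), ("schedd_a", 2), ("schedd_b", 3)]
print(negotiator_cycle(mixed, reachable_schedds={"schedd_b"}))
# ([('schedd_b', 3)], {'schedd_a'})
```

With only one schedd in the pool, the first failed contact means no remaining job is considered. With a second, reachable schedd, the cycle continues for that schedd's jobs.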

Or, is there something else going on?

-Nick

-- 
           <<< The matrix has you. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences