
Re: [Condor-users] negotiating with schedds when a client has FW



Hello
 
Sure, it sounds logical to stop matchmaking for a whole cluster once one of its jobs is rejected, since all the other jobs in that cluster have the same requirements.
But that is not the problem. The problem is that you cannot say with 100% certainty "no match found" for a job if you only try to connect to the first machine, the one with the highest rank.
In particular, if that first machine blocks traffic, you cannot assume that all the lower-ranked machines will block traffic as well.
 
I am not sure whether it would be better to try to match every job of a cluster against all machines that satisfy the requirements, in rank order, until the first machine without network problems is found. In some extreme cases the negotiator could end up with too much work...
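
To make the idea concrete, here is a rough sketch of "walk the matching machines in rank order until one is reachable". It is only an illustration, not how the Condor negotiator is actually implemented: the function name, the dict-based machine records, the two callbacks and the max_attempts cap are all assumptions of mine. The cap is there to bound the extra work mentioned above.

```python
from typing import Callable, Iterable, Optional


def first_reachable_match(
    machines: Iterable[dict],
    meets_requirements: Callable[[dict], bool],
    is_reachable: Callable[[dict], bool],
    max_attempts: int = 5,
) -> Optional[dict]:
    """Return the best-ranked matching machine that can be contacted.

    Hypothetical model: each machine is a dict with a numeric "rank".
    Returns None if none of the first max_attempts candidates is reachable.
    """
    # Keep only machines that satisfy the job's requirements, best rank first.
    candidates = sorted(
        (m for m in machines if meets_requirements(m)),
        key=lambda m: m["rank"],
        reverse=True,
    )
    # Try candidates in rank order, but never more than max_attempts of them.
    for machine in candidates[:max_attempts]:
        if is_reachable(machine):
            return machine
    return None


pool = [
    {"name": "a", "rank": 10, "blocked": True},   # best rank, behind a firewall
    {"name": "b", "rank": 5, "blocked": False},
    {"name": "c", "rank": 1, "blocked": False},
]
best = first_reachable_match(
    pool,
    meets_requirements=lambda m: True,
    is_reachable=lambda m: not m["blocked"],
)
```

With max_attempts=1 this degenerates to the behaviour I am complaining about: only machine "a" is tried and the job gets no match at all.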
 
Thomas Lisson
NRW-Grid
 
Just to be clear - the two "clusters" here are different things.
NEGOTIATE_ALL_JOBS_IN_CLUSTER means that for each job in a job cluster (i.e.
someone did "queue 50" in their submit file to get a cluster of 50 jobs) the
schedd should send each job in that cluster to the matchmaker. By default,
the schedd makes an optimization in that it stops matchmaking for a cluster
of jobs the first time one of the jobs is rejected with "No Match Found".
Traditionally, the requirements expressions for all jobs in a cluster
are the same, so if one is rejected they're all likely to be rejected.

It has nothing to do with the number of computers in your cluster.
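
The default short-circuit described above can be pictured with a small toy model. This is a sketch under stated assumptions, not the schedd's real code: jobs are (cluster, proc) pairs, try_match is a hypothetical stand-in for one round trip to the matchmaker, and the boolean parameter only mimics what NEGOTIATE_ALL_JOBS_IN_CLUSTER is described as doing.

```python
from typing import Callable, List, Tuple

JobId = Tuple[int, int]  # (cluster id, proc id), e.g. (1, 0) for job 1.0


def negotiate(
    jobs: List[JobId],
    try_match: Callable[[JobId], bool],
    negotiate_all_jobs_in_cluster: bool = False,
) -> List[JobId]:
    """Return the jobs that found a match (toy model of the schedd loop)."""
    matched: List[JobId] = []
    rejected_clusters = set()
    for cluster, proc in jobs:
        # Default optimization: once one job of a cluster was rejected,
        # skip the remaining jobs of that cluster in this round.
        if cluster in rejected_clusters and not negotiate_all_jobs_in_cluster:
            continue
        if try_match((cluster, proc)):
            matched.append((cluster, proc))
        else:
            rejected_clusters.add(cluster)
    return matched


queue = [(1, 0), (1, 1), (1, 2), (2, 0)]


def reject_first_only(job: JobId) -> bool:
    # Hypothetical matchmaker answer: only job 1.0 gets "No Match Found".
    return job != (1, 0)


default_result = negotiate(queue, reject_first_only)
all_jobs_result = negotiate(
    queue, reject_first_only, negotiate_all_jobs_in_cluster=True
)
```

In this toy run the default skips jobs 1.1 and 1.2 after 1.0 is rejected, while the second call sends every job of cluster 1 to the matchmaker.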

-Erik