
Re: [Condor-users] Match-making delays/errors in Condor-G




Todd may be able to correct me or explain why, but my guess is that you can work around this problem with the following configuration setting for the negotiator:

NEGOTIATOR_MATCHLIST_CACHING = False

--Dan

Jan Ploski wrote:

Hello,

I have a problem with match-making not working properly or being too slow in Condor 7.0.0. The scenario is as follows: I submit 120 Condor-G jobs together from a script. They all begin in status Idle (as expected). I then repeatedly invoke condor_q to observe changes in JobStatus and LastRejMatchReason of all jobs as time progresses.

What I see is that, in increasing order of ClusterId, every job hits LastRejMatchReason == "no match found" once. When this happens, the match-making cycle is immediately aborted, so that all jobs with a greater ClusterId remain Idle. In one of the following cycles the job which previously had trouble is matched correctly, but then the job with ClusterId+1 runs into the same "no match found" problem. So it takes at least as many match-making cycles as there are jobs in the queue to get every one matched to the target resource (which is the same for all jobs and does not constrain the accepted job count in any way). Here is an excerpt from NegotiatorLog:

4/25 09:53:28     Getting reply from schedd ...
4/25 09:53:28     Got JOB_INFO command; getting classad/eom
4/25 09:53:28     Request 20658.00000:
4/25 09:53:28 Attempting to use cached MatchList: Failed (MatchList length: 0, Autocluster: 0, Schedd Name: jploski@xxxxxxxxxxxxxxxxxxxxxx, Schedd Address: <134.106.52.210:20346>)
4/25 09:53:28 Rejected 20658.0 jploski@xxxxxxxxxxxxxxxxxxxxxx <134.106.52.210:20346>: no match found
4/25 09:53:28     Sending SEND_JOB_INFO/eom
4/25 09:53:28     Getting reply from schedd ...
4/25 09:53:28     Got NO_MORE_JOBS;  done negotiating
4/25 09:53:28 Schedd jploski@xxxxxxxxxxxxxxxxxxxxxx got all it wants; removing it.
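To see how often this pattern repeats across cycles, the rejections can be pulled out of NegotiatorLog with a short script. The following is only a sketch: it assumes the "Rejected <cluster>.<proc> <schedd> <address>: <reason>" layout seen in the excerpt above, and the function names rejected_jobs and summarize are made up for illustration, not part of Condor.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   "... Rejected 20658.0 user@host <134.106.52.210:20346>: no match found"
REJECTED = re.compile(r"Rejected (\d+)\.(\d+) \S+ <[^>]+>: (.+?)\s*$")


def rejected_jobs(lines):
    """Yield (cluster_id, proc_id, reason) for each rejection line found."""
    for line in lines:
        match = REJECTED.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2)), match.group(3)


def summarize(lines):
    """Count how many rejections were logged for each reason."""
    return Counter(reason for _, _, reason in rejected_jobs(lines))
```

Passing the lines of an open log file to rejected_jobs() lists the rejected job ids in log order, which makes it easy to check whether they really advance by one ClusterId per negotiation cycle.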

Can anyone explain why this is happening?

Regards,
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2  - 26121 Oldenburg - Germany
Fon: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan.Ploski@xxxxxxxx - URL: http://www.offis.de