
Re: [Condor-users] rooster on linux, take 3





On 11/28/11 12:33 PM, Dimitri Maziuk wrote:
On 11/28/2011 09:18 AM, Dan Bradley wrote:

So the next question is how do I figure out what's up with the negotiator?

For example, with 40 cores busy and 4 cores sleeping, condor_q -analyze 961082
says:

-- Submitter: minnow.bmrb.wisc.edu : <144.92.167.254:9617?sock=13250_c2fa_3> : minnow.bmrb.wisc.edu
---
961082.000:  Run analysis summary.  Of 44 machines,
...
       4 match but are currently offline
       0 are available to run your job
         No successful match recorded.
         Last failed match: Fri Nov 25 18:18:55 2011
         Reason for last match failure: no match found
-----------------------------------------------------

NegotiatorLog (on D_FULLDEBUG) is not very informative as to why the "4
matching but offline" cores are not a "successful match":

11/25/11 18:17:55     Sending SEND_JOB_INFO/eom
11/25/11 18:17:55     Getting reply from schedd ...
11/25/11 18:17:55     Got JOB_INFO command; getting classad/eom
11/25/11 18:17:55     Request 961082.00000:
11/25/11 18:17:55 matchmakingAlgorithm: limit 4.000000 used 0.000000 pieLeft 4.000000
11/25/11 18:17:55       Rejected 961082.0 bbee@xxxxxxxxxxxxx <144.92.167.254:9617?sock=13250_c2fa_3>: no match found
--------------------------------------------------------




If you add D_JOB and D_MACHINE to NEGOTIATOR_DEBUG, you will get verbose logging of every machine considered by the negotiator when trying to match the job. Is it even considering the offline machine? If so, and if it matches, I would expect the following to be logged by the negotiator:

"Registering attempt to match offline machine <host.name> by <user.name>."
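
A minimal sketch of that configuration change, assuming the negotiator is already logging at D_FULLDEBUG (as in the log excerpt above) and that the setting goes in the central manager's local configuration file, whose location varies by installation:

```
# Assumption: this line lives in the central manager's local config file.
# It keeps the existing D_FULLDEBUG level and adds the two flags named above.
NEGOTIATOR_DEBUG = D_FULLDEBUG D_JOB D_MACHINE
```

Running condor_reconfig on the central manager afterwards should make the negotiator pick up the new setting. The NegotiatorLog is likely to grow quickly with these flags on, so it is probably worth removing them once the match failure is understood.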

--Dan