
Re: [HTCondor-users] How to investigate matchmaking internals?



Hi Max,

yes, that's the problem :(
For example, -better-analyze showed me five machines that match the
job's requested resources. However, the jobs did not start on those
nodes although they were empty?!

Cheers,
  Thomas

On 2017-09-15 13:57, Fischer, Max (SCC) wrote:
> Hi Thomas,
> 
> condor_q allows you to find out why jobs are accepted/rejected by certain slots. See the -better-analyze, -machine and -slotads options.
> This is usually enough to figure out why a specific slot does not run jobs.
> 
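A minimal sketch of the condor_q analysis described above. The job ID
1801203 is the one used as an example further down in this thread; the
slot name is a hypothetical placeholder, and the :reverse qualifier is
assumed to be available in your condor_q version (check condor_q -help).
The script only prints the commands, so it is safe to try anywhere:

```shell
#!/bin/sh
# Print (do not run) the condor_q analysis commands for one job and one slot.
# The slot name passed below is a hypothetical placeholder.
analysis_commands() {
    job_id="$1"
    slot="$2"
    # Job's point of view: does this slot satisfy the job's requirements?
    printf 'condor_q -better-analyze %s -machine %s\n' "$job_id" "$slot"
    # Slot's point of view: does the job satisfy the slot's requirements?
    printf 'condor_q -better-analyze:reverse %s -machine %s\n' "$job_id" "$slot"
}

analysis_commands 1801203 'slot1@testnode01.example.org'
```

To actually run the analysis, pipe the output to sh on a submit node.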
> If you want to peek into the matching of ClassAds in general, the debug function is very helpful:
> 	AnyType debug(AnyType expression)
> 	This function evaluates its argument, and it returns the result. Thus, it is a no-operation. However, a side-effect of the function is that information about the evaluation is logged to the evaluating program's log file, at the
> 	D_FULLDEBUG debug level. This is useful for determining why a given ClassAd expression is evaluating the way it does. For example, if a condor_startd START expression is unexpectedly evaluating to UNDEFINED, then
> 	wrapping the expression in this debug() function will log information about each component of the expression to the log file, making it easier to understand the expression.
> Put it into the START expression of a node, and you can get other tools (like condor_q or the Negotiator) to log every decision about it.
> You can increase the condor_q debug level as needed via the _CONDOR_TOOL_DEBUG environment variable and the -debug flag, e.g. as
> 	_CONDOR_TOOL_DEBUG=D_ALL condor_q 1801203 -debug
> Note that debug() can give you massive amounts of output if you use it in the wrong place.
> 
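A minimal sketch of wrapping START in debug() on one test node, assuming
START is already defined earlier in that node's configuration (the
self-reference then expands to the previous value). Where the fragment
lives (a local config file on the node) is an assumption, not something
stated in this thread:

```
# Log startd messages at D_FULLDEBUG so debug() output can reach the StartLog.
STARTD_DEBUG = D_FULLDEBUG

# Wrap the existing START expression. debug() returns its argument
# unchanged and only logs how each component evaluated.
START = debug($(START))
```

The change should take effect after a condor_reconfig on the node. For
evaluations done by the Negotiator, its own log level presumably has to
be raised as well (NEGOTIATOR_DEBUG = D_FULLDEBUG on the central
manager). As noted above, this can produce a lot of output, so it is
best limited to a single test node.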
> Cheers,
> Max
> 
>> Am 15.09.2017 um 10:15 schrieb Thomas Hartmann <thomas.hartmann@xxxxxxx>:
>>
>> Hi all,
>>
>> is there actually a way to peek into the matchmaking process between a
>> node and the negotiator?
>>
>> The thing is that I have a few test nodes which all have the same
>> configs and mostly similar hardware. However, jobs targeted at these
>> nodes start on only one of them, and sometimes with quite some delay.
>>
>> In some nodes' StartLog I found for some of these jobs(?) (assuming
>> correlation in time) that, while the Negotiator scheduled a job to a
>> slot, the node itself rejected the job (although nothing else was
>> running and all resources were available as far as I saw...)
>>
>> On the other hand, for some nodes/times the StartLog is effectively
>> empty, with no sign that the node was actually considered during
>> matchmaking (-avail/-long stats looked OK to me skimming over the ads).
>>
>> So, is there a reasonable way to peek into the Negotiator's decision
>> making on why or why not to consider a node for a given job? As far as
>> I can see, condor_status only shows what a node/slot offers, right?
>>
>> Cheers and thanks for ideas,
>>  Thomas
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/
> 
> 
> 
> 
