Re: [HTCondor-users] How to investigate matchmaking internals?



Hi Thomas,

condor_q can tell you why jobs are accepted or rejected by particular slots. See the -better-analyze, -machine and -slotads options.
This is usually enough to figure out why a specific slot does not run jobs.
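
A minimal sketch of such an analysis, assuming the job ID used further down in this mail and a purely hypothetical slot name (replace it with a name reported by condor_status). The small wrapper only echoes the command and runs it if condor_q is actually installed:

```shell
#!/bin/sh
# Job ID from the example in this mail; the slot name is a hypothetical placeholder.
JOB_ID="1801203"
SLOT="slot1@testnode.example.org"

# Print the command line, then run it only if condor_q is available.
analyze() {
    echo "condor_q -better-analyze $*"
    if command -v condor_q >/dev/null 2>&1; then
        condor_q -better-analyze "$@"
    fi
}

# Analyze the job against the slots in the pool.
analyze "$JOB_ID"

# Restrict the analysis to a single slot.
analyze "$JOB_ID" -machine "$SLOT"
```

The second call is the more useful one when a specific node refuses jobs, since the output then only covers that slot.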

If you want to peek into the matching of ClassAds in general, the debug function is very helpful:
	AnyType debug(AnyType expression)
	This function evaluates its argument, and it returns the result. Thus, it is a no-operation.
	However, a side-effect of the function is that information about the evaluation is logged to the
	evaluating program's log file, at the D_FULLDEBUG debug level. This is useful for determining why
	a given ClassAd expression is evaluating the way it does. For example, if a condor_startd START
	expression is unexpectedly evaluating to UNDEFINED, then wrapping the expression in this debug()
	function will log information about each component of the expression to the log file, making it
	easier to understand the expression.
Put it into the START expression of a node, and every program that evaluates the expression (like condor_q or the Negotiator) will log the details of each evaluation.
You can increase the condor_q debug level as needed via the _CONDOR_TOOL_DEBUG environment variable and the -debug flag, e.g. as
	_CONDOR_TOOL_DEBUG=D_ALL condor_q 1801203 -debug
Note that debug() can give you massive amounts of output if you use it in the wrong place.
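
As a rough sketch of the startd side, the configuration fragment below wraps an existing START expression in debug() and raises the startd log level so that the output shows up in the StartLog. It assumes START is already defined earlier in the configuration (so that $(START) expands to the previous value) and that the node is reconfigured afterwards, e.g. with condor_reconfig. Adjust it to your setup.

```
# Assumption: START is defined earlier in the config, so $(START)
# expands to that previous definition.
START = debug($(START))

# debug() logs at the D_FULLDEBUG level, so the startd must log at that level.
STARTD_DEBUG = D_FULLDEBUG
```

Remove the wrapper again once you have found the culprit, since every evaluation of START is logged while it is in place.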

Cheers,
Max

> On 15.09.2017 at 10:15, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
> 
> Hi all,
> 
> is there actually a way to peek into the matchmaking process between a
> node and the negotiator?
> 
> The thing is that I have a few test nodes that all have the same configs
> and mostly similar hardware. However, jobs targeted at these nodes start
> on only one of them, and sometimes with quite some delay.
> 
> In some nodes' StartLog I found, for some of these jobs(?) (assuming
> correlation in time), that while the Negotiator scheduled a job to a
> slot, the node itself rejected the job (although nothing else was running
> and all resources were available as far as I could see...)
> 
> On the other hand, for some nodes/times the StartLog is effectively
> empty, with no sign that the node was actually considered during
> matchmaking. (The -avail/-long stats looked OK to me when skimming over the ads.)
> 
> So, is there a reasonable way to peek into the Negotiator's decision
> making, i.e. why it does or does not consider a node for a given job? As far
> as I can see, condor_status only shows what a node/slot offers, right?
> 
> Cheers and thanks for ideas,
>  Thomas
> 
