
Re: [HTCondor-users] How to see why jobs suspend/continue



On 9/6/2013 11:22 AM, Ralph Finch wrote:
Running HTC 8.0.2 on an all-Windows 7 x64 pool. I have condor_config set with
KILL and PREEMPT set to FALSE, and use SUSPEND and CONTINUE instead to reduce
load for interactive users. Jobs are suspending and continuing just seconds
apart, and I'd like to examine in more detail which parts of my SUSPEND and
CONTINUE expressions are triggering. Is this information available, perhaps
using DEBUG or some sort of condor_status query?


The trick here is the debug() ClassAd function; see

http://research.cs.wisc.edu/htcondor/manual/v8.0/4_1HTCondor_s_ClassAd.html#40732
I copied the relevant text from the manual below. My guess is that your issue is related to load average; in a Windows environment, I would suggest basing the SUSPEND and CONTINUE expressions solely on keyboard activity.

-Todd

debug(AnyType expression)
This function evaluates its argument, and it returns the result. Thus, it is a no-operation. However, a side-effect of the function is that information about the evaluation is logged to the evaluating program's log file. This is useful for determining why a given ClassAd expression is evaluating the way it does. For example, if a condor_startd START expression is unexpectedly evaluating to UNDEFINED, then wrapping the expression in this debug() function will log information about each component of the expression to the log file, making it easier to understand the expression.
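
As a rough sketch of how this could be applied to the question above, the SUSPEND and CONTINUE expressions can each be wrapped in debug() in condor_config. The expressions and thresholds below are hypothetical placeholders for a keyboard-only policy, not the original poster's settings; KeyboardIdle is the machine ClassAd attribute giving seconds since the last keyboard or mouse activity.

```
# Hypothetical keyboard-only policy; the 60 and 300 second thresholds
# are illustrative assumptions, not recommended values.
# Wrapping each expression in debug() leaves its result unchanged but
# logs how each component evaluated.
SUSPEND  = debug( KeyboardIdle < 60 )
CONTINUE = debug( KeyboardIdle > 300 )
```

After changing the configuration, running condor_reconfig on the execute machine should make the condor_startd pick up the new expressions. Because the condor_startd is the program evaluating SUSPEND and CONTINUE, the debug() output should appear in its log file (normally StartLog).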