
Re: [HTCondor-users] Fwd: How to see why jobs suspend/continue



Messages from the debug() ClassAd function will be of the form

date time Classad debug:  [N.Nms] _expression_ -> value

The log fragment shown below doesn't have any messages of that form. Do you have D_FULLDEBUG in the logging level of STARTD_DEBUG?
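
For reference, a minimal sketch of the setting I mean, assuming STARTD_DEBUG is not already set elsewhere in your condor_config (if it is, add D_FULLDEBUG to the existing flags rather than replacing them):

```
# Turn on full debug logging for the startd so that debug() output
# reaches the StartLog. D_FULLDEBUG is verbose; consider removing it
# again once the SUSPEND/CONTINUE question is answered.
STARTD_DEBUG = D_FULLDEBUG
```

After a condor_reconfig, lines of the "Classad debug:" form should appear in StartLog each time SUSPEND or CONTINUE is evaluated.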

On 9/7/2013 7:05 PM, Ralph Finch wrote:

Right, that looks obvious now. I tried it but get no more information:
StartLog:
09/06/13 17:03:33 ** condor_startd.exe (CONDOR_STARTD) STARTING UP <===restart everything
09/06/13 17:03:33 ** C:\Condor\bin\condor_startd.exe
...
09/06/13 17:25:57 slot2: State change: SUSPEND is TRUE
09/06/13 17:25:57 slot2: Changing activity: Busy -> Suspended
09/06/13 17:26:02 slot1: State change: SUSPEND is TRUE
09/06/13 17:26:02 slot1: Changing activity: Busy -> Suspended
09/06/13 17:26:02 slot2: State change: CONTINUE is TRUE
09/06/13 17:26:02 slot2: Changing activity: Suspended -> Busy
09/06/13 17:26:07 slot4: State change: SUSPEND is TRUE
09/06/13 17:26:07 slot4: Changing activity: Busy -> Suspended
09/06/13 17:26:07 slot1: State change: CONTINUE is TRUE
09/06/13 17:26:07 slot1: Changing activity: Suspended -> Busy
09/06/13 17:26:12 slot3: State change: SUSPEND is TRUE
09/06/13 17:26:12 slot3: Changing activity: Busy -> Suspended
09/06/13 17:26:12 slot4: State change: CONTINUE is TRUE
09/06/13 17:26:12 slot4: Changing activity: Suspended -> Busy
09/06/13 17:26:17 slot2: State change: SUSPEND is TRUE
09/06/13 17:26:17 slot2: Changing activity: Busy -> Suspended

etc.

and the condor_config:
MachineBusy        = ($(CPUBusy) || $(KeyboardBusy) || $(ConsoleBusy))
MachineNotBusy        = ($(CPUIdle) && $(KeyboardNotBusy) && $(ConsoleNotBusy))
# each test defined in default portion of condor_config
SUSPEND        = debug($(MachineBusy))
CONTINUE     = debug($(MachineNotBusy))



On Fri, Sep 6, 2013 at 2:56 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 9/6/2013 2:00 PM, Ralph Finch wrote:
I misunderstand the implementation. I have in each machine's condor_config
file:

SUSPEND = $(MachineBusy)
junk = debug($(MachineBusy))

[snip]

How should I implement the debug(expression) statement?


I believe you want to do

  SUSPEND = debug( $(MachineBusy) )

and then look in the StartLog file (not StarterLog*).


And is the load average calculation on windows machines unreliable? The
reason I need it is, we all sometimes run long-running (hours) numerical
models with no interactive use, so testing only keyboard and console is
insufficient to prevent my HTC jobs from interfering with the machine
owner's use.


Understood.

I just played around with it a bit. I think the load average calculation is pretty good, but what looks wonky to me is how the load is assigned out to the different slots.

Assuming you started from the default condor_config, I think if you change

   NonCondorLoadAvg     = (LoadAvg - CondorLoadAvg)

to instead be

   NonCondorLoadAvg     = (TotalLoadAvg - TotalCondorLoadAvg)

you will get results much along the lines of what you were hoping for.

(Of course, do not forget to run condor_reconfig, as usual, after changing the config file.)
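
To show where that one line sits, here is a sketch of the relevant part of the config after the change. The CPUBusy/CPUIdle definitions and the HighLoad/BackgroundLoad thresholds are what I recall from the stock condor_config and are an assumption about your file, so check them against your copy before relying on them.

```
# Machine-wide load not caused by HTCondor jobs.
# (Was: LoadAvg - CondorLoadAvg, which uses the per-slot share of the load.)
NonCondorLoadAvg = (TotalLoadAvg - TotalCondorLoadAvg)

# Assumed stock thresholds and tests; verify against your condor_config.
BackgroundLoad = 0.3
HighLoad       = 0.5
CPUIdle        = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
CPUBusy        = ($(NonCondorLoadAvg) >= $(HighLoad))

# Suspend policy as discussed above, wrapped in debug() for logging.
MachineBusy    = ($(CPUBusy) || $(KeyboardBusy) || $(ConsoleBusy))
SUSPEND        = debug( $(MachineBusy) )
```

Because TotalLoadAvg and TotalCondorLoadAvg are machine-wide attributes, every slot on the machine should evaluate the same non-Condor load with this definition, rather than whatever portion of the load happened to be assigned to that slot.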

regards,
Todd




