
Re: [HTCondor-users] How to see why jobs suspend/continue



On 9/6/2013 2:00 PM, Ralph Finch wrote:
> I misunderstand the implementation. I have in each machine's condor_config
> file:
>
> SUSPEND = $(MachineBusy)
> junk = debug($(MachineBusy))
>
[snip]
> How should I implement the debug(expression) statement?


I believe you want to do

  SUSPEND = debug( $(MachineBusy) )

and then look in the StartLog file (not StarterLog*).
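
A minimal sketch of the difference, as I understand it: a line such as "junk = debug(...)" only defines a macro that nothing ever evaluates, so nothing gets logged. SUSPEND, by contrast, is evaluated by the startd, so wrapping it in debug() makes each evaluation show up in the startd's log. The ClassAd debug() function returns its argument unchanged, so the policy itself should behave the same.

  # 'junk' is just an unused macro; the startd never evaluates it,
  # so this line produces no log output
  #junk = debug($(MachineBusy))

  # SUSPEND is evaluated by the startd, so debug() logs each evaluation
  SUSPEND = debug( $(MachineBusy) )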

> And is the load average calculation on windows machines unreliable? The
> reason I need it is, we all sometimes run long-running (hours) numerical
> models with no interactive use, so testing only keyboard and console is
> insufficient to prevent my HTC jobs from interfering with the machine
> owner's use.


Understood.

I just played around with it a bit. I think the load average calculation is pretty good, but what looks wonky to me is how the load gets assigned to the individual slots.

Assuming you started from the default condor_config, I think if you change

   NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)

to instead be

   NonCondorLoadAvg = (TotalLoadAvg - TotalCondorLoadAvg)

you should get results much closer to what you were hoping for.

(Of course, do not forget to run condor_reconfig after changing the config file.)
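
For reference, here is roughly how the pieces chain together. This is a sketch that assumes the stock macro definitions from the default condor_config, quoted from memory, so please check them against your own file (the HighLoad value in particular may differ on your machines):

   # Machine-wide load not caused by Condor jobs (the suggested change)
   NonCondorLoadAvg = (TotalLoadAvg - TotalCondorLoadAvg)

   # Assumed stock definitions; verify against your condor_config
   HighLoad         = 0.5
   CPUBusy          = ($(NonCondorLoadAvg) >= $(HighLoad))
   KeyboardBusy     = (KeyboardIdle < $(MINUTE))
   MachineBusy      = ($(CPUBusy) || $(KeyboardBusy))

   # Your policy, wrapped so each evaluation is logged to StartLog
   SUSPEND          = debug( $(MachineBusy) )

With the Total* attributes, CPUBusy compares the non-Condor load of the whole machine against HighLoad, rather than whatever share of the load happened to be attributed to one slot, so an owner's long-running model should trip SUSPEND even when nobody touches the keyboard.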

regards,
Todd