Re: [Condor-users] What is an efficient Condor setting for a Windows pool PC?

We've been running a Condor pool on about 20 Windows machines, each
with 2 to 4 cores (slots), for a few years.

First, be sure to install the very latest stable version of Condor
using the .msi install file, so that Condor properly detects keyboard
and mouse.

StartIdleTime and ContinueIdleTime really depend on the patience of
your interactive and submitting users. Are your machines multi-core,
so that interactive users may not notice a submitted job that keeps
running on one core? Are your submitted jobs short (a few minutes), so
that any delay in starting annoys the submitting users? And so on.

We never kill, vacate or preempt jobs in our setup. We delay their
start by just a few minutes after interactive use, and suspend them
for a maximum of a couple of hours, then resume them. Our jobs
typically take 30 minutes to a few hours, and there are thousands in a
batch, so it doesn't matter if some are somewhat delayed in completing.

If $(MaxSuspendTime) == $(ContinueIdleTime) then MaxSuspendTime
doesn't really come into play, so it should be higher than
ContinueIdleTime. Remember, MaxSuspendTime is the TOTAL time a job is
suspended over its life, not its time in a single suspension. Our
settings are

# time keyboard must be idle to start job
StartIdleTime    = 5 * $(MINUTE)
# max time to allow a job in suspension
MaxSuspendTime   = 2 * $(HOUR)
# if keyboard idle for this time, continue suspended job
ContinueIdleTime = 5 * $(MINUTE)

But we also set

WANT_SUSPEND = TRUE
PREEMPT      = FALSE
KILL         = FALSE

and since now all our machines are 4-core, we set


so that the interactive user gets at most 2 free cores to use. They
report that with 2 free cores, they do not notice the background
Condor jobs.
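
For anyone wiring these macros up from scratch, here is a minimal
sketch of how StartIdleTime and ContinueIdleTime could be referenced
from the startd policy expressions. It is an illustration only, not our
actual configuration: the three expressions are simplified assumptions
built on the KeyboardIdle machine attribute, and the stock Condor
configuration also takes CPU load into account.

```
# Simplified illustration (assumed expressions, not a production policy).
# $(MINUTE) is assumed to be defined already, as in the stock config.

# start a job only after the keyboard has been idle long enough
START    = KeyboardIdle > $(StartIdleTime)
# suspend the job as soon as someone uses the keyboard or mouse
SUSPEND  = KeyboardIdle < $(MINUTE)
# resume a suspended job once the keyboard has been idle again
CONTINUE = KeyboardIdle > $(ContinueIdleTime)
```

With expressions of this shape, raising StartIdleTime makes submitting
users wait longer for a slot, and raising ContinueIdleTime keeps a
suspended job parked longer after the interactive user walks away.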

On Wed, Sep 7, 2011 at 12:49 AM, Rob <spamrefuse@xxxxxxxxx> wrote:
> Hi,
> I have configured a pool of public library PCs for a Condor network.
> I wonder whether the values are correct. This is what I use:
> # Amount of time in sec the pool pc must be idle before Condor will start a job.
> StartIdleTime    = 5 * $(MINUTE)
> # Amount of time in sec the pool pc must be idle before resumption of a suspended job.
> ContinueIdleTime = 5 * $(MINUTE)
> # Amount of time in sec a job may be suspended before more drastic measures are taken.
> MaxSuspendTime   = 5 * $(MINUTE)
> # Amount of time in sec a job may be checkpointing before we give up and kill it outright.
> MaxVacateTime    = 5 * $(MINUTE)
> Is it wrong to set $(MaxSuspendTime) equal to $(ContinueIdleTime)?
> Is my understanding correct that $(MaxSuspendTime) should always be larger than $(ContinueIdleTime), because otherwise suspended jobs are always thrown off the machine after 5 minutes?
> Thank you,
> Rob.