
[condor-users] How to avoid preempting completely?



Hello,

We operate a 180-node dual-CPU pool under Condor. Because Condor's
checkpointing is not feasible for us given our network bandwidth and disk
space limitations (the worst case would be 180GB sent over Fast
Ethernet), we are restricting ourselves to the "vanilla" universe.
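
For scale, here is a rough estimate of that worst case. It is only a sketch under assumptions that are not stated above: decimal gigabytes, a single 100 Mbit/s Fast Ethernet link as the bottleneck, and the nominal line rate with no protocol overhead. Real transfers would take longer.

```python
# Lower bound on moving the worst-case checkpoint volume over one link.
TOTAL_BYTES = 180 * 10**9        # 180 GB worst case; decimal GB is an assumption
LINK_BITS_PER_S = 100 * 10**6    # nominal Fast Ethernet rate; an assumption


def transfer_seconds(total_bytes: int, link_bits_per_s: int) -> float:
    """Time to push total_bytes through a link at its nominal bit rate."""
    return total_bytes * 8 / link_bits_per_s


seconds = transfer_seconds(TOTAL_BYTES, LINK_BITS_PER_S)
print(f"{seconds:.0f} s = {seconds / 3600:.1f} h")
```

Even under these optimistic assumptions, a full checkpoint of the pool would occupy the link for about four hours.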

Yesterday a guest user started a bunch of "standard" jobs, with the
result that a large share of the jobs that had already been running for
days were kicked out ("preempted") and their accumulated CPU time was
lost :-(

Our condor_config has

WANT_SUSPEND            = $(TESTINGMODE_WANT_SUSPEND)
WANT_VACATE             = $(TESTINGMODE_WANT_VACATE)
START                   = $(UWCS_START)
SUSPEND                 = $(TESTINGMODE_SUSPEND)
CONTINUE                = $(TESTINGMODE_CONTINUE)
PREEMPT                 = $(TESTINGMODE_PREEMPT)
KILL                    = $(UWCS_KILL)
PERIODIC_CHECKPOINT     = $(TESTINGMODE_PERIODIC_CHECKPOINT)
PREEMPTION_REQUIREMENTS = $(TESTINGMODE_PREEMPTION_REQUIREMENTS)
PREEMPTION_RANK         = $(TESTINGMODE_PREEMPTION_RANK)

where

TESTINGMODE_WANT_SUSPEND        = False
TESTINGMODE_WANT_VACATE         = False
TESTINGMODE_START               = True
TESTINGMODE_SUSPEND             = False
TESTINGMODE_CONTINUE            = True
TESTINGMODE_PREEMPT             = False
TESTINGMODE_KILL                = False
TESTINGMODE_PERIODIC_CHECKPOINT = False
TESTINGMODE_PREEMPTION_REQUIREMENTS = False
TESTINGMODE_PREEMPTION_RANK = 0
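
To double-check what the settings above resolve to, the `$(NAME)` references can be expanded by hand or with a small script. The sketch below is a simplified stand-in, not Condor's own config parser: the `expand` helper and the `CONFIG` dictionary are illustrative names, and only the lines quoted above are included, so `UWCS_START` and `UWCS_KILL` (defined elsewhere in condor_config) stay unresolved.

```python
import re

# Settings copied from the listing above. UWCS_START and UWCS_KILL are not
# shown there, so they are deliberately absent from this table.
CONFIG = {
    "WANT_SUSPEND": "$(TESTINGMODE_WANT_SUSPEND)",
    "WANT_VACATE": "$(TESTINGMODE_WANT_VACATE)",
    "START": "$(UWCS_START)",
    "SUSPEND": "$(TESTINGMODE_SUSPEND)",
    "CONTINUE": "$(TESTINGMODE_CONTINUE)",
    "PREEMPT": "$(TESTINGMODE_PREEMPT)",
    "KILL": "$(UWCS_KILL)",
    "PERIODIC_CHECKPOINT": "$(TESTINGMODE_PERIODIC_CHECKPOINT)",
    "PREEMPTION_REQUIREMENTS": "$(TESTINGMODE_PREEMPTION_REQUIREMENTS)",
    "PREEMPTION_RANK": "$(TESTINGMODE_PREEMPTION_RANK)",
    "TESTINGMODE_WANT_SUSPEND": "False",
    "TESTINGMODE_WANT_VACATE": "False",
    "TESTINGMODE_START": "True",
    "TESTINGMODE_SUSPEND": "False",
    "TESTINGMODE_CONTINUE": "True",
    "TESTINGMODE_PREEMPT": "False",
    "TESTINGMODE_KILL": "False",
    "TESTINGMODE_PERIODIC_CHECKPOINT": "False",
    "TESTINGMODE_PREEMPTION_REQUIREMENTS": "False",
    "TESTINGMODE_PREEMPTION_RANK": "0",
}

MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand(value, table, depth=0):
    """Substitute $(NAME) references; names missing from table are left as-is."""
    if depth > 20:
        raise ValueError("macro nesting too deep (possible cycle)")

    def repl(match):
        name = match.group(1)
        if name not in table:
            return match.group(0)
        return expand(table[name], table, depth + 1)

    return MACRO.sub(repl, value)


# The policy knobs are the entries that are not TESTINGMODE_* definitions.
policy = [name for name in CONFIG if not name.startswith("TESTINGMODE_")]
effective = {name: expand(CONFIG[name], CONFIG) for name in policy}

# TESTINGMODE_* definitions that no policy knob refers to.
referenced = {m for name in policy for m in MACRO.findall(CONFIG[name])}
unused = sorted(
    name for name in CONFIG
    if name.startswith("TESTINGMODE_") and name not in referenced
)

for name, value in effective.items():
    print(f"{name:<24}= {value}")
print("defined but unused:", unused)
```

With the listing as quoted, START and KILL still point at the UWCS_* expressions, while TESTINGMODE_START and TESTINGMODE_KILL are defined but never referenced. Whether that matters here depends on what UWCS_START and UWCS_KILL evaluate to, which is not shown above.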
 
The version is still 6.4.7 (upgraded from 6.3.1, IIRC).

Is there something I have missed, or is my attempt to have some kind of
"never kill a running job" policy in vain?

Regards,
 Steffen

-- 
Steffen Grunewald * * * Merlin cluster admin (http://pandora.aei.mpg.de)
Albert-Einstein-Institut (MPI Gravitationsphysik, http://www.aei.mpg.de)
       Science Park Golm, Am Mühlenberg 1, 14476 Potsdam, Germany
e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
