
Re: [Condor-users] New to Condor

Hrant P. Hratchian wrote:
I'm new to Condor, but my group has gotten Condor working reasonably well for now. We aren't doing anything too sophisticated at this point: we have a couple of clusters of machines dedicated to our group's research, and Condor is simply being used to do basic queueing.

There are two features that we're still trying to figure out how to use, and I'm hoping that someone here can help:

1. We'd like Condor to be able to force a job onto a particular node. For example, I'd like to know how to start a job on node3 regardless of the current load average, etc. It would also be useful to nice the other jobs (if any) currently running on the node. This would give us a mechanism for letting someone run an "emergency" job without having to wait in the queue or wait for specific resources to free up.

You want to change the START requirements for the execute nodes. By default, you'll have the following macros scattered through condor_config:

NonCondorLoadAvg        = (LoadAvg - CondorLoadAvg)
BackgroundLoad          = 0.3

CPUIdle                 = ($(NonCondorLoadAvg) <= $(BackgroundLoad))

##  When is this machine willing to start a job?
START                   = $(UWCS_START)

UWCS_START      = ( (KeyboardIdle > $(StartIdleTime)) \
                    && ( $(CPUIdle) || \
                         (State != "Unclaimed" && State != "Owner")) )

If you want jobs to start regardless of the load on a machine, remove the "$(CPUIdle)" requirement from the START condition. Alternatively, you can allow for a higher background load by raising BackgroundLoad above 0.3. Keep some limit in mind, though: there's not much point in forcing a Condor job to start on a single-processor machine with a load average of 20.
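
As a rough, untested sketch of those two options (assuming the default macros quoted above are still in place), the local config file on an execute node could contain one of the following:

##  Option A: tolerate more non-Condor load before refusing jobs
BackgroundLoad          = 1.0

##  Option B: ignore load entirely and keep only the keyboard check
START                   = (KeyboardIdle > $(StartIdleTime))

Option B drops the whole load clause rather than just "$(CPUIdle)", since leaving the State test on its own would stop an unclaimed machine from ever accepting a job. Run condor_reconfig on the node after editing.

For the "emergency" case, one possible approach is to have node3 also accept any job that advertises a custom attribute. The attribute name IsEmergency and the hostname below are made up for illustration:

##  In node3's local config (IsEmergency is a hypothetical attribute name)
START                   = ($(UWCS_START)) || (TARGET.IsEmergency =?= True)

##  In the emergency job's submit file (placeholder hostname)
+IsEmergency = True
requirements = (Machine == "node3.example.edu")

This only controls whether node3 is willing to accept the job. It does not nice or evict whatever is already running there, so the job still needs a free virtual machine on that node.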

2. If we submit multiple jobs to Condor in a short time span, Condor will overload machines, since the load average hasn't yet picked up the first job. Is there a way to make Condor check how many Condor jobs are actually running on a particular node?

Unless you've manually set the number of CPUs to something higher than the actual number, Condor will start only one process per virtual machine, and there will be one virtual machine per CPU (except on hyperthreaded Pentium 4s, which report more CPUs than they physically have). Condor shouldn't overload machines unless the individual processes are forking, running up IOWait, or otherwise causing high loads.
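
If you want to see what Condor itself thinks is running where, condor_status can show it. For example (node3.example.edu is a placeholder hostname, not one from your pool):

condor_status -run
condor_status -constraint 'Machine == "node3.example.edu" && State == "Claimed"'

The number of virtual machines a node advertises comes from NUM_CPUS in that node's config. As an illustration only, a line like the following would make Condor advertise two virtual machines regardless of the hardware it detects, so it is worth checking that nothing like it has been set by accident:

NUM_CPUS = 2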

To give you an idea of how we're using Condor, here's a typical submission file. The script gdv-run is a shell script that actually executes the run of our job. The program being used is available on all machines, so all we do is ship the input file to the remote node, run the job, and return the output file(s).

universe = vanilla
executable = gdv-run
transfer_input_files = test.com
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
arguments = test.com test.log
getenv = true
log = test.clog
requirements = memory >= 2 && LoadAvg <= 0.2

Thanks in advance...Hrant Hratchian

Hrant P. Hratchian, Ph.D.
E. R. Davidson Fellow
Department of Chemistry
Indiana University
Bloomington, Indiana 47405-7102

"Liberty without learning is always in peril; learning without liberty is always in vain."
John F. Kennedy
