
[Condor-users] New to Condor

I'm new to Condor, but my group has gotten Condor working reasonably well for now. We aren't doing anything too sophisticated at this point: we have a couple of clusters of machines dedicated to our group's research, and Condor is simply being used for basic queuing.

There are two features we're still trying to figure out how to use, and I'm hoping someone here can help:

1. We'd like Condor to be able to force a job onto a particular node. For example, I'd like to know how to start a job on node3 regardless of its current load average, etc. It would also be useful to nice the other jobs (if any) currently running on that node. This would give us a mechanism for letting someone run an "emergency" job without having to wait in the queue or for specific resources to free up.

2. If we submit multiple jobs to Condor in a short time span, Condor overloads machines because the load average hasn't yet risen from the first job. Is there a way to make Condor check how many Condor jobs are actually running on a particular node?
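
The race in item 2 can be illustrated with a small Python model. This is a sketch under a simplifying assumption (a reported load average that only catches up when tick() is called), and Node, match_by_load and match_by_count are hypothetical illustration names, not Condor APIs. A burst of submissions all see the stale 0.0 value and all pass a LoadAvg <= 0.2 test, whereas a rule that compares the number of running jobs with the CPU count stops at the number of CPUs. As far as I understand, Condor's own counterpart to the counting rule is its per-CPU slots, each of which runs at most one job at a time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """A toy model of one execute node (hypothetical, not Condor's internals)."""
    name: str
    cpus: int
    load_avg: float = 0.0  # lags behind reality: only refreshed by tick()
    running: int = 0       # exact count of jobs placed on this node

    def tick(self, alpha: float = 0.5) -> None:
        # Move the reported load average partway toward the true load,
        # mimicking the smoothing that makes a real load average lag.
        self.load_avg += alpha * (self.running - self.load_avg)


def match_by_load(node: Node, threshold: float = 0.2) -> bool:
    # Analogous to "LoadAvg <= 0.2" in the submit file's requirements.
    return node.load_avg <= threshold


def match_by_count(node: Node) -> bool:
    # Admit a job only while fewer jobs than CPUs are running.
    return node.running < node.cpus


def submit_burst(node: Node, n_jobs: int, rule: Callable[[Node], bool]) -> int:
    """Submit n_jobs back to back, before the load average is refreshed."""
    for _ in range(n_jobs):
        if rule(node):
            node.running += 1
    return node.running


if __name__ == "__main__":
    print(submit_burst(Node("node3", cpus=2), 8, match_by_load))   # 8: every job lands
    print(submit_burst(Node("node3", cpus=2), 8, match_by_count))  # 2: capped at CPU count
```

Calling tick() after the load-based burst would raise the reported load from 0.0 to 4.0, but only after all eight jobs have already been admitted.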

To give you an idea of how we're using Condor, here's a typical submission file. The script gdv-run is a shell script that actually runs our job. The program it calls is available on all machines, so all we do is ship the input file to the remote node, run the job, and return the output file(s).

universe = vanilla
executable = gdv-run
transfer_input_files = test.com
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
arguments = test.com test.log
getenv = true
log = test.clog
requirements = memory >= 2 && LoadAvg <= 0.2
queue
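
For item 1, the requirements expression in a submit file like the one above is one place to restrict a job to a single node: the machine ClassAd carries the host name in its Machine attribute, so a clause such as Machine == "node3.example.edu" limits the job to that host. The Python sketch below is a hypothetical helper (make_submit is not part of Condor) that renders a variant of this submit file and drops the LoadAvg test when a node is named. The host name is a placeholder and must match what the node actually advertises. Note that this only controls where the job may run: if the node's slots are already busy, the job still waits, and preempting or renicing the running jobs is a separate matter.

```python
from typing import Optional


def make_submit(input_file: str, log_file: str, node: Optional[str] = None) -> str:
    """Render a submit description modeled on the gdv-run file (hypothetical helper).

    If node is given, the job is restricted to that machine and the
    LoadAvg test is dropped, as an "emergency" variant.
    """
    if node is None:
        # Normal case: any node with enough memory and little load.
        requirements = "Memory >= 2 && LoadAvg <= 0.2"
    else:
        # Pinned case: only the named node, ignoring its load average.
        requirements = f'Machine == "{node}" && Memory >= 2'
    lines = [
        "universe = vanilla",
        "executable = gdv-run",
        f"transfer_input_files = {input_file}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT_OR_EVICT",
        f"arguments = {input_file} {log_file}",
        "getenv = true",
        "log = test.clog",
        f"requirements = {requirements}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit("test.com", "test.log", node="node3.example.edu"))
```

Writing the returned text to a file (for example, a hypothetical emergency.sub) and passing that file to condor_submit would queue the pinned job.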

Thanks in advance,
Hrant Hratchian

Hrant P. Hratchian, Ph.D.
E. R. Davidson Fellow
Department of Chemistry
Indiana University
Bloomington, Indiana 47405-7102

"Liberty without learning is always in peril; learning without liberty is always in vain."
John F. Kennedy