Re: [Condor-users] adding service slot
- Date: Wed, 15 Feb 2012 18:47:16 -0600 (CST)
- From: Marco Mambelli <marco@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] adding service slot
Thank you for both the suggestion and the pointers.
I'm in an environment where one or a few users run most of the jobs, and
Condor is used as the queue manager for Linux clusters.
I was thinking more of "user service jobs", so that users could collect
monitoring or debug information about other jobs, install user software,
or upload small files, without administrator rights and while regular jobs
are running.
Anyway, your observation made me think that there may be better tools, like
cluster monitoring tools or condor_ssh_to_job.
I'll run some tests.
On Wed, 15 Feb 2012, Ian Chesal wrote:
On Wednesday, 15 February, 2012 at 4:24 PM, Marco Mambelli wrote:
I'd like to configure my Condor pool with an additional job slot for
service jobs.
E.g. 4-core machines will have 4 job slots plus a "service slot" that
runs only service jobs (I can identify them with a ClassAd attribute), so
that these jobs do not have to wait in the queue.
Service jobs should be treated as requiring no resources and allowed to run
in parallel with other jobs on the machine.
Any suggestions on how I should set this up?
Can I talk you out of it? :)
Condor is an awesome way to run user jobs. It's not a great way to run administrative tasks against loads of machines. The two prominent problems are:
1. You don't know what you hit and what you missed. Condor's collector database isn't static. Machines come and go. If you run your administrative job when half your machines are offline, you'll need some way to remember which machines you missed so that, when they come back online, they get caught up on administrative changes.
2. Your jobs don't run as administrator accounts. On Linux they don't run as root. And on Windows they don't run as an account in the Administrators group. At least, not without some finagling and the changes can leave your systems open to some abuse.
Those are the big two reasons not to do administrative tasks through Condor. There are more, but those two seem big enough to me.
I recommend looking at a tool specifically intended for configuration management of your machines. We're big fans of Opscode's Chef platform (http://www.opscode.com/chef/) here at Cycle. It has proven to be a very scalable and robust configuration management and deployment tool.
If I can't talk you out of it, the quick gist of what you need to do is to create two slot types: one type gets basically no resources on your machine. Since every slot needs at least one CPU you may even want to consider faking the number of CPUs in the box because Condor won't let you assign more CPUs than it detects in the machine. So if you have a 4-CPU box, you'd do:
# Advertise 5 CPUs on the 4-CPU box so that five 1-CPU slots fit
NUM_CPUS = 5
SLOT_TYPE_1 = cpus=1, ram=1, swap=1, disk=1
SLOT_TYPE_2 = cpus=1, ram=auto, swap=auto, disk=auto
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 4
And now you need per-slot policies to control what runs where. For example:
START = ((SlotId == 1) && (IsAdminJob == True)) || ((SlotId != 1))
This lets any job run in slots 2-5 but allows only admin jobs into slot 1.
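On the submit side, a service job would need to carry the IsAdminJob attribute that the START expression tests. A minimal sketch of such a submit description follows. The script and file names are hypothetical, and the requirements line is an added assumption that pins the job to slot 1 so it does not take a regular slot:

```
# Hypothetical submit description for a service job
universe     = vanilla
# collect_debug_info.sh is a placeholder script name
executable   = collect_debug_info.sh
# Custom job ClassAd attribute tested by the START expression
+IsAdminJob  = True
# Assumption: restrict the job to the service slot only
requirements = (SlotId == 1)
output       = service.$(Cluster).out
error        = service.$(Cluster).err
log          = service.log
queue
```

Note that condor_submit normally appends default memory and disk requirements to the job, so a slot carved with almost no RAM or disk may fail to match. This has not been tested against the configuration above.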
For details see: http://research.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#sec:SMP-Divide
I have to admit, I'm not sure how well Condor will cope if you carve off a tiny amount of disk, RAM, and swap for a slot like that and then tell it to auto-divide the remainder. It might work well; it might not.
Travel cautiously down this path.
Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools