
Re: [Condor-users] Generating a current status web page when using Dynamic Slots



(Resending without the image attachments)

Hi Ian,

In CycleServer we report total used CPUs and total running jobs on two separate plots so that things make sense.

I've set up a small test pool to illustrate what I'm talking about. I've got 4 execute nodes, each with 4 CPUs but configured with a single partitionable slot. The relevant configuration settings are:

NUM_CPUS=4
SLOT_TYPE_1 = cpus=100%, ram=100%, swap=100%, disk=100%
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1_PARTITIONABLE = True

I've submitted three types of jobs:

1 CPU jobs
2 CPU jobs
4 CPU jobs

I submitted them in that order, so that's the order they ran in.

If you look at this picture (http://bit.ly/mLuFJr) you'll see that we're capturing the dynamic slots as Condor changes slot counts on the execute nodes based on what the jobs are consuming.

This picture (http://bit.ly/jkXYYs) shows that, for the duration of the run, all 16 potential slots in my pool were occupied by jobs (the dips are me removing jobs and generally messing around a bit with stuff).

And this picture (http://bit.ly/lI6eTL) shows the idle and running jobs. It's a bit hard to see, but if you squint at the green at the bottom of the graph (the running jobs), you can see that they go from 16 -> 8 -> 4 as the 1, 2 and 4 CPU jobs take their turns on my 4 execute nodes. In retrospect I was a little overzealous with the jobs. I wanted to make sure states changed slowly on the graph so transitions would be easy to see. :)
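As a quick check on the 16 -> 8 -> 4 progression, here is a minimal Python sketch of the arithmetic. It assumes each job must fit entirely on one node and that every job in a batch requests the same number of CPUs; the function name is just for illustration.

```python
def max_running_jobs(nodes, cpus_per_node, cpus_per_job):
    """Upper bound on concurrently running jobs when each job
    must fit on a single node (assumption for this sketch)."""
    jobs_per_node = cpus_per_node // cpus_per_job
    return nodes * jobs_per_node

# Test pool from this message: 4 execute nodes, 4 CPUs each.
for request in (1, 2, 4):
    print(request, "CPU jobs ->", max_running_jobs(4, 4, request), "running")
```

With the pool fully occupied, the running-job count drops as the per-job request grows, while the used-core count stays at 16.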

For correlation, here are samples of the condor_status output at times when the machines were running 1, 2 and 4 CPU jobs.

Potential for the pool comes from the sum of the CPUS value on the machine ads, for all machines in the pool. Cores used in the pool also comes from the machine ads: it's the sum of the CPUS attribute on all the machines in the Claimed state. At least as of 7.4.4, Condor reports the right number of CPUS for the partitioned slot, so you know what the job asked for and what it got. The running and idle counts come from condor_status -schedd output.
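The two sums described above can be sketched in Python roughly like this. This is only an illustration, not CycleServer's code: it assumes the machine ads have already been fetched and turned into dictionaries with "Cpus" and "State" keys, and that a partitionable parent slot advertises only its remaining, unallocated CPUs. The sample ads are made up.

```python
def pool_core_counts(machine_ads):
    """Return (potential_cores, used_cores) from a list of slot ads.

    Each ad is a dict with "Cpus" and "State" keys (assumed layout).
    """
    potential = 0
    used = 0
    for ad in machine_ads:
        cpus = int(ad.get("Cpus", 0))
        potential += cpus              # every slot ad counts toward potential
        if ad.get("State") == "Claimed":
            used += cpus               # only claimed slots count as used
    return potential, used

# Hypothetical snapshot of two 4-CPU nodes:
# node A runs two 2-CPU jobs, node B runs one 1-CPU job.
sample_ads = [
    {"Name": "slot1@nodeA",   "State": "Unclaimed", "Cpus": 0},
    {"Name": "slot1_1@nodeA", "State": "Claimed",   "Cpus": 2},
    {"Name": "slot1_2@nodeA", "State": "Claimed",   "Cpus": 2},
    {"Name": "slot1@nodeB",   "State": "Unclaimed", "Cpus": 3},
    {"Name": "slot1_1@nodeB", "State": "Claimed",   "Cpus": 1},
]

print(pool_core_counts(sample_ads))  # (8, 5)
```

If your version of Condor reports the parent slot's CPUS differently, the potential sum would need adjusting, so check it against a known pool first.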

It lines up pretty nicely, and the usage data logically tracks the jobs as they request more and more resources from Condor for the dynamic slot they want to claim.

Regards,
- Ian

-- 
Ian Chesal
ichesal@xxxxxxxxxxxxxxxxxx
http://www.cyclecomputing.com/

On Thursday, June 2, 2011 at 11:09 AM, Ian Cottam wrote:

Hi all,

we are very pleased with Condor's Dynamic Slots.

However, it makes generating a web-based current status page rather
difficult.
For example, whilst it is easy to calculate how many cores the pool has in
total, the Claimed total is purely in terms of slots (i.e. we don't know
how many cores have been given to each slot).

Have other people tackled this?
cheers
-Ian
ps. don't bother checking our current status page as we have made no
attempt to update it since we went to dynamic slots, so it is a little
misleading


--
Ian Cottam
IT Services for Research
Faculty of EPS
The University of Manchester



