
[Condor-users] Questions/Comments on dynamic slot for SMP computer



Hi,

I recently found out about dynamic slots in Condor. This is something I have wanted for some time, and I'm happy to have found it. Also, the current limitation, that jobs asking for more resources have a high probability of being starved, can easily be avoided if you make only a part of the pool partitionable. You can find more information at http://www.cs.wisc.edu/condor/manual/v7.4/3_13Setting_Up.html#SECTION004139900000000000000

My questions first, then my comments:
1) Only after a job has run for 15 minutes in a dynamic slot does the SIZE column of condor_q get updated to the size used by the job. I have this in my configuration file:
STARTER_UPDATE_INTERVAL    = 60
TOUCH_LOG_INTERVAL         = 60
MASTER_UPDATE_INTERVAL     = 60
UPDATE_INTERVAL            = 60
SCHEDD_INTERVAL            = 60

What else do I need to set to have it updated more frequently, such as every minute or every 5 minutes?
2) I sent a job to a dynamic slot. After 15 minutes, condor_q -l gives
ImageSize_RAW = 216592
ImageSize = 220000

but top gives 100M used, 105M virtual. Why is there such a big difference? In another case I have:
ImageSize_RAW = 626192
ImageSize = 700000

but top gives 500M used, 505M virtual. So there seems to be a difference of around 100M in both cases. Where could this come from? Can I do something about it?
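To make the comparison in question 2 concrete, here is a small sketch of the arithmetic. It assumes that ImageSize_RAW is reported in KiB and that the "M" shown by top means MiB; the helper name image_size_gap_mib is made up for this illustration and is not part of Condor. The top figures are the rounded values quoted above, so the result is only approximate.

```python
KIB_PER_MIB = 1024  # assumption: ImageSize_RAW is in KiB, top's "M" is MiB


def image_size_gap_mib(image_size_raw_kib, top_virtual_mib):
    """Return how much larger ImageSize_RAW is than top's virtual size, in MiB."""
    return image_size_raw_kib / KIB_PER_MIB - top_virtual_mib


# (ImageSize_RAW from condor_q -l, virtual size from top) for the two jobs above
cases = [(216592, 105), (626192, 505)]

for raw_kib, virt_mib in cases:
    gap = image_size_gap_mib(raw_kib, virt_mib)
    print(f"ImageSize_RAW={raw_kib} KiB, top virtual={virt_mib} MiB, gap={gap:.1f} MiB")
```

Under these assumptions the gap is the same for both jobs, which suggests a fixed overhead rather than something proportional to the job's size.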

Here are a few additions that would make the documentation at the given link more useful. All of this can be found by experimentation, but that takes time, and documenting it would save time for other people who would like to use the feature.
1) State the unit of request_memory (megabytes).
2) Mention the DynamicSlot and PartitionableSlot attributes. This can link to their definitions elsewhere.
3) Mention that request_memory does not affect non-partitionable slots (so the memory constraint must also be put in requirements). Could this be done automatically, so that we only need to set request_* and not change the requirements?
4) What happens if request_cpus is not set? Does it default to 1?
5) Mention that if a job uses more memory than it requested, only the requested amount is removed from the partitionable slot. It would be better to remove the maximum of the two, since user mistakes and bugs can cause swapping. This would make such cases less troublesome, as a swapping compute host would then not start new jobs.
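The difference between the current behaviour and the suggestion in point 5 can be sketched as follows. This is only an illustration of the accounting I have in mind, not Condor's actual implementation; the function name remaining_memory_mb, its parameters, and the numbers are all hypothetical.

```python
def remaining_memory_mb(slot_total_mb, requested_mb, used_mb, charge_max=False):
    """Memory left in a partitionable slot after one job is carved out of it.

    charge_max=False: charge only the requested amount (behaviour described in point 5).
    charge_max=True:  charge max(requested, used) (the suggested alternative).
    """
    charged = max(requested_mb, used_mb) if charge_max else requested_mb
    # Never report a negative amount of free memory.
    return max(slot_total_mb - charged, 0)


# Hypothetical host: 4096 MB slot, job requested 512 MB but really uses 1500 MB.
print(remaining_memory_mb(4096, 512, 1500))                   # requested amount charged
print(remaining_memory_mb(4096, 512, 1500, charge_max=True))  # max of the two charged
```

With the second accounting, the slot advertises less free memory once a job overshoots its request, so new jobs are less likely to be started on a host that is already close to swapping.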

Thanks for all your work on this feature. I still have to upgrade my main Condor pool to be able to use it, but I should do this shortly.

Frédéric Bastien