Re: [Condor-users] Dynamic memory for SMP
- Date: Wed, 28 May 2008 12:20:08 -0400
- From: "Frédéric Bastien" <nouiz@xxxxxxxxx>
- Subject: Re: [Condor-users] Dynamic memory for SMP
Thanks for your answer. I will add a reference to Target.Memory. But
there are other problems that I am trying to isolate. In the past I have used
STARTD_SLOT_ATTRS for another purpose and it worked. Now it doesn't,
and I don't understand why. I need to understand, as it uses the same
mechanism as the one I want to use.
I have this in my config:
STARTD_SLOT_ATTRS = IOJob
NBIOJob = ( 0 + (slot1_IOJob=?=True) + (slot2_IOJob=?=True) )
START = (OWNER=="myusername") && ( (Target.IOJob =!=
True) || ( $(NBIOJob) < 1 ) )
If I launch 2 jobs at a 1-minute interval with IOJob = True in the
submit file, they both get executed at the same time!
Do you understand why? I have tried many different things and none
worked. I use Condor 7.0.1.
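To make the intended behaviour concrete, here is a minimal Python sketch of how the NBIOJob and START expressions above are meant to evaluate. It is only a toy model, not Condor code: the dict-based ads and the helper names `is_identical_to_true`, `nb_io_job` and `start` are hypothetical, and it assumes that an attribute missing from an ad behaves as undefined, so that `=?= True` is false for it and `=!= True` is true.

```python
# Toy model of the NBIOJob / START expressions above (not Condor code).
# Ads are plain dicts; a missing key stands in for an undefined attribute.

def is_identical_to_true(ad, name):
    """Model of `name =?= True`: true only if the attribute exists and is exactly True."""
    return ad.get(name) is True

def nb_io_job(machine_ad, slots=(1, 2)):
    """Model of NBIOJob: count slots whose slotN_IOJob attribute is exactly True."""
    return sum(is_identical_to_true(machine_ad, f"slot{n}_IOJob") for n in slots)

def start(job_ad, machine_ad, allowed_owner="myusername"):
    """Model of START: right owner, and either a non-IO job or no IO job running yet."""
    not_io_job = not is_identical_to_true(job_ad, "IOJob")  # Target.IOJob =!= True
    return job_ad.get("Owner") == allowed_owner and (
        not_io_job or nb_io_job(machine_ad) < 1
    )

io_job = {"Owner": "myusername", "IOJob": True}

# Intended case: slot 1 advertises a running IO job, so a second one is refused.
print(start(io_job, {"slot1_IOJob": True}))  # False

# If no slotN_IOJob attribute is present in the machine ad, the count stays 0
# and every IO job is allowed to start.
print(start(io_job, {}))  # True
```

The second case reproduces the symptom described above, so one thing worth checking is whether the slotN_IOJob attributes actually show up in the machine ad, for example in the output of `condor_status -long`.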
On Tue, May 27, 2008 at 7:36 PM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>> I have added some rules to make Condor behave better on an SMP
>> machine (8 slots of 1 core, 1G RAM):
>> STARTD_JOB_EXPRS = $(STARTD_JOB_EXPRS), ImageSize
>> TotalMemoryUsed = ( 0 + slot1_ImageSize +
>> slot2_ImageSize + slot3_ImageSize + slot4_ImageSize + slot5_ImageSize
>> + slot6_ImageSize + slot7_ImageSize + slot8_ImageSize )
>> START = $(START) && TotalMemoryUsed < TotalMemory
>> We need this because we sometimes have jobs that need more than 1G
>> of RAM, and if we let them fill the computer, it will thrash too much.
>> What the rules do is that if the current jobs use more than
>> the TotalMemory of the computer, Condor won't start new jobs
>> even if slots are available. This limits the thrashing on the server.
>> But I have one problem: if one such job gets killed, it won't
>> restart, as the requirement "((Memory * 1024) >= ImageSize)" is
>> false. This requirement is not in the submit file, so I
>> suppose Condor adds it, as it does some others. What I would like is to
>> replace it with
>> "(((TotalMemory-TotalMemoryUsed) * 1024) >= ImageSize)", so that
>> those jobs that are killed can be restarted.
>> Is there a way to do it?
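The difference between the inserted requirement and the proposed one can be illustrated with a small Python model. This is a sketch under stated assumptions, not Condor code: the function names are hypothetical, each slot is assumed to have 1 GiB, and ImageSize is assumed to be in KiB while Memory and TotalMemory are in MiB (which is what the `* 1024` factor suggests). The sketch converts everything to KiB before subtracting; that is a choice made here, not what the quoted expression does.

```python
# Toy model of the memory rules quoted above (not Condor code).
# Assumption: ImageSize values are in KiB, Memory/TotalMemory are in MiB.

KIB_PER_MIB = 1024

def total_memory_used_kib(slot_image_sizes_kib):
    """Sum of the ImageSize values published by all slots, in KiB."""
    return sum(slot_image_sizes_kib)

def default_requirement(slot_memory_mib, image_size_kib):
    """Model of the inserted clause: (Memory * 1024) >= ImageSize."""
    return slot_memory_mib * KIB_PER_MIB >= image_size_kib

def proposed_requirement(total_memory_mib, slot_image_sizes_kib, image_size_kib):
    """Free machine memory, converted to KiB, must cover the job's ImageSize."""
    used_kib = total_memory_used_kib(slot_image_sizes_kib)
    free_kib = total_memory_mib * KIB_PER_MIB - used_kib
    return free_kib >= image_size_kib

# 8 slots of 1 GiB each; a killed job whose ImageSize had grown to 3 GiB.
job_kib = 3 * 1024 * 1024

print(default_requirement(1024, job_kib))                # False: never fits one slot
print(proposed_requirement(8 * 1024, [0] * 8, job_kib))  # True: the machine is empty
print(proposed_requirement(8 * 1024, [job_kib] * 2 + [0] * 6, job_kib))  # False: 2 GiB free
```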
> Cool idea. Let us know how it works out in practice for you.
> Condor needs a reference to TARGET.Memory to appear in the requirements
> expression else it inserts its own rule. I define my own image size
> estimate in my job submissions called AlteraImageSize, which is static
> and doesn't get updated by Condor at job runtime, and then simply
> require that the target machine's memory be greater than or equal to that
> value instead of ImageSize. Submit ticket looks like this:
> +AlteraImageSize = 10000
> requirements = (TARGET.Memory >= AlteraImageSize)
> I do the same for disk space requirements as well.
> This ensures that a preempted job, regardless of how long it's been
> running, uses the same disk and memory estimates when re-negotiating and not
> something extremely low (and completely false) because it had run for only
> a small amount of time. In your case this expression might need to be
> more complicated so a job that gets booted (do they get booted?) that's
> using more than its original ImageSize estimate doesn't re-negotiate
> with a lower-than-seen requirement.
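One possible reading of "more complicated" is to re-negotiate with the larger of the static user estimate and the size actually observed. A minimal Python sketch of that idea follows; the function name `memory_needed` is hypothetical, and both values are assumed to be expressed in the same unit.

```python
def memory_needed(user_estimate, observed_image_size):
    """Never ask for less than the user's estimate or than what was already seen."""
    return max(user_estimate, observed_image_size)

# Job preempted early: the observed size is tiny, so the user estimate is kept.
print(memory_needed(10000, 200))    # 10000

# Job that outgrew its estimate: the observed size is used instead.
print(memory_needed(10000, 50000))  # 50000
```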
>> p.s. I know the rules I added have a problem. If the server
>> is empty and the user doesn't specify an ImageSize, we will
>> start 8 jobs. For it to work correctly when the server is
>> empty, the user must estimate the ImageSize needed.
> I personally think this is the correct way: user estimates stay and
> should not be changed by Condor and then you can use the user estimate
> with the Condor calculated value in ImageSize to make better scheduling
> decisions should a job get vacated and have an ImageSize that possibly
> doesn't reflect a good upper bound estimate from the job.
> - Ian