Re: [Condor-users] Dynamic memory for SMP



Hi,

Thanks for your answer. I will add a reference to Target.Memory. But
there are other problems that I am trying to isolate. I have used
STARTD_SLOT_ATTRS in the past for another purpose and it worked. Now it
doesn't, and I don't understand why. I need to understand this because it
uses the same mechanism as what I want to do.

I have this in my config:

STARTD_SLOT_ATTRS       = IOJob
NBIOJob                 = ( 0 + (slot1_IOJob=?=True) + (slot2_IOJob=?=True) )
START                   = (OWNER=="myusername") && ( (Target.IOJob =!= True) || ( $(NBIOJob) < 1 ) )


If I launch 2 jobs 1 minute apart with IOJob = True in the
submit file, they both run at the same time!
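
The intended evaluation of the START expression above can be sketched in Python. This is only a model, not Condor code: UNDEFINED is represented as None, `is_identical` stands in for the `=?=` operator, and the function names are hypothetical. One possible explanation for the behaviour (an assumption, not something confirmed in this thread) is that slot1_IOJob and slot2_IOJob are never defined in the machine ad, in which case NBIOJob stays at 0 and START is always true.

```python
# Model of the START expression above; UNDEFINED attributes are None.
UNDEFINED = None

def is_identical(a, b):
    """Stand-in for ClassAd =?= : true only for same type and same value."""
    return type(a) is type(b) and a == b

def nb_io_job(machine_ad):
    # ( 0 + (slot1_IOJob =?= True) + (slot2_IOJob =?= True) )
    names = ("slot1_IOJob", "slot2_IOJob")
    return sum(int(is_identical(machine_ad.get(n, UNDEFINED), True)) for n in names)

def start(machine_ad, job_ad, owner="myusername"):
    # (Owner == "myusername") && ((Target.IOJob =!= True) || (NBIOJob < 1))
    owner_ok = job_ad.get("Owner") == owner
    not_io_job = not is_identical(job_ad.get("IOJob", UNDEFINED), True)
    return owner_ok and (not_io_job or nb_io_job(machine_ad) < 1)

io_job = {"Owner": "myusername", "IOJob": True}
# Slot attributes never published: the second IO job is still accepted.
print(start({}, io_job))
# Slot attribute published by a running IO job: the second IO job is refused.
print(start({"slot1_IOJob": True}, io_job))
```

If the second call matches what is wanted and the first matches what is observed, the thing to check would be whether the slotN_IOJob attributes actually appear in the machine ad.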

Do you understand why? I have tried many different things and none
worked. I use Condor 7.0.1.

thanks

Frederic Bastien

On Tue, May 27, 2008 at 7:36 PM, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
>> I have added some rules to make Condor behave better on an SMP
>> machine (8 slots of 1 core, 1G RAM each):
>> STARTD_JOB_EXPRS        = $(STARTD_JOB_EXPRS), ImageSize
>> TotalMemoryUsed         = ( 0 + slot1_ImageSize + slot2_ImageSize + slot3_ImageSize + slot4_ImageSize + slot5_ImageSize + slot6_ImageSize + slot7_ImageSize + slot8_ImageSize )
>> START = $(START) && TotalMemoryUsed < TotalMemory
>>
>> We need this because we sometimes have jobs that need more than 1G
>> of RAM, and if we let them fill the computer, it will thrash too much.
>>
>> What the rules do is this: if the current jobs use more than
>> the TotalMemory of the computer, it won't start new jobs
>> even if slots are available. This limits the thrashing on the server.
>>
>> But I have one problem: if one such job gets killed, it won't
>> restart because the requirement "((Memory * 1024) >= ImageSize)" is
>> false. This requirement is not in the submit file, so I
>> suppose Condor adds it, as it does some others. What I would like is to
>> replace it with
>> "(((TotalMemory-TotalMemoryUsed) * 1024) >= ImageSize)", so
>> that killed jobs can be restarted.
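
The two checks described above (the START gate and the proposed free-memory requirement) can be modelled with a short Python sketch. The function names are hypothetical, and the units are an assumption: the default requirement quoted above, (Memory * 1024) >= ImageSize, suggests that ImageSize is in KiB and the memory attributes are in MB, so the sketch converts everything to KiB before comparing.

```python
KIB_PER_MB = 1024  # assumption: ImageSize is in KiB, TotalMemory is in MB

def total_memory_used_kib(slot_image_sizes_kib):
    """Sum the ImageSize of every slot; idle slots (None) count as 0."""
    return sum(size or 0 for size in slot_image_sizes_kib)

def machine_accepts_new_job(slot_image_sizes_kib, total_memory_mb):
    """START gate: accept new jobs only while running jobs fit in memory."""
    used = total_memory_used_kib(slot_image_sizes_kib)
    return used < total_memory_mb * KIB_PER_MB

def job_fits_in_free_memory(job_image_size_kib, slot_image_sizes_kib,
                            total_memory_mb):
    """Proposed requirement: job ImageSize must fit in the free memory."""
    used = total_memory_used_kib(slot_image_sizes_kib)
    free_kib = total_memory_mb * KIB_PER_MB - used
    return free_kib >= job_image_size_kib

# Hypothetical 8-slot machine with 8192 MB, one slot running a 3 GiB job.
slots = [3 * 1024 * 1024] + [None] * 7
print(machine_accepts_new_job(slots, 8192))
print(job_fits_in_free_memory(2 * 1024 * 1024, slots, 8192))
```

Under these assumptions a killed job with a large ImageSize can match again whenever the machine as a whole has enough free memory, which a per-slot Memory check would not allow.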
>>
>> Is there a way to do it?
>
> Cool idea. Let us know how it works out in practice for you.
>
> Condor needs a reference to TARGET.Memory to appear in the requirements
> expression else it inserts its own rule. I define my own image size
> estimate in my job submissions called AlteraImageSize, which is static
> and doesn't get updated by Condor at job runtime, and then simply
> require that the target machine's memory be greater than or equal to that
> value instead of ImageSize. Submit ticket looks like this:
>
> +AlteraImageSize = 10000
> requirements = (TARGET.Memory >= AlteraImageSize)
>
> I do the same for disk space requirements as well.
>
> This ensures that a preempted job, regardless of how long it's been
> running, uses the same disk and memory estimates when re-negotiating and
> not something extremely low (and completely false) because it had run for
> only a small amount of time. In your case this expression might need to be
> more complicated so a job that gets booted (do they get booted?) that's
> using more than its original ImageSize estimate doesn't re-negotiate
> with a lower-than-seen requirement.
>
>> p.s. I know the rules I added have a problem. If the server
>> is empty and the user doesn't specify an ImageSize, we will
>> start 8 jobs. For it to work correctly when the server is
>> empty, the user must estimate the ImageSize needed.
>
> I personally think this is the correct way: user estimates stay and
> should not be changed by Condor and then you can use the user estimate
> with the Condor calculated value in ImageSize to make better scheduling
> decisions should a job get vacated and have an ImageSize that possibly
> doesn't reflect a good upper bound estimate from the job.
>
> - Ian