
Re: [Condor-users] 7.8 and defrag or similar of dynamic slots



On 18/05/2012 20:04, "Dan Bradley" <dan@xxxxxxxxxxxx> wrote:

>
>
>On 5/18/12 12:19 PM, Ian Cottam wrote:
>> We are thinking about updating to 7.8.0.
>> I noticed that there is, with 7.8, a defrag daemon for dynamic slots.
>> On our (main) pool we have preemption off anyway: am I right in thinking
>> that this defrag then is not for us?
>
>Defragmentation is desirable when jobs requiring large slots (e.g. many
>cores or big memory) suffer from starvation (rarely or never getting
>scheduled to run) due to fragmented machines.  Machines become
>fragmented when they are partitioned into small slots to fit small
>jobs.  If many small jobs are running on a machine at the same time, the
>chance is small that they will all exit at the same time, freeing up a
>large chunk of resources for large jobs to use.  The Condor negotiator's
>resource allocation algorithm currently just works with the slots that
>exist.  It does not make reservations or preempt multiple slots, so some
>method of defragmenting machines is needed to avoid the problem of
>starvation of large jobs.
>
>Defragmentation can cause jobs to be killed.  If you do not want that,
>MaxJobRetirementTime can be used to specify how long jobs should be
>allowed to run on machines that are being drained.
>
>>
>> I only ask because sometimes (with 7.4/7.6) and dynamic slots we see
>> partial matches that don't go through and wondered if there was something
>> in 7.8 that helps with this.
>
>If by "partial matches that don't go through" you mean the starvation
>problem I mentioned above, then condor_defrag can help.  If it is some
>other problem, then it may or may not.
>
>--Dan


What we have is jobs that match but never start.
We have just demonstrated that if we move the memory requirement from the
Requirements line to a separate Request_memory=n line, the jobs start.
We are not entirely sure why.
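
For illustration, the change to the submit file was roughly of the following
shape. This is only a sketch: the value 1024 (in MB) is a placeholder, not
the figure from our real jobs, and the rest of the submit file is omitted.

```
# Before: memory constraint expressed inside Requirements.
# Jobs with this form matched but never started.
# (1024 is a placeholder value, in MB.)
Requirements = (Memory >= 1024)

# After: memory asked for on its own line instead.
# Jobs with this form start.
Request_memory = 1024
```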
-Ian









