
Re: [Condor-users] RAM allocated to dynamic slots



On Mon, Jul 30, 2012 at 04:01:33PM +0100, Mark Calleja wrote:
> You've just discovered what comes as a rude surprise to many of us.
> Condor does not enforce such resource requests, but rather it gives
> you a way to do so indirectly via a wrapper script: http://research.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#SECTION0041313000000000000000.

No, I don't think that's it. The actual memory usage of this job is about
520M: it's a shell script which runs a pipeline of two other processes,
one using about 20M and the other about 500M.

Whether the submit file has
  Request_Memory = 1991
or
  Request_Memory = 750
makes no difference to the actual amount of memory *used* by the job,
so I don't think the amount reported by condor_status bears any
relationship to the *used* memory either.

> You may also want to investigate use of the Starter property
> ENFORCE_CPU_AFFINITY to stop jobs from forking/multi-threading their
> way onto too many cores/processors.

I'm happy in this case for the two processes to sit on two cores, and if
not, I could always set Request_Cpus = 2.

It's the RAM accounting that appears to be off the mark. If it's not taking
that value directly from Request_Memory, I'm trying to understand where it
is actually coming from.
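
For what it's worth, the arithmetic on the figures quoted below can be
checked with a short script. This is only a sketch over the numbers in this
thread: the names (observed, overhead_percent, round_up, common_step) are
made up for illustration, and the idea that the request is rounded up to a
fixed step is an inference from these few data points, not something I have
confirmed in the Condor manual or source.

```python
from functools import reduce
from math import gcd

# Figures copied from the condor_status output quoted in this thread (MB).
TOTAL_SLOT_MEMORY = 7168
observed = {1991: 2688, 750: 896}  # requested -> Mem shown for the dynamic slot


def overhead_percent(requested, allocated):
    """Percentage by which the allocated memory exceeds the request."""
    return 100.0 * (allocated - requested) / requested


def round_up(value, quantum):
    """Round value up to the next multiple of quantum (integer arithmetic)."""
    return -(-value // quantum) * quantum


for requested, allocated in observed.items():
    print(requested, allocated, round(overhead_percent(requested, allocated), 1))

# Largest step that divides every reported figure. This is an assumption-laden
# guess at a rounding quantum, derived purely from the numbers above.
common_step = reduce(gcd, list(observed.values()) + [TOTAL_SLOT_MEMORY])
print("common step:", common_step)

for requested, allocated in observed.items():
    print(requested, "->", round_up(requested, common_step), "observed", allocated)
```

Both observed values, and the 1792 left in the partitionable slot, come out
as whole multiples of the same step, which is one eighth of TotalSlotMemory.
That would at least be consistent with the request being rounded up rather
than with any measurement of the memory the job actually uses.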

Regards,

Brian.



> 
> Best regards,
> Mark
> 
> On 30/07/2012 15:22, Brian Candler wrote:
> >I wonder if someone can explain the following to me.  I am just getting
> >started with partitioned slots, but I find that if I submit jobs with
> >RequestMemory = 1991 then the apparent memory allocated for the slot is 2688
> >(which is 35% higher than I requested)
> >
> >$ condor_status
> >
> >Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
> >
> >slot1@dev-storage1 LINUX      X86_64 Unclaimed Idle     1.000  1792  0+00:04:42
> >slot1_1@dev-storag LINUX      X86_64 Claimed   Busy     1.010  2688  0+00:03:00
> >slot1_2@dev-storag LINUX      X86_64 Claimed   Suspende 1.910  2688  0+00:00:04
> >slot1@dev-storage2 LINUX      X86_64 Unclaimed Idle     1.000  1792  0+00:04:42
> >slot1_1@dev-storag LINUX      X86_64 Claimed   Busy     0.940  2688  0+00:03:31
> >slot1_2@dev-storag LINUX      X86_64 Claimed   Busy     1.440  2688  0+00:03:29
> >                      Total Owner Claimed Unclaimed Matched Preempting Backfill
> >         X86_64/LINUX     6     0       4         2       0          0        0
> >                Total     6     0       4         2       0          0        0
> >
> >The slots are defined as follows in /home/condor/condor_config.local:
> >
> ># On dev-storage1: a 2-core CPU with hyperthreading, 8GB RAM
> >SLOT_TYPE_1 = cpus=2, ram=90%, swap=100%, disk=100%
> >SLOT_TYPE_1_PARTITIONABLE = True
> >NUM_SLOTS_TYPE_1 = 1
> >
> ># On dev-storage2: a 4-core CPU without hyperthreading, 8GB RAM
> >SLOT_TYPE_1 = cpus=4, ram=90%, swap=100%, disk=100%
> >SLOT_TYPE_1_PARTITIONABLE = True
> >NUM_SLOTS_TYPE_1 = 1
> >
> >Now, if I retry with the job having RequestMemory = 750 I get an apparent
> >Mem used in the dynamic slot of 896, which is only a 20% premium.
> >condor_status -long also shows Memory=896 and TotalSlotMemory=896 for the
> >used dynamic slots.
> >
> >Is Condor over-estimating the declared memory usage on purpose? If so, where
> >can I look to tweak this down a bit?
> >
> >Thanks,
> >
> >Brian.
> >
> >P.S. In the partitionable slots I see
> >   TotalMemory = 7965
> >   TotalSlotMemory = 7168
> >which correctly reflects the "ram=90%" for the slot.