Re: [HTCondor-users] Disk space consumability



On 11/16/2019 12:52 PM, Oliver Freyermuth wrote:
> Hi together,
> 
> we are running into issues with some jobs requiring a lot of disk space, making our execute directories overflow.
> Those jobs are requesting the necessary disk space via Request_Disk correctly, but the problem arises when multiple of these jobs arrive on a single node (via partitionable slots)
> since HTCondor does not regard disk space as consumable (even though it is consumed, of course).
> 
> Does somebody have a good solution at hand for this issue? Is there a hidden knob to make disk space consumable?
> 
> Cheers,
> 	Oliver
> 

Hi Oliver,

What version of HTCondor are you using?

I'm not sure what you mean by "HTCondor does not regard disk space as consumable...". At least for me, with HTCondor v8.8+ and partitionable slots, when a dynamic slot (dslot) is created with Disk=X, the partitionable slot (pslot) has its Disk attribute reduced by X.

In other words, on my laptop with ~250GB of free disk space, when I submit the following 10 jobs, only one job will run at a time, as you would hope:
   
   executable = c:\utils\sleep.exe
   arguments = 30
   transfer_executable = false                   
   request_cpus = 1                              
   request_memory = 20                           
   request_disk = 200GB                          
   queue 10                                      

And periodically running condor_status I see the Disk space in the pslot decrease as expected when the dslot is created:

λ condor_status -server
Name                   OpSys       Arch   LoadAv Memory      Disk
slot1@TODDS480S        WINDOWS     X86_64  0.000  16217 244047488

[then once a job is running]

λ condor_status -server
Name                   OpSys       Arch   LoadAv Memory      Disk
slot1@TODDS480S        WINDOWS     X86_64  0.000  16089  34166649
slot1_1@TODDS480S      WINDOWS     X86_64  0.000    128 209880840
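
The arithmetic behind that output can be sketched in a few lines of Python. This is only a toy model of the accounting, not HTCondor code: the function name carve_dslots is hypothetical, and it assumes request_disk = 200GB means 200 * 1024 * 1024 KB and that each dslot takes exactly the requested amount.

```python
# Toy model of partitionable-slot disk accounting (hypothetical, not HTCondor code).
KB_PER_GB = 1024 * 1024  # assumption: "GB" in request_disk means 1024 * 1024 KB


def carve_dslots(pslot_disk_kb, request_kb, n_jobs):
    """Return (number of dslots created, Disk in KB left in the pslot)."""
    running = 0
    for _ in range(n_jobs):
        if request_kb > pslot_disk_kb:
            break  # the remaining jobs stay idle until a dslot returns its disk
        pslot_disk_kb -= request_kb  # a new dslot reduces the pslot's Disk
        running += 1
    return running, pslot_disk_kb


pslot_disk_kb = 244047488       # Disk of slot1 before any job runs (from the output above)
request_kb = 200 * KB_PER_GB    # request_disk = 200GB
running, left = carve_dslots(pslot_disk_kb, request_kb, 10)
print(running, left)
```

Under these assumptions only one of the 10 jobs fits. In the real output the dslot received slightly more than the request (209880840 KB), so the pslot remainder there is a bit smaller than what this model computes.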

It looks like you will want to be running HTCondor v8.6.11 or newer for this to work
properly with partitionable slots, and make sure you did not redefine the 
condor_config knob STARTD_RECOMPUTE_DISK_FREE away from its default value of false.
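
A rough way to check a version string against that threshold is sketched below. The helper names are hypothetical, the sample string is made up to resemble a version banner, and a plain tuple comparison is a simplification: it treats every 8.7.x development release as new enough, which may not hold.

```python
import re

MIN_VERSION = (8, 6, 11)  # threshold quoted in the message above


def parse_version(text):
    """Extract the first X.Y.Z triple found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no X.Y.Z version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def new_enough(text, minimum=MIN_VERSION):
    """Compare numerically, so that 8.6.11 counts as newer than 8.6.9."""
    return parse_version(text) >= minimum


sample = "$CondorVersion: 8.8.5 Sep 04 2019 $"  # made-up sample string
print(new_enough(sample))
```

Comparing tuples of integers rather than raw strings matters here, because as strings "8.6.9" would sort after "8.6.11".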

Some developer wisdom/notes on all this are at
  https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6301
and the derived tickets #6424 and #6614.
   
Hope the above helps,
Todd