
Re: [HTCondor-users] What can hinder condor_startd to set DISK?



Hi Steffen,

I guess
  RESERVED_DISK = 131072
might be the culprit. I just checked the documentation and the value is in MB, i.e., the reservation for non-Condor stuff of ~131GB (131072 MB) would exceed the 124G available on your /var. (Unfortunately, the units used for the disk-related settings are sometimes a bit inconsistent.)
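
To make the arithmetic concrete, here is a rough sketch. It assumes that the startd simply subtracts RESERVED_DISK from the free space on the execute partition and never advertises less than zero, and that one MB here means 1024 KiB (in which case 131072 MB is 128 GiB rather than ~131 GB; either way it is more than what df reports as available). The function name is made up for illustration and is not part of HTCondor.

```python
def advertised_disk_kib(free_kib: int, reserved_mb: int) -> int:
    """Hypothetical model: free space minus RESERVED_DISK, clamped at zero."""
    reserved_kib = reserved_mb * 1024  # assumption: 1 MB = 1024 KiB
    return max(0, free_kib - reserved_kib)


# df -h showed 124G available on /var; treat that as 124 GiB.
free_kib = 124 * 1024 * 1024

# RESERVED_DISK as found in the config dump.
current = advertised_disk_kib(free_kib, reserved_mb=131072)

# The same machine with a 1 GiB reservation (an arbitrary example value).
smaller = advertised_disk_kib(free_kib, reserved_mb=1024)

print("Disk with RESERVED_DISK = 131072:", current)
print("Disk with RESERVED_DISK = 1024:  ", smaller)
```

Under these assumptions the first case comes out as 0, which would match the Disk = 0 in the machine ad, while a reservation smaller than the free space leaves a positive value.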

Another thing I noticed: is the execute directory on a dedicated volume? If not,
  STARTD_RECOMPUTE_DISK_FREE = false
might be a problem in cases where /var gets filled by other processes (like logs), since the disk space available to jobs then shrinks as well.

Cheers,
  Thomas

On 21/04/2023 10.30, Steffen Grunewald wrote:
Good morning,

after setting up HTCondor 10.0.3 on our local cluster, I'm running into
issues related to disk space and requirements.

root@h0402:~# condor_config_val -dump -expand EXECUTE
# Configuration from machine: h0402.hypatia.local

# Parameters with names that match EXECUTE:
ENCRYPT_EXECUTE_DIRECTORY = false
ENCRYPT_EXECUTE_DIRECTORY_FILENAMES = false
EXECUTE = /var/lib/condor/execute
GANGLIAD_PER_EXECUTE_NODE_METRICS = true
LOCAL_UNIV_EXECUTE = /var/lib/condor/spool/local_univ_execute
# Contributing configuration file(s):
#       /etc/condor/condor_config
#       /etc/condor/condor_config_local|
root@h0402:~# df -h /var/lib/condor/execute
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       125G  455M  124G   1% /var
root@h0402:~# condor_config_val -dump -expand DISK
# Configuration from machine: h0402.hypatia.local

# Parameters with names that match DISK:
CONSUMPTION_DISK = quantize(target.RequestDisk,{1024})
CREATE_LOCKS_ON_LOCAL_DISK = true
FILE_TRANSFER_DISK_LOAD_THROTTLE = 2.0
FILE_TRANSFER_DISK_LOAD_THROTTLE_LONG_HORIZON = 5m
FILE_TRANSFER_DISK_LOAD_THROTTLE_SHORT_HORIZON = 1m
FILE_TRANSFER_DISK_LOAD_THROTTLE_WAIT_BETWEEN_INCREMENTS = 60
JOB_DEFAULT_REQUESTDISK = 131072
LOCAL_DISK_LOCK_DIR =
MODIFY_REQUEST_EXPR_REQUESTDISK = quantize(RequestDisk,{1024})
RESERVED_DISK = 131072
SCHEDD_ROUND_ATTR_DiskUsage = 25%
STARTD_RECOMPUTE_DISK_FREE = false
# Contributing configuration file(s):
#       /etc/condor/condor_config
#       /etc/condor/condor_config_local|
root@h0402:~# condor_status -l `hostname`| grep ^Disk
Disk = 0


Since $(JOB_DEFAULT_REQUESTDISK) > $(DISK) there's no way to run vanilla
universe jobs.

The manual, under DISK and RESERVED_DISK, suggests that the startd would
determine the amount of available space (of which there's plenty), but
for me obviously it doesn't. Is there a means to find out why?

Thanks, Steffen
