Re: [HTCondor-users] What can hinder condor_startd to set DISK?



On Fri, 2023-04-21 at 11:18:17 +0200, Steffen Grunewald wrote:
> Hi,
> 
> On Fri, 2023-04-21 at 11:10:20 +0200, Thomas Hartmann wrote:
> > Hi Steffen,
> > 
> > I guess
> >   RESERVED_DISK = 131072
> > might be the culprit. I just checked in the documentation and the ad is in
> > MB, i.e., the reservation for non-condor stuff of ~131GB would surpass the
> > available 124G on your /var (unfortunately, prefixes/sizes are sometimes a
> > bit inconsistent)
> 
> Hm, my printed copy of the manual (10.0.0) must be wrong then. It has "(in kB)"
> for both DISK and RESERVED_DISK - while the unit for RESERVED_SWAP is "MiB".
> 
> Also, matching is done by comparing TARGET.RequestDisk and DISK without any
> unit conversions, so the JOB_DEFAULT_REQUESTDISK would be affected as well?
> 
> Previously I had a very low setting - which I'll restore now.
> 
> > 
> > Another thing I noticed - is the execute directory on a dedicated volume?
> > Else
> >   STARTD_RECOMPUTE_DISK_FREE = false
> > might be a problem in cases where /var gets filled by other processes (like
> > logs) and the available disk space shrinks for jobs as well.
> 
> Since my partitionable slot gets only 75% of the total disk, I'm not worried
> about that, and there will be a watchdog checking for disk (partition)
> shortages.
> 
> Thanks so far, I'll report back on the outcome,

... and here it is, from a different node though.
I have set
	RESERVED_DISK = 128
and the /var filesystem reports 129177320 kB free.
From "condor_status -l ... | grep Disk" I get
	TotalDisk = 129046416
	Disk = 96784812
- the latter being exactly 75% of the total space, as configured.
The difference between the free capacity and the TotalDisk value is 130904 kB,
which is close to 131072 (= 128 * 1024) but not identical, meaning that
RESERVED_DISK is indeed given in MB and multiplied by 1024 to convert it to kB
(the same as RESERVED_SWAP), and the entry in my 10.0.0 manual (subsection
4.5.1, p.209) is wrong - but has been fixed in the online version for 10.0.3.
Lesson learned...
(BTW, a negative value would have given me a hint to look closer...)
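For anyone wanting to recheck the numbers above, here is a minimal Python sketch of the arithmetic. It assumes RESERVED_DISK is in MiB and that TotalDisk, Disk and the free-space figure are in KiB; the helper `expected_total_disk` is hypothetical (not an HTCondor function), and the first-node check assumes the "124G" free on /var means GiB.

```python
# Re-check of the disk numbers quoted in this thread (a sketch, not HTCondor code).
# Assumptions: RESERVED_DISK is in MiB; TotalDisk, Disk and the free space are in KiB.
KIB_PER_MIB = 1024


def expected_total_disk(free_kib: int, reserved_disk_mib: int) -> int:
    """Hypothetical helper: free space minus the RESERVED_DISK reservation, in KiB."""
    return free_kib - reserved_disk_mib * KIB_PER_MIB


# Second node, configured with RESERVED_DISK = 128
free_kib = 129177320     # free space on /var
total_disk = 129046416   # TotalDisk from condor_status -l
disk = 96784812          # Disk of the partitionable slot

reserved_kib = 128 * KIB_PER_MIB        # 131072 KiB
observed_gap = free_kib - total_disk    # 130904 KiB
print(f"reservation {reserved_kib} KiB, observed gap {observed_gap} KiB, "
      f"off by {reserved_kib - observed_gap} KiB")
print(f"predicted TotalDisk {expected_total_disk(free_kib, 128)}, reported {total_disk}")
print(f"Disk / TotalDisk = {disk / total_disk}")  # the configured 75% share

# First node, with the original RESERVED_DISK = 131072 (read as MiB, i.e. 128 GiB)
old_reserved_kib = 131072 * KIB_PER_MIB                 # 134217728 KiB
var_free_kib_node1 = 124 * KIB_PER_MIB * KIB_PER_MIB    # assumes "124G" means GiB
print("old reservation exceeds /var:", old_reserved_kib > var_free_kib_node1)
```

The sketch confirms the MiB-to-KiB interpretation and the exact 75% share; it does not try to explain the small remaining mismatch between the reservation and the observed gap.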

I'm now trying to get a grip on =?=/=!= expressions and a means to extend
MOUNT_UNDER_SCRATCH (for the latter, "$(MOUNT_UNDER_SCRATCH),/something/else"
will produce unexpected results), but the major issue seems to be fixed.

Thanks for your suggestions!

- Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~