Wow, thanks for the tip on NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE! I had thought about trying to tweak concurrency limits in the config but wound up stymied by my lack of awareness of that parameter. It's funny, apparently that's been available since I started using HTCondor 7.8 in 2013 and I'd never noticed it, or at least never noticed it enough to recognize its usefulness.
I like the fact that it meshes with the concurrency limits which we'd want people to use for licenses anyway; it actually seems like it's the "correct" way to handle license allocations rather than a startd attribute. I had been puzzling over how often to update the machine ad with the license counts to be sure to catch the negotiation cycle, which is not necessary with this read-config parameter.
It looks like when you're tweaking the available license count concurrency limit, you'd have to omit licenses checked out to HTCondor jobs, otherwise they'd be double-counted. Do you recognize these by hostname or some such, or just add the currently running concurrency limit count to the available count?
concurrency_limits = volume_1:5242880
You'd define the limit using kilobytes to match the units of request_disk, and this job would claim 5GB out of the volume_1 space. This value would probably be lower than the request_disk number for jobs which use scratch space and output transfers, since some jobs we run have a scratch-space high-water mark that's nearly double the final output. If output is compressed before output transfer, that'd also reduce the concurrency limit request.
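Spelled out as a submit-description sketch (the 10GB scratch request versus 5GB output claim is just an illustrative assumption; both values are in KiB to match request_disk):

request_disk = 10485760
concurrency_limits = volume_1:5242880

So the job reserves 10GB of scratch on the execute node but only counts 5GB against the volume_1 output pool.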
I've got my pool-wide configuration set up in a shared NFS path in /home/condor/config/config.d, so it'd be easy enough to drop the limit data in there, but I suppose the only place it would be needed is on the CM server, so it could just go into /etc/condor/config.d. Is that the approach you take? There's no need for any other daemons to look at <NAME>_LIMIT values, right?
I think in most of our pools we'd have a good bead on what the target volumes will be. Most of them don't have a wide variety of users, so the output typically goes to the same big volume every time, and we're okay there. Maybe in your case you could cook up a submit expression that looked at the Iwd or output directory of the job with a regexp, and if it spotted a covered filesystem it would apply the limit:
concurrency_limits = ifThenElse(regexp(Iwd, "^/volume_1"), concat("volume_1:", $(request_disk)), "")
Probably not really practical (or syntactically correct), but at least it's an interesting example of the power of ClassAd expressions. :-)
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx]
On Behalf Of Edward Labao
For our farm we do something similar for floating licenses (e.g. flexLM, or sesi for Houdini) in that we have an external process polling the license servers. In our studio, licenses can be used both on the farm and off the farm where Condor can't track them, so it's a little more involved than just parsing for the total available license count, but basically we come up with a number of how many licenses are either in use or available on the farm and write that to a Condor config file on the negotiator host.
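A minimal sketch of that write step in Python (the helper name, path, and counts here are illustrative assumptions, not our actual tooling):

```python
# Hypothetical sketch: render license counts as HTCondor concurrency-limit
# config lines and write them to a file on the negotiator host.
def write_license_limits(limits, path):
    """limits maps license name -> count; returns the lines written."""
    lines = ["%s_LIMIT = %d" % (name, count)
             for name, count in sorted(limits.items())]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

if __name__ == "__main__":
    for line in write_license_limits({"nuke": 1000, "maya_fluid_sim": 200},
                                     "/tmp/99_license_limit"):
        print(line)
```

An external poller would call something like this on a timer with whatever counts it scraped from the license servers.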
nuke_LIMIT = 1000
maya_fluid_sim_LIMIT = 200
These basically set up concurrency limits for our licenses. Jobs that will need to use a particular license specify them in their submission description files with a line like:
concurrency_limits = nuke
When licenses get used outside of the farm, we adjust the values written to the 99_license_limit file. For example, if we know that 20 of our maya_fluid_sim licenses are being used outside of HTCondor, we update the config file with:
maya_fluid_sim_LIMIT = 180
There's a configuration parameter called NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE that makes the negotiator reread the configuration files before each negotiation cycle so it will have the latest (for some definition of "latest") license limit values before doing any match-making.
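For reference, the knob itself is just a boolean in the negotiator host's configuration, something like:

NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE = True

With that set, edits to the limits file take effect at the next negotiation cycle without needing a condor_reconfig.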
This may be overkill for your license situation, but it seems like this could probably be used for your file server throttling. We need something similar for throttling our NFS servers.
Create a limiter for each filer like:
volume_1_LIMIT = 99999
volume_2_LIMIT = 99999
Under normal circumstances, the value is set to a number higher than the total number of job slots on your farm. When your external script detects that the filer is at capacity or otherwise overloaded, update the values to 1 (I don't remember if 0 is a valid value or not). This prevents any new jobs requiring the filer limits from starting.
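A sketch of how an external monitor might render those limits, assuming a simple overloaded/healthy flag per volume (names and values are hypothetical):

```python
# Hypothetical sketch: turn per-volume health flags into HTCondor
# concurrency-limit config lines. An overloaded volume gets a limit of 1,
# which effectively stops new jobs that request that volume's limit.
def render_filer_limits(status, normal=99999, throttled=1):
    """status maps volume name -> True if the filer is overloaded."""
    return ["%s_LIMIT = %d" % (vol, throttled if overloaded else normal)
            for vol, overloaded in sorted(status.items())]

if __name__ == "__main__":
    for line in render_filer_limits({"volume_1": True, "volume_2": False}):
        print(line)
```

The monitor would rewrite the config file with these lines whenever a filer's state changed, and the negotiator would pick them up on its next cycle.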
Full disclosure, however, we didn't use this for very long because most users had no idea what filers their jobs would access at run time, but maybe you'll have better luck.
In any case, it sounds like you've already got an alternate solution, but just wanted to share what we did for a similar problem.
On Wed, Jan 4, 2017 at 4:11 PM, Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx> wrote: