
Re: [HTCondor-users] Limit jobs per node



That's correct. In one of our cases, the simulation software started up a service which listened on a particular TCP port and wasn't designed to share it, because of the way the real-life system operates.

The sim didn't use much scratch space or an inordinate amount of CPU or memory. Setting up a "SimService" machine resource let us reliably prevent more than one run from starting on any given exec node, so a run never got into a shoving match with another run for access to the service, and we didn't have to make a fake claim on other machine resources which other jobs might use.

The details are in section 3.5.8 of the current manual. Basically:

MACHINE_RESOURCE_NAMES = SimService
MACHINE_RESOURCE_SimService = 1

So the submits would have:

Request_SimService = 1

...and then you'd have only one run at a time on any machine configured with that resource. If there are machines on which you don't want the sim to run at all, for example an opportunistic desktop machine, you can set the SimService resource count to 0 in those configs.
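
As a rough sketch of both pieces, assuming the SimService name from above (the executable name and the memory and disk figures are placeholders for illustration only), a machine that should never run the sim would carry:

# Local config on an exec node that must not run the sim
MACHINE_RESOURCE_NAMES = SimService
MACHINE_RESOURCE_SimService = 0

and a complete submit description requesting the resource might look like:

# run_sim.sh is a hypothetical wrapper script
executable = run_sim.sh
request_cpus = 1
# request_memory is in MiB and request_disk in KiB when no unit is given
request_memory = 2048
request_disk = 1048576
# Claims the single SimService unit on whichever machine matches
request_SimService = 1
queue

Jobs which don't include the Request_SimService line are unaffected and keep matching on CPU, memory, and disk as usual.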

In your situation, using the request_disk approach may be better in the long run - as disk sizes increase, you may wind up with machines which could easily and comfortably run two or more of your large-disk runs, and you could set the Request_Disk submit parameter to a good high-water mark and let the negotiator do its thing.
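
A minimal sketch of that alternative, assuming a made-up high-water mark of 200 GiB per run (substitute whatever your large runs really need, and your own executable name):

# big_model.sh is a hypothetical wrapper script
executable = big_model.sh
request_cpus = 1
# 200 GiB expressed in KiB: 200 * 1024 * 1024
request_disk = 209715200
queue

With that request, a node with roughly 250 GiB of scratch would take one such run at a time, while a node with 500 GiB could take two.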

	-Michael Pelletier.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Heiko Schroeter
Sent: Wednesday, April 4, 2018 2:58 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [External] Re: [HTCondor-users] Limit jobs per node

> 
> Another way to achieve this is to define a custom machine resource in the pool configuration, such as "OnePerHost" so that a job could do "request_oneperhost = 1" and be the only job running on the system, but that would apply to any job from anyone which requests "oneperhost" rather than only a given group of jobs.
> 
> 	-Michael Pelletier.

If I understand this correctly, it means that I can run one job per node by requesting this specially created variable? And jobs which do not request it are not limited?

That would be precisely our use case: only one job per node for the user who is "requesting" it.
We have lots of small jobs which do not have a large impact on the machines. But when it comes to large model simulations, the user is happy if he can restrict his jobs to only one per node because of the I/O limitations.


Best
Heiko