
Re: [HTCondor-users] Possible to have submit-implemented per-machine job limits?



On 9/12/2016 12:31 PM, Michael V Pelletier wrote:
Hi folks,

We have a situation where a certain type of job has an adjunct service
process which can only have one instance on a given machine, since it uses
a static port number to provide its service to the job. It can't easily be
reworked, since it's designed to operate that way in a production
environment. This means that one physical machine can only run one
instance of that job.


Maybe you could run multiple instances of this adjunct service on one physical host by using a job universe that virtualizes the network environment (e.g. the docker universe or the vm universe)?
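For instance, a docker-universe submit description along these lines would give each job its own network namespace, so each container could bind the same static port. This is only a sketch; the image name and script are hypothetical placeholders:

```
# Sketch of a docker universe job; "myservice-image" and
# run_job.sh are hypothetical placeholders for your image/script.
universe              = docker
docker_image          = myservice-image
executable            = run_job.sh
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue
```

Whether this works depends on the execute nodes having docker support configured, of course, which runs into the same "can't change the server side" constraint you mention below.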

I know I can set up a machine resource in the configuration for this
purpose, assigning one "myservice" resource to each machine, and this
would allow the job to specify "request_myservice = 1" and thus limit to
one job per machine.
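For reference, the server-side version is small. A minimal sketch, with "myservice" as the placeholder resource name:

```
# In the condor_config of each execute node: advertise one
# "myservice" unit per machine. A partitionable slot hands this
# out to at most one dynamic slot at a time.
MACHINE_RESOURCE_myservice = 1

# Then in the job's submit description file:
#   request_myservice = 1
# limits matching to one such job per machine.
```

After a reconfig, the slot ads should gain corresponding Myservice resource attributes that the matchmaker decrements as jobs claim the resource.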

What I'm wondering is if it's possible to use something in the job's
requirements expression alone to accomplish this, rather than a
server-side config customization. I'm using partitionable slots - I
suspect that fact may make this a tricky problem to solve without startd
configuration changes, because the partitionable slot would probably need
information about what the dynamic slots are doing.


Doing what you want by setting up a custom machine resource (e.g. request_port777 = 1) is exactly what I'd suggest; scenarios like the above are why custom machine resources exist, since a statically-numbered port really is a custom machine resource. For instance, what if two different users both have an app that requires the same static port?

But given that you cannot configure the execute nodes, perhaps your job requirements could look at the ChildRemoteUser attribute in the partitionable slot's ad? This attribute is a ClassAd list of the owners of all the dynamic slots on the machine. You could probably leverage this so that only one job submitted by you runs on each machine...
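A hedged sketch of that idea, with a placeholder user name; member() is the ClassAd list-membership function:

```
# In the submit file: only match partitionable slots where no
# dynamic slot is already owned by this user. ChildRemoteUser
# is advertised by the partitionable slot; the user string below
# is a placeholder for your actual fully-qualified user name.
requirements = PartitionableSlot && !member("mvp@submit.example.com", ChildRemoteUser)
```

One caveat: ChildRemoteUser only reflects slots that have already been claimed, so two of your jobs matched in the same negotiation cycle could conceivably still land on the same machine.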

regards
Todd


One similar thing I've done in the past was to steer jobs that could
share a license checkout onto the same machine: a wrapper script runs a
"condor_q" query and turns the result into a rank expression favoring
machines already running that user's licensed jobs. But that requires,
needless to say, a submit wrapper script, which I'd like to avoid.

I've also used SubmitterUserResourcesInUse, but that applies to the entire
pool rather than to a single machine.

Maybe there's some sort of trick in the new 8.4 submit syntax that could
be applied here?

Thanks for any suggestions you can offer!

        -Michael Pelletier.



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685