
Re: [HTCondor-users] Misspelled requirement

On 2/3/2022 12:41 PM, Jacek Kominek wrote:
Hi Todd,

It's quite possible the typo was reported (we are running HTCondor 8.9.11). I only got a report about a forever-idle job without further specifics, so it is likely that the user didn't catch or understand the warning.

Thank you for the response; it clarifies things a bit. What bothers me the most is that it did get processed as a valid condor job requirement (not a variable/macro) even though the resource was non-existent in the system. This is a very particular and limited namespace, from what I know, since you can only request CPUs, GPUs, memory or disk space. Correct me if I am wrong, but shouldn't anything else be rejected outright in such a context?

The "RequestX" namespace is not limited to just CPUs, etc., since execute nodes can define their own custom resources (FPGAs, database connections, electron microscopes, whatever), and jobs can then request these custom resources with "RequestFPGAs = 1" or similar.  See the HTCondor Manual for the config knobs MACHINE_RESOURCE_<name> and friends.
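
As a rough illustration of the custom-resource mechanism described above, the fragment below sketches how an execute node might advertise a resource and how a job might then request it. The resource name FPGAs and both quantities are hypothetical, and this is only a minimal sketch; see the manual entry for MACHINE_RESOURCE_<name> for how such a resource is divided among slots at your site.

```
# Execute node config file (hypothetical resource name and count)
MACHINE_RESOURCE_FPGAs = 2

# Job submit file (hypothetical request for one unit of that resource)
request_fpgas = 1
```

Because names of this form are open-ended, a request for a resource that no machine currently advertises is not something the submit tools can always reject as an error.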

What you could do at your site, however, is force all users to explicitly specify "RequestCpus" (spelled correctly) in every job, and otherwise give an error and refuse to allow the job to be submitted.  One way to accomplish this is to specify a default value for RequestCpus that makes no sense, then add a submit requirement that refuses the submission and gives an error message if RequestCpus was not changed from that default.  Here is an example config snippet you could put in the config file of your submit machine:

JOB_DEFAULT_REQUESTCPUS = 0
SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) MustSpecifyCpus
SUBMIT_REQUIREMENT_MustSpecifyCpus = RequestCpus != 0
SUBMIT_REQUIREMENT_MustSpecifyCpus_REASON = "You must specify RequestCpus in your job submit file."

After adding the above to your configuration, you must run condor_reconfig.

Here is how things would look to your users after doing the above:

$ cat test.sub
requestCUPS = 8
executable = /bin/true
hold = true

$ condor_submit test.sub
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: You must specify RequestCpus in your job submit file.

Hope the above helps,


On 2/3/22 11:56, Todd Tannenbaum wrote:
On 2/3/2022 11:48 AM, Jacek Kominek via HTCondor-users wrote:
Hi all,

A user in our cluster submitted a job with a typo in its requirements: requestCUPS rather than requestCPUS. Rather than erroring out, the requirement was treated as valid and the job was forever stuck in Idle (since we have no cups in our cluster). Is this the expected behavior? Normally, if there are errors or typos in the ClassAds or variables, the scheduler is pretty good at catching them and reporting shadow exceptions etc. I wonder if the resource requests are treated differently?

Hi Jacek,

Given that submit files can define custom macro names, it is a bit challenging to detect typos like the above.  However, upon job submission, the user most definitely should have received a prominent warning telling them they may have a typo in their submit file.  Did that warning not appear on your installation?

Here is what I see when I tried reproducing what you described above:

$ cat test.sub
requestCUPS = 8
executable = /bin/true

$ condor_submit test.sub
Submitting job(s).
1 job(s) submitted to cluster 2.
WARNING: the line 'requestCUPS = 8' was unused by condor_submit. Is it a typo?

Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685