[HTCondor-users] Jobs rejected "because of their own requirements"



I just set up a one-machine cluster on a Fedora workstation using the default package (8.8.10). This is my first time setting up a Condor pool using roles. I followed the "quick start" guide in the administration part of the manual, setting the CentralManager, Execute, and Submit roles along with password authentication, and everything looks good. It's an 18-core machine with hyperthreading, and 36 slots show up in condor_status. I submitted "sleep.sub" from https://research.cs.wisc.edu/htcondor/manual/quickstart.html, but the job remains Idle. It looks like the negotiator is rejecting it: the analysis reports that "36 reject your job because of their own requirements". I haven't seen that before and could use some help debugging it.

$ condor_q -better-analyze 2.0


-- Schedd: clh-8842.lab.core : <172.16.8.48:9618?...
The Requirements expression for job 2.000 is

  (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
  (TARGET.HasFileTransfer)

Job 2.000 defines the following attributes:

  DiskUsage = 1
  ImageSize = 1
  RequestDisk = DiskUsage
  RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

The Requirements expression for job 2.000 reduces to these conditions:

            Slots
Step     Matched  Condition
-----   --------  ---------
[0]           36  TARGET.Arch == "X86_64"
[1]           36  TARGET.OpSys == "LINUX"
[3]           36  TARGET.Disk >= RequestDisk
[5]           36  TARGET.Memory >= RequestMemory
[7]           36  TARGET.HasFileTransfer

No successful match recorded.
Last failed match: Thu Sep 10 18:03:28 2020

Reason for last match failure: no match found

002.000:  Run analysis summary ignoring user priority.  Of 36 machines,
      0 are rejected by your job's requirements
     36 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are able to run your job

WARNING:  Be advised:
   Job did not match any machines's constraints
   To see why, pick a machine that you think should match and add
     -reverse -machine <name>
   to your query.
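The summary above says no slot is rejected by the job's own Requirements, yet all 36 still refuse the job. Matching in HTCondor is two-sided: the job's Requirements must hold against the slot, and the slot's own Requirements must hold against the job. The following is a simplified Python model of that idea, not HTCondor's implementation. The `Start` attribute, the slot values, and the function names are hypothetical, and the order in which rejections are counted is this sketch's own choice.

```python
# Simplified model of HTCondor's two-sided matchmaking (illustration only).
# Real ClassAd evaluation has UNDEFINED semantics, MY/TARGET scoping, Rank,
# user priorities, etc. All slot attribute values below are hypothetical.
from typing import Callable, Dict, List

Ad = Dict[str, object]
Requirement = Callable[[Ad, Ad], bool]  # (my_ad, target_ad) -> bool


def job_requirements(job: Ad, slot: Ad) -> bool:
    # Mirrors the job's Requirements printed by condor_q -better-analyze.
    return (
        slot["Arch"] == "X86_64"
        and slot["OpSys"] == "LINUX"
        and slot["Disk"] >= job["RequestDisk"]
        and slot["Memory"] >= job["RequestMemory"]
        and bool(slot["HasFileTransfer"])
    )


def slot_requirements(slot: Ad, job: Ad) -> bool:
    # Hypothetical stand-in for the slot's own Requirements, which HTCondor
    # derives largely from the START expression; modeled here as a boolean.
    return bool(slot["Start"])


def summarize(job: Ad, job_req: Requirement,
              slots: List[Ad], slot_req: Requirement) -> Dict[str, int]:
    counts = {"rejected_by_job": 0, "rejected_by_slot": 0, "able_to_run": 0}
    for slot in slots:
        if not job_req(job, slot):
            counts["rejected_by_job"] += 1   # the job does not want this slot
        elif not slot_req(slot, job):
            counts["rejected_by_slot"] += 1  # the slot does not want this job
        else:
            counts["able_to_run"] += 1
    return counts


job: Ad = {"RequestDisk": 1, "RequestMemory": 1}
# 36 identical slots: every job-side check passes, but START is False.
slots: List[Ad] = [
    {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 10_000_000,
     "Memory": 1024, "HasFileTransfer": True, "Start": False}
    for _ in range(36)
]

result = summarize(job, job_requirements, slots, slot_requirements)
print(result)  # {'rejected_by_job': 0, 'rejected_by_slot': 36, 'able_to_run': 0}
```

With `Start` set to `True` in the slot ads, the same call reports 36 slots able to run the job. That is why the slot-side expression (for example, the value printed by `condor_config_val START`) is a natural next thing to inspect.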

For what it's worth, adding "-reverse -machine clh-8842.core.lab" to the query didn't return anything useful.

I'm guessing the problem might be the "undefined" in the RequestMemory attribute, but I'm not sure of that, nor of why it would be undefined.
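The `undefined` in RequestMemory acts as a fallback test. If `MemoryUsage` is not yet defined (typically before the job has reported any memory usage), the expression falls back to `ImageSize`, converting KiB to MiB and rounding up. The following hand-rolled Python model of that expression is not the HTCondor `classad` library. Python `None` stands in for UNDEFINED. With `ImageSize = 1`, the expression evaluates to 1 MiB. This is consistent with condition [5] in the table above matching all 36 slots.

```python
# Hand-rolled model of the job's RequestMemory expression (illustration only):
#   ifthenelse(MemoryUsage =!= undefined, MemoryUsage, (ImageSize + 1023) / 1024)
# This is not the HTCondor classad library; Python None stands in for UNDEFINED.
from typing import Optional


def request_memory(memory_usage: Optional[int], image_size_kib: int) -> int:
    """Return RequestMemory in MiB, mirroring the job's expression."""
    # "=!=" is ClassAd's "is not identical to" operator; unlike "!=" it never
    # evaluates to UNDEFINED, so this test is a plain True/False.
    if memory_usage is not None:
        return memory_usage
    # ImageSize is in KiB; adding 1023 before integer division rounds up to MiB.
    return (image_size_kib + 1023) // 1024


print(request_memory(None, 1))     # 1  (this job: ImageSize = 1, MemoryUsage undefined)
print(request_memory(None, 2048))  # 2
print(request_memory(500, 1))      # 500
```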


Thanks,
Grant
--
Grant Goodyear
web: http://www.grantgoodyear.org
e-mail: grant@xxxxxxxxxxxxxxxxx