
Re: [HTCondor-users] Jobs rejected "because of their own requirements"



Hi,

Maybe you need the FQDN for the -reverse option:

condor_q -better-analyze 2.0 -reverse -machine <fqdn>

If that does not work, try slot<number>@<fqdn> as the machine name.
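
A rough sketch of how the exact slot names could be looked up before running the reverse analysis. The host name is the schedd name from the quoted output below, and "slot1" is an assumption about how the slots are named on that machine; the Start attribute printed in the last column is the machine-side condition the job has to satisfy. The guard just skips the commands where HTCondor is not installed.

```shell
# Host name taken from the schedd line in the quoted output; adjust as needed.
machine="clh-8842.lab.core"

if command -v condor_status >/dev/null 2>&1; then
    # One line per slot: slot name, machine name, memory in MB, Start expression.
    condor_status -autoformat Name Machine Memory Start

    # Reverse analysis against one slot (slot1 is an assumed slot name).
    condor_q -better-analyze 2.0 -reverse -machine "slot1@${machine}"
fi
```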

You can always add a memory request to your job submit file using: request_memory = <quantity> (the default unit is MB)
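
As an illustration only, a submit file with an explicit memory request might look like the one written below. The file name, the executable, and the value of 128 MB are assumptions chosen for the example and are not taken from the quick start guide.

```shell
# Write a minimal submit description file (all values are illustrative).
cat > sleep.sub <<'EOF'
executable     = /bin/sleep
arguments      = 60
request_memory = 128
request_cpus   = 1
log            = sleep.log
output         = sleep.out
error          = sleep.err
queue
EOF

# Then submit it from the machine running the schedd:
#   condor_submit sleep.sub
```

With request_memory set, the job ad carries a fixed RequestMemory instead of the default expression derived from ImageSize.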


Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: g2boojum@xxxxxxxxx
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 11 September 2020 01:25:02
Subject: [HTCondor-users] Jobs rejected "because of their own requirements"

I just set up a one-machine cluster on a Fedora workstation, using the default package (8.8.10), and this is my first time setting up a condor cluster using roles. I followed the "quick start" guide in the administration part of the manual, setting the CentralManager, Execute, and Submit roles, along with password authentication, and everything looks good. It's an 18-core machine with hyperthreading, and 36 slots show up in condor_status. I submitted "sleep.sub" from https://research.cs.wisc.edu/htcondor/manual/quickstart.html, and the job remains Idle. It looks like the negotiator is rejecting it because "36 reject your job because of their own requirements". That's new for me, and I could use some help debugging it.

$ condor_q -better-analyze 2.0


-- Schedd: clh-8842.lab.core : <172.16.8.48:9618?...
The Requirements expression for job 2.000 is

    (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
    (TARGET.HasFileTransfer)

Job 2.000 defines the following attributes:

    DiskUsage = 1
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

The Requirements expression for job 2.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          36  TARGET.Arch == "X86_64"
[1]          36  TARGET.OpSys == "LINUX"
[3]          36  TARGET.Disk >= RequestDisk
[5]          36  TARGET.Memory >= RequestMemory
[7]          36  TARGET.HasFileTransfer

No successful match recorded.
Last failed match: Thu Sep 10 18:03:28 2020

Reason for last match failure: no match found

002.000:  Run analysis summary ignoring user priority.  Of 36 machines,
      0 are rejected by your job's requirements
     36 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      0 are able to run your job

WARNING:  Be advised:
   Job did not match any machines's constraints
   To see why, pick a machine that you think should match and add
     -reverse -machine <name>
   to your query.

For what it's worth, adding "-reverse -machine clh-8842.core.lab" to the query didn't return anything useful.

I'm guessing the problem might be the "undefined" in the RequestMemory attribute, but I'm not sure, and I'm not sure why it's undefined.
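
For reference, the "=!=" in that attribute is the ClassAd "is not identical to" operator, which yields true or false even when one side is undefined, so the expression is a default rather than an error: until the job has run and reported a MemoryUsage, RequestMemory falls back to ImageSize (in KiB) rounded up to whole MB. The analysis above also shows all 36 slots passing TARGET.Memory >= RequestMemory. A small shell stand-in for that logic, with an empty string assumed to play the role of "undefined":

```shell
# Shell stand-in for the ClassAd default, not HTCondor code.
# An empty string plays the role of an undefined MemoryUsage.
memory_usage=""
image_size_kb=1      # ImageSize = 1 in the job ad above

if [ -n "$memory_usage" ]; then
    request_memory=$memory_usage
else
    # Round ImageSize (KiB) up to whole MB using integer division.
    request_memory=$(( (image_size_kb + 1023) / 1024 ))
fi

echo "RequestMemory = ${request_memory} MB"   # prints "RequestMemory = 1 MB"
```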


Thanks,
Grant
--
Grant Goodyear       
web: http://www.grantgoodyear.org   
e-mail: grant@xxxxxxxxxxxxxxxxx
