Re: [HTCondor-users] idle job and "Request has not yet been considered by the matchmaker"

On 10/15/2016 9:43 PM, Francisco Pereira wrote:

In /var/log/condor/MatchLog, I see

"Matched <ID> <user> <IP for head:53694?addrs=IP for head-53694>
preempting node2<IP for node:13698?address=IP for node2-13698>

Both of these messages recur every minute or so.

On node2, only MASTER and STARTD are running, and neither of the
respective logs shows any mention of this job (using tail -f to track at
the moment of submission).

/etc/condor/condor_config is precisely the same between node1 and node2.
The only difference between them is that, despite having the same domain
in their FQDN (netA.netB.netC), the actual subnets are different (node2
is in ip2.ipB.ipC, whereas node1 and the head node are in ip1.ipB.ipC).
/etc/hosts contains <IP> <name> <FQDN> for all three machines on each
one of them.

Hi Francisco,

Skimming your post, it looks like the job is being matched to the slot, but the schedd on the submit machine is unable to claim the machine. Just a quick thought - maybe this is due to your HTCondor authorization settings. Do you see any permission denied messages in the node2 StartLog (e.g. grep -i "permission" StartLog)? Perhaps you are missing one of the subnets in the config knobs ALLOW_WRITE or HOSTALLOW_WRITE. If you are using FQDN names (e.g. *.wisc.edu) in your [HOST]ALLOW_WRITE, be aware that the proper way to list entries in /etc/hosts on Linux is "<IP> <FQDN> <name>", not "<IP> <name> <FQDN>". See https://is.gd/yXyiDG for a discussion. Most of the time the order doesn't matter if DNS is in use, but maybe it is causing you grief; HTCondor is pretty sensitive to how IPs are mapped back to FQDNs.
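
If it helps, here is a rough Python sketch for spotting /etc/hosts entries written in the "<IP> <name> <FQDN>" order. It is not part of HTCondor; the function name misordered_hosts_entries is made up for illustration, and it assumes that a name containing a dot is an FQDN and a name without one is a short name, which is only a heuristic.

```python
def misordered_hosts_entries(hosts_text):
    """Return (ip, names) pairs where a short name precedes the FQDN.

    Heuristic (an assumption, not an HTCondor rule): a name with a dot
    is treated as fully qualified, a name without one as a short alias.
    """
    flagged = []
    for raw in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names; nothing to check
        ip, names = fields[0], fields[1:]
        # Loopback entries are not relevant to cross-node authorization.
        if ip.startswith("127.") or ip == "::1":
            continue
        # Flag the entry if the first name is short but a later one is dotted.
        if "." not in names[0] and any("." in n for n in names[1:]):
            flagged.append((ip, names))
    return flagged
```

To try it on a real machine, pass it the contents of /etc/hosts, for example misordered_hosts_entries(open("/etc/hosts").read()), and move the FQDN to the first position after the IP in any entry it returns.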

Another thought is perhaps there is an issue preempting a previous job on node2 - do you still have problems running on node2 even when node2 is completely idle?

Hope the above helps,