[HTCondor-users] idle job and "Request has not yet been considered by the matchmaker"

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

hello,

I am trying to submit a job to a specific machine in my test pool. Both the code and the submit file have been tested with other nodes, and the only difference this time is

Requirements = (Machine == "node2.netA.netB.netC")

When running condor_q I see that it is listed as I(dle). With

condor_q -analyze <job ID>

I see

  <ID>: Request has not yet been considered by the matchmaker.

  (...)

  <ID>: Run analysis summary. Of 23 machines,

  13 are rejected by your job's requirements

   0 reject your job because of their own requirements

   0 match but are serving other users

  10 are available to run your job

and indeed there are 10 slots on node2.netA.netB.netC (the remaining 13 are on the head node and another node). The only suggestion the analysis gives is to remove the machine-specific requirement.
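As a sanity check on the summary above, the four category counts can be parsed and compared against the stated total. This is only a minimal sketch: the function name `parse_analysis_summary` is hypothetical, the summary text is pasted in as a string rather than read from condor_q, and the patterns cover only the lines quoted in this message.

```python
import re

# Summary text as quoted in this message (job ID left as the placeholder <ID>).
SUMMARY = """\
<ID>: Run analysis summary. Of 23 machines,
13 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving other users
10 are available to run your job
"""

# Leading phrases of the four category lines, in the order they appear.
CATEGORIES = r"(are rejected|reject your job|match but|are available)"


def parse_analysis_summary(text):
    """Return (total, counts) where counts maps a category phrase to its number."""
    total_match = re.search(r"Of\s+(\d+)\s+machines", text)
    if total_match is None:
        raise ValueError("no 'Of N machines' line found")
    total = int(total_match.group(1))

    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+" + CATEGORIES, line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return total, counts


total, counts = parse_analysis_summary(SUMMARY)
print(total, counts, sum(counts.values()) == total)
```

Here the categories add up to the total, and the 10 machines reported as available correspond to the 10 slots on node2, so the requirements expression itself does not appear to be what rules the node out.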

Looking at /var/log/condor/NegotiatorLog, I see

"Successfully matched with slot@xxxxxxxxxxxxxxxxxxxx"

In /var/log/condor/MatchLog, I see

"Matched <ID> <user> <IP for head:53694?addrs=IP for head-53694> preempting node2<IP for node:13698?address=IP for node2-13698> slot1@xxxxxxxxxxxxxxxxxxx

Both of these messages recur every minute or so.

On node2, only MASTER and STARTD are running, and neither of their logs shows any mention of this job (I used tail -f to watch them at the moment of submission).

/etc/condor/condor_config is exactly the same on node1 and node2. The only difference between them is that, despite having the same domain in their FQDN (netA.netB.netC), they are on different subnets (node2 is in ip2.ipB.ipC, whereas node1 and the head node are in ip1.ipB.ipC). On each of the three machines, /etc/hosts contains <IP> <name> <FQDN> entries for all three. In all condor_configs, I use

FILESYSTEM_DOMAIN = netA.netB.netC

UID_DOMAIN = netA.netB.netC

DEFAULT_DOMAIN_NAME = netA.netB.netC

TRUST_UID_DOMAIN = netA.netB.netC

SOFT_UID_DOMAIN = netA.netB.netC

TRUST_UID_DOMAIN = TRUE

STARTER_ALLOW_RUNAS_OWNER = TRUE

All nodes use home directories exported from the head node via NFS, and have matching UIDs and GIDs.
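Because the nodes sit on different subnets and rely on /etc/hosts entries, one thing that could be checked on each machine is whether every host name resolves, and to the same address everywhere. The following is a minimal sketch using Python's standard `socket` module; the helper names `resolve_all` and `unresolved` are hypothetical, and the `resolver` parameter exists only so the lookup can be replaced when trying the code offline.

```python
import socket


def resolve_all(names, resolver=socket.gethostbyname):
    """Map each host name to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError.
            results[name] = None
    return results


def unresolved(results):
    """Return the sorted list of names that did not resolve."""
    return sorted(name for name, addr in results.items() if addr is None)
```

Calling `resolve_all` with the FQDNs of the head node, node1 and node2 on each of the three machines and comparing the returned dictionaries would show whether any machine sees a different address for node2 than the others do. This only tests name resolution, not whether the ports used by the daemons are reachable across the two subnets.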

Has anyone encountered this situation? Given the absence of any relevant information in the logs on node2, I am at a loss as to how to proceed...

thank you for any help!

Francisco
