We have a Condor 6.6.9 cluster, and for no apparent reason no more than two jobs are able to run on it at a time.

I checked the Sched.log file on the master server and noticed the following entries:

    Sent ad to central manager for ak791@xxxxxxxxxxxxxx
    Activity on stashed negotiator socket
    Negotiating for owner ak791@xxxxxxxxxxxxxx
    Checking consistency running and runnable jobs
    Tables are consistent
    Out of servers - 0 jobs matched, 8 jobs idle, 8 jobs rejected

I then checked the Matchlog file, which had numerous instances of the following:

    Rejected 13149.x ak791@xxxxxxxxxxxxxx <192.168.1.103:59494>: no match found

The NegotiatorLog file had the following entries:

    Request 13149.0000x
    Rejected 13149.x ak791@xxxxxxxxxxxxxx <192.168.1.103:59494>: no match found

I also noticed that the system in question, oneofxeon, has problems connecting to several of the nodes in the cluster through either SSH or telnet. Connection attempts fail with the error "No route to host". I have verified that the /etc/hosts entries are all correct.

Has anyone seen this before, and does anyone know what steps need to be taken to correct it?

Thanks.
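
For anyone who wants to reproduce the connectivity check across many nodes at once, here is a minimal Python sketch. It only attempts a TCP connection to each host and reports how the attempt failed; it is not Condor-specific. The node names and the choice of port 22 (SSH) are assumptions for illustration, not values taken from our cluster.

```python
import errno
import socket


def check_host(host, port=22, timeout=3.0):
    """Try one TCP connection and return a short status string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.timeout:
        return "timeout"
    except socket.gaierror as exc:
        # The name could not be resolved (e.g. missing /etc/hosts entry).
        return f"name resolution failed: {exc}"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition SSH/telnet report as "No route to host".
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            return "connection refused"
        return f"error: {exc}"


def check_hosts(hosts, port=22, timeout=3.0):
    """Check every host and return a {host: status} mapping."""
    return {host: check_host(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    # Hypothetical node names; replace with the real cluster nodes.
    nodes = ["node01", "node02", "node03"]
    for host, status in check_hosts(nodes).items():
        print(f"{host}: {status}")
```

Running this from the affected machine and from a machine that works should show whether the unreachable nodes are the same ones that never get matched.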