
[HTCondor-users] ERROR: Can't find address of local schedd




My Condor pool went down. How can I fix it?
After "service condor restart", Condor works fine. But after I submit a job and the job finishes, condor_q shows an error.


ERROR: the submitting host claims to be in our UidDomain (node75), yet its hostname (10.1.1.75) does not match
This error had not appeared before either. Adding "TRUST_UID_DOMAIN = TRUE" to the configuration file can solve it, but it is less secure.

condor_q still shows an error; the full output is further down.

The hosts file:
10.1.1.75 node75

[root@node75 condor]# condor_q
Error:

Extra Info: You probably saw this error because the condor_schedd is not
running on the machine you are trying to query. If the condor_schedd is not
running, the Condor system will not be able to find an address and port to
connect to and satisfy this request. Please make sure the Condor daemons are
running and try again.

Extra Info: If the condor_schedd is running on the machine you are trying to
query and you still see the error, the most likely cause is that you have
setup a personal Condor, you have not defined SCHEDD_NAME in your
condor_config file, and something is wrong with your SCHEDD_ADDRESS_FILE
setting. You must define either or both of those settings in your config
file, or you must use the -name option to condor_q. Please see the Condor
manual for details on SCHEDD_NAME and SCHEDD_ADDRESS_FILE.

[root@node75 condor]# condor_q -name node75
Error: Couldn't contact the condor_collector on 10.1.1.75.

Extra Info: the condor_collector is a process that runs on the central
manager of your Condor pool and collects the status of all the machines and
jobs in the Condor pool. The condor_collector might not be running, it might
be refusing to communicate with you, there might be a network problem, or
there may be some other problem. Check with your system administrator to fix
this problem.

If you are the system administrator, check that the condor_collector is
running on 10.1.1.75, check the ALLOW/DENY configuration in your
condor_config, and check the MasterLog and CollectorLog files in your log
directory for possible clues as to why the condor_collector is not
responding. Also see the Troubleshooting section of the manual.

[root@node75 condor]# ps -ef |grep condor
condor    5988     1  0 16:24 ?        00:00:00 /usr/sbin/condor_master -pidfile /var/run/condor/condor.pid
root      5989  5988  0 16:24 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 107
condor    5990  5988  0 16:24 ?        00:00:00 condor_collector -f
condor    5991  5988  0 16:24 ?        00:00:00 condor_negotiator -f
condor    5992  5988  0 16:24 ?        00:00:00 condor_schedd -f
condor    5993  5988  0 16:24 ?        00:00:00 condor_startd -f
root      6366  5404  0 16:44 pts/4    00:00:00 grep condor

[root@node75 condor]# ll
total 7996
-rw-r--r-- 1 condor condor  101285 Jul 29 15:56 CollectorLog
-rw-r--r-- 1 root   root   7858809 Jul 29 15:14 log20150729.tar
-rw-r--r-- 1 condor condor    4739 Jul 29 15:52 MasterLog
-rw-r--r-- 1 condor condor     349 Jul 29 15:56 MatchLog
-rw-r--r-- 1 condor condor   36365 Jul 29 15:56 NegotiatorLog
-rw-r--r-- 1 root   root     92793 Jul 29 15:56 ProcLog
-rw-r--r-- 1 condor condor   13710 Jul 29 15:56 SchedLog
-rw-r--r-- 1 condor condor    3435 Jul 29 15:56 ShadowLog
-rw-r--r-- 1 condor condor       0 Jul 29 15:15 StarterLog
-rw-r--r-- 1 condor condor    9063 Jul 29 15:56 StarterLog.slot1
-rw-r--r-- 1 condor condor   24819 Jul 29 15:56 StartLog
[root@node75 condor]# less SchedLog 
[root@node75 condor]# LESS StarterLog.slot1 
-bash: LESS: command not found
[root@node75 condor]# less  StarterLog.slot1 
...
07/29/15 15:56:09 (pid:4969) ERROR: the submitting host claims to be in our UidDomain (node75), yet its hostname (10.1.1.75) does not match.  If the above hostname is actually an IP address, Condor could not perform a reverse DNS lookup to convert the IP back into a name.  To solve this problem, you can either correctly configure DNS to allow the reverse lookup, or you can enable TRUST_UID_DOMAIN in your condor configuration.
07/29/15 15:56:09 (pid:4969) Chirp config summary: IO false, Updates false, Delayed updates true.
...
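
The log message above says the problem is a failed reverse DNS lookup: the IP 10.1.1.75 is not being converted back into a name that matches the UidDomain (node75). A minimal sketch of how that could be checked from Python on the node is shown here. The helper names `reverse_lookup` and `in_uid_domain` are hypothetical, and the suffix comparison is only an assumed approximation of the check Condor performs, not its actual implementation.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the name the IP resolves back to, or None if the lookup fails.

    `resolver` is injectable so the function can be exercised without DNS.
    """
    try:
        name, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
    return name


def in_uid_domain(hostname, uid_domain):
    """Assumed approximation: the name equals the domain or ends in '.<domain>'."""
    if hostname is None:
        return False
    host = hostname.lower()
    domain = uid_domain.lower()
    return host == domain or host.endswith("." + domain)
```

On node75, `in_uid_domain(reverse_lookup("10.1.1.75"), "node75")` returning False would be consistent with the error in StarterLog.slot1. If the reverse lookup returns None, the /etc/hosts entry or the resolver order (for example in /etc/nsswitch.conf) would be the first thing to look at.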

[root@node75 condor]# hostname
node75

[zhang@node75 ~]$ condor_submit condor.submit
ERROR: Can't find address of local schedd