
[Condor-users] -- Failed to fetch ads from: <ip address> : hostname



Hello, all,

I am new to Condor and am having real trouble managing our cluster. It is a small cluster with one master node (which acts as the Condor server) and 16 compute nodes. We recently disassembled the cluster and moved it to another location, and after we plugged everything back in and turned on all the machines, we found that Condor was not working. Since the IP address of the master node has changed, I assumed something needed to change in the Condor configuration as well. So I opened the "condor_config.local" file on the master node and updated the "NETWORK_INTERFACE" entry. After that I was able to start Condor:

 # ps -ef | grep condor
condor    3639     1  0 Sep03 ?        00:00:11 /opt/condor/sbin/condor_master
condor    3651  3639  0 Sep03 ?        00:00:00 condor_collector -f
condor    3652  3639  0 Sep03 ?        00:00:01 condor_schedd -f
condor    3653  3639  0 Sep03 ?        00:00:00 condor_negotiator -f
root     15130 15111  0 12:55 pts/1    00:00:00 grep condor
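
The "NETWORK_INTERFACE" change mentioned above would look roughly like the fragment below. This is only a sketch: the address is a placeholder from the documentation range, not the real one, and the commented-out "CONDOR_HOST" line is an assumption (it has not been confirmed whether that setting also needs updating on the master node or the compute nodes).

```
# condor_config.local on the master node (placeholder values, for illustration only)
NETWORK_INTERFACE = 192.0.2.10

# Assumption: CONDOR_HOST may also need to refer to the master node's new
# address or hostname; left commented out because this is unverified.
# CONDOR_HOST = <new master hostname or address>
```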

However, when I run "condor_q", it sometimes returns the queue, but most of the time it returns:

-- Failed to fetch ads from: <ip address> : hostname

The behavior seems very unstable. I have rebooted the master node once, and it did not help. Also, the jobs in the queue are still idle; they have not been sent to the compute nodes (the system has been up for almost a day now, and I am able to ssh to those nodes). I am not sure whether there is anything else I need to change after the move, or whether something has gone wrong. Any help would be appreciated. Thanks.


Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison