
[Condor-users] Reply: Re: -- Failed to fetch ads from: <ip adress> : hostname




Hi,
I am also new to Condor, so I am not sure what exactly you changed. Why did you change only the IP address in condor_config.local? I would suggest checking the main condor_config file as well, especially CONDOR_HOST, HOSTALLOW_READ, HOSTALLOW_WRITE and HOSTALLOW_CONFIG. It is worth spending a little time reading through the whole config file.
I would also suggest searching for related information on Google; the Condor resources available online are very helpful.
PS: the formatting of your message makes it hard to read. I just want to help, since I have received so much help from others in this group :)
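
One way to check the settings named above without opening each file by hand is to ask Condor what values it has actually loaded. The sketch below relies on the condor_config_val tool that ships with Condor; the function name show_condor_settings is made up for illustration, and NETWORK_INTERFACE is included only because it is the entry that was already edited. Treat it as a starting point, not as the definitive diagnosis.

```shell
# Print the settings named in the reply, as seen by the loaded configuration.
# condor_config_val is part of Condor; show_condor_settings is a made-up name.
show_condor_settings() {
    for name in CONDOR_HOST NETWORK_INTERFACE \
                HOSTALLOW_READ HOSTALLOW_WRITE HOSTALLOW_CONFIG; do
        # A failed lookup (for example an undefined macro) is shown as a placeholder.
        value=$(condor_config_val "$name" 2>/dev/null) || value="(not defined)"
        printf '%s = %s\n' "$name" "$value"
    done
}
```

Calling show_condor_settings on the master node prints one "NAME = value" line per setting. Running condor_config_val -config should additionally list which configuration files are being read, which helps when a value comes from a file other than condor_config.local.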


Li Xi <sealyxi@xxxxxxxxx>
Sent by: condor-users-bounces@xxxxxxxxxxx
09/07/2009 07:20 PM

Please reply to: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Cc:
Subject: Re: [Condor-users] -- Failed to fetch ads from: <ip adress> : hostname





Some follow-up information on the same problem: I checked the MasterLog and found entries like the following, repeating every 8 minutes:

9/7 12:07:58 Can't connect to <old_ip:old_port>:0, errno = 110
9/7 12:07:58 Will keep trying for 10 seconds...
9/7 12:07:59 Connect failed for 10 seconds; returning FALSE
9/7 12:07:59 ERROR: SECMAN:2003:TCP connection to <old_ip:old_port> failed

Here old_ip is the IP address the master node had before the cluster was moved. Similar entries appear in the NegotiatorLog and SchedLog. Apparently the IP address still needs to be changed somewhere other than the place mentioned in my previous email, but I cannot figure out where.
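
Since the logs still show the old address, one simple check is to search the configuration for any remaining literal copy of it. This is only a sketch: the function name find_stale_ip is made up, and which files or directories to search is an assumption that depends on the installation. It will not find an address that Condor derives at run time, for example from a host name that still resolves to the old IP.

```shell
# List every line under the given files or directories that still mentions OLD_IP.
# -r recurse into directories, -n show line numbers,
# -F treat the address as a literal string (dots are not wildcards),
# -w match whole words only, so 10.0.0.1 does not also match 10.0.0.10.
find_stale_ip() {
    old_ip="$1"
    shift
    grep -rnwF -- "$old_ip" "$@"
}
```

A call would look like find_stale_ip OLD_IP FILE_OR_DIR..., with the real old address and the configuration files that condor_config_val -config reports. Each hit is printed as file:line:content; no output means no literal occurrence was found in those locations.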

 
Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
E-mail:sealyxi@xxxxxxxxx



From: Li Xi <sealyxi@xxxxxxxxx>
To: condor-users@xxxxxxxxxxx
Sent: Friday, September 4, 2009 1:13:57 PM
Subject: [Condor-users] -- Failed to fetch ads from: <ip adress> : hostname


Hello all,

I am a beginner with Condor and am having real problems managing our cluster. It is a small cluster with one master node (the Condor server) and 16 compute nodes. We recently disassembled the cluster and moved it to another location. After we plugged everything back in and turned on all the machines, we found that Condor was not working. Since the IP address of the master node has changed, I assumed something needed to change in the Condor configuration as well. So I opened the "condor_config.local" file on the master node and updated the "NETWORK_INTERFACE" entry. After that I was able to start Condor:

 # ps -ef | grep condor
condor    3639     1  0 Sep03 ?        00:00:11 /opt/condor/sbin/condor_master
condor    3651  3639  0 Sep03 ?        00:00:00 condor_collector -f
condor    3652  3639  0 Sep03 ?        00:00:01 condor_schedd -f
condor    3653  3639  0 Sep03 ?        00:00:00 condor_negotiator -f
root     15130 15111  0 12:55 pts/1    00:00:00 grep condor

When I run "condor_q", it sometimes returns the queue, but most of the time it returns:

-- Failed to fetch ads from: <ip adress> : hostname

It seems to be very unstable. I have rebooted the master node once and it did not help. Also, the jobs in the queue are still idle; they have not been sent to the compute nodes (the system has been up for almost a day now, and I am able to ssh to those nodes). I am not sure whether there is anything else I need to change after the move, or whether something went wrong. Any help would be appreciated. Thanks.


Li Xi
Department of Chemical and Biological Engineering
University of Wisconsin-Madison
E-mail:sealyxi@xxxxxxxxx

