Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] strange ipaddress problem

Date: Thu, 05 Jun 2008 11:15:20 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] strange ipaddress problem

Ashutosh,

Here is a link to patched Condor executables that should solve this problem. You only need to replace the condor_collector, condor_negotiator, and condor_schedd on your central manager. The rest of your pool should not need to be patched.


http://www.cs.wisc.edu/~danb/condor_7.0.2_lehigh/

This Condor build was made from the 7.0.2 pre-release, but the patch I am giving you may not be in time to make it into 7.0.2. I'll let you know. I built Condor on CentOS release 4.5 32-bit. Hopefully that is compatible with your system. If not, I can build it on some other platform.


--Dan

Dan Bradley wrote:

Hi Ashutosh,
This is a bug in Condor. It is affecting your nodes with an IP address matching the private address of your central manager plus trailing digits.
I have a patch ready, but I may be too late to sneak it into 7.0.2. I'll send you some patched Condor executables to solve the problem.
Sorry you hit this!

--Dan
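Dan's description suggests the central manager's private address was being matched as a prefix of each node's address rather than as a whole address. The Python sketch below is a hypothetical illustration of that class of bug, not Condor's actual code. It assumes the central manager's private address is 192.168.1.1 (inferred from the hostname blaze1; the thread does not state it) and that the private-to-public rewrite is a plain string substitution. Under those assumptions it reproduces the bogus 128.180.2.450 address seen in condor_status. The function names `naive_rewrite` and `exact_rewrite` are made up for this illustration.

```python
import re

# ASSUMPTION: central manager private address, inferred from hostname blaze1.
PRIVATE_ADDR = "192.168.1.1"
# Public address of the head node, as given in the thread.
PUBLIC_ADDR = "128.180.2.45"


def naive_rewrite(addr: str) -> str:
    """Hypothetical buggy rewrite: substitutes the private address wherever
    it appears as a substring, so 192.168.1.10 also matches 192.168.1.1."""
    return addr.replace(PRIVATE_ADDR, PUBLIC_ADDR)


def exact_rewrite(addr: str) -> str:
    """Hypothetical fixed rewrite: parses "<host:port>" and only substitutes
    when the host part equals the private address exactly."""
    m = re.fullmatch(r"<([^:>]+):(\d+)>", addr)
    if m and m.group(1) == PRIVATE_ADDR:
        return f"<{PUBLIC_ADDR}:{m.group(2)}>"
    return addr


print(naive_rewrite("<192.168.1.10:56927>"))  # <128.180.2.450:56927>
print(exact_rewrite("<192.168.1.10:56927>"))  # <192.168.1.10:56927>
print(exact_rewrite("<192.168.1.1:9618>"))    # <128.180.2.45:9618>
```

Comparing the host part as a whole token, rather than searching for it inside the string, is one way to avoid the false match; the thread does not say how the actual patch fixes it.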

Ashutosh Mahajan wrote:
Hi all,
We are running a cluster with 600+ CPUs. The head node has two interfaces, one facing the internet (128.180.2.45) and the other a private net (192.168.*.*). Users log into this node to submit their jobs. All the other nodes in the cluster are in the private net.
Everything seems fine except for 10 nodes in the cluster. These nodes have IP addresses 192.168.1.10 through 192.168.1.19 (and hostnames blaze10 through blaze19). If I do the following on the head node:

[asm4@blaze1 ~]$ condor_status blaze10 -l | grep IpAdd
PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<128.180.2.450:56927>"
PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<128.180.2.450:56927>"
Similarly, blaze11 shows IP address 128.180.2.451 in condor_status on blaze1, and so on. However, the same command, when run on some other node, say blaze2, gives:
[asm4@blaze2 ~]$ condor_status blaze10 -l | grep IpAdd
PublicNetworkIpAddr = "<192.168.1.10:56927>"
StartdIpAddr = "<192.168.1.10:56927>"
PublicNetworkIpAddr = "<192.168.1.10:56927>"
StartdIpAddr = "<192.168.1.10:56927>"

which is the correct address.
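A quick way to spot this symptom across many nodes is to check whether each address attribute holds a syntactically valid IP address: 128.180.2.450 cannot be one, because 450 is larger than 255. The sketch below assumes the input is text captured from `condor_status -l` as in the commands above; the attribute names come from that output, while the function name `find_bad_addresses` is hypothetical and the pattern only handles IPv4-style `<host:port>` values.

```python
from __future__ import annotations

import ipaddress
import re

# Matches ClassAd lines such as: StartdIpAddr = "<192.168.1.10:56927>"
# (IPv4-style "<host:port>" only; an assumption for this sketch).
ADDR_LINE = re.compile(r'^(\w+IpAddr)\s*=\s*"<([^:>]+):(\d+)>"')


def find_bad_addresses(classad_text: str) -> list[tuple[str, str]]:
    """Return (attribute, host) pairs whose host is not a valid IP address."""
    bad = []
    for line in classad_text.splitlines():
        m = ADDR_LINE.match(line.strip())
        if not m:
            continue
        attr, host = m.group(1), m.group(2)
        try:
            ipaddress.ip_address(host)  # raises ValueError for e.g. 128.180.2.450
        except ValueError:
            bad.append((attr, host))
    return bad


sample = '''PublicNetworkIpAddr = "<128.180.2.450:56927>"
StartdIpAddr = "<192.168.1.10:56927>"'''
print(find_bad_addresses(sample))  # [('PublicNetworkIpAddr', '128.180.2.450')]
```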


In the NegotiatorLog on the head node I see:
6/4 20:24:32     Request 147588.00000:
6/4 20:24:32     Failed to initiate socket to send MATCH_INFO to slot2@xxxxxxxxxxxxxxxxxxxxx
6/4 20:24:32 Matched 147588.0 bad0@xxxxxxxxxxxxx <128.180.2.45:45179> preempting none <128.180.2.450:56927> slot2@xxxxxxxxxxxxxxxxxxxxx
6/4 20:24:32       Successfully matched with slot2@xxxxxxxxxxxxxxxxxxxxx

repeatedly.

I can log into each of these 10 nodes, and their IP addresses seem to be set correctly. We have 7.0.1 running on all (X86_64-LINUX_RHEL5) nodes. We also have BIND_ALL_INTERFACES set to true because we were trying a few things with flocking.

Any ideas what could be wrong? Thanks in advance.
--
regards
Ashutosh Mahajan
http://www.lehigh.edu/~asm4

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/
