Re: [Condor-users] strange ipaddress problem



hi Dan,
thanks for the patch. it would be really helpful if we could have a build for
64-bit CentOS 5. let me know if that is possible.

ashutosh

On Thu, 05 Jun 2008, Dan Bradley wrote:

> Ashutosh,
> 
> Here is a link to patched Condor executables that should solve this 
> problem.  You only need to replace the condor_collector, 
> condor_negotiator, and condor_schedd on your central manager.  The rest 
> of your pool should not need to be patched.
> 
> http://www.cs.wisc.edu/~danb/condor_7.0.2_lehigh/
> 
> This Condor build was made from the 7.0.2 pre-release, but the patch I 
> am giving you may not be in time to make it into 7.0.2.  I'll let you 
> know.  I built Condor on CentOS release 4.5 32-bit.  Hopefully that is 
> compatible with your system.  If not, I can build it on some other platform.
> 
> --Dan
> 
> Dan Bradley wrote:
> 
> > Hi Ashutosh,
> >
> > This is a bug in Condor.  It is affecting your nodes with an IP 
> > address matching the private address of your central manager plus 
> > trailing digits.
> >
> > I have a patch ready, but I may be too late to sneak it into 7.0.2.  
> > I'll send you some patched Condor executables to solve the problem.
> >
> > Sorry you hit this!
> >
> > --Dan
> >
> > Ashutosh Mahajan wrote:
> >
> >> hi all,
> >> we are running a cluster with 600+ cpus. the head node has two
> >> interfaces, one facing the internet (128.180.2.45) and the other a
> >> private net (192.168.*.*). users log into this node to submit their
> >> jobs. all the other nodes in the cluster are in the private net.
> >>
> >> everything seems fine except for 10 nodes in the cluster. these nodes
> >> have ipaddresses 192.168.1.10 through 192.168.1.19 (and hostnames
> >> blaze10 through blaze19). if i do the following on the head node:
> >>
> >> [asm4@blaze1 ~]$ condor_status blaze10 -l | grep IpAdd
> >> PublicNetworkIpAddr = "<128.180.2.450:56927>"
> >> StartdIpAddr = "<128.180.2.450:56927>"
> >> PublicNetworkIpAddr = "<128.180.2.450:56927>"
> >> StartdIpAddr = "<128.180.2.450:56927>"
> >>
> >> similarly blaze11 shows ipaddress 128.180.2.451 in condor_status on
> >> blaze1 and so on. however, the same command, when used on some other
> >> node, say blaze2 gives:
> >> [asm4@blaze2 ~]$ condor_status blaze10 -l | grep IpAdd
> >> PublicNetworkIpAddr = "<192.168.1.10:56927>"
> >> StartdIpAddr = "<192.168.1.10:56927>"
> >> PublicNetworkIpAddr = "<192.168.1.10:56927>"
> >> StartdIpAddr = "<192.168.1.10:56927>"
> >>
> >> which is the correct address.
> >>
> >>
> >> in NegotiatorLog of the head node i see,
> >> 6/4 20:24:32     Request 147588.00000:
> >> 6/4 20:24:32     Failed to initiate socket to send MATCH_INFO to slot2@xxxxxxxxxxxxxxxxxxxxx
> >> 6/4 20:24:32       Matched 147588.0 bad0@xxxxxxxxxxxxx <128.180.2.45:45179> preempting none <128.180.2.450:56927> slot2@xxxxxxxxxxxxxxxxxxxxx
> >> 6/4 20:24:32       Successfully matched with slot2@xxxxxxxxxxxxxxxxxxxxx
> >>
> >> repeatedly.
> >> i can log into each of these 10 nodes and their ipaddress seems to be
> >> set correctly.
> >> we have 7.0.1 running on all (X86_64-LINUX_RHEL5) nodes
> >>
> >> we also have BIND_ALL_INTERFACES set to true because we were trying a
> >> few things with flocking.
> >>
> >> any ideas what could be wrong? thanks in advance.
> >> -- 
> >> regards
> >> Ashutosh Mahajan
> >> http://www.lehigh.edu/~asm4
> >>
> >> _______________________________________________
> >> Condor-users mailing list
> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> >> with a subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>
> >> The archives can be found at: 
> >> https://lists.cs.wisc.edu/archive/condor-users/
> >>  
> >>
> >

--
regards
Ashutosh Mahajan
http://www.lehigh.edu/~asm4
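
The symptom in this thread (192.168.1.10 showing up as 128.180.2.450) is
consistent with Dan's description of addresses "matching the private address
of your central manager plus trailing digits". The sketch below is only an
illustration of how a prefix match could produce that result; it is not
Condor's actual code, which is not shown in the thread. The head node's
private address (192.168.1.1) is an assumption inferred from the
blaze1/blazeNN naming, and the function names are hypothetical.

```python
import ipaddress

# PUBLIC_ADDR is taken from the thread. PRIVATE_ADDR is an assumption: the
# thread never states the head node's private address.
PRIVATE_ADDR = "192.168.1.1"
PUBLIC_ADDR = "128.180.2.45"


def naive_rewrite(sinful: str) -> str:
    """Hypothetical faulty rewrite: treats the private address as a bare prefix."""
    prefix = "<" + PRIVATE_ADDR
    if sinful.startswith(prefix):
        # Anything after the prefix is kept, including leftover digits of a
        # longer address such as 192.168.1.10.
        return "<" + PUBLIC_ADDR + sinful[len(prefix):]
    return sinful


def careful_rewrite(sinful: str) -> str:
    """Rewrite only when the whole host part equals the private address."""
    host, sep, port = sinful.strip("<>").rpartition(":")
    if sep and host == PRIVATE_ADDR:
        return f"<{PUBLIC_ADDR}:{port}>"
    return sinful


def is_valid_ipv4(sinful: str) -> bool:
    """Check whether the host part of a "<host:port>" string is a valid IPv4 address."""
    host = sinful.strip("<>").rpartition(":")[0]
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for addr in ("<192.168.1.1:56927>", "<192.168.1.10:56927>", "<192.168.1.2:56927>"):
        rewritten = naive_rewrite(addr)
        print(addr, "->", rewritten, "valid" if is_valid_ipv4(rewritten) else "INVALID")
```

Under these assumptions, naive_rewrite("<192.168.1.10:56927>") yields
"<128.180.2.450:56927>", the same string reported by condor_status on the
head node, while careful_rewrite leaves that address unchanged. A check like
is_valid_ipv4 over the StartdIpAddr values is one way to spot affected nodes.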