Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] flocking: trying to connect directly to the private network

Date: Fri, 24 Sep 2004 17:04:21 -0400
From: "Henrique Bucher" <hbucher@xxxxxxxxxxxxxxx>
Subject: [Condor-users] flocking: trying to connect directly to the private network

Hi all

I've just finished installing Condor on a cluster (viz), whose head node is vizhead and whose nodes are viznode1, viznode2, ... on a 10.1.1.* private network.

From my submit node (127.39.27.70) I submitted a job whose requirements match only the OS on the viz cluster, so it cannot execute anywhere but there.

Problem is, the submit node seems to talk to vizhead and then a match is made. Up to this point, everything is okay.

However, it seems that the submit node then tries to connect directly to the cluster nodes, which are in the private network (10.1.1.*) and, of course, fails.

How can I work around this?

The SchedLog on vizhead follows:

9/24 16:40:27 DaemonCore: Command received via TCP from host <127.39.27.70:44768>
9/24 16:40:27 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
9/24 16:40:27 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:40:27 Checking consistency running and runnable jobs
9/24 16:40:27 Tables are consistent
9/24 16:40:27 Out of servers - 0 jobs matched, 10 jobs idle, 1 jobs rejected
9/24 16:40:27 Increasing flock level for hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx to 1.
9/24 16:40:31 Sent ad to central manager for hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:41:02 DaemonCore: Command received via TCP from host <127.39.27.155:51294>
9/24 16:41:02 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
9/24 16:41:02 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:41:02 Checking consistency running and runnable jobs
9/24 16:41:02 Tables are consistent
9/24 16:41:02 Out of servers - 7 jobs matched, 3 jobs idle, 1 jobs rejected
9/24 16:41:47 select returns 0, connect failed
9/24 16:41:47 Will keep trying for 45 seconds...
9/24 16:41:48 Connect failed for 45 seconds; returning FALSE
9/24 16:41:48 Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
9/24 16:43:03 Can't connect to <10.1.1.111:38543>:0, errno = 145
9/24 16:43:03 Will keep trying for 10 seconds...
9/24 16:43:04 Connect failed for 10 seconds; returning FALSE
9/24 16:43:04 ERROR: SECMAN:2003:TCP connection to <10.1.1.111:38543> failed
9/24 16:43:04 Sent RELEASE_CLAIM to startd on <10.1.1.111:38543>
9/24 16:43:04 Match record (<10.1.1.111:38543>, 22, 0) deleted
9/24 16:43:49 select returns 0, connect failed
9/24 16:43:49 Will keep trying for 45 seconds...
9/24 16:43:50 Connect failed for 45 seconds; returning FALSE
9/24 16:43:50 Couldn't send REQUEST_CLAIM to startd at <10.1.1.112:33865>
9/24 16:45:05 Can't connect to <10.1.1.112:33865>:0, errno = 145
9/24 16:45:05 Will keep trying for 10 seconds...
9/24 16:45:06 Connect failed for 10 seconds; returning FALSE
9/24 16:45:06 ERROR: SECMAN:2003:TCP connection to <10.1.1.112:33865> failed
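
To confirm that every failed claim in a log like the one above targets an address on the private network, a small Python sketch along these lines could help. It is not part of Condor: the function name private_claim_failures and the regular expression are illustrative assumptions, based only on the "Couldn't send REQUEST_CLAIM" lines shown in this log.

```python
import ipaddress
import re

# Matches the startd address in lines such as:
#   Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
# The pattern is an assumption derived from the log excerpt in this message.
CLAIM_FAILURE = re.compile(
    r"Couldn't send REQUEST_CLAIM to startd at <([\d.]+):(\d+)>"
)


def private_claim_failures(log_text):
    """Return sorted (ip, port) pairs for failed claims on private addresses."""
    found = set()
    for match in CLAIM_FAILURE.finditer(log_text):
        ip, port = match.group(1), int(match.group(2))
        # is_private is True for ranges such as 10.0.0.0/8 and 192.168.0.0/16.
        if ipaddress.ip_address(ip).is_private:
            found.add((ip, port))
    return sorted(found)


sample = """\
9/24 16:41:48 Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
9/24 16:43:50 Couldn't send REQUEST_CLAIM to startd at <10.1.1.112:33865>
"""

print(private_claim_failures(sample))
```

If every address it reports falls in 10.1.1.*, that would be consistent with the submit node trying to reach the execute nodes directly rather than going through vizhead.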

Thanks!

Henrique
