[Condor-users] flocking: trying to connect directly to the private network
- Date: Fri, 24 Sep 2004 17:04:21 -0400
- From: "Henrique Bucher" <hbucher@xxxxxxxxxxxxxxx>
- Subject: [Condor-users] flocking: trying to connect directly to the private network
Hi all
I've just finished installing Condor on a cluster (viz) whose head node is
vizhead and whose nodes are viznode1, viznode2, ... on a 10.1.1.* private
network.
From my submit node (127.39.27.70) I submitted a job that matches only the OS
on the viz cluster, so it cannot execute anywhere but there.
The submit node seems to talk to vizhead, and then a match is made. Up to
this point, everything is okay.
However, it seems that the submit node then tries to connect directly to the
cluster nodes, which are on the private network (10.1.1.*), and of course
that fails.
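
To illustrate why the direct connection cannot work, here is a minimal Python sketch (not part of Condor; the helper name is_rfc1918 is made up for this example) that checks whether an address falls in one of the RFC 1918 private ranges. Addresses in those ranges are not routable from outside the private network.

```python
import ipaddress

# RFC 1918 private ranges: not routable from outside the private network.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr lies in one of the RFC 1918 private ranges.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


# One of the startd addresses that appears in the SchedLog.
print(is_rfc1918("10.1.1.111"))
```

Any startd address for which this returns True can only be reached by a machine that is itself on (or routed into) the cluster's private network.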
How can I work around this?
The SchedLog on vizhead follows:
9/24 16:40:27 DaemonCore: Command received via TCP from host <127.39.27.70:44768>
9/24 16:40:27 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
9/24 16:40:27 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:40:27 Checking consistency running and runnable jobs
9/24 16:40:27 Tables are consistent
9/24 16:40:27 Out of servers - 0 jobs matched, 10 jobs idle, 1 jobs rejected
9/24 16:40:27 Increasing flock level for hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx to 1.
9/24 16:40:31 Sent ad to central manager for hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:41:02 DaemonCore: Command received via TCP from host <127.39.27.155:51294>
9/24 16:41:02 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
9/24 16:41:02 Negotiating for owner: hbucher@xxxxxxxxxxxxxxxxxxxxxxxxxx
9/24 16:41:02 Checking consistency running and runnable jobs
9/24 16:41:02 Tables are consistent
9/24 16:41:02 Out of servers - 7 jobs matched, 3 jobs idle, 1 jobs rejected
9/24 16:41:47 select returns 0, connect failed
9/24 16:41:47 Will keep trying for 45 seconds...
9/24 16:41:48 Connect failed for 45 seconds; returning FALSE
9/24 16:41:48 Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
9/24 16:43:03 Can't connect to <10.1.1.111:38543>:0, errno = 145
9/24 16:43:03 Will keep trying for 10 seconds...
9/24 16:43:04 Connect failed for 10 seconds; returning FALSE
9/24 16:43:04 ERROR:
SECMAN:2003:TCP connection to <10.1.1.111:38543> failed
9/24 16:43:04 Sent RELEASE_CLAIM to startd on <10.1.1.111:38543>
9/24 16:43:04 Match record (<10.1.1.111:38543>, 22, 0) deleted
9/24 16:43:49 select returns 0, connect failed
9/24 16:43:49 Will keep trying for 45 seconds...
9/24 16:43:50 Connect failed for 45 seconds; returning FALSE
9/24 16:43:50 Couldn't send REQUEST_CLAIM to startd at <10.1.1.112:33865>
9/24 16:45:05 Can't connect to <10.1.1.112:33865>:0, errno = 145
9/24 16:45:05 Will keep trying for 10 seconds...
9/24 16:45:06 Connect failed for 10 seconds; returning FALSE
9/24 16:45:06 ERROR:
SECMAN:2003:TCP connection to <10.1.1.112:33865> failed
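
The failed claim attempts can be pulled out of a log like the one above with a short script. This is only a sketch: it assumes the message text "Couldn't send REQUEST_CLAIM to startd at <ip:port>" exactly as it appears in this log, and the function name failed_claims is made up for the example.

```python
import re

# Matches the claim-failure message as it appears in the SchedLog above.
CLAIM_FAILURE = re.compile(
    r"Couldn't send REQUEST_CLAIM to startd at <(\d+\.\d+\.\d+\.\d+):(\d+)>"
)


def failed_claims(log_text: str) -> list[tuple[str, int]]:
    """Return (ip, port) for every startd the schedd failed to claim."""
    return [(m.group(1), int(m.group(2))) for m in CLAIM_FAILURE.finditer(log_text)]


sample = """\
9/24 16:41:48 Couldn't send REQUEST_CLAIM to startd at <10.1.1.111:38543>
9/24 16:43:50 Couldn't send REQUEST_CLAIM to startd at <10.1.1.112:33865>
"""

print(failed_claims(sample))
# prints [('10.1.1.111', 38543), ('10.1.1.112', 33865)]
```

In this log, every address it reports is in the 10.1.1.* private range.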
Thanks!
Henrique