
[Condor-users] SchedLog: job submission timed out....what additional firewall rule is needed?



Hi,

I have a Linux master PC and a Windows pool PC
(IP = 115.145.228.26, name = "3-4"),
which is in the Unclaimed state.
All machines are running Condor 7.4.3.

When I submit a vanilla job, the NegotiatorLog tells me that the match is OK.

The SchedLog then has the following entries:

09/09 12:54:25 (pid:2109) attempt to connect to <115.145.228.26:1042> failed: Connection timed out (connect errno = 110).  Will keep trying for 45 total seconds (24 to go).
09/09 12:54:50 (pid:2109) attempt to connect to <115.145.228.26:1042> failed: Connection timed out (connect errno = 110).
09/09 12:54:50 (pid:2109) Failed to send REQUEST_CLAIM to startd slot1@3-4 <115.145.228.26:1042> for user@xxxxxxxxxxxxxxx: SECMAN:2003:TCP connection to startd slot1@3-4 <115.145.228.26:1042> for user@xxxxxxxxxxxxxxx failed.
09/09 12:54:50 (pid:2109) Match record (slot1@3-4 <115.145.228.26:1042> for user@xxxxxxxxxxxxxxx, 247.0) deleted
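
To see at a glance which address and port the schedd could not reach, the
failed connection attempts can be pulled out of the SchedLog text. This is
only a rough sketch: the function name failed_endpoints and the regular
expression are my own, written against the message wording shown above, and
other Condor versions may word the message differently.

```python
import re

# Matches the wording seen in the SchedLog excerpt, e.g.
#   attempt to connect to <115.145.228.26:1042> failed: ...
# (pattern is an assumption based on that excerpt, not an official format)
FAILED_CONNECT = re.compile(r"attempt to connect to <([\d.]+):(\d+)> failed")


def failed_endpoints(log_text):
    """Return the sorted, de-duplicated (host, port) pairs that failed to connect."""
    # Collapse whitespace first so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return sorted({(host, int(port)) for host, port in FAILED_CONNECT.findall(flat)})
```

For the entries above this yields the single endpoint 115.145.228.26, port 1042.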

Apparently the network communication is not working.
Can somebody tell me, based on these SchedLog messages,
which communication path or firewall rule is actually missing?


The (Linux) master does get the status info, and it can
fetch the Windows log files with condor_fetchlog.
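
The failing step in the SchedLog entries is a plain TCP connect to
115.145.228.26:1042, so it can be reproduced outside Condor from the Linux
master. Below is a minimal sketch using only the Python standard library;
the function name probe_tcp and the 5-second default timeout are
illustrative choices of mine. In general, a "timeout" result suggests that
packets are being dropped silently (typical of a firewall), whereas a
"connection refused" error suggests that the host was reached but nothing
accepted the connection on that port.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a single TCP connect and report "open", "timeout", or "error: <reason>"."""
    try:
        # create_connection performs the same kind of connect() the schedd attempts.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout (or the OS reported ETIMEDOUT, errno 110 on Linux).
        return "timeout"
    except OSError as exc:
        # E.g. connection refused or no route to host.
        return f"error: {exc.strerror or exc}"


# Example call, to be run on the Linux master:
#   print(probe_tcp("115.145.228.26", 1042))
```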

The firewall on the Windows PC is a commercial Korean product
(V3 from AhnLab). I have allowed the following as firewall exceptions:
  condor_dagman.exe
  condor_kbdd.exe
  condor_master.exe
  condor_startd.exe
  condor_starter.exe
  condor_vm-gahp.exe
  condor_preen.exe

It seems that this is not enough to allow full Condor communication.

Thanks.
Rob.