
[Condor-users] gateway



We are having problems getting jobs submitted from a Linux submit host to run on a Windows lab behind a gateway. On the Windows machine, the starter log shows the following errors:

10/3 19:10:35 Communicating with shadow <129.128.125.15:37473>
10/3 19:10:35 Submitting machine is "opteron-cluster.nic.ualberta.ca"
10/3 19:12:34 condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <129.128.125.15:55548>.
10/3 19:12:34 ERROR "Assertion ERROR on (result)" at line 113 in file ..\src\condor_starter.V6.1\NTsenders.C
10/3 19:12:34 ERROR "LocalUserLog::logStarterError() called before init()" at line 205 in file ..\src\condor_starter.V6.1\local_user_log.C

On the submit node, the shadow log shows:

10/3 19:16:58 Initializing a VANILLA shadow for job 85.0
10/3 19:17:18 (85.0) (13769): condor_read(): timeout reading 5 bytes from <129.128.237.81:1050>.
10/3 19:17:18 (85.0) (13769): Request to run on <129.128.237.81:1050> was ACCEPTED
10/3 19:18:06 (85.0) (13769): condor_read(): timeout reading 5 bytes from <129.128.237.81:1050>.
10/3 19:19:16 (85.0) (13769): condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
10/3 19:19:16 (85.0) (13769): ERROR "Can no longer talk to condor_starter <129.128.237.81:1050>" at line 123 in file NTreceivers.C

We have opened holes in the gateway to allow communication between the lab and both the submit host and the central manager. We can ping between these machines without any problems, and the collector gathers information about the available machines. However, something specific to the submit-execute communication still seems to be blocked by the gateway: if the gateway is fully opened up, everything works fine.
Is there anything we can change in Condor or on the gateway to make this work?
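
Since ping only exercises ICMP, it does not show whether the TCP ports used by the shadow and starter can get through the gateway. Below is a minimal sketch of a TCP reachability check that could be run from either side. The function names tcp_port_open and check_endpoints are hypothetical helpers, not part of Condor, and the ports in the logs above (for example 1050 on the execute machine) appear to change between runs, so any endpoint tested has to be taken from a current log.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration only; it is not part of Condor.
    """
    try:
        # create_connection performs a full TCP handshake, unlike ping (ICMP).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to whether a TCP connection succeeded."""
    return {(host, port): tcp_port_open(host, port, timeout)
            for host, port in endpoints}
```

For example, calling check_endpoints([("129.128.237.81", 1050)]) from the submit host while a starter is listening on that address would report whether that port is reachable through the gateway. A False result does not distinguish a blocked port from a closed one.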

Thanks for your time.

Masao



--

Masao Fujinaga
fujinaga@xxxxxxxxxxx    Tel.: (780) 492-2117  Fax.: (780) 492-1729
Research Computing Support
Academic Information and Communication Technologies (AICT)
University of Alberta, Edmonton, Alberta, CANADA T6G 2H1

