
[Condor-users] read error

I am trying to set up a small Linux lab to use as Condor execution nodes. Very small jobs that produce no more than about 10 KB of output run fine, but anything bigger just seems to hang. The scheduler log contains messages of the form:

condor_read(): timeout reading 5 bytes from ...

The problem appears to be linked to the network configuration, since Condor set up on another Linux box outside the lab works fine. The network administrator says the following about the Linux lab:

The lab uses big frames for NFS performance. This means
that when a lab machine does a large write to you, it
may send back a packet of 6000 bytes or so. This isn't
a problem, because the router in front of the lab will
then fragment that packet down to a normal MTU.

I've certainly seen issues with client-server software
that doesn't check correctly for a short read in these situations,
as part of the data can be delivered to a socket before you
get the whole thing.
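
To make the "short read" concern concrete, here is a minimal Python sketch of a receive loop that tolerates partial deliveries. It is only an illustration and says nothing about how Condor itself reads from its sockets. The helper names `recv_exact` and `demo` are hypothetical, and the 6144-byte payload and 1500-byte send size are arbitrary choices.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from sock, looping over short reads."""
    chunks = []
    remaining = n
    while remaining > 0:
        # recv() may legitimately return fewer bytes than requested.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection early.
            raise ConnectionError(
                f"peer closed with {remaining} bytes still expected"
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    """Send a payload in several pieces and reassemble it on the other end."""
    sender, receiver = socket.socketpair()
    try:
        payload = bytes(range(256)) * 24  # 6144 bytes, arbitrary test data
        for i in range(0, len(payload), 1500):
            sender.sendall(payload[i:i + 1500])
        return recv_exact(receiver, len(payload)) == payload
    finally:
        sender.close()
        receiver.close()
```

Code that calls `recv()` once and assumes it got everything it asked for would fail in exactly the situation the administrator describes. The loop above keeps reading until the full count has arrived or the peer closes the connection.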

Could this fragmentation of the packets cause problems for Condor?
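
For a sense of scale, the following Python sketch works out how a packet of that size would be split. It rests on assumptions that are not stated above: the "6000 byte" figure is treated as the total IPv4 datagram length, the header is 20 bytes with no options, and the "normal MTU" is 1500 bytes. The function name `ipv4_fragment_sizes` is hypothetical.

```python
def ipv4_fragment_sizes(total_length, mtu=1500, header_len=20):
    """Return the total length of each IPv4 fragment for one datagram.

    Assumes a fixed header of header_len bytes on every fragment.
    Every fragment except the last must carry a multiple of 8 payload bytes.
    """
    if total_length <= mtu:
        return [total_length]  # fits as-is, no fragmentation needed
    payload = total_length - header_len
    max_frag_payload = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        part = min(payload, max_frag_payload)
        sizes.append(part + header_len)
        payload -= part
    return sizes


print(ipv4_fragment_sizes(6000))  # [1500, 1500, 1500, 1500, 80]
```

Under those assumptions, one large packet from a lab machine becomes five fragments on the far side of the router, and the receiving host can only reassemble it once all five have arrived.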



Masao Fujinaga
fujinaga@xxxxxxxxxxx    Tel.: (780) 492-2117  Fax.: (780) 492-1729
Research Computing Support
Academic Information and Communication Technologies (AICT)
University of Alberta, Edmonton, Alberta, CANADA T6G 2H1
