
[Condor-users] Odd problems with file transfer



Hello,

We have been successfully running a Condor pool with a number of machines
for a few months, but recently we have been encountering an odd problem that occurs sometimes (but not always) on some new machines added to the pool:

The jobs run and finish OK on the client machines with signal 0, but none
of the output files generated by the job seem to be transferred back to the submitting machine. Only stdout and stderr are returned, and they indicate that the jobs did indeed finish without error. This occurs inconsistently, in maybe 20% of identical jobs, and it happens
only on those particular new machines.
There are no obvious errors in any of the log files on the submit or execute
machines; both appear to be normal. Also, the files that are not transferred back definitely exist in the execute directories of the execute machines.
The file sizes are also not extreme: under 2 MB at most.

Has anyone encountered anything similar? What port/protocol does the
file transfer use? We are a bit stuck with this problem since it is
a little random and the logs don't help much.

Thanks for any clues,

 Mike


--------------------------------------------------------------
Michael Tyka, Computational Protein Folding
C.62, Department of Biochemistry,
University of Bristol
http://www.bch.bris.ac.uk/staff/pfdg/mike.htm