
Re: [HTCondor-users] Remote Submit Fails To Spool Job Files



Hi Frank,

I've seen something similar but haven't dug into it yet. I suspect an incompatibility crept into the file transfer protocol when it was upgraded in the 8.1.x series, though I'm not sure exactly what.

Brian

On May 23, 2014, at 6:37 PM, Frank Berghaus <frank@xxxxxxx> wrote:

Hi All,

My current setup is a remote machine submitting jobs to a central manager, from which the jobs are sent to worker nodes. Recently the remote machine was upgraded from condor 8.0.6 to condor 8.1.5. With version 8.1.5, the jobs submitted by the remote machine show up on the central manager as held for a few seconds, for example:

113423.0 apf 5/23 20:42 Spooling input data files
113423.1 apf 5/23 20:42 Spooling input data files
113423.2 apf 5/23 20:42 Spooling input data files
113423.3 apf 5/23 20:42 Spooling input data files
113423.4 apf 5/23 20:42 Spooling input data files
113423.5 apf 5/23 20:42 Spooling input data files

After a few seconds the jobs are removed. I can see corresponding error messages on the remote submitter:

DCSchedd::spoolJobFiles:7002:File transfer failed for target job 113423.0: Failed to receive GoAhead message from <central manager's IP>.

The central manager is running condor version 8.0.3. Is there a configuration variable hidden somewhere that may be causing this issue? Is this something that an upgrade to a later stable condor version (on the side of the central manager) would likely solve?
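
For anyone comparing the versions on the two hosts, here is a minimal sketch of a check for whether both sides are in the same release series (8.0.x vs. 8.1.x). It assumes version strings shaped like the output of condor_version; the date and BuildID in the comment are placeholders, and the helper names are hypothetical. A matching series does not by itself guarantee compatibility, and a mismatch does not prove it is the cause here.

```python
import re

# Matches the version triple in strings such as
# "$CondorVersion: 8.1.5 Apr 08 2014 BuildID: 123456 $" (date and BuildID
# are placeholders) or in a bare "8.0.3".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Return (major, minor, patch) from the first x.y.z found in text."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def same_series(a, b):
    """True when both versions share major.minor (e.g. both 8.0.x)."""
    return parse_version(a)[:2] == parse_version(b)[:2]


def describe_pair(submit, manager):
    """Summarize a submit-side/manager-side version pair in one line."""
    s, m = parse_version(submit), parse_version(manager)
    relation = "same series" if s[:2] == m[:2] else "different series"
    return "submit {}.{}.{} vs manager {}.{}.{}: {}".format(*s, *m, relation)


if __name__ == "__main__":
    # Versions taken from this message: 8.1.5 submitter, 8.0.3 central manager.
    print(describe_pair("8.1.5", "8.0.3"))
```

With the versions from this setup, the script reports that the submitter and the central manager are in different series.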

Best Regards,
-Frank



--
----------
Frank Berghaus
University of Victoria
Research Associate
Physics & Astronomy
UVic Phone: +1 (250) 721-7741
UVic Office: Elliot 212