
Re: [HTCondor-users] Problem with job submission with large input



Michael:


I don't know of any per-byte limit on file transfer, but there is a timeout when submit talks to the schedd -- could you be hitting this? Try setting either

_CONDOR_SUBMIT_TIMEOUT_MULTIPLIER=100 condor_submit submit-args


or maybe

_CONDOR_TOOL_TIMEOUT_MULTIPLIER=100 condor_submit submit-args


and see if you still hit the problem.
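If the submission is driven from a script, the same experiment could be sketched in Python as below. This assumes HTCondor's convention that an environment variable named `_CONDOR_<KNOB>` overrides the configuration knob `<KNOB>` for that one process. The helper names `condor_env` and `submit_with_override` and the `job.sub` file name are hypothetical and not from this thread.

```python
import os
import subprocess


def condor_env(overrides, base=None):
    """Return a copy of the environment with _CONDOR_<KNOB>=value entries added.

    Assumption: HTCondor tools read _CONDOR_-prefixed environment
    variables as per-process configuration overrides.
    """
    env = dict(os.environ if base is None else base)
    for knob, value in overrides.items():
        env["_CONDOR_" + knob] = str(value)
    return env


def submit_with_override(submit_args, knob, value=100):
    """Run condor_submit once with a single timeout knob overridden."""
    env = condor_env({knob: value})
    # check=False so a failed submission returns normally and can be inspected.
    return subprocess.run(["condor_submit", *submit_args], env=env, check=False)


# Hypothetical usage, trying one knob at a time:
# submit_with_override(["job.sub"], "SUBMIT_TIMEOUT_MULTIPLIER")
# submit_with_override(["job.sub"], "TOOL_TIMEOUT_MULTIPLIER")
```

Trying the knobs one at a time keeps it clear which of them, if either, changes the outcome.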

-greg

On 8/18/20 10:14 AM, Michael Pelletier via HTCondor-users wrote:
Hello,

I've got a job submission that accepts a list of input directories as an argument on the condor_submit command line, and when that list grows beyond a certain point, the submission fails:

Submitting job(s)
08/18/20 11:10:44 condor_write() failed: send() 80 bytes to schedd at <127.0.0.1:9618> returned -1, timeout=0, errno=32 Broken pipe.
08/18/20 11:10:44 Buf::write(): condor_write() failed

ERROR: Failed submission for job 7269.-1 - aborting entire submit

ERROR: Failed to queue job.

The strace shows an EPIPE error from a system call.
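As a quick cross-check, the errno in the log line can be mapped back to its symbolic name. This is a minimal sketch; the helper `errno_from_log` is hypothetical and the log line is copied from the output above. EPIPE generally means the peer had already closed the connection when `send()` was called, which would be consistent with the schedd side dropping the connection first, though nothing in this message confirms that.

```python
import errno
import os
import re

# Log line copied from the failed submission above.
LOG_LINE = (
    "08/18/20 11:10:44 condor_write() failed: send() 80 bytes to schedd at "
    "<127.0.0.1:9618> returned -1, timeout=0, errno=32 Broken pipe."
)


def errno_from_log(line):
    """Extract the numeric errno from a condor_write() failure line, or None."""
    match = re.search(r"errno=(\d+)", line)
    return int(match.group(1)) if match else None


code = errno_from_log(LOG_LINE)
# Print the number, its symbolic name, and the platform's message for it.
print(code, errno.errorcode.get(code), os.strerror(code))
```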

A submission containing 26483 megabytes worth of inputs succeeds, but 29180 megabytes fails. There's 136G of free space in the /var/lib/condor/execute and spool filesystem, so I'm not running out of space, and I'm also not spooling so condor_submit is just enumerating the input files rather than moving them anywhere.

For the working submission, there are 35,744 individual files and directories in the inputs, and the total length of the paths to the input files is 2,999,469 bytes. Adding the additional directory that causes the failure brings this to 38,892 items and 3,263,860 bytes of path length.
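For anyone wanting to reproduce these two measurements on their own inputs, a rough Python sketch follows. It is not necessarily how the figures above were obtained. The function name `enumerate_inputs` is hypothetical, and counting every file and subdirectory below each top-level directory (but not the top-level directory itself) is an assumption.

```python
import os


def enumerate_inputs(directories):
    """Return (item_count, total_path_bytes) for everything under the given directories.

    Each file and subdirectory below a top-level directory counts as one
    item; its full path length is measured in encoded bytes.
    """
    count = 0
    total = 0
    for top in directories:
        for root, dirs, files in os.walk(top):
            for name in dirs + files:
                path = os.path.join(root, name)
                count += 1
                total += len(os.fsencode(path))
    return count, total


# Hypothetical usage:
# items, path_bytes = enumerate_inputs(["/data/run1", "/data/run2"])
# print(items, path_bytes)
```

Running this on the working and failing directory lists would show how close each one sits to whatever threshold is being crossed.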

Am I exceeding a buffer size limit associated with the file transfer enumeration, perhaps? Or is there some issue with condor_submit's communication with the schedd that's tripping things up, either a size limit or a timeout? I don't see anything in SchedLog at default debug levels.


Michael V Pelletier
Principal Engineer

Raytheon Technologies
Information Technology
50 Apple Hill Drive
Tewksbury, MA 01876-1198

