
[HTCondor-users] Collision between SUBMIT_TransferOutputRemaps and TransferOutput functionality



Hi,

this is a small issue from the technical side, but the wall of text probably illustrates how unintuitive it seems to us.

We use Condor alongside a number of other batch/grid systems and have a Wrapper tool that lets users switch easily between the different systems. For ease of maintenance and development, we try to keep the interfaces to the individual system modules as similar as possible. A recent restructuring, however, has forced us into some workarounds for file transfer in Condor, which are manageable but cumbersome.

The problem is that when submitting to a remote Schedd (which is a major use case for our tool), Condor remaps the process's Output and Error on the worker (to _condor_stdout and _condor_stderr) [1][2]. Because of the internal architecture of our Wrapper, the list of output files to transfer explicitly includes the Output and Error files under their expected names (gc.stdout and gc.stderr) [3]. With this default setup, once the job succeeds, Condor on the worker tries to transfer gc.stdout as requested and fails because no file of that name exists (it is still called _condor_stdout). This puts the entire job on hold [4] and stops the file transfer.
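To make the collision concrete, here is a rough Python sketch (not our actual Wrapper code; the function names parse_remaps and split_transfer_list are made up for illustration) that checks a transfer list against the remaps from the job ClassAd. It assumes the simple "name=path;name=path" layout shown in [2] and does not handle escaped separators.

```python
import os


def parse_remaps(remaps):
    """Split 'a=b;c=d' into {'a': 'b', 'c': 'd'} (no escape handling)."""
    pairs = (item.split("=", 1) for item in remaps.split(";") if item.strip())
    return {name.strip(): target.strip() for name, target in pairs}


def split_transfer_list(transfer_output, remaps):
    """Return (safe, colliding) entries of a comma-separated transfer list.

    An entry collides if it is the basename of a remap target while the
    file on the worker is stored under a different name, such as
    _condor_stderr.
    """
    renamed = {
        os.path.basename(target)
        for worker_name, target in parse_remaps(remaps).items()
        if os.path.basename(target) != worker_name
    }
    entries = [e.strip() for e in transfer_output.split(",") if e.strip()]
    safe = [e for e in entries if e not in renamed]
    colliding = [e for e in entries if e in renamed]
    return safe, colliding


# Values copied from the job ClassAd excerpts [2] and [3].
remaps = (
    "_condor_stdout=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stdout;"
    "_condor_stderr=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stderr"
)
transfer_output = "gc.stderr,job.info,job.stdout.gz,job.stderr.gz"

safe, colliding = split_transfer_list(transfer_output, remaps)
print("safe to request:", safe)
print("would not exist on the worker:", colliding)
```

Entries in the second list are the ones that would have to be dropped from the transfer list before submission.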

Now, of course we can (and do) manually exclude the files and redirect all consistency checks, but this requires additional hardcoding of dependencies. Since Condor is also available to our users directly (a few of whom have already made the same mistake), this behaviour seems very impractical and unintuitive. Is it possible to control the automatic remapping, preferably with a setting on the Schedd as well? I think defaulting to the basename of the Output/Error files would suffice.

Cheers,
Max


[1] Job JDL
Output = /home/condor/gc_test/work.gcUp/sandbox/0/gc.stdout
Error = /home/condor/gc_test/work.gcUp/sandbox/0/gc.stderr

[2] Job Classad
SUBMIT_TransferOutputRemaps = "_condor_stdout=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stdout;_condor_stderr=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stderr"

[3] Job Classad
TransferOutput = "gc.stderr,job.info,job.stdout.gz,job.stderr.gz"

[4] Job Classad
HoldReason = "Error from glidein_29815@*********: STARTER at ***.***.***.*** failed to send file(s) to <***.***.***.***:****>: error reading from /home/cmsger031/home_cream_994114414/glide_O27358/execute/dir_902/gc.stdout: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <***.***.***.***:*****>"