[HTCondor-users] Collision between SUBMIT_TransferOutputRemaps and TransferOutput functionality
- Date: Fri, 14 Dec 2012 12:05:08 +0100
- From: Max Fischer <mfischer@xxxxxxxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] Collision between SUBMIT_TransferOutputRemaps and TransferOutput functionality
Hi,
this is a small issue technically, but the wall of text below
probably illustrates how unintuitive it seems to us.
We use Condor alongside a number of other batch/grid systems and have a
wrapper tool that lets users easily switch between the different systems.
For ease of maintenance/development, we try to keep the interfaces with
the individual system modules as similar as possible. A recent
restructuring, however, has required us to add some workarounds for file
transfer in Condor. They are manageable but cumbersome.
The problem is that when submitting to a remote Schedd (which is a major
use case for our tool), Condor remaps the process's Output and Error
on the worker (to _condor_stdout and _condor_stderr) [1][2]. Due to the
internal architecture of our wrapper, the list of output files to
transfer explicitly includes the Output and Error files under their
expected names (gc.stdout and gc.stderr) [3].
When running with this default setup, upon success of the job Condor
on the worker tries to transfer gc.stdout as requested and fails
because no file of that name exists (it is still called
_condor_stdout). This puts the entire job on hold [4] and stops the
file transfer.
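
To make the collision concrete, here is a rough sketch (plain Python, not
Condor code) that takes the two ClassAd strings quoted in [2] and [3] and
reports which requested output files only exist on the worker under their
remapped _condor_* name. The helper names parse_remaps and
colliding_outputs are hypothetical, and the parsing assumes a simple
"source=target;source=target" format without escaping.

```python
import posixpath


def parse_remaps(remaps):
    """Parse a 'source=target;source=target' string into a dict.

    Assumption: no escaped ';' or '=' characters occur in the string.
    """
    result = {}
    for entry in remaps.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        source, _, target = entry.partition("=")
        result[source.strip()] = target.strip()
    return result


def colliding_outputs(transfer_output, remaps):
    """Return requested output names that match the basename of a remap target.

    Such files exist on the worker only under the remap source name
    (e.g. _condor_stderr), so requesting them by their final name fails.
    """
    remapped = {posixpath.basename(t) for t in parse_remaps(remaps).values()}
    requested = [n.strip() for n in transfer_output.split(",") if n.strip()]
    return [n for n in requested if n in remapped]


# Values copied from the ClassAd excerpts [2] and [3].
remaps = (
    "_condor_stdout=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stdout;"
    "_condor_stderr=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stderr"
)
transfer_output = "gc.stderr,job.info,job.stdout.gz,job.stderr.gz"

print(colliding_outputs(transfer_output, remaps))
```

Any name this reports is one the starter would look for under the wrong
filename in the execute directory.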
Now, of course we can (and do) manually exclude the files and redirect
all consistency checks, but that requires additional hardcoding of
dependencies. Since Condor is also available to the users directly (a
few of whom have already made the same mistake), this seems very
impractical and unintuitive. Is it possible to control the automatic
remapping, preferably also with a setting on the Schedd? I think
setting the default to the basename of the Output/Error would suffice.
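
For reference, the manual exclusion mentioned above amounts to something
like the following sketch. It is a simplified illustration rather than our
actual wrapper code: the function name strip_std_streams and the example
file list are made up, while the Output/Error paths are the ones from [1].

```python
import posixpath


def strip_std_streams(output_files, stdout_path, stderr_path):
    """Drop entries whose basename matches the job's Output or Error file.

    Condor delivers those two files itself via its own remapping, so they
    must not also appear in the explicit transfer list.
    """
    std_names = {posixpath.basename(stdout_path), posixpath.basename(stderr_path)}
    return [f for f in output_files if posixpath.basename(f) not in std_names]


# Output/Error paths as in [1]; the file list is illustrative.
stdout_path = "/home/condor/gc_test/work.gcUp/sandbox/0/gc.stdout"
stderr_path = "/home/condor/gc_test/work.gcUp/sandbox/0/gc.stderr"
files = ["gc.stdout", "gc.stderr", "job.info", "job.stdout.gz", "job.stderr.gz"]

# Join the remaining names into the comma-separated form used in [3].
print(",".join(strip_std_streams(files, stdout_path, stderr_path)))
```

The drawback is the one described above: the wrapper has to know which
names Condor will handle on its own, and users submitting directly get no
such protection.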
Cheers,
Max
[1] Job JDL
Output = /home/condor/gc_test/work.gcUp/sandbox/0/gc.stdout
Error = /home/condor/gc_test/work.gcUp/sandbox/0/gc.stderr
[2] Job Classad
SUBMIT_TransferOutputRemaps =
"_condor_stdout=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stdout;_condor_stderr=/home/condor/gc_test/work.gcUp/sandbox/15/gc.stderr"
[3] Job Classad
TransferOutput = "gc.stderr,job.info,job.stdout.gz,job.stderr.gz"
[4] Job Classad
HoldReason = "Error from glidein_29815@*********: STARTER at
***.***.***.*** failed to send file(s) to <***.***.***.***:****>: error
reading from
/home/cmsger031/home_cream_994114414/glide_O27358/execute/dir_902/gc.stdout:
(errno 2) No such file or directory; SHADOW failed
to receive file(s) from <***.***.***.***:*****>"