
Re: [HTCondor-users] Transfer files between jobs in DAG



On Sun, May 12, 2013 at 06:51:01AM +0800, 钱晓明 wrote:
>    Hi, I want the child job to use the output file of its parent job as
>    its input file.
>    Is the output file transferred from the startd node where the parent
>    job executed to the dag submit node, and then transferred again to the
>    startd node where the child job will be executed?

Only if you ask it to be transferred, in the submit file for the child job.

So you can either make a separate submit file for each job, or you can share
one submit file and set the per-job parameters with VARS lines in the DAG
file.

What I do is something like this (untested):

# jobs.sub
universe = vanilla
# job_files is set per node by VARS in jobs.dag; it is empty where no node sets it
transfer_input_files = $(job_files)
queue

# jobs.dag
JOB A jobs.sub
VARS A output="A.out" error="A.err" executable="A.sh"
JOB B jobs.sub
VARS B output="B.out" error="B.err" executable="B.sh" input="A.out" job_files="A.out"
PARENT A CHILD B

I use a separate 'job_files' macro so that in the .sub file I can also list
other files that I always want transferred. For example:

transfer_input_files = mylib.zip,$(job_files)
environment = "PYTHONPATH=mylib.zip"
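
For longer chains the DAG file can be generated instead of typed. The
following is only a minimal Python sketch of that idea. It assumes every node
shares jobs.sub and follows the <name>.sh / <name>.out / <name>.err naming
used in the example; the helper name chain_dag is made up for illustration and
is not part of HTCondor.

```python
def chain_dag(names, submit_file="jobs.sub"):
    """Return DAG text for a linear chain; each node reads its parent's output."""
    lines = []
    prev = None
    for name in names:
        node_vars = {
            "output": f"{name}.out",
            "error": f"{name}.err",
            "executable": f"{name}.sh",
        }
        if prev is not None:
            # The child's stdin and its extra transfer list both point at
            # the parent's output file.
            node_vars["input"] = f"{prev}.out"
            node_vars["job_files"] = f"{prev}.out"
        lines.append(f"JOB {name} {submit_file}")
        # DAGMan expects each VARS value to be wrapped in double quotes.
        pairs = " ".join(f'{key}="{value}"' for key, value in node_vars.items())
        lines.append(f"VARS {name} {pairs}")
        if prev is not None:
            lines.append(f"PARENT {prev} CHILD {name}")
        prev = name
    return "\n".join(lines) + "\n"


# Prints a two-node DAG equivalent to the hand-written jobs.dag.
print(chain_dag(["A", "B"]), end="")
```

Redirect the output to jobs.dag and submit it with condor_submit_dag as
usual.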

>    And in these steps, relative
>    paths of all input/output files for all node jobs are not changed?

Unfortunately they are changed. By default the directory part of a relative
path is stripped, so with

   transfer_input_files = foo/bar/baz

the file ends up as just "baz" in the job's scratch directory. Full details at
http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html

"When a path to an input file or directory is specified, this specifies the
path to the file on the submit side. The file is placed in the job's
temporary scratch directory on the execute side, and it is named using the
base name of the original path. For example, /path/to/input_file becomes
input_file in the job's scratch directory.

A directory may be specified using a trailing path separator. An example of
a trailing path separator is the slash character on Unix platforms; a
directory example using a trailing path separator is input_data/. When a
directory is specified with a trailing path separator, the contents of the
directory are transferred, but the directory itself is not transferred. It
is as if each of the items within the directory were listed in the transfer
list. When there is no trailing path separator, the directory is
transferred, its contents are transferred, and these contents are placed
inside the transferred directory."

So I think that

    transfer_input_files = foo/bar

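(without a trailing slash) will get you close: going by the text above, the
file should end up as "bar/baz", with only the leading "foo/" dropped. I've
not tested it myself, though.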
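
The quoted rules can be modelled in a few lines of Python. This is only a
sketch of the documented behaviour as I read it, not HTCondor code; the
function scratch_names and the list of submit-side files are made up for
illustration.

```python
import posixpath


def scratch_names(entry, submit_side_files):
    """Map one transfer_input_files entry to names in the job's scratch directory.

    submit_side_files is a hypothetical list of relative file paths that
    exist on the submit side.
    """
    if entry.endswith("/"):
        # Trailing slash: only the contents are transferred, the directory
        # itself is dropped.
        prefix, keep = entry, ""
    elif entry in submit_side_files:
        # Plain file: only the base name survives.
        return [posixpath.basename(entry)]
    else:
        # Directory without a trailing slash: the directory (by its base
        # name) is kept and the contents are placed inside it.
        prefix, keep = entry + "/", posixpath.basename(entry) + "/"
    return [keep + path[len(prefix):]
            for path in submit_side_files if path.startswith(prefix)]


files = ["foo/bar/baz", "foo/bar/qux"]
print(scratch_names("foo/bar/baz", files))  # ['baz']
print(scratch_names("foo/bar/", files))     # ['baz', 'qux']
print(scratch_names("foo/bar", files))      # ['bar/baz', 'bar/qux']
```

In none of the three cases does the leading "foo/" reach the execute side.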

Regards,

Brian.