Re: [Condor-users] file transfer problems with vanilla job

On Wed, 10 Nov 2004 12:56:07 -0600 (CST)  De-Wei Yin wrote:

> My problem is that the output files are not coming back when the job
> is evicted from a node (by Condor or by me using condor_vacate or
> condor_hold)

i bet you're just not looking in the right place.  see my previous

> When the job is evicted or removed, I expect to find the latest "ckpt"
> file (if one has already been written), an "rlog" file, and any number
> of "data" files.

look in spool/cluster<X>.proc<Y>.subproc<Z>
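
if you'd rather check from a script than by hand, here's a rough sketch
in python.  the function names and the subproc default of 0 are my own
assumptions for illustration; the only thing taken from above is the
directory naming.  pass in whatever "condor_config_val SPOOL" prints as
the spool directory.

```python
import os

def spool_dir_name(cluster, proc, subproc=0):
    """Build the per-job directory name in the form given above.

    The default subproc of 0 is an assumption for illustration.
    """
    return "cluster%d.proc%d.subproc%d" % (cluster, proc, subproc)

def list_spooled_files(spool, cluster, proc, subproc=0):
    """Return the sorted file names in the job's spool directory.

    Returns an empty list if the directory does not exist, e.g. because
    the job has not been evicted yet or has already completed.
    """
    path = os.path.join(spool, spool_dir_name(cluster, proc, subproc))
    if not os.path.isdir(path):
        return []
    return sorted(os.listdir(path))
```

for job 12.0 you'd call list_spooled_files(spool, 12, 0) and see
whether your "ckpt" and "rlog" files show up in the result.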

> Unfortunately nothing comes back and the job always
> restarts from scratch, and I cannot figure out why.

i bet it's not restarting from scratch.  do you have proof of that?

> If the job is evicted, then when it restarts, it will need as input:
> the "ckpt" if it has already been created, the "init" file in case
> there is no "ckpt" file, and the "rlog" if it has been created so
> that new records can be appended to it.  How do I tell Condor that
> it needs to send back these files, especially the "ckpt" and "rlog"
> files, which might not yet exist if the job was interrupted early.

you don't have to.  condor does this automatically.  if you don't
specify transfer_output_files, condor sees any new files that get
created and transfers them for you.  if you're using
when_to_transfer_output = ON_EXIT_OR_EVICT, it will also include all
of these files as input the next time the job runs.
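
to make the idea concrete, here's a small python sketch of "notice
what's new and ship it".  this is only an illustration of the concept,
not condor's actual code: the snapshot-by-mtime approach and both
function names are my assumptions, and treating changed files the same
as new ones is part of that assumption.

```python
import os

def snapshot(directory):
    """Map each regular file name in `directory` to its modification time."""
    return {
        name: os.path.getmtime(os.path.join(directory, name))
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }

def files_to_transfer(before, after):
    """Names that are new, or whose mtime changed, between two snapshots."""
    return sorted(
        name for name, mtime in after.items()
        if name not in before or before[name] != mtime
    )
```

take one snapshot when the job starts and another when it exits or is
evicted; everything files_to_transfer() returns is what would come
back.  note that nothing in this scheme distinguishes a "ckpt" file
from a "data" file, they're all just new files.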

> By the way, any numbered "data" file that does come back need not be
> returned to the execute node since they are never needed as input.

that's one thing we don't handle now, and i don't see any good way to
avoid it.  these files are just going to be transferred around with
your job until it finally completes.  there's currently no way to
specify an "ignore" list or something.  it's worse in your case, since
you do want them transferred back to your submit machine, you just
don't want them considered input for the future...