
[Condor-users] remote submission issues with automatic release and excessive copying



One of the reasons we moved to 6.8 recently was to get remote
submission working.

It does, but it has two gotchas.

The first is relatively simple: it appears that every queue command in
the submit file causes an entirely new copy of the executable and the
transfer input files to be created (and thus staged over the network).
This slows the submission process considerably, as well as wasting lots
of disk, but it's the submission latency I care about more.

Would a single "queue X" command cause X copies, or would it at least
know that it is guaranteed a consistent set of data and just do the
duplication on the remote side?
(One could then hack round it by providing the differing parameters as
differently numbered input files, at least.)
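
For concreteness, the two forms I am comparing look roughly like the
following. The executable and file names are made up for illustration,
and I am not claiming the second form avoids the extra copies; that is
exactly the question.

```
# Form A: one queue command per parameter set (what we do today).
# "analyse" and "common.dat" are hypothetical names.
executable              = analyse
transfer_input_files    = common.dat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

arguments = 0
queue

arguments = 1
queue
```

```
# Form B: a single "queue N" command, with the per-job parameters
# moved into numbered input files selected via $(Process).
# "params.0", "params.1" are hypothetical file names.
executable              = analyse
transfer_input_files    = common.dat, params.$(Process)
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

arguments = params.$(Process)
queue 2
```

In Form B, $(Process) takes the values 0 and 1 within the one cluster,
so each job picks up its own parameter file.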

The other is the submit-on-hold-then-release mechanism, which is
co-opted by the remote submission process. This means that an
instruction of "hold = true" in the submit file is effectively
ignored.
There are possible ways around this using periodic_hold expressions
based on the job's birth date and a small spread about it, but such
things are hacky and error prone (not to mention removing any other use
we might have for the expression).
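
The sort of periodic_hold hack I mean would be something like the
sketch below. It is untested, and the 600 second window is an arbitrary
assumption; it would need to be longer than however often the schedd
evaluates periodic expressions, which is part of why this is fragile.

```
# Untested sketch: put the job back on hold if it is seen not-held
# within a window after its submission time (QDate).
# The 600 second window is an assumed value, not a recommendation.
periodic_hold = (CurrentTime - QDate) < 600
```

Once the window has passed the expression is false, so a later manual
or automated condor_release would stick.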

When a job is being remote submitted, does it appear in the condor_q
output in state 1 (Idle) *before* having the data staged to it? If it
does, this is going to be a right pain for our automated release
mechanisms.
Would it be possible for the remote submission process to respect the
"hold = true" command? If the job could also not appear as Idle to the
queue readers until it is ready, that would be good; if not, some other
way to note that a job is not yet ready to run would at least let us
work round it.

Incidentally, if the hold provisos I detailed above are how remote
submission works, it would be nice if the documentation for
condor_submit made it clearer which commands become unavailable.

Thanks,
Matt