
Re: [HTCondor-users] POST script user privileges in DAG



On 06/02/2015 15:34, R. Kent Wenger wrote:
> Ah, yes, the "no stdout/stderr from PRE/POST scripts" issue has been around quite a while:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=171,4
One workaround is for the POST script you specify in your DAG file to just be a wrapper that runs the "real" script and captures stdout and stderr.
If your PRE/POST script is a shell script, you can do this:

#!/bin/sh
exec >mypre.out 2>mypre.err
... continue with rest of script ...

(An "exec" with only redirections and no command redirects the stdout and/or stderr of the shell itself, which then applies to every subsequent command in the script.)

> Yes, the "run multiple jobs on the same machine" issue is another old one:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=572,4

If these form a linear sequence, then you can write a single node which does this:

#!/bin/sh -e
./script1 args1
./script2 args2
./script3 args3

'-e' causes the shell to exit as soon as any command returns a non-zero exit status that is not otherwise handled, e.g. tested by an 'if' statement.
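As a quick illustration of the '-e' behaviour (the commands here are contrived for the demo):

```shell
#!/bin/sh -e
# A failing command tested by 'if' does not abort the script...
if false; then
    echo "never reached"
fi
echo "still running"
# ...but an untrapped failure does: 'false' exits the script here,
# so the final echo is never executed and the script's status is non-zero.
false
echo "not printed"
```

Run directly, this prints only "still running" and exits with a failure status, which is exactly what makes the combined node fail if any step fails.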

This seems to me to achieve essentially the same effect as 'claim_timeout', which keeps a slot temporarily reserved for the successor node.

Of course, you lose the granularity that would allow DAGMan to restart after a partial completion: say script1 succeeds and script2 fails, and you want the rerun to start at script2. However, if the requirement is both to restart like this *and* to have script2 run on the same machine that script1 ran on, I can see no option other than for DAGMan to capture the machine script1 originally ran on, and record this in the rescue DAG as an extra constraint. That's pretty complicated.
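For contrast, the fine-grained version that DAGMan *can* restart per step would be three separate nodes (node names and submit-file names are invented here):

```
JOB A step1.sub
JOB B step2.sub
JOB C step3.sub
PARENT A CHILD B
PARENT B CHILD C
```

With this layout a rescue DAG resumes at the first failed node, but as noted, nothing ties node B to the machine node A ran on.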

Regards,

Brian Candler.