Re: [HTCondor-users] Suggestion: transfer on error
- Date: Fri, 6 Dec 2013 09:30:29 -0600 (CST)
- From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Suggestion: transfer on error
On Fri, 6 Dec 2013, Brian Candler wrote:
> (2) When I submit a DAG full of jobs, in log files I cannot see any record of
> which host a particular job ran on.
> If I see it while it's running (condor_q -run -dag) then I can see the host.
> But this is not recorded in *.dagman.out as far as I can see.
You're right, this isn't in dagman.out.
> Is there any way to log this information? It would be really helpful, for
> example, if a job fails when it is matched with one particular host because
> an NFS mount is missing.
But it is in the user log file for the job. If you're running a fairly
recent DAGMan (7.9 something or later) you'll get a file called
*.dag.nodes.log (unless you've changed the default settings). That file
will have events for all of your node jobs, including execute events
which have the IP address of the execute host.
If you specify 'log = ...' in your submit files, you can also see the same
events in those files.
If you're running an older DAGMan you most likely won't get the
.dag.nodes.log file, but you'll still have the individual logs specified
in the submit files.
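
As a rough illustration of pulling the execute hosts out of one of those
log files, here is a minimal Python sketch. It assumes execute events
start with a header line of the form
"001 (cluster.proc.subproc) <date> <time> Job executing on host: <address>";
that layout, the function name execute_hosts, and the file name in the
usage note are assumptions for illustration, so check them against your
own log before relying on this.

```python
import re

# Assumed header layout of an execute event (event number 001):
#   001 (123.000.000) 12/06 09:30:29 Job executing on host: <10.0.0.2:40123>
# The date/time portion is skipped loosely because its format can vary.
EXECUTE_RE = re.compile(
    r"^001 \((\d+)\.(\d+)\.\d+\) .*?Job executing on host: <([^>]*)>"
)


def execute_hosts(log_text):
    """Return (cluster, proc, address) tuples, one per execute event found."""
    hosts = []
    for line in log_text.splitlines():
        match = EXECUTE_RE.match(line)
        if match:
            cluster = int(match.group(1))
            proc = int(match.group(2))
            hosts.append((cluster, proc, match.group(3)))
    return hosts
```

You could then read the log with something like
execute_hosts(open("mydag.dag.nodes.log").read()) (hypothetical file name)
and look up the job ID of the failed node in the result.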