RE: [Condor-users] Dagman and rescue files



Hi Peter,

My log line is:
log=/home1/ncsg3/basis/simulator/condor_script1.log

After the run the log file has this in it:

000 (833.000.000) 09/14 16:07:38 Job submitted from host: <X:32773>
    DAG Node: condor_script1
...
001 (833.000.000) 09/14 16:08:01 Job executing on host: <X:32772>
...
005 (833.000.000) 09/14 16:08:03 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...

Thanks

Colin

>>You wrote

DAGMan is dumping a rescue file in this case because it is unable to 
make any forward progress in the DAG.  It apparently cannot make 
progress because it can't open the userlog (the job event log) for 
your DAG node: the filename of the userlog is "" (i.e., an empty 
string).

This (admittedly cryptic) error is right in the log you included:

> 9/14 10:47:46 UserLog::initialize: open("") failed - errno 2 (No such file or directory)
> 9/14 10:47:51 Of 1 nodes total:

What does the "log =" line in your job submit file look like?
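
If it helps, here is a minimal Python sketch for checking what a submit file's "log" setting is before handing the DAG to DAGMan. It is not how DAGMan itself reads the file. The function name find_log_setting is made up for illustration. The sketch assumes the submit command is matched case-insensitively and that the last "log = ..." line wins, and it does not expand macros such as $(...).

```python
import re

# Matches a "log = value" line; submit command names are assumed to be
# case-insensitive, and the value may be empty.
_LOG_LINE = re.compile(r"log\s*=\s*(.*)$", re.IGNORECASE)

def find_log_setting(submit_text):
    """Return the value of the last 'log = ...' line in submit_text.

    Returns None if there is no such line, and "" if the line is present
    but has no value (the empty-filename case described above).
    Hypothetical helper: macros are left unexpanded.
    """
    value = None
    for line in submit_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        match = _LOG_LINE.match(stripped)
        if match:
            value = match.group(1).strip()
    return value
```

A result of None or "" would be consistent with the open("") failure quoted above, while a non-empty path (as in Colin's submit file) would suggest looking elsewhere for the cause.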