[Condor-users] FW: Logging what compute node a job executed/failed on

Is there a way to get a little more information about Condor jobs, such as where they ran and exactly what happened, other than having a separate log file for each job, e.g.

Log = log_$(PROCESS).log

in the submit file?

This becomes a problem when we submit 1000+ jobs and need to know which ones failed and where they executed. We can identify the failures from the return codes and error output, but it would be helpful to know exactly where each job executed. All we have at the moment is:

001 (021.000.000) 09/29 09:58:54 Job executing on host: <xxx.xxx.xxx.xxx:1104>

While this is useful, it would be even more helpful to have the execute node recorded in the termination event as well:

005 (021.000.000) 09/29 09:58:55 Job terminated.
        (0) Abnormal termination (signal 53)
        (0) No core file
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        384684  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        384684  -  Total Bytes Received By Job
.

At the moment that event identifies the job only by its ID. What about something like:

005 (021.000.000) 09/29 09:58:55 Job terminated (after executing on node xxx.xxx.xxx.xxx)
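In the meantime, one possible workaround (a sketch, not an existing Condor tool): because every event header carries the job's (cluster.process.subprocess) ID, the execute event (001) and the terminated event (005) can be joined by ID after the fact, even if all jobs write to one shared log instead of one log per process. The Python below assumes the event layout shown above; the helper name `summarize`, its regular expressions, and the "(1) Normal termination (return value N)" pattern for successful jobs are my assumptions, not part of Condor. The host it reports is the address:port string from the execute event, not a resolved hostname.

```python
import re
from typing import Dict, List, Tuple

# Event header, e.g. "005 (021.000.000) 09/29 09:58:55 Job terminated."
# Groups: event code, cluster, process, rest of the line.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) \S+ \S+ (.*)$")
# Execute event (001): the address:port inside the angle brackets.
EXEC_HOST = re.compile(r"Job executing on host: <([^>]+)>")
# Detail line that follows a terminated event (005) for a killed job.
ABNORMAL = re.compile(r"\(0\) Abnormal termination \(signal (\d+)\)")
# ASSUMED format for a normal exit; it does not appear in the excerpt above.
NORMAL = re.compile(r"\(1\) Normal termination \(return value (\d+)\)")


def summarize(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (job_id, execute_host, outcome) for each terminated job.

    Hypothetical helper: joins 001 and 005 events on cluster.process.
    """
    last_host: Dict[str, str] = {}  # most recent execute host per job
    results: List[Tuple[str, str, str]] = []
    pending = ""  # job whose 005 event we are currently reading

    for line in log_text.splitlines():
        text = line.strip()
        header = HEADER.match(text)
        if header:
            code, cluster, proc, rest = header.groups()
            job_id = f"{int(cluster)}.{int(proc)}"
            pending = ""
            if code == "001":
                host = EXEC_HOST.search(rest)
                if host:
                    # A rerun overwrites this, so we keep the latest host.
                    last_host[job_id] = host.group(1)
            elif code == "005":
                pending = job_id
            continue

        if not pending:
            continue
        abnormal = ABNORMAL.search(text)
        normal = NORMAL.search(text)
        if abnormal:
            outcome = f"signal {abnormal.group(1)}"
        elif normal:
            outcome = f"exit {normal.group(1)}"
        else:
            continue  # e.g. "(0) No core file" or usage lines
        results.append((pending, last_host.get(pending, "unknown"), outcome))
        pending = ""

    return results


if __name__ == "__main__":
    sample = (
        "001 (021.000.000) 09/29 09:58:54 Job executing on host: <xxx.xxx.xxx.xxx:1104>\n"
        "005 (021.000.000) 09/29 09:58:55 Job terminated.\n"
        "        (0) Abnormal termination (signal 53)\n"
        "        (0) No core file\n"
    )
    for job_id, host, outcome in summarize(sample):
        print(job_id, host, outcome)
```

Run over the two events quoted above, this reports job 21.0 as terminated by signal 53 on xxx.xxx.xxx.xxx:1104. If a job is evicted and rerun, only its most recent execute event is kept, which is the one that preceded the termination.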

This probably seems trivial, but if anyone can suggest other methods, I’d be more than happy to hear them.

Kind Regards,

Shaun