
Re: [Condor-users] Logging what compute node a job executed/failed on




The schedd history file (in your SPOOL directory) contains a record of completed jobs, including LastRemoteHost. You can either scan through this file with your own script, or you can run queries with condor_history. Example:

condor_history -format "%s" ClusterId -format ".%s" ProcId -format " %s\n" LastRemoteHost
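If you take the "own script" route on the output of the command above, a minimal Python sketch might look like this. It assumes the output has already been captured as text with one "cluster.proc host" record per line. The function name `parse_history` and the example hostnames are hypothetical. As a precaution, any record that does not look like `<cluster>.<proc> <host>` is skipped, for example when a job has no LastRemoteHost and its record comes out malformed.

```python
import re

# Expected record shape (assumption): "<cluster>.<proc> <LastRemoteHost>"
RECORD = re.compile(r"^(\d+\.\d+)\s+(\S+)$")


def parse_history(text: str) -> dict[str, str]:
    """Map "cluster.proc" to the LastRemoteHost printed by condor_history."""
    hosts: dict[str, str] = {}
    for line in text.splitlines():
        m = RECORD.match(line.strip())
        if m is None:
            continue  # blank or malformed record: skip it rather than guess
        hosts[m.group(1)] = m.group(2)
    return hosts
```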

If you do use condor_history, be aware that running one big bulk query is much more efficient than running condor_history separately for each job in a long list. Also be aware that the history file may be rotated periodically, depending on your configuration.
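To illustrate the bulk-query advice, here is a sketch that issues a single condor_history call for a whole cluster and then selects the jobs of interest locally, rather than running condor_history once per job. It assumes condor_history accepts a cluster number as its argument and that Python 3.9+ is available. The helper names are hypothetical, and the `fetch` parameter exists only so that the real query can be swapped out, for example in a test.

```python
import re
import subprocess
from typing import Callable, Iterable

# Same record shape as the -format query above (assumption)
JOB_LINE = re.compile(r"^(\d+\.\d+)\s+(\S+)$")


def history_cmd(cluster: int) -> list[str]:
    """Build ONE condor_history query covering every proc in a cluster."""
    return ["condor_history", str(cluster),
            "-format", "%s", "ClusterId",
            "-format", ".%s", "ProcId",
            "-format", " %s\n", "LastRemoteHost"]


def run_condor(cmd: list[str]) -> str:
    # Requires condor_history on PATH; raises CalledProcessError on failure.
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def hosts_for(cluster: int, job_ids: Iterable[str],
              fetch: Callable[[list[str]], str] = run_condor) -> dict[str, str]:
    """Look up many jobs with a single query instead of one query per job."""
    wanted = set(job_ids)
    found: dict[str, str] = {}
    for line in fetch(history_cmd(cluster)).splitlines():
        m = JOB_LINE.match(line.strip())
        if m and m.group(1) in wanted:
            found[m.group(1)] = m.group(2)
    return found
```

For example, `hosts_for(21, failed_ids)` would return the last execute host of each failed job in cluster 21, as long as those jobs are still in the history file and have not been rotated out.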

--Dan

Shaun J. O'Callaghan wrote:

Is there a way to get more information about Condor jobs, such as where they ran and exactly what happened, other than having a separate log file for each job, e.g.

Log = log_$(PROCESS).log

in the submit file?

There’s an issue when we’re submitting 1000+ jobs and need to know which ones failed and where they executed. We can of course identify the failures from the return codes and error output, but it would be helpful to know exactly where each job executed. At the moment, all we have is

001 (021.000.000) 09/29 09:58:54 Job executing on host: <xxx.xxx.xxx.xxx:1104>

While this is useful, it would be more helpful to have the execute node in the termination event as well:

005 (021.000.000) 09/29 09:58:55 Job terminated.
(0) Abnormal termination (signal 53)
(0) No core file
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
384684 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
384684 - Total Bytes Received By Job
...

In other words, the termination event should name the node, not just the job ID. For example, what about:

005 (021.000.000) 09/29 09:58:55 Job terminated (after executing on node xxx.xxx.xxx.xxx)
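Until the terminate event reports the host itself, the existing log already supports this correlation. The approach is to remember the host from each job's most recent 001 (execute) event and attach it when that job's 005 (terminate) event arrives. Below is a minimal Python sketch under these assumptions: all jobs write to one log, or the per-job logs are concatenated; the event layout matches the lines quoted above; and the function name `terminated_jobs` is hypothetical. If a job was evicted and restarted, it keeps the host from its last 001 event.

```python
import re

# Event header, e.g. "001 (021.000.000) 09/29 09:58:54 Job executing on host: <...>"
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) \S+ \S+ (.*)$")
HOST = re.compile(r"Job executing on host: (<[^>]+>)")
SIGNAL = re.compile(r"Abnormal termination \(signal (\d+)\)")
RETURN = re.compile(r"Normal termination \(return value (\d+)\)")


def terminated_jobs(log_text: str) -> dict[str, dict]:
    """Map "cluster.proc" -> {"host": ..., "status": ...} for each 005 event."""
    last_host: dict[str, str] = {}  # job id -> host from its latest 001 event
    results: dict[str, dict] = {}
    pending = None                  # job whose 005 status line is still unread
    for raw in log_text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            code = m.group(1)
            job = f"{int(m.group(2))}.{int(m.group(3))}"  # "021.000" -> "21.0"
            pending = None
            if code == "001":
                h = HOST.search(m.group(4))
                if h:
                    last_host[job] = h.group(1)
            elif code == "005":
                results[job] = {"host": last_host.get(job), "status": None}
                pending = job
            continue
        if pending is not None:
            s, r = SIGNAL.search(line), RETURN.search(line)
            if s:
                results[pending]["status"] = f"signal {s.group(1)}"
                pending = None
            elif r:
                results[pending]["status"] = f"exit {r.group(1)}"
                pending = None
    return results
```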

This probably seems trivial, but if anyone can suggest other methods I’d be more than happy to hear them.

Kind Regards,

Shaun
