On 10/23/2012 11:14 AM, Hermann Fuchs wrote:
Hello, currently we have some problems with our grid. It seems some machines abort jobs after about 20 minutes. In order to identify the erroneous machines, I would need some command to show the job duration history of a machine.
There's no easy way to do this with Condor today that I can think of. This is because the job history file contains only a summary of all the execution attempts for a given job, rather than one record per attempt. If all the jobs write to a single user log file, parsing that is probably the best approach.
Or, if you can turn on the startd_history on each execute machine, each startd will write out a history-like file, but you'd need to concatenate those yourself.
-greg