Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] searching for job rerun info

Date: Fri, 08 Apr 2016 10:56:34 +0200
From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
Subject: Re: [HTCondor-users] searching for job rerun info

Hi Todd,

many thanks for the link!
Extending the history/logs is probably the most reasonable way - three days
of reaction time may be too short for all parties involved ;)

Cheers and thanks,
  Thomas

PS: Normally we run grid jobs, i.e., pilots, so re-runs should be
no problem. However, in this case there was a problem upstream that caused a
bit of confusion.


On 2016-04-05 22:14, Todd Tannenbaum wrote:
> 
> Hi Thomas,
> 
> From reading the above, is your desire that your job never gets re-run by HTCondor, even
> in the event of failures?
> If so see
>  https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToAvoidJobRestarts
> This wiki page also lists out all the typical reasons why HTCondor will automatically
> restart a job; be aware that by default HTCondor alone will not restart a job that
> exits successfully, even if it exits with a non-zero exit code.  
> 
> As for whether the rerun job will have the same job id: yes, it will, unless you are
> using DAGMan -- failed nodes in DAGMan are resubmitted and thus will have a new job id.
> 
> As for where you can look since your history file rotated:  did the
> job specify a job event log via "log = /some/file" in the submit file? If so
> you could look there.  You could also grep the schedd log for the job id, but
> I am guessing that the SchedLog has already rotated.  Finally, if you define "EVENT_LOG = /some/file" in
> the condor_config on your submit node, you could look there.
> 
> But you likely want to increase the size specified via config knob MAX_HISTORY_LOG. :)
> 
> Hope the above helps
> Todd
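
The suggestion above to look in the job event log can be sketched as a small script. This is only a rough illustration, not an official HTCondor tool: it assumes the classic plain-text event log layout, in which each event starts with a header line such as "001 (1234.000.000) ...", and that event code 001 marks a job starting to execute. The function names are made up for this example.

```python
import re
from collections import Counter

# Assumed header layout of a classic job event log entry:
#   "<3-digit event code> (<cluster>.<proc>.<subproc>) <date> <time> <text>"
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")


def count_events(lines, cluster, proc=0):
    """Count event codes recorded for job cluster.proc in the given log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_HEADER.match(line)
        if match and int(match.group(2)) == cluster and int(match.group(3)) == proc:
            counts[match.group(1)] += 1
    return counts


def execution_count(lines, cluster, proc=0):
    """Number of execute events (code 001) seen for the job.

    A value above 1 suggests the job was started more than once.
    """
    return count_events(lines, cluster, proc)["001"]
```

To use it on a real log, pass an open file, e.g. `with open("/some/file") as log: print(execution_count(log, 1234))`, where the path is whatever was set via `log = ...` in the submit file (or EVENT_LOG in the config) and 1234 is a placeholder cluster id.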


