Re: [HTCondor-users] [EXTERNAL] Re: Make runs fail?



On 10/19/2018 04:57 PM, Kitlasten, Wesley via HTCondor-users wrote:
> Clarification:
> 
> The only solution I can come up with (until I move onto something more
> complex as time allows) is to wait until every parameter set has been
> submitted and then condor_rm the jobs individually... with a "sabotage
> node" on my local machine that forces the _held and _rm jobs to fail
> (yuck). If I condor_rm before all sets have been submitted and don't
> sabotage, the old/faulty sets just get resubmitted. Am I missing
> something?... aside from the time and experience to pursue the proper
> approach!

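You can set a periodic_remove on your jobs; the question is whether you
can figure out an expression that'll work for you.

E.g. on jobs where I know that if one hasn't completed in about 5 days
it's stuck, I just do "(time() - QDate) > 500000". QDate is the job's
submission time, so that removes anything queued for more than 500000
seconds (a bit under 6 days).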
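
For reference, here is roughly where that expression would go in a submit
description file. This is only a sketch: the executable name, arguments
and queue count are made up for illustration, and the threshold is the
one from my jobs, so pick whatever fits yours.

```
# Hypothetical submit file; run_model.sh and the queue count are placeholders.
executable = run_model.sh
arguments  = $(Process)

# Remove a job from the queue once more than 500000 seconds have passed
# since it was submitted (QDate), whether or not it ever started running.
periodic_remove = (time() - QDate) > 500000

queue 10
```

Note that the clock starts at submission, not at job start, so time spent
idle in the queue counts toward the limit too.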

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu