
Re: [HTCondor-users] Make runs fail?



OK, I think I understand your problem now. The jobs are being submitted and monitored by an external program which doesn't consider a condor_hold or condor_rm'd job to be complete, and will resubmit it, thinking that it has gone missing.

If that's the case, the question is how the submitting software decides whether a job needs to be resubmitted, and whether that criterion can be changed or extended. If the submitter code is looking at a job attribute, and we can change which attribute it looks at, then we can define that attribute as an expression which evaluates to something appropriate whether the run failed on its own or was cut short as non-convergent.
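
For example, a minimal sketch (the attribute name RunIsFinished is hypothetical, and which states count as "done" would need to match what your submitter actually checks): the submit description file can inject a custom ClassAd attribute whose value is an expression over the job's own status, so a single attribute answers "is this run over, for any reason?":

    # Hypothetical attribute added to the submit file. JobStatus 3 is
    # Removed, 4 is Completed, 5 is Held; treat all three as finished.
    +RunIsFinished = (JobStatus == 3 || JobStatus == 4 || JobStatus == 5)

The submitter would then poll that one attribute instead of JobStatus itself, something along these lines:

    # While the job is still in the queue:
    condor_q <cluster>.<proc> -af RunIsFinished
    # After it leaves the queue (removed/completed jobs land in the history):
    condor_history <cluster>.<proc> -af RunIsFinished

Again, just a sketch; the right expression depends on what the submitter is looking at today.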

Does that make sense?

Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Kitlasten, Wesley via HTCondor-users
Sent: Friday, October 19, 2018 5:58 PM
To: htcondor-users@xxxxxxxxxxx
Cc: Kitlasten, Wesley <wkitlasten@xxxxxxxx>
Subject: Re: [HTCondor-users] [EXTERNAL] Re: Make runs fail?

Clarification:

The only solution I can come up with (until I move on to something more complex as time allows) is to wait until every parameter set has been submitted and then condor_rm the jobs individually... with a "sabotage node" on my local machine that forces the held and removed jobs to fail (yuck). If I condor_rm before all sets have been submitted and don't sabotage, the old/faulty sets just get resubmitted. Am I missing something? ...aside from the time and experience to pursue the proper approach!
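
For what it's worth, a rough sketch of that stopgap as a single pass rather than removing jobs one at a time, assuming the faulty sets are the ones sitting in the Held state (the constraint is illustrative):

    # Once every parameter set has been submitted, remove all
    # currently-held jobs in one command instead of individually:
    condor_rm -constraint 'JobStatus == 5'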

-- 
Wes Kitlasten
United States Geological Survey
2730 N. Deer Run Road
Carson City, NV 89701
(775) 887-7711