[HTCondor-users] Make runs fail?



Hello,

I am using the parameter estimation software PEST to run multiple models
(jobs). PEST uses YAMR and Panther, although I struggle to make sense of
how everything works together.

The parameters are drawn from a probability distribution. Some
parameter combinations (jobs) can take 12+ hours to run, and from previous
experience I know the results of those runs will be worthless to me. I can
usually tell which jobs will be useless within the first hour. I would like to
remove these jobs after about 1 hour to free up cores for other runs, but have
them returned as failed jobs and not be resubmitted to the pool.

For example, if I type "condor_rm 1.10" it will
remove that job, but the model with those parameters will just be
resubmitted to another node and start over. However, if the job truly fails,
a job with those parameters will not be resubmitted.

Is there a way to remove a job and have condor return a failed status,
rather than have the same parameters run under a different job name?
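
To make the "truly fails" case concrete, here is a minimal sketch of a wrapper
that runs the model as a child process and reports a failure once a time limit
is exceeded. It assumes that a non-zero exit code from the model is what gets
treated as a failed run, which I have not confirmed for PEST. The name
run_with_limit and the limit value are hypothetical and not part of PEST or
HTCondor.

```python
import subprocess
import sys


def run_with_limit(cmd, limit_seconds):
    """Run cmd and return its exit code, or 1 if it exceeds limit_seconds.

    Hypothetical helper: assumes a non-zero exit code is what the run
    manager interprets as a failed run.
    """
    try:
        # subprocess.run waits for the child; on timeout it kills the
        # child and then raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        print("time limit exceeded, reporting failure", file=sys.stderr)
        return 1
    # Otherwise pass the model's own exit code through unchanged.
    return completed.returncode
```

A wrapper script could call this with the real model command and a limit of
3600 seconds, then pass the return value to sys.exit(). Note that this only
enforces a fixed time limit; it does not look at the model output to decide
whether a run is useless.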

References:
https://github.com/dwelter/pestpp
https://github.com/jtwhite79/pestpp/tree/master/bin/iwin