
[HTCondor-users] [resubmit jobs]



I submitted some jobs.
A few of them took 2 hours to complete, but I think they should take about 20 minutes, and some similar jobs did finish within 20 minutes.
I think something is wrong with my jobs or the cluster.

My question: how do I make a job restart after a specific time?
For example, if a job doesn't finish within 5 minutes, can it be resubmitted, or restarted on a different machine?

For example, I submit 100 jobs and 99 of them finish within 20 minutes, but one takes 2 hours; I want to resubmit that one job.

I used the following:
periodic_remove = (CurrentTime - EnteredCurrentStatus > 60*20)
Then I check the log and resubmit the jobs that were removed.
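
The log check could be scripted roughly like this. It is only a sketch, and it assumes the usual job event log layout, where each event starts with a three-digit code followed by (cluster.proc.subproc) and a removed job shows up as event 009 (job aborted). The function name removed_jobs is made up for illustration.

```python
import re

# Assumption: each event header in the job event log looks like
#   "009 (123.004.000) <timestamp> Job was aborted."
# where 009 is the abort event written when a job is removed.
ABORT_EVENT = re.compile(r"^009 \((\d+)\.(\d+)\.\d+\)")


def removed_jobs(log_text):
    """Return sorted (cluster, proc) pairs that have an abort event in the log."""
    jobs = set()
    for line in log_text.splitlines():
        match = ABORT_EVENT.match(line)
        if match:
            # int() drops the zero padding, so "004" becomes 4.
            jobs.add((int(match.group(1)), int(match.group(2))))
    return sorted(jobs)
```

Calling removed_jobs(open("job.log").read()) (job.log being whatever the submit file names as its log) would then give the cluster and proc numbers of the jobs to submit again.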

Any better ideas?

Thanks,
Allen