
Re: [Condor-users] Getting Failed Jobs to Restart



Yes, but what if the job fails due to some internal exception - i.e. an
uncaught exception in C++? Can Condor do anything there? I noticed
that in the standard universe, if I sent a running process SIGKILL,
Condor would register it as idle and then restart it in the next
negotiation cycle.

Can this be done in the vanilla universe?

Thanks for the response, Jaime.
-Avi

On 8/26/05, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> On Aug 25, 2005, at 6:11 PM, Avi Flamholz wrote:
> 
> > I am running a simple Python script to test my Condor configuration -
> > obviously in the vanilla universe. It simply computes the value of pi
> > for a while, times itself, and prints what machine it's on. I made it
> > run for a while so that I would have a chance to monitor it on the
> > remote machines.
> >
> > The desired functionality is this - If a job fails (dies due to some
> > exception or failure) Condor should restart it from scratch. I have
> > noticed that in the standard universe, Condor can do this. Can it also
> > be done in the vanilla universe? What are the limitations? For the end
> > task it is unlikely that I will be able to relink the code, as it is
> > legacy material and there are few people around who know enough Pascal
> > to know what it's doing. So I would like to be able to support this
> > functionality in the vanilla universe.
> 
> Condor automatically restarts all jobs that don't complete due to a
> Condor failure or being kicked off a machine. The standard universe
> does one better by restarting the jobs where they left off, instead
> of at the beginning.
> 
> You can also tell Condor to restart jobs that complete on their own
> with the on_exit_remove expression.
> 
> +----------------------------------+---------------------------------+
> |            Jaime Frey            |  Public Split on Whether        |
> |        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
> |  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
> +----------------------------------+---------------------------------+
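
For reference, the on_exit_remove suggestion above could be sketched in a
vanilla-universe submit description file roughly as follows. This is only a
sketch: the executable and file names are hypothetical placeholders, and the
expression assumes that a "successful" run means exit code 0 with no signal.

```
# Hypothetical submit description file; all file names are placeholders.
universe       = vanilla
executable     = compute_pi.py
output         = compute_pi.out
error          = compute_pi.err
log            = compute_pi.log

# Let the job leave the queue only on a clean exit (not killed by a
# signal, and exit code 0). Otherwise it stays in the queue, goes back
# to idle, and is rerun from the start.
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)

queue
```

With this expression, a process that dies from a signal (for example, a C++
program that aborts on an uncaught exception) or that exits with a non-zero
code should be put back in the queue. A job that fails every time would be
restarted indefinitely, so some limit on retries is probably worth adding.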