[Condor-users] Getting Failed Jobs to Restart



I am running a simple Python script to test my Condor configuration,
obviously in the vanilla universe. It computes the value of pi for a
while, times itself, and prints which machine it is running on. I made
it run for a while so that I would have a chance to monitor it on the
remote machines.
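
For reference, a minimal sketch of that kind of test script might look like the following. The Leibniz series, the batch size, and the names approximate_pi and main are illustrative assumptions, not the actual script.

```python
import socket
import time


def approximate_pi(duration_seconds):
    """Sum Leibniz series terms until the time budget is used up.

    Returns the estimate of pi and the number of terms summed.
    """
    deadline = time.monotonic() + duration_seconds
    total = 0.0
    sign = 1.0
    k = 0
    while True:
        # Add terms in batches so the clock is not read on every iteration.
        for _ in range(10000):
            total += sign / (2 * k + 1)
            sign = -sign
            k += 1
        if time.monotonic() >= deadline:
            break
    return 4.0 * total, k


def main(duration_seconds=60.0):
    """Run the computation, time it, and report the execute machine."""
    start = time.monotonic()
    pi_estimate, terms = approximate_pi(duration_seconds)
    elapsed = time.monotonic() - start
    print("host:", socket.gethostname())
    print("pi ~= %.10f after %d terms" % (pi_estimate, terms))
    print("elapsed: %.1f s" % elapsed)
```

Calling main(600) at the bottom of the file would keep the job busy for about ten minutes, which leaves time to watch it on the remote machine.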

The desired functionality is this: if a job fails (dies due to some
exception or failure), Condor should restart it from scratch. I have
noticed that in the standard universe, Condor can do this. Can it also
be done in the vanilla universe? What are the limitations? For the end
task it is unlikely that I will be able to relink the code, as it is
legacy material and there are few people around who know enough Pascal
to know what it's doing. So I would like to be able to support this
functionality in the vanilla universe.

Thanks in advance.
-Avi