Re: [Condor-users] Notification of Cluster Complete - notprocesscomplete



On Thu, 29 Jul 2004 Robert.Nordlund@xxxxxxxxxxxxxxxx wrote:

> I was under the assumption that if I submitted jobs and my submit machine
> died, I lost all connection to the running jobs and jobs yet to be
> scheduled.  Is this the case or am I completely misguided?

If your submit machine crashes, any running jobs are killed on the execute
machines. When the submit machine restarts, the jobs will be marked as
idle and Condor will attempt to acquire new machines on which to restart
them. Since scheduler universe jobs just run on the submit machine,
they're restarted immediately. Therefore, all of your jobs need to be able
to cope with being killed in mid-execution (or just after completion) and
then being restarted.
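
As a rough illustration of what "able to deal with being restarted" can
mean in practice, here is a minimal Python sketch. It is not anything
Condor provides; run_once, work and output_path are hypothetical names
made up for this example. It assumes the job's only side effect is one
output file on a filesystem where a rename is atomic.

```python
import os
import tempfile


def run_once(work, output_path):
    """Run work() and store its string result at output_path.

    Returns True if the work was done now, False if a finished result
    from an earlier run was already present.
    """
    # Killed just after completion: the output already exists, so skip.
    if os.path.exists(output_path):
        return False

    result = work()

    # Killed mid-execution: write to a temporary file in the same
    # directory and rename it into place, so a partial write never
    # shows up under output_path.
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(result)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, output_path)
    return True
```

Calling run_once a second time with the same output_path does nothing,
which is the property you want if the job is restarted after it had
already finished. A kill during the write can leave a stray temporary
file behind, but never a truncated output file.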

+----------------------------------+---------------------------------+
|            Jaime Frey            | I stayed up all night playing   |
|        jfrey@xxxxxxxxxxx         | poker with tarot cards. I got a |
|  http://www.cs.wisc.edu/~jfrey/  | full house and four people died.|
+----------------------------------+---------------------------------+