
Re: [Condor-users] Condor on Windows - failover and checkpointing

Hi Shaun,
When installing Condor on a Windows machine, you can specify what happens when
a job is interrupted by console activity. The installer offers two choices:
(1) The job is killed after 5 minutes of continued console activity, or
(2) The job remains in memory and resumes at a later time, once there has
     been no console activity for 10 minutes.
With the first option, the job is terminated after 5 minutes and the memory it
occupied is released. The job then has to be restarted from the beginning.
With the second option, the job is not terminated but continues to exist in
your computer's RAM or virtual memory. This may slow down other applications.
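To make the difference between the two options concrete, here is a rough Python
sketch that steps through a minute-by-minute record of console activity. It is
only an illustration, not Condor code or Condor configuration: the names
Policy and simulate are made up for this example, and it assumes that the job
is suspended as soon as console activity is seen, and that under option (1) a
suspended job resumes under the same 10-minute idle rule if the activity stops
before the 5 minutes are up. Only the 5 and 10 minute thresholds come from the
description above.

```python
from enum import Enum

KILL_AFTER_ACTIVE_MIN = 5    # option 1: kill after 5 min of continued activity
RESUME_AFTER_IDLE_MIN = 10   # resume after 10 min with no console activity


class Policy(Enum):
    KILL = 1            # option (1) in the message
    KEEP_IN_MEMORY = 2  # option (2) in the message


def simulate(policy, console_active):
    """Return the job state for each minute.

    console_active is a list of booleans, one per minute, True when
    someone is using the console. Hypothetical helper for illustration.
    """
    state = "running"
    active_run = 0  # consecutive minutes with console activity
    idle_run = 0    # consecutive minutes without console activity
    history = []
    for active in console_active:
        if state == "killed":
            # A killed job stays dead here; rescheduling is not modelled.
            history.append(state)
            continue
        if active:
            active_run += 1
            idle_run = 0
        else:
            idle_run += 1
            active_run = 0

        # Assumption: any console activity suspends a running job.
        if state == "running" and active:
            state = "suspended"

        if state == "suspended":
            if policy is Policy.KILL and active_run >= KILL_AFTER_ACTIVE_MIN:
                state = "killed"   # memory released, work so far is lost
            elif idle_run >= RESUME_AFTER_IDLE_MIN:
                state = "running"  # job was kept in memory, so it continues
        history.append(state)
    return history


if __name__ == "__main__":
    busy_then_idle = [True] * 6 + [False] * 10
    print(simulate(Policy.KILL, busy_then_idle)[-1])
    print(simulate(Policy.KEEP_IN_MEMORY, busy_then_idle)[-1])
```

With six busy minutes followed by ten idle ones, the first policy ends with the
job killed, while the second ends with the same job running again.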
Navonil Mustafee
Centre for Applied Simulation Modelling (CASM)
School of Information Systems, Computing & Mathematics
Brunel University, Uxbridge, Middlesex UB8 3PH
Tel: 01895265727 (Direct Line)
Web: http://people.brunel.ac.uk/~cspgnnm

On 4/25/06, Shaun J. O'Callaghan <Shaun.OCallaghan@xxxxxxxxxxxx> wrote:

Dear All,

When implementing a Condor-based system on a Windows network, as checkpointing functionality is missing, does this simply mean that when a job is interrupted it is either suspended or terminated altogether?

Also, does this mean that there's no transaction-style failover in the event of a job failure?

Any light that could be shed on these two issues would be greatly appreciated.

Kind Regards,

Shaun James O'Callaghan
Research Computing Officer
Department of Geography
University of Durham
Science Site
South Road

Tel: 0191 334 1919
Fax: 0191 334 1801

Condor-users mailing list