
Re: [Condor-users] Condor on Windows - failover and checkpointing



Hi Shaun
 
When installing Condor on a Windows machine, you can specify what happens when
a job is interrupted. The installer offers two choices:

(1) the job is killed after 5 minutes of continued console activity, or
(2) the job remains in memory and resumes later, once there has been no console
     activity for 10 minutes.

Selecting the first option means that the job will be terminated after 5 minutes
and the memory it occupied will be released. The job will then have to be restarted
from the beginning.

The second option means that the job is not terminated but continues to occupy
either your computer's RAM or virtual memory. This may slow down the performance
of other applications.
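To make the trade-off concrete, here is a toy minute-by-minute model of the two
choices. It is a hypothetical sketch, not Condor's implementation: the names
`simulate` and `Outcome`, and the assumption that a suspended or killed job only
(re)starts after 10 consecutive idle minutes, are illustrative. Only the 5- and
10-minute thresholds come from the installer options described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Outcome:
    finished_at: Optional[int]  # minute in which the job completed, or None
    wasted_minutes: int         # CPU minutes thrown away because the job was killed
    suspended_minutes: int      # minutes spent suspended while still holding memory


def simulate(activity: Sequence[bool], work_needed: int, kill_on_activity: bool,
             kill_after: int = 5, resume_after_idle: int = 10) -> Outcome:
    """Toy model of a desktop's job-interruption policy (hypothetical, not Condor code).

    activity[t] is True if someone used the console during minute t.
    kill_on_activity=True models option (1); False models option (2).
    """
    progress = 0
    wasted = 0
    suspended_minutes = 0
    running = True      # assume the job starts on an idle machine
    in_memory = True    # becomes False once the job has been killed
    active_streak = 0
    idle_streak = 0

    for t, busy in enumerate(activity):
        if busy:
            active_streak += 1
            idle_streak = 0
        else:
            idle_streak += 1
            active_streak = 0

        if running and busy:
            running = False  # console activity suspends the job immediately

        if (not running and in_memory and kill_on_activity
                and active_streak >= kill_after):
            wasted += progress  # option (1): kill, free memory, lose all work so far
            progress = 0
            in_memory = False

        if not running and idle_streak >= resume_after_idle:
            running = True      # option (2) resumes; option (1) restarts from scratch
            in_memory = True

        if running:
            progress += 1
            if progress >= work_needed:
                return Outcome(t, wasted, suspended_minutes)
        elif in_memory:
            suspended_minutes += 1  # job is paused but still occupies RAM/swap

    return Outcome(None, wasted, suspended_minutes)


if __name__ == "__main__":
    # 8 idle minutes, 6 minutes of someone at the keyboard, then 40 idle minutes;
    # the job needs 20 minutes of CPU time in total.
    timeline = [False] * 8 + [True] * 6 + [False] * 40
    print(simulate(timeline, 20, kill_on_activity=True))   # option (1)
    print(simulate(timeline, 20, kill_on_activity=False))  # option (2)
```

For this timeline, option (1) finishes at minute 42 after discarding 8 minutes of
work, while option (2) finishes at minute 34 but keeps the job suspended in memory
for 15 minutes, which is the trade-off between lost work and memory pressure.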
 
Regards
Nav
 
Navonil Mustafee
Researcher
Centre for Applied Simulation Modelling (CASM)
School of Information Systems, Computing & Mathematics
Brunel University, Uxbridge, Middlesex UB8 3PH
Tel: 01895265727 (Direct Line)
Web: http://people.brunel.ac.uk/~cspgnnm
 

On 4/25/06, Shaun J. O'Callaghan <Shaun.OCallaghan@xxxxxxxxxxxx> wrote:

Dear All,

When implementing a Condor-based system on a Windows network, given that checkpointing functionality is missing, does this simply mean that when a job is interrupted it is either suspended or terminated altogether?

Also, does this mean that there's no transaction-style failover in the event of a job failure?

Any light that could be shed on these two issues would be greatly appreciated.

Kind Regards,

Shaun James O'Callaghan
Research Computing Officer

Department of Geography
University of Durham
Science Site
South Road
Durham
DH1 3LE

Tel: 0191 334 1919
Fax: 0191 334 1801


_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users