
Re: [HTCondor-users] Implementing checkpointing via job wrappers

Hi Max,

This may be a relevant link for you:


Even if you don't use the code directly, there's a good chance it will give you examples of everything you want to do.
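
As a rough sketch of the "restore from a checkpoint file if one is present on startup" idea from your message, the wrapper layer could do something like the following. Everything named here (CKPT_FILE, start_mode, run_job, and the payload's --resume flag) is hypothetical and only for illustration; how a checkpoint is actually written and restored depends on the checkpointing tool you choose. On the HTCondor side, I'm assuming the job is submitted with should_transfer_files = YES and when_to_transfer_output = ON_EXIT_OR_EVICT, so that files left in the scratch directory at eviction are kept and sent back when the job restarts.

```shell
#!/bin/sh
# Sketch of a checkpoint-aware wrapper layer. All names here (CKPT_FILE,
# start_mode, run_job, --resume) are hypothetical and not part of HTCondor.

CKPT_FILE="${CKPT_FILE:-job.ckpt}"

# Print "restart" if a non-empty checkpoint file exists, else "fresh".
start_mode() {
    if [ -s "$1" ]; then
        echo "restart"
    else
        echo "fresh"
    fi
}

# Run the payload, handing it the checkpoint file only when resuming.
# Usage: run_job <payload> [args...]
run_job() {
    if [ "$(start_mode "$CKPT_FILE")" = "restart" ]; then
        "$@" --resume "$CKPT_FILE"
    else
        "$@"
    fi
}
```

The wrapper would then call something like "run_job ./payload arg1 arg2" in place of invoking the payload directly. Keeping the decision in start_mode means the same wrapper works for the first start and for every restart after an eviction.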



On Aug 12, 2013, at 9:38 AM, Max Fischer <max.fischer@xxxxxxx> wrote:

> Hello Condor Users,
> we're currently looking into expanding our HTCondor setup to include desktop resources (it was previously just glideins and dedicated worker nodes), so I'm investigating whether and how best to supply checkpointing capabilities. The problem is that our users' workflows depend heavily on shell scripts for flow control and organisational tasks. Is there a suggested procedure for handling such jobs under preemption?
> Practically all jobs are run by our own job submission tool, so we can modify its wrapper layer (implemented as a shell script). I was thinking about issuing standalone checkpoints [1] and restoring from checkpoint files if any are present on startup. How must the HTCondor job be set up to fetch these manual checkpoints on eviction and transfer them back on restart?
> Are there any guides, hints or tutorials for using external checkpointing such as BLCR [2]?
> Cheers,
>  Max
> [1]
> http://research.cs.wisc.edu/htcondor/manual/v7.8/4_2HTCondor_s_Checkpoint.html#sec:standalone-ckpt
> [2]
> https://ftg.lbl.gov/projects/CheckpointRestart/
> -- 
> Dipl.-Phys. Max Fischer
> Karlsruhe Institute of Technology (KIT)
> Steinbuch Centre for Computing (SCC)
> Institute of Experimental Nuclear Physics (IEKP)
> email:  max.fischer@xxxxxxx
> phone:  +49 721 608 28328 (SCC)
>        +49 721 608 43369 (IEKP)