Re: [HTCondor-users] Implementing checkpointing via job wrappers
- Date: Mon, 12 Aug 2013 09:45:55 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Implementing checkpointing via job wrappers
This may be a relevant link for you:
Even if you don't use the code directly, there's a good chance it will serve as an example for all the things you want to do.
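As a rough illustration of the wrapper pattern asked about in the quoted message (restore from a checkpoint file if one is present on startup, write one when the job is asked to stop), here is a minimal sketch. It is not taken from the linked code. The file name `checkpoint.json`, the helper names, and the toy "work" loop are all hypothetical. It assumes the job is notified of eviction via SIGTERM and that the submit file sets `should_transfer_files = YES` and `when_to_transfer_output = ON_EXIT_OR_EVICT`, so that files left in the job's scratch directory come back on eviction and are placed there again on restart.

```python
import json
import os
import signal
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical name, relative to the job sandbox

_evicted = False  # set by the signal handler, polled by the work loop


def _on_sigterm(signum, frame):
    """Only record the request; the loop saves state at a safe point."""
    global _evicted
    _evicted = True


def install_handler():
    # Assumption: the job receives SIGTERM when it is being evicted.
    signal.signal(signal.SIGTERM, _on_sigterm)


def load_state(path):
    """Resume from an existing checkpoint, otherwise start fresh."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"step": 0, "total": 0}


def save_state(path, state):
    """Write to a temp file, then rename, so a kill never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def run(path, n_steps, should_stop=None):
    """Do toy work; return True when finished, False if stopped early."""
    if should_stop is None:
        should_stop = lambda state: _evicted
    state = load_state(path)
    while state["step"] < n_steps:
        if should_stop(state):
            save_state(path, state)  # checkpoint before giving up the slot
            return False
        state["total"] += state["step"]  # placeholder for one unit of real work
        state["step"] += 1
    save_state(path, state)
    return True
```

In a real wrapper one would call `install_handler()` once and then `run(CHECKPOINT, n)` from the job's working directory. The `should_stop` hook exists only so that the stop condition can be simulated without sending a signal.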
On Aug 12, 2013, at 9:38 AM, Max Fischer <max.fischer@xxxxxxx> wrote:
> Hello Condor Users,
> we're currently looking into expanding our HTCondor setup to include desktop resources (previously it was just glideins and dedicated worker nodes), so I'm investigating if and how best to supply checkpointing capabilities. The problem is that our users' workflows depend heavily on shell scripts for flow control and organisational tasks. Is there a suggested procedure for handling such jobs under preemption?
> Practically all jobs are run by our own job submission tool, so we can modify its wrapper layer (implemented as a shell script). I was thinking about issuing standalone checkpoints and restoring from checkpoint files if any are present on startup. How must the HTCondor job be set up to fetch these manual checkpoints on eviction and transfer them on restart?
> Are there any guides, hints or tutorials for using external checkpointing such as BLCR?
> Dipl.-Phys. Max Fischer
> Karlsruhe Institute of Technology (KIT)
> Steinbuch Centre for Computing (SCC)
> Institute of Experimental Nuclear Physics (IEKP)
> email: max.fischer@xxxxxxx
> phone: +49 721 608 28328 (SCC)
> +49 721 608 43369 (IEKP)