
Re: [HTCondor-users] using idle computers in computer labs for CFD jobs



On 19/10/2015 03:43, "HTCondor-users on behalf of David Herd"
<htcondor-users-bounces@xxxxxxxxxxx on behalf of d.herd@xxxxxxxxxxx> wrote:

>The jobs we run are mostly CFD and use Ansys.  As such we can't link them
>HT Condor modules and it looks like we won't be able to take checkpoints
>of our jobs.

We just encourage our users to write their own checkpointing code under
the vanilla universe. We also have templates for, e.g., C and MATLAB.
Basically, you check on startup whether a checkpoint file exists and, if
it does, resume the computation from the point its contents define; you
also update the file periodically (or on eviction). Condor handles all
the rest.
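
A minimal sketch of that pattern in Python, for illustration only (this is
not one of our templates). The file name, step counts and the "work" done
per step are all made up, and it assumes the job is sent SIGTERM when it
is evicted, which may differ on your platform or configuration.

```python
import json
import os
import signal
import sys

CHECKPOINT = "checkpoint.json"  # hypothetical file name
TOTAL_STEPS = 1000              # hypothetical amount of work
SAVE_EVERY = 100                # hypothetical checkpoint interval

evicted = False


def on_sigterm(signum, frame):
    # Only record the request; the main loop saves state at a safe point.
    global evicted
    evicted = True


def load_state(path=CHECKPOINT):
    # Resume from the checkpoint if one exists, otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0.0}


def save_state(state, path=CHECKPOINT):
    # Write to a temporary file and then rename it, so an eviction in the
    # middle of a write never leaves a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path=CHECKPOINT, total_steps=TOTAL_STEPS, save_every=SAVE_EVERY):
    state = load_state(path)
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if evicted or state["step"] % save_every == 0:
            save_state(state, path)
        if evicted:
            sys.exit(1)  # stop early; the next run resumes from the file
    save_state(state, path)
    return state


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, on_sigterm)
    print(run()["total"])
```

For the checkpoint file to follow the job to another machine, the submit
file would also need file transfer switched on with something like
when_to_transfer_output = ON_EXIT_OR_EVICT; check the manual for the
version you run.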

The very latest Condor (which we don't run) has a little more help for
vanilla checkpointing, but it doesn't save the user much code
(basically, I think you would only need to do the file write on
eviction). If nearly all your users run Ansys, you could likely work out
a checkpointing template that they could all copy.

regards
-Ian

-- 
Ian Cottam  | IT Relationship Manager | IT Services  | C38 Sackville
Street Building  |  The University of Manchester  |  M13 9PL  |