[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [HTCondor-users] transfer_in/output_files only if they exist



Hi Todd,

Thanks. We have it working with periodic_hold now, so we'll try and switch to this and see if it works.

Cheers,
Duncan.

> On Feb 15, 2019, at 23:08, Todd L Miller <tlmiller@xxxxxxxxxxx> wrote:
> 
>> Follow-up question: is there a way to set something like
>> 
>> periodic_transfer_spool = 3600
>> 
>> so that the contents of the job's spool directory can be transferred back to the shadow's spool periodically? In combination with ON_EXIT_OR_EVICT that would give me periodic checkpointing if the job dies unexpectedly, in addition to when it is cleanly evicted.
> 
> 	We have some experimental (meaning, probably at least partially broken) features intended to support this kind of use case.  They're both designed around the observation that HTCondor has no real way of knowing when it's safe to transfer the job sandbox if the job is still running, but that if you're creating checkpoints, your job is going to know how to restart from them.
> 
> 	If you want the job to periodically checkpoint, you can request that HTCondor send a signal to it every so often; when it exits successfully, HTCondor performs file transfer (as if the job had been evicted), but instead of going back into the queue, HTCondor just restarts the job right where it was running.
> 
> 	If the job generates checkpoints on its own, you can also configure HTCondor to recognize, for example, that when the job exits with code 88, that means to perform file transfer (as if the job had been evicted), and then restart the job right where it had been running.
> 
> 	See the following page on our Wiki for details, and do please let me know if either feature works for you.  Thanks.
> 
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalSupportForPeriodicCheckpointingInVanillaUniverse
> 
> - ToddM
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

-- 

Duncan Brown                              Room 263-1, Physics Department
Charles Brightman Professor of Physics     Syracuse University, NY 13244
http://dabrown.expressions.syr.edu                   Phone: 315 443 5993