
[Condor-users] long running job advice

We currently run jobs that take longer than 6 days. At times a job dies (shadow exception), mostly, I suspect, because I am using stream_output.

I understand that job restarts can occur for whatever reason, but is there a way to deal with them without losing too much work? I was wondering how good the checkpointing in Condor is (most of my jobs are Perl jobs), or do people recommend setting up a VM instance?

--- Get your facts first, then you can distort them as you please.--