Re: [HTCondor-users] How to forbid job restarts



Hello Todd,

> The convention -- AFAIK -- for HTCondor configuration files is to be 644 (world-readable, owner-writable), but owned by root.

Thank you for the useful advice!

So, after two nights and one day of running jobs with the new version of HTCondor, my problem with jobs being periodically restarted before they could complete seems to be resolved: the restarts no longer happen. In addition to installing the new version of HTCondor, I made the following substantial changes:

 * deleted all the HTCondor log files that had the same names as the log files of new jobs
 * stopped archiving and deleting log files of very old jobs (those that completed >6 months ago) in the same directory as the logs of currently running jobs
 * set the configuration variable MAX_JOBS_PER_SUBMISSION to 300000 (the initial DAG, which broke my old HTCondor installation, contained >70000 nodes)
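
For reference, the last change amounts to a single configuration entry, roughly as sketched below. The file location in the comment is only an assumption (a typical local config drop-in); use whichever file your installation actually reads, and keep it root-owned with mode 644 as Todd suggested.

```
# Assumed location: a local config file such as /etc/condor/config.d/99-local.conf
# Raise the per-submission job limit so that a DAG with >70000 nodes fits.
MAX_JOBS_PER_SUBMISSION = 300000
```

The new value should normally be picked up after running condor_reconfig or restarting the daemons, and condor_config_val MAX_JOBS_PER_SUBMISSION can be used to check which value is in effect.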

So some subset of these changes was enough to resolve the problem in my case. Hopefully this will help somebody with a similar problem.

Best,
Siarhei.


