Re: [HTCondor-users] Job Suspended - Stuck
- Date: Thu, 16 Jan 2014 13:29:30 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Job Suspended - Stuck
On 1/16/2014 12:02 PM, Andrey Kuznetsov wrote:
Here's the log file from a job that appears to be suspended, and I cannot
Short of removing the job and resubmitting it, is there another way to
force it to restart or continue?
The story here is your job landed on a machine that is configured to
suspend jobs running on that machine when some condition becomes true
(e.g. activity on the keyboard or increased non-condor load average) and
then unsuspend or restart the job after X amount of time. This sort of
policy is common when running jobs on non-dedicated desktop machines.
As a user submitting jobs, if you never want your jobs to suspend,
your only recourse is to add a requirement to your submit file to
avoid machines with such a policy (if there are any such machines in
If you are also the administrator of the machines in your pool, you
SUSPEND = FALSE
into your condor_config file...
001 (1321.003.000) 01/15 15:57:23 Job executing on host: <128.114.*.*:9944>
006 (1321.003.000) 01/15 15:57:32 Image size of job updated: 24704
2 - MemoryUsage of job (MB)
1572 - ResidentSetSize of job (KB)
010 (1321.003.000) 01/15 15:58:56 Job was suspended.
Number of processes actually suspended: 2
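
To check whether a job in a log like the one above is still suspended, one option is to scan the job event log for a "010 Job was suspended" event that has no later "011 Job was unsuspended" event for the same job. Below is a minimal Python sketch of that idea. The function name `still_suspended` and the regular expression are illustrative assumptions, not part of HTCondor; the sketch also assumes event headers look like the ones in the excerpt above and treats events 004 (evicted), 005 (terminated) and 009 (aborted) as ending the suspension.

```python
import re

# Matches the header line of an event, e.g.
# "010 (1321.003.000) 01/15 15:58:56 Job was suspended."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\S+) (\S+) (.*)$")

SUSPENDED = "010"
UNSUSPENDED = "011"
# Assumption: these events mean the job is no longer sitting suspended
# on the machine (004 evicted, 005 terminated, 009 aborted).
ENDED = {"004", "005", "009"}


def still_suspended(log_text):
    """Return {"cluster.proc": "date time"} for jobs whose latest
    suspend event has not been followed by an unsuspend or an end event."""
    suspended = {}
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # continuation line of an event, or a separator
        code, cluster, proc, _subproc, date, time, _msg = m.groups()
        job_id = f"{int(cluster)}.{int(proc)}"
        if code == SUSPENDED:
            suspended[job_id] = f"{date} {time}"
        elif code == UNSUSPENDED or code in ENDED:
            suspended.pop(job_id, None)
    return suspended
```

Run against the excerpt above, this would report job 1321.3 as suspended since 01/15 15:58:56, since no 011 event follows the 010 event. It only reads the log; it does not change the job's state.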