Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Parallel Scheduling - Handling of claims when jobs are on hold or are removed before starting

Date: Tue, 15 Aug 2017 09:03:24 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Parallel Scheduling - Handling of claims when jobs are on hold or are removed before starting

On 08/13/2017 05:06 AM, Felix Wolfheimer wrote:

Just noticed recently the following behavior when using the parallel universe. Whenever a job is submitted using the parallel universe and this job starts claiming resources but has not started up, e.g., the job requests 5 machines/slots but only 4 are free and get claimed and the parallel job waits until a fifth slot gets available. If the job is removed from the queue or set on hold (condor_rm, condor_hold) the claims on the four machines/slots remain indefinitely (in my cases I waited several hours and the claims were still there blocking resources for the non-existent job). The only way to get rid of them was to send a condor_reconfig command to the affected startds.

Thank you for your very descriptive bug report. We've now fixed this in 8.6, but not in time to make the upcoming release. As you point out, the only workaround is to reconfig, or to run very short parallel jobs to consume the slots (perhaps even a one core job).


-greg
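
For readers who need the reconfig workaround before the fix ships, here is a minimal Python sketch of how it might be scripted. It is not part of the original exchange. It assumes that a leftover claim shows up in condor_status as a slot in state Claimed with activity Idle; that constraint is a guess and will also match slots legitimately held for a parallel job that is still waiting in the queue, so review the dry-run output before sending anything. The function names are hypothetical; the only HTCondor tools used are condor_status (-constraint, -af) and condor_reconfig (-name, -daemon).

```python
import subprocess

# Assumption: a stuck claim appears as a Claimed/Idle slot. This also matches
# slots held for a parallel job that is still legitimately waiting to start.
STUCK_CONSTRAINT = 'State == "Claimed" && Activity == "Idle"'


def status_command(constraint=STUCK_CONSTRAINT):
    """Build the condor_status call that prints one Machine name per matching slot."""
    return ["condor_status", "-constraint", constraint, "-af", "Machine"]


def unique_machines(status_output):
    """Collapse per-slot output lines into a sorted list of distinct machine names."""
    names = {line.strip() for line in status_output.splitlines() if line.strip()}
    return sorted(names)


def reconfig_command(machine):
    """Build the condor_reconfig call aimed at the startd on one machine."""
    return ["condor_reconfig", "-name", machine, "-daemon", "startd"]


def reconfig_stuck_startds(dry_run=True):
    """Find candidate machines and reconfig their startds (print only when dry_run)."""
    result = subprocess.run(
        status_command(), capture_output=True, text=True, check=True
    )
    commands = [reconfig_command(m) for m in unique_machines(result.stdout)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling reconfig_stuck_startds() with the default dry_run=True only prints the condor_reconfig commands it would run; pass dry_run=False once the machine list looks right.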

References:
- [HTCondor-users] Parallel Scheduling - Handling of claims when jobs are on hold or are removed before starting
  - From: Felix Wolfheimer
