Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Dealing with lost submitters.

Date: Thu, 30 Jun 2022 06:42:25 +0000
From: Dudu Handelman <duduhandelman@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Dealing with lost submitters.

Thanks Jaime.

I just remembered that I have configured this before. I have a job transform that sets JobLeaseDuration = 1800.
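For readers following along, a transform of this kind might look roughly like the sketch below. The transform name `SetLease` is hypothetical, and the poster's actual transform was not shown in the thread, so this is only an illustration of the general shape using the schedd's `JOB_TRANSFORM_NAMES` mechanism.

```
# Sketch only: "SetLease" is a made-up name, not the poster's real transform.
# Placed in the configuration of the submit machine (schedd).
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetLease
JOB_TRANSFORM_SetLease @=end
   # Force a 30-minute lease on every job that passes through the schedd
   SET JobLeaseDuration 1800
@end
```

A transform like this rewrites the job ClassAd at submit time, so it would only affect jobs submitted after the configuration is in place.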

I wonder why it's not working; maybe it's docker universe related.

I will try to verify that in the lab.

Thanks

David.

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jaime Frey <jfrey@xxxxxxxxxxx>
Sent: Wednesday, June 29, 2022, 23:48
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Dealing with lost submitters.

One option is to set JOB_DEFAULT_LEASE_DURATION in the configuration files on the submitting machines. The default is 2400 seconds (40 minutes). This controls how long the submitter and executor will attempt to reconnect before the job execution is aborted. The downside of lowering this value is that you risk killing jobs when an interruption is only temporary, for example while upgrading HTCondor on the submit machine or rebooting it.
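As a minimal sketch of that setting, the fragment below lowers the lease to 20 minutes. The value 1200 is an arbitrary illustration, not a recommendation from this thread; pick one that is longer than your typical upgrade or reboot window.

```
# Local configuration on the submit machine.
# 1200 seconds (20 minutes) is an example value only; the default is 2400.
JOB_DEFAULT_LEASE_DURATION = 1200
```

This knob supplies the default lease for jobs that do not set one themselves, so jobs already in the queue would likely keep the lease they were submitted with.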

- Jaime

On Jun 25, 2022, at 1:15 AM, Dudu Handelman <duduhandelman@xxxxxxxxxxx> wrote:

Hi all.

Sometimes the submitting machine runs out of resources, for example disk space. The condor service is then stopped, and the jobs on the execute side wait for it.

So in this situation, resources are wasted just waiting.

Usually I handle it manually by evicting that user's jobs.

How can I deal with it automatically?

Many thanks

David
