Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

[HTCondor-users] How to minimize the reschedule interval for jobs on failed machines?

Date: Tue, 10 Sep 2013 18:30:31 +0800
From: 钱晓明 <kyleqian@xxxxxxxxx>
Subject: [HTCondor-users] How to minimize the reschedule interval for jobs on failed machines?

I find that Condor re-runs jobs in other slots when the machine they are running on fails. However, the delay before this happens seems too long: about 22 minutes in my 5-node cluster.
How can I shorten this interval? Condor evidently already knows the machine is down, because it sends no new jobs to it.
By the way, condor_q always shows the jobs as running. Is that correct?

Follow-Ups:
- Re: [HTCondor-users] How to minimize the reschedule interval for jobs on failed machines?
  - From: Andrey Kuznetsov

