Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] Machines in state claimed/idle forever

Date: Sun, 02 Jun 2013 19:59:47 +0200
From: Felix Wolfheimer <f.wolfheimer@xxxxxxxxxxxxxx>
Subject: [HTCondor-users] Machines in state claimed/idle forever

I'm facing a problem with machines remaining in the claimed/idle state
forever. I suspect it has something to do with my Condor configuration
and hope that someone has an idea.

I'm using Condor in a configuration where there's only one dedicated
scheduler for the whole pool (the same machine also runs the negotiator
and collector). All users are supposed to submit their jobs to this
central scheduler. It serves both "normal" single-node jobs (vanilla
universe) and MPI jobs (parallel universe), which need to reserve
multiple nodes to run. Preemption is switched off completely:

PREEMPT = false
PREEMPTION_REQUIREMENTS = false

Now, when the scheduler is claiming resources for an MPI job, these
resources first go into the claimed/idle state until the scheduler has
accumulated enough of them to start the job. If I put the MPI job on
hold before it actually runs, the machines stay in the claimed/idle
state even though there is no longer any job to run. If I then submit a
vanilla job, it won't run on these machines either: it is rejected
because the previous claim remains active. Basically, the machines
remain in the claimed/idle state forever. I can work around the problem
by restarting condor_startd on the claimed machines, which makes them
forget about the claim. Otherwise they stay claimed/idle and thus
blocked.

Has anyone seen this behavior? Is there any recommendation about how to
configure Condor for such a setup?
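
For anyone trying to reproduce this, here is a minimal sketch of how the
affected slots could be listed. It assumes condor_status is on the PATH
and that its -af (autoformat) option prints Name, State and Activity as
three whitespace-separated fields per slot. The function names
find_claimed_idle and query_pool are made up for illustration and are
not part of Condor.

```python
import subprocess


def find_claimed_idle(status_text):
    """Return the names of slots reported as State=Claimed, Activity=Idle.

    status_text is expected to hold one slot per line in the form
    "<Name> <State> <Activity>" (assumed output format, see query_pool).
    """
    stuck = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, state, activity = fields
        if state == "Claimed" and activity == "Idle":
            stuck.append(name)
    return stuck


def query_pool():
    """Ask the collector for all slots and filter for claimed/idle ones.

    Assumption: condor_status is installed and can reach the collector.
    """
    result = subprocess.run(
        ["condor_status", "-af", "Name", "State", "Activity"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_claimed_idle(result.stdout)
```

Calling query_pool() on the central manager after putting the MPI job on
hold should return the slots that are still claimed. A slot that keeps
showing up across repeated calls is one of the blocked machines
described above.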
