Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Problem with START configuration for allocating whole machine - VMs stuck in owner state after job is removed

Date: Fri, 20 Feb 2009 14:57:20 -0800
From: David Brodbeck <brodbd@xxxxxxxxxxxxxxxx>
Subject: [Condor-users] Problem with START configuration for allocating whole machine - VMs stuck in owner state after job is removed

I'm trying to do something along the lines of what's described here, to provide a single-slot option for large jobs:

http://nmi.cs.wisc.edu/node/1482

I'm implementing it on Condor 6.8.8, so I've changed the syntax to use VirtualMachineID instead of SlotID and vm1 instead of Slot1.

However, I'm running into trouble with machines not recovering when the large jobs finish.

It works in that jobs marked with "+RequiresWholeMachine = True" will only run in vm1, and the other VMs on each machine are marked as being in "Owner" status once the job starts. However, when I remove the RequiresWholeMachine job from the queue, the other VMs get stuck in the "Owner" state and never return to being unclaimed. Here's my START condition:


START = ( ( $(CPUIdle) || \
            (State != "Unclaimed" && State != "Owner") ) \
          && (VirtualMachineID == 1 || vm1_RequiresWholeMachine =!= True) \
          && (TARGET.RequiresWholeMachine =!= True || VirtualMachineID == 1) )
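
For context, a job opts into whole-machine mode through a custom attribute in its submit description. A minimal sketch is below; the executable name is a hypothetical placeholder, and only the "+RequiresWholeMachine = True" line is taken from the setup described above:

```
# Hypothetical submit description for a whole-machine job.
# "big_job" is a placeholder executable name, not a real program.
universe   = vanilla
executable = big_job

# Custom job attribute that the START expression tests via
# TARGET.RequiresWholeMachine.
+RequiresWholeMachine = True

queue
```

Jobs submitted without that attribute leave RequiresWholeMachine undefined, which is why the START expression compares with =!= rather than !=.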

Can anyone spot what I'm doing wrong that's preventing VMs from returning to the "Unclaimed" state once the RequiresWholeMachine job is removed? They seem to stay that way until I run 'condor_reconfig' to force a reload.


--

David Brodbeck
System Administrator, Linguistics
University of Washington

Follow-Ups:
- Re: [Condor-users] Problem with START configuration for allocating whole machine - VMs stuck in owner state after job is removed
  - From: Greg Thain
