
Re: [Condor-users] partitionable slots are not returned?



Hi Matt

On Monday 28 March 2011 14:10:14 Matthew Farrellee wrote:
> I would guess that your START is preventing the slot from handling any
> jobs or going back to the Unclaimed state. When a dynamic slot hits
> Unclaimed it gets folded back into the partitionable slot.
> 

Let's see. Right now, after a lot of testing, it looks like this:
condor_status -direct gpu016

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000  7141  0+01:06:49
slot1_1@xxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   256  0+01:09:11
slot1_2@xxxxxxxxxx LINUX      X86_64 Claimed   Idle     0.000   256  0+01:08:53
slot1_3@xxxxxxxxxx LINUX      X86_64 Owner     Idle     0.000   256  0+00:06:46
               Total Owner Claimed Unclaimed Matched Preempting Backfill

X86_64/LINUX     4     3       1         0       0          0        0


StarterLogs look like this:
[...]
03/28 13:36:37 Create_Process succeeded, pid=6240
03/28 13:36:56 Got SIGQUIT.  Performing fast shutdown.
03/28 13:36:56 ShutdownFast all jobs.
03/28 13:36:56 Process exited, pid=6240, signal=9
03/28 13:36:56 Last process exited, now Starter is exiting
03/28 13:36:56 **** condor_starter.orig (condor_STARTER) pid 6185 EXITING WITH STATUS 0
(which was about 70 minutes ago).

StartLog:
[...]
03/28 13:36:56 slot1_1: Called deactivate_claim_forcibly()
03/28 13:36:56 Starter pid 6185 exited with status 0
03/28 13:36:56 slot1_1: State change: starter exited
03/28 13:36:56 slot1_1: Changing activity: Busy -> Idle
03/28 13:36:56 slot1_1: State change: received RELEASE_CLAIM command
03/28 13:36:56 slot1_1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
03/28 13:36:56 slot1_1: State change: No preempting claim, returning to owner
03/28 13:36:56 slot1_1: Changing state and activity: Preempting/Vacating -> Owner/Idle
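
To make the sequence in the StartLog excerpt easier to follow, here is a small Python sketch that pulls out only the state/activity transitions. The regular expression is written against the line format shown in this excerpt and assumes each log entry is on a single line (the wrapped lines are joined in the sample); the names TRANSITION and transitions are made up for illustration and are not part of Condor.

```python
import re

# Matches entries such as
# "03/28 13:36:56 slot1_1: Changing activity: Busy -> Idle"
# The format is inferred from the excerpt above, not from a log specification.
TRANSITION = re.compile(
    r"^\S+ \S+ (?P<slot>\S+): "
    r"Changing (?:state and )?activity: (?P<old>\S+) -> (?P<new>\S+)$"
)


def transitions(log_text):
    """Return (slot, old, new) tuples for every transition line found."""
    found = []
    for line in log_text.splitlines():
        match = TRANSITION.match(line.strip())
        if match:
            found.append((match["slot"], match["old"], match["new"]))
    return found


# Sample copied from the StartLog excerpt, with wrapped lines joined.
sample = """\
03/28 13:36:56 slot1_1: Changing activity: Busy -> Idle
03/28 13:36:56 slot1_1: State change: received RELEASE_CLAIM command
03/28 13:36:56 slot1_1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
03/28 13:36:56 slot1_1: State change: No preempting claim, returning to owner
03/28 13:36:56 slot1_1: Changing state and activity: Preempting/Vacating -> Owner/Idle
"""

steps = transitions(sample)
for slot, old, new in steps:
    print(f"{slot}: {old} -> {new}")
```

The last transition is the interesting one: slot1_1 ends in Owner/Idle rather than Unclaimed.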

condor_config_val -dump gpu016.atlas.local | grep '^START ='
START = ( Target.Owner =?= "testuser" )

Does this prevent it from going back to Unclaimed?
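
One way to reason about the question: when no job ad is being considered, Target.Owner cannot be resolved and is undefined. The sketch below models, in Python rather than ClassAd syntax, how the `=?=` operator differs from `==` on an undefined operand. It assumes the documented ClassAd semantics (`=?=` always yields TRUE or FALSE, `==` yields UNDEFINED when either side is undefined) and assumes the default policy IS_OWNER = (START =?= FALSE) is in effect on this machine, which has not been checked here. The names UNDEFINED, meta_equals, equals and is_owner are hypothetical; this is not how the startd is implemented.

```python
# Sentinel standing in for the ClassAd UNDEFINED value (hypothetical name).
UNDEFINED = object()


def meta_equals(a, b):
    """Model of ClassAd `=?=`: always True or False, never UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def equals(a, b):
    """Model of ClassAd `==`: UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()  # `==` ignores case for strings
    return a == b


def is_owner(start_value):
    """Assumed default policy: IS_OWNER = (START =?= FALSE)."""
    return meta_equals(start_value, False)


# No job ad is present, so Target.Owner is undefined.
target_owner = UNDEFINED

start_meta = meta_equals(target_owner, "testuser")  # START as configured
start_plain = equals(target_owner, "testuser")      # same test using `==`

print("START with =?=:", start_meta, "-> IS_OWNER:", is_owner(start_meta))
print("START with == :",
      "UNDEFINED" if start_plain is UNDEFINED else start_plain,
      "-> IS_OWNER:", is_owner(start_plain))
```

If this model matches what the startd does, the `=?=` form evaluates to FALSE whenever no job is present, which would keep the slot in Owner, while the `==` form would leave START undefined. Whether that is really what happens on gpu016 is still a guess.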

> You can identify the slot types with PartitionableSlot, DynamicSlot
> attributes.

Looks fine to me:

condor_status -direct gpu016 -l|grep -E '(PartitionableSlot|DynamicSlot|Name)'
Name = "slot1@xxxxxxxxxxxxxxxxxx"
PartitionableSlot = TRUE
Name = "slot1_1@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
Name = "slot1_2@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
Name = "slot1_3@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
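
The same check can be done with a small Python sketch that groups the filtered output by slot. It assumes the input looks like the grep output above, with each Name line appearing before the slot-type attribute of the same ad; the function name classify_slots and the fallback label "static" are made up for illustration.

```python
def classify_slots(filtered_output):
    """Map each slot Name to 'partitionable', 'dynamic', or 'static'.

    Assumes every ad's Name line comes before its PartitionableSlot or
    DynamicSlot line, as in the filtered output shown above.
    """
    slots = {}
    current = None
    for line in filtered_output.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # not an "Attribute = value" line
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = value.strip('"')
            slots[current] = "static"  # fallback until a type attribute is seen
        elif current is not None and value.upper() == "TRUE":
            if key == "PartitionableSlot":
                slots[current] = "partitionable"
            elif key == "DynamicSlot":
                slots[current] = "dynamic"
    return slots


# Sample copied from the output above (host names masked as in the original).
sample = """\
Name = "slot1@xxxxxxxxxxxxxxxxxx"
PartitionableSlot = TRUE
Name = "slot1_1@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
Name = "slot1_2@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
Name = "slot1_3@xxxxxxxxxxxxxxxxxx"
DynamicSlot = TRUE
"""

result = classify_slots(sample)
for name, kind in result.items():
    print(f"{name}: {kind}")
```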

Still puzzled

Cheers

Carsten