
Re: [Condor-users] partitionable slots are not returned?



Hello all:

We are having the same problem.

In your case (very much like ours), the problem is that the =?= operator never yields UNDEFINED: it evaluates to FALSE when one operand is undefined and the other is not [1]. Once there is no longer a job in the slot, Target.Owner is undefined, so Target.Owner =?= "testuser" (or any other defined value) always evaluates to FALSE, which leaves the slot "stuck" in the Owner state forever [2].

What we have done is add a clause that evaluates to TRUE when no job is executing (for instance, Target.Owner =?= UNDEFINED). Your START expression should therefore be:

START = (Target.Owner =?= UNDEFINED) || ( Target.Owner =?= "testuser" )
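
To see why the extra clause matters, here is a toy Python model of how =?= treats an undefined Target.Owner. It is only a sketch, not Condor code: UNDEFINED, meta_equals, target_owner, start_original and start_fixed are hypothetical names made up for illustration, and the model ignores everything about START other than the Owner comparison.

```python
# Toy model of the ClassAd "=?=" (is-identical-to) operator.
# Illustration only; none of these names come from Condor itself.

class Undefined:
    """Stand-in for the ClassAd UNDEFINED value (hypothetical helper)."""
    def __repr__(self):
        return "UNDEFINED"

UNDEFINED = Undefined()

def meta_equals(left, right):
    """Model of =?=: always returns a bool, never UNDEFINED."""
    if left is UNDEFINED or right is UNDEFINED:
        # TRUE only when both sides are UNDEFINED.
        return left is right
    return type(left) is type(right) and left == right

def target_owner(job_ad):
    """Look up Target.Owner; no job ad (or no Owner attribute) gives UNDEFINED."""
    if job_ad is None:
        return UNDEFINED
    return job_ad.get("Owner", UNDEFINED)

def start_original(job_ad):
    # START = ( Target.Owner =?= "testuser" )
    return meta_equals(target_owner(job_ad), "testuser")

def start_fixed(job_ad):
    # START = (Target.Owner =?= UNDEFINED) || ( Target.Owner =?= "testuser" )
    owner = target_owner(job_ad)
    return meta_equals(owner, UNDEFINED) or meta_equals(owner, "testuser")

cases = {
    "no job": None,
    "job from testuser": {"Owner": "testuser"},
    "job from another user": {"Owner": "alice"},
}
for label, ad in cases.items():
    print(f"{label}: original={start_original(ad)} fixed={start_fixed(ad)}")
```

In this model the only case where the two expressions differ is the one with no job: the original expression is false there, while the fixed one is true. Jobs from other users are still rejected by both.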

I hope this works. It would be useful to have this behaviour documented.

Cheers,

Joan

[1] http://www.cs.wisc.edu/condor/manual/v7.4/4_1Condor_s_ClassAd.html#SECTION00512300000000000000
[2] http://www.cs.wisc.edu/condor/manual/v7.4/3_5Policy_Configuration.html#SECTION00455000000000000000

On 29/03/11 13:28, Matthew Farrellee wrote:
On 03/29/2011 07:18 AM, Carsten Aulbert wrote:
Hi all

On Monday 28 March 2011 14:51:03 Carsten Aulbert wrote:
On Monday 28 March 2011 14:10:14 Matthew Farrellee wrote:
I would guess that your START is preventing the slot from handling any
jobs or going back to the Unclaimed state. When a dynamic slot hits
Unclaimed it gets folded back into the partitionable slot.
condor_config_val -dump gpu016.atlas.local | grep '^START ='
START = ( Target.Owner =?= "testuser" )

Does this prevent it going back to unclaimed?

I fear you are right: setting START = TRUE does "recycle" previously used
dynamic slots.

I wonder if there is a way to have both a more complex START expression and recycling. For that I would need to understand *why* the expression evaluates to false *after* a job is gone. Is there a default owner or something else that would make the START expression true again?

Any pointers?

Cheers

Carsten

That is puzzling. I'd suggest wrapping your START with debug(), e.g. START = debug(Target.Owner =?= "testuser"), and have a look at what's being evaluated to cause START == False.

Best,


matt



--
--------------------------------------------------------------------------
Joan Josep Piles Contreras -  Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 976 76 10 00 (ext. 5454)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------