
[Condor-users] Rank expressions not evaluated



Hello,

We are currently experimenting with the Rank expression.

We have nodes with GPUs and nodes without GPUs. If a job is supposed to
run on GPUs, the user should add the job ClassAd attribute
'+WantGPU = true' to the submit file. To avoid idle time, we also want
to allow standard universe jobs on the GPU nodes. However, these
standard universe jobs should be preempted as soon as a user submits a
'GPU' job.

In detail, the scheme should work like this:

A standard universe job is running on a GPU node. It was submitted
without the '+WantGPU = true' job ClassAd attribute.

Another user submits a vanilla job with the job ClassAd '+WantGPU = true'.

The currently running standard universe job should then be preempted,
and the slot should be assigned to the job with '+WantGPU = true'.

On the execution node we configured the following machine Rank expression:
'RANK = target.WantGPU =?= true'

When jobs start, the StartLog on the execution node shows that the Rank
of the jobs assigned to slots is evaluated correctly, as either 1.0 or
0.0.
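A minimal Python sketch of how the RANK expression above could yield the 1.0 / 0.0 values seen in the StartLog. It assumes the ClassAd convention that '=?=' never returns UNDEFINED (a missing WantGPU simply compares as false) and that a boolean Rank is used as 1.0 for true and 0.0 for false. The names UNDEFINED, meta_equal and machine_rank are hypothetical illustrations, not HTCondor code.

```python
# Hypothetical sentinel standing in for the ClassAd UNDEFINED value.
UNDEFINED = object()


def meta_equal(a, b):
    """Simplified model of the ClassAd '=?=' operator: compares type and
    value, and never returns UNDEFINED (UNDEFINED =?= true is just False)."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def machine_rank(job_ad):
    """Hypothetical model of 'RANK = target.WantGPU =?= true' on the execute node."""
    want_gpu = job_ad.get("WantGPU", UNDEFINED)
    # Assumption: a boolean Rank result is used as a number, True -> 1.0, False -> 0.0.
    return 1.0 if meta_equal(want_gpu, True) else 0.0


standard_job = {}                # standard universe job, no +WantGPU attribute
gpu_job = {"WantGPU": True}      # vanilla job submitted with +WantGPU = true

print(machine_rank(standard_job))  # 0.0
print(machine_rank(gpu_job))       # 1.0
```

The type check in meta_equal matters because Python treats 1 == True as equal, whereas this sketch assumes '=?=' also requires matching types.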


However, we also observed the following, which seems odd:

* condor_q -l always shows Rank=0.0.
* The job with the presumably higher Rank does not replace the currently
running job, although preemption is enabled and works in other scenarios.

It seems to us that the Rank expression is evaluated only when a job
starts. A new incoming job never gets its Rank expression evaluated as
long as it has not started and, hence, can never replace the currently
running job.
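For comparison, the behaviour we expect from the scheme above can be sketched as follows. The sketch assumes that a claimed slot may be preempted for a new job when the slot's RANK for that job is strictly greater than its RANK for the currently running job. Slot, machine_rank and should_rank_preempt are hypothetical names; this models the intended outcome only, not the negotiator's or startd's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


def machine_rank(job_ad: dict) -> float:
    """Hypothetical model of 'RANK = target.WantGPU =?= true':
    1.0 only if WantGPU is present and is the boolean True."""
    return 1.0 if job_ad.get("WantGPU") is True else 0.0


@dataclass
class Slot:
    name: str
    running_job: Optional[dict] = None  # None means the slot is idle

    def current_rank(self) -> float:
        """RANK of the slot for the job it is currently running."""
        if self.running_job is None:
            return 0.0
        return machine_rank(self.running_job)


def should_rank_preempt(slot: Slot, candidate: dict) -> bool:
    """Assumed rule: preempt only a busy slot, and only if the candidate
    job is strictly preferred by the slot's RANK."""
    if slot.running_job is None:
        return False  # an idle slot is simply matched, nothing to preempt
    return machine_rank(candidate) > slot.current_rank()


# A GPU node running a standard universe job without WantGPU.
slot = Slot("slot1@gpu-node", running_job={})
print(should_rank_preempt(slot, {"WantGPU": True}))  # True: 1.0 > 0.0
print(should_rank_preempt(slot, {}))                 # False: 0.0 is not > 0.0
```

The strict comparison means two jobs with equal rank never preempt each other, which is the case for any two jobs without '+WantGPU = true'.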

Any thoughts are welcome.


Thank you and cheers
Henning