
[HTCondor-users] All new MPI jobs become Idle when one existing MPI job's Requirements are not met



Hello,

I am facing the following problem: when one MPI job is Idle because its Requirements are not met, all subsequently submitted MPI jobs also remain Idle, even though these jobs' Requirements can be met. Running 'condor_q -better-analyze 1871.0' on such an Idle job displays:

...
The Requirements expression for job 1871.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           8  TARGET.SupportedMPIQueue is "ipno"
[9]          23  TARGET.HasFileTransfer

1871.000:  Job has not yet been considered by the matchmaker.

1871.000: Run analysis summary ignoring user priority. Of 23 machines,
     14 are rejected by your job's requirements
      0 reject your job because of their own requirements
      1 are exhausted partitionable slots
      0 match and are already running your jobs
      0 match but are serving other users
      8 are available to run your job


What I am trying to do with a test pool is the following:

 1) I have 1 Sched, 1 CM and 5 worker nodes (WN)
 2) I have a variable SupportedMPIQueue added to STARTD_ATTRS on 4 of the WNs (a sketch of the corresponding config follows this list):
 * on 2 WNs (4 + 4 cores) I have SupportedMPIQueue = "ipno":
     # condor_config_val -dump | grep SupportedMPIQueue
     STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
     SupportedMPIQueue = "ipno"
 * on 2 other WNs (4 + 8 cores) I have SupportedMPIQueue = "ipnofast":
     # condor_config_val -dump | grep SupportedMPIQueue
     STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
     SupportedMPIQueue = "ipnofast"
 * on the last WN (2 cores), SupportedMPIQueue is not defined
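
The relevant local configuration on the "ipno" WNs is essentially the sketch below (the dedicated scheduler host name here is a placeholder, not our real host):

# local config sketch for the 2 "ipno" WNs; the host name is a placeholder
DedicatedScheduler = "DedicatedScheduler@sched.example.org"
STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
SupportedMPIQueue = "ipno"

The "ipnofast" WNs are configured identically except for SupportedMPIQueue = "ipnofast".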

The aim of this configuration is to be able to select a group of WNs for an MPI job. In our existing Torque/MAUI cluster, I have 3 queues: ipno, ipnofast and ipnofast2. Each queue points to a group of WNs having the same CPU speed/generation. I would like to reproduce the same behavior by allowing the selection of a group of WNs in the .sub file with, for example:

Requirements = (TARGET.SupportedMPIQueue =?= "ipnofast" )

If I comment out the 'Requirements' line in the .sub file, I can use all the available MPI slots, and MPI jobs stay in the Idle state only when there are no longer enough slots available.

Now, suppose that I use 'Requirements = (TARGET.SupportedMPIQueue =?= "ipnofast" )' and "machine_count = 4".
If I submit 5 jobs, 3 jobs will run and 2 jobs will remain Idle. This is normal because we have 12 cores in total on the "ipnofast" WNs. With these 2 jobs Idle, if I then submit one job with 'Requirements = (TARGET.SupportedMPIQueue =?= "ipno" )', that job remains Idle until there is no longer any Idle job waiting for the "ipnofast" WNs. The job requiring the "ipno" WNs should have run without waiting, because 8 cores were free on those machines.
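
For clarity, the "ipnofast" jobs in this test are submitted with essentially the file below; only the Requirements line differs from the "ipno" test job quoted at the end of this mail:

universe = parallel
executable = /bin/sleep
arguments = 300
machine_count = 4
Requirements = (TARGET.SupportedMPIQueue =?= "ipnofast" )
queue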

My conclusion is that once there is an Idle MPI job, all the other submitted MPI jobs will also remain Idle, even though the newly submitted jobs' requirements can be met. These new Idle jobs are reported by condor_q -better-analyze as "not yet been considered by the matchmaker".

Is this the default behavior? Is it possible to do something about it? Any advice?

A simple test job is:

# parallel (MPI) universe test job: 4 slots, each just running sleep for 300 s
universe = parallel
executable = /bin/sleep
arguments = 300
machine_count = 4
# restrict the job to the "ipno" group of WNs
Requirements = (TARGET.SupportedMPIQueue =?= "ipno" )
queue
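
I submit and then analyze it like this (the .sub file name is just an example):

$ condor_submit mpi_ipno_test.sub
$ condor_q -better-analyze 1871.0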


Thanks,

Christophe.
-- 
Christophe DIARRA

Institut de Physique Nucleaire
Service Informatique
15 Rue Georges Clemenceau
F91406 ORSAY Cedex
Tel:    +33 (0)1 69 15 65 60
Mobile: +33 (0)6 31 26 23 69
Fax:    +33 (0)1 69 15 64 70
E-mail: diarra@xxxxxxxxxxxxx