
[HTCondor-users] Effective Priority lower but job stays idle



Hello,

 

I have these (among others):

 

[1]

$ condor_userprio -priority

Last Priority Update:  7/21 12:39

                                               Effective     Real   Priority

User Name                                       Priority   Priority  Factor

--------------------------------------------- ------------ -------- ---------

group_atlas.mcore.atlpilot001@xxxxxxxxxxxxxxx   1065005.62    10.65 100000.00

group_cms.mcore.cmspilot003@xxxxxxxxxxxxxxx     3259168.25    32.59 100000.00

group_alice.sgmalice@xxxxxxxxxxxxxxx          657536192.00  6575.36 100000.00

--------------------------------------------- ------------ -------- ---------
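As a sanity check on [1], here is a minimal Python sketch that recomputes the effective priority as real priority times priority factor, which is how HTCondor documents the relation. The values are copied from the table above; the variable names and the tolerance of 500 (the displayed real priority is rounded to two decimals, and the factor is 100000) are my own assumptions.

```python
# Rows copied from the `condor_userprio -priority` output in [1]:
# (user, effective priority, real priority, priority factor)
rows = [
    ("group_atlas.mcore.atlpilot001", 1065005.62, 10.65, 100000.00),
    ("group_cms.mcore.cmspilot003", 3259168.25, 32.59, 100000.00),
    ("group_alice.sgmalice", 657536192.00, 6575.36, 100000.00),
]

# The real priority is printed with two decimals, so after multiplying by a
# factor of 100000 the recomputed value can be off by up to about 500.
TOLERANCE = 500.0

def recomputed_effective(real_priority, factor):
    """Effective priority = real priority * priority factor."""
    return real_priority * factor

for user, effective, real, factor in rows:
    estimate = recomputed_effective(real, factor)
    print(f"{user:32s} reported={effective:14.2f} recomputed={estimate:14.2f}")

# A lower effective priority value means the user is served first.
order = [user for user, effective, _, _ in sorted(rows, key=lambda r: r[1])]
print("negotiation order by effective priority:", order)
```

With these numbers the order is mcore atlas, then mcore cms, then alice.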

 

[2]

$ condor_userprio -usage

Group                                   Res   Total Usage       Usage             Last

  User Name                            In Use (wghted-hrs)    Start Time       Usage Time

-------------------------------------- ------ ------------ ---------------- ----------------

group_cms                                  40   5201044.50  4/17/2020 10:55  7/21/2021 12:46

  mcore.cmspilot003@xxxxxxxxxxxxxxx        40   4117231.75  9/17/2020 14:12  7/21/2021 12:46

group_atlas                              1685  20845206.00  4/17/2020 10:55  7/21/2021 12:46

  sgmatlas@xxxxxxxxxxxxxxx                  1       712.06  4/17/2020 10:55  7/21/2021 12:39

  mcore.atlpilot001@xxxxxxxxxxxxxxx         8    406800.62  9/17/2020 19:30  7/21/2021 12:46

  atlpilot001@xxxxxxxxxxxxxxx             260   2988551.00  6/30/2020 09:56  7/21/2021 12:46

  mcore.prdatl008@xxxxxxxxxxxxxxx         701   3553249.75  9/17/2020 14:50  7/21/2021 12:46

  prdatl008@xxxxxxxxxxxxxxx               716  11126445.00  6/30/2020 09:35  7/21/2021 12:46

group_alice                              6762  45051064.00  4/17/2020 10:55  7/21/2021 12:46

Number of users: 11                      8492  67240840.00                   7/20/2021 12:46

 

[3]

$ condor_userprio -quotas

Group                                  Effective  Config     Use    Subtree  Requested

Name                                     Quota     Quota   Surplus   Quota   Resources

-------------------------------------- --------- --------- ------- --------- ----------

group_alice                              1552.07      0.18 Regroup   1552.07       6796

group_atlas                              3657.77      0.42 Regroup   3657.77       3266

group_cms                                2220.79      0.32 Regroup   2741.38       1640

 

from [2] I get:

alice=>  45051064/67240840=0.67

cms=>   5201044/67240840=0.08

atlas=>20845206/67240840=0.31
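The same arithmetic as a small Python sketch, comparing each group's share of the accumulated weighted usage from [2] with its configured quota from [3]. The dictionary and function names are hypothetical. The shares are taken relative to the total line reported in [2], exactly as in the hand calculation above.

```python
# Accumulated weighted usage (hours) per group, copied from [2].
usage = {
    "group_alice": 45051064.00,
    "group_atlas": 20845206.00,
    "group_cms": 5201044.50,
}
# Total from the "Number of users: 11" line in [2].
total_usage = 67240840.00

# "Config Quota" column from [3] (fractions of the pool).
config_quota = {
    "group_alice": 0.18,
    "group_atlas": 0.42,
    "group_cms": 0.32,
}

def usage_share(group):
    """Fraction of the reported total usage attributed to one group."""
    return usage[group] / total_usage

for group in usage:
    share = usage_share(group)
    quota = config_quota[group]
    status = "over" if share > quota else "under"
    print(f"{group:12s} share={share:.2f} quota={quota:.2f} ({status} quota)")
```

Note that these are shares of usage accumulated since the "Usage Start Time" in [2], not of the slots currently in use.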

 

 

My problem is that, according to [1], mcore atlas is served first, then cms, then alice.

BUT

1)      alice runs only single-core jobs and always gets slots, even though its usage is far over its quota (0.67 instead of 0.18)

2)      mcore atlas always gets in, and mcore cms NEVER does (I had to reserve one worker node so that they could have jobs running)

The analysis says there are suitable slots but they are busy (and I have seen some lines in NegotiatorLog saying it is over quota, which is not the case, but I can't find those lines anymore).

 

Does anyone know

1)      how to use defrag to leave space for 8-core jobs?

2)      why cms never gets in …?

 

 

Thanks for any help

SF.

 

 

[ANALYSE]

]#  condor_q -better-analyse 5914078.0

 

 

-- Schedd: node16.datagrid.cea.fr : <192.54.206.43:28348>

The Requirements expression for job 5914078.000 is

 

    ((NumJobStarts == 0) && ((IfThenElse(RequestCpus isnt undefined,(RequestCpus == 8 || RequestCpus == 1),true)))) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") &&

    (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

 

Job 5914078.000 defines the following attributes:

 

    DiskUsage = 150

    NumJobStarts = 0

    RequestCpus = 8

    RequestDisk = DiskUsage

    RequestMemory = 24000

 

The Requirements expression for job 5914078.000 reduces to these conditions:

 

         Slots

Step    Matched  Condition

-----  --------  ---------

[3]        8030  TARGET.Arch == "X86_64"

[5]        8030  TARGET.OpSys == "LINUX"

[7]        8030  TARGET.Disk >= RequestDisk

[9]         289  TARGET.Memory >= RequestMemory

[11]         93  TARGET.Cpus >= RequestCpus

 

No successful match recorded.

Last failed match: Wed Jul 21 13:04:22 2021

 

Reason for last match failure: no match found

 

5914078.000:  Run analysis summary ignoring user priority.  Of 212 machines,

      1 are rejected by your job's requirements

     41 reject your job because of their own requirements

      0 match and are already running your jobs

      0 match but are serving other users

    170 are able to run your job

 

 

 

 

---------------------

        Sophie Ferry |

          CEA Saclay |

91190 Gif-sur-Yvette |

  DRF/IRFU/DEDIP/LIS |

           GRIF-IRFU |

       Bat 141 p023B |

+33(0)1 69 08 76 45 |

---------------------