
Re: [HTCondor-users] another Windows challenge



Do you use partitionable slots? 

 

From: Fienen, Michael N. [mailto:mike@xxxxxxxxxxx]
Sent: Wednesday, May 31, 2017 5:02 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: another Windows challenge

 

Hello

 

I'm having trouble with some machines in our Windows pool that are not accepting jobs even though they should match. The problem also shows up as some strange counts in the condor_q -better-analyze output on the schedd.

 

Thanks for any leads,

Mike

 

See below for the output and strange counts.

 

For job 7.012, which is not running, we see 85 slots matched, but only 68 of those are busy running our jobs, and 0 are listed as available to run it.

 

<snip>

The Requirements expression for job 7.012 is

 

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

 

Job 7.012 defines the following attributes:

 

    DiskUsage = 35000
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

 

The Requirements expression for job 7.012 reduces to these conditions:

 

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"

 

No successful match recorded.

Last failed match: Wed May 31 16:31:31 2017

 

Reason for last match failure: no match found 

 

007.012:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
     68 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job

</snip>
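To make the matching logic in the output above a bit more concrete, here is a minimal Python sketch that evaluates job 7.012's RequestMemory and Requirements against a single slot. It is not HTCondor's ClassAd evaluator, just a rough model under some assumptions: ClassAd string comparison with == / != is treated as case-insensitive, None stands in for UNDEFINED, UNDEFINED/ERROR propagation is ignored, and the usual units are assumed (Disk and RequestDisk in KiB, Memory and RequestMemory in MiB). The slot ad, including the host name node01.example.edu and its resource values, is entirely hypothetical.

```python
from typing import Optional


def str_eq(a: str, b: str) -> bool:
    """Case-insensitive comparison, approximating ClassAd '==' on strings."""
    return a.lower() == b.lower()


def request_memory(memory_usage: Optional[int], image_size: int) -> int:
    """Model of ifthenelse(MemoryUsage =!= undefined, MemoryUsage, (ImageSize + 1023) / 1024).

    None stands in for UNDEFINED. ClassAd integer division truncates,
    which matches Python's // for the non-negative values used here.
    """
    if memory_usage is not None:
        return memory_usage
    return (image_size + 1023) // 1024


def requirements(slot: dict, request_disk: int, req_mem: int) -> bool:
    """Simplified Requirements check (no UNDEFINED/ERROR handling)."""
    return (
        str_eq(slot["OpSys"], "WINDOWS")
        and str_eq(slot["Arch"], "X86_64")
        and not str_eq(slot["Machine"], "blahblah.edu")
        and slot["Disk"] >= request_disk      # assumed KiB on both sides
        and slot["Memory"] >= req_mem         # assumed MiB on both sides
        and slot["HasFileTransfer"]
    )


# Job 7.012 from the job ad: MemoryUsage undefined, ImageSize = 1, DiskUsage = 35000.
req_mem_7012 = request_memory(None, 1)   # fallback branch: (1 + 1023) // 1024
req_disk_7012 = 35000                    # RequestDisk = DiskUsage

# Hypothetical slot ad; every value here is invented for illustration.
slot = {
    "OpSys": "WINDOWS", "Arch": "X86_64", "Machine": "node01.example.edu",
    "Disk": 500000, "Memory": 4096, "HasFileTransfer": True,
}

print(req_mem_7012, requirements(slot, req_disk_7012, req_mem_7012))  # 1 True
```

With MemoryUsage undefined and ImageSize = 1, the fallback rounds up to a 1 MiB memory request, so job 7.012's resource requests are tiny; any slot passing the OpSys/Arch/Machine terms in this model would also pass the Disk and Memory terms unless it had almost no free space.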

 

Now, for job 7.000, which is running, the counts are the same (and they still do not add up to the 85 slots that should accept jobs).

 

<snip>

The Requirements expression for job 7.000 is

 

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

 

Job 7.000 defines the following attributes:

 

    DiskUsage = 275000
    ImageSize = 250000
    MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
    ResidentSetSize = 250000

 

The Requirements expression for job 7.000 reduces to these conditions:

 

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"

 

 

007.000:  Job is running.

 

Last successful match: Wed May 31 15:43:31 2017

 

007.000:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
     68 match and are already running your jobs
      0 match but are serving other users
      0 are available to run your job

 

</snip> 
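For comparison with job 7.012, here is a small worked evaluation of job 7.000's requests, this time through the other branch of the ifthenelse. It is a sketch that assumes the usual HTCondor units (ResidentSetSize and DiskUsage in KiB) and that ClassAd integer division truncates; the attribute values themselves are copied from the job ad above.

```python
# Job 7.000 attribute values copied from the job ad (assumed units: KiB).
resident_set_size = 250000
disk_usage = 275000

# MemoryUsage = ((ResidentSetSize + 1023) / 1024): integer division that
# rounds KiB up to whole MiB (// matches ClassAd truncation for these values).
memory_usage = (resident_set_size + 1023) // 1024

# MemoryUsage is defined, so ifthenelse(...) returns it and the
# ImageSize-based fallback is not used.
request_memory = memory_usage
request_disk = disk_usage  # RequestDisk = DiskUsage

print(request_memory, request_disk)  # 245 275000
```

So job 7.000 asks for roughly 245 MiB of memory and 275000 KiB of disk, noticeably more than job 7.012, yet both report the same analysis counts.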

 

But for job 7.018, it shows (more plausibly) that 17 slots are available.

 

<snip>

The Requirements expression for job 7.018 is

 

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

 

Job 7.018 defines the following attributes:

 

    DiskUsage = 425000
    ImageSize = 250000
    MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
    ResidentSetSize = 250000

 

The Requirements expression for job 7.018 reduces to these conditions:

 

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"

 

 

007.018:  Job is running.

 

Last successful match: Wed May 31 15:43:33 2017

 

007.018:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
     68 match and are already running your jobs
      0 match but are serving other users
     17 are available to run your job

</snip>
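The mismatch in the three summaries is just arithmetic, so here is a short Python sketch that tallies them. It assumes (this is my reading, not something the condor_q output states) that the "match" buckets are meant to account for every machine not rejected by either side's requirements. Under that assumption, jobs 7.012 and 7.000 leave 17 slots unaccounted for, while job 7.018's counts add up to the expected 85. This only restates the discrepancy; it does not explain it.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Counts copied from a condor_q -better-analyze run analysis summary."""
    job: str
    total: int
    rejected_by_job: int
    reject_job: int
    running_your_jobs: int
    serving_others: int
    available: int

    def unaccounted(self) -> int:
        """Slots passing both sides' requirements but missing from every listed bucket."""
        matched = self.total - self.rejected_by_job - self.reject_job
        listed = self.running_your_jobs + self.serving_others + self.available
        return matched - listed


summaries = [
    Summary("7.012", 93, 8, 0, 68, 0, 0),
    Summary("7.000", 93, 8, 0, 68, 0, 0),
    Summary("7.018", 93, 8, 0, 68, 0, 17),
]

for s in summaries:
    print(s.job, s.unaccounted())
# 7.012 17
# 7.000 17
# 7.018 0
```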