
[HTCondor-users] another Windows challenge



Hello

I'm having trouble with some machines in our Windows pool that are not accepting jobs even though they should match. The condor_q -better-analyze output on the schedd also shows some strange results.

Thanks for any leads,
Mike

See below for the output and strange counts.

For job 7.012, which is not running, 85 slots match the requirements, but only 68 of those are listed as already running my jobs, and 0 are listed as available.

<snip>
The Requirements expression for job 7.012 is

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Job 7.012 defines the following attributes:

    DiskUsage = 35000
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

The Requirements expression for job 7.012 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"

No successful match recorded.
Last failed match: Wed May 31 16:31:31 2017

Reason for last match failure: no match found 

007.012:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements 
      0 reject your job because of their own requirements 
     68 match and are already running your jobs 
      0 match but are serving other users 
      0 are available to run your job
</snip>

For job 7.000, which is running, the counts are the same, and again they do not add up to the 85 slots that should accept jobs:

<snip>
The Requirements expression for job 7.000 is

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Job 7.000 defines the following attributes:

    DiskUsage = 275000
    ImageSize = 250000
    MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
    ResidentSetSize = 250000

The Requirements expression for job 7.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"


007.000:  Job is running.

Last successful match: Wed May 31 15:43:31 2017

007.000:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements 
      0 reject your job because of their own requirements 
     68 match and are already running your jobs 
      0 match but are serving other users 
      0 are available to run your job

</snip> 
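
For reference, the RequestMemory expression shown in the two analyses above can be worked through by hand. The Python sketch below mirrors the ifthenelse logic and assumes ClassAd integer division truncates like Python's `//`; the function names are made up for illustration and are not part of HTCondor.

```python
from typing import Optional


def memory_usage_mb(resident_set_size_kb: int) -> int:
    # Mirrors: MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    # Assumption: ClassAd integer division truncates, like Python's //.
    return (resident_set_size_kb + 1023) // 1024


def request_memory_mb(image_size_kb: int, memory_usage: Optional[int] = None) -> int:
    # Mirrors: ifthenelse(MemoryUsage =!= undefined, MemoryUsage,
    #                     ( ImageSize + 1023 ) / 1024)
    # None stands in for an undefined MemoryUsage attribute.
    if memory_usage is not None:
        return memory_usage
    return (image_size_kb + 1023) // 1024


# Job 7.012: ImageSize = 1, MemoryUsage not defined.
req_7_012 = request_memory_mb(image_size_kb=1)

# Job 7.000: ImageSize = 250000, ResidentSetSize = 250000.
req_7_000 = request_memory_mb(250000, memory_usage_mb(250000))

print(req_7_012, req_7_000)  # 1 245
```

Under these assumptions neither job asks for much memory (1 MB and 245 MB), so the TARGET.Memory clause alone seems unlikely to explain the difference in the counts.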

For job 7.018, however, the output shows (more plausibly) that 17 slots are available.

<snip>
The Requirements expression for job 7.018 is

    ( ( ( Target.OpSys == "WINDOWS" ) && ( Target.Arch == "X86_64" ) &&
        ( Machine != "blahblah.edu" ) ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Job 7.018 defines the following attributes:

    DiskUsage = 425000
    ImageSize = 250000
    MemoryUsage = ( ( ResidentSetSize + 1023 ) / 1024 )
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
    ResidentSetSize = 250000

The Requirements expression for job 7.018 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          93  Target.OpSys == "WINDOWS"
[1]          93  Target.Arch == "X86_64"
[3]          85  Machine != "blahblah.edu"


007.018:  Job is running.

Last successful match: Wed May 31 15:43:33 2017

007.018:  Run analysis summary ignoring user priority.  Of 93 machines,
      8 are rejected by your job's requirements 
      0 reject your job because of their own requirements 
     68 match and are already running your jobs 
      0 match but are serving other users 
     17 are available to run your job
</snip>
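
To make the discrepancy concrete, here is a small Python sketch that tallies the five summary lines for each job against the 93 machines reported. The numbers are copied from the three summaries above; the dictionary layout and the `unaccounted` helper are only illustrative.

```python
# Summary counts copied from the -better-analyze output for each job, in order:
# rejected by job, rejected by machine, running my jobs, serving others, available
summaries = {
    "7.012": (8, 0, 68, 0, 0),
    "7.000": (8, 0, 68, 0, 0),
    "7.018": (8, 0, 68, 0, 17),
}

TOTAL_MACHINES = 93


def unaccounted(counts: tuple) -> int:
    # Slots that appear in none of the five summary categories.
    return TOTAL_MACHINES - sum(counts)


missing = {job: unaccounted(counts) for job, counts in summaries.items()}

for job, n in missing.items():
    print(f"job {job}: {sum(summaries[job])} of {TOTAL_MACHINES} counted, {n} unaccounted")
```

For jobs 7.012 and 7.000 the five categories sum to 76, leaving 17 slots unaccounted for, which is the same number that job 7.018 reports as available (85 matched minus 68 already running).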