
Re: [HTCondor-users] dynamic slots



I solved this particular issue - I had

'Requirements':  '(Memory > 10000)'

When I changed it to

'request_memory': '10000'

the issue was solved. But then I ended up not using dynamic slots, as
they do not do what I need.
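
For anyone hitting the same problem, here is a minimal sketch of the
change described above. It assumes the two dict entries are submit
commands that end up in a submit description. The helper render_submit
and the /bin/sleep placeholder job are hypothetical, only there to make
the example self-contained. As far as I understand it, request_memory
sets the job's RequestMemory attribute, which is what a partitionable
slot looks at when carving out a dynamic slot, whereas a Requirements
clause on Memory leaves RequestMemory at its default (derived from
ImageSize, which was 0 in the analysis output quoted below).

```python
# Hypothetical helper: render a dict of submit commands as the text of
# a submit description. Not part of HTCondor; for illustration only.
def render_submit(commands):
    """Return one 'key = value' line per command, followed by 'queue'."""
    lines = [f"{key} = {value}" for key, value in commands.items()]
    lines.append("queue")
    return "\n".join(lines)


# Before: only a constraint on the slot's Memory attribute.
old_job = {
    "executable": "/bin/sleep",  # placeholder job (assumption)
    "arguments": "60",
    "requirements": "(Memory > 10000)",
}

# After: an explicit memory request, in MB.
new_job = {
    "executable": "/bin/sleep",  # placeholder job (assumption)
    "arguments": "60",
    "request_memory": "10000",
}

print(render_submit(new_job))
```

The rendered text for new_job is what I would expect to pass to
condor_submit; the same dict could presumably be handed to the Python
bindings instead.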

What I need is for HTCondor to hold jobs when less than a given amount
of memory is available, and to submit them once enough memory becomes
available. I have not figured out how to do that. I have another thread
on the mailing list about it
(https://www-auth.cs.wisc.edu/lists/htcondor-users/2018-February/msg00102.shtml),
but it has not received any replies.

On Tue, Feb 27, 2018 at 9:36 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
> The condor_q -analyze output below shows that the job matches the slot, but it also shows 0 machines for all of the counters in the last clause, and
>
> No successful match recorded.
> Last failed match: Fri Feb 23 14:38:52 2018
>
> That probably indicates that the slot doesn't match the job for some reason.  Try running
>
> condor_q -better:reverse 38720 -machine slot1@chopin
>
> -tj
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Larry Martell
> Sent: Friday, February 23, 2018 1:47 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] dynamic slots
>
> I am trying to use dynamic slots as documented here:
>
> http://research.cs.wisc.edu/htcondor/CondorWeek2012/presentations/thain-dynamic-slots.pdf
>
> I have configured 1 slot thusly:
>
> NUM_SLOTS = 1
> NUM_SLOTS_TYPE_1 = 1
> SLOT_TYPE_1 = cpus=75%
> SLOT_TYPE_1 = mem=64000
> SLOT_TYPE_1_PARTITIONABLE = true
>
> I submit a job that requires 10G of memory and it does not run:
>
> $ condor_q -better-analyze 38720
>
>
> -- Schedd: bach.elucid.local : <192.168.10.2:9618?...
> The Requirements expression for job 38720.000 is
>
>     ( ( Memory >= 10000 ) ) && ( TARGET.Arch == "X86_64" ) &&
>     ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
>     ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )
>
> Job 38720.000 defines the following attributes:
>
>     DiskUsage = 0
>     ImageSize = 0
>     RequestDisk = DiskUsage
>     RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
>
> slot1@chopin has the following attributes:
>
>     TARGET.Memory = 64000
>     TARGET.Arch = "X86_64"
>     TARGET.Disk = 90191948
>     TARGET.HasFileTransfer = true
>     TARGET.OpSys = "LINUX"
>
> The Requirements expression for job 38720.000 reduces to these conditions:
>
>          Slots
> Step    Matched  Condition
> -----  --------  ---------
> [0]           1  Memory >= 10000
> [1]           1  TARGET.Arch == "X86_64"
> [3]           1  TARGET.OpSys == "LINUX"
> [5]           1  TARGET.Disk >= RequestDisk
> [7]           1  TARGET.Memory >= RequestMemory
> [9]           1  TARGET.HasFileTransfer
>
> No successful match recorded.
> Last failed match: Fri Feb 23 14:38:52 2018
>
> Reason for last match failure: no match found
>
> 38720.000:  Run analysis summary ignoring user priority.  Of 1 machines,
>       0 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match and are already running your jobs
>       0 match but are serving other users
>       0 are available to run your job
>
> Can anyone tell me why it's not running?