
Re: [HTCondor-users] Job is stubborn to remain idle



Hello VB,

Perhaps you could try a reverse analysis to see whether something is preventing the job from starting:

    condor_q -better-analyze 62.0 -reverse -machine ep1ext.sel

Also, what does "condor_status" produce?

...Tim

On 12/5/23 07:39, Valerio Bellizzomi wrote:
Hi Tim,
I finally got slot1 to match my job, but for some unknown reason the job still remains in the idle state:

$ condor_q -better-analyze


-- Schedd: t450.sel : <10.10.0.47:9618?...
The Requirements expression for job 62.000 is

    ((Machine == "ep1ext.sel")) && (TARGET.Arch == "X86_64") && (TARGET.HasVM is true) && (TARGET.VM_Type == MY.JobVMType) && (TARGET.VM_AvailNum > 0) &&
    (TARGET.Disk >= RequestDisk) && (TARGET.TotalMemory >= MY.JobVMMemory) && (TARGET.VM_Memory >= MY.JobVMMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

Job 62.000 defines the following attributes:

    DiskUsage = 4250000
    JobVMMemory = 4096
    JobVMType = "kvm"
    RequestCpus = 2
    RequestDisk = DiskUsage

slot1@xxxxxxxxxx has the following attributes:

    TARGET.Arch = "X86_64"
    TARGET.Cpus = 12
    TARGET.Disk = 11642268
    TARGET.HasFileTransfer = true
    TARGET.HasVM = true
    TARGET.Machine = "ep1ext.sel"
    TARGET.TotalMemory = 32130
    TARGET.VM_AvailNum = 4
    TARGET.VM_Memory = 30000
    TARGET.VM_Type = "kvm"

The Requirements expression for job 62.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           1  Machine == "ep1ext.sel"
[1]           1  TARGET.Arch == "X86_64"
[3]           1  TARGET.HasVM is true
[5]           1  TARGET.VM_Type == MY.JobVMType
[7]           1  TARGET.VM_AvailNum > 0
[9]           1  TARGET.Disk >= RequestDisk
[11]          1  TARGET.TotalMemory >= MY.JobVMMemory
[13]          1  TARGET.VM_Memory >= MY.JobVMMemory
[15]          1  TARGET.Cpus >= RequestCpus
[17]          1  TARGET.HasFileTransfer


062.000:  Run analysis summary ignoring user priority.  Of 1 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job

$ condor_q


-- Schedd: t450.sel : <10.10.0.47:9618?... @ 12/05/23 14:28:16
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
sel      ID: 62      12/5  14:27      _      _      1      1 62.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for sel: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
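As a cross-check of the `condor_q -better-analyze` output quoted above, the sketch below restates each clause of job 62.0's Requirements as a plain Python comparison, using only the job and slot attribute values printed in that output. It is a hypothetical offline illustration, not HTCondor's ClassAd evaluator: the dictionaries and the `requirements_clauses` helper are invented for this example, and string equality is approximated with `.lower()` because ClassAd `==` compares strings case-insensitively. If every clause passes, as the analysis table reports, the idle state is likely caused by something other than the job's own Requirements, which is what the reverse analysis is meant to probe.

```python
# Hypothetical offline re-check of job 62.0's Requirements clauses, using the
# attribute values printed by `condor_q -better-analyze`. This is NOT the
# ClassAd evaluator; each clause is restated as an ordinary Python comparison.

job = {
    "DiskUsage": 4250000,
    "JobVMMemory": 4096,
    "JobVMType": "kvm",
    "RequestCpus": 2,
}
job["RequestDisk"] = job["DiskUsage"]  # the job ad defines RequestDisk = DiskUsage

slot = {  # attributes reported for slot1 on ep1ext.sel
    "Arch": "X86_64",
    "Cpus": 12,
    "Disk": 11642268,
    "HasFileTransfer": True,
    "HasVM": True,
    "Machine": "ep1ext.sel",
    "TotalMemory": 32130,
    "VM_AvailNum": 4,
    "VM_Memory": 30000,
    "VM_Type": "kvm",
}


def requirements_clauses(job: dict, slot: dict) -> dict:
    """Map each Requirements clause to True/False for the given job and slot."""
    return {
        'Machine == "ep1ext.sel"': slot["Machine"].lower() == "ep1ext.sel",
        'TARGET.Arch == "X86_64"': slot["Arch"].lower() == "x86_64",
        "TARGET.HasVM is true": slot["HasVM"] is True,
        "TARGET.VM_Type == MY.JobVMType": slot["VM_Type"].lower() == job["JobVMType"].lower(),
        "TARGET.VM_AvailNum > 0": slot["VM_AvailNum"] > 0,
        "TARGET.Disk >= RequestDisk": slot["Disk"] >= job["RequestDisk"],
        "TARGET.TotalMemory >= MY.JobVMMemory": slot["TotalMemory"] >= job["JobVMMemory"],
        "TARGET.VM_Memory >= MY.JobVMMemory": slot["VM_Memory"] >= job["JobVMMemory"],
        "TARGET.Cpus >= RequestCpus": slot["Cpus"] >= job["RequestCpus"],
        "TARGET.HasFileTransfer": bool(slot["HasFileTransfer"]),
    }


if __name__ == "__main__":
    results = requirements_clauses(job, slot)
    failed = [clause for clause, ok in results.items() if not ok]
    print("failing clauses:", failed or "none")
```

Run against the values above, it reports no failing clauses, matching the "1 are able to run your job" summary.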


-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736