
Re: [HTCondor-users] Job is stubborn to remain idle



Hi,
Here is the analysis you requested:

$ condor_q -better-analyze 63.0 -reverse -machine ep1ext.sel


-- Schedd: t450.sel : <10.10.0.47:9618?...

-- Slot: slot1@xxxxxxxxxx : Analyzing matches for 1 Jobs in 1 autoclusters

The Requirements expression for this slot is

    START &&
    (WithinResourceLimits)

  START is
    true

  WithinResourceLimits is
    (MY.Cpus > 0 &&
      TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 &&
      TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 &&
      TARGET.RequestDisk <= MY.Disk)

This slot defines the following attributes:

    Cpus = 12
    Disk = 11642268
    Memory = 32130

Job 63.0 has the following attributes:

    TARGET.RequestCpus = 2
    TARGET.RequestDisk = 4250000
    TARGET.RequestMemory = 4096

The Requirements expression for this slot reduces to these conditions:

       Clusters
Step    Matched  Condition
-----  --------  ---------
[1]           1  WithinResourceLimits

slot1@xxxxxxxxxx: Run analysis summary of 1 jobs.
    1 (100.00 %) match both slot and job requirements.
    1 match the requirements of this slot.
    1 have job requirements that match this slot.
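As a sanity check on the reverse analysis above, the slot's WithinResourceLimits clause can be re-evaluated by hand with the Cpus, Memory, and Disk values it printed. This is a minimal bash sketch under that assumption: the function name within_resource_limits is hypothetical, and plain integer comparisons stand in for HTCondor's ClassAd evaluator, so it only confirms the arithmetic, not how the negotiator actually decides a match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-evaluates the slot's WithinResourceLimits clause with
# plain integer comparisons (an approximation of the ClassAd expression, not
# HTCondor's own evaluator). Prints "true" or "false".
within_resource_limits() {
  # Slot (MY) attributes
  local cpus=$1 memory=$2 disk=$3
  # Job (TARGET) request attributes
  local req_cpus=$4 req_memory=$5 req_disk=$6

  if (( cpus > 0 )) && (( req_cpus <= cpus )) &&
     (( memory > 0 )) && (( req_memory <= memory )) &&
     (( disk > 0 )) && (( req_disk <= disk )); then
    echo true
  else
    echo false
  fi
}

# Values copied from the analysis of job 63.0 against slot1:
#   slot: Cpus=12, Memory=32130, Disk=11642268
#   job:  RequestCpus=2, RequestMemory=4096, RequestDisk=4250000
within_resource_limits 12 32130 11642268 2 4096 4250000
```

With these values it prints true, which agrees with the "1 (100.00 %) match both slot and job requirements" line, so the slot's resource limits do not appear to be what keeps the job idle.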


$ condor_status
Name             OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime

slot1@xxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 32130  0+01:39:45

               Total Owner Claimed Unclaimed Matched Preempting  Drain Backfill BkIdle

  X86_64/LINUX     1     0       0         1       0          0      0        0      0

         Total     1     0       0         1       0          0      0        0      0



On Tue, 2023-12-05 at 08:37 -0600, Tim Theisen via HTCondor-users wrote:

Hello VB,

Perhaps you could try the reverse analysis to see whether something is preventing the job from starting.

    condor_q -better-analyze 62.0 -reverse -machine ep1ext.sel

Also, what does "condor_status" produce?

...Tim

On 12/5/23 07:39, Valerio Bellizzomi wrote:
Hi Tim,
Finally I got slot1 to match my job, but for some unknown reason the job still remains in the idle state:

$ condor_q -better-analyze


-- Schedd: t450.sel : <10.10.0.47:9618?...
The Requirements expression for job 62.000 is

    ((Machine == "ep1ext.sel")) && (TARGET.Arch == "X86_64") && (TARGET.HasVM is true) && (TARGET.VM_Type == MY.JobVMType) && (TARGET.VM_AvailNum > 0) &&
    (TARGET.Disk >= RequestDisk) && (TARGET.TotalMemory >= MY.JobVMMemory) && (TARGET.VM_Memory >= MY.JobVMMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

Job 62.000 defines the following attributes:

    DiskUsage = 4250000
    JobVMMemory = 4096
    JobVMType = "kvm"
    RequestCpus = 2
    RequestDisk = DiskUsage

slot1@xxxxxxxxxx has the following attributes:

    TARGET.Arch = "X86_64"
    TARGET.Cpus = 12
    TARGET.Disk = 11642268
    TARGET.HasFileTransfer = true
    TARGET.HasVM = true
    TARGET.Machine = "ep1ext.sel"
    TARGET.TotalMemory = 32130
    TARGET.VM_AvailNum = 4
    TARGET.VM_Memory = 30000
    TARGET.VM_Type = "kvm"

The Requirements expression for job 62.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           1  Machine == "ep1ext.sel"
[1]           1  TARGET.Arch == "X86_64"
[3]           1  TARGET.HasVM is true
[5]           1  TARGET.VM_Type == MY.JobVMType
[7]           1  TARGET.VM_AvailNum > 0
[9]           1  TARGET.Disk >= RequestDisk
[11]          1  TARGET.TotalMemory >= MY.JobVMMemory
[13]          1  TARGET.VM_Memory >= MY.JobVMMemory
[15]          1  TARGET.Cpus >= RequestCpus
[17]          1  TARGET.HasFileTransfer


062.000:  Run analysis summary ignoring user priority.  Of 1 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job
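The same kind of hand check can be applied to job 62.0's Requirements expression. The bash sketch below copies the attribute values from the output above and tests each clause in turn. The check helper is hypothetical, and mapping the ClassAd operators (including the meta-equality operator is) onto shell test comparisons is an approximation that ignores ClassAd semantics such as UNDEFINED handling.

```shell
#!/usr/bin/env bash
# Hand-evaluates each clause of job 62.0's Requirements expression using values
# copied from the condor_q -better-analyze output. Shell tests approximate the
# ClassAd operators; this is not HTCondor's matchmaking logic.

# Slot (TARGET) attributes
machine="ep1ext.sel"; arch="X86_64"; has_vm=true; vm_type="kvm"
vm_avail_num=4; disk=11642268; total_memory=32130; vm_memory=30000
cpus=12; has_file_transfer=true

# Job (MY) attributes; RequestDisk is defined as DiskUsage in the job ad
job_vm_type="kvm"; job_vm_memory=4096; request_cpus=2
disk_usage=4250000; request_disk=$disk_usage

# Hypothetical helper: $1 is a label, the remaining args are a test command.
check() {
  local label=$1; shift
  if "$@"; then printf 'PASS  %s\n' "$label"; else printf 'FAIL  %s\n' "$label"; fi
}

check 'Machine == "ep1ext.sel"'              [ "$machine" = "ep1ext.sel" ]
check 'TARGET.Arch == "X86_64"'              [ "$arch" = "X86_64" ]
check 'TARGET.HasVM is true'                 [ "$has_vm" = true ]
check 'TARGET.VM_Type == MY.JobVMType'       [ "$vm_type" = "$job_vm_type" ]
check 'TARGET.VM_AvailNum > 0'               [ "$vm_avail_num" -gt 0 ]
check 'TARGET.Disk >= RequestDisk'           [ "$disk" -ge "$request_disk" ]
check 'TARGET.TotalMemory >= MY.JobVMMemory' [ "$total_memory" -ge "$job_vm_memory" ]
check 'TARGET.VM_Memory >= MY.JobVMMemory'   [ "$vm_memory" -ge "$job_vm_memory" ]
check 'TARGET.Cpus >= RequestCpus'           [ "$cpus" -ge "$request_cpus" ]
check 'TARGET.HasFileTransfer'               [ "$has_file_transfer" = true ]
```

Every clause prints PASS with these values, consistent with the "1 are able to run your job" summary: the job's own requirements are satisfied by slot1.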

$ condor_q


-- Schedd: t450.sel : <10.10.0.47:9618?... @ 12/05/23 14:28:16
OWNER    BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
sel      ID: 62      12/5  14:27      _      _      1      1 62.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for sel: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended


-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users


The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/