
[HTCondor-users] Some jobs from same cluster won't run



Good morning,

I recently enabled preemption and am now seeing preempted jobs remain idle in the queue, even though other jobs from the same cluster have already completed. I noticed this with one user's jobs in particular after they were preempted by another user's jobs: they stayed idle while there were nodes available to run them. Running 'condor_q -better-analyze' gives me the following.

The Requirements expression for your job is:

( ( Memory > 512 ) ) && ( TARGET.Arch == "X86_64" ) &&
( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
( TARGET.Memory >= RequestMemory ) && ( ( TARGET.HasFileTransfer ) ||
  ( TARGET.FileSystemDomain == MY.FileSystemDomain ) )

Your job defines the following attributes:

FileSystemDomain = "subdomain.domain.blah"
DiskUsage = 1
ImageSize = 1750000
MemoryUsage = 1221
RequestDisk = 1
RequestMemory = 1221
ResidentSetSize = 1250000

The Requirements expression for your job reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          96  Memory > 512
[1]          96  TARGET.Arch == "X86_64"
[3]          96  TARGET.OpSys == "LINUX"
[5]          96  TARGET.Disk >= RequestDisk
[7]           0  TARGET.Memory >= RequestMemory
[9]          96  TARGET.HasFileTransfer

Suggestions:

Condition                         Machines Matched    Suggestion
---------                         ----------------    ----------
1   ( ( Memory > 512 ) )              0                   REMOVE
2   ( TARGET.Memory >= 1221 )         0                   MODIFY TO 977
3   ( TARGET.Arch == "X86_64" )       96                   
4   ( TARGET.OpSys == "LINUX" )       96                   
5   ( TARGET.Disk >= 1 )              96                   
6   ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "subdomain.domain.blah" ) )
                                  96
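
To make the per-condition counts above easier to reason about, here is a minimal Python sketch (not HTCondor code) that evaluates each clause of the Requirements expression against a set of slot ads. The slot values are assumptions: I am guessing that all 96 slots advertise Memory = 977, based only on the "MODIFY TO 977" suggestion, and the Disk value is a made-up placeholder. The job values are the ones reported above.

```python
# Hypothetical slot ads. Memory = 977 is inferred from the "MODIFY TO 977"
# suggestion; Disk = 1000 is a placeholder, not taken from the real pool.
slots = [
    {"Memory": 977, "Arch": "X86_64", "OpSys": "LINUX",
     "Disk": 1000, "HasFileTransfer": True}
    for _ in range(96)
]

# Job attributes copied from the condor_q -better-analyze output.
job = {"RequestMemory": 1221, "RequestDisk": 1}

# Each clause of the Requirements expression as a predicate on (job, slot).
clauses = {
    "Memory > 512": lambda j, s: s["Memory"] > 512,
    "TARGET.Arch == X86_64": lambda j, s: s["Arch"] == "X86_64",
    "TARGET.OpSys == LINUX": lambda j, s: s["OpSys"] == "LINUX",
    "TARGET.Disk >= RequestDisk": lambda j, s: s["Disk"] >= j["RequestDisk"],
    "TARGET.Memory >= RequestMemory":
        lambda j, s: s["Memory"] >= j["RequestMemory"],
    "TARGET.HasFileTransfer": lambda j, s: s["HasFileTransfer"],
}


def count_matches(job, slots, clauses):
    """Return how many slots satisfy each clause on its own."""
    return {
        name: sum(1 for slot in slots if test(job, slot))
        for name, test in clauses.items()
    }


matches = count_matches(job, slots, clauses)
for name, n in matches.items():
    print(f"{n:>3}  {name}")
```

Under these assumptions, the only clause that rejects every slot is TARGET.Memory >= RequestMemory with RequestMemory = 1221, while Memory > 512 from the submit file still matches all 96 slots.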

I found this page and verified that there is a memory requirement in the submit file. It is 'Requirements = (Memory > 512)'. I do not know how to keep HTCondor from adding this memory requirement to the job. Does anyone have suggestions? I can provide my condor_config file if needed. I left it pretty close to the default that ships with HTCondor.

Thanks,

Matt