
Re: [HTCondor-users] Why does my job not match?



Hi Larry,

I think the problem may be in the START expression:

Start = (true && (EVJobType =?= Composition))

I think Composition needs to be quoted so that the expression compares the value of EVJobType against the string "Composition":

Start = (true && (EVJobType =?= "Composition"))

Without the quotes, the expression references a (Machine) ClassAd attribute named Composition, which is undefined in the Machine ad provided, so the =?= comparison evaluates to false and START is never satisfied.
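
To illustrate the difference, here is a toy model in plain Python. It is only a sketch and does not use the real HTCondor classad library: the names UNDEFINED, lookup, and meta_equals are made up for illustration, and it assumes a bare attribute reference is tried in the machine ad first and then in the job ad.

```python
# Toy model of ClassAd attribute lookup and the =?= ("is") operator.
# NOT the HTCondor classad library; all names here are hypothetical.

UNDEFINED = object()  # stands in for the ClassAd value `undefined`


def lookup(name, my_ad, target_ad):
    """Resolve a bare attribute reference: MY ad first, then TARGET ad (assumed order)."""
    if name in my_ad:
        return my_ad[name]
    return target_ad.get(name, UNDEFINED)


def meta_equals(left, right):
    """=?= : true only when both sides have the same type and the same value."""
    if left is UNDEFINED or right is UNDEFINED:
        return left is right
    return type(left) is type(right) and left == right


# Trimmed-down versions of the two ads from this thread.
machine_ad = {"VcCompSlot": True, "Memory": 7142}  # no `Composition` attribute
job_ad = {"EVJobType": "Composition", "RequestMemory": 2048}

job_type = lookup("EVJobType", machine_ad, job_ad)

# Unquoted: Composition is an attribute reference and resolves to undefined.
unquoted = meta_equals(job_type, lookup("Composition", machine_ad, job_ad))

# Quoted: "Composition" is a string literal.
quoted = meta_equals(job_type, "Composition")

print(unquoted, quoted)
```

In this model the unquoted comparison comes out false and the quoted one true, which is consistent with the "0 match the requirements of this slot" line in the analysis output.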

Jason

On Tue, Jun 20, 2023 at 9:27 AM Larry Martell <larry.martell@xxxxxxxxx> wrote:
I am submitting a job that I think should match. I have enough memory,
disk, and cpu and the job type and other requirements seem to be met.
Can anyone tell me why it's not matching?

Output of condor_status --long, condor_q --long, and condor_q
-better-analyze -reverse -machine shown below.

TIA!

condor_status --long:

AcceptedWhileDraining = false
Activity = "Idle"
AddressV1 = "{[ p=\"primary\"; a=\"172.20.11.75\"; port=9618;
n=\"Internet\"; spid=\"2683471_7ba1_3\"; noUDP=true; ], [ p=\"IPv4\";
a=\"172.20.11.75\"; port=9618; n=\"Internet\";
spid=\"2683471_7ba1_3\"; noUDP=true; ]}"
Arch = "X86_64"
AuthenticatedIdentity = "unauthenticated@unmapped"
CanHibernate = true
CheckpointPlatform = "LINUX X86_64 5.15.0-1037-aws normal N/A avx avx2
ssse3 sse4_1 sse4_2"
ChildAccountingGroup = { }
ChildActivity = { }
ChildCpus = { }
ChildCurrentRank = { }
ChildDisk = { }
ChildEnteredCurrentState = { }
ChildGPUs = { }
ChildMemory = { }
ChildName = { }
ChildRemoteOwner = { }
ChildRemoteUser = { }
ChildRetirementTimeRemaining = { }
ChildState = { }
ClockDay = 1
ClockMin = 1351
COLLECTOR_HOST_STRING = "xxxx.biz"
CondorLoadAvg = 0.0
CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $"
CondorVersion = "$CondorVersion: 8.8.13 Mar 23 2021 BuildID:
Debian-8.8.13-1.1 PackageID: 8.8.13-1.1 Debian-8.8.13-1.1 $"
ConsoleIdle = 3600
CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.5)
CpuBusyTime = 0
CpuCacheSize = 36608
CpuFamily = 6
CpuIsBusy = false
CpuModelNumber = 85
Cpus = 1
CUDACapability = 7.5
CUDAClockMhz = 1590.0
CUDAComputeUnits = 40
CUDACoresPerCU = 64
CUDADeviceName = "Tesla T4"
CUDADevicePciBusId = "0000:00:1E.0"
CUDADeviceUuid = "81034430-ff8b-f682-edc5-86505c21f36c"
CUDADriverVersion = 12.0
CUDAECCEnabled = true
CUDAGlobalMemoryMb = 15102
CurrentRank = 0.0
DaemonCoreDutyCycle = 6.164174810319167E-05
DaemonLastReconfigTime = 1687224682
DaemonStartTime = 1687224682
DetectedCpus = 8
DetectedGPUs = 0
DetectedMemory = 31640
Disk = 2545577
EnteredCurrentActivity = 1687224689
EnteredCurrentState = 1687224689
ExpectedMachineGracefulDrainingBadput = 0
ExpectedMachineGracefulDrainingCompletion = 1687224689
ExpectedMachineQuickDrainingBadput = 0
ExpectedMachineQuickDrainingCompletion = 1687224689
FileSystemDomain = "poc.cloud.elucid.biz"
GPUs = 0
HardwareAddress = "0a:7a:f6:ec:88:b7"
has_avx = true
has_avx2 = true
has_sse4_1 = true
has_sse4_2 = true
has_ssse3 = true
HasEncryptExecuteDirectory = true
HasFileTransfer = true
HasFileTransferPluginMethods = "file,ftp,http,data,https"
HasIOProxy = true
HasJava = true
HasJICLocalConfig = true
HasJICLocalStdin = true
HasJobDeferral = true
HasMPI = true
HasPerFileEncryption = true
HasReconnect = true
HasSelfCheckpointTransfers = true
HasTDP = true
HasTransferInputRemaps = true
HasVM = false
HibernationLevel = 0
HibernationState = "NONE"
HibernationSupportedStates = "S4,S5"
IsLocalStartd = false
IsValidCheckpointPlatform = (TARGET.JobUniverse =!= 1 ||
((MY.CheckpointPlatform =!= undefined) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
IsWakeAble = false
IsWakeOnLanEnabled = false
IsWakeOnLanSupported = false
JavaMFlops = 1533.733398
JavaSpecificationVersion = "11"
JavaVendor = "Amazon.com Inc."
JavaVersion = "11.0.19"
JobPreemptions = 0
JobRankPreemptions = 0
JobStarts = 0
JobUserPrioPreemptions = 0
KeyboardIdle = 2504
KFlops = 1587902
LastBenchmark = 1687224718
LastFetchWorkCompleted = 0
LastFetchWorkSpawned = 0
LastHeardFrom = 1687228292
LastUpdate = 1687224718
LoadAvg = 0.0
Machine = "processor.poc.cloud.elucid.biz"
MachineMaxVacateTime = 10 * 60
MachineResources = "Cpus Memory Disk Swap GPUs"
MaxJobRetirementTime = 0
Memory = 7142
Mips = 25819
MonitorSelfAge = 3608
MonitorSelfCPUUsage = 0.01249920167160802
MonitorSelfImageSize = 20556
MonitorSelfRegisteredSocketCount = 0
MonitorSelfResidentSetSize = 14116
MonitorSelfSecuritySessions = 6
MonitorSelfTime = 1687228289
MyAddress = "<172.20.11.75:9618?addrs=172.20.11.75-9618&noUDP&sock=2683471_7ba1_3>"
MyCurrentTime = 1687228292
MyType = "Machine"
Name = "slot3@xxxxxxxx"
NextFetchWorkDelay = -1
NUM_DETECTED_GPUs = 1
NumDynamicSlots = 0
NumPids = 0
OpSys = "LINUX"
OpSysAndVer = "Ubuntu20"
OpSysLegacy = "LINUX"
OpSysLongName = "Ubuntu 20.04.6 LTS"
OpSysMajorVer = 20
OpSysName = "Ubuntu"
OpSysShortName = "Ubuntu"
OpSysVer = 2004
PartitionableSlot = true
PROSERVER = "PROSERVER_PROCESSOR"
PslotRollupInformation = true
Rank = 0
RecentDaemonCoreDutyCycle = 5.879344050940816E-05
RecentJobPreemptions = 0
RecentJobRankPreemptions = 0
RecentJobStarts = 0
RecentJobUserPrioPreemptions = 0
Requirements = (START) && (IsValidCheckpointPlatform) && (WithinResourceLimits)
RetirementTimeRemaining = 0
SlotID = 3
SlotType = "Partitionable"
SlotTypeID = 3
SlotWeight = Cpus
Start = (true && (EVJobType =?= Composition))
StartdIpAddr = "<172.20.11.75:9618?addrs=172.20.11.75-9618&noUDP&sock=2683471_7ba1_3>"
StarterAbilityList =
"HasFileTransferPluginMethods,HasEncryptExecuteDirectory,HasVM,HasJava,HasMPI,HasFileTransfer,HasJobDeferral,HasPerFileEncryption,HasReconnect,HasTDP,HasJICLocalStdin,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasJICLocalConfig"
State = "Unclaimed"
SubnetMask = "255.255.252.0"
TargetType = "Job"
TimeToLive = 2147483647
TotalCondorLoadAvg = 0.0
TotalCpus = 7.0
TotalDisk = 21213140
TotalGPUs = 0
TotalLoadAvg = 0.01
TotalMemory = 28568
TotalSlotCpus = 1
TotalSlotDisk = 2545577.0
TotalSlotGPUs = 0
TotalSlotMemory = 7142
TotalSlots = 3
TotalTimeUnclaimedIdle = 3603
TotalVirtualMemory = 32399928
UidDomain = "poc.cloud.elucid.biz"
Unhibernate = MY.MachineLastMatchTime =!= undefined
UpdateSequenceNumber = 14
UpdatesHistory = "00000000000000000000000000000000"
UpdatesLost = 0
UpdatesSequenced = 4112
UpdatesTotal = 4116
UtsnameMachine = "x86_64"
UtsnameNodename = "processor.poc.cloud.elucid.biz"
UtsnameRelease = "5.15.0-1037-aws"
UtsnameSysname = "Linux"
UtsnameVersion = "#41~20.04.1-Ubuntu SMP Mon May 22 18:18:00 UTC 2023"
VcCompSlot = true
VirtualMemory = 10799976
WakeOnLanEnabledFlags = "NONE"
WakeOnLanSupportedFlags = "NONE"
WithinResourceLimits = (ifThenElse(TARGET._condor_RequestCpus =!=
undefined,MY.Cpus > 0 && TARGET._condor_RequestCpus <=
MY.Cpus,ifThenElse(TARGET.RequestCpus =!= undefined,MY.Cpus > 0 &&
TARGET.RequestCpus <= MY.Cpus,1 <= MY.Cpus)) &&
ifThenElse(TARGET._condor_RequestMemory =!= undefined,MY.Memory > 0 &&
TARGET._condor_RequestMemory <=
MY.Memory,ifThenElse(TARGET.RequestMemory =!= undefined,MY.Memory > 0
&& TARGET.RequestMemory <= MY.Memory,false)) &&
ifThenElse(TARGET._condor_RequestDisk =!= undefined,MY.Disk > 0 &&
TARGET._condor_RequestDisk <= MY.Disk,ifThenElse(TARGET.RequestDisk
=!= undefined,MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk,false)) &&
(TARGET.RequestGPUs =?= undefined || MY.GPUs >=
ifThenElse(TARGET._condor_RequestGPUs =?=
undefined,TARGET.RequestGPUs,TARGET._condor_RequestGPUs)))

condor_q --long

Arguments = "args"
AutoClusterAttrs =
"_condor_RequestCpus,_condor_RequestDisk,_condor_RequestGPUs,_condor_RequestMemory,Composition,EVJobType,JobUniverse,LastCheckpointPlatform,MachineLastMatchTime,NumCkpts,Offline,RemoteOwner,RequestCpus,RequestDisk,RequestGPUs,RequestMemory,TotalJobRuntime,ConcurrencyLimits,FlockTo,Rank,Requirements,KFlops,FileSystemDomain"
AutoClusterId = 1
ClusterId = 327
Cmd = "cmd"
CommittedSlotTime = 0
CommittedSuspensionTime = 0
CommittedTime = 0
CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $"
CondorVersion = "$CondorVersion: 10.4.0 2023-04-06 BuildID: 638308
PackageID: 10.4.0-1.1 $"
CoreSize = 0
CumulativeRemoteSysCpu = 0.0
CumulativeRemoteUserCpu = 0.0
CumulativeSlotTime = 0
CumulativeSuspensionTime = 0
CurrentHosts = 0
DiskUsage = 3500
DiskUsage_RAW = 3376
EncryptExecuteDirectory = false
EnteredCurrentStatus = 1687224704
Environment = "EVCFG=/inst/web/zzzz/config.ini"
Err = "/inst/web/logs/vc_dont_delete/ev_327.0.err"
EVJobType = "Composition"
ExecutableSize = 3500
ExecutableSize_RAW = 3376
ExitBySignal = false
ExitStatus = 0
FileSystemDomain = "xxxx.biz"
GlobalJobId = "xxxx.biz#327.0#1687224704"
ImageSize = 3500
ImageSize_RAW = 3376
In = "/dev/null"
Iwd = "some_dir"
JobLeaseDuration = 2400
JobMaxRetries = 0
JobNotification = 0
JobPrio = 100000
JobStatus = 1
JobSubmitMethod = 0
JobUniverse = 5
LastRejMatchReason = "no match found "
LastRejMatchTime = 1687228905
LastSuspensionTime = 0
LeaveJobInQueue = false
MaxHosts = 1
MinHosts = 1
MyType = "Job"
NumCkpts = 0
NumCkpts_RAW = 0
NumJobCompletions = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
> > JobMaxRetries || ExitCode =?= 0
Out = "/inst/web/logs/vc_dont_delete/ev_327.0.out"
Owner = "prod_user"
PeriodicHold = false
PeriodicRelease = false
PeriodicRemove = false
ProcId = 0
QDate = 1687224704
Rank = 0.0
RemoteSysCpu = 0.0
RemoteUserCpu = 0.0
RemoteWallClockTime = 0.0
RequestCpus = 1
RequestDisk = 2048
RequestMemory = 2048
Requirements = (TARGET.VcCompSlot) && (TARGET.Arch == "X86_64") &&
(TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
(TARGET.Memory >= RequestMemory) && (TARGET.FileSystemDomain ==
MY.FileSystemDomain)
ServerTime = 1687228934
ShouldTransferFiles = "NO"
StreamErr = false
StreamOut = false
TargetType = "Machine"
TotalSubmitProcs = 1
TotalSuspensions = 0
TransferIn = false
TransferInputSizeMB = 3
User = "prod_user@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
UserLog = "/inst/web/logs/vc_dont_delete/ev_327.0.log"


Here is the output of condor_q -better-analyze -reverse -machine xxxx:

The Requirements expression for this slot is
  (START) && (IsValidCheckpointPlatform) &&
    (WithinResourceLimits)
START is
  (true &&
   (EVJobType is Composition))
WithinResourceLimits is
  (ifThenElse(TARGET._condor_RequestCpus isnt undefined,MY.Cpus > 0 &&
    TARGET._condor_RequestCpus <=
MY.Cpus,ifThenElse(TARGET.RequestCpus isnt undefined,MY.Cpus > 0 &&
     TARGET.RequestCpus <= MY.Cpus,1 <= MY.Cpus)) &&
   ifThenElse(TARGET._condor_RequestMemory isnt undefined,MY.Memory > 0 &&
    TARGET._condor_RequestMemory <=
MY.Memory,ifThenElse(TARGET.RequestMemory isnt undefined,MY.Memory > 0
&&
     TARGET.RequestMemory <= MY.Memory,false)) &&
   ifThenElse(TARGET._condor_RequestDisk isnt undefined,MY.Disk > 0 &&
    TARGET._condor_RequestDisk <=
MY.Disk,ifThenElse(TARGET.RequestDisk isnt undefined,MY.Disk > 0 &&
     TARGET.RequestDisk <= MY.Disk,false)) &&
   (TARGET.RequestGPUs is undefined ||
    MY.GPUs >= ifThenElse(TARGET._condor_RequestGPUs is
undefined,TARGET.RequestGPUs,TARGET._condor_RequestGPUs)))

This slot defines the following attributes:
CheckpointPlatform = "LINUX X86_64 5.15.0-1037-aws normal N/A avx avx2
ssse3 sse4_1 sse4_2"
  Cpus = 1
  Disk = 2545577
  GPUs = 0
  IsValidCheckpointPlatform = (TARGET.JobUniverse =!= 1 ||
((MY.CheckpointPlatform =!= undefined) &&
((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
(TARGET.NumCkpts == 0))))
  Memory = 7142
Job 327.0 has the following attributes:
  TARGET.EVJobType = "Composition"
  TARGET.JobUniverse = 5
  TARGET.NumCkpts = 0
  TARGET.RequestCpus = 1
  TARGET.RequestDisk = 2048
  TARGET.RequestMemory = 2048
The Requirements expression for this slot reduces to these conditions:

       Clusters
Step    Matched  Condition
-----  --------  ---------
[1]           0  EVJobType is Composition
slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Run analysis summary of 1 jobs.
  0 (0.00 %) match both slot and job requirements.
  0 match the requirements of this slot.
  1 have job requirements that match this slot.
