
Re: [Condor-users] slot1: Job Requirements check failed!



On 12/10/2010 02:23 AM, Carsten Aulbert wrote:
Hi

this is a follow-up to my last email, but addressing a completely different
issue.

I'm using a job submit file as follows:
Executable     = sleep.gpu.sh
Arguments      = 10 $$(GPU_DEV) $$(GPU_NAME) $$(GPU_CAPABILITY) \
$$(GPU_GLOBALMEM_MB) $$(GPU_MULTIPROC) $$(GPU_NUMCORES) $$(GPU_CLOCK_GHZ) \
$$(GPU_CUDA_DRV) $$(GPU_CUDA_RUN)
Error   = logs/err.$(Process)
Output  = logs/out.$(Process)
Log = /local/user/carsten/foo.log
Requirements = GPU_CAPABILITY>= 1.9
+WantGPU=True
Universe = vanilla
Queue 1

where sleep.gpu.sh only prints its arguments and sleeps for $1
seconds.
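The script itself is not included in the message. The following is a hypothetical sketch of what such a sleep.gpu.sh might look like, assuming it simply numbers and prints every argument it receives and then sleeps for the number of seconds given in $1. The function name `show_args_and_sleep` is invented for illustration. Because the `$$(GPU_...)` references in Arguments are replaced with the matched machine's attribute values before the job runs, the script only ever sees plain values.

```shell
#!/bin/sh
# Hypothetical sketch of sleep.gpu.sh; the original script is not shown.
# Usage: sleep.gpu.sh SECONDS [GPU attribute values...]
# The $$(GPU_*) references in the submit file's Arguments have already
# been replaced by the matched machine's values when this runs.

show_args_and_sleep() {
    secs=${1:-0}          # first argument: how long to sleep (default 0)
    i=0
    for arg in "$@"; do   # print every argument, numbered from 1
        i=$((i + 1))
        printf 'arg%d: %s\n' "$i" "$arg"
    done
    sleep "$secs"
}

show_args_and_sleep "$@"
```

With the submit file above, the job's Output file (logs/out.$(Process)) would then show which GPU attribute values the job actually received.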

With "Requirements = GPU_CAPABILITY >= 2.0" I'm trying to steer the job to a
machine that has this attribute set. It partly works: the match is made, but
when the startd tries to start the job, it just reports "slot1: Job
Requirements check failed!" and the job goes back to idle (full debug StartLog
attached).

gpu010:/var/log/condor# grep -i require /tmp/StartLog
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,Scheduler,Owner,NeedGpu,WantGPU,DiskUsage,ImageSize,RequestMemory,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
Requirements = (GPU_CAPABILITY >= 2.000000) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
Requirements = (START) && (IsValidCheckpointPlatform)
AutoClusterAttrs = "JobUniverse,LastCheckpointPlatform,NumCkpts,Scheduler,Owner,NeedGpu,WantGPU,DiskUsage,ImageSize,RequestMemory,FileSystemDomain,Requirements,NiceUser,ConcurrencyLimits"
Requirements = (GPU_CAPABILITY >= 2.000000) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain)
Requirements = (START) && (IsValidCheckpointPlatform)
12/10 10:43:42 slot1: Job Requirements check failed!


I'm not sure what is preventing Condor from starting the job. At first I
thought it might be the floating-point comparison, but even with

Requirements = GPU_CAPABILITY >= 1.9

it matches, but does not start.

Any ideas?

Cheers

Carsten


You might try wrapping your Requirements in debug():

$ condor_qedit 1702098.0 Requirements 'debug((GPU_CAPABILITY >= 2.000000) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize) && (TARGET.FileSystemDomain == MY.FileSystemDomain))'

Don't leave it on for long, though: it will spam the SchedLog, NegotiatorLog and StartLog.
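One way to keep the debug() window short is to wrap the edit and its undo in a small helper. This is a hypothetical sketch, not part of Condor: the function name `debug_requirements` and the default five-minute window are assumptions. It only uses `condor_qedit JOBID Requirements EXPRESSION` in the same form as the command above, first with the expression wrapped in debug() and then with the original expression restored.

```shell
# Hypothetical helper (not part of Condor): temporarily wrap a job's
# Requirements in debug(), wait, then restore the original expression so
# the SchedLog, NegotiatorLog and StartLog are not flooded indefinitely.
# Usage: debug_requirements JOBID 'REQUIREMENTS-EXPRESSION' [SECONDS]
debug_requirements() {
    job=$1
    req=$2
    secs=${3:-300}    # assumed default window: 5 minutes
    # Wrap the whole expression so every sub-expression gets logged.
    condor_qedit "$job" Requirements "debug($req)" || return 1
    sleep "$secs"     # leave time for a few match/claim attempts
    # Put the unwrapped expression back.
    condor_qedit "$job" Requirements "$req"
}
```

For example, with the original expression stored in a shell variable REQ, `debug_requirements 1702098.0 "$REQ" 600` would log for ten minutes and then revert.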

Best,


matt