
Re: [Condor-users] condor 6.8.2 + RHEL 4 - jobs stay idle, never run



I've found a box that does have better-analyze available:

( target.NikolaHost == "noddy" ) &&
( ( MY.RESOURCE_GROUP == TARGET.JOB_GROUP ) ) && ( target.Arch == "INTEL" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                                            Machines Matched    Suggestion
    ---------                                            ----------------    ----------
1   ( ( MY.RESOURCE_GROUP == TARGET.JOB_GROUP ) )        0                   REMOVE
2   ( target.NikolaHost == "noddy" )                     1
3   ( target.Arch == "INTEL" )                           364
4   ( target.OpSys == "LINUX" )                          377
5   ( target.Disk >= 10000 )                             385
6   ( ( 1024 * target.Memory ) >= 10000 )                385
7   ( TARGET.FileSystemDomain == "ee.washington.edu" )   385


This is exactly the same setup as the (working) 6.6.10 implementation.
The following four lines are in /etc/condor/condor_config:

	RESOURCE_GROUP = "ssli"
	JOB_GROUP = "ssli"
	SUBMIT_EXPRS = JOB_GROUP
	STARTD_EXPRS = RESOURCE_GROUP
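To make the MY/TARGET scoping of that clause concrete, here is a toy Python model (not Condor's ClassAd code). It assumes that SUBMIT_EXPRS puts only JOB_GROUP into the job ad, that STARTD_EXPRS puts only RESOURCE_GROUP into the machine ad, and that the ClassAd == operator yields UNDEFINED when either side is missing. Under those assumptions the same clause is true when the machine evaluates it in START (MY = machine, TARGET = job), but UNDEFINED when it sits in the job's own Requirements (MY = job, TARGET = machine).

```python
# Toy model of ClassAd attribute scoping: MY is the ad that owns the
# expression, TARGET is the ad it is being matched against.
# This is an illustration only, not Condor's implementation.

UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value


def lookup(ad, name):
    """Return ad[name], or UNDEFINED if the attribute is absent."""
    return ad.get(name, UNDEFINED)


def eq(a, b):
    """ClassAd-style '==': UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def group_clause(my, target):
    """MY.RESOURCE_GROUP == TARGET.JOB_GROUP"""
    return eq(lookup(my, "RESOURCE_GROUP"), lookup(target, "JOB_GROUP"))


# Assumed ads: SUBMIT_EXPRS adds JOB_GROUP to the job ad,
# STARTD_EXPRS adds RESOURCE_GROUP to the machine ad.
job_ad = {"JOB_GROUP": "ssli"}
machine_ad = {"RESOURCE_GROUP": "ssli"}

# Clause evaluated inside the machine's START (MY = machine, TARGET = job).
from_machine = group_clause(machine_ad, job_ad)
# Clause evaluated inside the job's Requirements (MY = job, TARGET = machine).
from_job = group_clause(job_ad, machine_ad)

# Only an exact True counts as a match.
print("machine side matches:", from_machine is True)
print("job side matches:", from_job is True)
```

If the model is right, a clause written from the machine's point of view would need its MY and TARGET swapped before it could match from the job's side.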


The requirements (START/RANK policy) part of condor_config is:

	IS_ALLOWED =  ( \
          	MY.RESOURCE_GROUP == TARGET.JOB_GROUP || \
          	MY.RESOURCE_GROUP == TARGET.USER_GROUP || \
          	MY.RESOURCE_GROUP == "ssli" \
	)

	IS_LOCAL =  ( \
          	MY.RESOURCE_GROUP == TARGET.JOB_GROUP || \
          	MY.RESOURCE_GROUP == TARGET.USER_GROUP \
	)

	START = $(UWCS_START) && $(IS_ALLOWED)
	RANK = $(IS_LOCAL)
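A toy Python sketch of how START and RANK would come out on the machine side (MY = machine ad, TARGET = job ad), assuming ClassAd's three-valued rules for && and || (TRUE wins an OR, FALSE wins an AND, otherwise UNDEFINED propagates). $(UWCS_START) is replaced by a hypothetical constant, and the job ads are assumed to carry no USER_GROUP attribute.

```python
# Toy three-valued evaluation of IS_ALLOWED, IS_LOCAL and START from the
# machine's side (MY = machine ad, TARGET = job ad). Not Condor's code.


class _Undefined:
    """Stand-in for the ClassAd UNDEFINED value."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


def lookup(ad, name):
    """Attribute lookup; a missing attribute is UNDEFINED."""
    return ad.get(name, UNDEFINED)


def eq(a, b):
    """'==' is UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def or3(*values):
    """Three-valued OR: any True wins, else any UNDEFINED, else False."""
    if any(v is True for v in values):
        return True
    if any(v is UNDEFINED for v in values):
        return UNDEFINED
    return False


def and3(*values):
    """Three-valued AND: any False wins, else any UNDEFINED, else True."""
    if any(v is False for v in values):
        return False
    if any(v is UNDEFINED for v in values):
        return UNDEFINED
    return True


def is_allowed(my, target):
    group = lookup(my, "RESOURCE_GROUP")
    return or3(eq(group, lookup(target, "JOB_GROUP")),
               eq(group, lookup(target, "USER_GROUP")),
               eq(group, "ssli"))


def is_local(my, target):
    group = lookup(my, "RESOURCE_GROUP")
    return or3(eq(group, lookup(target, "JOB_GROUP")),
               eq(group, lookup(target, "USER_GROUP")))


UWCS_START = True  # hypothetical: pretend the idle/load policy is satisfied

machine_ad = {"RESOURCE_GROUP": "ssli"}
for job_ad in ({"JOB_GROUP": "ssli"}, {"JOB_GROUP": "vlsi"}):  # no USER_GROUP
    start = and3(UWCS_START, is_allowed(machine_ad, job_ad))
    rank = is_local(machine_ad, job_ad)
    print(job_ad["JOB_GROUP"], "START:", start, "RANK:", rank)
```

In this model the literal "ssli" term makes IS_ALLOWED true for any job on an ssli machine, while IS_LOCAL for a "vlsi" job comes out UNDEFINED rather than FALSE, because the missing USER_GROUP term cannot settle the OR.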



The group name ("ssli", "vlsi", "mtml", etc.) is filled in by the script
that installs the condor_config on the host.

When I remove the NikolaHost requirement, this particular box actually
sends jobs to the 6.6.10 pool just fine.  Noddy is a 32-bit system
running RHEL 4 with Condor 6.8.2.  The boxes that are not sending jobs
out at all are 64-bit boxes, so I can understand why they would
not be sending jobs to the 32-bit 6.6.10 systems.

What I don't understand is why this requirement works in 6.6.10 but not
in 6.8.2.

nomad

>> 6.8.2:
>> 
>>       Requirements = (START) && (IsValidCheckpointPlatform)
>
>IsValidCheckpointPlatform is automatically inserted by the startd, but 
>it should evaluate to true for any vanilla job.  What does condor_q 
>-better-analyze say?
>
>-Greg