
[Condor-users] execute node choice and disk parameter



I've set up a small Condor test pool with a head node and two slave execute
nodes.

Unfortunately, I can't get anything scheduled on the slaves: they are
never chosen because they fail the ( TARGET.Disk >= 1500 ) requirement.
See the log below.

These execute nodes are booted remotely and use an NFS root. However,
they do have access to a shared NFS filesystem, and each has a 500G
scratch disk.

I have configured the following on both the head node and the slaves:

UID_DOMAIN              = ece.ucsb.edu
FILESYSTEM_DOMAIN       = saw.ece.ucsb.edu
COLLECTOR_NAME          = CBI UCSB
USE_NFS                 = True

Is it possible to change how condor calculates available disk space 
on the slave execute nodes?
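For reference, the Condor manual describes a machine's Disk attribute as the free space (in KiB) in the directory named by the EXECUTE macro, minus any space reserved with RESERVED_DISK (given in MB). The sketch below approximates that calculation for a given directory so it can be compared across the NFS root and the scratch disk. It is an assumption-laden illustration, not Condor's code: `estimate_disk_kib` is a hypothetical helper, and it ignores how the value may be split across the four slots on each node.

```python
import shutil


def estimate_disk_kib(execute_dir: str, reserved_disk_mb: int = 0) -> int:
    """Rough stand-in for the Disk attribute, in KiB (hypothetical helper).

    Assumes Disk ~= free space in the EXECUTE directory minus RESERVED_DISK
    (MB), as the manual describes; per-slot splitting is ignored.
    """
    free_kib = shutil.disk_usage(execute_dir).free // 1024
    reserved_kib = reserved_disk_mb * 1024
    return max(free_kib - reserved_kib, 0)  # never report negative space


if __name__ == "__main__":
    # Point this at whatever directory EXECUTE resolves to on an execute node.
    print(f"about {estimate_disk_kib('.')} KiB free for jobs")
```

If EXECUTE on the diskless nodes points into the NFS root rather than the scratch disk, a value like this would be small, which would be consistent with the slaves failing the Disk test.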


Thanks,
Kris

--------------------------------------------------------------

$ condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+00:40:04
slot2@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:28
slot3@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:29
slot4@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:30
slot1@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+00:40:04
slot2@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:28
slot3@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:29
slot4@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:30
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  1007  0+00:13:08
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.100  1007  0+00:13:15
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    10     0       0        10       0          0        0

               Total    10     0       0        10       0          0        0
kgk@claw$ condor_q


-- Submitter: claw.ece.ucsb.edu : <128.111.60.123:41670> : claw.ece.ucsb.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held

kgk@claw$ condor_submit eattime.cmd 
Submitting job(s)......
Logging submit event(s)......
6 job(s) submitted to cluster 14.
kgk@claw$ condor_q -ana 14.0


-- Submitter: claw.ece.ucsb.edu : <128.111.60.123:41670> : claw.ece.ucsb.edu
---
014.000:  Run analysis summary.  Of 10 machines,
     10 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= DiskUsage ) && ( ( ( TARGET.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( TARGET.Disk >= 1500 )           2                   MODIFY TO 0
2   ( TARGET.FileSystemDomain == "saw.ece.ucsb.edu" )
                                      8                    
3   ( TARGET.Arch == "X86_64" )       10                   
4   ( TARGET.OpSys == "LINUX" )       10                   
5   ( ( ( 1024 * TARGET.Memory ) >= 1500 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,1.464843750000000E+00)) ) >= 1500 ) )
                                      10                   

Conflicts:

  conditions: 1, 2
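To make the Requirements expression concrete, here is a hedged Python transcription of it (TARGET is the machine ad, MY is the job ad). The job values come from the analysis output (DiskUsage and ImageSize are both 1500, and RequestMemory is ceiling(1.46...) = 2); the two machine ads and their Disk values are hypothetical. Units follow Condor's conventions as I understand them: Disk, DiskUsage and ImageSize in KiB, Memory and RequestMemory in MB.

```python
from dataclasses import dataclass


@dataclass
class MachineAd:
    arch: str
    opsys: str
    disk_kib: int           # TARGET.Disk
    memory_mb: int          # TARGET.Memory
    fs_domain: str          # TARGET.FileSystemDomain


@dataclass
class JobAd:
    disk_usage_kib: int     # DiskUsage
    image_size_kib: int     # ImageSize
    request_memory_mb: int  # RequestMemory
    fs_domain: str          # MY.FileSystemDomain


def requirements(m: MachineAd, j: JobAd) -> bool:
    """Python transcription of the job's Requirements expression."""
    return (
        m.arch == "X86_64"
        and m.opsys == "LINUX"
        and m.disk_kib >= j.disk_usage_kib
        and m.memory_mb * 1024 >= j.image_size_kib
        and j.request_memory_mb * 1024 >= j.image_size_kib
        and m.fs_domain == j.fs_domain
    )


job = JobAd(disk_usage_kib=1500, image_size_kib=1500,
            request_memory_mb=2, fs_domain="saw.ece.ucsb.edu")

# Hypothetical execute slots: same node, small vs. large Disk.
small = MachineAd("X86_64", "LINUX", 1000, 992, "saw.ece.ucsb.edu")
large = MachineAd("X86_64", "LINUX", 100_000_000, 992, "saw.ece.ucsb.edu")

print(requirements(small, job))  # False
print(requirements(large, job))  # True
```

Every other clause passes for these slots, so the Disk clause alone decides the match.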
kgk@claw$ condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+00:40:04
slot2@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:28
slot3@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:29
slot4@b0001        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:30
slot1@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+00:40:04
slot2@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:28
slot3@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:29
slot4@b0002        LINUX      X86_64 Unclaimed Idle     0.000   992  0+16:40:30
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000  1007  0+03:50:04
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     -1.000  1007  0+19:50:20
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    10     0       0        10       0          0        0

               Total    10     0       0        10       0          0        0
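The "Conflicts: conditions: 1, 2" line can be read as: some slots have enough Disk and some are in the right FileSystemDomain, but no slot is both. The toy re-creation below reproduces the "Machines Matched" counts (2, 8, 10, 10, 10) from a hypothetical pool; only the slot counts come from condor_status, while the Disk values and the head node's "other.example" domain are invented to fit the table. It is not how condor_q -analyze is actually implemented.

```python
JOB_DOMAIN = "saw.ece.ucsb.edu"
DISK_USAGE_KIB = 1500
IMAGE_SIZE_KIB = 1500

# Hypothetical slot ads: 8 execute slots (small Disk, matching domain)
# and 2 head-node slots (large Disk, invented non-matching domain).
slots = (
    [{"name": f"slot{i}@b000{n}", "Disk": 1000, "Memory": 992,
      "FileSystemDomain": JOB_DOMAIN, "Arch": "X86_64", "OpSys": "LINUX"}
     for n in (1, 2) for i in range(1, 5)]
    + [{"name": f"slot{i}@head", "Disk": 50_000_000, "Memory": 1007,
        "FileSystemDomain": "other.example", "Arch": "X86_64", "OpSys": "LINUX"}
       for i in (1, 2)]
)

# Numbered like the analysis table; condition 5 keeps only its
# machine-dependent half (the RequestMemory half is always true here).
conditions = {
    1: lambda s: s["Disk"] >= DISK_USAGE_KIB,
    2: lambda s: s["FileSystemDomain"] == JOB_DOMAIN,
    3: lambda s: s["Arch"] == "X86_64",
    4: lambda s: s["OpSys"] == "LINUX",
    5: lambda s: s["Memory"] * 1024 >= IMAGE_SIZE_KIB,
}


def matched_counts(slots, conditions):
    """Number of slots satisfying each condition on its own."""
    return {k: sum(1 for s in slots if c(s)) for k, c in conditions.items()}


def conflicting_pairs(slots, conditions):
    """Condition pairs that each match some slot but never the same slot."""
    keys = sorted(conditions)
    pairs = []
    for idx, a in enumerate(keys):
        for b in keys[idx + 1:]:
            ca, cb = conditions[a], conditions[b]
            if (any(ca(s) for s in slots) and any(cb(s) for s in slots)
                    and not any(ca(s) and cb(s) for s in slots)):
                pairs.append((a, b))
    return pairs


print(matched_counts(slots, conditions))     # {1: 2, 2: 8, 3: 10, 4: 10, 5: 10}
print(conflicting_pairs(slots, conditions))  # [(1, 2)]
```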