
[Condor-users] Flocking feature in condor, samples



Hi list,
I have enabled flocking between my two Condor pools, whose central managers are machines A and B.
When I run this on B:

$ condor_status -pool A

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

compute-0-0.local  LINUX      INTEL  Unclaimed Idle     0.000  1009  0+01:35:04
compute-0-1.local  LINUX      INTEL  Unclaimed Idle     0.000  1009  0+03:10:04
compute-0-2.local  LINUX      INTEL  Unclaimed Idle     0.000  1009  0+02:10:04
compute-0-3.local  LINUX      INTEL  Unclaimed Idle     0.000  1009  0+03:15:04
protos.cs.bgsu.edu LINUX      INTEL  Unclaimed Idle     0.000   933  0+03:10:04

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     5     0       0         5       0          0        0

               Total     5     0       0         5       0          0        0

And condor_status run locally on B shows:

$ condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

comet.cs.bgsu.edu  LINUX      X86_64 Owner     Idle     1.000  7927  0+10:25:04
slot1@compute-0-0. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:51:29
slot2@compute-0-0. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:51:30
slot3@compute-0-0. LINUX      X86_64 Unclaimed Idle     0.030   954  0+02:20:06
slot4@compute-0-0. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:54:47
slot1@compute-0-1. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:54:36
slot2@compute-0-1. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:54:37
slot3@compute-0-1. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:57:53
slot4@compute-0-1. LINUX      X86_64 Unclaimed Idle     0.000   954  0+00:20:07
slot1@compute-0-2. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:51:15
slot2@compute-0-2. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:51:16
slot3@compute-0-2. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:54:32
slot4@compute-0-2. LINUX      X86_64 Unclaimed Idle     0.000   954  0+00:05:07
slot1@compute-0-3. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:52:38
slot2@compute-0-3. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:52:39
slot3@compute-0-3. LINUX      X86_64 Unclaimed Idle     0.010   954  0+03:45:07
slot4@compute-0-3. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:55:56
slot1@compute-0-4. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:50:42
slot2@compute-0-4. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:50:43
slot3@compute-0-4. LINUX      X86_64 Unclaimed Idle     0.000   954  0+03:45:06
slot4@compute-0-4. LINUX      X86_64 Unclaimed Idle     0.000   954  0+06:54:00
slot1@compute-0-5. LINUX      X86_64 Unclaimed Idle     0.000   954  0+03:50:04
slot2@compute-0-5. LINUX      X86_64 Unclaimed Idle     0.000   954 21+04:21:26
slot3@compute-0-5. LINUX      X86_64 Unclaimed Idle     0.000   954 21+04:21:27
slot4@compute-0-5. LINUX      X86_64 Unclaimed Idle     0.000   954 21+04:21:28

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    25     1       0        24       0          0        0

               Total    25     1       0        24       0          0        0

I want to submit jobs so that they execute on the Arch == "INTEL" machines, but the job does not run on A. My submit file is:

universe = MPI
executable = /home/skhanal/cpi
Requirements = (OpSys == "LINUX" && Arch == "INTEL")
log = userlog.txt
output = outfile.$(NODE)
error = errfile.$(NODE)
machine_count = 2
should_transfer_files = yes
when_to_transfer_output = on_exit
queue
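For reference, my flocking-related settings are along these lines (the hostnames here are placeholders, not my real machine names):

```
# On B, the submitting side: tell B's schedd it may flock to A's pool
FLOCK_TO = central-manager-A.example.edu

# On A, the central manager of the target pool: accept B's schedd
FLOCK_FROM = submit-host-B.example.edu
HOSTALLOW_WRITE = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
```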

When I analyze the job, I get the following output:

$ condor_q -analyze

---
153.000:  Run analysis summary.  Of 25 machines,
     25 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:

Requirements = ((OpSys == "LINUX" && Arch == "INTEL")) && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasMPI) && (HasFileTransfer)

I read in the manual that a job will move to another Condor pool if no machine in the current pool satisfies its requirements, and that is what I tried to do, without success.

Any hints on how to do this, or on how to check whether flocking is actually working? Are there some simple tests?
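One simpler test I was considering, to rule out the MPI universe itself as the culprit, is a plain vanilla-universe job with the same requirements (the executable is just a placeholder):

```
universe = vanilla
executable = /bin/hostname
Requirements = (OpSys == "LINUX" && Arch == "INTEL")
output = test.out
error = test.err
log = test.log
queue
```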

Thanks
Samir