
[Condor-users] Setup with dynamic slots runs only one job, ever



I have a small cluster with one submit node and two slave machines. One slave has 4 cores, the other has 8. All run Red Hat FC8, and Condor is version 7.2.1. The config on the slaves looks like this:

SLOT_TYPE_1 = cpus=100%, ram=100%, swap=50%, disk=100%
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1_PARTITIONABLE = True

If I run "condor_restart -all" and then "condor_run hostname", the job runs and I get a machine name back from hostname. A dynamic slot is synthesized to run it, which I can observe by adding "sleep 60" to the command to give me time to look. Then the dynamic slot goes away. After this happens, any further "condor_run hostname" stays idle forever for "unknown reasons" (even with "condor_q -better-analyze").

I compared the output of "condor_status -l" for the two machines before and after "condor_run hostname" had run. One value that changed is VirtualMemory, which went from 60000000 to -1 on the machine that ran hostname. I thought that might be the problem, but the second machine, which didn't run hostname, still reports its full VirtualMemory and doesn't run the job either.

In another scenario, I created a submit file to run hostname, with "queue 200" at the bottom. When I submit it, each machine spawns one dynamic slot, and each of those dynamic slots runs one job at a time until all 200 are finished. Even if I add "sleep 600" to the job, so that a negotiator interval or two has to go by before the job is done, no more than one slot is ever synthesized on either machine. I would expect each machine to spawn dynamic slots up to its TotalCpus or Cpus, which are both 4 on one machine and 8 on the other.
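
For reference, a submit description file for this kind of test can be sketched as follows. This is a minimal sketch and not the exact file used: the vanilla universe, the path /bin/hostname, and the output, error and log file names are assumptions chosen for illustration.

```
# Minimal sketch of a submit description file that queues 200 hostname jobs.
# Universe, executable path and file names below are illustrative assumptions.
universe   = vanilla
executable = /bin/hostname

# One output and error file per job, named by cluster and process id.
output     = hostname.$(Cluster).$(Process).out
error      = hostname.$(Cluster).$(Process).err
log        = hostname.$(Cluster).log

queue 200
```

With a file like this, each of the 200 jobs is a separate process in one cluster, so the number of dynamic slots alive at any moment shows how many jobs each machine is willing to run concurrently.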

Any ideas how to debug this? Where is the decision to synthesize a dynamic slot logged? The NegotiatorLog simply says:

2/18 13:39:09     Request 02718.00000:
2/18 13:39:09       Rejected 2718.0 glangmead@xxxxxxxxxxxxxxxxxx <192.168.129.20:48105>: no match found

Thanks,
Greg Langmead
Senior Research Scientist
Language Weaver, Inc.