
[HTCondor-users] Cannot get all workers in a cluster to work on jobs

I'm new to HTCondor and currently have a cluster set up on 4 identical
machines, all running this condor version:
> condor_version
$CondorVersion: 7.8.7 Dec 12 2012 BuildID: 86173 $
$CondorPlatform: x86_64_rhap_5.8 $

Machine 3 is the master; the other 3 (1, 2, 4) are identical workers.

condor_status (run on any of the 4 machines) shows me all slots and
machines in the cluster, and everything looks fine (4 machines with 4
slots each, so 16 slots). The problem is, I can never get all 4 machines to
work on jobs. In my tests, I submit 100 copies of a job from each of
the machines, and they never run on all machines. At most they run on
two, and when submitted from the master itself, they only run on the
master:
submit from 3: only 3 is used;  1,2,4 are MATCHED but never actually used
submit from 1: 1,3 are used;    2,4 are MATCHED but never actually used
submit from 2: 2,3 are used;    1,4 are MATCHED but never actually used
submit from 4: 3,4 are used;    1,2 are MATCHED but never actually used

My jobs specify no special requirements, so I don't think
that's the problem. I assume I have something wrong somewhere in my
config, but I'm not sure what or where to look. Can anyone point me in
the right direction? I can post any of my config you need to see.