
Re: [HTCondor-users] net topology



On 5/26/2017 6:20 AM, lejeczek wrote:
hi everybody

like earlier, a newbie here, trying to grasp all the concepts condor (may) offer.
Question:
Can a pool be configured with central manager(s) (in an HA setup) where only the central manager(s) would be submitting and the rest of the pool would be on a different subnet?

You can probably easily envision what I'm asking: the HA managers are on 10.0.0.0, which users can see; the rest of the pool is on 10.1.0.0, which users cannot see, so they submit only via the central managers' 10.0.0.x addresses.
Would such a setup work and be allowed by the design?

m.! thanks
L.

Not quite sure what you want from the above, but I think perhaps you have missed an important point re the architecture of an HTCondor pool.

In an HTCondor pool, there is:
  1. one central manager (CM), and one or more optional backup CMs if you bother with the HA setup.
  2. one or more submit machines.
  3. one or more execute machines.

Any machine can serve one, two, or all three of the above roles, simply based on which daemons are listed in DAEMON_LIST. Your central manager(s) do NOT have to be the same as your submit machines. Any machine in the pool that runs the "condor_schedd" daemon can act as a submit machine, just by adding SCHEDD to the DAEMON_LIST config knob.

Especially for larger pools, it is a good idea to have dedicated machine(s) for each role: one central manager, one or more submit machines, one or more execute machines. You can have as many submit machines as you want; here at UW, we have a pool with 1 central manager, ~500 execute machines, and 80+ submit machines, because many of our submit machines are embedded within various research labs and only have logins for the researchers in that lab. Meanwhile, our central manager is located in the centralized IT data center. If one of the 80+ submit machines goes down, only the jobs submitted on that one submit machine are impacted; all the other submit machines continue to operate as normal. More details on this are at

http://research.cs.wisc.edu/htcondor/manual/v8.7/3_1Introduction.html#SECTION00411000000000000000
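
As a rough sketch of how the three roles map onto DAEMON_LIST, the local config on each kind of machine might look something like the following. The hostname cm.example.org is a made-up placeholder, and a real pool will also need its security/authorization settings, which are left out here:

    # --- on the central manager (hypothetical host cm.example.org) ---
    CONDOR_HOST = cm.example.org
    DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

    # --- on each submit machine ---
    CONDOR_HOST = cm.example.org
    DAEMON_LIST = MASTER, SCHEDD

    # --- on each execute machine ---
    CONDOR_HOST = cm.example.org
    DAEMON_LIST = MASTER, STARTD

MASTER appears in every list because condor_master is the daemon that starts and watches the others. A machine that should fill more than one role simply lists more daemons, e.g. "DAEMON_LIST = MASTER, SCHEDD, STARTD" for a box that both submits and executes jobs.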

If you are asking "can I have a submit machine that is on two networks, a public network that users can access via ssh to login, and a private network that holds all my execute nodes and my central manager", the answer is yes.
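
For illustration only, assuming the subnets from your message (10.0.0.x facing the users, 10.1.0.x for the private pool network), a dual-homed submit machine could be sketched along these lines. The hostname cm-private.example.org is a placeholder I made up, and whether you need NETWORK_INTERFACE at all depends on your routing, so treat this as a starting point to test rather than a recipe:

    # --- on the dual-homed submit machine ---
    # Users ssh in over the 10.0.0.x interface; nothing in the HTCondor
    # config is needed for that part.

    # The central manager, reached over the private network
    # (placeholder name, assumed to resolve to a 10.1.0.x address).
    CONDOR_HOST = cm-private.example.org

    # This machine only submits jobs.
    DAEMON_LIST = MASTER, SCHEDD

    # Have the daemons use and advertise the private-side address, so
    # the execute nodes and the central manager can connect back to
    # the schedd.
    NETWORK_INTERFACE = 10.1.0.*

With something like this in place, users run condor_submit on the submit machine after logging in via the public side, and all HTCondor traffic to the central manager and execute nodes stays on the private network.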

Hope the above helps
Todd