
Re: [HTCondor-users] net topology





On 30/05/17 10:28, lejeczek wrote:


On 26/05/17 16:03, Todd Tannenbaum wrote:
On 5/26/2017 6:20 AM, lejeczek wrote:
hi everybody

As before, a newbie here, trying to grasp all the concepts Condor (may) offer.
Question:
Can a pool be configured with central manager(s) (in an HA setup) where only the central manager(s) submit jobs and the rest of the pool is on a different subnet?

You can probably picture what I'm asking: the HA managers sit on 10.0.0.0, which users can see; the rest of the pool sits on 10.1.0.0, which users cannot see; and users submit only via the central managers' 10.0.0.x addresses.
Would such a setup work, and is it allowed by the design?

Many thanks!
L.

Not quite sure what you want from the above, but I think perhaps you have missed an important point re the architecture of an HTCondor pool.

In an HTCondor pool, there is:
  1. one central manager (CM), and one or more optional backup CMs if you bother with the HA setup.
  2. one or more submit machines.
  3. one or more execute machines.

Any machine can serve one, two, or all three of the above roles, based simply on which daemons are listed in DAEMON_LIST. Your central manager(s) do NOT have to be the same as your submit machines. Any machine in the pool that runs the "condor_schedd" daemon can act as a submit machine, just by adding SCHEDD to the DAEMON_LIST config knob. Especially for larger pools, it is a good idea to have dedicated machine(s) for each role: one central manager, one or more submit machines, one or more execute machines.

You can have as many submit machines as you want; here at UW, we have a pool with 1 central manager, ~500 execute machines, and 80+ submit machines, because many of our submit machines are embedded within various research labs and only have logins for the researchers in that lab. Meanwhile, our central manager is located in the centralized IT data center. If one of the 80+ submit machines goes down, only the jobs submitted on that one machine are impacted; all the other submit machines continue to operate as normal. More details on this are at

http://research.cs.wisc.edu/htcondor/manual/v8.7/3_1Introduction.html#SECTION00411000000000000000
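
As a rough illustration of the role split described above, the local configuration on each kind of machine could look something like the sketch below. The host name cm.example.org is a placeholder, and a real pool needs further settings (security, allowed hosts, and so on) that are left out here.

```
# --- on the central manager (cm.example.org is a placeholder name) ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# --- on each submit machine ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, SCHEDD

# --- on each execute machine ---
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, STARTD
```

A machine that should fill more than one role simply lists the daemons for each of those roles in its DAEMON_LIST.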

If you are asking "can I have a submit machine that is on two networks, a public network that users can reach via ssh to log in, and a private network that holds all my execute nodes and my central manager", the answer is yes.
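
A minimal sketch of such a dual-homed submit machine, assuming the addressing from the original question (10.0.0.x visible to users, 10.1.0.x private). The specific addresses and the host name are placeholders, and pinning HTCondor to the private side with NETWORK_INTERFACE is one possible choice, not the only one.

```
# Submit machine with two network interfaces (addresses are placeholders):
#   10.0.0.5   public side, users ssh in here to submit jobs
#   10.1.0.5   private side, shared with the CM and the execute nodes
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, SCHEDD

# Have the HTCondor daemons on this machine use and advertise
# the private-side address when talking to the rest of the pool.
NETWORK_INTERFACE = 10.1.0.5
```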

Hope the above helps
Todd


I was asking because I did:

HA with two central managers, each with a specific NETWORK_INTERFACE (on subnet A), and then an exec node (with only DAEMON_LIST = MASTER, STARTD) pointing to CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2) but sitting on a different subnet, not the one the CMs' NETWORK_INTERFACE is on (subnet B, to which the central managers are also connected).
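
A minimal sketch of the exec node configuration described above. The two host names are placeholders, and the HA-specific settings on the central managers themselves are not shown.

```
# Exec node local config (cm1/cm2.example.org are placeholder names)
CENTRAL_MANAGER1 = cm1.example.org
CENTRAL_MANAGER2 = cm2.example.org

# Point the node at both central managers of the HA pair
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)

# Execute-only role: no SCHEDD, COLLECTOR or NEGOTIATOR here
DAEMON_LIST = MASTER, STARTD
```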

And it apparently works, but I was worried because condor_status listed the exec node twice, which inflated the Total. That was soon after I started the exec node; now, after the long weekend, it seems Condor has corrected it somehow.
Why do you think it showed up twice?

many thanks.

mailman disabled my subscription, and I wonder if anybody replied.