Re: [HTCondor-users] net topology
- Date: Tue, 30 May 2017 10:28:53 +0100
- From: lejeczek <peljasz@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] net topology
On 26/05/17 16:03, Todd Tannenbaum wrote:
On 5/26/2017 6:20 AM, lejeczek wrote:
Like earlier, a newbie here, trying to grasp all those
concepts condor (may) offer.
Can a pool be configured with a central manager(s - in HA
setup) where only the central manager(s) would be
submitting and the rest of the pool would be on different
You probably envision easily what I'm asking: HA
managers (on 10.0.0.0, which users see), then the rest on
10.1.0.0, which users don't see; users submit only via the
c. managers' 10.0.0.x.
Would such a setup work and be allowed by the design?
Not quite sure what you want from the above, but I think
perhaps you have missed an important point re the
architecture of an HTCondor pool.
In an HTCondor pool, there is:
1. one central manager (CM), and one or more optional
backup CMs if you bother with the HA setup.
2. one or more submit machines.
3. one or more execute machines.
Any machine can serve one, two, or all of the above
three roles simply based on what daemons are listed in
DAEMON_LIST. Your central manager(s) do NOT have to be the
same as your submit machines. Any machine in the pool
that runs the "condor_schedd" daemon can act as a submit
machine, just by adding SCHEDD to the DAEMON_LIST config
knob. Especially for larger pools, it is a good idea to have
dedicated machine(s) for each role: one central manager,
one or more submit machines, one or more execute machines. You
can have as many submit machines as you want; here at UW,
we have a pool with 1 central manager, ~500 execute
machines, and 80+ submit machines, as we have many submit
machines that are embedded within various research labs
that only have logins for the researchers in that lab.
Meanwhile our central manager is located in the
centralized IT data center. If one of the 80+ submit
machines goes down, only the jobs submitted on that one
submit machine are impacted; all the other submit machines
continue to operate as normal. More details on this are at
If you are asking "can I have a submit machine that is on
two networks, a public network that users can access via
ssh to login, and a private network that holds all my
execute nodes and my central manager", the answer is yes.
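As a rough sketch of the three roles listed above: the per-machine difference is mostly the DAEMON_LIST line. The hostname cm.example.org is a placeholder, not something from this thread, and a real pool needs further settings (security, the HA daemons for backup CMs, and so on) that are left out here.

```
# Each stanza belongs in the local config of a different machine.
# cm.example.org is a hypothetical central manager hostname.

# Central manager
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# Submit machine
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, SCHEDD

# Execute machine
CONDOR_HOST = cm.example.org
DAEMON_LIST = MASTER, STARTD
```

A machine that serves more than one role would list the union of the daemons, for example MASTER, SCHEDD, STARTD for a combined submit and execute node.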
Hope the above helps
I was asking because I did:
HA with two central managers with a specific NETWORK_INTERFACE
(on subnet A) and then an exec node (with only DAEMON_LIST
= MASTER, STARTD) pointing to CONDOR_HOST =
$(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2) but on a different
subnet, not the CMs' NETWORK_INTERFACE (subnet B, to which the
central managers are also connected).
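A minimal sketch of what the setup described above might look like, for readers following along. The hostnames and address patterns are placeholders chosen for illustration (assuming subnet A is 10.0.0.x and subnet B is 10.1.0.x, which the message does not state), and the HA-specific settings on the central managers are omitted.

```
# On both central managers (assumed: subnet A = 10.0.0.x)
NETWORK_INTERFACE = 10.0.0.*

# On the exec node (hypothetical hostnames)
CENTRAL_MANAGER1 = cm1.example.org
CENTRAL_MANAGER2 = cm2.example.org
CONDOR_HOST = $(CENTRAL_MANAGER1),$(CENTRAL_MANAGER2)
DAEMON_LIST = MASTER, STARTD
# Assumption: pinning the exec node to subnet B is optional and
# was not mentioned in the original description.
# NETWORK_INTERFACE = 10.1.0.*
```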
And it apparently works, but I was worried as I could see the
exec node twice in condor_status, affecting Total.
That was soon after I started the exec node, but now, after a
long weekend, it seems condor corrected it somehow.
Why do you think it showed up twice?