
Re: [Condor-users] CM Failover with submits from CM




Condor supports fail-over of the submit node. I don't have experience with that, so I'll just focus on a different aspect of your question.

In Condor 7.x, you can use CCB to allow a submit machine to operate from outside of the private network, assuming you have outbound connectivity from the private network to the submit machine. (Whether that still satisfies your boss's security concerns is a different question. CCB access can be regulated using Condor's standard authentication and authorization options.)

To configure it, list both of your CMs in CCB_ADDRESS in the configuration of the execute nodes. If one CM fails, CCB traffic will automatically fail over to the other. While both are functioning, CCB traffic will load-balance across the two.
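
As a rough sketch of what that might look like in the local configuration of each execute node (the hostnames cm1.example.org and cm2.example.org are placeholders I made up for your two CMs, and this assumes the collector on each CM is reachable from the execute nodes):

```
# Hypothetical hostnames; substitute the names of your two central managers.
COLLECTOR_HOST = cm1.example.org, cm2.example.org

# Have the execute-node daemons register with the CCB server
# (the collector) on both CMs.
CCB_ADDRESS = $(COLLECTOR_HOST)
```

After changing this, the daemons on the execute nodes need to pick up the new configuration (for example with condor_reconfig or a restart). This sketch leaves out any authentication or authorization settings you would want in order to restrict who may use CCB.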

--Dan

Janzen Brewer wrote:
I have an interesting problem. I believe I detailed the setup of my organization's cluster/network in an earlier post, but I will repeat it here:

Nine compute machines (running STARTD, SCHEDD) exist on a private subnet and cannot be reached (by design) from the rest of my organization. They are connected to a switch which is also connected to the primary and secondary central managers. The primary/secondary CMs have two NICs each. Each CM has an IP on the private subnet and on my organization's public network.

Problem: I tried submitting a job from my workstation (on my organization's public network), but the CMs tell it to talk to the STARTD at a private address, which obviously doesn't work. I told my boss about this and asked how he wanted to proceed, and he wants to only allow job submissions from the CMs. This works, BUT he also wants failover capability. I don't foresee this working well since the submit machine goes down when the CM goes down, even though CM functionality fails over.

What kind of options do I have? My boss is adamant that the nodes stay on a private subnet and that we have CM failover capability. I don't think having a separate submit machine which straddles the private and public networks (like the current CMs) will work. My boss wants no single point of failure present in the system.