
Re: [Condor-users] Flocking and high availability




Ian,

We have the super-pool CM configured to collect the offline log, and we have defined a main rooster that is in charge of waking the machines up. The only machines in the sub-pools that are configured not to hibernate are the sub-pool CMs.
It is not fully tested in production yet, but I have run some development tests and they worked as expected. You could also give it a try by defining a super-pool and at least one sub-pool to check whether this works as you expect.
With this approach you don't need HA defined at the sub-pool level, because the redundancy you need to run your jobs is provided by the super-pool collector (which has HA).
We have had problems providing local redundancy using Condor's native HA mechanisms together with the hibernate mechanism. There is a ticket open for this problem (#2106).
To work around this problem we have defined HA at the OS level (through a cluster resource management service) for the super-pool machines.
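
A rough sketch of how these pieces might fit together in the Condor configuration files is shown below. The host names (super-cm.example.org, sub-cm.example.org), the log path, the expiry time and the idle threshold are all placeholders chosen for illustration, not values from a real setup, and listing two collectors in COLLECTOR_HOST is an assumption about how the execute machines report to both pools.

```
## Super-pool CM: keep the ads of hibernating machines and run the rooster
# placeholder path for the offline ad log
OFFLINE_LOG = /var/lib/condor/spool/OfflineLog
# placeholder: expire offline ads after one week (in seconds)
OFFLINE_EXPIRE_ADS_AFTER = 604800
DAEMON_LIST = $(DAEMON_LIST), ROOSTER
ROOSTER_INTERVAL = 300
ROOSTER_UNHIBERNATE = Offline && Unhibernate

## Sub-pool execute machines: report to both CMs and hibernate when idle
# assumption: both collectors are listed so the super-pool also sees these machines
COLLECTOR_HOST = sub-cm.example.org, super-cm.example.org
HIBERNATE_CHECK_INTERVAL = 300
# placeholder policy: suspend to RAM after two hours without keyboard activity
HIBERNATE = ifThenElse(KeyboardIdle > 7200, "RAM", "NONE")

## Sub-pool CM: never hibernate
HIBERNATE_CHECK_INTERVAL = 0
```

The three parts belong in the configuration of three different kinds of machine; security and authorization settings are left out of the sketch.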

Klaus




Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx>
Sent by: condor-users-bounces@xxxxxxxxxxx

07/11/2011 12:18

Please respond to
Condor-Users Mail List <condor-users@xxxxxxxxxxx>

To
"condor-users@xxxxxxxxxxx" <condor-users@xxxxxxxxxxx>
cc
"condor-users-bounces@xxxxxxxxxxx" <condor-users-bounces@xxxxxxxxxxx>
Subject
Re: [Condor-users] Flocking and high availability





Thanks Klaus.
That is indeed an interesting approach. I'm a little unsure how it would
impact our desire to have wake-on-lan working in each local cluster but
with some redundancy.
Also, if anyone can answer my original question directly I would
appreciate it.
Thanks again -Ian

On 07/11/2011 12:10, "kschwarz@xxxxxxxxxxxxxx" <kschwarz@xxxxxxxxxxxxxx>
wrote:

>
>Ian,
>
>Take a look at the "How to have execute machines belong to multiple pools"
>Admin Recipe
>(https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToHaveExecuteMachines),
>which shows how to configure a super-pool over all your sub-pools. It gives
>you the benefits of flocking in a better way, plus "HA" for the
>matchmaking process.
>
>Klaus
>
>
>
>
>Ian Cottam <Ian.Cottam@xxxxxxxxxxxxxxxx>
>Sent by: condor-users-bounces@xxxxxxxxxxxxx/11/2011 09:49
>Please respond to
>Condor-Users Mail List <condor-users@xxxxxxxxxxx>
>
>To
>"condor-users@xxxxxxxxxxx" <condor-users@xxxxxxxxxxx>
>cc
>Subject
>[Condor-users] Flocking and high availability
>
>
>
>
>Dear all,
>
>To ease our problems with wake-on-lan, I am thinking of changing our
>Condor topology from one big pool across campus to lots of little ones
>connected by flocking.
>
>We use teaching cluster PCs as our Condor Pool, and the idea is that one
>of the PCs in each cluster would be the local match maker. Jobs would
>(exclusively) flock from our central submitter/matchmaker pair to the
>local pools. The teaching cluster PCs are not submit nodes so they don't
>need any flock_to config lines.
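>
>A minimal sketch of the flocking side of this layout, assuming hypothetical
>host names (submit.example.org for the central submitter, cm-a.example.org
>and cm-b.example.org for two local matchmakers); authorization settings are
>left out:
>
>```
>## On the central submit machine: the local pools that jobs may flock to
>FLOCK_TO = cm-a.example.org, cm-b.example.org
>
>## On the machines of each local pool: the submitter allowed to flock in
>FLOCK_FROM = submit.example.org
>```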
>
>Now, the obvious question arises: what if the PC in a local pool we have
>chosen as local match maker is down? Looking at the High Availability part
>of the manual it says you can have multiple negotiators, BUT it further
>states that this does not work in the case of flocking.
>
>My question is: would it work in my case of restricted flocking (getting
>jobs from one central submitter/matchmaker)?
>I suspect not, but would appreciate a response from the gurus out there
>:-)
>
>Many thanks
>-Ian
>
>
>
>--
>Ian Cottam
>IT Services for Research
>Faculty of Engineering and Physical Sciences
>The University of Manchester
>"The only strategy that is guaranteed to fail is not taking risks."
>Mark Zuckerberg
>
>
>
>
>_______________________________________________
>Condor-users mailing list
>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>subject: Unsubscribe
>You can also unsubscribe by visiting
>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>The archives can be found at:
>https://lists.cs.wisc.edu/archive/condor-users/
>
>
>________________________________________
>This message is intended solely for the use of its addressee and may
>contain privileged or confidential information.
>All information contained herein shall be treated as confidential and
>shall not be disclosed to any third party without Embraer's prior written
>approval.
>If you are not the addressee you should not distribute, copy or file this
>message. In this case, please notify the sender and destroy its contents
>immediately.
>


--
Ian Cottam
ext. 61851
IT Services for Research
Faculty of Engineering and Physical Sciences
The University of Manchester
"The only strategy that is guaranteed to fail is not taking risks." Mark
Zuckerberg






