
Re: [HTCondor-users] Elastically extend local condor pool by EC2 instances



> Yes, of course, this is a good point. Still, it seems that you are also
> considering directly adding cloud instances to your cluster. Does that
> mean you found that network reliability is not so much of a problem in
> practice? (Network traffic is, of course, another important issue.)

My (limited) personal experience, and what I've heard from others, indicates that network reliability is not usually an issue. You should probably still configure the cloud instances to cope with network problems (e.g., by shutting themselves down if they haven't had any work to do for a while). I'd expect a network issue to be more likely on your side than on Amazon's, and an outage on your side would make it hard to turn off the nodes that are no longer doing anything useful.
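
The "shut yourself down when idle" idea can be sketched as a small watchdog that runs on each cloud instance. This is a minimal sketch under stated assumptions, not how any HTCondor tool does it: `IdleWatchdog`, `run_watchdog`, the `is_busy` check and the `shut_down` hook are all hypothetical names. In practice `is_busy` would be whatever tells the instance a job is running, and `shut_down` would invoke the instance's own power-off. The useful property is that "no jobs" and "can't reach the pool" look the same from inside the instance, so one idle timer covers both.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IdleWatchdog:
    """Hypothetical helper: decides when a cloud worker should power itself off.

    Assumption: the caller reports activity (e.g. "a job is running") via
    record_activity(); this class never talks to HTCondor or to EC2 itself.
    """

    idle_limit_s: float  # how long the instance may sit idle, in seconds
    clock: Callable[[], float] = time.monotonic
    last_activity: Optional[float] = None

    def __post_init__(self) -> None:
        # Count boot as activity so a fresh instance gets a full grace period.
        if self.last_activity is None:
            self.last_activity = self.clock()

    def record_activity(self) -> None:
        self.last_activity = self.clock()

    def idle_for(self) -> float:
        return self.clock() - self.last_activity

    def should_shut_down(self) -> bool:
        # Whether no jobs arrived or the pool was unreachable, nothing useful
        # has happened for idle_limit_s seconds, so it is time to stop paying.
        return self.idle_for() >= self.idle_limit_s


def run_watchdog(
    watchdog: IdleWatchdog,
    is_busy: Callable[[], bool],
    shut_down: Callable[[], None],
    poll_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until the instance has been idle too long, then call shut_down once.

    `is_busy` and `shut_down` are hypothetical hooks supplied by the caller.
    """
    while True:
        if is_busy():
            watchdog.record_activity()
        elif watchdog.should_shut_down():
            shut_down()
            return
        sleep(poll_s)
```

The clock and sleep functions are injectable so the logic can be exercised without waiting in real time; on a real instance the defaults (`time.monotonic`, `time.sleep`) would be used.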

> I don't know if you can connect Linux instances back to your Windows pool,

You can, but it's not likely to be useful, since the jobs on a Windows-only pool are probably all Windows-only.

> That sounds really interesting! I'd be highly interested to learn more
> about this. Somehow there is not much information online about this
> tool. Is it a Linux-only solution?

condor_annex is presently Linux-only and is still in the prototype stage, which is why we're not saying much about it (yet). It's probably going to change quite a bit before we release it for the general public, so I'd prefer not to confuse things.

> My impression is that our limitation to Windows is really an obstacle in this whole project.

It certainly means you're going to be doing some trail-blazing. On the other hand, you could minimize the amount of trail-blazing by running a small Linux installation somewhere (even on Amazon, if you have to). The HTCondor command-line tools work just fine across platforms, so as long as you configure your primary schedd(s) to allow it, you could monitor the queue(s) from there. Add a schedd on that machine, and you can submit EC2 jobs from it. Those EC2 jobs could be Windows images; I assume (I am not a Windows person ;)) it's possible to preconfigure them to join your pool.
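
For the "add a schedd and submit EC2 jobs" route, a rough sketch of what such a submission might contain is below. It is a minimal Python helper that only assembles the text of a grid-universe submit description; it assumes the EC2 submit commands `grid_resource = ec2 ...`, `ec2_ami_id`, `ec2_instance_type`, `ec2_access_key_id`, `ec2_secret_access_key` and `ec2_user_data` as described in the EC2 section of the HTCondor manual (check the manual for your version). The function name `build_ec2_submit`, the default service URL, the label, and the idea of passing the central manager's address through user data are illustrative assumptions, not part of any released tool.

```python
from typing import Optional


def build_ec2_submit(
    ami_id: str,
    instance_type: str,
    access_key_file: str,
    secret_key_file: str,
    service_url: str = "https://ec2.us-east-1.amazonaws.com/",
    user_data: Optional[str] = None,
    count: int = 1,
    label: str = "windows-worker",
) -> str:
    """Return an HTCondor submit description for `count` EC2 instances.

    Hypothetical helper: every value is a placeholder supplied by the caller.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    lines = [
        "universe = grid",
        f"grid_resource = ec2 {service_url}",
        # For EC2 jobs, 'executable' names the job rather than a file to run.
        f"executable = {label}",
        f"ec2_access_key_id = {access_key_file}",
        f"ec2_secret_access_key = {secret_key_file}",
        f"ec2_ami_id = {ami_id}",
        f"ec2_instance_type = {instance_type}",
        f"log = {label}.log",
    ]
    if user_data is not None:
        # Assumed use: hand a preconfigured Windows image the address of the
        # central manager so its startup script can join the pool.
        lines.append(f"ec2_user_data = {user_data}")
    lines.append(f"queue {count}")
    return "\n".join(lines) + "\n"
```

The output would be written to a file on the Linux machine and passed to `condor_submit`; each queued job then corresponds to one running instance, so removing the job is also how the instance would be terminated from the HTCondor side.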

Waiting for condor_annex may save you some grief, but it may also be a long time before condor_annex can handle autoconfiguring Windows instances. On a shorter time frame, the 8.5.5 release will include an improved EC2 GAHP, one that supports (if you're patient) thousands of EC2 jobs, which ought to be enough to get you well and truly started.

- ToddM