Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] Condor multiple pools

Date: Mon, 30 Oct 2006 13:22:52 -0600
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] Condor multiple pools



Steffen Grunewald wrote:

On Mon, Oct 30, 2006 at 12:21:18PM +0100, Cor Cornelisse wrote:

Hi,

We are setting up a cluster at our campus and since we've some experience
with condor we plan on using it. The machines which will join the cluster
are scattered throughout the building. Since there's not enough power /
network connections available to fit them into one room, it comes down to
small clusters of say 10 ~ 20 boxes, which have a connection to the
internal network via NAT.
At first we thought of creating separate condor pools on all these
subclusters and then use job flocking. However, we'd like to have the
ability to use ALL machines for ONE big job. Job flocking can only migrate
its job from one pool to another if I'm correct.
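
For context, the flocking setup mentioned above could be sketched roughly as follows. This is a minimal illustration, assuming the standard FLOCK_TO and FLOCK_FROM configuration macros; all host names are hypothetical placeholders, and the host-based authorization settings of each pool would also need to admit the machines involved.

```
# Hypothetical sketch, host names are placeholders.

# On each submit machine in pool A:
# central managers of the pools that idle jobs may flock to, tried in order
FLOCK_TO = cm.pool-b.example.org, cm.pool-c.example.org

# On the machines of pool B (and likewise pool C):
# submit machines that are allowed to flock in
FLOCK_FROM = submit.pool-a.example.org
```

Each flocked job still runs inside a single pool, and the submit machine has to be able to reach the execute machines of the remote pool directly, which is the difficulty with NAT described here.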

When you say, "one big job", are you talking about MPI or something like that?

Condor glide-in looks a bit like overkill to me, since we'll then be
running condor within condor.

The overhead of running an extra startd for each job is typically not significant. However, glidein still requires bi-directional connectivity between submit and execute machines, so you would need to use GCB within the glidein pool itself. Within the underlying pools, you would not necessarily have to use GCB, as long as you have one public schedd per pool. The glideins could be submitted on-demand from some central location to each of these publicly accessible schedds. Of course, it would take some effort to set that all up and maintain it.
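
As a rough illustration of what using GCB involves on each participating machine, a configuration sketch along these lines might apply. It assumes the NET_REMAP_* macros described in the Condor manual of this period; the broker address is a placeholder, not a value from this thread.

```
# Hypothetical sketch: settings for every host whose Condor daemons
# should route their traffic through a GCB broker.
NET_REMAP_ENABLE  = True
NET_REMAP_SERVICE = GCB
# Placeholder IP address of the machine running the GCB broker
NET_REMAP_INAGENT = 192.0.2.10
```

Because these settings change how every daemon on the host advertises its address, they would have to be applied consistently across the pool that uses GCB.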

I've spent quite some time reading documentation and the only thing I
could come up with is using GCB to create one big pool. However, this
would severely affect the scalability.

From what I have seen, pools on the order of 2000 CPUs are practical, with some attention to configuration details. Beyond that, I lack experience to comment.

 We might like to add an existing
cluster in the future and if we would be using GCB, the existing cluster's
configuration would have to be adapted to use GCB and join our pool.

There is an active effort to make GCB less invasive, so, for example, communication within a pool could take place without any dependence on GCB, but communication with external submitters would use GCB. As it exists today, you are correct that GCB is all or nothing.

I find it hard to believe I'm the only one who would like to join multiple
pools and still have the ability to have one job running over multiple
pools. I must be overlooking something, can someone give me a hint in the
right direction? I do understand condor is about HTC, and what I'm
requesting is actually an HPC kind of thing, but does this mean I will have
to go looking for something else instead of Condor?


Sounds like a VLAN might be a solution for you - it allows you to use the general
network infrastructure, and still keep the clusters separated from other
stuff... I'd ask the IT guys whether they can make this possible.

Cheers
 Steffen
