
[Condor-users] Condor with multiple sites over a WAN link.


I hope these questions are not too frequently asked; I have looked through the
manual and other documentation without finding an answer.

To give some background, we are starting to deploy Condor to help run a
variety of batch processed reporting and data manipulation jobs, most of which
are relatively small: no more than a few CPU-hours of run-time.

We have two major sites connected via a WAN link, and we would like users to
be able to run jobs that span these two sites, mostly for historical
reasons, such as "data extraction can only happen on machine A, processing
only on machine B", where A and B are on opposite sides of the link.

The WAN link is always available and quite reliable, although somewhat
bandwidth constrained.

So, my understanding is that this deployment is probably best served using a
single Condor master, and having our submission and execution machines all
talk to it over the local or WAN link respectively.

Assuming that is the case, what sort of traffic is this likely to generate?
I believe that this is just the class-ad transmission via UDP every five
minutes or so, plus the data transfer for individual jobs.

Within this model we want jobs to prefer to run in their local site, and only
to run at the site across the WAN link when mandatory.

I believe the NEGOTIATOR_{PRE,POST}_JOB_RANK configuration, together with a
ClassAd identifying location, is the right tool to use here, as documented in
the admin tips and tricks: http://nmi.cs.wisc.edu/node/1479

Specifically, we use the PRE version to rank the local site higher, which will
send the job to a machine there, and use the RANK in the job itself to work
out which one to use.

Is that correct, or do I need to do something more complex with the START
expression to enforce this rule?
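
For concreteness, here is a minimal, untested sketch of the configuration I
have in mind. The attribute names "Site" and "SubmitSite" and the site label
"A" are made up for illustration; the macros are the standard ones as I
understand them from the manual (I gather newer releases spell SUBMIT_EXPRS
as SUBMIT_ATTRS).

```
# Execute machines at site A: advertise a custom attribute in the machine ad.
# "Site" is a hypothetical attribute name.
Site = "A"
STARTD_ATTRS = $(STARTD_ATTRS) Site

# Submit machines at site A: stamp every job ad with its site of origin.
# "SubmitSite" is likewise hypothetical.
SubmitSite = "A"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) SubmitSite

# Central manager: rank machines at the job's own site above all others.
# MY is the machine ad and TARGET the job ad here. The job's own Rank only
# breaks ties among machines with an equal pre-job rank.
NEGOTIATOR_PRE_JOB_RANK = 10 * (MY.Site =?= TARGET.SubmitSite)
```

Machines at site B would set Site = "B", and submit machines there would set
SubmitSite = "B". Since this only ranks machines and does not restrict them,
a job could still match across the WAN when no local machine is available.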

Given that some of these jobs can generate 100 to 300 MB of (post-compression)
output data for a 30-CPU-minute runtime, waiting for a free slot at the local
site will often be cheaper than transferring the data across the WAN.

It *IS* mandatory that some jobs run on machines on the other side of the
link, though, so I can't just set START to only accept the same site as the
submitting machine.
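
For those mandatory cases, I imagine the job itself would pin its site in the
submit description, roughly as sketched below. This assumes the execute
machines advertise a custom "Site" attribute in their machine ads; that
attribute name, the site label "B", and the executable name are all
hypothetical.

```
# Submit description for a job that must run at site B.
universe     = vanilla
executable   = extract.sh
# Only match machines that advertise Site == "B", wherever the job is submitted.
requirements = (TARGET.Site == "B")
queue
```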

Thanks in advance for your time,
✣ Daniel Pittman            ✉ daniel@xxxxxxxxxxxx            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons
   Looking for work?  Love Perl?  In Melbourne, Australia?  We are hiring.