
Re: [Condor-users] use free machines first but overload cpu's



Hi Steffen,

You are right, we do not really parallelise but split a job.
This is done on a per-file basis, and each job processes about 100 files.

The idea is the following: we have similar machines which can all process 4 jobs without running into resource problems. Most of the time a user can use one CPU for all of his jobs. But sometimes the cluster is overloaded, and for these cases I want to still have resources left for the 5-minute jobs. If such a user has to wait 20 minutes in some rare cases, that would be no problem. If he has to wait a whole day even once, I will run into trouble and Condor will maybe not be accepted.

I have trouble understanding the ranking.
RANK = (7 - VirtualMachineID)
seems to be a good idea.
Where do I have to put this rank? In the local config file?
I don't understand the ClassAd method.

Harald



Steffen Grunewald wrote:

On Thu, Jan 19, 2006 at 06:26:14PM +0100, van Pee wrote:
Hi all,

My problem is the following: all users should have the same priority and can use all machines. The intent is to give all users maximum throughput. If there are small jobs which can be parallelised, they should always run!

Harald,

I'm a bit puzzled: first you're talking about vanilla (which is fine for
a lot of applications), then you want to parallelise. Condor vanilla is
meant for *serialised* tasks. If you want parallel execution, you will
need the MPI universe.

Let me assume that you meant "split a task into n subtasks which can run
independently of each other" - then it can happen that the same CPU (or
virtual machine, in Condor-speak) will process all n jobs if no other
resources are free. Remember that a maximum-throughput solution may be
unfair to individual users! It's the overall throughput that counts -
there's no guarantee that your individual job batch will be finished
within a given time range.
(Of course there are means to tweak the configuration to favor certain
classes of tasks, but that's not what you'd like to have at the very
beginning of your Condor experience.)

If I use just as many CPUs as there are (6 at the moment), then I can run just 6 jobs at once. If a user wants to run a job split across 6 CPUs (on a per-file basis) which takes 5 minutes in total, it could happen that
he has to wait hours or days for this job, which is not acceptable.

If you have n CPUs and don't redefine virtual machines, there will be a
one-to-one mapping of CPUs to VMs, correct. Each of those VMs will get
negotiated (by the master) and matched with a job, and once it has finished
its work it will receive the next chunk of work. In our setup, a VM
negotiated for a certain user will stay assigned to that user until it
runs out of work - so if you manage to grab at least one CPU, odds are good
that you'll finish the whole batch in limited time.

With NUM_CPUS = ,
I can change this, but it seems that Condor first uses all 6 (virtual,
of course) CPUs of the first machine and then starts with the next one!
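(For reference: NUM_CPUS is a startd configuration macro, so it belongs in the machine's local config file. The actual value was elided in the mail above; the 6 below is only a hypothetical illustration.)

```
# condor_config.local sketch - the value 6 is hypothetical.
# NUM_CPUS overrides the detected CPU count, so
NUM_CPUS = 6
# makes the startd advertise 6 virtual machines (vm1 ... vm6)
# on this host, regardless of the real number of CPUs.
```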

That depends on the negotiator cycles and will randomize over time.
You may prioritise using a RANK=(7-VirtualMachineID).
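To answer the "where do I put this rank?" question: for a per-job preference it can go into the submit description file. A minimal sketch of a vanilla-universe submit file (the executable name is hypothetical; 7 assumes at most 6 VMs per machine):

```
# vanilla-universe submit file sketch (executable name hypothetical)
universe   = vanilla
executable = process_files
# Prefer lower-numbered VMs; since every machine has a vm1,
# jobs tend to spread across machines before stacking up on one host.
rank       = (7 - VirtualMachineID)
queue 6
```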

What I want to have is:
I allow a maximum of 4 jobs per real CPU. We have 2 types (later 3 or 4 types) of CPUs: fast and faster.
Condor should use
1. all faster CPUs with one job each
2. all fast CPUs with one job each

Use RANKing to prefer faster CPUs, based on the ClassAd attributes related to speed (Mips or the like). To prefer slow machines, use
100000 - Mips :-)
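A minimal sketch of such a speed-based rank in the submit file (Mips is a standard machine-ad benchmark attribute; the constant 100000 is arbitrary):

```
# prefer faster machines:
rank = Mips
# or, to prefer the slower ones, invert it:
# rank = 100000 - Mips
```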

Are you sure you want to run 4 jobs on a single CPU? What about real
and virtual memory? If the machine starts swapping, your execution times may explode.

If there are 6 jobs, each real CPU should run one of them.
If there are 12 jobs, each real CPU should run two of them.
And so on!

What's the point? If every real CPU has a single job to run, it will do
so 100% of the time, and finish after time T. If the same real CPU (split
into 2 VMs) has to run 2 jobs, it will run every job at 50% at most,
and finish both after 2*T (or later, if swapping has to be accounted for).
In both cases, 2 jobs will be done after 2*T - but the one-to-one solution
is far more predictable.

For me the Condor configuration is too sophisticated, and I can't find the
correct settings for the task above. It would therefore be very helpful if someone could point me in the right direction.

Don't try to do everything at the same time. Serialisation is a good
thing (unless you're using MPI). If you need dependencies, DAGMan will be your friend...
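For illustration, a minimal DAGMan input file expressing one dependency (the submit-file names are hypothetical):

```
# run job A first, then job B once A has finished successfully
JOB A stage1.sub
JOB B stage2.sub
PARENT A CHILD B
```

Such a file is handed to condor_submit_dag, which then submits the jobs in dependency order.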

Cheers,
Steffen