Re: [Condor-users] use free machines first but overload cpu's



On Fri, Jan 20, 2006 at 10:03:32AM +0100, van Pee wrote:
> Hi Steffen,
> 
> you are right, we do not really parallelize but split a job.
> This is done on a per-file basis and each job processes about 100 files.

In this case you may want to rsync the files to the nodes,
otherwise you may run into a bottleneck on the fileserver. YMMV - but
it's worth watching.

> The idea is the following: We have similar machines which can all
> process 4 jobs without running into resource problems.

This means you will set up 4 VMs per machine. (Condor may already do this
for you.)
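
A minimal sketch of what this could look like in the local config file.
By default Condor creates one VM per CPU it detects, so nothing is needed
on a 4-CPU machine; on a machine with fewer CPUs you would have to tell it
to advertise four. The macro name is taken from the 6.x manual as an
assumption, so verify it against your version:

```
# condor_config.local (sketch, assumes Condor 6.x macro names)

# Advertise 4 CPUs regardless of what the hardware reports, so that
# the startd creates 4 VMs (one per advertised CPU by default).
NUM_CPUS = 4
```

After restarting the startd, condor_status should list vm1@host through
vm4@host for each machine if the setting took effect.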

> Mostly a user can use one cpu
> for all of his jobs.

Then it would make sense to talk to the other users and agree on
VirtualMachineID assignments (user 1 uses VM1, and so on).

> But sometimes the cluster is overloaded and for these cases I want to
> still have resources left
> for the 5 minute jobs.

Don't allow access to the "last" VM then, unless the special jobs show
up with some magic phrase.
You may set up a special START condition to accomplish this, along
these lines:

START = ((VirtualMachineID != 4) || (TARGET.Magic =?= "magick"))

This would start a job on any VM but 4, unless the user specifies
a +Magic="magick" string in the submit file. (I'm typing this
from memory, which has proven to be unreliable sometimes, so better
check with the manual, but you get the idea.)
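
For illustration, a submit file for one of the short jobs might look
like the sketch below. The executable name is a placeholder, and Magic
is just the made-up attribute name from the START expression above:

```
# short_job.submit (sketch; short_job.sh is a hypothetical executable)
universe   = vanilla
executable = short_job.sh

# The leading "+" puts a custom attribute into the job ClassAd, which
# the machine's START expression then sees as TARGET.Magic.
+Magic     = "magick"

queue
```

Jobs submitted without the +Magic line have no Magic attribute, and
since =?= evaluates to FALSE rather than UNDEFINED in that case, START
stays false for them on VM 4.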

> If this user has to wait 20 minutes in some rare cases
> it would be no problem.

This is all a matter of whether you want to vacate a VM if a user with
higher priority shows up. My recipe is based on a 'never vacate' policy;
if you can afford vacating (and restarting!) jobs (IIRC, you're in the
vanilla universe; read the thread on checkpointing as well) your solution
may be to give the special user a higher priority.
Priorities are evaluated during the negotiation phase - so if there's a
VM available for matching, the user with the highest priority will get it;
if not, and vacating is disabled ("let everything finish, no matter what"),
even the highest-prioritised user will have to wait.
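
A 'never vacate' policy of the kind described here could be sketched
with the following settings. These are the policy macros as documented
in the manual; whether you need all of them depends on what your
default config already contains, so take this as a starting point only:

```
# Startd side (machine config): never suspend, preempt or kill a
# job once it is running.
SUSPEND = False
PREEMPT = False
KILL    = False

# Negotiator side (central manager config): never preempt a running
# job just because a user with better priority has idle jobs.
PREEMPTION_REQUIREMENTS = False
```

Note that a machine-side RANK expression can still cause a running job
to be replaced by one the machine ranks higher, so leave RANK unset if
jobs must always run to completion.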

> If these users have to wait one day in one case, I will run into trouble
> and Condor will maybe not be accepted.

In this case, the admin can still put idle jobs by other users on hold
so a VM that just finished its job will not get another job by the same
user (same match) but will return to the matching phase.
There's no way for Condor to entirely guess what you all (your users, and
you as admin) expect it to do, but there are several ways to pass it some
hints.

> 
> I have problems understanding the ranking.
> RANK=(7-VirtualMachineID)
> seems to be a good idea.
> Where do I have to put this rank? In the local config file?
> I don't understand the ClassAd method.

There have been lots of RANK examples on the list in the past.
You can RANK on the machine side (done in the config) to prefer jobs,
and you can give a Rank = expression in the submit files to choose from
available machines (for instance, to prefer faster machines).
See both explained in the manual (RANK macro: config side; Rank expr:
explained in the condor_submit man page).
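
To make the two sides concrete, here is a small sketch. The user name
"alice" is hypothetical; the job-side expression is the one quoted
above, which refers to a machine attribute and therefore belongs in the
submit file rather than in the machine config:

```
# --- machine side: condor_config.local ---
# Prefer jobs owned by user "alice" (hypothetical user name).
# A true comparison counts as 1, anything else as 0.
RANK = (TARGET.Owner == "alice")

# --- job side: submit file ---
# Among the VMs that match, prefer lower VM numbers, so that VM1 of
# a free machine is chosen before VM2 of a busy one.
Rank = (7 - VirtualMachineID)
```

With the job-side Rank, VM1 scores 6, VM2 scores 5, and so on, so jobs
should spread over idle machines first before doubling up on one.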

Cheers,
 Steffen

-- 
Steffen Grunewald * MPI fuer Gravitationsphysik (Albert-Einstein-Institut)
SciencePark Golm, Am Mühlenberg 1, D-14476 Potsdam * http://www.aei.mpg.de
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html