Re: [Condor-users] use free machines first but overload cpu's



Hi,

Thanks for the help. I just use DEFAULT_RANK and everything is running fine
now.
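
A minimal sketch of what such a setting could look like in the Condor config
on the submit machines. The expression itself is an assumption: it reuses the
(7-VirtualMachineID) idea quoted further down, since the value actually in use
is not stated here.

```
# condor_config.local on the submit machine (sketch, not the actual setting)
# DEFAULT_RANK is used as the job's Rank when the submit file gives none.
# Lower-numbered VMs rank higher, so VM1 on every machine is preferred
# before any machine gets a second job.
DEFAULT_RANK = (7 - VirtualMachineID)
```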

At the moment it seems that condor gives us good performance even in the 
vanilla universe and we will start learning more about condor by using it.

Harald

On Monday 23 January 2006 11:09 am, Steffen Grunewald wrote:
> On Fri, Jan 20, 2006 at 10:03:32AM +0100, van Pee wrote:
> > Hi Steffen,
> >
> > you are right, we do not really parallelize but split a job.
> > This is done on a per-file basis and each job processes about 100 files.
>
> In this case you may want to rsync the files to the nodes;
> otherwise you may run into a bottleneck on the fileserver. YMMV - but
> it's worth watching.
>
> > The idea is the following: We have similar machines which all can
> > process 4 jobs
> > without running into resource problems.
>
> This means you will set up 4 VMs per machine. (Condor may already do this
> for you).
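
A sketch of the corresponding config on an execute node, assuming the machine
has fewer than 4 real CPUs but should still advertise 4 VMs. NUM_CPUS and
NUM_VIRTUAL_MACHINES are the macro names as far as the Condor 6.x manual goes;
check them against the version you run.

```
# condor_config.local on each execute node (sketch)
# Advertise 4 CPUs regardless of how many the hardware has ...
NUM_CPUS = 4
# ... and split them into 4 virtual machines, VM1 through VM4.
NUM_VIRTUAL_MACHINES = 4
```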
>
> > Mostly a user can use one cpu for all of his jobs.
>
> Then it would make sense to talk to other users and schedule
> VirtualMachineID settings (user 1 uses VM1, and so on)
>
> > But sometimes the cluster is overloaded, and for these cases I want to
> > still have resources left
> > for the 5 minute jobs.
>
> Don't allow access to the "last" VM then, unless the special jobs show
> up with some magic phrase.
> You may set up a special START condition to accomplish this, along
> these lines:
>
> START = ((VirtualMachineID != 4) || (TARGET.Magic =?= "magick"))
>
> This would start a job on any VM but 4, unless the user specifies
> some +Magic="magick" string in the submit file (I'm typing this
> from memory, which has proven to be unreliable sometimes, so better
> check with the manual, but you get the idea)
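
A minimal sketch of the submit-file side for one of the short jobs, assuming
the START expression above is in place on the execute nodes. The executable
name is hypothetical; only the +Magic line matters here.

```
# submit description file for a 5 minute job (sketch)
universe   = vanilla
# hypothetical script name, replace with the real job
executable = short_job.sh
# custom job attribute that the START expression tests via TARGET.Magic
+Magic     = "magick"
queue
```

Jobs submitted without the +Magic line leave Magic undefined, so the =?=
comparison evaluates to false and they can only match VM1 to VM3.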
>
> > If this user has to wait 20 minutes in some rare cases it would be no
> > problem.
>
> This is all a matter of whether you want to vacate a VM if a user with
> higher priority shows up. My recipe is based on a 'never vacate' policy;
> if you can afford vacating (and restarting!) jobs (IIRC, you're in the
> vanilla universe; read the thread on checkpointing as well) your solution
> may be to give the special user a higher priority.
> Priorities are evaluated during the negotiation phase - so if there's a
> VM available for matching the user with the highest prio will get it;
> if not, and vacating is disabled ("let everything finish, no matter what")
> even the highest-prioritised user will have to wait.
>
> > If this user has to wait one day in one case, I will run into trouble
> > and condor will maybe not be accepted.
>
> In this case, the admin still can put idle jobs by other users on hold
> so a VM that just finished its job will not get another job by the same
> user (same match) but will return to the matching phase.
> There's no way for Condor to entirely guess what you all (your users, and
> you as admin) expect it to do, and there are several ways to pass it some
> hints.
>
> > I have trouble understanding the ranking.
> > RANK=(7-VirtualMachineID)
> > seems to be a good idea.
> > Where do I have to put this rank? In the local config file?
> > I don't understand the ClassAd method.
>
> There have been lots of RANK examples on the list in the past.
> You can RANK on the machine side (done in the config) to prefer jobs,
> and you can give a Rank = expression in the submit files to choose from
> available machines (for instance, to prefer faster machines).
> See both explained in the manual (RANK macro: config side, Rank expr:
> explained in condor_submit man page)
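
The two sides can be sketched as follows. The user name is hypothetical, and
KFlops is used only as one example of a machine attribute that reflects speed.

```
# Machine side, in the execute node's config (sketch):
# this machine prefers jobs owned by the hypothetical user "alice".
RANK = (TARGET.Owner == "alice")

# Job side, in the submit description file (sketch):
# among the machines that match, prefer the one with the highest KFlops.
rank = KFlops
```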
>
> Cheers,
>  Steffen