
Re: [HTCondor-users] Assesing Condor suitability



Hi Raymond,

I will try to address your points based on my experience deploying Condor at our company.
Since you work at the same company as Matthew Farrellee, I suggest talking to him to explore the details.

Below, inline in your message, are my *comments* pointing out the Condor features that address each issue.

Whether Condor can substitute for Beaker's scheduling features depends on the objectives of the tool.

Having looked at the Help section of the Beaker project web site and read some of the documents, my opinion is that the real added value of Beaker is the knowledge embedded in it to automate the creation (writing) of test cases. So I suggest focusing your effort on that, and letting Condor take care of scheduling and running the jobs that execute the generated test cases.

cheers,
Klaus





From:        Raymond Mancy <rmancy@xxxxxxxxxx>
To:        htcondor-users@xxxxxxxxxxx,
Date:        26/02/2014 02:16
Subject:        [HTCondor-users] Assesing Condor suitability
Sent by:        "HTCondor-users" <htcondor-users-bounces@xxxxxxxxxxx>




Hi,

I'm working on an open source project called 'Beaker' (http://beaker-project.org/).
I'm looking at the possibility of replacing a chunk of what Beaker does, with Condor.

I've done some reading of various Condor related documents (there are a lot!), but I thought here
may be a good place to get an idea of whether or not Condor would be a good fit for our purpose.

Similarly to Condor, Beaker runs jobs on remote systems. Beaker matches jobs to systems,
keeps track of the status of jobs, and the status of systems. Our current approach is to find a system
that matches the job's criteria (hardware and OS requirements),


*Matchmaking mechanism of Condor.*

and one that is currently 'available' to
the job (which encompasses the permissions, the current status of the system and whether someone is already using it).


*Machine state: Unclaimed (available to run jobs).*

Beaker does other ancillary things as well, but they are less important.

I'm considering using ClassAds to match jobs with systems. We currently do this matching with our own
XML language. We need all the basic things, like this job requires > 512MB RAM, HVM cpu flag etc etc.

*On each machine, the Condor startd daemon sends the machine's information as a ClassAd (available memory, disk space, CPU performance, etc.) to the collector daemon on the pool's Central Manager (CM); the CM's negotiator daemon then performs the matchmaking.*
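
To make the two-sided matching idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the HTCondor API: machine ads and the job ad are ordinary dictionaries, and the requirements are Python functions. `Memory` and `Cpus` mirror common machine ClassAd attribute names, while `HasHVM`, the system names and all values are assumptions made up for illustration.

```python
# Toy model of two-sided matchmaking; this is NOT the HTCondor API.
# "HasHVM" and the machine names are hypothetical, chosen for illustration.

machines = [
    {"Name": "sys-a", "Memory": 256, "Cpus": 1, "HasHVM": False},
    {"Name": "sys-b", "Memory": 2048, "Cpus": 4, "HasHVM": True},
    {"Name": "sys-c", "Memory": 1024, "Cpus": 2, "HasHVM": True},
]

job = {
    "Owner": "usera",
    "RequestMemory": 512,  # MB the job asks for
}


def job_requirements(machine, job):
    # Job side: needs more than 512 MB of RAM and the HVM CPU flag.
    return machine["Memory"] > 512 and machine["HasHVM"]


def machine_requirements(machine, job):
    # Machine side: accept any job whose memory request fits.
    return job["RequestMemory"] <= machine["Memory"]


def candidates(job, machines):
    # A match exists only when BOTH sides' requirements hold.
    return [
        m["Name"]
        for m in machines
        if job_requirements(m, job) and machine_requirements(m, job)
    ]


print(candidates(job, machines))
```

In a real pool the same conditions would be written as ClassAd expressions in the job description file and the machine configuration, and evaluated by the negotiator rather than by your own code.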

We also need to support a group and individual based permission model for access to the systems.
I'd also like to identify groups of systems as belonging to a certain 'pool' (just an abstract name) and
then create a job ClassAd that expresses a preference for systems from one pool over another. From
the reading I've done, I think ClassAds can do all this. The real reason I want to use ClassAds though, is
that I want to take advantage of Condor's scheduling.


*Preferred machines are expressed with the RANK expression, using ClassAd attributes that Condor already provides or custom ones that you define yourself.*
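
As a rough illustration of a job preferring one group of systems over another, the sketch below models a job-side rank in plain Python. It is a toy model, not the HTCondor API: `BeakerPool` is a hypothetical custom machine attribute, and the pool names, system names and weights are assumptions.

```python
# Toy model of a job-side rank; this is NOT the HTCondor API.
# "BeakerPool" is a hypothetical custom attribute a site might advertise.

machines = [
    {"Name": "sys-a", "BeakerPool": "general", "Memory": 4096},
    {"Name": "sys-b", "BeakerPool": "lab1", "Memory": 1024},
    {"Name": "sys-c", "BeakerPool": "lab1", "Memory": 2048},
]


def job_rank(machine):
    # Higher is better: strongly prefer the "lab1" pool, then more memory.
    pool_bonus = 1_000_000 if machine["BeakerPool"] == "lab1" else 0
    return pool_bonus + machine["Memory"]


def preferred_order(machines):
    # Sort matching machines from most to least preferred.
    return [m["Name"] for m in sorted(machines, key=job_rank, reverse=True)]


print(preferred_order(machines))
```

The rank only orders machines that already satisfy the requirements; a machine outside the preferred pool is still usable, just less preferred.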

Currently in Beaker, when a job is submitted to be executed on a system, we store the names of
all the systems that meet the hardware requirements as potential candidates.
Then, every 30 seconds or so we will check all the candidate systems until we find one that is free
and that we have the right permissions to use the system. This is slow, cumbersome, inefficient and involves
lots of complex database queries that we would rather not have to ever look at again :-) We want to get rid of this,
as this model no longer meets our requirements. We're envisioning a scheduler where the system can find jobs based
on what the system finds preferable, and jobs can find systems based on what the job finds preferable,
and then form a harmonious union between the two :-)


*The Condor CM negotiator takes care of that for you. Use the RANK expression for preferences (machines preferring jobs, defined in the machine configuration file, or jobs preferring machines, defined in the job description file) and the REQUIREMENTS expression for must-haves (for example, a job needs at least 5GB of memory).*

In terms of 'preferable', things like effective job priorities would be important. Currently in Beaker, if UserA and UserB both have access to
SystemA, they have equal priority to that system. We want to be able to have the system give a higher priority to a job
based on who owns the job, or what groups they belong to (and perhaps other things as well).


*RANK expression (machine side): set in the local machine configuration file (see section 3.5.4, "The RANK expression", of the Condor 8.0.5 manual).*
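
The machine-side preference can be pictured with the following plain-Python sketch. It is a toy model, not the HTCondor API: `Owner` mirrors the job ClassAd attribute of that name, while `Group`, the user and group names and the weights are assumptions made up for illustration.

```python
# Toy model of a machine-side rank; this is NOT the HTCondor API.
# "Group" and all names and weights here are hypothetical.


def machine_rank(job):
    # Higher is better: this machine favours one owner and one group.
    score = 0
    if job["Owner"] == "usera":
        score += 10
    if job.get("Group") == "kernel-qe":
        score += 5
    return score


queue = [
    {"Id": 1, "Owner": "userb", "Group": "docs"},
    {"Id": 2, "Owner": "usera", "Group": "docs"},
    {"Id": 3, "Owner": "userb", "Group": "kernel-qe"},
]


def pick_job(queue):
    # The machine takes the job it ranks highest.
    return max(queue, key=machine_rank)["Id"]


print(pick_job(queue))
```

A rank that favours jobs requesting a single CPU could be modelled the same way, by adding to the score when the job's CPU request equals one.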

Another example is a job
that requires a single CPU system (there are few of them) that has to wait forever because that system just happens to be swamped
by other jobs that don't care whether they have 1,2 or 4 CPUs. We would like to be able to tell that system to give preferential
treatment to jobs that require a single CPU.


*I have no experience using that, but it should be possible to configure.*

So the way I imagine it, we would basically want to use Condor from the point where a user's job is analysed and matched to a system,
to where Condor returns to say "Here, you can run it on this system to the exclusion of all others".
Beaker would then actually set up and organise the running of the job (writing PXE cfg files, power cycling the system,
etc etc). Once the job is complete, Beaker would inform Condor that the system is no longer running a job and so is available to
potentially run another job.


*Condor can take care of all of that. Beaker's only job is to wait until the jobs have run.*

Beaker is written in Python, so all of the interaction with Condor would have to be through its SOAP API, I imagine.

So does Condor seem like it might be a reasonable solution?


*Based on what you have described so far, yes: Condor can replace this part of Beaker.*

Thanks,
Raymond


