
Re: [Condor-users] extend ClassAds with hawkeye



On Wed November 29 2006 10:10 am, Junjun Mao wrote:
> Dear Hawkeye users and developers,
>
>
> Here is what I want to do:
> What shall I do to ask hawkeye to insert ClassAds to Condor ClassAds?
> I need to use hawkeye to extend the ClassAds so that MPI jobs will only
> go to the nodes with enough semaphore resources.
>
> Here is what I already have:
> I have a cluster running Condor as the job scheduler. The master node
> is a submit machine, the file server is a central manager and slave
> nodes are execute machines.  Hawkeye and who module run on each execute
> machines. Submit machine and central manager remain unchanged. I can
> get hawkeye ClassAds with "condor_status -l".
>
> Here is the problem:
> ClassAds look like from Condor startd and Hawkeye startd alternatively.
> I probably have missed something in the configuration.
>
> Nick suggested to run a second Collector and point hawkeye startd to it,
> but I am kind of lost with this.

Yes, I did, but that was because you hadn't explained what you were trying to 
do...

Hmm, I'd *swear* that I posted an updated reply to the thread yesterday, but 
somehow it got lost..

What I was trying to say is that the word Hawkeye is somewhat overloaded.  
Basically, we have a Hawkeye mechanism built into the Condor Startd - 
the "cron" logic, and it's typically used in two different ways.

1. As Hawkeye.  This is what I thought you meant, but not what you want, hence 
the confusion.

In this mode, you download and install a separate set of Hawkeye executables, 
etc. that are a "real" Hawkeye.  You set up a separate Hawkeye collector, and 
all of these Hawkeye startds report to this collector, and it's completely 
independent of your main Condor startds.


2. As a means to publish extra things into your machine ad - we don't call 
this "Hawkeye", but it does use the same mechanism.  This is what you want.

In this mode, you use the cron mechanisms built into the startd to enhance 
your existing machine ad, but you wouldn't install or run the Hawkeye 
binaries.  You have just a single startd running, and it's reporting to your 
main Condor collector(s).


So, what you want to do is this:

1. In your Condor configuration, add something that looks like this:


MODULES = /path/to/your/modules
STARTD_CRON_NAME = cron
CRON_JOBLIST =

Then, for each "module" that you want to run, add a section like this:

##
## Configuration for Module foo
##	See if we should run foo jobs on this host
CRON_JOBLIST = $(CRON_JOBLIST) foo
CRON_FOO_PREFIX = foo_
CRON_FOO_EXECUTABLE = $(MODULES)/foo
CRON_FOO_PERIOD = 1h
CRON_FOO_MODE = periodic
CRON_FOO_RECONFIG = false
CRON_FOO_KILL = true
CRON_FOO_ARGS =
##  Parameters for module foo:

Then, reconfig your startd.

Remember: 1 startd pointing at your main collector, no "Hawkeye" proper, no 
Hawkeye startd, no Hawkeye collector.
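
For what a "module" itself might look like, here is a minimal sketch in 
Python, assuming the startd cron mechanism reads "Name = value" lines from 
the module's stdout and merges them into the machine ad.  The attribute 
names, the helper functions, and the choice of /proc/sys/kernel/sem (Linux 
only) as the data source are all assumptions made for illustration, not 
something shipped with Condor:

```python
#!/usr/bin/env python3
"""Hypothetical startd cron module: publish kernel semaphore limits."""
import sys

# Linux exposes four numbers here: SEMMSL SEMMNS SEMOPM SEMMNI
SEM_PATH = "/proc/sys/kernel/sem"


def parse_sem_limits(text):
    """Turn the four whitespace-separated numbers into named attributes."""
    # Attribute names are made up for this example.
    names = ("SemMaxPerArray", "SemMaxTotal", "SemMaxOpsPerCall", "SemMaxArrays")
    values = [int(field) for field in text.split()]
    if len(values) != len(names):
        raise ValueError("expected 4 fields, got %d" % len(values))
    return dict(zip(names, values))


def format_classad(attrs):
    """Render attributes as 'Name = value' lines, one per attribute."""
    return "".join("%s = %d\n" % (name, value) for name, value in attrs.items())


def main():
    try:
        with open(SEM_PATH) as handle:
            attrs = parse_sem_limits(handle.read())
    except (OSError, ValueError):
        return 1  # publish nothing if the limits cannot be read
    sys.stdout.write(format_classad(attrs))
    return 0


if __name__ == "__main__":
    main()
```

If this were installed as $(MODULES)/foo, then with CRON_FOO_PREFIX = foo_ 
the attributes should show up in "condor_status -l" as foo_SemMaxArrays and 
so on, and a job's requirements expression could test them.  Note that these 
are system-wide limits, not free counts; tracking what is actually in use 
would take more work (for example, parsing the output of "ipcs -s").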

Hope this helps

-Nick

-- 
           <<< Why, oh, why, didn't I take the blue pill? >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences