
Re: [Condor-users] Running a persistent, common 'initialization' job: possible?



Hi Jason,
Thanks for the suggestions...

On Tue, Jul 1, 2008 at 9:18 PM, Jason Stowe <jstowe@xxxxxxxxxxxxxxxxxx> wrote:
> Mark,
> There are many different ways to accomplish this initialization,
> depending on what kind of initialization it is and why you need
> it. You could use the STARTD_CRON functionality. Create
> a STARTD_CRON job that runs with a very large period (a couple of
> billion seconds should do) so that it effectively runs only when
> the STARTD starts up. This wouldn't be a "Job" per se, but you could
> put the slots in an "Owner" state while it ran. Once the STARTD_CRON
> job has run, you could set your START expression to True.
>

Yes that would seem to work.  I'll look into it further.
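
For reference, a minimal sketch of what such a STARTD_CRON script might
look like. As I understand it, the startd reads "Name = Value" lines from
the script's stdout and merges them into the machine ad, so a START
expression could then test the attribute, roughly
START = (NodeInitialized =?= True). The attribute name NodeInitialized and
the placeholder init command are my own assumptions, and the exact
STARTD_CRON_* configuration knob names should be checked against the manual
for the Condor version in use.

```python
#!/usr/bin/env python
"""Hypothetical STARTD_CRON script: run node setup, then publish the result."""
import subprocess
import sys

# Hypothetical attribute name; it is not defined by Condor itself.
ATTR_NAME = "NodeInitialized"


def classad_line(name, ok):
    """Format one 'Name = Value' line for the startd to read from stdout."""
    return "%s = %s" % (name, "True" if ok else "False")


def run_init(command):
    """Run the site-specific init command; True only if it exits with 0."""
    try:
        return subprocess.call(command) == 0
    except OSError:
        return False  # command is missing or not executable


if __name__ == "__main__":
    # Placeholder init command: replace with the real node setup.
    ok = run_init([sys.executable, "-c", "pass"])
    print(classad_line(ATTR_NAME, ok))
```

With a very large period the script would effectively run once per startd
start-up, and the slots would stay unavailable until the attribute shows up
as True.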

> Depending upon the topology of your pool, you could also create jobs
> for an accounting group with very good priority, and have the jobs run
> first with on_exit_remove set to false. Using this method, it might be
> somewhat difficult to *guarantee* that these jobs run first,
> depending upon the other jobs/policies in your pool.
>

OK, I really need a guarantee that the init job has been run.
I could possibly set/inspect env variables or files to work this out
in each job, but I'd like to keep the jobs as simple as possible.
This could be done using a DAGMan script for each job but that seems
overkill for something that should only run once on a machine.

> There are other ways of guaranteeing initialization, including
> wrapping the condor_master or the condor_startd, using the
> USER_JOB_WRAPPER to verify the node is initialized before processing a
> job, etc. etc. What are the specifics of your use case? Those will
> determine which one of these you use.
>

USER_JOB_WRAPPER seems more fragile.
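
For comparison, here is a rough sketch of the wrapper approach combined
with the marker-file check mentioned above. Condor passes the job's
executable and arguments to the wrapper, which is expected to exec the job
itself. The marker path and the init command are hypothetical placeholders,
not anything Condor provides.

```python
#!/usr/bin/env python
"""Hypothetical USER_JOB_WRAPPER: initialize the node once, then run the job."""
import os
import subprocess
import sys

MARKER = "/var/run/node_initialized"        # hypothetical marker file
INIT_CMD = ["/usr/local/bin/node_init.sh"]  # hypothetical init script


def ensure_initialized(marker, init_cmd):
    """Return True if the node is initialized, running init_cmd if needed."""
    if os.path.exists(marker):
        return True  # an earlier job already initialized this node
    try:
        status = subprocess.call(init_cmd)
    except OSError:
        return False  # init command is missing or not executable
    if status != 0:
        return False
    with open(marker, "w") as handle:
        handle.write("ok\n")
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    if not ensure_initialized(MARKER, INIT_CMD):
        sys.exit(1)
    # Replace this process with the real job (argv[1] is its executable).
    os.execvp(sys.argv[1], sys.argv[1:])
```

Part of the fragility: two slots on the same machine could both find the
marker missing and run the init command at the same time, so a real version
would need some form of locking.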

At this stage I'm thinking things through. The context would (hopefully)
be Amazon's EC2:
 - Manually start the Condor master; jobs are submitted to the Condor
master, including a worker initialization job.
 - A script on the master:
   - Checks the job queue length and, if it is >0, starts one or more Condor worker AMIs.
   - Uses results from condor_q, condor_stats or condor_status to
decide if a new worker AMI should be started.
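
The decision step in that script could be sketched roughly as follows. The
sizing rule (one new worker per idle job, up to a cap) is only an
assumption for illustration, and actually launching the AMIs is left out.
The condor_q call relies on JobStatus == 1 meaning "Idle" and prints one
line per matching job.

```python
"""Sketch: decide how many worker instances to start from the idle-job count."""
import subprocess


def count_idle_jobs(condor_q_output):
    """Count non-empty lines; expects one line of output per idle job."""
    return sum(1 for line in condor_q_output.splitlines() if line.strip())


def workers_to_start(idle_jobs, running_workers, max_workers, jobs_per_worker=1):
    """Return how many new workers to launch, never exceeding max_workers."""
    if idle_jobs <= 0:
        return 0
    wanted = -(-idle_jobs // jobs_per_worker)  # ceiling division
    return max(0, min(wanted, max_workers - running_workers))


def query_idle_jobs():
    """Ask condor_q for idle jobs (requires Condor tools on the PATH)."""
    output = subprocess.check_output(
        ["condor_q", "-constraint", "JobStatus == 1",
         "-format", "%d\n", "ClusterId"]
    )
    return count_idle_jobs(output.decode())
```

For example, workers_to_start(10, 2, 5) gives 3: ten idle jobs would want
ten workers, but only three more fit under a cap of five.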

I also thought about trying the dynamic deployment facility, but I'm
not an "expert Condor user and administrator" so it wasn't designed
for me :)
It looks powerful, but too elaborate for my situation - although I may
end up evolving to it...

Finally, it seems to me, at the moment, the cleanest solution is to
use some boot/start-up script on the worker AMI to run what is needed
and then start the condor service.
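
A minimal sketch of that boot-time ordering, assuming the init step and the
Condor start-up are both plain commands. The point is only that Condor is
not started unless initialization succeeded; the paths in the trailing
comment are placeholders and will differ per AMI.

```python
"""Sketch: run boot-time commands in order, stopping at the first failure."""
import subprocess


def run_in_order(commands):
    """Run each command in sequence; return how many completed successfully."""
    done = 0
    for command in commands:
        try:
            if subprocess.call(command) != 0:
                break  # non-zero exit: do not run the remaining commands
        except OSError:
            break  # command is missing or not executable
        done += 1
    return done


# Example call (placeholder paths, adjust for the actual image):
# run_in_order([["/usr/local/bin/node_init.sh"],
#               ["/etc/init.d/condor", "start"]])
```

If the return value is less than the number of commands, the init step
failed and the worker never joins the pool, which gives the guarantee I am
after without touching the jobs themselves.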

Thanks for the helpful suggestions.

Regards
Mark



> Hope this helps,
> Jason
>
>
> --
> ===================================
> Jason A. Stowe
> cell: 607.227.9686
> main: 888.292.5320
>
> Cycle Computing, LLC
> Leader in Condor Grid Solutions
> Enterprise Condor Support and Management Tools
>
> http://www.cyclecomputing.com
> http://www.cyclecloud.com
>
> On Tue, Jul 1, 2008 at 4:26 AM, Mark V <mvyver@xxxxxxxxx> wrote:
>> Hi Group,
>> I'd appreciate it if anyone could indicate whether the following is feasible,
>> and how it might be achieved most elegantly:
>>
>> Scenario:
>> A computer starts as a work-only machine, connects to the Condor
>> master and runs a generic 'initialization' job.
>> This job should be persistent, that is, it is not removed, so that if N
>> workers commence at or around the same time they all run the same
>> initialization job.
>>
>> Appreciate any comments or insights.
>>
>>  Mark
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>>
>