
Re: [Condor-users] [condor-users] Iterative Computations [was: Does stork avoid retransfering data?]



Hi Doug,

Based on your guidance, I implemented
iterative execution with Makeflow by
generating one Makeflow file for
each iteration. It works well.

When I get to larger problems,
I will adopt Work Queue and
coordinate the tasks using the
approach you suggested.

Many thanks.
Gabriel



On Thu, Mar 1, 2012 at 8:54 PM, Douglas Thain <dthain@xxxxxx> wrote:
> Gabriel -
>
> Sorry, I wasn't clear before.
>
> When using Work Queue, you write the "coordinator logic" in C, Perl, or Python
> using the Work Queue API.  Your "tasks" are just your ordinary applications
> that expect to use normal input and output files.
>
> So, in pseudo-code, your Work Queue program would be this:
>
> do {
>         create_next_generation();
>
>         for each mutation {
>                 t = task_create("evaluate part.n");
>                 specify_input_files(t, "evaluate", "part.n", "some other config file");
>                 specify_output_files(t, "output.n");
>                 task_submit(t);
>         }
>
>         while( tasks_running() > 0 ) {
>                 t = task_wait();
>                 score = get_score_from_output_file();
>                 best = MAX(score, best);
>         }
> } while( best < 13 );
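As a concrete sketch of that loop in Python: here the Work Queue calls are replaced by a plain local stub (`create_next_generation`, `run_task`, and the scoring rule are all hypothetical stand-ins, not part of the Work Queue API), so the control flow is visible without a cluster behind it.

```python
# Sketch of the coordinator loop above. In a real Work Queue program,
# run_task() would instead be task_create / specify_*_files /
# task_submit / task_wait against the WQ library; here it is a local
# stub so the example is self-contained.
import random

def create_next_generation(rng, size=4):
    # Hypothetical stand-in: each "mutation" is just a random genome value.
    return [rng.random() for _ in range(size)]

def run_task(genome):
    # Stand-in for submitting "evaluate part.n" and reading its output file.
    return genome * 20  # hypothetical fitness score

def coordinate(threshold=13, seed=1):
    rng = random.Random(seed)
    best = float("-inf")
    generations = 0
    while best < threshold:          # same stopping rule as the pseudo-code
        scores = [run_task(g) for g in create_next_generation(rng)]
        best = max(best, max(scores))
        generations += 1
    return best, generations
```

The point of the structure is that the generation/evaluate/score logic lives in an ordinary program, and only `run_task` would change when moving to real remote workers.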
>
> If you really want to use Makeflow, I think your trick below won't work, because
> Makeflow doesn't re-evaluate results if they appear to complete successfully.
> (That would make it an O(n^2) algorithm.)
>
> But, you could do a "continuation-passing" style by having one makeflow run
> one generation, then run a script which decides whether it is complete
> and, if not, generates a NEW makeflow file with a new name:
>
> stop-test: part1 part2 part1_score part2_score
>              LOCAL test_completion part1_score part2_score part1 part2 1 > generation.2.mf
>              LOCAL makeflow generation.2.mf
>
> Now, that's going to have some limitations, since you will end up with
> a distinct file (and a new makeflow process) for each generation, but
> it might work for a small problem.
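A hypothetical sketch of what such a `test_completion` tool could do: read the score files and, if the stopping criterion is not met, print the Makeflow file for the next generation on stdout. The file names match the thread's example; the rule bodies and the threshold are assumptions.

```python
# Hypothetical test_completion: emit a next-generation Makeflow file
# (on stdout) when the stopping criterion is not yet met.
# Usage sketch: test_completion part1_score part2_score part1 part2 <gen>
import sys

THRESHOLD = 13  # assumed stopping threshold, per the thread's example

def next_makeflow(generation):
    # Emit the same rules again, parameterized by generation number,
    # so each generation chains into the next via a fresh .mf file.
    g = generation
    return f"""part1 part2: input.data
\tnext_gen input.data

part1_score: part1
\tevaluate part1 > part1_score

part2_score: part2
\tevaluate part2 > part2_score

stop-test: part1 part2 part1_score part2_score
\tLOCAL test_completion part1_score part2_score part1 part2 {g} > generation.{g + 1}.mf
\tLOCAL makeflow generation.{g + 1}.mf
"""

def main(score_files, generation):
    scores = [float(open(f).read()) for f in score_files]
    if max(scores) < THRESHOLD:
        sys.stdout.write(next_makeflow(generation))

if __name__ == "__main__" and len(sys.argv) > 5:
    main(sys.argv[1:3], int(sys.argv[5]))
```

This is only a sketch of the continuation-passing idea; a real version would also have to rewrite input.data from part1/part2 before handing off.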
>
> That said, I think WQ is a better fit for what you are trying to accomplish.
> We would be happy to help you get it going.
>
> Cheers,
> Doug
>
>
> On Thu, Mar 1, 2012 at 2:01 PM, Gabriel Mateescu
> <gabriel.mateescu@xxxxxxxxx> wrote:
>> Hi Doug,
>>
>> If I understand correctly, the WorkQueue
>> approach requires the master and slave
>> components of the application to use the
>> WorkQueue API to communicate via the queue.
>>
>> Makeflow has the advantage of supporting
>> applications that are only aware of input and
>> output files.
>>
>> Thinking more about makeflow, it seems that it
>> can support loops as follows: I can replace the
>> rule for the stop-test target in the previous message
>> with:
>>
>> stop-test: part1 part2 part1_score part2_score
>>              test_completion part1_score part2_score part1 part2
>>
>> where test_completion is a tool that checks the stopping
>> criterion using part1_score and part2_score, and if the
>> criterion is not met, test_completion uses part1 and part2
>> to build a new input.data file and then runs makeflow:
>>
>>      makeflow  ga.makefile
>>
>> This seems to work if Makeflow checks the timestamps
>> of the files, not only their existence.
>>
>> Is makeflow aware of the file timestamps?
>>
>> Thank you.
>> Gabriel
>>
>>
>> On Thu, Mar 1, 2012 at 6:43 PM, Douglas Thain <dthain@xxxxxx> wrote:
>>> Gabriele -
>>>
>>> For that kind of application, I suggest that you go one level down in
>>> our stack, and use the Work Queue library:
>>> http://nd.edu/~ccl/software/workqueue/
>>>
>>> Work Queue provides a fork-join programming interface that is Condor-compatible.
>>> You write your app in C, Python, or Perl, link it against the WQ library,
>>> then submit "workers" to Condor, which connect to the application and
>>> start running components.
>>>
>>> WQ has been used to implement things like genetic algorithms, replica
>>> exchange, and distributed image processing -- anything that can be
>>> expressed in a fork-join style.
>>>
>>> I suspect you would find it easier to write your application logic
>>> (parsing files, checking conditions) in a general-purpose language,
>>> rather than trying to squeeze it into Makeflow.
>>>
>>> Would that meet your needs?
>>>
>>> Doug
>>>
>>>
>>>
>>> On Thu, Mar 1, 2012 at 12:19 PM, Gabriel Mateescu
>>> <gabriel.mateescu@xxxxxxxxx> wrote:
>>>> I am thinking about using Makeflow to
>>>> build a workflow engine that manages
>>>> a genetic algorithm which contains
>>>> two operations:
>>>>
>>>>  1. next_gen tool: generates the next generation
>>>>     of candidates and partitions them
>>>>     into two disjoint parts, part1 and part2;
>>>>
>>>>  2. evaluate tool: computes the fitness of every
>>>>      candidate in either the part1 or part2 input.
>>>>
>>>> A Makefile that encodes the workflow for a
>>>> Genetic Algorithm may look like this:
>>>>
>>>>
>>>>  part1 part2:  input.data
>>>>                    next_gen input.data
>>>>
>>>>  part1_score:  part1
>>>>                     evaluate part1 > part1_score
>>>>
>>>>  part2_score:  part2
>>>>                     evaluate part2 > part2_score
>>>>
>>>>  stop-test:     part1 part2 part1_score part2_score
>>>>                    if ( max(part1_score, part2_score) < 13) {
>>>>                         update input.data part1 part2
>>>>                         make part1 part2
>>>>                    }
>>>>
>>>> The intent of the rule for the target stop-test is to
>>>> specify that a new make cycle should be executed
>>>> if max(part1_score, part2_score) < 13.
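In a general-purpose language, the stop-test decision in the rule above reduces to a few lines. This hypothetical check assumes each score file holds a single number:

```python
# Hypothetical stop-test check: compare the best of the two part
# scores against the threshold from the Makefile sketch above.
def should_continue(part1_score_text, part2_score_text, threshold=13):
    best = max(float(part1_score_text), float(part2_score_text))
    return best < threshold  # True means: update input.data and loop again
```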
>>>>
>>>> So it seems that in order to implement loops,
>>>> one needs to support a conditional statement
>>>> and the recursive invocation of Makeflow.
>>>>
>>>> Thank you.
>>>> Gabriel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Mar 1, 2012 at 5:46 PM, Douglas Thain <dthain@xxxxxx> wrote:
>>>>> Gabriel -
>>>>>
>>>>> Makeflow does not currently do that; it is just a static DAG.
>>>>>
>>>>> However, this is an important workflow pattern that we have been thinking about,
>>>>> and we could work with you to get something going in a way that is
>>>>> Condor-compatible.
>>>>>
>>>>> Can you share some more detailed use cases?
>>>>> Is there anyone else on the list interested in such a capability?
>>>>>
>>>>> Doug
>>>>>
>>>>>
>>>>> On Thu, Mar 1, 2012 at 11:33 AM, Gabriel Mateescu
>>>>> <gabriel.mateescu@xxxxxxxxx> wrote:
>>>>>> Hi Doug,
>>>>>>
>>>>>> Is it possible to express with Makeflow
>>>>>> iterative computations that repeat a set
>>>>>> of steps until some stopping criterion
>>>>>> is met?
>>>>>>
>>>>>> Thanks.
>>>>>> Gabriel
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 11, 2011 at 5:55 PM, Douglas Thain <dthain@xxxxxx> wrote:
>>>>>>> Thomas -
>>>>>>>
>>>>>>> You might consider using Makeflow for this task:
>>>>>>> http://www.nd.edu/~ccl/software/makeflow
>>>>>>>
>>>>>>> The idea is that you express your tasks in Makeflow, submit a bunch of
>>>>>>> 'worker' processes to Condor, and Makeflow will distribute tasks among
>>>>>>> the workers.  If they have some common executables and input files,
>>>>>>> they will be automatically cached at the workers, so you don't have to
>>>>>>> keep transmitting them.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Doug
>>>>>>>
>>>>>>> On Tue, Jan 11, 2011 at 8:27 AM, Rowe, Thomas <rowet@xxxxxxxxxx> wrote:
>>>>>>>> I have to run a simulation about a thousand times with different seeds.  The
>>>>>>>> simulation executable and data total about 100MB.  This sounds like a job
>>>>>>>> for DAGMan & Stork, because this 100MB collection of files needs to get
>>>>>>>> copied around reliably, and some large output files need to be transferred
>>>>>>>> back to the originating machine reliably.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> My question: Does Stork and/or DAGMan do anything intelligent about avoiding
>>>>>>>> recopying files?  The input files are identical for all thousand runs; only
>>>>>>>> the seed varies.  But I would like to have Condor manage each run
>>>>>>>> individually.  So does all the data and the executable get copied around a
>>>>>>>> thousand times, cleaned up after each run?  If the thousand reps are child
>>>>>>>> to the Stork job that transfers files in place, does everything just work
>>>>>>>> with no extraneous recopying of input data?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Thomas Rowe
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Condor-users mailing list
>>>>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>>>>> subject: Unsubscribe
>>>>>>>> You can also unsubscribe by visiting
>>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>>>>
>>>>>>>> The archives can be found at:
>>>>>>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>>>>>>