
Re: [Condor-users] [condor-users] Iterative Computations [was: Does stork avoid retransfering data?]



Gabriel -

Sorry, I wasn't clear before.

When using Work Queue, you write the "coordinator logic" in C, Perl, or Python
using the Work Queue API.  Your "tasks" are just your ordinary applications
that expect to use normal input and output files.

So, in pseudo-code, your Work Queue program would be this:

do {
    create_next_generation();

    for each mutation {
        t = task_create("evaluate part.n");
        specify_input_files("evaluate","part.n","some other config file");
        specify_output_files("output.n");
        task_submit(t);
    }

    while( tasks_running()>0 ) {
        t = task_wait();
        score = get_score_from_output_file();
        best = MAX(score,best);
    }
} while( best < 13 );
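
To make the shape of that loop concrete, here is a rough Python sketch that runs on its own. It is only an illustration under loud assumptions: a local thread pool stands in for Work Queue, none of the names below are the Work Queue API, create_next_generation and evaluate are hypothetical placeholders, and the "score" is a made-up integer rather than something read from an output file.

```python
# Sketch of the generational fork-join loop. A thread pool stands in for
# Work Queue here; none of these names are the Work Queue API.
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

TARGET = 13  # stopping threshold taken from the pseudo-code


def create_next_generation(rng, size=4):
    """Hypothetical stand-in: produce candidate 'mutations' as integers."""
    return [rng.randint(0, 20) for _ in range(size)]


def evaluate(candidate):
    """Hypothetical stand-in for running 'evaluate part.n' on a worker."""
    return candidate  # the made-up score is just the candidate's value


def run(seed=0, max_generations=1000):
    """Repeat generations until the best score reaches TARGET."""
    rng = random.Random(seed)
    best = float("-inf")
    generations = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        # best starts at -inf, so the body runs at least once (do/while).
        while best < TARGET and generations < max_generations:
            generations += 1
            # Fork: submit one task per mutation.
            futures = [pool.submit(evaluate, c)
                       for c in create_next_generation(rng)]
            # Join: collect each score as its task finishes.
            for future in as_completed(futures):
                best = max(best, future.result())
    return best, generations
```

The max_generations guard is my addition so that the sketch always terminates; the pseudo-code has no such limit.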

If you really want to use Makeflow, I think your trick below won't work, because
Makeflow doesn't re-evaluate results if they appear to complete successfully.
(That would make it an O(n^2) algorithm.)

But you could use a "continuation-passing" style: have one makeflow run
one generation, then run a script which decides whether the computation is
complete and, if not, generates a NEW makeflow file with a new name:

stop-test: part1 part2 part1_score part2_score
              LOCAL test_completion part1_score part2_score part1 part2 1 > generation.2.mf
              LOCAL makeflow generation.2.mf

Now, that's going to have some limitations, since you will end up with a
distinct file (and a new makeflow process) for each generation, but it might
work for a small problem.
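
As a rough illustration of what the test_completion script might emit, here is a Python sketch of its core. Everything in it is an assumption for the sake of the example: the function name next_makeflow, the per-generation file names (part1.2, input.data.2, and so on), the idea that next_gen takes its output names as arguments, and the assumption that the script has already written the next input.data file. Only the stopping threshold of 13 and the generation.N.mf naming come from this thread.

```python
def next_makeflow(scores, generation, target=13):
    """Return the text of the next generation's makeflow, or None when done.

    scores     -- fitness values read from the current score files
    generation -- number of the generation that just finished
    """
    if max(scores) >= target:
        return None  # stopping criterion met: emit no further makeflow

    n = generation + 1  # the generation that the new file will run
    rules = [
        # Hypothetical naming: every file carries its generation number
        # so that a later generation never collides with an earlier one.
        f"part1.{n} part2.{n}: input.data.{n}",
        f"\tnext_gen input.data.{n} part1.{n} part2.{n}",
        f"part1_score.{n}: part1.{n}",
        f"\tevaluate part1.{n} > part1_score.{n}",
        f"part2_score.{n}: part2.{n}",
        f"\tevaluate part2.{n} > part2_score.{n}",
        # The stop-test rule passes the continuation on to generation n+1.
        f"stop-test.{n}: part1.{n} part2.{n} part1_score.{n} part2_score.{n}",
        f"\tLOCAL test_completion part1_score.{n} part2_score.{n} "
        f"part1.{n} part2.{n} {n} > generation.{n + 1}.mf",
        f"\tLOCAL makeflow generation.{n + 1}.mf",
    ]
    return "\n".join(rules) + "\n"
```

For example, next_makeflow([3, 7], 1) returns the rules for generation 2, whose own stop-test writes generation.3.mf, while next_makeflow([13, 5], 1) returns None because the criterion is already met.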

That said, I think WQ is a better fit for what you are trying to accomplish.
We would be happy to help you get it going.

Cheers,
Doug


On Thu, Mar 1, 2012 at 2:01 PM, Gabriel Mateescu
<gabriel.mateescu@xxxxxxxxx> wrote:
> Hi Doug,
>
> If I understand correctly, the WorkQueue
> approach requires the master and slave
> components of the application to use the
> WorkQueue API to communicate via the queue.
>
> Makeflow has the advantage of supporting
> applications that are only aware of input and
> output files.
>
> Thinking more about makeflow, it seems that it
> can support loops as follows: I can replace the
> rule for the stop-test target in the previous message
> with:
>
> stop-test: part1 part2 part1_score part2_score
>              test_completion part1_score part2_score part1 part2
>
> where test_completion is a tool that checks the stopping
> criterion using part1_score and part2_score, and if the
> criterion is not met, test_completion uses part1 and part2
> to build a new input.data file and then runs makeflow:
>
>      makeflow  ga.makefile
>
> This seems to work if Makeflow checks the timestamps
> of the files, not only their existence.
>
> Is makeflow aware of the file timestamps?
>
> Thank you.
> Gabriel
>
>
> On Thu, Mar 1, 2012 at 6:43 PM, Douglas Thain <dthain@xxxxxx> wrote:
>> Gabriel -
>>
>> For that kind of application, I suggest that you go one level down in
>> our stack, and use the Work Queue library:
>> http://nd.edu/~ccl/software/workqueue/
>>
>> Work Queue provides a fork-join programming interface that is Condor-compatible.
>> You write your app in C, Python, or Perl, link it against the WQ library,
>> then submit "workers" to Condor, which connect to the application and
>> start running components.
>>
>> WQ has been used to implement things like genetic algorithms, replica
>> exchange, distributed image processing -- anything that can be expressed
>> in a fork-join style.
>>
>> I suspect you would find it easier to write your application logic
>> (parsing files, checking conditions) in a general purpose language,
>> rather than trying to squeeze it into Makeflow.
>>
>> Would that meet your needs?
>>
>> Doug
>>
>>
>>
>> On Thu, Mar 1, 2012 at 12:19 PM, Gabriel Mateescu
>> <gabriel.mateescu@xxxxxxxxx> wrote:
>>> I am thinking about using Makeflow to
>>> build a workflow engine that manages
>>> a genetic algorithm which contains
>>> two operations:
>>>
>>>  1. next_gen tool: generates the next generation
>>>     of candidates and partitions the candidates
>>>     in two disjoint parts, part1 and part2;
>>>
>>>  2. evaluate tool: computes the fitness of every
>>>      candidate in either the part1 or part2 input.
>>>
>>> A Makefile that encodes the workflow for a
>>> Genetic Algorithm may look like this:
>>>
>>>
>>>  part1 part2:  input.data
>>>                    next_gen input.data
>>>
>>>  part1_score:  part1
>>>                     evaluate part1 > part1_score
>>>
>>>  part2_score:  part2
>>>                     evaluate part2 > part2_score
>>>
>>>  stop-test:     part1 part2 part1_score part2_score
>>>                    if ( max(part1_score, part2_score) < 13) {
>>>                         update input.data part1 part2
>>>                         make part1 part2
>>>                    }
>>>
>>> The intent of the rule for the target stop-test is to
>>> specify that a new make cycle should be executed
>>> if max(part1_score, part2_score) < 13.
>>>
>>> So it seems that in order to implement loops,
>>> one needs to support a conditional statement
>>> and the recursive invocation of Makeflow.
>>>
>>> Thank you.
>>> Gabriel
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Mar 1, 2012 at 5:46 PM, Douglas Thain <dthain@xxxxxx> wrote:
>>>> Gabriel -
>>>>
>>>> Makeflow does not currently do that; it is just a static DAG.
>>>>
>>>> However, this is an important workflow pattern that we have been thinking about,
>>>> and we could work with you to get something going in a way that is
>>>> Condor-compatible.
>>>>
>>>> Can you share some more detailed use cases?
>>>> Is there anyone else on the list interested in such a capability?
>>>>
>>>> Doug
>>>>
>>>>
>>>> On Thu, Mar 1, 2012 at 11:33 AM, Gabriel Mateescu
>>>> <gabriel.mateescu@xxxxxxxxx> wrote:
>>>>> Hi Doug,
>>>>>
>>>>> Is it possible to express with Makeflow
>>>>> iterative computations that repeat a set
>>>>> of steps until some stopping criterion
>>>>> is met?
>>>>>
>>>>> Thanks.
>>>>> Gabriel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 11, 2011 at 5:55 PM, Douglas Thain <dthain@xxxxxx> wrote:
>>>>>> Thomas -
>>>>>>
>>>>>> You might consider using Makeflow for this task:
>>>>>> http://www.nd.edu/~ccl/software/makeflow
>>>>>>
>>>>>> The idea is that you express your tasks in Makeflow, submit a bunch of
>>>>>> 'worker' processes to Condor, and Makeflow will distribute tasks among
>>>>>> the workers.  If they have some common executables and input files,
>>>>>> they will be automatically cached at the workers, so you don't have to
>>>>>> keep transmitting them.
>>>>>>
>>>>>> Cheers,
>>>>>> Doug
>>>>>>
>>>>>> On Tue, Jan 11, 2011 at 8:27 AM, Rowe, Thomas <rowet@xxxxxxxxxx> wrote:
>>>>>>> I have to run a simulation about a thousand times with different seeds.  The
>>>>>>> simulation executable and data total about 100MB.  This sounds like a job
>>>>>>> for DAGMan & Stork, because this 100MB collection of files needs to get
>>>>>>> copied around reliably, and some large output files need to be transferred
>>>>>>> back to the originating machine reliably.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> My question: Does Stork and/or DAGMan do anything intelligent about avoiding
>>>>>>> recopying files?  The input files are identical for all thousand runs; only
>>>>>>> the seed varies.  But I would like to have Condor manage each run
>>>>>>> individually.  So does all the data and the executable get copied around a
>>>>>>> thousand times, cleaned up after each run?  If the thousand reps are child
>>>>>>> to the Stork job that transfers files in place, does everything just work
>>>>>>> with no extraneous recopying of input data?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Thomas Rowe
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Condor-users mailing list
>>>>>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>>>> subject: Unsubscribe
>>>>>>> You can also unsubscribe by visiting
>>>>>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>>>
>>>>>>> The archives can be found at:
>>>>>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>>>>>
>>>>>>>