
Re: [HTCondor-users] Application specific scheduler



I think DAGMan is the clear winner over DRMAA and SOAP because it is a
workflow engine rather than a single-job manager, so it provides more
functionality.

If anything in DAGMan needs improvement, it is this: I do not understand
why, after a failure, one has to restart the workflow rather than retry
the failed jobs, possibly on different execution nodes.

Gabriel


On Sat, Jun 28, 2014 at 4:15 PM, Miron Livny <miron@xxxxxxxxxxx> wrote:
> Miha,
>
> First, I strongly encourage everyone who does not agree with my
> recommendation to voice their thoughts.
>
> Second, I would argue that web interfaces and DRMAA are at different levels
> of the software stack required to support automation of job dependencies.
> One way or the other, you will need the functionality provided by DAGMan
> in your stack. You can rewrite it, or you can use DAGMan.
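To make concrete what "rewriting it" would involve, here is a toy Python sketch (Python 3.9+ for `graphlib`) of the core dependency bookkeeping: a node is released only after all of its parents have finished. It is an illustration under simplifying assumptions (nodes run synchronously, no failure handling, no retries, no crash recovery), not how DAGMan is implemented; the function name `run_in_dependency_order` and the node names are hypothetical.

```python
# Toy illustration, not DAGMan's implementation: release each node only after
# all of its parents have finished.
from graphlib import TopologicalSorter


def run_in_dependency_order(parents, run_node):
    """parents maps node name -> set of parent node names.

    run_node(name) is called once per node, after all of its parents ran.
    Returns the order in which nodes were run.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    order = []
    while sorter.is_active():
        # Every node in this batch has all parents done, so a real scheduler
        # could submit the whole batch to the pool at once.
        for name in sorted(sorter.get_ready()):
            run_node(name)
            order.append(name)
            sorter.done(name)
    return order


if __name__ == "__main__":
    # Hypothetical three-simulation version of the workflow in this thread.
    graph = {"load_db": {"sim0", "sim1", "sim2"}, "analyze": {"load_db"}}
    print(run_in_dependency_order(graph, lambda name: None))
```

A real replacement would also have to track job completion asynchronously, handle failures and retries, and recover after a crash, which is the functionality DAGMan already provides.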
>
> Miron
>
>
>
> On 6/24/2014 11:29 AM, Miha Ahronovitz wrote:
>>
>> Miron,
>>
>>         I would recommend going with option #2, with the understanding
>>         that you need to decide whether step 3 of DAG number n will
>>         submit DAG number n+1 as an independent HTCondor job or whether
>>         it will create a "nested" DAG so that all jobs will be part of
>>         one BIG DAG.
>>
>>
>>         You will also have to keep in mind that when a DAGMan job is
>>         restarted, it will play back all the nodes, including the nodes
>>         that interact with the database.
>>
>>
>> To me it sounds like explaining something in English using Ndebele words
>> from Bulawayo. Why not a web interface? Why not DRMAA? Why DAGMan? If you
>> "recommend", you kill the discussion, and few people will dare to
>> contradict you.
>>
>> M
>>
>>
>>
>> On Mon, Jun 23, 2014 at 3:42 PM, Miron Livny <miron@xxxxxxxxxxx
>> <mailto:miron@xxxxxxxxxxx>> wrote:
>>
>>     Nick,
>>
>>     I would recommend going with option #2, with the understanding that
>>     you need to decide whether step 3 of DAG number n will submit DAG
>>     number n+1 as an independent HTCondor job or whether it will create
>>     a "nested" DAG so that all jobs will be part of one BIG DAG.
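The two choices for step 3 can be sketched as the DAG lines an adapter might append to the file for iteration n. This Python sketch rests on stated assumptions: the file names `iter_<n>.dag`, the node name `next`, and the Windows batch files `make_next_dag.bat` and `submit_next.bat` are hypothetical, and the nested variant assumes a DAGMan version that reads a `SUBDAG EXTERNAL` file only when that node is submitted, so a PRE script can write it first. Deciding when to stop iterating is left out.

```python
def chain_lines(n, nested):
    """Return DAG lines that hand control from iteration n to iteration n + 1.

    Assumes the DAG for iteration n already defines a node named "analyze".
    """
    next_dag = f"iter_{n + 1}.dag"  # hypothetical naming scheme
    if nested:
        # Nested: iteration n + 1 becomes a SUBDAG EXTERNAL node of this DAG,
        # so all iterations stay inside one big workflow. The hypothetical
        # PRE script writes next_dag before DAGMan submits the node.
        return [
            f"SUBDAG EXTERNAL next {next_dag}",
            f"SCRIPT PRE next make_next_dag.bat {n + 1}",
            "PARENT analyze CHILD next",
        ]
    # Independent: a hypothetical POST script on the analysis node writes
    # next_dag and submits it as a separate job with condor_submit_dag.
    return [f"SCRIPT POST analyze submit_next.bat {next_dag}"]


if __name__ == "__main__":
    print("\n".join(chain_lines(0, nested=True)))
    print("\n".join(chain_lines(0, nested=False)))
```

One trade-off to weigh: in the nested form each parent DAGMan job keeps running until its inner iterations finish, while in the independent form each iteration is its own DAGMan job that can be monitored or removed separately.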
>>
>>     You will also have to keep in mind that when a DAGMan job is
>>     restarted, it will play back all the nodes, including the nodes
>>     that interact with the database.
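Because nodes that touch the database may be run again, one common precaution is to make those steps idempotent, so that a replay leaves the database unchanged. A minimal Python sketch, assuming results are keyed by a simulation task id; `sqlite3` stands in for whatever database the real job manager uses, and `load_results` is a hypothetical name.

```python
import sqlite3


def load_results(conn, results):
    """Insert (task_id, value) pairs; replaying the same pairs is a no-op.

    Returns the number of rows in the results table afterwards.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            " task_id INTEGER PRIMARY KEY, value REAL NOT NULL)"
        )
        # INSERT OR IGNORE skips rows whose task_id is already stored, so a
        # replayed node neither duplicates rows nor fails on the primary key.
        conn.executemany(
            "INSERT OR IGNORE INTO results (task_id, value) VALUES (?, ?)",
            results,
        )
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [(0, 1.5), (1, 2.5), (2, 3.5)]
    print(load_results(conn, rows))  # first run
    print(load_results(conn, rows))  # replay after a restart
```

`INSERT OR IGNORE` keeps the first value stored for each task id; if a replay could legitimately produce different values, updating existing rows inside the same transaction would be the alternative.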
>>
>>     Miron
>>
>>
>>
>>
>>     On 6/22/2014 10:07 AM, Nick Cooper wrote:
>>
>>         Hi All,
>>
>>         I am currently looking at migrating from our home-grown distributed
>>         computing software to HTCondor. Over the years, users have created
>>         complex "job managers" written in C++ which are equivalent to
>>         application-specific DAGMan scripts. To reduce the burden on users
>>         migrating to HTCondor, we would like to provide an adapter
>>         between a "job manager" and HTCondor.
>>
>>         An example of a simple "job manager" is one which (all within
>>         the same cluster):
>>         1. Requests 1000 simulation jobs to be executed.
>>         2. When all 1000 simulation jobs have completed, creates a
>>            database and loads the results into it.
>>         3. Analyzes the results in the database and, based on the
>>            analysis, requests further simulation jobs to be executed,
>>            all without any user involvement.
>>
>>         From what I have read, our options are:
>>         1. Web Service: Write an adapter using the SOAP interface. I
>>            suspect there is not enough feedback regarding when a job
>>            completes or fails.
>>         2. DAGMan: Write an adapter that generates DAGMan scripts.
>>         3. DRMAA: Write an adapter that submits and monitors jobs via
>>            the DRMAA API.
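Option #2 might look like the following Python sketch of an adapter that writes the DAGMan input file for one pass of the job manager described above (simulations, then database load, then analysis). The submit files `simulate.sub`, `load_db.sub` and `analyze.sub`, the node names, the `task_id` macro and the retry count are hypothetical; `JOB`, `VARS`, `RETRY` and `PARENT ... CHILD` are standard DAG file commands. All node declarations are written before the dependency lines.

```python
def make_iteration_dag(n_sims, retries=2):
    """Return DAG text for one pass: n_sims simulations -> load_db -> analyze."""
    sims = [f"sim{i}" for i in range(n_sims)]
    lines = []
    for i, name in enumerate(sims):
        lines.append(f"JOB {name} simulate.sub")
        # Pass the task index to the shared submit file as $(task_id).
        lines.append(f'VARS {name} task_id="{i}"')
        # Resubmit a failed simulation node up to `retries` times.
        lines.append(f"RETRY {name} {retries}")
    lines.append("JOB load_db load_db.sub")
    lines.append("JOB analyze analyze.sub")
    # load_db may start only after every simulation node has succeeded.
    for name in sims:
        lines.append(f"PARENT {name} CHILD load_db")
    lines.append("PARENT load_db CHILD analyze")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_iteration_dag(2), end="")
```

The adapter would write `make_iteration_dag(1000)` to a file and submit it with `condor_submit_dag`; the C++ "job manager" logic that decides what to simulate next would then live in the analysis step rather than in a long-running process.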
>>
>>         Can someone confirm whether I am on the right track?
>>         Does anyone have any suggestions or words of wisdom for this kind
>>         of requirement?
>>
>>         Further info:
>>         - Windows-based pool
>>         - Job manager is a C++ DLL
>>         - Looking at using the current stable release of HTCondor
>>         - Jobs will run in the vanilla universe
>>         - Jobs will need to run under the submitter's Active Directory
>>           credentials
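A submit description for the simulation jobs could encode the requirements listed above. This is a Python sketch with a hypothetical executable name and file names; it assumes the pool's administrators have enabled `run_as_owner` (on Windows this involves a running `condor_credd` and a password the user has stored with `condor_store_cred`), and that `task_id` is set per job elsewhere, for example by a `VARS` line in a DAG file.

```python
def make_simulation_submit(executable="simulate.exe"):
    """Return submit-description text for a hypothetical simulate.sub."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # task_id is assumed to be defined per job, e.g. by a DAG VARS line.
        "arguments = $(task_id)",
        # Run as the submitting user rather than a dedicated slot account;
        # this only works if the pool has been configured to allow it.
        "run_as_owner = true",
        "output = sim_$(task_id).out",
        "error = sim_$(task_id).err",
        "log = simulations.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_simulation_submit(), end="")
```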
>>
>>         Thanks, Nick
>>
>>
>>         _______________________________________________
>>         HTCondor-users mailing list
>>         To unsubscribe, send a message to
>>         htcondor-users-request@xxxxxxxxxxx with a
>>         subject: Unsubscribe
>>         You can also unsubscribe by visiting
>>         https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>>         The archives can be found at:
>>         https://lists.cs.wisc.edu/archive/htcondor-users/
>>