
Re: [HTCondor-users] Application specific scheduler



Miha,

First, I strongly encourage everyone who does not agree with my recommendation to voice their thoughts.

Second, I would argue that Web Interfaces and DRMAA are at different levels of the software stack required to support automation of job dependencies. One way or the other, you will need the functionality provided by DAGMan in your stack. You can rewrite it, or you can use DAGMan.
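For a sense of what "rewriting it" would involve, here is a minimal sketch of just one piece of DAGMan's functionality: releasing a node only after all of its parents have finished, i.e. a topological ordering of the nodes (Kahn's algorithm). The function name and node names are hypothetical, and real DAGMan layers retries, throttling and rescue DAGs on top of this.

```python
from collections import deque


def ready_order(nodes, edges):
    """Return nodes in an order that respects (parent, child) edges.

    Hypothetical helper: a node is emitted only after all of its parents.
    Raises ValueError if the dependencies contain a cycle.
    """
    nodes = list(nodes)
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Nodes with no unfinished parents are ready to "submit".
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        # A real scheduler would submit the node here and wait for its exit status.
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

For example, `ready_order(["sim0", "sim1", "load", "analyze"], [("sim0", "load"), ("sim1", "load"), ("load", "analyze")])` returns `["sim0", "sim1", "load", "analyze"]`.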

Miron


On 6/24/2014 11:29 AM, Miha Ahronovitz wrote:
Miron,

        I would recommend going with option #2, with the understanding
        that you need to decide whether step 3 of DAG number n will
        submit DAG number n+1 as an independent HTCondor job or whether
        it will create a "nested" DAG so that all jobs will be part of
        one BIG DAG.


        You will also have to keep in mind that when a DAGMan job is
        restarted, it will play back all the nodes, including the nodes
        that interact with the database.


To me it sounds like explaining something in English using Ndebele words
from Bulawayo. Why not a web interface? Why not DRMAA? Why DAGMan? If you
"recommend", you kill the discussion, and few people will dare to
contradict you.

M



On Mon, Jun 23, 2014 at 3:42 PM, Miron Livny <miron@xxxxxxxxxxx> wrote:

    Nick,

    I would recommend going with option #2, with the understanding that
    you need to decide whether step 3 of DAG number n will submit DAG
    number n+1 as an independent HTCondor job or whether it will create
    a "nested" DAG so that all jobs will be part of one BIG DAG.
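A rough sketch of the two choices, assuming a hypothetical adapter that writes one DAG file per iteration. The submit files (simulate.sub, load_db.sub, analyze.sub), the submit_next.bat script that would call condor_submit_dag, and the iteration_<n>.dag naming are all made up for illustration; JOB, PARENT/CHILD, SCRIPT POST and SUBDAG EXTERNAL are DAGMan keywords.

```python
def iteration_dag(n: int, nested: bool) -> str:
    """Return the text of a hypothetical DAG file for iteration n.

    Assumes the ANALYZE node writes iteration_<n+1>.dag when more
    simulations are needed.
    """
    next_dag = f"iteration_{n + 1}.dag"
    lines = [
        "JOB SIM simulate.sub",       # the simulations for this iteration
        "JOB LOAD load_db.sub",       # build the database, load results
        "JOB ANALYZE analyze.sub",    # decide whether more work is needed
        "PARENT SIM CHILD LOAD",
        "PARENT LOAD CHILD ANALYZE",
    ]
    if nested:
        # Nested: the next iteration becomes a child of this DAG,
        # so everything stays part of one BIG DAG.
        lines.append(f"SUBDAG EXTERNAL NEXT {next_dag}")
        lines.append("PARENT ANALYZE CHILD NEXT")
    else:
        # Independent: a POST script on ANALYZE submits the next DAG
        # as a separate HTCondor job (hypothetical script name).
        lines.append(f"SCRIPT POST ANALYZE submit_next.bat {next_dag}")
    return "\n".join(lines) + "\n"
```

One trade-off, as I understand it: in the nested form each parent DAGMan waits for its sub-DAG, so the chain of running DAGMan jobs grows with every iteration, whereas the independent form lets iteration n finish once it has submitted n+1. The nested form also assumes ANALYZE always writes the next DAG file before the NEXT node starts.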

    You will also have to keep in mind that when a DAGMan job is
    restarted, it will play back all the nodes, including the nodes
    that interact with the database.
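One way to make a database node safe to play back is to make it idempotent. A minimal sketch, using Python's built-in sqlite3 as a stand-in for whatever database the job manager really uses; the results table and its (job_id, value) layout are hypothetical. Keying rows by job id and using INSERT OR REPLACE means a replayed node rewrites the same rows instead of duplicating them.

```python
import sqlite3


def load_results(conn: sqlite3.Connection, results) -> None:
    """Load (job_id, value) rows so that running this node twice is harmless."""
    # Commit on success; roll back the inserts if anything raises.
    with conn:
        # Safe to repeat: does nothing if the table already exists.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(job_id INTEGER PRIMARY KEY, value REAL)"
        )
        # Upsert keyed by job id: a replay overwrites rather than duplicates.
        conn.executemany(
            "INSERT OR REPLACE INTO results (job_id, value) VALUES (?, ?)",
            results,
        )
```

Calling `load_results` twice with the same rows, as a replayed node would, still leaves exactly one row per job.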

    Miron




    On 6/22/2014 10:07 AM, Nick Cooper wrote:

        Hi All,

        I am currently looking at migrating from our home-grown distributed
        computing software to HTCondor. Over the years, users have created
        complex "job managers" written in C++ which are equivalent to
        application-specific DAGMan scripts. To reduce the burden on users
        migrating to HTCondor, we would like to provide an adapter between a
        "job manager" and HTCondor.

        An example of a simple "job manager" is one which (all within the
        same cluster):
        1. Requests 1000 simulation jobs to be executed
        2. When all 1000 simulation jobs are completed, creates a database
           and loads the results into it
        3. Does analysis on the results in the database and, based on the
           analysis, requests further simulation jobs to be executed, all
           without any user involvement.
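Under option #2, an adapter could express steps 1 and 2 as a DAG in which every simulation node is a parent of the database-load node, so DAGMan starts the load only once all simulations have completed. A minimal sketch that writes such a DAG file; simulate.sub, load_db.sub, analyze.sub and the index macro are hypothetical names, while JOB, VARS and PARENT/CHILD are DAGMan keywords.

```python
def job_manager_dag(num_sims: int) -> str:
    """Return the text of a hypothetical DAG for the simple job manager."""
    lines = []
    sim_nodes = [f"SIM{i}" for i in range(num_sims)]
    for i, node in enumerate(sim_nodes):
        lines.append(f"JOB {node} simulate.sub")
        # Pass the simulation index to the shared submit file as $(index).
        lines.append(f'VARS {node} index="{i}"')
    lines.append("JOB LOAD load_db.sub")
    lines.append("JOB ANALYZE analyze.sub")
    # Fan-in: LOAD waits for every simulation node.
    lines.append(f"PARENT {' '.join(sim_nodes)} CHILD LOAD")
    lines.append("PARENT LOAD CHILD ANALYZE")
    return "\n".join(lines) + "\n"
```

An alternative is a single SIM node whose submit file queues 1000 jobs; as far as I know, DAGMan treats a node as complete only when every job in its cluster has finished. Step 3 still needs a decision on how further simulations get submitted.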

        From what I have read, our options are:
        1. Web Service: Write an adapter using the SOAP interface. I suspect
           there is not enough feedback regarding when a job completes / fails.
        2. DAGMan: Write an adapter that generates DAGMan scripts.
        3. DRMAA: Write an adapter that submits and monitors jobs via the
           DRMAA API.
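Whichever option is chosen, completion and failure feedback can also come from the job's user log (the file named by the submit command log). A minimal sketch that reports the last event seen for each job, assuming the usual text format in which each event starts with a three-digit code followed by (cluster.proc.subproc); the job_states name and the subset of codes handled are assumptions. Recent HTCondor Python bindings also provide htcondor.JobEventLog for reading these logs properly.

```python
import re

# Assumed subset of user log event codes.
EVENT_NAMES = {
    "000": "submitted",
    "001": "running",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
}
# Matches an event header such as "005 (042.000.000) ..."
EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)")


def job_states(log_text: str) -> dict:
    """Map (cluster, proc) -> name of the last recognised event for that job."""
    states = {}
    for line in log_text.splitlines():
        match = EVENT_HEADER.match(line)
        if match and match.group(1) in EVENT_NAMES:
            job = (int(match.group(2)), int(match.group(3)))
            states[job] = EVENT_NAMES[match.group(1)]
    return states
```

Note that "terminated" does not by itself mean success: the lines following a terminated event carry the job's return value, which an adapter would also need to check.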

        Can someone confirm whether I am on the correct track?
        Does anyone have any suggestions / words of wisdom for this kind of
        requirement?

        Further info:
        - Windows based pool
        - Job manager is a C++ DLL
        - Looking at using the current stable release of HTCondor
        - Jobs will run in the Vanilla Universe
        - Jobs will need to be run under the submitter's Active Directory
          credentials
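Given these constraints, the adapter would also write submit description files. A sketch that generates one for the 1000 simulations; the executable name and output file names are hypothetical. run_as_owner is the submit command for running a job under the submitter's account on Windows; as I understand it, it depends on the user's password having been stored (for example with condor_store_cred) and on the pool being configured to allow it, so please check the Windows section of the manual for your version.

```python
def simulation_submit(executable: str, count: int) -> str:
    """Return a hypothetical vanilla-universe submit description."""
    return "\n".join([
        "universe = vanilla",
        f"executable = {executable}",
        # Each job receives its own process number as its argument.
        "arguments = $(Process)",
        # Run under the submitter's account (Windows; needs stored credentials).
        "run_as_owner = true",
        "output = sim_$(Process).out",
        "error = sim_$(Process).err",
        # One user log for the whole cluster, useful for completion feedback.
        "log = simulations.log",
        f"queue {count}",
    ]) + "\n"
```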

        Thanks Nick


        _________________________________________________
        HTCondor-users mailing list
        To unsubscribe, send a message to
        htcondor-users-request@xxxxxxxxxxx with a
        subject: Unsubscribe
        You can also unsubscribe by visiting
        https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

        The archives can be found at:
        https://lists.cs.wisc.edu/archive/htcondor-users/




