Re: [Condor-users] Condor Group Drive me Crazy.......



Ah. I understand what you want to happen now. I'm not sure what setup you need though.

Having all group_vcs jobs submitted with a concurrency limit set at 13 seems like it would be ideal but you've said that doesn't work for some reason.
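
For reference, a concurrency limit of that kind could be sketched roughly as below. The limit name "vcs" is an assumption picked for illustration; the <NAME>_LIMIT configuration macro and the concurrency_limits submit command are the usual Condor mechanisms for this.

```
# Negotiator (central manager) configuration:
# allow at most 13 running jobs that claim the "vcs" limit.
# The name "vcs" is hypothetical; any name works as long as the
# submit files use the same one.
VCS_LIMIT = 13

# In each group_vcs submit file:
concurrency_limits = vcs
```

This caps the total number of running license-consuming jobs, but on its own it says nothing about which subgroup gets the 13.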

Could you configure 13 of the slots in your pool with some special parameter like "NeedFLEXLM = True" and then have all the group_vcs jobs require it? That would give you a pool of 13 slots where the jobs could run. Your quota setup would then allow any one subgroup to use all of them if it were the only subgroup with jobs, and it *might* get preemption to kick in when you have queued jobs from multiple subgroups, but I'm not sure.
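
A minimal sketch of that idea, assuming the attribute is advertised through STARTD_ATTRS on the chosen execute machines. How to restrict it to exactly 13 slots depends on how your machines are divided into slots, so that part is left out here.

```
# Local configuration on the execute machines whose slots should be
# eligible for FlexLM jobs (attribute name taken from the suggestion above):
NeedFLEXLM = True
STARTD_ATTRS = $(STARTD_ATTRS) NeedFLEXLM

# In each group_vcs submit file, match only slots that advertise it:
requirements = (NeedFLEXLM =?= True)
```

Note that this only steers the group_vcs jobs onto those slots. Keeping other jobs off them would need a START expression on the machines, which is not shown.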

Sounds like a crazy idea but that's all I can think of...

joe

Sassy Natan wrote:


On Mon, Nov 14, 2011 at 9:14 PM, Joe Boyd <boyd@xxxxxxxx> wrote:

    Why can't you just set all the GROUP_ACCEPT_SURPLUS parameters to
    FALSE?  What doesn't work the way you want then?


Well, if I set it to FALSE then (based on the configuration) group_vcs.verification_single, with GROUP_QUOTA_group_vcs.verification_single = 5, will only ever run up to 5 jobs. If GROUP_ACCEPT_SURPLUS is set to TRUE and no one is running jobs from the other groups, it will get all 13.

The whole idea is to have some control over jobs that require a FlexLM license.
I know about concurrency limits, but I found them not to be a good option in my case.

My users want to make sure that if they submit a job under one of the VCS groups, it will start running right away. (This is under the assumption that no other jobs from the same group are in the pool. So if 10 jobs from the design_single group already exist in the pool, and 10 more jobs are submitted with the design_single group definition, it is understood that they will be processed in FIFO order.)


Here is the conf again:
       GROUP_QUOTA_group_vcs = 13
       GROUP_QUOTA_group_vcs.design_single = 4
       GROUP_QUOTA_group_vcs.design_list = 1
       GROUP_QUOTA_group_vcs.verification_single = 5
       GROUP_QUOTA_group_vcs.verification_list = 3


    joe

    Sassy Natan wrote:

        Thanks For the Help Man!

        On Mon, Nov 14, 2011 at 8:01 PM, Joe Boyd <boyd@xxxxxxxx> wrote:

           I may have missed a part of the thread or something, as I'm not sure
           what you're trying to have it do in the end.

           If you have any of the GROUP_ACCEPT_SURPLUS_* parameters set to TRUE,
           that group will end up running more than its quota if you have
           free slots.  You said it only gets 13 jobs running when you submit
           that first job.  Is that true even after several negotiation
           cycles?  I'm surprised it wouldn't run more.

        Yes, even after several negotiation cycles the number of running jobs
        does not go above 13.

        If I submit the two submission files at the same time (60 jobs in total
        across the two groups), I also don't get more than 13 jobs...

           I see that you have

           GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

           but I don't think that's going to keep the top-level group_vcs from
           going above its 13 if the subgroups have it TRUE.  I'm not sure that
           parameter is really enforced down the hierarchy (sounds like it's
           not, from your experience).  Is that why you're saying it shouldn't
           run more than 13?  Because of the setting I quote above?

        Yes, this is what I'm saying.

        In that case I don't understand the preemption method.
        What do you suggest?

           joe


           Sassy Natan wrote:

               Hi Joe,

               Well, yes, this is true. When GROUP_ACCEPT_SURPLUS_* is set
               to FALSE, jobs don't go beyond the quota limit.

               However, since this configuration uses subgroups, I expect to
               have dynamic allocation inside the group.
               So in my configuration:

               GROUP_QUOTA_group_vcs = 13
               GROUP_QUOTA_group_vcs.design_single = 4
               GROUP_QUOTA_group_vcs.design_list = 1
               GROUP_QUOTA_group_vcs.verification_single = 5
               GROUP_QUOTA_group_vcs.verification_list = 3

               The VCS group has a limit of 13 slots, right?

               So when someone from vcs.verification_single submits a job
               (queue 30) and the pool is empty (no jobs at the moment), the
               number of running jobs should be 13 (with 17 idle).
               This is in fact what happens when I submit the job. But once I
               submit a new job (queue 30) from verification_list, I would
               expect at least 3 of its jobs to run right away, causing 3 jobs
               from the vcs.verification_single group to be preempted or killed.
               However, what happens is that the 13 jobs of the
               vcs.verification_single group keep running and 3, sometimes
               even 4, more jobs enter the running state, leaving me with a
               total of 16-17 running jobs, which is not good.

               Any Guess?

               I've been working on this all day without any luck :-(

               Thanks
               Sassy

               On Mon, Nov 14, 2011 at 5:43 PM, Joe Boyd <boyd@xxxxxxxx> wrote:

                  If you want those groups to be limited to only what the quota has
                  you don't want to set these to TRUE do you?

                  GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE
                  GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

                  That's telling it that those groups can use any "surplus" slots in
                  the pool outside of the quota configuration if no one else is using
                  them. If you set those to FALSE doesn't it do what you want?

                  joe


                  Sassy Natan wrote:

                      Hi Again....

                      I'm kind of lost here.
                      I enabled debug mode and checked the logs, and still no
                      luck.


                      I've attached the condor.local.conf file.


                      Thanks for the help....


                      On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:

                         Hi All
                         Here is a cut and paste from my Condor configuration file:

                         GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE, GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE, GROUP_VCS.VERIFICATION_LIST

                         GROUP_QUOTA_group_vcs = 13
                         GROUP_QUOTA_group_vcs.design_single = 4
                         GROUP_QUOTA_group_vcs.design_list = 1
                         GROUP_QUOTA_group_vcs.verification_single = 5
                         GROUP_QUOTA_group_vcs.verification_list = 3

                         GROUP_AUTOREGROUP = FALSE
                         GROUP_ACCEPT_SURPLUS = FALSE

                         GROUP_AUTOREGROUP_group_vcs = FALSE
                         GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

                         GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
                         GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE

                         GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
                         GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE

                         GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
                         GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

                         GROUP_AUTOREGROUP_group_vcs.verification_list = FALSE
                         GROUP_ACCEPT_SURPLUS_group_vcs.verification_list = TRUE



                         I now have 2 submission files, each with 100 jobs.
                         When I submit the first file, verification_single.sub
                         (with the group group_vcs.verification_single specified
                         in the submit file), 13 jobs start processing, as expected.

                         So far everything is good.
                         After 5 minutes I submit the next file,
                         verification_list.sub (with the group
                         group_vcs.verification_list specified in the submit file).

                         The expected result is that at least 4 jobs from
                         verification_list.sub start running and a total of
                         13 jobs run in the cluster. All other 187 jobs should be
                         idle, assuming none of them has finished (each
                         submission includes 100 jobs).

                         However, the actual result is that I get 18 jobs running,
                         which is not good! Why? Why? Why? Why?
                         I just don't understand it.

                         I also enabled NEGOTIATOR_CONSIDER_PREEMPTION since I
                         would like to use preemption.
                         I would expect that, of the 13 running jobs from the
                         verification_single.sub submission, 4 will be preempted
                         once I submit verification_list.sub.

                         Thanks for any help....
                         Sassy





------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/