
Re: [Condor-users] Condor Group Drive me Crazy.......



On Mon, Nov 14, 2011 at 06:43:15PM +0200, Sassy Natan wrote:
> Hi Joe,
>
> Well yes this is true...
> When GROUP_ACCEPT_SURPLUS_* is set to FALSE, jobs don't leap outside
> the quota limit.
>
> However, since this configuration uses sub-groups, I expect dynamic
> allocation inside the group.
> So in my configuration:
>
> GROUP_QUOTA_group_vcs = 13
> GROUP_QUOTA_group_vcs.design_single = 4
> GROUP_QUOTA_group_vcs.design_list = 1
> GROUP_QUOTA_group_vcs.verification_single = 5
> GROUP_QUOTA_group_vcs.verification_list  = 3
>
>
> The VCS group has a limit of 13 slots, right?
>
> So when someone from vcs.verification_single submits a job (queue 30)
> and the pool is clean (no jobs at the moment), the number of running
> jobs should be 13 (with 17 idle).
> This is in fact what happens when I submit the job. But once I submit a new
> job (queue 30) from verification_list, I would expect at least 3 jobs
> to run right away, causing 3 jobs from the vcs.verification_single group
> to be preempted or killed.

Just one note: you have the following three lines in your local configuration
file (from your previous mail):

SUSPEND = FALSE
PREEMPT = FALSE
KILL = FALSE

This is one reason why the jobs are not preempted or killed.
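
As a rough illustration of the numbers reported in this thread (plain
arithmetic, not the negotiator's actual algorithm), here is a minimal Python
sketch. The quota values are copied from the quoted configuration. The function
name running_without_preemption and its model (running jobs are never evicted,
and a newly active group is still granted slots up to its own quota) are
hypothetical simplifications.

```python
# Quotas copied from the configuration quoted in this thread.
PARENT_QUOTA = 13
SUB_QUOTAS = {
    "design_single": 4,
    "design_list": 1,
    "verification_single": 5,
    "verification_list": 3,
}


def running_without_preemption(already_running, newcomer_quota, newcomer_demand):
    """Hypothetical model: jobs that already run keep their slots, and the
    newcomer group is granted up to its own quota on top of them."""
    newcomer_running = min(newcomer_quota, newcomer_demand)
    return already_running + newcomer_running


# The sub-group quotas add up exactly to the parent quota.
assert sum(SUB_QUOTAS.values()) == PARENT_QUOTA

# First submission (queue 30) with surplus accepted inside group_vcs:
# it can fill the whole parent quota.
first_wave = min(30, PARENT_QUOTA)

# Second submission (queue 30) from verification_list, nothing preempted.
total = running_without_preemption(
    first_wave, SUB_QUOTAS["verification_list"], 30
)
print(first_wave, total)  # 13 16
```

Under these assumptions the total is 16 running jobs, which is in the range
reported above (16-17) rather than the expected 13.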

Try looking at the policy settings:
http://www.cs.wisc.edu/condor/manual/v7.6/3_5Policy_Configuration.html

Lukas

> However, what actually happens is that the 13 jobs of the vcs.verification_single
> group keep running and 3, sometimes even 4, more jobs are added to the running
> state, leaving me with a total of 16-17 running jobs, which is not good.
>
> Any guesses?
>
> I have been working on this all day without any luck :-(
>
> Thanks
> Sassy
>
> On Mon, Nov 14, 2011 at 5:43 PM, Joe Boyd <boyd@xxxxxxxx> wrote:
>
> > If you want those groups to be limited to only what the quota allows, you
> > don't want to set these to TRUE, do you?
> >
> > GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE
> > GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE
> >
> > That's telling it that those groups can use any "surplus" slots in the
> > pool outside of the quota configuration if no one else is using them. If
> > you set those to FALSE doesn't it do what you want?
> >
> > joe
> >
> >
> > Sassy Natan wrote:
> >
> >> Hi Again....
> >>
> >> I'm kind of lost here.
> >> I enabled debug mode and checked the logs, but still no luck.
> >>
> >>
> >> I am attaching the condor.local.conf file ....
> >>
> >>
> >> Thanks for the help....
> >>
> >>
> >> On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
> >>
> >>    Hi All
> >>    Here is a cut and paste from my condor configuration file:
> >>
> >>    GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE,
> >>    GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE,
> >>    GROUP_VCS.VERIFICATION_LIST
> >>
> >>    GROUP_QUOTA_group_vcs = 13
> >>    GROUP_QUOTA_group_vcs.design_single = 4
> >>    GROUP_QUOTA_group_vcs.design_list = 1
> >>    GROUP_QUOTA_group_vcs.verification_single = 5
> >>    GROUP_QUOTA_group_vcs.verification_list  = 3
> >>
> >>
> >>    GROUP_AUTOREGROUP = FALSE
> >>    GROUP_ACCEPT_SURPLUS = FALSE
> >>
> >>    GROUP_AUTOREGROUP_group_vcs = FALSE
> >>    GROUP_ACCEPT_SURPLUS_group_vcs = FALSE
> >>
> >>    GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
> >>    GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE
> >>
> >>    GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
> >>    GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE
> >>
> >>    GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
> >>    GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE
> >>
> >>    GROUP_AUTOREGROUP_group_vcs.verification_list  = FALSE
> >>    GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE
> >>
> >>
> >>    I now have 2 submission files, each with 100 jobs....
> >>    Submitting the first file, verification_single.sub, starts
> >>    13 jobs as expected (with the
> >>    group group_vcs.verification_single specified in the submit file).
> >>
> >>    So far everything is good...
> >>    After 5 min I submit the next file,
> >>    verification_list.sub (with the
> >>    group group_vcs.verification_list specified in the submit file).
> >>
> >>    The expected result is that at least 4 jobs from verification_list.sub
> >>    will start running and a total of 13 jobs will run in the cluster.
> >>    All other 187 jobs should be idle, assuming none of them has finished
> >>    (each submission includes 100 jobs).
> >>
> >>    However, the real result is that I get 18 jobs running, which is not
> >>    good! Why? Why? Why? Why?
> >>    I just don't understand it.
> >>
> >>    I also enabled NEGOTIATOR_CONSIDER_PREEMPTION since I would like to
> >>    use PREEMPTION.
> >>    I would expect that of the 13 running processes from
> >>    the verification_single.sub submission, 4 jobs will be preempted
> >>    once I submit verification_list.sub...
> >>
> >>    Thanks for any help....
> >>    Sassy
> >>
> >>
> >>
> >>
> >> ------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> Condor-users mailing list
> >> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> >> subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/condor-users/
> >>