
Re: [Condor-users] Condor Group Drive me Crazy.......



Hi Joe,

Well, yes, this is true.
When GROUP_ACCEPT_SURPLUS_* is set to FALSE, jobs don't go beyond the quota limit.

However, since this configuration uses subgroups, I expect allocation to be dynamic inside the group.
So in my configuration:

GROUP_QUOTA_group_vcs = 13
GROUP_QUOTA_group_vcs.design_single = 4
GROUP_QUOTA_group_vcs.design_list = 1
GROUP_QUOTA_group_vcs.verification_single = 5
GROUP_QUOTA_group_vcs.verification_list  = 3
 

The VCS group has a limit of 13 slots, right?

So when someone from vcs.verification_single submits a job (queue 30) while the pool is clean (no jobs at the moment), the number of running jobs should be 13 (with 17 idle).
This is in fact what happens when I submit the job. But once I submit a new job (queue 30) from verification_list, I would expect at least 3 jobs to run right away, causing 3 jobs from the vcs.verification_single group to be preempted or killed.
However, what actually happens is that the 13 vcs.verification_single jobs keep running and 3, sometimes even 4, more jobs are added to the running state. That leaves me with a total of 16-17 running jobs, which is not good.
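Spelled out with the posted quotas, the expectation is that the four subgroup quotas exactly fill the parent quota, so any verification_list jobs should come out of verification_single's 13 slots rather than on top of them. Here Q_vcs (the group_vcs quota) and R (running jobs per group) are notation introduced only for this note:

$$
\begin{aligned}
\underbrace{4}_{\text{design\_single}} + \underbrace{1}_{\text{design\_list}} + \underbrace{5}_{\text{verification\_single}} + \underbrace{3}_{\text{verification\_list}} &= 13 = Q_{\text{vcs}} \\
R_{\text{verification\_list}} \ge 3 \;&\Rightarrow\; R_{\text{verification\_single}} \le 13 - 3 = 10 \\
R_{\text{observed}} = 13 + (3 \text{ or } 4) &= 16 \text{ or } 17 > Q_{\text{vcs}}
\end{aligned}
$$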

Any guesses?

I've been working on this all day without any luck :-(

Thanks
Sassy  

On Mon, Nov 14, 2011 at 5:43 PM, Joe Boyd <boyd@xxxxxxxx> wrote:
If you want those groups to be limited to only what the quota allows, you don't want to set these to TRUE, do you?

GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

That's telling it that those groups can use any "surplus" slots in the pool outside of the quota configuration if no one else is using them. If you set those to FALSE, doesn't it do what you want?
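For reference, the change Joe describes amounts to flipping the two subgroup settings he quotes. This is a minimal fragment, assuming the rest of the posted configuration stays as it is; note that the posted file also sets GROUP_ACCEPT_SURPLUS to TRUE for group_vcs.design_single and group_vcs.design_list, so those two would need the same change for every subgroup to be capped at its quota:

```
# Stop these subgroups from borrowing "surplus" slots beyond their own quotas
GROUP_ACCEPT_SURPLUS_group_vcs.verification_list   = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = FALSE
```

After editing the file, running condor_reconfig on the central manager makes the negotiator pick up the new values.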

joe


Sassy Natan wrote:
Hi again,

I'm kind of lost here.
I enabled debug mode and checked the logs, but still no luck.
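The message doesn't say which debug settings were enabled. As an assumption only (not necessarily what was used here), one common way to see more detail about how the negotiator applies group quotas is to raise its log level and then read its log:

```
# Assumed setting: verbose negotiator logging, written to the NegotiatorLog
# file in the directory named by the LOG configuration variable
NEGOTIATOR_DEBUG = D_FULLDEBUG
```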


I've attached the condor.local.conf file.


Thanks for the help....


On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:

   Hi all,
   Here is a cut and paste from my Condor configuration file:

   GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE, GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE, GROUP_VCS.VERIFICATION_LIST

   GROUP_QUOTA_group_vcs = 13
   GROUP_QUOTA_group_vcs.design_single = 4
   GROUP_QUOTA_group_vcs.design_list = 1
   GROUP_QUOTA_group_vcs.verification_single = 5
   GROUP_QUOTA_group_vcs.verification_list  = 3


   GROUP_AUTOREGROUP = FALSE
   GROUP_ACCEPT_SURPLUS = FALSE

   GROUP_AUTOREGROUP_group_vcs = FALSE
   GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

   GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
   GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE

   GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
   GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE

   GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
   GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

   GROUP_AUTOREGROUP_group_vcs.verification_list  = FALSE
   GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE


   I now have 2 submit files, each with 100 jobs.
   When I submit the first one, verification_single.sub (with the
   group group_vcs.verification_single specified in the submit file),
   13 jobs start processing, as expected.

   So far everything is good.
   After 5 minutes I submit the next file, verification_list.sub (with the
   group group_vcs.verification_list specified in the submit file).

   The expected result is that at least 4 jobs from verification_list.sub
   will start running, with a total of 13 jobs running in the cluster. All
   other 187 jobs should stay idle, assuming none of them has finished
   (each submission includes 100 jobs).

   However, the actual result is that I get 18 jobs running, which is not
   good! Why? Why? Why? Why?
   I just don't understand it.

   I also enabled NEGOTIATOR_CONSIDER_PREEMPTION since I want to
   use preemption.
   I would expect that once I submit verification_list.sub, 4 of the 13
   running processes from the verification_single.sub submission will be
   preempted.
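   Enabling preemption in the negotiator is only part of the setup; which
   running jobs may actually be preempted is governed by
   PREEMPTION_REQUIREMENTS. The fragment below is a sketch only. It assumes
   the group-quota ClassAd attributes described in later HTCondor manuals
   (SubmitterGroupResourcesInUse, SubmitterGroupQuota,
   RemoteGroupResourcesInUse, RemoteGroupQuota, SubmitterGroup,
   RemoteGroup) are available in the version in use, which should be
   checked before relying on it:

```
NEGOTIATOR_CONSIDER_PREEMPTION = TRUE

# Assumption: preempt only when the requesting group is under its quota,
# the group currently holding the slot is over its quota, and the two
# groups are different.
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && \
                          (RemoteGroupResourcesInUse > RemoteGroupQuota) && \
                          (RemoteGroup =!= SubmitterGroup)
```

   In the scenario above, verification_single (quota 5, 13 running) would
   count as over quota and verification_list (quota 3, 0 running) as under
   quota, so this expression would allow the expected preemption.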

   Thanks for any help.
   Sassy




------------------------------------------------------------------------

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxedu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/