
Re: [Condor-users] Condor Group Drive me Crazy.......



Hi again,

I'm kind of lost here.
I enabled debug mode and checked the logs, but I still can't see what is wrong.

I am attaching my condor.local.conf file (below).

Thanks for the help.


On Sun, Nov 13, 2011 at 6:00 PM, Sassy Natan <sassyn@xxxxxxxxx> wrote:
Hi all,
Here is a cut and paste from my Condor configuration file:

GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE, GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE, GROUP_VCS.VERIFICATION_LIST

GROUP_QUOTA_group_vcs = 13
GROUP_QUOTA_group_vcs.design_single = 4
GROUP_QUOTA_group_vcs.design_list = 1
GROUP_QUOTA_group_vcs.verification_single = 5
GROUP_QUOTA_group_vcs.verification_list  = 3


GROUP_AUTOREGROUP = FALSE
GROUP_ACCEPT_SURPLUS = FALSE

GROUP_AUTOREGROUP_group_vcs = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs = FALSE

GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE

GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE

GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE

GROUP_AUTOREGROUP_group_vcs.verification_list  = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE


I now have 2 submit files, each with 100 jobs.
Submitting the first file, verification_single.sub, starts 13 jobs as expected (the submit file specifies the group group_vcs.verification_single).

So far everything is good.

After 5 minutes I submit the second file, verification_list.sub (the submit file specifies the group group_vcs.verification_list).

I expected at least 4 jobs from verification_list.sub to start running, with a total of 13 jobs running in the cluster.
The other 187 jobs should stay idle, assuming none of them has finished (each submission includes 100 jobs).
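
The expectation above rests on simple quota arithmetic, which can be checked with a short Python sketch. The numbers are copied from the GROUP_QUOTA_* lines and from the running-job count reported in this post; the variable names are made up for illustration, and the script does not model how the negotiator actually distributes slots.

```python
# Quotas copied from the GROUP_QUOTA_* lines in the post.
parent_quota = 13  # GROUP_QUOTA_group_vcs
child_quotas = {
    "design_single": 4,
    "design_list": 1,
    "verification_single": 5,
    "verification_list": 3,
}

# Number of running jobs reported in the post.
observed_running = 18

# The sub-group quotas should add up to the parent quota.
child_total = sum(child_quotas.values())
quotas_consistent = child_total == parent_quota

# How many running jobs exceed the parent group's quota.
excess_over_parent = observed_running - parent_quota

print(f"sum of sub-group quotas: {child_total}")
print(f"matches parent quota:    {quotas_consistent}")
print(f"running above quota:     {excess_over_parent}")
```

The sub-group quotas sum to the parent quota of 13, so the 18 running jobs are 5 more than the group_vcs quota allows.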

However, the actual result is that I get 18 jobs running, which is not good! Why? Why? Why? Why?
I just don't understand it.

I also enabled NEGOTIATOR_CONSIDER_PREEMPTION, since I would like to use preemption.
I would expect that, once I submit verification_list.sub, 4 of the 13 running jobs from the verification_single.sub submission will be preempted.
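
The PREEMPTION_REQUIREMENTS expression in the configuration posted in this thread can be read as two conditions: the submitting group is below its quota, and the group currently holding the slot is over its quota (or has no quota). The Python sketch below is a simplified model of that boolean, with `None` standing in for UNDEFINED. The function name and the scenario values are assumptions taken from the description above, and the model ignores the ClassAd three-valued logic and everything else the negotiator and startd consider.

```python
from typing import Optional


def preemption_allowed(
    submitter_quota: Optional[int],
    submitter_in_use: int,
    remote_quota: Optional[int],
    remote_in_use: int,
) -> bool:
    """Simplified model of the posted PREEMPTION_REQUIREMENTS expression.

    None stands in for UNDEFINED; this is not a ClassAd evaluator.
    """
    # SubmitterGroupQuota =!= UNDEFINED && SubmitterGroupResourcesInUse < SubmitterGroupQuota
    submitter_below_quota = (
        submitter_quota is not None and submitter_in_use < submitter_quota
    )
    # RemoteGroupQuota =?= UNDEFINED || RemoteGroupResourcesInUse > RemoteGroupQuota
    remote_over_quota = remote_quota is None or remote_in_use > remote_quota
    return submitter_below_quota and remote_over_quota


# Assumed scenario: verification_list (quota 3, nothing running) wants a slot
# held by verification_single (quota 5, 13 running).
print(preemption_allowed(3, 0, 5, 13))

# Assumed counter-case: the submitting group has already reached its quota.
print(preemption_allowed(3, 3, 5, 10))
```

Under these assumed values the first call evaluates to True and the second to False, which matches the reading that the expression only permits preemption while the submitting group is still below its quota.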

Thanks for any help.
Sassy



CONDOR_HOST = $(FULL_HOSTNAME)

COLLECTOR_NAME = Personal Condor at $(FULL_HOSTNAME)

START = TRUE

SUSPEND = FALSE

PREEMPT = FALSE

KILL = FALSE


DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD

UID_DOMAIN            = x.com
COLLECTOR_NAME        = x
ALLOW_READ            = $(ALLOW_READ),  cpu*.x.com, appsrv.x.com
ALLOW_WRITE           = $(ALLOW_WRITE), cpu*.x.com, appsrv.x.com


SEC_CLIENT_AUTHENTICATION_METHODS = CLAIMTOBE, NTSSPI, PASSWORD
SEC_DEFAULT_AUTHENTICATION_METHODS = $(SEC_CLIENT_AUTHENTICATION_METHODS)


DAGMAN_LOG_ON_NFS_IS_ERROR = FALSE

FILESYSTEM_DOMAIN = X.com

USE_NFS                = True

ALLOW_ADMINISTRATOR = appsrv.x.com

NEGOTIATOR_CONSIDER_PREEMPTION = True

CLAIM_WORKLIFE = 0

QUEUE_SUPER_USERS = root, condor

NEGOTIATOR_DEBUG = D_FULLDEBUG

NEGOTIATOR_INTERVAL = 30

SCHEDD_INTERVAL = 15

GROUP_ACCEPT_SURPLUS = FALSE

NEGOTIATE_ALL_JOBS_IN_CLUSTER = TRUE

NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE


HOSTALLOW_ADMINISTRATOR = appsrv.x.com
HOSTALLOW_CONFIG = appsrv.x.com
HOSTALLOW_NEGOTIATOR = appsrv.x.com

SETTABLE_ATTRS_CONFIG = *
SETTABLE_ATTRS_ADMINISTRATOR = *

QUEUE_ALL_USERS_TRUSTED = False

ENABLE_SOAP = True
ALLOW_SOAP=*

ENABLE_HISTORY_ROTATION = True

MAX_HISTORY_LOG = 8000000

MAX_HISTORY_ROTATIONS = 5


GROUP_NAMES = GROUP_VCS, GROUP_VCS.DESIGN_SINGLE, GROUP_VCS.DESIGN_LIST, GROUP_VCS.VERIFICATION_SINGLE, GROUP_VCS.VERIFICATION_LIST

GROUP_QUOTA_group_vcs = 13
GROUP_QUOTA_group_vcs.design_single = 4
GROUP_QUOTA_group_vcs.design_list = 1
GROUP_QUOTA_group_vcs.verification_single = 5
GROUP_QUOTA_group_vcs.verification_list  = 3

GROUP_AUTOREGROUP = FALSE

GROUP_ACCEPT_SURPLUS = FALSE

GROUP_AUTOREGROUP_group_vcs = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs = FALSE
GROUP_AUTOREGROUP_group_vcs.design_single = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.design_single = TRUE
GROUP_AUTOREGROUP_group_vcs.design_list = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.design_list = TRUE
GROUP_AUTOREGROUP_group_vcs.verification_single = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_single = TRUE
GROUP_AUTOREGROUP_group_vcs.verification_list  = FALSE
GROUP_ACCEPT_SURPLUS_group_vcs.verification_list  = TRUE

GROUP_PRIO_FACTOR_GROUP_VCS = 1.0
GROUP_PRIO_FACTOR_GROUP_VCS.DESIGN_SINGLE = 1.0
GROUP_PRIO_FACTOR_GROUP_VCS.DESIGN_LIST = 1.0
GROUP_PRIO_FACTOR_GROUP_VCS.VERIFICATION_SINGLE = 1.0
GROUP_PRIO_FACTOR_GROUP_VCS.VERIFICATION_LIST = 1.0


SUBMIT_EXPRS = AccountingGroup
AccountingGroup = strcat(GroupName, ".", ifThenElse(regexps("(.*)\.(.*)", Owner, "\1_\2") =!= "", regexps("(.*)\.(.*)", Owner, "\1_\2"), Owner))

NEGOTIATOR_CONSIDER_PREEMPTION=True
PREEMPTION_REQUIREMENTS = ((SubmitterGroupQuota =!= UNDEFINED && (SubmitterGroupResourcesInUse < SubmitterGroupQuota)) &&  ((RemoteGroupQuota=?=UNDEFINED) || (RemoteGroupResourcesInUse > RemoteGroupQuota)))

SCHEDD_DEBUG = D_FULLDEBUG

NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_DEBUG = D_FULLDEBUG
NEGOTIATOR_USE_SLOT_WEIGHTS = FALSE
NEGOTIATOR_INTERVAL = 30

SCHEDD_INTERVAL = 15
CLAIM_WORKLIFE = 0

NUM_CPUS = 10

HFS_ROUND_ROBIN_RATE = 100000000
HFS_MAX_ALLOCATION_ROUNDS = 1