
Re: [Condor-users] Group accounting quota not working?



Thanks, Jason.

I tried using
+AccountingGroup="group_fxp@xxxxxxxx"

"bbnet.ad" is my UID_DOMAIN. When I do that, no jobs start at all. So I
tried again with no domain. Both submitted jobs start running, and
condor_userprio -all returns:

C:\Condor>condor_userprio -all
Last Priority Update:  7/11 18:22
                                    Effective   Real     Priority   Res   Accumulated       Usage            Last
User Name                           Priority  Priority    Factor    Used  Usage (hrs)    Start Time       Usage Time
------------------------------      --------- -------- ------------ ----  -----------  ---------------- ----------------
group_fxp@xxxxxxxx                       5.10     0.51        10.00    2         0.83  7/11/2007 17:08  7/11/2007 18:22
------------------------------      --------- -------- ------------ ----  -----------  ---------------- ----------------
Number of users: 1                                                     2         0.83  7/11/2007 17:08  7/10/2007 18:24


It seems from the log snippets below that the negotiator starts job
25158.0 and blocks the second one (job 25159.0). However, the SchedLog
shows that it fires up the second job anyway. What am I missing here?
Any suggestions?
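
For reference, the relevant part of my submit file is essentially the
sketch below (not my exact file: the vanilla universe, the executable
and output names, and the queue count are placeholders; the
AccountingGroup lines are the part in question):

# sketch of the test submit file
universe   = vanilla
executable = mytest.exe
output     = mytest.out
log        = mytest.log
# with the UID_DOMAIN appended (no jobs start at all):
# +AccountingGroup = "group_fxp@bbnet.ad"
# without the domain (both jobs start; quota ignored):
+AccountingGroup = "group_fxp"
queue 2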


***** NegotiatorLog

7/11 18:17:23 ---------- Started Negotiation Cycle ----------
7/11 18:17:23 Phase 1:  Obtaining ads from collector ...
7/11 18:17:23   Getting all public ads ...
7/11 18:17:23 Trying to query collector <172.25.4.162:9618>
7/11 18:17:23   Sorting 6 ads ...
7/11 18:17:23   Getting startd private ads ...
7/11 18:17:23 Trying to query collector <172.25.4.162:9618>
7/11 18:17:23 Got ads: 6 public and 2 private
7/11 18:17:23 Public ads include 1 submitter, 2 startd
7/11 18:17:23 Entering compute_signficant_attrs()
7/11 18:17:23 Leaving compute_signficant_attrs() - result=JobUniverse,LastCheckpointPlatform,NumCkpts
7/11 18:17:23 Phase 2:  Performing accounting ...
7/11 18:17:23 group group_fxp static quota = 1
7/11 18:17:23 Group Table : group group_fxp quota 1 usage 0 prio 0.00
7/11 18:17:23 Group group_fxp - skipping, no submitters
7/11 18:17:23 Group *none* - negotiating
7/11 18:17:23 Phase 3:  Sorting submitter ads by priority ...
7/11 18:17:23 Phase 4.1:  Negotiating with schedds ...
7/11 18:17:23     NumStartdAds = 2
7/11 18:17:23     NormalFactor = 1.000000
7/11 18:17:23     MaxPrioValue = 5.060980
7/11 18:17:23     NumScheddAds = 1
7/11 18:17:23   Negotiating with group_fxp@xxxxxxxx at <172.25.4.162:4119>
7/11 18:17:23 0 seconds so far
7/11 18:17:23   Calculating schedd limit with the following parameters
7/11 18:17:23     ScheddPrio       = 5.060980
7/11 18:17:23     ScheddPrioFactor = 10.000000
7/11 18:17:23     scheddShare      = 1.000000
7/11 18:17:23     scheddAbsShare   = 1.000000
7/11 18:17:23     ScheddUsage      = 0
7/11 18:17:23     scheddLimit      = 2
7/11 18:17:23     MaxscheddLimit   = 2
7/11 18:17:23 Socket to <172.25.4.162:4119> already in cache, reusing
7/11 18:17:23     Sending SEND_JOB_INFO/eom
7/11 18:17:23     Getting reply from schedd ...
7/11 18:17:23     Got JOB_INFO command; getting classad/eom
7/11 18:17:23     Request 25158.00000:
7/11 18:17:23 Start of sorting MatchList (len=2)
7/11 18:17:23 Finished sorting MatchList
7/11 18:17:23       Connecting to startd vm2@xxxxxxxxxxxxxxxx at <172.25.4.162:4120>
7/11 18:17:23       Sending MATCH_INFO/capability to vm2@xxxxxxxxxxxxxxxx
7/11 18:17:23       (Capability is "<172.25.4.162:4120>#1184202015#5" )
7/11 18:17:23       Sending PERMISSION, capability, startdAd to schedd
7/11 18:17:23       Matched 25158.0 group_fxp@xxxxxxxx <172.25.4.162:4119> preempting none <172.25.4.162:4120> vm2@xxxxxxxxxxxxxxxx
7/11 18:17:23       Notifying the accountant
7/11 18:17:23       Successfully matched with vm2@xxxxxxxxxxxxxxxx
7/11 18:17:23     Sending SEND_JOB_INFO/eom
7/11 18:17:23     Getting reply from schedd ...
7/11 18:17:23     Got NO_MORE_JOBS;  done negotiating
7/11 18:17:23   Schedd group_fxp@xxxxxxxx got all it wants; removing it.
7/11 18:17:23 ---------- Finished Negotiation Cycle ----------
7/11 18:17:30 New cycle requested but just finished one -- delaying 13 secs
7/11 18:17:37 Getting state information from the accountant
7/11 18:17:43 ---------- Started Negotiation Cycle ----------
7/11 18:17:43 Phase 1:  Obtaining ads from collector ...
7/11 18:17:43   Getting all public ads ...
7/11 18:17:43 Trying to query collector <172.25.4.162:9618>
7/11 18:17:43   Sorting 6 ads ...
7/11 18:17:43   Getting startd private ads ...
7/11 18:17:43 Trying to query collector <172.25.4.162:9618>
7/11 18:17:43 Got ads: 6 public and 2 private
7/11 18:17:43 Public ads include 1 submitter, 2 startd
7/11 18:17:43 Entering compute_signficant_attrs()
7/11 18:17:43 Leaving compute_signficant_attrs() - result=JobUniverse,LastCheckpointPlatform,NumCkpts
7/11 18:17:43 Phase 2:  Performing accounting ...
7/11 18:17:43 Trimmed out 1 startd ads not Unclaimed
7/11 18:17:43 group group_fxp static quota = 1
7/11 18:17:43 Group Table : group group_fxp quota 1 usage 0 prio 0.00
7/11 18:17:43 Group group_fxp - skipping, no submitters
7/11 18:17:43 Group *none* - negotiating
7/11 18:17:43 Phase 3:  Sorting submitter ads by priority ...
7/11 18:17:43 Phase 4.1:  Negotiating with schedds ...
7/11 18:17:43     NumStartdAds = 2
7/11 18:17:43     NormalFactor = 1.000000
7/11 18:17:43     MaxPrioValue = 5.061770
7/11 18:17:43     NumScheddAds = 1
7/11 18:17:43   Negotiating with group_fxp@xxxxxxxx at <172.25.4.162:4119>
7/11 18:17:43 0 seconds so far
7/11 18:17:43   Calculating schedd limit with the following parameters
7/11 18:17:43     ScheddPrio       = 5.061770
7/11 18:17:43     ScheddPrioFactor = 10.000000
7/11 18:17:43     scheddShare      = 1.000000
7/11 18:17:43     scheddAbsShare   = 1.000000
7/11 18:17:43     ScheddUsage      = 1
7/11 18:17:43     scheddLimit      = 1
7/11 18:17:43     MaxscheddLimit   = 1
7/11 18:17:43 Socket to <172.25.4.162:4119> already in cache, reusing
7/11 18:17:43     Sending SEND_JOB_INFO/eom
7/11 18:17:43     Getting reply from schedd ...
7/11 18:17:43     Got JOB_INFO command; getting classad/eom
7/11 18:17:43     Request 25159.00000:
7/11 18:17:43       Connecting to startd vm1@xxxxxxxxxxxxxxxx at <172.25.4.162:4120>
7/11 18:17:43       Sending MATCH_INFO/capability to vm1@xxxxxxxxxxxxxxxx
7/11 18:17:43       (Capability is "<172.25.4.162:4120>#1184202015#6" )
7/11 18:17:43       Sending PERMISSION, capability, startdAd to schedd
7/11 18:17:43       Matched 25159.0 group_fxp@xxxxxxxx <172.25.4.162:4119> preempting none <172.25.4.162:4120> vm1@xxxxxxxxxxxxxxxx
7/11 18:17:43       Notifying the accountant
7/11 18:17:43       Successfully matched with vm1@xxxxxxxxxxxxxxxx
7/11 18:17:43     Reached submitter resource limit: 1 ... stopping
7/11 18:17:43   This schedd hit its scheddlimit.
7/11 18:17:43 ---------- Finished Negotiation Cycle ----------


***** SchedLog

7/11 18:17:23 (pid:3664) Entered negotiate
7/11 18:17:23 (pid:3664) *** SwapSpace = 1257744
7/11 18:17:23 (pid:3664) *** ReservedSwap = 5120
7/11 18:17:23 (pid:3664) *** Shadow Size Estimate = 1800
7/11 18:17:23 (pid:3664) *** Start Limit For Swap = 695
7/11 18:17:23 (pid:3664) *** Current num of active shadows = 0
7/11 18:17:23 (pid:3664) Negotiating for owner: group_fxp@xxxxxxxx
7/11 18:17:23 (pid:3664) AutoCluster:config(JobUniverse,LastCheckpointPlatform,NumCkpts) invoked
7/11 18:17:23 (pid:3664) removing auto cluster id 1
7/11 18:17:23 (pid:3664) Checking consistency running and runnable jobs
7/11 18:17:23 (pid:3664) Tables are consistent
7/11 18:17:23 (pid:3664) Rebuilt prioritized runnable job list in 0.000s.
7/11 18:17:23 (pid:3664) Sent job 25158.0 (autocluster=0)
7/11 18:17:23 (pid:3664) In case PERMISSION
7/11 18:17:23 (pid:3664) Enqueued contactStartd startd=<172.25.4.162:4120>
7/11 18:17:23 (pid:3664) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
7/11 18:17:23 (pid:3664) Return from Handler <<Negotiator Command>>
7/11 18:17:23 (pid:3664) Calling Timer handler 92 (checkContactQueue)
7/11 18:17:23 (pid:3664) In checkContactQueue(), args = 00E16290, host=<172.25.4.162:4120>
7/11 18:17:23 (pid:3664) In Scheduler::contactStartd()
7/11 18:17:23 (pid:3664) <172.25.4.162:4120>#1184202015#5 group_fxp@xxxxxxxx <172.25.4.162:4120> 25158.0
7/11 18:17:23 (pid:3664) Return from Timer handler 92 (checkContactQueue)
7/11 18:17:23 (pid:3664) Calling Handler <to startd <172.25.4.162:4120>>
7/11 18:17:23 (pid:3664) In Scheduler::startdContactConnectHandler
7/11 18:17:23 (pid:3664) Got mrec data pointer 00E08990
7/11 18:17:23 (pid:3664) Registered startd contact socket.
7/11 18:17:23 (pid:3664) Return from Handler <to startd <172.25.4.162:4120>>
7/11 18:17:23 (pid:3664) Calling Handler <to startd <172.25.4.162:4120>>
7/11 18:17:23 (pid:3664) In Scheduler::startdContactSockHandler
7/11 18:17:23 (pid:3664) Got mrec data pointer 00E08990
7/11 18:17:23 (pid:3664) Timer set...
7/11 18:17:23 (pid:3664) Return from Handler <to startd <172.25.4.162:4120>>
7/11 18:17:24 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:24 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4467>
7/11 18:17:24 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:24 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:24 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4467>
7/11 18:17:24 (pid:3664) IO: EOF reading packet header
7/11 18:17:24 (pid:3664) QMGR Connection closed
7/11 18:17:24 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:24 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:25 (pid:3664) Calling Timer handler 7 (StartJobs)
7/11 18:17:25 (pid:3664) -------- Begin starting jobs --------
7/11 18:17:25 (pid:3664) Job 25158.0: is runnable
7/11 18:17:25 (pid:3664) Scheduler::start_std - job=25158.0 on <172.25.4.162:4120>
7/11 18:17:25 (pid:3664) Queueing job 25158.0 in runnable job queue
7/11 18:17:25 (pid:3664) start next job after 2 sec, JobsThisBurst 0
7/11 18:17:25 (pid:3664) Match (<172.25.4.162:4120>#1184202015#5) - running 25158.0
7/11 18:17:25 (pid:3664) -------- Done starting jobs --------
7/11 18:17:25 (pid:3664) Return from Timer handler 7 (StartJobs)
7/11 18:17:27 (pid:3664) Calling Timer handler 94 (StartJobHandler)
7/11 18:17:27 (pid:3664) Job prep for 25158.0 will not block, calling aboutToSpawnJobHandler() directly
7/11 18:17:27 (pid:3664) perm::init() starting up for account (lanr) domain (BBNET)
7/11 18:17:27 (pid:3664) perm::init: Found Account Name lanr
7/11 18:17:27 (pid:3664) perm::init() starting up for account (lanr) domain (BBNET)
7/11 18:17:27 (pid:3664) perm::init: Found Account Name lanr
7/11 18:17:27 (pid:3664) aboutToSpawnJobHandler() completed for job 25158.0, attempting to spawn job handler
7/11 18:17:27 (pid:3664) Starting add_shadow_birthdate(25158.0)
7/11 18:17:27 (pid:3664) GetBinaryType() returned 0
7/11 18:17:27 (pid:3664) Added shadow record for PID 3252, job (25158.0)
7/11 18:17:27 (pid:3664)
7/11 18:17:27 (pid:3664) ..................
7/11 18:17:27 (pid:3664) .. Shadow Recs (1/1)
7/11 18:17:27 (pid:3664) .. 3252, 25158.0, F, <172.25.4.162:4120>, cur_hosts=1, status=2
7/11 18:17:27 (pid:3664) ..................

7/11 18:17:27 (pid:3664) Started shadow for job 25158.0 on "<172.25.4.162:4120>", (shadow pid = 3252)
7/11 18:17:27 (pid:3664) -------- Begin starting jobs --------
7/11 18:17:27 (pid:3664) match (<172.25.4.162:4120>#1184202015#5) already running a job
7/11 18:17:27 (pid:3664) -------- Done starting jobs --------
7/11 18:17:27 (pid:3664) Return from Timer handler 94 (StartJobHandler)
7/11 18:17:28 (pid:3664) Calling Timer handler 6 (timeout)
7/11 18:17:28 (pid:3664) JobsRunning = 1
7/11 18:17:28 (pid:3664) JobsIdle = 0
7/11 18:17:28 (pid:3664) JobsHeld = 0
7/11 18:17:28 (pid:3664) JobsRemoved = 0
7/11 18:17:28 (pid:3664) LocalUniverseJobsRunning = 0
7/11 18:17:28 (pid:3664) LocalUniverseJobsIdle = 0
7/11 18:17:28 (pid:3664) SchedUniverseJobsRunning = 0
7/11 18:17:28 (pid:3664) SchedUniverseJobsIdle = 0
7/11 18:17:28 (pid:3664) N_Owners = 1
7/11 18:17:28 (pid:3664) MaxJobsRunning = 200
7/11 18:17:28 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:28 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:28 (pid:3664) Sent HEART BEAT ad to 1 collectors. Number of submittors=1
7/11 18:17:28 (pid:3664) Changed attribute: RunningJobs = 1
7/11 18:17:28 (pid:3664) Changed attribute: IdleJobs = 0
7/11 18:17:28 (pid:3664) Changed attribute: HeldJobs = 0
7/11 18:17:28 (pid:3664) Changed attribute: FlockedJobs = 0
7/11 18:17:28 (pid:3664) Changed attribute: Name = "group_fxp@xxxxxxxx"
7/11 18:17:28 (pid:3664) Sent ad to central manager for group_fxp@xxxxxxxx
7/11 18:17:28 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:28 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:28 (pid:3664) Sent ad to 1 collectors for group_fxp@xxxxxxxx
7/11 18:17:28 (pid:3664) ============ Begin clean_shadow_recs =============
7/11 18:17:28 (pid:3664) ============ End clean_shadow_recs =============
7/11 18:17:28 (pid:3664) Return from Timer handler 6 (timeout)
7/11 18:17:30 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:30 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4477>
7/11 18:17:30 (pid:3664) DaemonCore: received command 479 (STORE_CRED), calling handler (cred_access_handler)
7/11 18:17:30 (pid:3664) Calling HandleReq <cred_access_handler> (0)
7/11 18:17:30 (pid:3664) Checking for lanr@BBNET in credential storage.
7/11 18:17:30 (pid:3664) Succeeded to log in lanr@BBNET
7/11 18:17:30 (pid:3664) Switching back to old priv state.
7/11 18:17:30 (pid:3664) Return from HandleReq <cred_access_handler>
7/11 18:17:30 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:30 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:30 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4478>
7/11 18:17:30 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:30 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:30 (pid:3664) sspi_server_auth() entered
7/11 18:17:30 (pid:3664) sspi_server_auth() looping
7/11 18:17:30 (pid:3664) sspi_server_auth(): user name is: "lanr"
7/11 18:17:30 (pid:3664) sspi_server_auth(): domain name is: "BBNET"
7/11 18:17:30 (pid:3664) sspi_server_auth() exiting
7/11 18:17:30 (pid:3664) ZKM: setting default map to lanr@bbnet
7/11 18:17:30 (pid:3664) OwnerCheck retval 1 (success),no ad
7/11 18:17:30 (pid:3664) OwnerCheck retval 1 (success),no ad
7/11 18:17:30 (pid:3664) Prioritized runnable job list will be rebuilt, because ClassAd attribute JobStatus=1 changed
7/11 18:17:30 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4478>
7/11 18:17:30 (pid:3664) IO: EOF reading packet header
7/11 18:17:30 (pid:3664) QMGR Connection closed
7/11 18:17:30 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:30 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:30 (pid:3664) DaemonCore: Command received via UDP from host <172.25.4.162:4481>
7/11 18:17:30 (pid:3664) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
7/11 18:17:30 (pid:3664) Calling HandleReq <reschedule_negotiator> (0)
7/11 18:17:30 (pid:3664) Called reschedule_negotiator()
7/11 18:17:30 (pid:3664) Sending RESCHEDULE command to negotiator(s)
7/11 18:17:30 (pid:3664) Will use UDP to update collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:30 (pid:3664) Trying to query collector <172.25.4.162:9618>
7/11 18:17:30 (pid:3664) Return from HandleReq <reschedule_negotiator>
7/11 18:17:31 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:31 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4484>
7/11 18:17:31 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:31 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:31 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4484>
7/11 18:17:31 (pid:3664) IO: EOF reading packet header
7/11 18:17:31 (pid:3664) QMGR Connection closed
7/11 18:17:31 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:31 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:32 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:32 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:32 (pid:3664) DaemonCore: Command received via UDP from host <172.25.4.162:4486>
7/11 18:17:32 (pid:3664) DaemonCore: received command 60008 (DC_CHILDALIVE), calling handler (HandleChildAliveCommand)
7/11 18:17:32 (pid:3664) Calling HandleReq <HandleChildAliveCommand> (0)
7/11 18:17:32 (pid:3664) Return from HandleReq <HandleChildAliveCommand>
7/11 18:17:32 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:32 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4492>
7/11 18:17:32 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:32 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:32 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4492>
7/11 18:17:32 (pid:3664) IO: EOF reading packet header
7/11 18:17:32 (pid:3664) QMGR Connection closed
7/11 18:17:32 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:32 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:33 (pid:3664) Calling Timer handler 6 (timeout)
7/11 18:17:33 (pid:3664) JobsRunning = 1
7/11 18:17:33 (pid:3664) JobsIdle = 1
7/11 18:17:33 (pid:3664) JobsHeld = 0
7/11 18:17:33 (pid:3664) JobsRemoved = 0
7/11 18:17:33 (pid:3664) LocalUniverseJobsRunning = 0
7/11 18:17:33 (pid:3664) LocalUniverseJobsIdle = 0
7/11 18:17:33 (pid:3664) SchedUniverseJobsRunning = 0
7/11 18:17:33 (pid:3664) SchedUniverseJobsIdle = 0
7/11 18:17:33 (pid:3664) N_Owners = 1
7/11 18:17:33 (pid:3664) MaxJobsRunning = 200
7/11 18:17:33 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:33 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:33 (pid:3664) Sent HEART BEAT ad to 1 collectors. Number of submittors=1
7/11 18:17:33 (pid:3664) Changed attribute: RunningJobs = 1
7/11 18:17:33 (pid:3664) Changed attribute: IdleJobs = 1
7/11 18:17:33 (pid:3664) Changed attribute: HeldJobs = 0
7/11 18:17:33 (pid:3664) Changed attribute: FlockedJobs = 0
7/11 18:17:33 (pid:3664) Changed attribute: Name = "group_fxp@xxxxxxxx"
7/11 18:17:33 (pid:3664) Sent ad to central manager for group_fxp@xxxxxxxx
7/11 18:17:33 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:33 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:33 (pid:3664) Sent ad to 1 collectors for group_fxp@xxxxxxxx
7/11 18:17:33 (pid:3664) ============ Begin clean_shadow_recs =============
7/11 18:17:33 (pid:3664) ============ End clean_shadow_recs =============
7/11 18:17:33 (pid:3664) Return from Timer handler 6 (timeout)
7/11 18:17:34 (pid:3664) Calling Timer handler 7 (StartJobs)
7/11 18:17:34 (pid:3664) -------- Begin starting jobs --------
7/11 18:17:34 (pid:3664) match (<172.25.4.162:4120>#1184202015#5) already running a job
7/11 18:17:34 (pid:3664) -------- Done starting jobs --------
7/11 18:17:34 (pid:3664) Return from Timer handler 7 (StartJobs)
7/11 18:17:41 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:41 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4501>
7/11 18:17:41 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:41 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:41 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4501>
7/11 18:17:41 (pid:3664) IO: EOF reading packet header
7/11 18:17:41 (pid:3664) QMGR Connection closed
7/11 18:17:41 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:41 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:43 (pid:3664) Calling Handler <<Negotiator Command>>
7/11 18:17:43 (pid:3664) Activity on stashed negotiator socket
7/11 18:17:43 (pid:3664)
7/11 18:17:43 (pid:3664) Entered negotiate
7/11 18:17:43 (pid:3664) *** SwapSpace = 1187256
7/11 18:17:43 (pid:3664) *** ReservedSwap = 5120
7/11 18:17:43 (pid:3664) *** Shadow Size Estimate = 1800
7/11 18:17:43 (pid:3664) *** Start Limit For Swap = 656
7/11 18:17:43 (pid:3664) *** Current num of active shadows = 1
7/11 18:17:43 (pid:3664) Negotiating for owner: group_fxp@xxxxxxxx
7/11 18:17:43 (pid:3664) AutoCluster:config(JobUniverse,LastCheckpointPlatform,NumCkpts) invoked
7/11 18:17:43 (pid:3664) removing auto cluster id 1
7/11 18:17:43 (pid:3664) Checking consistency running and runnable jobs
7/11 18:17:43 (pid:3664) Tables are consistent
7/11 18:17:43 (pid:3664) Rebuilt prioritized runnable job list in 0.000s.
7/11 18:17:43 (pid:3664) Sent job 25159.0 (autocluster=0)
7/11 18:17:43 (pid:3664) In case PERMISSION
7/11 18:17:43 (pid:3664) Enqueued contactStartd startd=<172.25.4.162:4120>
7/11 18:17:43 (pid:3664) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
7/11 18:17:43 (pid:3664) JobsRunning = 1
7/11 18:17:43 (pid:3664) JobsIdle = 1
7/11 18:17:43 (pid:3664) JobsHeld = 0
7/11 18:17:43 (pid:3664) JobsRemoved = 0
7/11 18:17:43 (pid:3664) LocalUniverseJobsRunning = 0
7/11 18:17:43 (pid:3664) LocalUniverseJobsIdle = 0
7/11 18:17:43 (pid:3664) SchedUniverseJobsRunning = 0
7/11 18:17:43 (pid:3664) SchedUniverseJobsIdle = 0
7/11 18:17:43 (pid:3664) N_Owners = 1
7/11 18:17:43 (pid:3664) MaxJobsRunning = 200
7/11 18:17:43 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:43 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:43 (pid:3664) Sent HEART BEAT ad to 1 collectors. Number of submittors=1
7/11 18:17:43 (pid:3664) Changed attribute: RunningJobs = 1
7/11 18:17:43 (pid:3664) Changed attribute: IdleJobs = 1
7/11 18:17:43 (pid:3664) Changed attribute: HeldJobs = 0
7/11 18:17:43 (pid:3664) Changed attribute: FlockedJobs = 1
7/11 18:17:43 (pid:3664) Changed attribute: Name = "group_fxp@xxxxxxxx"
7/11 18:17:43 (pid:3664) Sent ad to central manager for group_fxp@xxxxxxxx
7/11 18:17:43 (pid:3664) Trying to update collector <172.25.4.162:9618>
7/11 18:17:43 (pid:3664) Attempting to send update via UDP to collector LANR-XP.bbnet.ad <172.25.4.162:9618>
7/11 18:17:43 (pid:3664) Sent ad to 1 collectors for group_fxp@xxxxxxxx
7/11 18:17:43 (pid:3664) ============ Begin clean_shadow_recs =============
7/11 18:17:43 (pid:3664) ============ End clean_shadow_recs =============
7/11 18:17:43 (pid:3664) Return from Handler <<Negotiator Command>>
7/11 18:17:43 (pid:3664) Calling Timer handler 102 (checkContactQueue)
7/11 18:17:43 (pid:3664) In checkContactQueue(), args = 00E26980, host=<172.25.4.162:4120>
7/11 18:17:43 (pid:3664) In Scheduler::contactStartd()
7/11 18:17:43 (pid:3664) <172.25.4.162:4120>#1184202015#6 group_fxp@xxxxxxxx <172.25.4.162:4120> 25159.0
7/11 18:17:43 (pid:3664) Return from Timer handler 102 (checkContactQueue)
7/11 18:17:43 (pid:3664) Calling Handler <to startd <172.25.4.162:4120>>
7/11 18:17:43 (pid:3664) In Scheduler::startdContactConnectHandler
7/11 18:17:43 (pid:3664) Got mrec data pointer 00E09120
7/11 18:17:43 (pid:3664) Registered startd contact socket.
7/11 18:17:43 (pid:3664) Return from Handler <to startd <172.25.4.162:4120>>
7/11 18:17:43 (pid:3664) Calling Handler <to startd <172.25.4.162:4120>>
7/11 18:17:43 (pid:3664) In Scheduler::startdContactSockHandler
7/11 18:17:43 (pid:3664) Got mrec data pointer 00E09120
7/11 18:17:43 (pid:3664) Timer set...
7/11 18:17:43 (pid:3664) Return from Handler <to startd <172.25.4.162:4120>>
7/11 18:17:44 (pid:3664) Calling Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:44 (pid:3664) DaemonCore: Command received via TCP from host <172.25.4.162:4512>
7/11 18:17:44 (pid:3664) DaemonCore: received command 1111 (QMGMT_CMD), calling handler (handle_q)
7/11 18:17:44 (pid:3664) Calling HandleReq <handle_q> (0)
7/11 18:17:44 (pid:3664) condor_read(): Socket closed when trying to read 5 bytes from <172.25.4.162:4512>
7/11 18:17:44 (pid:3664) IO: EOF reading packet header
7/11 18:17:44 (pid:3664) QMGR Connection closed
7/11 18:17:44 (pid:3664) Return from HandleReq <handle_q>
7/11 18:17:44 (pid:3664) Return from Handler <DaemonCore::HandleReqSocketHandler>
7/11 18:17:45 (pid:3664) Calling Timer handler 7 (StartJobs)
7/11 18:17:45 (pid:3664) -------- Begin starting jobs --------
7/11 18:17:45 (pid:3664) match (<172.25.4.162:4120>#1184202015#5) already running a job
7/11 18:17:45 (pid:3664) Job 25159.0: is runnable
7/11 18:17:45 (pid:3664) Scheduler::start_std - job=25159.0 on <172.25.4.162:4120>
7/11 18:17:45 (pid:3664) Queueing job 25159.0 in runnable job queue
7/11 18:17:45 (pid:3664) start next job after 2 sec, JobsThisBurst 0
7/11 18:17:45 (pid:3664) Match (<172.25.4.162:4120>#1184202015#6) - running 25159.0
7/11 18:17:45 (pid:3664) -------- Done starting jobs --------


Thanks
Rick

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jason Stowe
Sent: Wednesday, July 11, 2007 5:51 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Group accounting quota not working?

Rick,
I believe your accounting group should be
+AccountingGroup="group_fxp@xxxxxxxxxx"

where domain.com is the UID_DOMAIN. If that doesn't work, sending out
the results of condor_userprio -all when the jobs are running will help
determine the underlying issue.
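
It is also worth confirming that the machine running the negotiator
actually has the group settings in its config. A quick sanity check
along these lines (GROUP_QUOTA_group_fxp is just the macro name from
your condor_config; adjust to your group name):

C:\Condor>condor_config_val GROUP_NAMES
C:\Condor>condor_config_val GROUP_QUOTA_group_fxp

And remember to run condor_reconfig after any condor_config change so
the negotiator picks the new values up.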

Best of luck,
Jason


On 7/10/07, Rick Lan <Rick.Lan@xxxxxxxxxxxx> wrote:
> Hello
>
> I have been testing group accounting on a dual-core laptop by setting 
> the group quota to 1 and auto regroup to false, but doesn't seem to 
> work. Both cores would serve jobs when there are more than one jobs. 
> Is there a misconfiguration? Any help would be appreciated. I'm 
> running Condor 6.9.2 on Windows XP. Attached are (1)submit file and 
> (2)condor_config.
>
> In submit file:
> +AccountingGroup = "group_fxp"
>
> In condor_config:
> GROUP_NAMES = group_fxp
> GROUP_QUOTA_group_fxp = 1
> GROUP_PRIO_FACTOR_group_fxp = 2.0
> GROUP_AUTOREGROUP = FALSE
>
>
>
>
> Thanks
> Rick
>


-- 

===================================
Jason A. Stowe

Phone: 607.227.9686
jstowe@xxxxxxxxxxxxxxxxxx

Cycle Computing, LLC
http://www.cyclecomputing.com
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at: 
https://lists.cs.wisc.edu/archive/condor-users/