
Re: [HTCondor-users] Long benchmark in parallel with jobs?



Thanks!

I actually thought about appending to the START expression so that whatever admins define in other configuration files stays there. I'm already using a separate config.d file for that, so it's perfect.

Regards,
Samir

On Thu, Mar 19, 2015 at 12:50 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 3/18/2015 5:48 PM, Samir Cury wrote:
Hi,

I was following this part of the documentation :

http://research.cs.wisc.edu/htcondor/manual/v7.9/4_4Hooks.html

That does what I need. I can actually get my ClassAds in place :

[root@compute-10-16 ~]# condor_status -l $HOSTNAME | grep MC
CMSTimePerEventMC = 46.8265

However, I observed in the logs [1], that even though the benchmark
starts at 15:22, actual jobs start at 15:23 and the benchmark only
finished at 15:26 - which doesn't surprise me as I know that the
benchmark is long, about 5 minutes.

My problem is -- if user jobs start in parallel with benchmarks, it
could very well change the result if jobs and benchmarks are sharing CPU
resources. Is this what is happening? Any way to avoid it?


Perhaps just append in your condor_config (ideally around the same place you define your benchmark) something like

 # Don't allow jobs to start until CMSMC benchmark is done
 START = ( $(START) ) && CMSTimePerEventMC =!= UNDEFINED
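
One way to confirm the gate behaves as intended is to check whether the slot ad already carries the benchmark attribute, since the appended clause stays false until it does. The following is only a minimal sketch: the helper names parse_classad_long and benchmark_published are hypothetical, and it assumes the plain "Name = value" line format shown in the condor_status -l output above.

```python
def parse_classad_long(text):
    """Parse 'Name = value' lines (as printed by condor_status -l) into a dict.

    Keys are lowercased because ClassAd attribute names are case-insensitive.
    Values are kept as raw strings; this is not a full ClassAd parser.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip().lower()] = value.strip()
    return ad


def benchmark_published(ad, attr="CMSTimePerEventMC"):
    """Approximate `attr =!= UNDEFINED`: true only once the attribute is set."""
    value = ad.get(attr.lower())
    return value is not None and value.lower() != "undefined"


# Sample input copied from the condor_status output quoted in this thread.
sample = "CMSTimePerEventMC = 46.8265\n"
ad = parse_classad_long(sample)
print(benchmark_published(ad))
```

To use it on a live node, feed it the text produced by condor_status -l $HOSTNAME. While the benchmark is still running the attribute is absent, so the function returns False, matching the period in which START should refuse jobs.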

regards
Todd


Thanks,
Samir


[1] :

03/18/15 15:22:32 CronJob: Initializing job 'mips'
(/usr/libexec/condor/condor_mips)
03/18/15 15:22:32 CronJob: Initializing job 'kflops'
(/usr/libexec/condor/condor_kflops)
03/18/15 15:22:32 CronJob: Initializing job 'cmsmc'
(/tmp/CMSSW-benchmarks-master/MC/run.sh)
03/18/15 15:22:32 slot1: State change: IS_OWNER is false
03/18/15 15:22:32 slot1: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot1: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 BenchMgr:StartBenchmarks()
03/18/15 15:22:32 slot2: State change: IS_OWNER is false
03/18/15 15:22:32 slot2: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot2: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot2: Changing activity: Benchmarking -> Idle
03/18/15 15:22:32 slot3: State change: IS_OWNER is false
03/18/15 15:22:32 slot3: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot3: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot3: Changing activity: Benchmarking -> Idle
03/18/15 15:22:32 slot4: State change: IS_OWNER is false
03/18/15 15:22:32 slot4: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot4: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot4: Changing activity: Benchmarking -> Idle
03/18/15 15:23:17 slot1: Request accepted.
03/18/15 15:23:17 slot1: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot1: State change: claiming protocol successful
03/18/15 15:23:17 slot1: Changing state and activity:
Unclaimed/Benchmarking -> Claimed/Idle
03/18/15 15:23:17 slot2: Request accepted.
03/18/15 15:23:17 slot2: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot2: State change: claiming protocol successful
03/18/15 15:23:17 slot2: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot3: Request accepted.
03/18/15 15:23:17 slot3: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot3: State change: claiming protocol successful
03/18/15 15:23:17 slot3: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot4: Request accepted.
03/18/15 15:23:17 slot4: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot4: State change: claiming protocol successful
03/18/15 15:23:17 slot4: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot1: Got activate_claim request from shadow
(10.3.10.128)
03/18/15 15:23:17 slot1: Remote job ID is 1862290.0
03/18/15 15:23:17 slot1: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot1: State change: claim-activation protocol successful
03/18/15 15:23:17 slot1: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot2: match_info called
03/18/15 15:23:17 slot3: match_info called
03/18/15 15:23:17 slot2: Got activate_claim request from shadow
(10.3.10.128)
03/18/15 15:23:17 slot2: Remote job ID is 1862291.0
03/18/15 15:23:17 slot2: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot2: State change: claim-activation protocol successful
03/18/15 15:23:17 slot2: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot4: match_info called
03/18/15 15:23:17 slot1: match_info called
03/18/15 15:23:17 slot3: Got activate_claim request from shadow
(10.3.10.128)
03/18/15 15:23:17 slot3: Remote job ID is 1862293.0
03/18/15 15:23:17 slot3: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot3: State change: claim-activation protocol successful
03/18/15 15:23:17 slot3: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot4: Got activate_claim request from shadow
(10.3.10.128)
03/18/15 15:23:17 slot4: Remote job ID is 1862296.0
03/18/15 15:23:17 slot4: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot4: State change: claim-activation protocol successful
03/18/15 15:23:17 slot4: Changing activity: Idle -> Busy
03/18/15 15:26:40 State change: benchmarks completed



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685



