
[HTCondor-users] Long benchmark in parallel with jobs?



Hi,

I was following this part of the documentation :

http://research.cs.wisc.edu/htcondor/manual/v7.9/4_4Hooks.html

That does what I need. I can actually get my ClassAds in place:

[root@compute-10-16 ~]# condor_status -l $HOSTNAME | grep MC
CMSTimePerEventMC = 46.8265

However, I observed in the logs [1] that even though the benchmark starts at 15:22, actual jobs start at 15:23 and the benchmark only finishes at 15:26, which doesn't surprise me as I know that the benchmark is long, about 5 minutes.

My problem is -- if user jobs start in parallel with benchmarks, it could very well change the result if jobs and benchmarks are sharing CPU resources. Is this what is happening? Any way to avoid it?

Thanks,
Samir


[1] :

03/18/15 15:22:32 CronJob: Initializing job 'mips' (/usr/libexec/condor/condor_mips)
03/18/15 15:22:32 CronJob: Initializing job 'kflops' (/usr/libexec/condor/condor_kflops)
03/18/15 15:22:32 CronJob: Initializing job 'cmsmc' (/tmp/CMSSW-benchmarks-master/MC/run.sh)
03/18/15 15:22:32 slot1: State change: IS_OWNER is false
03/18/15 15:22:32 slot1: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot1: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 BenchMgr:StartBenchmarks()
03/18/15 15:22:32 slot2: State change: IS_OWNER is false
03/18/15 15:22:32 slot2: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot2: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot2: Changing activity: Benchmarking -> Idle
03/18/15 15:22:32 slot3: State change: IS_OWNER is false
03/18/15 15:22:32 slot3: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot3: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot3: Changing activity: Benchmarking -> Idle
03/18/15 15:22:32 slot4: State change: IS_OWNER is false
03/18/15 15:22:32 slot4: Changing state: Owner -> Unclaimed
03/18/15 15:22:32 State change: RunBenchmarks is TRUE
03/18/15 15:22:32 slot4: Changing activity: Idle -> Benchmarking
03/18/15 15:22:32 slot4: Changing activity: Benchmarking -> Idle
03/18/15 15:23:17 slot1: Request accepted.
03/18/15 15:23:17 slot1: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot1: State change: claiming protocol successful
03/18/15 15:23:17 slot1: Changing state and activity: Unclaimed/Benchmarking -> Claimed/Idle
03/18/15 15:23:17 slot2: Request accepted.
03/18/15 15:23:17 slot2: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot2: State change: claiming protocol successful
03/18/15 15:23:17 slot2: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot3: Request accepted.
03/18/15 15:23:17 slot3: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot3: State change: claiming protocol successful
03/18/15 15:23:17 slot3: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot4: Request accepted.
03/18/15 15:23:17 slot4: Remote owner is uscms4779@domain
03/18/15 15:23:17 slot4: State change: claiming protocol successful
03/18/15 15:23:17 slot4: Changing state: Unclaimed -> Claimed
03/18/15 15:23:17 slot1: Got activate_claim request from shadow (10.3.10.128)
03/18/15 15:23:17 slot1: Remote job ID is 1862290.0
03/18/15 15:23:17 slot1: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot1: State change: claim-activation protocol successful
03/18/15 15:23:17 slot1: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot2: match_info called
03/18/15 15:23:17 slot3: match_info called
03/18/15 15:23:17 slot2: Got activate_claim request from shadow (10.3.10.128)
03/18/15 15:23:17 slot2: Remote job ID is 1862291.0
03/18/15 15:23:17 slot2: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot2: State change: claim-activation protocol successful
03/18/15 15:23:17 slot2: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot4: match_info called
03/18/15 15:23:17 slot1: match_info called
03/18/15 15:23:17 slot3: Got activate_claim request from shadow (10.3.10.128)
03/18/15 15:23:17 slot3: Remote job ID is 1862293.0
03/18/15 15:23:17 slot3: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot3: State change: claim-activation protocol successful
03/18/15 15:23:17 slot3: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot4: Got activate_claim request from shadow (10.3.10.128)
03/18/15 15:23:17 slot4: Remote job ID is 1862296.0
03/18/15 15:23:17 slot4: Got universe "VANILLA" (5) from request classad
03/18/15 15:23:17 slot4: State change: claim-activation protocol successful
03/18/15 15:23:17 slot4: Changing activity: Idle -> Busy
03/18/15 15:26:40 State change: benchmarks completed
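
To make the overlap in [1] easier to check on other nodes, here is a rough Python sketch that scans log lines in the format above and lists the slots that went Busy between the benchmark start and the "benchmarks completed" message. The function names (parse_lines, benchmark_overlap) and the SAMPLE excerpt are just illustrative, and it assumes the timestamp format and message strings are exactly as they appear in the excerpt above; it only reports timing and does not say whether the jobs actually competed for CPU.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the log excerpt: MM/DD/YY HH:MM:SS <message>
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
BUSY_RE = re.compile(r"^(slot[\w.]+): Changing activity: Idle -> Busy$")


def parse_lines(lines):
    """Yield (timestamp, message) pairs for lines that match the log layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S"), m.group(2)


def benchmark_overlap(lines):
    """Return (start, end, slots) where slots went Busy inside the benchmark window."""
    start = end = None
    busy = []
    for ts, msg in parse_lines(lines):
        if msg == "BenchMgr:StartBenchmarks()" and start is None:
            start = ts
        elif msg == "State change: benchmarks completed":
            end = ts
        else:
            m = BUSY_RE.match(msg)
            if m:
                busy.append((m.group(1), ts))
    if start is None or end is None:
        return start, end, []
    return start, end, [(slot, ts) for slot, ts in busy if start <= ts <= end]


# A few lines copied from the excerpt in [1]
SAMPLE = """\
03/18/15 15:22:32 BenchMgr:StartBenchmarks()
03/18/15 15:23:17 slot1: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot2: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot3: Changing activity: Idle -> Busy
03/18/15 15:23:17 slot4: Changing activity: Idle -> Busy
03/18/15 15:26:40 State change: benchmarks completed
""".splitlines()

if __name__ == "__main__":
    start, end, slots = benchmark_overlap(SAMPLE)
    print("benchmark window:", start, "->", end)
    for slot, ts in slots:
        print(slot, "went Busy at", ts, "while benchmarks were still running")
```

To run it against a whole log, the lines can be read with open(path) and passed to benchmark_overlap() in place of SAMPLE.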