
Re: [HTCondor-users] All jobs running at once



Hi Jordan,

If you do want to go back to the "local" environment, I think you might want to look at the START_LOCAL_UNIVERSE configuration option.  From the condor manual:

"""
START_LOCAL_UNIVERSE
A boolean value that defaults to TotalLocalJobsRunning < 200. The condor_schedd uses this macro to determine whether to start a local universe job. At intervals determined by SCHEDD_INTERVAL, the condor_schedd daemon evaluates this macro for each idle local universe job that it has. For each job, if the START_LOCAL_UNIVERSE macro is True, then the job's Requirements expression is evaluated. If both conditions are met, then the job is allowed to begin execution.
The following example only allows 10 local universe jobs to execute concurrently. The attribute TotalLocalJobsRunning is supplied by condor_schedd's ClassAd:

    START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 10
"""
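
If you go that route, here is a minimal sketch of the override in your condor_config.local (the limit of 10 is just an illustration, not a recommendation):

    # condor_config.local -- cap concurrent local universe jobs
    START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 10

Then reconfigure and confirm the running daemons see the new value:

    condor_reconfig
    condor_config_val START_LOCAL_UNIVERSE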

Indeed, for CPU-intensive jobs, "vanilla universe" might have the best default settings.  I see the local universe used primarily for utility jobs (submitting other jobs, various bookkeeping tasks, cleanup).
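
For comparison, here is a sketch of a vanilla universe submit file; the executable name and resource requests are placeholders, not taken from your setup:

    # job.sub -- hypothetical vanilla universe submit description
    universe       = vanilla
    executable     = my_analysis.sh
    request_cpus   = 1
    request_memory = 4GB
    output         = job.$(Process).out
    error          = job.$(Process).err
    log            = job.log
    queue 200

With request_cpus and request_memory set, the schedd only starts as many jobs as the machine's slots can satisfy, which is the queuing behavior you were after.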

Hope this helps,

Brian

> On Jan 31, 2016, at 10:07 PM, Jordan Poppenk <jpoppenk@xxxxxxxxxx> wrote:
> 
> Dear Condor users, 
> 
> This issue turned out to be self-inflicted. Since I am running on a single machine, I was submitting jobs using a "local" environment, as this seemed like the intuitively correct choice. When the "local" environment is specified, jobs run without any limits; but when I submit jobs using a "vanilla" environment, queuing behaves perfectly.
> 
> Special thanks to Francisco Pereira for helping me work through this off-list.
> 
> Cheers,
> Jordan
> 
> 
> Jordan Poppenk, Ph.D.
> Canada Research Chair in Cognitive Neuroimaging
> Department of Psychology and Centre for Neuroscience Studies
> Queen's University
> http://popmem.com
> 613-533-6009
> 
> On 2016-01-31, 4:49 AM, "Jordan Poppenk" <jpoppenk@xxxxxxxxxx> wrote:
> 
>> Dear Condor users,
>> 
>> I am attempting to get condor configured on my server (n machines = 1, n_cpus = 6). I am running Ubuntu 14.04.3 with condor version Debian-8.4.2 as distributed via NeuroDebian.
>> 
>> I'm able to run jobs, but condor ignores all the limits I set. For instance, I set RESERVED_MEMORY=4000 and MAX_JOBS_RUNNING=1 in the condor_config.local file, and in my submit files, reserve_cpu=1 and reserve_memory=4Gb. I then submitted about 200 jobs, and every single one of them started, quickly depleting all available memory.
>> 
>> I checked condor_status during a recent run and noticed that all the slots were shown as unclaimed / idle. Looking at the log files, I see "Number of Active Workers 0" in the collector log; the collector receives 12 ads on each negotiation cycle (CPU hyperthreading). This leads me to believe there is a problem with the collector.
>> 
>> I tried condor_restart to no effect, but I don't know where to go from here. Can you please help?
>> 
>> Cheers,
>> Jordan
>> 
>> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/