
[HTCondor-users] All jobs running at once



Dear Condor users,

I am attempting to configure Condor on my server (1 machine, 6 CPUs). I am running Ubuntu 14.04.3 with Condor version Debian-8.4.2 as distributed via NeuroDebian.

I'm able to run jobs, but Condor ignores every limit I set. For instance, I set RESERVED_MEMORY = 4000 and MAX_JOBS_RUNNING = 1 in condor_config.local, and in my submit files I set reserve_cpu = 1 and reserve_memory = 4Gb. I then submitted about 200 jobs, and every single one of them started at once, quickly exhausting all available memory.
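Concretely, the relevant lines look like this (the submit-file command names are as I wrote them; I gather the standard names are request_cpus and request_memory, so that may itself be part of the problem):

```
## condor_config.local
RESERVED_MEMORY = 4000
MAX_JOBS_RUNNING = 1

## submit file (per job)
reserve_cpu = 1
reserve_memory = 4Gb
```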

I checked condor_status during a recent run and noticed that all slots were shown as Unclaimed/Idle. Looking at the log files, I see "Number of Active Workers 0" in the collector log; the collector receives 12 ads on each negotiation cycle (6 cores with hyperthreading). This leads me to believe there is a problem with the collector.
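For reference, these are the checks I ran (the log path below is the Debian default and is an assumption; `condor_config_val LOG` prints the actual log directory):

```shell
# show slot states -- every slot reports Unclaimed / Idle
condor_status

# find the negotiation entries in the collector log
# (path is an assumption; check `condor_config_val LOG` on your install)
grep "Number of Active Workers" /var/log/condor/CollectorLog
```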

I tried condor_restart, to no effect, and I don't know where to go from here. Can you please help?
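If it helps, I can also report what the daemons actually see; this is a sketch of the commands I would run (standard HTCondor tools):

```shell
# confirm the running configuration contains the limits I set
condor_config_val MAX_JOBS_RUNNING
condor_config_val RESERVED_MEMORY

# restart the daemons, or just re-read the configuration
condor_restart
condor_reconfig
```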

Cheers,
Jordan




Jordan Poppenk, Ph.D.
Canada Research Chair in Cognitive Neuroimaging
Department of Psychology and Centre for Neuroscience Studies
Queen's University
http://popmem.com
613-533-6009