
Re: [HTCondor-users] Some jobs from same cluster won't run



On 4/16/2015 1:46 AM, Steffen Grunewald wrote:

> But the jobs apparently had a bigger memory footprint at the time
> of preemption, and no slot with 1221 (? typing this off my memory) MB
> is currently available (-better-analyze seems to suggest that the
> maximum currently is in the 900ish region).

Memory involving copy-on-write pages (for one) tends to be over-reported by the Linux kernel, and that over-reported figure is what Condor sees (and will auto-insert as request_memory if you don't set it explicitly). If you're not using cgroups or immediate allocation, the node should swap. If your numbers are correct, a node with ~900 MB would need about 300 MB of swap to run a ~1221 MB job. You may not want to hit swap, but that's a separate issue -- if you'd rather have the job run, albeit slowly, request less than 1221 MB of memory.
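
For example (just a minimal sketch -- the executable name and the 900 MB figure are placeholders for whatever fits your slots), you could set request_memory explicitly in the submit file so the auto-detected value isn't used and the job can still match the ~900 MB slots:

    universe       = vanilla
    executable     = my_job.sh
    # Ask for less than the auto-detected 1221 MB so the job can
    # match slots that only advertise ~900 MB of memory; the job
    # may swap if its footprint really is larger.
    request_memory = 900MB
    queue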

Dimitri