
Re: [HTCondor-users] Some jobs from same cluster won't run



On 4/16/2015 1:46 AM, Steffen Grunewald wrote:

> But the jobs apparently had a bigger memory footprint at the time
> of preemption, and no slot with 1221 (? typing this off my memory) MB
> is currently available (-better-analyze seems to suggest that the
> maximum currently is in the 900ish region).

Memory involving copy-on-write pages (for one) tends to be over-reported by the Linux kernel, and that over-reported figure is what Condor sees (and will auto-insert as request_memory if you don't set it explicitly). If you're not using cgroups or immediate allocation, the node should swap. If your numbers are correct, a node with ~900 MB would need about 300 MB of swap to run a ~1221 MB job. You may not want to hit swap, but that's a separate issue -- if you'd rather have the job run, albeit slowly, request less than 1221 MB of memory.
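
For example (just a minimal sketch -- the executable name and the 900 MB figure are placeholders for whatever fits your slots), you could set request_memory explicitly in the submit file so the auto-detected value isn't used and the job can still match the ~900 MB slots:

    universe       = vanilla
    executable     = my_job.sh
    # Ask for less than the auto-detected 1221 MB so the job can
    # match slots that only advertise ~900 MB of memory; the job
    # may swap if its footprint really is larger.
    request_memory = 900MB
    queue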

Dimitri