
[HTCondor-users] Tracking available memory on a compute host



I found a couple of old mentions that the "VirtualMemory" and/or
"TotalVirtualMemory" attributes are updated as a machine runs, and that
one might be able to use them to make sure there's enough memory
available on a host to run jobs.  However, in my experimenting I found
they were not updated nearly often enough to be useful: I gobbled up
half the memory on a machine, and the number hadn't changed even 15
minutes later, even though updated ClassAds had been received from it
(and I was querying it directly anyway).

This comes up because a user of mine had queued jobs that kept
flocking to another user's machine, which had available cores but no
available memory (it was in use locally, outside of HTCondor).  Those
jobs kept getting killed by oom_killer shortly after starting, and then
new jobs would flock there.  So I'm looking for some way to add a test
to a job's requirements that the host in question has enough free
virtual memory to run the job.
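For illustration only, here is a rough sketch of the host-side check such
a requirement would need to express, outside of HTCondor entirely.  It
assumes a Linux host and assumes the kernel's "MemAvailable" estimate
from /proc/meminfo (falling back to "MemFree" on older kernels) is an
acceptable proxy for "enough free memory".  The function names and the
request_mib parameter are hypothetical, not part of any HTCondor API.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values in kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            # First token is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    return values


def job_fits(meminfo_text: str, request_mib: int) -> bool:
    """Hypothetical check: does the host have request_mib MiB available?

    Assumption: MemAvailable (or MemFree if absent) is a usable proxy.
    """
    info = parse_meminfo(meminfo_text)
    available_kib = info.get("MemAvailable", info.get("MemFree", 0))
    return available_kib // 1024 >= request_mib


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            text = f.read()
        print("fits 4096 MiB job:", job_fits(text, 4096))
```

Note that this reads the value at the moment it is called; the problem
described above is precisely that the advertised figure lags reality, so
any real solution would need the host to report something like this
frequently.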

-- 
Steve Huston - W2SRH - Unix Sysadmin, PICSciE/CSES & Astrophysical Sci
  Princeton University  |    ICBM Address: 40.346344   -74.652242
    345 Lewis Library   |"On my ship, the Rocinante, wheeling through
  Princeton, NJ   08544 | the galaxies; headed for the heart of Cygnus,
    (267) 793-0852      | headlong into mystery."  -Rush, 'Cygnus X-1'