Thank you for your suggestions. One thing to point out: HTCondor picked up RESERVED_MEMORY only after I restarted condor, so condor_reconfig alone is not enough. The manual should probably mention this.
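For anyone who finds this thread later, here is a minimal sketch of the setting. The value 3072 (in MB) is only an assumption borrowed from the "total_memory-3072" suggestion quoted below, and where the line goes depends on your installation's local configuration file. The commands in the comments are the standard HTCondor tools; treat this as an illustration rather than a tested recipe.

```
# Local HTCondor configuration on the worker node (file location varies by installation).
# Memory, in MB, that the startd should hold back and not advertise to jobs.
# 3072 is an illustrative value, not a recommendation.
RESERVED_MEMORY = 3072

# After saving the file, on the worker:
#   condor_restart                     (condor_reconfig alone did not apply the change here)
#   condor_config_val RESERVED_MEMORY  (confirm the value the daemons will read)
#   condor_status -af Name Memory      (check the memory each slot now advertises)
```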
Thank you all for the help.
On 10/23/2018 3:30 PM, Dimitri Maziuk via HTCondor-users wrote:
On 10/23/2018 02:07 PM, Zhuo Zhang via HTCondor-users wrote:
Thank you for the inputs Michael. My previous email saying that worker1 has 128GB memory is not accurate. I used "free -m" to check the total memory on that machine:

Keep in mind that linux kernel cannot always accurately track several kinds of memory allocations including mmap'ed files, tmpfs, etc. Plus of course FS buffers are typically considered "free". Maybe try a KILL expression for when a job goes over total_memory-3072?
--
Zhuo Zhang, ASSISTT
I.M. Systems Group (IMSG), NOAA/NESDIS/STAR
5825 University Research Court, Suite 1500 (IMSG), Cube 1500-11
College Park, MD 20740
Tel: (240) 582-3585 (x23017)