Hello
I am seeing jobs killed when they exceed their requested memory.
I believe I have shut off any preemption or eviction, but that does not
seem to be the case. Below are our condor_local, a typical submit file
(we are running under DAGMan), and a submit.log. Note that our
request_memory is 24 GB, and the jobs seem to exit prematurely at
approximately 24 GB. I believe the process may be requesting more memory
than that, and these particular nodes have plenty of additional (unused)
memory on them.
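For reference, the memory request is just the standard submit command (the
actual submit file is pasted in full below); paraphrased, it amounts to:

    request_memory = 24 GB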
Is there a way to ensure a job is never killed because of memory usage? Or
am I misreading the logs?