Could anyone help me with this?
I thought this was a common problem
that a typical Condor user might face.
Is there an easy way to prevent a job from being
started on a machine that has a temporary memory shortage?
The users on the machines I use
tend to behave quite badly, and just one problematic machine clogged with zombie
processes flushes the whole Condor task list within an hour. Preparing a Condor
task list takes quite a bit of care, and the whole list can be flushed by just a
single problematic node in the cluster.
I hope there is at least one
person who can help me with this.
I have a question about using a dynamic memory check in
the Requirements expression. I use the following Requirements:
Requirements = Memory >=
But this causes a problem if the target machine has only about 100 MB
of memory left due to zombie processes. The schedd will start new jobs on
the target machine and the jobs will be killed. Very soon, the whole job list gets
exhausted because of just one problematic machine in the cluster.
I know that zombie processes must be removed promptly, but I
want Condor to act smartly in this unfortunate event and keep the job list
intact. So my question is this:
*** I want to prevent the schedd from starting new jobs when the
currently available virtual memory is smaller than a given threshold value. ***
So what I want is something like this:
Requirements = Memory >= 650 &&
Dynamic_Available_Memory_Size >= 200
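(Dynamic_Available_Memory_Size is just a made-up name for what I want. If the
machine ClassAd advertises something like a VirtualMemory attribute — I am not
sure of the exact name or units on my pool — I imagine the expression would look
roughly like this, with the value in KB:

    # sketch only: assumes VirtualMemory is advertised in KB by the startd,
    # so 200000 KB is about 200 MB
    Requirements = Memory >= 650 && VirtualMemory >= 200000

I have not been able to confirm whether that attribute reflects the memory
actually available at match time, which is the part I am asking about.)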
I tried setting the Image_Size attribute to 200000 KB according
to this manual section.
But, somehow, the job was still submitted to a machine that had
less than 200 MB of virtual memory available, and eventually the job was killed due to
memory shortage. I changed Image_Size to about 700 MB, but the schedd still assigned
jobs to the problematic machine.
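In case it matters, this is roughly what my submit description file looked like
(the executable name is just a placeholder):

    universe     = vanilla
    executable   = my_job        # placeholder name
    image_size   = 200000        # in KB, i.e. about 200 MB
    requirements = Memory >= 650
    queue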
Can you help me with this issue?