
[Condor-users] Java universe and memory (moved from devel to user)



Moved to a more useful mailing list; I didn't spot this in my original reply.

On Wed, Mar 5, 2008 at 1:07 PM, Craig Bruce <pcxcb1@xxxxxxxxxxxxxxxx> wrote:
 > Hi,
 >
 >  I'm successfully using 7.0.1 on a Linux pool. We run a lot of Java jobs that
 >  use lots of RAM. It is not unusual to underestimate the amount of RAM we
 >  need to pass to the JVM:
 >  java_vm_args   = -Xmx900m
 >
 >  If it isn't enough the JVM will not complete the task and the error file
 >  confirms this:
 >  java.lang.OutOfMemoryError: Java heap space
 >
 >  However, Condor will evict this job and thus resubmit somewhere else. As the
 >  memory value has not been altered the same error will result. Should the
 >  task not just complete in this case? Otherwise users think the job is
 >  running/waiting to rematch, but really it needs cancelling, modifying and
 >  resubmitting.

 Check whether the resulting exit code is consistent and occurs only
 in this or similar situations.
 If it is not, and you can alter your application, use something like:
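  try
  {
     // ... your existing job code ...
  }
  catch (OutOfMemoryError e)   // an Error, not an Exception, in Java
  {
     // log it however you normally would
     System.err.println("Job ran out of heap space: " + e);
     // exit with a constant you choose and use nowhere else, e.g. 42
     System.exit(42);
  }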

 The on_exit_remove or on_exit_hold submit commands can then trap that
 exit code and either remove the job or place it on hold for you to deal with.
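 For reference, a submit description along these lines would put the job
 on hold when it exits with the chosen code instead of letting it rematch.
 This is only a sketch: MyJob and the exit code 42 are hypothetical, and it
 assumes the Java universe reports the program's exit status through the
 usual ExitBySignal and ExitCode job attributes.

```
# Hypothetical submit description; MyJob and 42 are placeholders.
universe       = java
executable     = MyJob.class
arguments      = MyJob
java_vm_args   = -Xmx900m
# Hold (rather than requeue) when the program exits normally with the
# constant your application passes to System.exit on OutOfMemoryError.
on_exit_hold   = (ExitBySignal == False) && (ExitCode == 42)
queue
```

 A held job shows up in condor_q, so users can see it needs attention;
 it can then be removed with condor_rm and resubmitted with a larger heap.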


 Matt