[Condor-users] Java jobs preempted when an Error is thrown?



I've just started experimenting with Condor. I'm trying to run code in the Java universe and am seeing some odd behaviour: if my Java program throws an Error (e.g. OutOfMemoryError), the job is evicted and sits idle in the queue rather than exiting.

For example, consider the code:

Hello.java:

public class Hello {
  public static void main( String [] args ) {
     System.out.println("Hello, world!\n");
     throw new Error();
  }
}

---
Hello.condor:

universe = java
executable = Hello.class
arguments = Hello
output = Hello.output
error = Hello.error
log = Hello.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue

---

If I run this code, I get the log:

Hello.log:

000 (080.000.000) 10/02 13:26:14 Job submitted from host: <129.94.208.135:59292>
...
001 (080.000.000) 10/02 13:26:17 Job executing on host: <129.94.208.140:48354>
...
004 (080.000.000) 10/02 13:26:17 Job was evicted.
       (0) Job was not checkpointed.
               Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
               Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
       404  -  Run Bytes Sent By Job
       451  -  Run Bytes Received By Job
----


But if I change the Error in Hello.java to a RuntimeException, I get:

000 (081.000.000) 10/02 13:32:33 Job submitted from host: <129.94.208.135:59292>
...
001 (081.000.000) 10/02 13:32:37 Job executing on host: <129.94.208.140:48354>
...
005 (081.000.000) 10/02 13:32:37 Job terminated.
        (0) Abnormal termination (signal 15)
        (0) No core file
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        415  -  Run Bytes Sent By Job
        462  -  Run Bytes Received By Job
        415  -  Total Bytes Sent By Job
        462  -  Total Bytes Received By Job
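
For reference, here is a sketch of the RuntimeException variant described above. It is assumed to differ from the original Hello.java only in the throw statement; the submit file is assumed unchanged.

```java
public class Hello {
  public static void main( String [] args ) {
     System.out.println("Hello, world!\n");
     // Only change from the original Hello.java: an unchecked
     // RuntimeException is thrown instead of an Error.
     throw new RuntimeException();
  }
}
```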

I don't see any point in having these jobs placed back on the queue. Is there any way to change this behaviour?

Malcolm