
[Condor-users] Problem with submission of Java jobs



There seems to be a problem with the way Condor handles the submission of Java jobs. Consider the
following example. Sorry about the length, but I need to be clear and complete to get the details across.

I want to submit several jobs, each with its own parameter file for the Java program, so we have
the following directory structure.
T03_Bob/           - holds common jar files used by all experiments and common data files
    ecj.jar
    javacsv.jar
    yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat
    yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat
    Experiments/                    - holds the condor job submission files
        b_01_submitAllJobs.bat      - file to submit the condor job
        b_02_allJobs.sub            - file with condor parameters
        EXP_000001/                 - directory for experiment 1
            ssnAndIceCores.ALLparams
        EXP_000002/                 - directory for experiment 2
            ssnAndIceCores.ALLparams
        EXP_000003/                 - directory for experiment 3
            ssnAndIceCores.ALLparams
 
b_01_submitAllJobs.bat 
------------------------------------
condor_submit b_02_allJobs.sub
 
 
b_02_allJobs.sub
--------------------------
universe             = java
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000001/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
 
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000002/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
 
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000003/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
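
As an aside on the submit file above: Condor keeps submit commands in effect across Queue statements, so only initialdir has to change between jobs. The following is a minimal Python sketch that generates an equivalent, shorter file for any number of experiment directories. The helper name build_submit_file is hypothetical (it is not part of Condor), and the sketch does not change how the executable path is resolved.

```python
# Sketch: generate a shorter b_02_allJobs.sub. Settings shared by all jobs are
# written once; only initialdir changes before each Queue statement.
# build_submit_file is a hypothetical helper, not part of Condor.

DATA = r"..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-{}1.dat"

COMMON = [
    ("universe", "java"),
    ("executable", r"..\..\ecj.jar"),
    ("arguments", "ec.Evolve -file ssnAndIceCores.ALLparams"),
    ("transfer_input_files",
     ",".join([DATA.format("train"), DATA.format("test"),
               "ssnAndIceCores.ALLparams"])),
    ("jar_files", r"..\..\ecj.jar,..\..\javacsv.jar"),
    ("should_transfer_files", "YES"),
    ("when_to_transfer_output", "ON_EXIT"),
    ("log", "00_condorNode.log"),
    ("error", "00_condorNode.err"),
    ("output", "00_condorNode.out"),
]


def build_submit_file(experiment_dirs):
    """Return submit-file text with one Queue statement per directory."""
    lines = [f"{key:<24}= {value}" for key, value in COMMON]
    for directory in experiment_dirs:
        lines.append("")
        lines.append(f"{'initialdir':<24}= {directory}/")
        lines.append("Queue")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    dirs = [f"EXP_{n:06d}" for n in range(1, 4)]
    print(build_submit_file(dirs))
```

Redirecting the printed text into b_02_allJobs.sub gives a file with the same ten shared settings and three jobs as the hand-written version.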
 
 
Note that the executable class (the one with main) is actually ec.Evolve, BUT this class is inside the ecj.jar file.
I tried using ec.Evolve as the executable in this and other Java submissions (that work) but found that Condor can't
deal with the class when it's in a jar file: it looks for the class file and can't find it to 'transfer' (even
though the example in the manual with a jar file that contains the classes suggests that you
should put the executable class in the executable statement). So I found that it was necessary to
specify the jar file in the executable statement. Given this starting point, I try to submit the job:
 
D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s).
ERROR: failed to transfer executable file ..\..\ecj.jar
D:\ecj\Condor\T03_Bob\Experiments>
 
Since we set the initial directory to EXP_000001, etc., we expected that it would find the jar files
in ../../ relative to the initial directory, as it does for the files to be transferred, but it cannot find the jar file.
We changed the reference to the jar file in the executable statement so it reads:
 
executable              = ..\ecj.jar
Now it gets past the first message, which suggests it found the file relative to the submit directory.
But it gives the following message:
 
D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s)
ERROR: Can't open "D:\ecj\Condor\T03_Bob\Experiments\EXP_000001/..\ecj.jar"  with flags 00 (No such file or directory)
 
So initially it expects the executable jar file at a path relative to the submit directory, and then later
it expects it at a path relative to the initial directory. Condor can't have it both ways; I would
consider this to be a bug. If I put the jar files in both the T03_Bob and Experiments directories, it will submit the jobs correctly.
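
The two error messages are consistent with the same relative path being resolved against two different base directories. Here is a minimal Python sketch of that reading. The resolve helper is hypothetical and only does path arithmetic with the standard ntpath module; it illustrates the paths quoted above and is not Condor's actual code.

```python
import ntpath  # Windows path rules, usable on any platform

SUBMIT_DIR = r"D:\ecj\Condor\T03_Bob\Experiments"
INITIAL_DIR = ntpath.join(SUBMIT_DIR, "EXP_000001")


def resolve(base, relative):
    """Join a relative path onto a base directory and collapse any '..' parts."""
    return ntpath.normpath(ntpath.join(base, relative))


if __name__ == "__main__":
    for relative in (r"..\..\ecj.jar", r"..\ecj.jar"):
        print("executable =", relative)
        print("  from submit dir :", resolve(SUBMIT_DIR, relative))
        print("  from initialdir :", resolve(INITIAL_DIR, relative))
```

Under this reading, ..\..\ecj.jar lands in T03_Bob only when resolved from the initial directory, and ..\ecj.jar lands there only when resolved from the submit directory. That would also explain why copying the jar files into both T03_Bob and Experiments lets the submission go through.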
 
I also wanted to ask why Condor tries to transfer the executable file when reading the executable
statement, since it seems it only needs this to determine which class starts the program
(it should also accept ec.Evolve, the class with the main method, as specified
in the manual, but this does not work). Once the files are transferred (as specified in the
transfer_input_files and jar_files directives) it can try to find the executable.
 
Thanks, Bob.