
[Condor-users] Problem with submission of Java jobs



There seems to be a problem with the way Condor handles the submission of Java jobs. Consider the following example.
Sorry about the length of this but to get the details across I need to be clear and complete.
 
I want to submit several jobs, each with its own set of parameter files for the Java program, so we have the following
directory structure.
 
T03_Bob/           - holds common jar files used by all experiments and common data files
    ecj.jar
    javacsv.jar
    yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat
    yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat
    Experiments/                    - holds the condor job submission files
        b_01_submitAllJobs.bat      - file to submit the condor job
        b_02_allJobs.sub            - file with condor parameters
        EXP_000001/                 - directory for experiment 1
            ssnAndIceCores.ALLparams
        EXP_000002/                 - directory for experiment 2
            ssnAndIceCores.ALLparams
        EXP_000003/                 - directory for experiment 3
            ssnAndIceCores.ALLparams
 
b_01_submitAllJobs.bat 
------------------------------------
condor_submit b_02_allJobs.sub
 
 
b_02_allJobs.sub
--------------------------
universe             = java
# requirements         = (OpSys == "WINNT50") || (OpSys == "WINNT51")
# requirements         = (Machine == "ir41165valdes") || (Machine == "ir41128valdes")
 
# This file contains all experiments to submit to condor
# results will be placed into the individual experiment's directory
 
# Use Condor's File Transfer Mechanism instead of, for example
# using a shared file system. I've sent an email to see if another
# file transfer policy can be used besides copying back to the
# submitter's machine (and potentially overwriting the contents) and
# the response was no (as of 17 MAR 2004)
 
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000001/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
 
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000002/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
 
executable              = ..\..\ecj.jar
arguments               = ec.Evolve -file ssnAndIceCores.ALLparams
transfer_input_files    = ..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat,..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat,ssnAndIceCores.ALLparams
jar_files               = ..\..\ecj.jar,..\..\javacsv.jar
initialdir              = EXP_000003/
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
log                     = 00_condorNode.log
error                   = 00_condorNode.err
output                  = 00_condorNode.out
Queue
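
Since the three stanzas above differ only in the initialdir line, the submit file can be generated rather than edited by hand. What follows is only a sketch in Python: the helper names stanza and submit_file are made up for illustration, and the output simply reproduces the file exactly as posted (including the executable path that triggers the error described below), so it does not work around the problem.

```python
# Sketch: regenerate b_02_allJobs.sub for N experiments.
# stanza() and submit_file() are hypothetical helper names, not part of Condor.

INPUT_FILES = [
    r"..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-train1.dat",
    r"..\..\yearssn-and-ice-cores-crete-1721-1983-multivariate-test1.dat",
    "ssnAndIceCores.ALLparams",
]


def stanza(experiment_dir):
    """Return one job description, identical to the posted ones except for initialdir."""
    lines = [
        r"executable              = ..\..\ecj.jar",
        "arguments               = ec.Evolve -file ssnAndIceCores.ALLparams",
        "transfer_input_files    = " + ",".join(INPUT_FILES),
        r"jar_files               = ..\..\ecj.jar,..\..\javacsv.jar",
        f"initialdir              = {experiment_dir}/",
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        "log                     = 00_condorNode.log",
        "error                   = 00_condorNode.err",
        "output                  = 00_condorNode.out",
        "Queue",
    ]
    return "\n".join(lines)


def submit_file(count):
    """Build the whole submit file for experiments EXP_000001 .. EXP_<count>."""
    header = "universe             = java"
    stanzas = [stanza(f"EXP_{n:06d}") for n in range(1, count + 1)]
    return header + "\n\n" + "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    print(submit_file(3))
```

Redirecting the printed text into b_02_allJobs.sub gives the same three jobs as the hand-written file.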
 
 
Note that the executable class (with main) is actually ec.Evolve, but this class is in the ecj.jar file.
I tried using ec.Evolve in this and other Java submissions (that work) but found that Condor can't
deal with the class when it's in a jar file: it looks for the class file and can't find it to 'transfer' (even
though the example in the manual with a jar file that contains the classes suggests that you
should put the executable class in the executable statement). So I found that it was necessary to
specify the jar file in the executable statement. Given this starting point, I try to submit the job ...
 
D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s).
ERROR: failed to transfer executable file ..\..\ecj.jar
D:\ecj\Condor\T03_Bob\Experiments>
 
Since we set the initial directory to EXP_000001, etc., we expected that it would find the jar files
in ../../ relative to the initial directory, as it does for the files to be transferred, but it cannot find the jar file.
We changed the reference to the jar file in the executable statement so it reads:
 
executable              = ..\ecj.jar
Now it gets past the first message, suggesting it found the file relative to the submit directory.
But it gives the following message:
 
D:\ecj\Condor\T03_Bob\Experiments>condor_submit b_02_allJobs.sub
Submitting job(s)
ERROR: Can't open "D:\ecj\Condor\T03_Bob\Experiments\EXP_000001/..\ecj.jar"  with flags 00 (No such file or directory)
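
To make the two lookups concrete, here is a small Python sketch that resolves the same relative executable path against both candidate base directories. It only illustrates the path arithmetic that would be consistent with the two error messages; it is not a description of how condor_submit is implemented, and the function names resolve and candidates are made up for this example.

```python
# Sketch: where does a relative "executable" path point, depending on the base?
# ntpath applies Windows path rules on any platform.
import ntpath

SUBMIT_DIR = r"D:\ecj\Condor\T03_Bob\Experiments"  # where condor_submit is run
INITIAL_DIR = "EXP_000001/"                        # initialdir from the submit file


def resolve(base, relative):
    """Join a relative path onto a base directory and collapse any '..' parts."""
    return ntpath.normpath(ntpath.join(base, relative))


def candidates(executable):
    """Return (path relative to the submit dir, path relative to initialdir)."""
    from_submit = resolve(SUBMIT_DIR, executable)
    from_initial = resolve(ntpath.join(SUBMIT_DIR, INITIAL_DIR), executable)
    return from_submit, from_initial


if __name__ == "__main__":
    for exe in (r"..\..\ecj.jar", r"..\ecj.jar"):
        from_submit, from_initial = candidates(exe)
        print(exe)
        print("  relative to submit dir :", from_submit)
        print("  relative to initialdir :", from_initial)
```

The jar actually lives at D:\ecj\Condor\T03_Bob\ecj.jar. With ..\..\ecj.jar only the initialdir-relative result points there, and with ..\ecj.jar only the submit-directory-relative result does, so neither spelling satisfies both lookups.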
 
So initially it expected the executable jar file to be in a directory relative to the submit directory, and then later
it expects it to be in a directory relative to the initial directory. But Condor can't have it both ways. I would
consider this a bug. If I put the jar files in both the T