
Re: [Condor-users] Job Submission Problem



On Feb 28, 2008, at 9:01 PM, Saurabh Agarwal wrote:

I have a job which is run as ./executable_name < in.input.
The "in.input" file is a plain text file, but it tells the program to read another file called "in.data". All the files are stored in the same directory as the executable. However, when the job starts executing, it fails to read the "in.data" file, even though it is in that directory. Kindly let me know what I am doing wrong. Following is the Condor script I am using to submit this job:

Universe = parallel
initialdir = /backup/benchmark_condor/
Executable = mp1script
#WantIOProxy = True
Output = benchmark.out
Error = benchmark.err
Log = benchmark.log
machine_count = 11
getenv = True
should_transfer_files = yes
when_to_transfer_output = on_exit_or_evict
Queue

The executable mp1script is just a simple script like this:

# Set this to the bin directory of MPICH installation
MPDIR=/usr/local/bin
PATH=$MPDIR:.:$PATH
export PATH

## run the actual mpijob
mpirun -v -np $_CONDOR_NPROCS /backup/benchmark_condor/executableName < /backup/benchmark_condor/in.input

I have Condor 6.8.6 installed on 3 of my machines, and it successfully runs other MPI jobs in which I only redirect stdin and do not read any other file.


Condor can handle a job's data files in two fashions and you're mixing them. Things will be less confusing if you pick just one.

The first method is to rely on a shared filesystem. This is used when should_transfer_files is set to 'no'. The directory set by initialdir must be on the shared filesystem and Condor will run the job in that directory.
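
As a rough sketch of the first method, your submit file could look something like the following. This assumes /backup/benchmark_condor/ really is on a filesystem shared by the submit machine and every execute machine, which I can't verify from here. Only the should_transfer_files line changes, and when_to_transfer_output is dropped since nothing is transferred:

# Sketch only: assumes initialdir is on a shared filesystem
Universe = parallel
initialdir = /backup/benchmark_condor/
Executable = mp1script
Output = benchmark.out
Error = benchmark.err
Log = benchmark.log
machine_count = 11
getenv = True
should_transfer_files = no
Queue

With this setup the job runs in initialdir, so a bare "in.data" inside in.input should resolve to /backup/benchmark_condor/in.data.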

The second method is to transfer all of the job's files between the submit and execute machines. This is used when should_transfer_files is set to 'yes'. The job's executable and standard input, output, and error are transferred. You can specify additional files to be transferred using transfer_input_files and transfer_output_files. The job is run in a temporary directory on the execute machine. The job's files are transferred into that directory before the job runs and out of it after the job finishes.
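
As a rough sketch of the second method, the submit file would list every file the job reads. The file names below are taken from your description ("executableName" is the placeholder from your script), and I'm assuming they all live in initialdir, since relative names in transfer_input_files are looked up there:

# Sketch only: file names are assumptions based on your description
Universe = parallel
initialdir = /backup/benchmark_condor/
Executable = mp1script
Output = benchmark.out
Error = benchmark.err
Log = benchmark.log
machine_count = 11
getenv = True
should_transfer_files = yes
when_to_transfer_output = on_exit_or_evict
transfer_input_files = executableName, in.input, in.data
Queue

If you go this way, mp1script would also need to refer to the transferred copies by relative name (./executableName and in.input) rather than by their /backup/benchmark_condor/ paths, so that everything is read from the temporary directory the job runs in.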

You have file transfer enabled, but your script is pulling the executable and standard input directly from your initialdir (which I'm assuming is shared). I'll bet the trouble you're having is that the reference to 'in.data' inside your standard input file doesn't include the full path to the file. Since the job is running in a temporary directory, it is looking in that directory for the file, and the file isn't there.

Thanks and regards,
Jaime Frey
UW-Madison Condor Team