
[Condor-users] Parallel submission issue



Hi

I'm trying to submit an MPI job to my Condor pool.

The problem is that when I ask it to run on 2 CPUs (i.e. 1 computer), it works fine, but when I ask for 4 CPUs (i.e. 2 computers), one of them does not seem to find the file to write its output to.

Here is the submission script:
$ cat sub-cond.cmd
universe = parallel
executable = mp2script
arguments = /nfs/opt/amber/amber9/exe/sander.MPI -O -i md.in -o TGA07.1.out -p TGA07.top  -c TGA07.0.rst -r TGA07.1.rst -x TGA07.1.trj -e TGA07.1.ene 
machine_count = 4 
should_transfer_files = yes
when_to_transfer_output = on_exit_OR_EVICT
transfer_input_files = /nfs/opt/amber/amber9/exe/sander.MPI,md.in,TGA07.top,TGA07.0.rst
Output  = sanderMPI.out 
Error   = sanderMPI.err
Log     = sanderMPI.log
queue

I'm starting the script from a directory that is NFS-shared:

(/nfs/test-space/amber)$ ls
blu.sh  clean.sh  md.in  mdinfo  mp2script  mpd.hosts  run_MD.sh  sub-cond.cmd  TGA07.0.rst  TGA07.top

The error is a typical Amber error when it can't open the result file (TGA07.1.out is an output file; it doesn't exist before running the program):

$ more sanderMPI.err
0:
0:   Unit    6 Error on OPEN: TGA07.1.out

0: [cli_0]: aborting job:
0: application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
$

So, where is my problem? NFS? File transfer?
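
One way to narrow this down might be a small probe script run on each node in place of the real job, reporting the host, the working directory, and whether a new file can be created there (which is what sander needs in order to open TGA07.1.out). This is only a rough sketch: the script name check_write.sh, the function check_writable and the probe file name are hypothetical and not part of Condor or Amber.

```shell
#!/bin/sh
# check_write.sh (hypothetical helper, not part of Condor or Amber):
# report whether a directory accepts a new file.

check_writable() {
    dir="$1"
    probe="$dir/.write_probe.$$"   # throwaway file name (an assumption)
    # The subshell keeps a failed redirection from aborting the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
    else
        echo "not writable"
    fi
}

echo "host: $(uname -n)"
echo "cwd:  $(pwd)"
echo "cwd is $(check_writable "$(pwd)")"
```

If one node reports "not writable", or a working directory different from the others, that would point to where the OPEN error comes from.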

Any help would be greatly appreciated :)

Nicolas

----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique

Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------