
[Condor-users] MPI and Permission denied


I am trying to run a parallel job.
The central manager is
I set up a dedicated scheduler on
I submit the job from this machine. Here is the submit file:

universe        = parallel
executable = scriptmpicondor
arguments       = simplempi
output          = out.$(NODE)
error           = err.$(NODE)
log             = simple.log
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = simplempi
machine_count= 4

The job was held. Here is the log file:

012 (075.000.000) 05/21 17:11:21 Job was held.
Error from starter on vm1@abel05: STARTER at failed to send file(s) to <>; SHADOW at failed to write to file /home/condormpi/testmpi/out.0: (errno 13) Permission denied
        Code 12 Subcode 13

When I submit the job, the file out.0 is owned by condormpi, while the file out.1 is owned by the user nobody. The user nobody then tries to write to out.0 and is denied.
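
The ownership described above can be checked with a short script. This is only a sketch: the directory is the one shown in the hold message, and the helper name `owners` is hypothetical, not part of Condor.

```python
import glob
import os
import pwd


def owners(pattern):
    """Map each file matching the glob pattern to the name of its owner."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        uid = os.stat(path).st_uid
        try:
            name = pwd.getpwuid(uid).pw_name
        except KeyError:
            # The uid has no passwd entry on this machine; report the number.
            name = str(uid)
        result[path] = name
    return result


if __name__ == "__main__":
    # Directory taken from the hold message; adjust to the submit directory.
    for path, name in owners("/home/condormpi/testmpi/out.*").items():
        print(path, name)
```

Run in the submit directory after the job is held, this should list condormpi for out.0 and nobody for out.1 if the situation is as described.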

I have not managed to run the entire job as the condormpi user.

If I use NFS without file transfer, the error is:
HoldReason = "Error from starter on vm2@abel10: Failed to open '/home/condormpi/testmpi/out.0' as standard output: Permission denied (errno 13)"

Here are the relevant condor_config options:

UID_DOMAIN      = es8.univ-orleans.fr


##  Internet domain of machines sharing a common file system.
##  If your machines don't use a network file system, set it to
##  to specify that each machine has its own file system.
FILESYSTEM_DOMAIN       = es8.univ-orleans.fr
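
Since some processes run as nobody, one thing worth checking is whether the submit machine and the execute machines agree on UID_DOMAIN. The sketch below compares one setting between two condor_config texts. It is a simplified reader under stated assumptions: it does not expand macros such as $(FULL_HOSTNAME), does not follow local config files, and the names `read_setting` and `same_setting` are hypothetical.

```python
import re


def read_setting(config_text, name):
    """Return the last value assigned to `name` in condor_config-style text.

    Returns None if the setting is absent. Comment lines starting with '#'
    are skipped, and a later assignment overrides an earlier one.
    """
    value = None
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue
        match = re.match(r"(\w+)\s*=\s*(.*)$", line)
        if match and match.group(1).upper() == name.upper():
            value = match.group(2).strip()
    return value


def same_setting(text_a, text_b, name):
    """True if both texts define `name` with the same value, ignoring case."""
    a = read_setting(text_a, name)
    b = read_setting(text_b, name)
    return a is not None and b is not None and a.lower() == b.lower()


if __name__ == "__main__":
    submit_cfg = "UID_DOMAIN      = es8.univ-orleans.fr\n"
    execute_cfg = "UID_DOMAIN = es8.univ-orleans.fr\n"  # hypothetical execute-node config
    print(same_setting(submit_cfg, execute_cfg, "UID_DOMAIN"))
```

On a live pool, running `condor_config_val UID_DOMAIN` on each machine reports the value actually in effect, which is more reliable than reading the files by hand.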


Emmanuel Le Guirriec
Ingenieur de Recherche Calcul Scientifique CNRS
Federation Denis Poisson
Universite d'Orleans
BP 6759
45067 Orleans Cedex 2
tel / 48.50