[Condor-users] mpirun access "Permission denied"



Hi,

I am using Fedora Core 5, mpich 1.2.4 and Condor 6.7.21.

I have Condor set up on two machines and am trying to run an MPI job, which I
submit from the central manager (machine A).

The job runs correctly on the central manager (machine A), but on the other
machine (machine B), in the temporary execute directory, I get a "Permission
denied" message when it tries to run mpirun.

Apparently, when Condor runs the job as "nobody" on machine B, it cannot
access mpirun in /usr/local/mpich-1.2.4/bin/.  I have set the execute
permission for owner, group and others on this executable.  What could be
the problem?  Did I forget to set a permission?
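One thing that may be worth ruling out: the execute bit on mpirun itself is not enough, because every directory on the path to it must also be searchable by "nobody", and a shell script additionally has to be readable by the interpreter. Below is a minimal sketch of such a check, assuming "nobody" is neither the owner nor in the group (so only the "other" bits matter). The function name and script are hypothetical, and the sketch ignores ACLs, SELinux labels and noexec mounts, any of which can also produce "Permission denied".

```python
import os
import stat
import sys


def world_access_problems(path):
    """List components of `path` that an unrelated account cannot use.

    Assumption: the unprivileged account (e.g. "nobody") is neither owner
    nor group member, so only the "other" permission bits are examined.
    Each directory needs other-execute (search); the target itself needs
    other-execute and other-read, since a shell script must be readable too.
    ACLs, SELinux labels and noexec mounts are NOT considered.
    """
    path = os.path.abspath(path)
    # Every prefix from the root down to the target: /, /usr, /usr/local, ...
    prefixes = [os.sep]
    for part in (p for p in path.split(os.sep) if p):
        prefixes.append(os.path.join(prefixes[-1], part))

    problems = []
    for prefix in prefixes:
        mode = os.stat(prefix).st_mode
        needed = stat.S_IXOTH
        if prefix == path:
            needed |= stat.S_IROTH
        if (mode & needed) != needed:
            problems.append(prefix)
    return problems


if __name__ == "__main__":
    # Usage: python check_perms.py /usr/local/mpich-1.2.4/bin/mpirun
    for target in sys.argv[1:]:
        try:
            missing = world_access_problems(target)
        except OSError as exc:
            print(f"{target}: cannot stat ({exc})")
            continue
        if missing:
            for component in missing:
                print(f"{target}: insufficient 'other' permissions on {component}")
        else:
            print(f"{target}: mode bits look fine for an unrelated account")
```

Running it on machine B (as root or as the installing user) with the mpirun path as the argument would list any directory, such as /usr/local/mpich-1.2.4, whose mode bits block "nobody".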

In summary, mpirun can be accessed on the central manager (machine A) but
not on machine B.

On machine B, the output shows the following message while machine A
continues to function.

******************************************************************
p0_3614:  p4_error: Timeout in making connection to remote process on
machineB.url.xxx.xxx: 0
p0_3614: (302.004790) net_send: could not write to fd=4, errno = 32
******************************************************************
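For reference, the errno in the net_send line can be decoded with Python's standard errno table. A minimal sketch (the helper name is hypothetical); on Linux, 32 is EPIPE ("Broken pipe"), which fits the timeout above: the connection to the process on machine B was never established or was closed.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and message for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # errno value reported by p4 in the net_send message
    print(describe_errno(32))
```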

Apparently, Condor gives up on machine B and uses only machine A.

Again, can someone point out why Condor cannot access mpirun through the
nobody account on machine B?

Sincerely,

Christopher Jon Jursa
Geoinformatics Laboratory
School of Information Sciences
University of Pittsburgh
web: http://gis.sis.pitt.edu
email: cjursa@xxxxxxxxxxxx
phone: 412-624-8858