OK, I can get it to work as expected under v7.4.3 if I change the
permissions on Condor's spool directory on the submit host from 0644
to 1777. However, under v7.2 it worked fine with perms of just 0644,
so why do we now need these less secure settings?
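
For anyone comparing the two modes, here is a minimal Python sketch that decodes them using only the standard `stat` module. The helper names are made up for illustration and nothing here is Condor-specific; it only shows what the bits in 0644 and 1777 mean for a directory.

```python
import stat

def describe_dir_mode(mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)

def is_world_writable(mode):
    """True if 'other' users have write permission."""
    return bool(mode & stat.S_IWOTH)

def has_sticky_bit(mode):
    """True if the sticky bit is set (only a file's owner may delete it)."""
    return bool(mode & stat.S_ISVTX)

for mode in (0o644, 0o1777):
    print(oct(mode), describe_dir_mode(mode),
          "world-writable:", is_world_writable(mode),
          "sticky:", has_sticky_bit(mode))
```

In other words, 1777 makes the directory writable by every user, with the sticky bit limiting deletion to each file's owner, which is why it looks like a step down from 0644.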
On 06/10/2010 11:27, Mark Calleja wrote:
Our users have come across a problem for MPI jobs running under
the parallel universe when upgrading from 7.2.5 to 7.4.3, and
though we have found a workaround (mentioned below), it would be
great if we can identify a proper fix.
The issue is that jobs using the "usual" MPI wrapper script (e.g.
mp1script) now fail with the following:
error 0 chirp putting identity keys back
Looking in the ShadowLog, it seems that a new permissions problem
rears its head:
09/13 10:48:29 (55247.0) (30445): Request to run on slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <172.24.89.94:9696>
09/13 10:48:29 (55247.0) (30445): FileTransfer::Init():
Permission denied (errno: 13)
We have found that we can get around the issue by spooling the
data on submission, i.e. via "condor_submit -spool" and then
retrieving the data on completion via condor_transfer_data, before
finally removing the job from the queue manually with condor_rm.
This new behaviour is perplexing, as there have been no new
configuration changes made to the hosts on upgrade.
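
The spooling workaround described above could be scripted roughly as follows. This is only a sketch: the function name and the submit file name `mpi_job.sub` are hypothetical, the job ID is the one from the log excerpt, and the script merely prints the three commands in order rather than running them, since the transfer and removal steps only make sense once the job has completed.

```python
import shlex

def spool_workaround_commands(submit_file, job_id):
    """Build the three commands of the workaround, in the order they are run.

    submit_file and job_id are supplied by the caller; the job ID is only
    known after condor_submit has reported it.
    """
    return [
        ["condor_submit", "-spool", submit_file],  # spool input data on submission
        ["condor_transfer_data", job_id],          # fetch output once the job completes
        ["condor_rm", job_id],                     # remove the finished job from the queue
    ]

# Hypothetical submit file; job ID taken from the ShadowLog lines above.
for cmd in spool_workaround_commands("mpi_job.sub", "55247.0"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```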
Have we missed something necessary in the upgrade? From the
release notes I can't discern any such new requirement, and having
to remember to manually retrieve output and remove completed jobs
from the queue is a pain in the unmentionables.
The Cavendish Laboratory, University of Cambridge,
J J Thomson Avenue, Cambridge, CB3 0HE, UK
Tel. (+44/0) 1223 746627