
[HTCondor-users] MPI process manager and Condor not understanding each other?


      I'm trying to run MPI applications across more than one node through Condor, but so far with no luck. 
      If I'm not mistaken, the condor_ssh script should be used, but it is not. Instead, node 0 tries to launch hydra_pmi_proxy on the other nodes itself, and that is when the problems start. 
      First, the keys created by Condor's sshd.sh are not being used. Only after I added an ssh key generated outside Condor to a regular authorized_keys file could node 0 connect to the remote node 1. 
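One way to test whether the problem is simply that Hydra never calls condor_ssh is to tell Hydra explicitly which remote-launch program to use. The following is a minimal Python sketch, under stated assumptions, that only builds the mpiexec argument list. It assumes a Hydra release that accepts `-launcher ssh` and `-launcher-exec` (older releases spell these `-bootstrap` and `-bootstrap-exec`; `mpiexec -help` shows what a given build accepts). The function name `build_hydra_cmd` and the condor_ssh path are hypothetical placeholders, and whether condor_ssh accepts the argument order Hydra normally passes to ssh depends on the Condor version, so this is something to try rather than a known-good recipe.

```python
from typing import List, Sequence


def build_hydra_cmd(
    hostfile: str,
    nprocs: int,
    app: Sequence[str],
    launcher_exec: str,
) -> List[str]:
    """Build an mpiexec (Hydra) argv that starts remote proxies through
    launcher_exec (e.g. Condor's condor_ssh) instead of plain ssh.

    Assumption: this Hydra build understands -launcher / -launcher-exec;
    older builds use -bootstrap / -bootstrap-exec instead.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return [
        "mpiexec",
        "-launcher", "ssh",              # keep Hydra's ssh-style launcher...
        "-launcher-exec", launcher_exec,  # ...but run this program instead of ssh
        "-f", hostfile,                   # host list, one host per line
        "-n", str(nprocs),                # total number of MPI processes
        *app,                             # the MPI program and its arguments
    ]
```

For example, `build_hydra_cmd("hosts.txt", 8, ["./my_mpi_app"], "/path/to/condor_ssh")` yields `mpiexec -launcher ssh -launcher-exec /path/to/condor_ssh -f hosts.txt -n 8 ./my_mpi_app`, which the node 0 job script could then hand to `subprocess.run`.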
      Now the following error occurs:

[proxy:0:1@xxxxxxxxxxxxxxxxx] launch_procs (./pm/pmiserv/pmip_cb.c:651): unable to change wdir to /opt/condor-7.2.4/local.nc14/execute/dir_14220 (No such file or directory)

       In this run, nc14 hosts node 0 and nc08 hosts node 1. 
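The error line itself carries a clue. Hydra prefixes proxy messages with `[proxy:<process group>:<proxy id>@<host>]`, so `proxy:0:1` should be the proxy for node 1 (nc08), yet the directory it fails to enter lies under `local.nc14`, i.e. node 0's Condor scratch directory, which exists only on nc14. That looks consistent with Hydra passing mpiexec's own working directory to every remote proxy. The hypothetical helper `parse_hydra_error` below is a small Python sketch that extracts these fields from such a line; its regular expression is an assumption based only on the message quoted above, not on a documented Hydra log format.

```python
import re
from typing import NamedTuple, Optional

# Assumed format, taken from the message above:
# "[proxy:<pg>:<proxy id>@<host>] <function> (<source file>:<line>): <message>"
_HYDRA_LINE = re.compile(
    r"\[proxy:(?P<pg>\d+):(?P<proxy>\d+)@(?P<host>[^\]]+)\]\s+"
    r"(?P<func>\w+)\s+\((?P<src>[^:]+):(?P<line>\d+)\):\s+(?P<msg>.*)"
)
# Pulls the directory out of "unable to change wdir to <path> (...)".
_WDIR = re.compile(r"unable to change wdir to (?P<wdir>\S+)")


class HydraError(NamedTuple):
    pg: int               # process group id
    proxy: int            # proxy id (one proxy per host)
    host: str             # host the proxy runs on
    func: str             # Hydra function that reported the error
    src: str              # Hydra source file
    line: int             # line number in that source file
    msg: str              # free-text error message
    wdir: Optional[str]   # directory the proxy could not enter, if any


def parse_hydra_error(text: str) -> Optional[HydraError]:
    """Return the fields of the first Hydra proxy error in text, or None."""
    m = _HYDRA_LINE.search(text)
    if m is None:
        return None
    w = _WDIR.search(m.group("msg"))
    return HydraError(
        pg=int(m.group("pg")),
        proxy=int(m.group("proxy")),
        host=m.group("host"),
        func=m.group("func"),
        src=m.group("src"),
        line=int(m.group("line")),
        msg=m.group("msg"),
        wdir=w.group("wdir") if w else None,
    )
```

Applied to the message above, it reports proxy 1 and a `wdir` of `/opt/condor-7.2.4/local.nc14/execute/dir_14220`. If that reading is right, one option worth testing is Hydra's `-wdir` option with a directory that exists on every node, such as one on a shared filesystem.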

       Could I have something misconfigured? I wonder whether Hydra works with Condor at all... 
       What would you suggest?

       Any advice is highly appreciated.