
[Condor-users] MPI job issue ( error 0 chirp putting identity keys back )



Hi,

  I have set up condor-7.6.4 and can run single-node jobs, but when I try to run an MPI job (MPICH), I get the following error in the .err file:

  -------------
  chirp: couldn't putfile: No such file or directory
  /var/spool/condor/libexec/sshd.sh: line 69: 16129 Aborted                 $CONDOR_CHIRP put -perm 0700 $idkey $_CONDOR_REMOTE_SPOOL_DIR/$_CONDOR_PROCNO.key
  -------------

  The .out file shows:

  ---------------
  error 0 chirp putting identity keys back
  ---------------

  My job submission script is as follows:

  --------------

  universe = parallel
  executable = /var/spool/condor/etc/examples/mp1script
  arguments = /home/kunal/condor/examples/hello_mpi
  machine_count = 2
  +WantIOProxy = True
  Output = hello_mpi.out
  error = hello_mpi.err
  Log = hello_mpi.log
  should_transfer_files = yes
  when_to_transfer_output = on_exit
  +ParallelShutdownPolicy = "WAIT_FOR_ALL"
  transfer_input_files = /home/kunal/condor/examples/hello_mpi
  queue

  ------------

  I saw a similar issue in an earlier post ( https://www-auth.cs.wisc.edu/lists/condor-users/2011-August/msg00033.shtml ), but it did not receive any response.
 
  Any suggestions or thoughts on what I might be missing?

Thanks & Regards,
Kunal