
Re: [Condor-users] How to get the machine file in parallel jobs




Hi Imre,
  Thanks for sharing your solution with me. I use Windows, and I found that
the environment variable %_CONDOR_JOB_AD% refers to a .job.ad file.
That file provides some useful information, including the machines allocated to the job.
I wrote a C program to obtain the machine list from that file.
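
For readers who want to try the same approach, here is a minimal sketch of such
a program. It is not the original program. It assumes the machine list is stored
in the job ad as a quoted, comma-separated string attribute (AllRemoteHosts is
used as an example name; check your own .job.ad file for the actual attribute),
and that entries may carry a "slotN@" prefix. The function names and buffer
sizes are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOST_LEN 256   /* illustrative limit, not a Condor constant */

/* Parse one job-ad line of the form  Attr = "slot1@a,slot2@b"  and copy
 * the host names (the text after any '@') into hosts[].
 * Returns the number of hosts stored, or 0 if the line does not define attr. */
int extract_hosts(const char *line, const char *attr,
                  char hosts[][MAX_HOST_LEN], int max_hosts)
{
    assert(line != NULL && attr != NULL);

    size_t alen = strlen(attr);
    if (strncmp(line, attr, alen) != 0)
        return 0;

    /* The attribute name must be followed by optional blanks and '='. */
    const char *p = line + alen;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p != '=')
        return 0;

    /* Locate the quoted value. */
    const char *start = strchr(p, '"');
    if (start == NULL)
        return 0;
    start++;
    const char *end = strchr(start, '"');
    if (end == NULL)
        return 0;

    int count = 0;
    while (start < end && count < max_hosts) {
        const char *comma = memchr(start, ',', (size_t)(end - start));
        const char *item_end = comma ? comma : end;
        const char *at = memchr(start, '@', (size_t)(item_end - start));
        const char *name = at ? at + 1 : start;
        while (name < item_end && *name == ' ')
            name++;
        size_t len = (size_t)(item_end - name);
        if (len > 0 && len < MAX_HOST_LEN) {
            memcpy(hosts[count], name, len);
            hosts[count][len] = '\0';
            count++;
        }
        start = comma ? comma + 1 : end;
    }
    return count;
}

/* Scan the job-ad file for attr and write one host per line to out.
 * Returns the number of hosts written, or -1 if the file cannot be opened. */
int write_machine_file(const char *ad_path, const char *attr, FILE *out)
{
    assert(ad_path != NULL && out != NULL);

    FILE *in = fopen(ad_path, "r");
    if (in == NULL)
        return -1;

    char line[4096];
    char hosts[64][MAX_HOST_LEN];
    int n = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        n = extract_hosts(line, attr, hosts, 64);
        if (n > 0)
            break;
    }
    fclose(in);

    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", hosts[i]);
    return n;
}
```

A small main() could call write_machine_file(getenv("_CONDOR_JOB_AD"),
"AllRemoteHosts", stdout) and redirect the output to a file for mpiexec.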

Chunbao Miao
 
Date: 2012-08-27 19:36
Subject: Re: [Condor-users] How to get the machine file in parallel jobs
Hi Chunbao,
 
I had the same problem before.
I did not find suitable scripts for submitting jobs in an OpenMPI environment.
I found one in the share/doc/condor-7.8.1/etc/examples/ directory, which
installs ssh daemons on the remote machines, but I cannot use it in an SMP
environment.
 
Finally I found a solution; maybe it helps you.
  - I created a shell script which collects the host info from the job status and
    creates a host file containing job IDs and slot numbers for starting mpirun.
  - I force mpirun to use condor_ssh_to_job. The only problem is that
    mpirun checks the format of the host file, and if an entry starts with numbers
    it assumes it is an IP address. So I added a constant string to the job
    IDs, and a wrapper removes the constant string before starting condor_ssh_to_job.
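
The prefix trick in the second point can be sketched as follows. The real
scripts are shell scripts; this is only an illustration in C of the round trip,
and the prefix value "condorjob-" and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOST_PREFIX "condorjob-"   /* hypothetical constant string */

/* Build a host-file entry that does not start with a digit, so that it
 * cannot be mistaken for an IP address.
 * Returns 0 on success, -1 if out is too small. */
int encode_host(const char *job_id, char *out, size_t out_size)
{
    assert(job_id != NULL && out != NULL);
    int n = snprintf(out, out_size, "%s%s", HOST_PREFIX, job_id);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Wrapper side: strip the constant string again.
 * Returns a pointer to the job ID inside entry, or NULL if the prefix
 * is missing. */
const char *decode_host(const char *entry)
{
    assert(entry != NULL);
    size_t plen = strlen(HOST_PREFIX);
    if (strncmp(entry, HOST_PREFIX, plen) != 0)
        return NULL;
    return entry + plen;
}
```

The host file would hold the encoded entries, and the wrapper would pass the
decoded job ID on to condor_ssh_to_job.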
 
I have attached my scripts; I hope you find them useful as well.
 
If you are using openmpi-1.4, change the last command of the
condor_openmpi.sh script to
 
exec $MPIRUN --prefix $MPI_HOME --mca plm_rsh_agent $_CONDOR_SSH_TO_JOB_WRAPPER \
               --hostfile $_CONDOR_PARALLEL_HOSTS_FILE "$@"
 
 
Best,
 
Imre
 
 
On 2012.08.26 at 15:00, miaocb@xxxxxxx wrote:
> Hi All,
>      I successfully configured condor to run parallel jobs, but I can't figure out how to get a machine file that can be used by mpiexec or mpirun to start MPI jobs. Is there an environment variable that refers to the machine file?
>
> thanks
>
> Chunbao Miao
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
 
 
 
 
 