Subject: [Condor-users] MPI jobs in the vanilla universe
Hi,
I am interested in running an MPI job on my cluster (which is already
running Condor 6.6.6), but within the vanilla universe rather than the
MPI universe (the MPI universe setup has some restrictions that we
cannot accommodate).
In the past, I've used Sun Grid Engine and PBS; I submitted a job asking
for "N nodes"; the batch queueing system would basically wait until N
nodes were free. When the job ran, the batch system would start my job
on the "first" machine of the N, and provide me with a file listing all
the nodes ($PBS_NODEFILE is an env var pointing to the file). At that
point, I could run mpirun with the machines file being the list of
nodes. The batch system also managed the nodes properly: it marked them
as in use rather than scheduling more jobs on them.
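For reference, the PBS-side step described above amounts to roughly the
following. This is a minimal sketch, not anything PBS or Condor ships: it
assumes an MPICH-style mpirun that accepts -np and -machinefile, assumes the
node file typically lists one hostname per allocated CPU, and the function
name build_mpirun_command and the program ./my_mpi_app are hypothetical.

```python
import os
import tempfile


def build_mpirun_command(program, nodefile=None):
    """Build an mpirun argument list from a PBS-style node file.

    Hypothetical helper: assumes one hostname per line, repeated once per
    allocated CPU, and an MPICH-style mpirun (-np / -machinefile).
    """
    # $PBS_NODEFILE is set by PBS itself; we only read the file it names.
    path = nodefile or os.environ["PBS_NODEFILE"]
    with open(path) as f:
        # Skip blank lines; each remaining line is one process slot.
        hosts = [line.strip() for line in f if line.strip()]
    return ["mpirun", "-np", str(len(hosts)), "-machinefile", path, program]


if __name__ == "__main__":
    # Demo with a throwaway node file standing in for $PBS_NODEFILE.
    with tempfile.NamedTemporaryFile("w", suffix=".nodes", delete=False) as tmp:
        tmp.write("node01\nnode02\n\nnode03\n")
    print(" ".join(build_mpirun_command("./my_mpi_app", tmp.name)))
    os.unlink(tmp.name)
```

What I am missing in Condor is the piece that produces such a file and
holds the listed machines for the duration of the job.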
I've checked the manual, and I can't find an equivalent in Condor. The
'machine_count' directive in the Condor submit file doesn't seem to apply
to vanilla jobs, and I can't find any other way to schedule a group of
machines together.
Any suggestions? Suggestions involving obscure ClassAd hackery are
quite welcome.