[Condor-users] MPICH error using parallel universe



Hi everyone,
I'm trying to get MPI jobs running in the parallel universe, but I've hit a roadblock. The job runs fine until it reaches the mpirun command:

mpirun -machinefile machines -nolocal -v -np $_CONDOR_NPROCS $EXECUTABLE $@

At this point the job exits, and the job's output file contains:

	running /var/condor/execute/dir_22980/cpi on 1 LINUX ch_p4 processors
	Could not find enough machines for architecture LINUX

This appears to be an MPICH error, but I can't figure out why it's happening. I've been able to execute mpirun on each of the nodes directly without a problem. Any suggestions on what to try next?
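
One possibility worth ruling out is that -nolocal leaves mpirun with no hosts to use: the output shows only 1 processor requested, and -nolocal tells mpirun not to place a process on the local node. The following is a minimal diagnostic sketch, not a confirmed fix. It assumes the machinefile uses one "host" or "host:ncpus" entry per line and that its host names match what `hostname` prints on the node (short versus fully qualified names would need adjusting). The helper name usable_hosts is hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch: count the machinefile entries that remain usable
# once the local node is excluded (as -nolocal would do).
# Assumption: one "host" or "host:ncpus" entry per line.

# usable_hosts FILE LOCALHOST
# Prints the number of entries whose host name differs from LOCALHOST.
usable_hosts() {
    awk -v me="$2" '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank lines and comments
        { split($1, parts, ":")              # drop an optional ":ncpus" suffix
          if (parts[1] != me) n++ }
        END { print n + 0 }                  # print 0 when nothing matched
    ' "$1"
}

# Possible use in the job script, just before the mpirun line:
#   cat machines
#   echo "NPROCS=$_CONDOR_NPROCS usable=$(usable_hosts machines "$(hostname)")"
```

If the reported count is smaller than $_CONDOR_NPROCS, the machinefile contents or the -nolocal flag would be the first things to look at.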

---
Andrew Howard
System Administrator
Rosen Center for Advanced Computing
Purdue University
ahoward@xxxxxxxxxxxxxxx