
Re: [Condor-users] Why MPI jobs does not run ?



On 5/16/06, nguyen thanh nha <nha45@xxxxxxxxx> wrote:
Hi,
I am using Condor 6.6.11 and MPICH 1.2.4.
I wrote a hello world MPI program, and I can run it
using mpirun on 2 nodes.
But when I submit it to Condor, it does not run.

condor_q -analyze on node 1 (which is also the master):
0 are reject ....
0 reject your job ...
0 match, but are serving users with a better priority
in pool.
1 match, but reject for unknown problem.
0 match, but will not currently preempt ...
1 are available to run your job.

Here is condor_q -analyze on node 2:
0 are reject ....
0 reject your job ...
0 match, but ....
2 match, but reject for unknown problem.
0 match ...
0 are available to run your job.


Here is my submit file:
universe = MPI
executable = HelloMPI
machine_count = 2
queue

Please help me. I configured the dedicated scheduler. I
searched this mailing list, but I can't solve the
problem.
Thanks in advance.

The "match, but reject for unknown problem" result is well known, but it is difficult to solve because the analysis does not tell you the reason :(.

So what you have to do is check the Condor logs, and if they don't tell you anything useful, increase the verbosity level of the logs. I don't remember how to do this, but it is surely covered in the documentation.
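
From what the manual describes, log verbosity is set per daemon through the <SUBSYS>_DEBUG configuration settings. A minimal sketch, assuming the local config file is condor_config.local and that the schedd, negotiator, and startd are the daemons worth looking at for a matching problem (both are my assumptions, not something confirmed for your pool):

```
# condor_config.local (sketch; the choice of daemons is an assumption)
# D_FULLDEBUG asks each daemon to write much more detail to its log.
SCHEDD_DEBUG     = D_FULLDEBUG
NEGOTIATOR_DEBUG = D_FULLDEBUG
STARTD_DEBUG     = D_FULLDEBUG
```

After editing the file, running condor_reconfig should make the daemons reread their configuration, and condor_config_val LOG prints the directory where the logs (SchedLog, NegotiatorLog, StartLog) are written. Remember to lower the level again afterwards, since the logs grow quickly at this setting.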

Hope this helps you, Bye!
--
Diego Bello Carreño
Thesis Student, Civil Engineering in Informatics
UTFSM, Valparaíso, Chile
User #294897 counter.li.org