
[Condor-users] MPI Jobs


I am trying to run a small example MPI program (the C program given at http://www.cs.wisc.edu/condor/manual/v6.6/2_10MPI_Applications.html).

I am trying to run it on two Linux nodes, and I have configured both machines to run dedicated jobs. I am submitting the job to the Condor central manager, which is also one of the dedicated resources. The job runs only on this machine and never starts on the other resource.

The email message says something like this:


Machine A    exited normally with status 0.

     Machine B:9609>    was never started.


In the Negotiator log on the central manager machine I see this:


Phase 4.1:  Negotiating with schedds ...

4/3 08:57:36   Negotiating with senthil@Machine B at <xx.xxx.xxx.xx:9605>

4/3 08:58:09 condor_read(): timeout reading buffer.

4/3 08:58:09     Failed to get reply from schedd

4/3 08:58:09   Error: Ignoring schedd for this cycle


Could you please help me run MPI jobs in Condor?