
[Condor-users] Unable to run simple parallel job



Hi, we are trying to set up a Condor cluster with MPI support on several dedicated Linux machines.

At the moment we can run standard, vanilla and Java jobs, and that's all working fine. However, we are unable to run even the simplest "/bin/sleep 30" parallel job.
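
For reference, a minimal sketch of the kind of parallel universe submit file meant here, along the lines of the manual's example. The machine_count value and the log file name are assumptions for illustration, not necessarily the exact file we submitted:

```
# Sketch of a parallel universe submit description file (illustrative values)
universe      = parallel
executable    = /bin/sleep
arguments     = 30
# assumed: one slot on each of the two nodes
machine_count = 2
# hypothetical log file name
log           = sleep.log
queue
```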

We followed all the instructions, and all three machines are configured as dedicated machines. One of them is the DedicatedScheduler, from which we submit jobs. This machine is also the Condor manager. The two "nodes" point to this machine as their DedicatedScheduler, and the DedicatedScheduler machine points to itself as its own DedicatedScheduler (I hope that is still understandable).
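
To make the setup concrete, here is a rough sketch of the dedicated-resource settings as the manual describes them for each execute machine. The host name is a placeholder, not our real one, and older Condor releases may expect STARTD_EXPRS in place of STARTD_ATTRS:

```
# Sketch only: manager.example.org is a hypothetical host name
DedicatedScheduler = "DedicatedScheduler@manager.example.org"
# Advertise the attribute in the machine ClassAd
# (older releases: STARTD_EXPRS instead of STARTD_ATTRS)
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Prefer jobs coming from the dedicated scheduler
RANK  = Scheduler =?= $(DedicatedScheduler)
START = True
```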

If we submit the standard example jobs, like the /bin/sleep or /bin/cat ones, they simply stay idle in the queue for "unknown reasons" (condor_q -analyze and -better-analyze report nothing more).

We've been working on this problem for a few days now and I don't know what the cause could be. Nothing unusual shows up in the logs.

If you have a clue, or know how I could enable some kind of debug mode, please let me know.

With kind regards,
Rik v. A.