[Condor-users] Condor + MPI + Rocks



Hello everyone,

I am running a Rocks cluster with 8 compute nodes and a head node. I am interested in using Condor to submit MPI jobs to the cluster, but I am having trouble getting the jobs to run. Here is what my condor.job file looks like:

####################################
## Test Condor: submit 1 job      ##
## Condor submit description file ##
####################################
universe     = parallel
machine_count = 4
initialdir  = /home/lalovv/test
executable  = another
input       = /dev/null
output      = results/condor.$(Process).out
error       = results/condor.$(Process).err
log         = results/condor.log
notification = Error

queue


Now, when I submit this via condor_submit, the system accepts the job and puts it in the queue. The problem is that it stays there and never runs. The same happens if I change the universe to MPI.

Here is the kicker: if I change the universe to vanilla, the job executes, but on only ONE of the compute nodes.

Any ideas?

Thanks for your time.

==
Vasil Lalov
Department Of Computer Science
Bowling Green State University
Bowling Green, OH 43403
lalovv@xxxxxxxx