
[Condor-users] problem with MPI job clusters



Hi all,
I am testing MPI jobs and have the following problem:

My submit file 'pi.cmd' is:
...
universe = mpi
machine_count = 4

output =$(Cluster).$(Process).$(NODE).out
error   =$(Cluster).$(Process).$(NODE).err
log      =$(Cluster)..log

executable = cpi
queue 2
....

When I submit this file a cluster of 2 jobs is created:

 % condor_submit pi.cmd
Submitting job(s)..
Logging submit event(s)..
2 job(s) submitted to cluster 44.
 % condor_q

-- Submitter: gtx01.esrf.fr : <160.103.6.172:60873> : gtx01.esrf.fr
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              
  44.0   klotz          10/13 14:39   0+00:00:00 I  0   0.4  cpi              
  44.1   klotz          10/13 14:39   0+00:00:00 I  0   0.4  cpi              

2 jobs; 2 idle, 0 running, 0 held
....

Once job '44.0' has finished, the second job '44.1' stays idle (state 'I') in the queue and never starts:

 % condor_q

-- Submitter: gtx01.esrf.fr : <160.103.6.172:60873> : gtx01.esrf.fr
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD              
  44.1   klotz          10/13 14:39   0+00:00:00 I  0   0.4  cpi
....

If I change the submit file to:
...
universe = mpi
machine_count = 4

output =$(Cluster).$(Process).$(NODE).out
error   =$(Cluster).$(Process).$(NODE).err
log      =$(Cluster)..log

executable = cpi
queue
executable = cpi
queue
....

I get two clusters of one job each, and both start as expected, one after the other.
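
To make the difference in job numbering concrete, here is a small Python sketch of how the two submit files map to job IDs. It is only a model of the behavior described in this message, not Condor code: the function name assign_job_ids is hypothetical, the starting cluster number 44 is just the one from the output above, and the rule "a repeated 'executable' line starts a new cluster at the next 'queue'" is an assumption drawn from what I observed.

```python
def assign_job_ids(submit_lines, first_cluster=44):
    """Return the (cluster, process) IDs a submit description would produce.

    Assumption (taken from the behavior reported in this message, not from
    the Condor sources): the first 'queue' after an 'executable' line opens
    a new cluster, and 'queue N' adds N processes to the current cluster.
    """
    ids = []
    cluster = first_cluster - 1
    next_proc = 0
    new_cluster_pending = True  # the very first 'queue' always opens a cluster

    for raw in submit_lines:
        # Normalize 'key=value' and 'key = value' to the same word list.
        words = raw.replace("=", " = ").split()
        if not words:
            continue
        keyword = words[0].lower()

        if keyword == "executable":
            new_cluster_pending = True
        elif keyword == "queue":
            count = int(words[1]) if len(words) > 1 else 1
            if new_cluster_pending:
                cluster += 1
                next_proc = 0
                new_cluster_pending = False
            for _ in range(count):
                ids.append((cluster, next_proc))
                next_proc += 1
    return ids


# First submit file: one 'queue 2' statement.
one_cluster = ["universe = mpi", "machine_count = 4",
               "executable = cpi", "queue 2"]

# Second submit file: 'executable' and 'queue' repeated.
two_clusters = ["universe = mpi", "machine_count = 4",
                "executable = cpi", "queue",
                "executable = cpi", "queue"]

print(assign_job_ids(one_cluster))
print(assign_job_ids(two_clusters))
```

Under these assumptions the first file gives processes 0 and 1 in a single cluster (the 44.0 and 44.1 shown above), while the second gives process 0 in each of two consecutive clusters. Only the second layout runs to completion for me.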

Is this behavior normal?

Regards....
--
WD Klotz - Europ. Synch. Rad. Facility (ESRF) - 6 r Jules Horowitz, BP 220, 38043 Grenoble,  FRANCE
work: +33(0)4.76.88.29.21 fax:...24.27 mobile: +33(0)6.87.38.59.27 mail: klotz@xxxxxxx
Please avoid sending me Word(.doc) or PowerPoint(.ppt) attachments.