
[condor-users] MPI job on 6.7



Hi, 

I'm having a problem setting up a Condor pool to
run an MPI job. I've set up a Condor pool with 2 VMs on
a single PC to test MPI job submission, and tested it
with a job that uses only one CPU. The job stays in the
queue forever and does not run.

I'm using Condor 6.7.0 for the test.

Could you help me?
      Hidemoto


The following lines were added to the local config file.
----------------------------
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

SUSPEND    = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT    = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
RANK_FACTOR    = 1000000
RANK   = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR))
START  = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
MPI_CONDOR_RSH_PATH = /usr/local/condor/sbin
----------------------------

I then submitted a single-CPU job with the following
submit file.
----------------------------
universe = MPI
executable = simplempi
machine_count = 1 
queue
----------------------------

Here is the log of the schedd
---------------------------------------
5/19 03:25:57 DaemonCore: Command received via TCP from host <192.168.201.3:34316>
5/19 03:25:57 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
5/19 03:25:57 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxx
5/19 03:25:57 Out of servers - 0 reqs matched, 1 reqs idle, 1 reqs rejected
5/19 03:25:57 Activity on stashed negotiator socket
5/19 03:25:57 Negotiating for owner: DedicatedScheduler@xxxxxxxxxxxxxxxxxxxx
5/19 03:25:57 Out of requests - 1 reqs matched, 0 reqs idle
5/19 03:25:59 Started shadow for MPI job 3.0 (shadow pid = 11223)
5/19 03:25:59 Shadow pid 11223 exited with status 106
5/19 03:25:59 ERROR "shadow exited with incorrect usage!
" at line 1344 in file dedicated_scheduler.C


Here is the log of the shadow.

5/19 03:25:59 ******************************************************
5/19 03:25:59 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/19 03:25:59 ** $CondorVersion: 6.7.0 Apr 27 2004 $
5/19 03:25:59 ** $CondorPlatform: I386-LINUX-RH80 $
5/19 03:25:59 ** PID = 11223
5/19 03:25:59 ******************************************************
5/19 03:25:59 Using config file: /usr/local/condor/etc/condor_config
5/19 03:25:59 Using local config files: /home/condor/condor_config.local
5/19 03:25:59 DaemonCore: Command Socket at <192.168.201.3:34319>
5/19 03:25:59 ERROR: unrecognized option (0)
5/19 03:25:59 Usage: condor_shadow cluster.proc schedd_addr file_name
5/19 03:25:59 argv[0] = condor_shadow
5/19 03:25:59 argv[1] = <192.168.201.3:34309>
5/19 03:25:59 argv[2] = <192.168.201.3:34293>
5/19 03:25:59 argv[3] = <192.168.201.3:34293>#1084904434#4
5/19 03:25:59 argv[4] = 3
5/19 03:25:59 argv[5] = 0
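
One detail visible in the shadow log: the usage line lists three arguments
after the program name, while the argv dump shows five. The following is
only a minimal Python sketch for checking that count from a log excerpt. It
assumes the log line format shown above; the helper names expected_args and
received_args are made up for illustration and are not part of Condor. It
does not explain why the schedd passed those arguments.

```python
import re

# Shadow log lines copied from the excerpt above.
SHADOW_LOG = """\
5/19 03:25:59 ERROR: unrecognized option (0)
5/19 03:25:59 Usage: condor_shadow cluster.proc schedd_addr file_name
5/19 03:25:59 argv[0] = condor_shadow
5/19 03:25:59 argv[1] = <192.168.201.3:34309>
5/19 03:25:59 argv[2] = <192.168.201.3:34293>
5/19 03:25:59 argv[3] = <192.168.201.3:34293>#1084904434#4
5/19 03:25:59 argv[4] = 3
5/19 03:25:59 argv[5] = 0
"""


def expected_args(log):
    """Return the argument names on the 'Usage:' line, program name excluded."""
    match = re.search(r"Usage:\s+\S+\s+(.*)", log)
    return match.group(1).split() if match else []


def received_args(log):
    """Return the argv values the shadow logged, excluding argv[0]."""
    pairs = re.findall(r"argv\[(\d+)\] = (.*)", log)
    ordered = sorted((int(index), value) for index, value in pairs)
    return [value for index, value in ordered if index > 0]


if __name__ == "__main__":
    exp = expected_args(SHADOW_LOG)
    got = received_args(SHADOW_LOG)
    print(f"usage lists {len(exp)} arguments: {exp}")
    print(f"shadow received {len(got)} arguments: {got}")
```

If the two counts differ, as they do here, the schedd and the shadow binary
it started appear to disagree about the expected command line.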