
[Condor-users] Problem MPI jobs idle!



Hi there,

I cannot get my MPI job to run in Condor. I am trying the hello example, as
follows:


Executable=hello_mpi
Universe=MPI
Output=output_mpi.out
Error=output_mpi.err
Machine_count=2
Queue

When I submit it, the job remains idle in the queue. In the condor_config.local
file I set the dedicated scheduler as: DedicatedScheduler =
DedicatedScheduler@condor@labweb02.inf.ufsc.br, where 'condor' is the username
and 'labweb02.inf.ufsc.br' is the domain name. I also set START = True in
condor_config.local.
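
Written out as lines in condor_config.local, those two settings look like this
(the # comments are only my annotations here, and I am assuming nothing else in
the file is relevant):

# Dedicated scheduler name, in the form user@hostname as described above
DedicatedScheduler = DedicatedScheduler@condor@labweb02.inf.ufsc.br
# Let the machine start jobs unconditionally
START = True
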
The SchedLog is as follows:

8/29 09:18:08 Found idle MPI cluster 686
8/29 09:18:08 Started timer (16) to call handleDedicatedJobs() in 2 secs
8/29 09:18:08 JobsRunning = 0
8/29 09:18:08 JobsIdle = 0
8/29 09:18:08 JobsHeld = 0
8/29 09:18:08 JobsRemoved = 0
8/29 09:18:08 LocalUniverseJobsRunning = 0
8/29 09:18:08 LocalUniverseJobsIdle = 0
8/29 09:18:08 SchedUniverseJobsRunning = 0
8/29 09:18:08 SchedUniverseJobsIdle = 0
8/29 09:18:08 N_Owners = 1
8/29 09:18:08 MaxJobsRunning = 200
8/29 09:18:08 ENABLE_SOAP is undefined, using default value of False
8/29 09:18:08 Trying to update collector <150.162.60.140:9618>
8/29 09:18:08 Attempting to send update via UDP to collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:18:08 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:18:08 Sent HEART BEAT ad to 1 collectors. Number of submittors=1
8/29 09:18:08 Changed attribute: RunningJobs = 0
8/29 09:18:08 Changed attribute: IdleJobs = 0
8/29 09:18:08 Changed attribute: HeldJobs = 0
8/29 09:18:08 Changed attribute: FlockedJobs = 0
8/29 09:18:08 Changed attribute: Name = "condor@xxxxxxxxxxxxxxxxxxxx"
8/29 09:18:08 Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
8/29 09:18:08 Trying to update collector <150.162.60.140:9618>
8/29 09:18:08 Attempting to send update via UDP to collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:18:08 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:18:08 Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx
8/29 09:18:08 ============ Begin clean_shadow_recs =============
8/29 09:18:08 ============ End clean_shadow_recs =============
8/29 09:18:10 Starting DedicatedScheduler::handleDedicatedJobs
8/29 09:18:10 Found 1 idle dedicated job(s)
8/29 09:18:10 DedicatedScheduler: Listing all dedicated jobs -
8/29 09:18:10 Dedicated job: 686.0 condor
8/29 09:18:10 SCHEDD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
8/29 09:18:10 Will use UDP to update collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:18:10 Trying to query collector <150.162.60.140:9618>
8/29 09:18:10 SCHEDD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
8/29 09:18:10 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:18:10 Found 0 potential dedicated resources
8/29 09:18:10 idle resource list
8/29 09:18:10  ************ empty ************
8/29 09:18:10 limbo resource list
8/29 09:18:10  ************ empty ************
8/29 09:18:10 unclaimed resource list
8/29 09:18:10  ************ empty ************
8/29 09:18:10 busy resource list
8/29 09:18:10  ************ empty ************
8/29 09:18:10 Trying to find 2 resource(s) for dedicated job 686.0
8/29 09:18:10 Trying to satisfy job with all possible resources
8/29 09:18:10 Can't satisfy job 686 with all possible resources... trying next job
8/29 09:18:10 In DedicatedScheduler::publishRequestAd()
8/29 09:18:10 Trying to update collector <150.162.60.140:9618>
8/29 09:18:10 Attempting to send update via UDP to collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:18:10 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:18:10 Entering DedicatedScheduler::checkSanity()
8/29 09:18:10 Finished DedicatedScheduler::handleDedicatedJobs
8/29 09:18:18 Getting monitoring info for pid 2630
8/29 09:21:28 -------- Begin starting jobs --------
8/29 09:21:28 -------- Done starting jobs --------
8/29 09:22:18 Getting monitoring info for pid 2630
8/29 09:23:08 Found idle MPI cluster 686
8/29 09:23:08 Started timer (17) to call handleDedicatedJobs() in 2 secs
8/29 09:23:08 JobsRunning = 0
8/29 09:23:08 JobsIdle = 0
8/29 09:23:08 JobsHeld = 0
8/29 09:23:08 JobsRemoved = 0
8/29 09:23:08 LocalUniverseJobsRunning = 0
8/29 09:23:08 LocalUniverseJobsIdle = 0
8/29 09:23:08 SchedUniverseJobsRunning = 0
8/29 09:23:08 SchedUniverseJobsIdle = 0
8/29 09:23:08 N_Owners = 1
8/29 09:23:08 MaxJobsRunning = 200
8/29 09:23:08 ENABLE_SOAP is undefined, using default value of False
8/29 09:23:08 Trying to update collector <150.162.60.140:9618>
8/29 09:23:08 Attempting to send update via UDP to collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:23:08 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:23:08 Sent HEART BEAT ad to 1 collectors. Number of submittors=1
8/29 09:23:08 Changed attribute: RunningJobs = 0
8/29 09:23:08 Changed attribute: IdleJobs = 0
8/29 09:23:08 Changed attribute: HeldJobs = 0
8/29 09:23:08 Changed attribute: FlockedJobs = 0
8/29 09:23:08 Changed attribute: Name = "condor@xxxxxxxxxxxxxxxxxxxx"
8/29 09:23:08 Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
8/29 09:23:08 Trying to update collector <150.162.60.140:9618>
8/29 09:23:08 Attempting to send update via UDP to collector labweb02.inf.ufsc.br <150.162.60.140:9618>
8/29 09:23:08 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
8/29 09:23:08 Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx


Thanks,

Vinicius