
[Condor-users] Jobs remain idle long time!



Hi there,


I cannot get jobs to run in Condor; they remain idle almost all the time. I am
a new Condor user. I installed Condor on a single machine, and this machine is
my entire pool (central manager, submit, and execute). I submitted some jobs
and the StartLog shows this error:

8/19 22:15:11 ******************************************************
8/19 22:15:11 ** condor_startd (CONDOR_STARTD) STARTING UP
8/19 22:15:11 ** /usr/local/condor/sbin/condor_startd
8/19 22:15:11 ** $CondorVersion: 6.7.10 Aug  3 2005 $
8/19 22:15:11 ** $CondorPlatform: I386-LINUX_RH9 $
8/19 22:15:11 ** PID = 10298
8/19 22:15:11 ******************************************************
8/19 22:15:11 Using config file: /etc/condor/condor_config
8/19 22:15:11 Using local config files:
/usr/local/condor/local.labweb02/condor_config.local
8/19 22:15:11 DaemonCore: Command Socket at <150.162.60.140:35955>
8/19 22:15:20 New machine resource allocated
8/19 22:15:20 About to run initial benchmarks.
8/19 22:15:24 Completed initial benchmarks.
8/19 22:15:24 State change: IS_OWNER is false
8/19 22:15:24 Changing state: Owner -> Unclaimed
8/19 22:20:24 State change: RunBenchmarks is TRUE
8/19 22:20:24 Changing activity: Idle -> Benchmarking
8/19 22:20:28 State change: benchmarks completed
8/19 22:20:28 Changing activity: Benchmarking -> Idle
8/19 22:25:28 State change: RunBenchmarks is TRUE
8/19 22:25:28 Changing activity: Idle -> Benchmarking
8/19 22:25:32 State change: benchmarks completed
8/19 22:25:32 Changing activity: Benchmarking -> Idle
8/19 22:29:12 DaemonCore: Command received via UDP from host
<150.162.60.140:32811>
8/19 22:29:12 DaemonCore: received command 440 (MATCH_INFO), calling handler
(command_match_info)
8/19 22:29:12 match_info called
8/19 22:29:12 Received match <150.162.60.140:35955>#1124500511#1
8/19 22:29:12 State change: match notification protocol successful
8/19 22:29:12 Changing state: Unclaimed -> Matched
8/19 22:29:12 DaemonCore: Command received via TCP from host
<150.162.60.140:35981>
8/19 22:29:12 DaemonCore: received command 442 (REQUEST_CLAIM), calling
handler (command_request_claim)
8/19 22:29:12 Request accepted.
8/19 22:29:12 Remote owner is condor@xxxxxxxxxxxxxxxxxxxx
8/19 22:29:12 State change: claiming protocol successful
8/19 22:29:12 Changing state: Matched -> Claimed
8/19 22:29:14 DaemonCore: Command received via TCP from host
<150.162.60.140:35982>
8/19 22:29:14 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling
handler (command_activate_claim)
8/19 22:29:14 Got activate_claim request from shadow
(<150.162.60.140:35982>)
8/19 22:29:14 Remote job ID is 1.0
8/19 22:29:14 exec_starter( labweb02.inf.ufsc.br, 10, 11 ) : pid 10367
8/19 22:29:14 execl(/usr/local/condor/sbin/condor_starter.std,
"condor_starter", labweb02.inf.ufsc.br, 0)
8/19 22:29:14 Got universe "STANDARD" (1) from request classad
8/19 22:29:14 State change: claim-activation protocol successful
8/19 22:29:14 Changing activity: Idle -> Busy
8/19 22:29:17 DaemonCore: Command received via TCP from host
<150.162.60.140:35989>
8/19 22:29:17 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY),
calling handler (command_handler)
8/19 22:29:17 Called deactivate_claim_forcibly()
8/19 22:29:17 Starter pid 10367 exited with status 0
8/19 22:29:17 State change: starter exited
8/19 22:29:17 Changing activity: Busy -> Idle
8/19 22:29:17 DaemonCore: Command received via UDP from host
<150.162.60.140:32811>
8/19 22:29:17 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler (command_release_claim)
8/19 22:29:17 State change: received RELEASE_CLAIM command
8/19 22:29:17 Changing state and activity: Claimed/Idle ->
Preempting/Vacating
8/19 22:29:17 State change: No preempting claim, returning to owner
8/19 22:29:17 Changing state and activity: Preempting/Vacating -> Owner/Idle
8/19 22:29:17 State change: IS_OWNER is false
8/19 22:29:17 Changing state: Owner -> Unclaimed
8/19 22:29:17 DaemonCore: Command received via UDP from host
<150.162.60.140:32811>
8/19 22:29:17 DaemonCore: received command 443 (RELEASE_CLAIM), calling
handler (command_release_claim)
============> 8/19 22:29:17 Error: can't find resource with
ClaimId(<150.162.60.140:35955>#1124500511#1 <=================
8/19 22:29:19 State change: RunBenchmarks is TRUE
8/19 22:29:19 Changing activity: Idle -> Benchmarking
8/19 22:29:27 State change: benchmarks completed
8/19 22:29:27 Changing activity: Benchmarking -> Idle
8/19 22:29:39 DaemonCore: Command received via UDP from host
<150.162.60.140:32812>
8/19 22:29:39 DaemonCore: received command 440 (MATCH_INFO), calling handler
(command_match_info)
8/19 22:29:39 match_info called
8/19 22:29:39 Received match <150.162.60.140:35955>#1124500511#3
8/19 22:29:39 State change: match notification protocol successful
8/19 22:29:39 Changing state: Unclaimed -> Matched

Does anyone know why my jobs remain idle for such a long time? Is it related
to this error in the StartLog?
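
For anyone who wants to follow the sequence in the log above without reading
every line, here is a minimal sketch in Python that pulls out only the
state/activity transitions and the error lines. It is not a Condor tool. The
names TRANSITION and summarize are made up for this example, and it assumes
each log entry sits on one line, as it would in the StartLog file itself (the
mail client wrapped some of the lines pasted above).

```python
import re

# Hypothetical helper, not part of Condor. Matches StartLog lines such as
#   "8/19 22:29:12 Changing state: Unclaimed -> Matched"
#   "8/19 22:29:17 Changing state and activity: Claimed/Idle -> Preempting/Vacating"
TRANSITION = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) "
    r"Changing (?P<kind>state and activity|state|activity): "
    r"(?P<old>\S+) -> (?P<new>\S+)"
)


def summarize(lines):
    """Return (transitions, errors) found in an iterable of log lines.

    transitions is a list of (timestamp, kind, old, new) tuples;
    errors is a list of the raw lines that contain "Error:".
    Assumes one log entry per line (no mail-client wrapping).
    """
    transitions, errors = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = TRANSITION.match(line)
        if match:
            transitions.append(
                (match["stamp"], match["kind"], match["old"], match["new"])
            )
        elif "Error:" in line:
            errors.append(line)
    return transitions, errors


# A few entries taken from the log above, with wrapped lines rejoined.
sample = [
    "8/19 22:29:12 Changing state: Unclaimed -> Matched",
    "8/19 22:29:12 Changing state: Matched -> Claimed",
    "8/19 22:29:14 Changing activity: Idle -> Busy",
    "8/19 22:29:17 Changing activity: Busy -> Idle",
    "8/19 22:29:17 Changing state and activity: Claimed/Idle -> Preempting/Vacating",
    "8/19 22:29:17 Error: can't find resource with "
    "ClaimId(<150.162.60.140:35955>#1124500511#1",
]

if __name__ == "__main__":
    transitions, errors = summarize(sample)
    for stamp, kind, old, new in transitions:
        print(stamp, kind, old, "->", new)
    for line in errors:
        print("ERROR LINE:", line)
```

To run it on a real log, pass an open file instead of the sample list, for
example summarize(open(path)) where path points at the StartLog (the location
depends on the local configuration).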



Thanks,


Vinicius