
Re: [Condor-users] Job run time limit ?



* do you have space in the spool directory?
Yes, a lot

* is the job consuming all available memory and swap?
* will your program run to completion outside of Condor?
No, and yes: if the problem were with the job, there would be something in the stdout file. There is nothing.

* has your execute machine logged any errors?
Yes, in the StartLog: it received a "RELEASE_CLAIM" command from the central manager (what is that?) and afterwards lost the connection with it. This happened for both virtual machines.
Any idea what could cause this?

9/12 09:10:40 DaemonCore: Command received via UDP from host <172.18.45.80:64684>
9/12 09:10:40 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/12 09:10:40 vm1: State change: received RELEASE_CLAIM command
9/12 09:10:40 vm1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
9/12 09:10:40 Can't connect to <192.168.1.15:49252>:0, errno = 61
9/12 09:10:40 Will keep trying for 10 seconds...
9/12 09:10:50 Connect failed for 10 seconds; returning FALSE
9/12 09:10:50 ERROR:
SECMAN:2003:TCP connection to <192.168.1.15:49252> failed

9/12 09:10:50 Send_Signal: ERROR Connect to <192.168.1.15:49252> failed.
9/12 09:10:50 vm1: Error sending signal to starter, errno = 22 (Unknown error: 0)
9/12 09:10:52 vm1: State change: Error sending signals to starter
9/12 09:10:52 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
9/12 09:10:52 vm1: State change: IS_OWNER is false
9/12 09:10:52 vm1: Changing state: Owner -> Unclaimed
9/12 09:10:53 State change: RunBenchmarks is TRUE
9/12 09:10:53 vm1: Changing activity: Idle -> Benchmarking
9/12 09:10:57 State change: benchmarks completed
9/12 09:10:57 vm1: Changing activity: Benchmarking -> Idle
9/12 09:10:57 DaemonCore: Command received via UDP from host <172.18.45.80:64685>
9/12 09:10:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/12 09:10:57 Error: can't find resource with capability (<192.168.1.15:49234>#9327191778)
9/12 09:10:57 Starter pid 454 died on signal 10 (signal 10)
9/12 09:14:33 Starter pid 490 died on signal 10 (signal 10)
9/12 09:14:35 vm2: State change: starter exited
9/12 09:14:36 vm2: Changing activity: Busy -> Idle
9/12 09:14:36 DaemonCore: Command received via UDP from host <172.18.45.80:64725>
9/12 09:14:36 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/12 09:14:36 vm2: State change: received RELEASE_CLAIM command
9/12 09:14:36 vm2: Changing state and activity: Claimed/Idle -> Preempting/Vacating
9/12 09:14:36 vm2: State change: No preempting claim, returning to owner
9/12 09:14:36 vm2: Changing state and activity: Preempting/Vacating -> Owner/Idle
9/12 09:14:36 vm2: State change: IS_OWNER is false
9/12 09:14:36 vm2: Changing state: Owner -> Unclaimed
9/12 09:14:36 DaemonCore: Command received via UDP from host <172.18.45.80:64726>
9/12 09:14:36 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
9/12 09:14:36 Error: can't find resource with capability (<192.168.1.15:49234>#1160262051)
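When a StartLog interleaves several virtual machines like this, it can help to pull out just the per-slot state/activity transitions to see the sequence (Claimed/Busy -> Preempting/Vacating -> Owner/Idle -> Unclaimed) that followed each RELEASE_CLAIM. The following is only a minimal sketch, assuming the "Changing state/activity" line format seen in the excerpt above; the function name `transitions` is hypothetical and not part of any Condor tool.

```python
import re
from collections import defaultdict

# Assumed StartLog line format (taken from the excerpt above), e.g.:
#   9/12 09:10:40 vm1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
#   9/12 09:14:36 vm2: Changing state: Owner -> Unclaimed
#   9/12 09:10:57 vm1: Changing activity: Benchmarking -> Idle
TRANSITION = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<slot>vm\d+): Changing (?P<kind>state and activity|state|activity): "
    r"(?P<old>\S+) -> (?P<new>\S+)$"
)


def transitions(lines):
    """Hypothetical helper: group (time, old, new) transitions by vm slot."""
    result = defaultdict(list)
    for line in lines:
        m = TRANSITION.match(line.strip())
        if m:  # lines that are not transitions (DaemonCore, errors) are skipped
            result[m["slot"]].append((m["time"], m["old"], m["new"]))
    return dict(result)
```

Feeding it the StartLog lines (for example `transitions(open("StartLog"))`) lists, per slot, the transitions in the order they were logged.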