Re: [Condor-users] starter failed to connect to collector



On Oct 14, 2005, at 1:44 AM, DeVoil, Peter wrote:

I have a few machines in a 30 node winXP pool that refuse to start jobs.
I see these in the starter log:


...
10/14 14:59:38 vm2: Received match <192.168.0.162:1353>#4441711918
10/14 14:59:38 vm2: State change: match notification protocol successful
10/14 14:59:38 vm2: Changing state: Unclaimed -> Matched
10/14 15:01:38 vm2: State change: match timed out
10/14 15:01:38 vm2: Changing state: Matched -> Owner
10/14 15:01:38 vm2: State change: IS_OWNER is false
10/14 15:01:38 vm2: Changing state: Owner -> Unclaimed
...
10/14 15:04:50 DaemonCore: Command received via TCP from host <192.168.0.98:3484>
10/14 15:04:50 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
10/14 15:04:50 Error: can't find resource with capability (<192.168.0.162:1353>#4441711918)
....

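It appears the schedd that was matched to this startd took over 5 minutes to connect to it to start the job: the match arrived at 14:59:38, but REQUEST_CLAIM did not come in until 15:04:50. By then the startd had already timed out the match (at 15:01:38) and gone back to Unclaimed, so it no longer had a resource with that capability. We'd have to look at the schedd log to see why it took so long.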
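
The timing can be checked directly from the timestamps in the quoted log. Here is a minimal Python sketch, not a Condor tool: the helper names (parse_ts, first_ts) are made up for illustration, the log lines are copied from the excerpt above, and the year is an assumption taken from the email date because the log itself does not record one.

```python
from datetime import datetime

# Three lines copied from the log excerpt above.
LOG = """\
10/14 14:59:38 vm2: Received match <192.168.0.162:1353>#4441711918
10/14 15:01:38 vm2: State change: match timed out
10/14 15:04:50 DaemonCore: received command 442 (REQUEST_CLAIM), calling
"""

def parse_ts(line, year=2005):
    """Parse the leading 'MM/DD HH:MM:SS' of a log line (year is assumed)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{year}/{date_part} {time_part}",
                             "%Y/%m/%d %H:%M:%S")

def first_ts(log, needle):
    """Return the timestamp of the first line containing `needle`."""
    for line in log.splitlines():
        if needle in line:
            return parse_ts(line)
    raise ValueError(f"no log line contains {needle!r}")

matched = first_ts(LOG, "Received match")
timed_out = first_ts(LOG, "match timed out")
claimed = first_ts(LOG, "REQUEST_CLAIM")

# How long the startd waited in Matched state before giving up.
match_timeout_s = (timed_out - matched).total_seconds()
# How long the schedd took to send REQUEST_CLAIM after the match.
claim_delay_s = (claimed - matched).total_seconds()

print(f"startd gave up on the match after {match_timeout_s:.0f} s")
print(f"REQUEST_CLAIM arrived {claim_delay_s:.0f} s after the match")
```

Running the same comparison against the schedd log on the submit machine should show where those minutes went.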


+----------------------------------+---------------------------------+
|            Jaime Frey            |  Public Split on Whether        |
|        jfrey@xxxxxxxxxxx         |  Bush Is a Divider              |
|  http://www.cs.wisc.edu/~jfrey/  |         -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+