[Condor-users] Other machines don't accept jobs!



Hi all,

I have 3 machines in my pool: node1 (central manager), node2, and node3
(all of them can both submit and execute jobs).
Here's what happened:

Jobs submitted from node3 can be executed on all 3 machines.

Jobs submitted from node1 can NOT be executed on node3, but run fine on
node1 and node2.

Jobs submitted from node2 can NOT be executed on node3 either, but run
fine on node1 and node2.


The log file contains the following:
###########################################
022 (067.000.000) 10/12 21:58:15 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to node3
<10.0.40.112:32772>
...
024 (067.000.000) 10/12 21:58:15 Job reconnection failed
Job not found at execution machine
Can not reconnect to node3, rescheduling job
###########################################
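
In case it helps with triage, here is a minimal Python sketch that pulls the
disconnect (022) and reconnection-failure (024) events out of a log like the
excerpt above. The names parse_events and EVENT_RE, and the regular expression
itself, are my own assumptions based only on the lines shown here; they are not
part of Condor, and the real log format may have variations this does not handle.

```python
import re

# Event header as it appears in the excerpt, e.g.
# "022 (067.000.000) 10/12 21:58:15 Job disconnected, attempting to reconnect"
# (pattern inferred from the excerpt only; an assumption, not a Condor spec)
EVENT_RE = re.compile(
    r"^(\d{3})\s+\((\d+\.\d+\.\d+)\)\s+(\d\d/\d\d)\s+(\d\d:\d\d:\d\d)\s+(.*)$"
)

SAMPLE_LOG = """\
022 (067.000.000) 10/12 21:58:15 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to node3
<10.0.40.112:32772>
...
024 (067.000.000) 10/12 21:58:15 Job reconnection failed
Job not found at execution machine
Can not reconnect to node3, rescheduling job
"""


def parse_events(text):
    """Split log text into events: a header line plus its detail lines."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = EVENT_RE.match(line)
        if match:
            code, job, date, time, summary = match.groups()
            current = {
                "code": code,
                "job": job,
                "date": date,
                "time": time,
                "summary": summary,
                "details": [],
            }
            events.append(current)
        elif line == "...":
            # "..." closes the current event in the excerpt
            current = None
        elif current is not None and line and not line.startswith("#"):
            current["details"].append(line)
    return events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_LOG):
        # Only report the disconnect / reconnect-failure events
        if event["code"] in ("022", "024"):
            print(event["job"], event["time"], event["summary"])
            for detail in event["details"]:
                print("    " + detail)
```

Running it over the full log from each submit machine should make it easier to
see whether every failure involves node3.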

condor_q -analyze shows that:
###########################
node3 match the job but reject for unknown reasons
############################

I would very much appreciate your help.

Leo