
Re: [Condor-users] Other machines don't accept jobs!



I just figured it out: I have to specify UID_DOMAIN and
FILESYSTEM_DOMAIN for the machines in the local config file. How does this
work? It worked for me even though I DON'T have a shared filesystem! Can
anybody explain why?
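
A minimal sketch of what such a local config entry might look like, assuming
a placeholder domain name (example.org is hypothetical, not a value from this
thread) set identically on every machine in the pool:

```
# condor_config.local on each node
# example.org is a placeholder; substitute the value that fits your pool
UID_DOMAIN = example.org
FILESYSTEM_DOMAIN = example.org
```

After editing the file, running condor_reconfig on each node should make the
daemons reread it, and condor_config_val UID_DOMAIN can be used to check the
value actually in effect.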


Leo

> Hi all,
>
> I have 3 machines in my pool: node1 (central manager), node2 and node3
> (all can execute and submit jobs).
> Here's what happened:
>
> jobs submitted from node3 can be executed on all 3 machines.
>
> jobs submitted from node1 can NOT be executed on node3, but are OK on
> node1 and node2.
>
> jobs submitted from node2 can NOT be executed on node3, but are also OK
> on node1 and node2.
>
>
> Log file has the following:
> ###########################################
> 022 (067.000.000) 10/12 21:58:15 Job disconnected, attempting to reconnect
> Socket between submit and execute hosts closed unexpectedly
> Trying to reconnect to node3
> <10.0.40.112:32772>
> ...
> 024 (067.000.000) 10/12 21:58:15 Job reconnection failed
> Job not found at execution machine
> Can not reconnect to node3, rescheduling job
> ###########################################
>
> condor_q -analyze: (shows that...)
> ###########################
> node3 matches the job but rejects it for unknown reasons
> ############################
>
> I would very much appreciate your help.
>
> Leo
>
>