
[Condor-users] JOB not executed on target machine - Condor 6.8




I am trying to run Condor on two Windows machines.

The Condor version is 6.8.0.


One machine is the Central Manager, which I also use to submit jobs; the other is just a worker.

Jobs submitted for execution on the Central Manager machine (the same machine that I submit the job from) execute perfectly.

When I submit a job that needs to run on the other machine, the job log shows the following:


001 (042.000.000) 07/26 09:58:29 Job executing on host: <>


022 (042.000.000) 07/26 09:58:29 Job disconnected, attempting to reconnect

    Socket between submit and execute hosts closed unexpectedly

    Trying to reconnect to vm2@xxxxxxxxxxxxxxxxxxx <>



The job never finishes executing.


The job is a Java job (using the java universe).


I stopped all the firewalls on both machines and put:



Both machines have full READ and WRITE access to all machines in the subnet.


I also see the following in the log on the machine that is supposed to execute the job:

7/26 10:09:41 Trying to query collector <>

7/26 10:09:50 condor_read(): recv() returned -1, errno = 10054, assuming failure.

7/26 10:09:50 IO: EOF reading packet header

7/26 10:09:50 Couldn't fetch ads: communication error

7/26 10:09:50 Aborting negotiation cycle
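
For reference, errno 10054 on Windows is the Winsock code WSAECONNRESET (connection reset by peer), so the remote side closed the connection. One basic check is whether a plain TCP connection between the two machines can be opened at all. Below is a minimal sketch of such a check in Python. It is not a Condor tool. The function name can_connect and the host name in the usage comment are hypothetical, and port 9618 is only an assumption (the usual default collector port); substitute whatever your configuration actually uses.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    This only shows that the port is reachable; it does not prove that
    the daemon on the other side will accept or authorize the request.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical host name, assumed collector port):
# print(can_connect("central-manager.example.com", 9618))
```

If this returns True from the worker to the Central Manager and the log still shows the reset, the connection is probably being accepted and then dropped, so the cause is more likely to lie in the Condor configuration than in basic network reachability.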


Any suggestions?