Re: [Condor-users] JOB not executed on target machine

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

All,

More information on the message below:

After further investigation, I also found the following in the StartLog:

7/26 12:11:23 Starter pid 2736 died on signal -1073741819 (exception ACCESS_VIOLATION)
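
For anyone reading this later: the negative "signal" number in that line looks like a Windows NTSTATUS code printed as a signed 32-bit integer. A minimal Python sketch (the helper name to_ntstatus is made up for illustration, not part of Condor) reinterprets it as unsigned, which gives the usual ACCESS_VIOLATION code:

```python
def to_ntstatus(signed_code: int) -> str:
    """Reinterpret a signed 32-bit value as an unsigned NTSTATUS hex string."""
    # Masking with 0xFFFFFFFF maps negative values to their two's-complement form.
    return f"0x{signed_code & 0xFFFFFFFF:08X}"


code = -1073741819  # value copied from the StartLog line above
print(to_ntstatus(code))  # prints 0xC0000005 (STATUS_ACCESS_VIOLATION)
```

This only decodes the number; it does not say which component of the starter triggered the violation.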

I granted permissions to the Everyone and Users groups on both the Java installation directory and the Condor directory, but I still get the same error.

Any suggestions?

Thanks, Ronen.

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ronen Yaari
Sent: Wednesday, July 26, 2006 10:10 AM
To: Condor-Users Mail List
Subject: [Condor-users] JOB not executed on target machine - Condor 6.8

All,

I am trying to run Condor on two Windows machines.

The Condor version is 6.8.0.

One machine is the Central Manager, which I also use to submit jobs; the other is just a worker.

Jobs submitted for execution on the Central Manager machine (the same machine that I submit the job from) execute perfectly.

When I submit a job that needs to be executed on the other machine, I get the following in the job log:

001 (042.000.000) 07/26 09:58:29 Job executing on host: <192.168.16.37:4195>

...

022 (042.000.000) 07/26 09:58:29 Job disconnected, attempting to reconnect

Socket between submit and execute hosts closed unexpectedly

Trying to reconnect to vm2@xxxxxxxxxxxxxxxxxxx <192.168.16.37:4195>

...

And the Job never finishes execution.
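
To follow such events across a longer job log, the event lines quoted above can be picked apart with a small script. This is only a sketch that assumes every event header looks like the two lines shown here (three-digit event number, job ID in parentheses, date, time, message); the names EVENT_RE and parse_event are hypothetical helpers, not Condor tools:

```python
import re

# Pattern inferred from the two event lines quoted in this message (an assumption).
EVENT_RE = re.compile(
    r"^(?P<event>\d{3}) "
    r"\((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<subproc>\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<message>.*)$"
)


def parse_event(line):
    """Return a dict for an event header line, or None for any other line."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared and sorted.
    for key in ("event", "cluster", "proc", "subproc"):
        fields[key] = int(fields[key])
    return fields


sample = "022 (042.000.000) 07/26 09:58:29 Job disconnected, attempting to reconnect"
print(parse_event(sample))
```

Lines such as the "..." separators and the indented detail text do not match the pattern and come back as None, so they can simply be skipped.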

The Job is a Java job (using java universe).

I stopped all the firewalls on both machines and set the following in the Condor configuration:

ADD_WINDOWS_FIREWALL_EXCEPTION = false

Both machines have full READ and WRITE access to all machines in the subnet.

I also see the following in the log on the machine that should execute the job:

7/26 10:09:41 Trying to query collector <192.168.16.23:9618>

7/26 10:09:50 condor_read(): recv() returned -1, errno = 10054, assuming failure.

7/26 10:09:50 IO: EOF reading packet header

7/26 10:09:50 Couldn't fetch ads: communication error

7/26 10:09:50 Aborting negotiation cycle
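
As a reading aid for the errno in the condor_read() line: on Windows, 10054 is the Winsock code WSAECONNRESET (connection reset by peer). The sketch below pulls the number out of such a line and names it; the small table and the helper describe_errno are illustrative assumptions and cover only three common codes:

```python
import re

# A few well-known Winsock error codes; not an exhaustive list.
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET",    # connection reset by peer
    10060: "WSAETIMEDOUT",     # connection timed out
    10061: "WSAECONNREFUSED",  # connection refused
}


def describe_errno(line):
    """Return (code, name) for an 'errno = N' fragment, or None if absent."""
    match = re.search(r"errno = (\d+)", line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WINSOCK_ERRORS.get(code, "unknown")


log_line = "7/26 10:09:50 condor_read(): recv() returned -1, errno = 10054, assuming failure."
print(describe_errno(log_line))  # prints (10054, 'WSAECONNRESET')
```

A reset means the remote end closed the connection abruptly; the code alone does not tell why it did so.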

Any suggestions?

Thanks,

Ronen
