
[Condor-users] condor lease duration not working??



Machines in my pool fail to notify the Condor collector that their tasks have finished, so they remain in the Busy state even though the job completed successfully and there is no CPU utilization. Eventually every CPU in my pool becomes permanently busy. I even set job_lease_duration = 400 in my submit file, but this does not return the CPUs to the pool. Below is the error from one of the starter.log files.
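
For reference, here is a minimal sketch of the kind of submit file in question. It assumes a vanilla-universe job with Condor's file transfer turned on, since the log below shows a FileTransfer step. The executable name my_task.exe is a hypothetical placeholder, not the actual job:

```
# Hypothetical sketch: vanilla-universe job with file transfer (assumed)
universe                = vanilla
executable              = my_task.exe
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Lease, in seconds, that the starter waits for the shadow to reconnect
job_lease_duration      = 400
queue
```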

Any ideas?

The Condor master server (collector/negotiator) runs Condor 6.7.8 on DEC Alpha, Red Hat 7.2.
The startd machines run Condor 6.7.8 on Windows XP Pro.


5/16 10:12:57 Create_Process succeeded, pid=3368
5/16 10:13:23 Process exited, pid=3368, status=0
5/16 10:13:47 getpeername failed so connect must have failed
5/16 10:14:12 Connect failed for 30 seconds; returning FALSE
5/16 10:14:12 FileTransfer: Unable to connect to server <192.168.0.3:9635>
5/16 10:14:12 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt