[Condor-users] condor lease duration not working??
- Date: Mon, 16 May 2005 20:51:54 -0700
- From: John Wheez <john@xxxxxxxxxx>
- Subject: [Condor-users] condor lease duration not working??
Machines in my pool fail to notify the Condor collector that their tasks
have finished. They remain in a busy state even though the job finished
successfully and there is no CPU utilization. Eventually every CPU in my
pool becomes permanently busy. I even set job_lease_duration = 400
in my submit file, but this does not return the CPUs to the
pool. Below is the error from one of the starter.log files.
The Condor master server (collector/negotiator) runs Condor 6.7.8 on
DEC Alpha, Red Hat 7.2.
The Condor startd machines run Condor 6.7.8 on Windows XP Pro.
5/16 10:12:57 Create_Process succeeded, pid=3368
5/16 10:13:23 Process exited, pid=3368, status=0
5/16 10:13:47 getpeername failed so connect must have failed
5/16 10:14:12 Connect failed for 30 seconds; returning FALSE
5/16 10:14:12 FileTransfer: Unable to connect to server <192.168.0.3:9635>
5/16 10:14:12 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
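
The FileTransfer line in the log shows that the starter could not open a TCP connection to <192.168.0.3:9635>. A minimal sketch for checking basic TCP reachability from an execute machine follows, assuming Python is available there. The helper name can_connect is hypothetical and is not part of Condor. The port in the log may no longer be listening by the time the check is run, so a failure is only indicative.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the context manager closes the socket again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling can_connect("192.168.0.3", 9635) from one of the Windows XP machines would show whether that address is reachable at all. A False result suggests looking at firewalls or routing between the execute and submit machines before looking at the lease settings.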