
Re: [Condor-users] job_lease_duration / multiple nics

You can set NETWORK_INTERFACE = 10.x.x.x and BIND_ALL_INTERFACES = FALSE
in the condor_config.local of the submit host.
That might have other consequences, but it should be the first thing you try.
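
As a rough sketch, the relevant lines in condor_config.local might look like the following. The 10.x.x.x value is a placeholder for the submit host's private address, not a real setting, and the comments are my reading of what the two knobs do:

```
## condor_config.local on the submit host (sketch, values are placeholders)

## Use the private-network address for Condor traffic.
## Replace 10.x.x.x with the submit host's actual private IP.
NETWORK_INTERFACE = 10.x.x.x

## Bind only to the interface named above instead of all interfaces.
BIND_ALL_INTERFACES = FALSE
```

You can check what the daemons will read with condor_config_val NETWORK_INTERFACE. Restarting the Condor daemons on the submit host afterwards is probably the safe way to make sure the change takes effect.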

Steve Timm

On Fri, 16 Mar 2012, Shrum, Donald C wrote:

I have a dedicated condor cluster here at FSU. Occasionally jobs are evicted at the 20-minute mark. Setting job_lease_duration in the submit file resolves the problem.
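
For illustration, a submit file using this workaround might look like the sketch below. The executable name and the lease value are hypothetical placeholders, not the settings actually used here; job_lease_duration is given in seconds.

```
# Sketch of a submit description file; names and values are placeholders
universe           = vanilla
executable         = my_job.sh

# Lease length in seconds (hypothetical value, here two hours)
job_lease_duration = 7200

queue
```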

My submit nodes have both a public and a private interface.  Communication with condor processing nodes occurs over the private network and the central manager is on the same private network.
##  What machine is your central manager?

Looking at the processing node logs (StartLog) I see the following -

03/16/12 12:21:05 slot5: Remote owner is dcshrum@xxxxxxxxxxxxxxxxx
03/16/12 12:21:05 slot5: State change: claiming protocol successful
03/16/12 12:21:05 slot5: Changing state: Matched -> Claimed
03/16/12 12:21:05 slot4: Got activate_claim request from shadow (<>)
03/16/12 12:21:05 slot4: Remote job ID is 9184.0
03/16/12 12:21:05 slot4: Got universe "VANILLA" (5) from request classad
03/16/12 12:21:05 slot4: State change: claim-activation protocol successful
03/16/12 12:21:05 slot4: Changing activity: Idle -> Busy
03/16/12 12:23:59 attempt to connect to <> failed: Connection timed out (connect errno = 110).  Will keep trying for 597 total seconds (576 to go).

and are the IPs on the submit node.  It appears that communication is fine over to accept the job.  I presume the failure to communicate back on is my problem.

From this I have two questions -
1 - I'm not sure how to force condor to ignore the public IP address on the submit node.
2 - I thought the lease renewal occurred when the submit node contacted the processing node, not the other way around as my log seems to indicate.


Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.