Re: [Condor-users] jobs vacating reason

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Date: Fri, 10 Dec 2010 10:44:12 -0500

From: Erik Aronesty <erik@xxxxxxx>

Subject: Re: [Condor-users] jobs vacating reason

Removing the entry in /etc/hosts that mapped "f0.<mydom>.local" to 127.0.0.1 on the schedd machine (which was also the collector/negotiator, so I'm not sure the problem is specific to the schedd) worked immediately: the ALIVEs started going through.

Apparently, the schedd (or perhaps the collector/negotiator) uses its own /etc/hosts to tell the startd on the compute server which IP to connect to for ALIVE pings? That seems rather backward: there are good reasons why the startd should use its own DNS (multi-segment networks, failover, etc.). (NO_DNS is false in my config.)
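For anyone hitting the same symptom, a quick sanity check on the schedd host is whether the machine's own fully qualified name resolves to a loopback address, as "f0.<mydom>.local" did here. Below is a minimal Python sketch, assuming Python 3.9+ and assuming the name Condor advertises matches socket.getfqdn(); Condor's own address selection may differ, so treat this as a rough diagnostic only. The helper names loopback_addresses and is_loopback are hypothetical, not part of Condor.

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """True if addr is in 127.0.0.0/8 or is ::1."""
    # Drop an IPv6 zone index (e.g. "fe80::1%eth0") before parsing.
    return ipaddress.ip_address(addr.split("%", 1)[0]).is_loopback


def loopback_addresses(hostname: str) -> list[str]:
    """Return the loopback addresses that hostname resolves to (via /etc/hosts or DNS)."""
    infos = socket.getaddrinfo(hostname, None)
    # Each entry's sockaddr tuple starts with the address string;
    # a set removes duplicates from the per-socket-type entries.
    addrs = {info[4][0] for info in infos}
    return sorted(a for a in addrs if is_loopback(a))


if __name__ == "__main__":
    host = socket.getfqdn()
    try:
        bad = loopback_addresses(host)
    except socket.gaierror as exc:
        print(f"could not resolve {host}: {exc}")
    else:
        if bad:
            print(f"WARNING: {host} resolves to loopback {bad}; "
                  "remote daemons may be told to connect there")
        else:
            print(f"OK: {host} does not resolve to a loopback address")
```

Run it on the schedd machine; a WARNING line points at the kind of /etc/hosts entry that had to be removed here.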

Anyway, it's fixed. I put my suspend options back on, and Condor works like a dream; it's already saving our a***es by letting us schedule massive compute jobs.

Thanks!
