
Re: [HTCondor-users] Failed to locate startd - Can't find address for startd



Hi Todd.

Thanks for this. The key seems to be that my master and worker config files did not contain:

CONDOR_HOST =

The strange thing, though, is that Condor had been working just fine early last year with these exact config files under an earlier version. The rest of what you mentioned is in fact exactly how I had set the system up.

Anyway, thanks again for your help; everything is now working as it should.

--
Kind regards,

Justin Fisher.


On Mon, Sep 9, 2019 at 10:03 PM Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 9/9/2019 10:03 AM, Justin Fisher wrote:
> Hi.
>
> I'm wondering if anyone can help me, please? My condor_status comes up
> empty. Here is some relevant information:
>

Hi Justin,

It would help if you told us what you are trying to do...

Assuming you are trying to set up a two-server pool where one machine is a worker node and the other machine is the central manager + submit node, some pithy suggestions:

0. Install via the RPMs as shown here:

  https://research.cs.wisc.edu/htcondor/instructions/el/7/stable/

1. Do not make any edits to /etc/condor/condor_config

2. Have /etc/hosts (or a DNS config) set up properly for both forward and reverse address lookups if you want to use host names instead of addresses. For instance, include the following in the /etc/hosts file on each node:

  192.168.1.206 manager.jfisher.ingenazure.com manager
  192.168.1.207 worker1.jfisher.ingenazure.com worker1

3. On your worker node(s), in a file in /etc/condor/config.d, have something like

  DAEMON_LIST = MASTER STARTD
  CONDOR_HOST = manager.jfisher.ingenazure.com
  ALLOW_WRITE = 192.168.1.*

4. On your central manager plus submit node, in a file in /etc/condor/config.d, have something like

  DAEMON_LIST = MASTER COLLECTOR NEGOTIATOR STARTD
  CONDOR_HOST = manager.jfisher.ingenazure.com
  ALLOW_WRITE = 192.168.1.*

You probably want to tighten down the ALLOW_WRITE above if you do not trust all hosts on the 192.168.1.* network... the above is just something to help you get going, not a production config!
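
As a rough sketch of what tightening might look like, assuming the pool consists only of the two hosts from the /etc/hosts example above (adjust the list to whatever machines you actually trust), ALLOW_WRITE could name them explicitly instead of the whole subnet:

  # Illustrative only: grant write access to the two known pool members
  # rather than to every host on 192.168.1.*
  ALLOW_WRITE = 192.168.1.206, 192.168.1.207

The same line would go in the config.d file on both the manager and the worker, and any worker added later would need to be appended to the list.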

Besides the Manual, this (old) blog post may help you a bit:
 https://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/

Also, I see "Azure" in your host... perhaps of interest, if you are using Azure, the Azure CycleCloud
tool supports spinning up an HTCondor pool in Azure with some points-and-clicks. See

 https://docs.microsoft.com/en-us/azure/cyclecloud/overview

Hope the above helps
Todd