Re: [HTCondor-users] condor head node connection to other node



Jim,

Andrew had a good list of suggestions (slightly edited below). I'd
consider the most likely candidates to be:

> On the execute nodes:
> * CONDOR_HOST – is it pointing to your “head node” / admin machine?
> * DAEMON_LIST – does it include MASTER STARTD KBDD?
> On your admin machine check the following:
> * ALLOW_READ – this should cover the machines that you want to join your pool
> * ALLOW_WRITE – this should cover the machines that you want to join your pool

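If I had to guess, I'd say that each node is probably set up to run
its own pool (that is, CONDOR_HOST points at the node itself and
DAEMON_LIST includes the collector and negotiator). Fixing CONDOR_HOST
and DAEMON_LIST on those machines will probably resolve your issue.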
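
As a rough sketch of what that fix might look like (head.example.com
and *.example.com are placeholders, not taken from your setup, and the
daemon lists assume the head node is also where jobs get submitted):

```
# --- On each execute node, e.g. in condor_config.local ---
# head.example.com is a placeholder for your head node's hostname
CONDOR_HOST = head.example.com
# Execute nodes need only the master and startd. Add KBDD if you want
# keyboard/mouse activity detected on desktop machines.
DAEMON_LIST = MASTER, STARTD

# --- On the head node (central manager) ---
# Assumes the head node also runs the schedd for job submission
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
# *.example.com is a placeholder covering the machines joining the pool
ALLOW_READ  = *.example.com
ALLOW_WRITE = *.example.com
```

Running condor_config_val CONDOR_HOST on an execute node shows the
value it is actually using. After restarting HTCondor on each machine,
condor_status on the head node should list the execute nodes if they
have joined the pool.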

Along the same lines, you should consider how you want to manage
configuration files going forward, because you'll want to make changes
in the future. There are several ways to do this:

* A dedicated tool (e.g. CycleServer[1] or Wallaby[2])
* Whatever you use for managing other parts of system configuration
(Chef/Puppet/etc)
* A common configuration file on a shared filesystem
* Manually making changes everywhere as needed (a painful prospect)

[1] http://www.cyclecomputing.com/products-solutions/cycleserver/
[2] http://getwallaby.com/
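
For the shared-filesystem option, one possible arrangement is to have
each node's own config do little more than point at the common file.
A minimal sketch, assuming the shared filesystem is mounted at the same
path on every node (the path below is hypothetical):

```
# In each node's local condor_config
# /shared/condor/condor_config.pool is a hypothetical path on the shared filesystem
LOCAL_CONFIG_FILE = /shared/condor/condor_config.pool
```

Pool-wide settings such as CONDOR_HOST, ALLOW_READ, and ALLOW_WRITE
would then live in that one file. LOCAL_CONFIG_FILE accepts a list, so
a per-node file can still follow the shared one if some machines need
overrides.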


-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing