
Re: [HTCondor-users] Newbie startup question about configuring a simple condor pool



My security problem was solved by reviewing SpinningMatt's wonderful tutorial on setting up Condor Pools.  You need bi-directional ALLOW_WRITE access between head and workers.
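For anyone who hits the same PERMISSION DENIED error, here is a minimal sketch of what that access might look like in each host's local configuration. It assumes the hosts p51 through p58 from this thread live under a hypothetical domain example.com, and that the names listed match the identifiers the Collector log reports for the connecting host. Whether a given release also needs the ADVERTISE_* levels set separately is worth checking in the manual for that version.

```
# Local config on every host in the pool (head and workers alike).
# Assumption: all pool members resolve as <name>.example.com;
# substitute the names or addresses shown in your Collector log.

# READ: these hosts may query daemons on this host (condor_status, condor_q).
ALLOW_READ  = *.example.com

# WRITE: these hosts may send updates and jobs to daemons on this host.
# Listing the whole pool on every host gives access in both directions.
ALLOW_WRITE = *.example.com
```

After editing, running condor_reconfig on each host (or restarting the daemons) should make the new settings take effect.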

Thanks, Matt.


On Wed, Mar 20, 2013 at 9:52 AM, David Hentchel <dhentchel@xxxxxxxxx> wrote:
More information:


This is on Condor 7.6.10

Upon closer inspection, the Collector log on the master host seems to identify the problem:
03/20/13 07:40:18 PERMISSION DENIED to unauthenticated@unmapped from host xx.xx.xx.xxx for command 0 (UPDATE_STARTD_AD), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: xx.xx.xx.yyy,p52...

This is because I am running as my own login and not as root, so I clearly need to read the section on setting up security.

All of these hosts are safely behind our firewall, so it's not important to me to investigate security at this point.  Are there any shortcuts for disabling security, or setting some simple defaults that will get me running quickly?


Thanks to any and all,
Dave


On Wed, Mar 20, 2013 at 9:17 AM, David Hentchel <dhentchel@xxxxxxxxx> wrote:
I have a set of equivalent linux hosts (p51, p52, ... p58) that I want to configure as a Condor pool for running parallel jobs.

On p51, I install the manager and submit daemons, as follows:
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_install --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL --env-scripts-dir=$CONDOR_LOCAL --type=submit,execute,manager

Then I start up and all the daemons seem to run correctly.  I can submit a simple job and it gets dispatched and executed.

Then on host p52 I install the second member of the pool:
CONDOR_MGR=p51
CONDOR_LOCAL=/var/local/condor/`hostname -s`
$CONDOR_INSTALL/condor_configure --install=$CONDOR_INSTALL --install-dir=/var/local/condor --local-dir=$CONDOR_LOCAL  --env-scripts-dir=$CONDOR_LOCAL --type=execute --central-manager=$CONDOR_MGR

To set up parallel scheduling, I modify the /var/local/condor/etc/condor_config file to be:
COLLECTOR_NAME = NuoDB-DHentchel-p51
## Parallel scheduling groups
DedicatedScheduler      = p51
ParallelSchedulingGroup = P5
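
For comparison, the dedicated-scheduler setup described in the HTCondor manual has a somewhat different shape from the lines above: the values are quoted strings, the scheduler is named with the submit host's full name, and both attributes are published in the machine ad through STARTD_ATTRS. A sketch for the execute hosts, assuming a hypothetical full host name p51.example.com; the details should be verified against the manual for the version in use:

```
# Assumption: p51.example.com is the full name of the host running the schedd.
DedicatedScheduler      = "DedicatedScheduler@p51.example.com"
ParallelSchedulingGroup = "P5"

# Publish both attributes in each machine ad so the scheduler can see them.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler, ParallelSchedulingGroup
```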

Then I restart daemons on both machines. 

My assumption was that the --central-manager option would set up host p52 to be a slave to the manager and scheduler running on p51, as long as both hosts used the same COLLECTOR_NAME and scheduling group name. But condor_status on p51 shows only the p51 execute slots and nothing for p52.  When I submit a parallel universe job for 2 hosts, it gets queued but never dispatched, indicating the scheduler is unaware of the second host.

Is there something I'm overlooking in setting up the pool?  I searched FAQs and the doc, but is there some how-to that goes through the first-time setup of a pool of hosts?

Thanks,
dave





--

David Hentchel

Performance Engineer

www.nuodb.com

(617) 803 - 1193



