Re: [Condor-users] jobs fail to run, with "Warning: Found no submitters"



On Wed, Aug 17, 2005 at 10:29:14AM -0500, Alain Roy wrote:
> I have a bet...
> 
> >>
> >> in this case, you should change your HOSTALLOW_ settings in the config
> >> file to allow IPs from both inside and outside:
> >>
> >> HOSTALLOW_READ = 10.32.47.10 10.0.0.*
> >> HOSTALLOW_WRITE = 10.32.47.10 10.0.0.*
> 
> I bet Zach was wrong--probably the HOSTALLOW variables were set to * 
> already, right? Or something sufficiently open anyway.

That is correct.

> Condor assumes that DNS (or /etc/hosts) is set up so that for all machines 
> in your Condor pool:
> 
> 1) Names can be resolved to IP addresses (foo.example.com -> 128.135.20.10)
> 2) IP addresses can be resolved to names (128.135.20.10 -> foo.example.com)
> 
> Any machine in your pool is expected to be able to do this, not just the 
> submit node or the central manager.
>
> A common problem is that one or both of these doesn't work right. Condor 
> will check the HOSTALLOW settings, but it does some DNS lookups as part of 
> this to ensure that everything looks good. When the DNS lookups fail, it is 
> usually reported as a permission denied error. (Maybe we should have a 
> better error message!)
> 
> So this is a possible reason that you are getting permission denied errors.

I've definitely suspected that the problem may well be some sort of DNS issue,
but I'm not sure how that would happen here.  All machines are using the exact
same /etc/hosts file, which looks like this:

--------------------
127.0.0.1 localhost.localdomain localhost
10.0.0.1 zajos.cluster zajos
10.0.0.101 node1.cluster node1
10.0.0.102 node2.cluster node2
10.0.0.103 node3.cluster node3
--------------------

This should be sufficient for any machine to map names to IPs and vice versa.
However, there would obviously be a problem for any of the execution nodes
looking up a name or IP address that's not listed in the /etc/hosts file, such
as the IP address of the WAN interface on the head node (which in this case is
10.32.47.10).
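
(As a sanity check, a quick script like the following, run on each node, should
show both directions working; the hostnames are just the ones from the hosts
file above, and the gethostbyname/gethostbyaddr calls are, as far as I know,
the same lookups Condor ends up doing:)

--------------------
#!/usr/bin/env python
# Check forward and reverse resolution for every host in the pool, using
# plain gethostbyname/gethostbyaddr.  Names are the ones from my /etc/hosts.
import socket

names = ["zajos.cluster", "node1.cluster", "node2.cluster", "node3.cluster"]

for name in names:
    try:
        ip = socket.gethostbyname(name)                  # name -> IP
        back, aliases, addrs = socket.gethostbyaddr(ip)  # IP -> name
        print("%s -> %s -> %s" % (name, ip, back))
    except socket.error as e:
        print("lookup failed for %s: %s" % (name, e))
--------------------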

However, this doesn't seem to be the issue, since adding the WAN address
(10.32.47.10) to /etc/hosts on the clients doesn't help.
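
(For reference, the kind of entry I mean is something like the following; the
hostname is just a placeholder, it only needs to give the address a reverse
mapping:)

--------------------
10.32.47.10 zajos.external zajos-ext   # placeholder name for the WAN address
--------------------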

Why, though, would any of the client nodes ever see that address?  I can't
imagine any reason that the collector would send that address to the client node
when asking it to execute a job.  

I guess in the old model of host-based authentication, the execution machine
might need to know the submitter's address.

Another question: why would the collector see the submitted jobs as coming
from the WAN IP address on the head node, instead of the cluster interface?
The condor_config.local for the head node specifies that the daemons should
run on the cluster IP address (10.0.0.1).
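
(Concretely, something along these lines -- NETWORK_INTERFACE is the setting I
understand to control which address the daemons bind to and advertise on a
multi-homed machine; if there's a better way to pin them to the cluster
interface, I'd like to know:)

--------------------
# head node condor_config.local (sketch)
NETWORK_INTERFACE = 10.0.0.1
--------------------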

jamie.