Re: [Condor-users] jobs fail to run, with "Warning: Found no submitters"
- Date: Wed, 17 Aug 2005 11:38:07 -0500
- From: Alain Roy <roy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] jobs fail to run, with "Warning: Found no submitters"
This should be sufficient for any machine to map names to IPs and vice versa.
However, there would obviously be a problem for any execution node
looking up a name or IP address that is not listed in its /etc/hosts file,
for instance the IP address of the WAN interface on the head node
(which in this case is 10.32.47.10).
Did you notice that the permission denied was for that address?
8/16 14:45:19 (Sending 15 ads in response to query)
8/16 14:45:19 DaemonCore: PERMISSION DENIED to unknown user from host
<10.32.47.10:45781> for command 10 (QUERY_STARTD_PVT_ADS)
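
For anyone scanning a longer log for the same symptom, here is a minimal Python sketch that pulls the denied address and command out of lines like the one above. The pattern is inferred only from the single entry quoted here, so treat it as an assumption about the log format rather than a description of what Condor always writes. The function name denied_hosts is hypothetical.

```python
import re

# Pattern inferred from the one log entry quoted in this thread (assumption).
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host\s+"
    r"<(?P<ip>[\d.]+):(?P<port>\d+)>\s+"
    r"for command (?P<num>\d+) \((?P<cmd>\w+)\)"
)

def denied_hosts(log_text):
    """Return (ip, command_name) pairs for each PERMISSION DENIED entry."""
    # Collapse whitespace so entries wrapped by a mail client still match.
    flat = " ".join(log_text.split())
    return [(m["ip"], m["cmd"]) for m in DENIED.finditer(flat)]
```

Run against the excerpt above, this should report 10.32.47.10 together with QUERY_STARTD_PVT_ADS, which makes it easy to see whether every denial comes from the same interface.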
Why, though, would any of the client nodes ever see that address? I can't
imagine any reason that the collector would send that address to the
client node when asking it to execute a job.
The collector/negotiator talk to the execution node on more than one
occasion. For instance, when a match has been made, both the submitter and
executor are notified. I'm not sure what QUERY_STARTD_PVT_ADS is for, but I
guess it happens.
However, this doesn't seem to be the issue, since adding this address to the
/etc/hosts for the client doesn't seem to help.
Can you be more specific? Are you getting exactly the same error message, or
just the same general problem (jobs don't run)?
Can you use nslookup (or your favorite tool) to verify that you've got the
lookups working correctly on both the clients and the central manager and
the submitter node?
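
If nslookup is not convenient, a minimal Python sketch of the same check follows. It resolves a name to an address and then the address back to a name, using the system resolver. Note one difference: the socket calls normally consult /etc/hosts, whereas nslookup queries DNS directly, so the two can disagree. The function name check_lookup and the injectable forward/reverse parameters are illustrative choices, not part of any Condor tool.

```python
import socket

def check_lookup(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report what each step returned.

    forward and reverse default to the system resolver; they are
    parameters only so the function can be exercised without a network.
    """
    result = {"name": name, "ip": None, "reverse_name": None, "error": None}
    try:
        result["ip"] = forward(name)
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError as exc:  # socket.gaierror and socket.herror derive from OSError
        result["error"] = str(exc)
    return result
```

Calling check_lookup("10.32.47.10") and check_lookup("10.0.0.1") on each client, on the central manager, and on the submit node, then comparing the results, would show whether every machine agrees on the names and addresses involved.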
Another question is why the collector would see the submitted jobs as coming
from the WAN IP address on the head node instead of the cluster interface.
I don't see any evidence that this has happened.
The condor_config.local for the head node specifies that the daemons
should run on the cluster IP address (10.0.0.1).
Do you have multiple network interfaces? Everything is on the same network