
Re: [Condor-users] I broke my config, somewhere...



On 7/23/07, Jonathan D. Proulx <jon@xxxxxxxxxxxxx> wrote:

I seem to have broken my test system, which to some extent is what it's
for, but I can't quite figure out how...


The symptom is that jobs submitted from here are "rejected for unknown
reason" on all available systems.

Looking at SchedLog I see a bunch of:

7/23 13:36:22 (pid:3940) attempt to connect to <128.30.2.158:9618> failed: timed out after 20 seconds.
7/23 13:36:22 (pid:3940) ERROR: SECMAN:2003:TCP auth connection to <128.30.2.158:9618> failed


Now that IP is in my network space, but it does not resolve and is not
up.  Grepping for it in the local config comes up with nothing.  What
is it trying to do here, and where should I look for this mysterious IP?
(I am using krb5, so I did check krb5.conf and /etc/hosts.)


That looks like it's trying to contact a collector process on 128.30.2.158.

Some things to check:

1. What is your COLLECTOR_HOST configuration set to (probably
$(CONDOR_HOST), so in that case, what is CONDOR_HOST set to?)

2. Does that IP or hostname appear in your FLOCK_COLLECTOR_HOSTS
configuration variable? (Usually, FLOCK_COLLECTOR_HOSTS is set to
$(FLOCK_TO), so what appears in your FLOCK_TO setting?)

3. Did you change either of the above settings without restarting
Condor? I don't think those settings are reloaded on reconfig; they
probably need a restart (at least COLLECTOR_HOST does).

4. Do you have a CONDOR_VIEW_HOST configuration defined?
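
One reason a grep for the raw IP can come up empty is that these settings are often defined through $(MACRO) references or by hostname rather than by address. As a rough sketch of the checks above, the following Python script expands such references for the four settings in question. It is a simplified approximation, not Condor's actual config parser: it assumes plain `NAME = value` lines, ignores includes, conditionals and per-subsystem overrides, and the hostnames in `sample` are made up for illustration.

```python
import re

# The settings named in the checklist above.
SETTINGS = ["COLLECTOR_HOST", "FLOCK_COLLECTOR_HOSTS", "FLOCK_TO", "CONDOR_VIEW_HOST"]

# Matches a $(NAME) macro reference.
MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def parse_config(text):
    """Parse simple NAME = value lines; later definitions override earlier ones."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Config names are treated as case-insensitive, so normalise to upper case.
        values[name.strip().upper()] = value.strip()
    return values


def expand(name, values, depth=0):
    """Expand $(NAME) references; undefined macros become empty strings."""
    if depth > 20:  # guard against self-referential definitions
        return ""
    raw = values.get(name.upper(), "")
    return MACRO.sub(lambda m: expand(m.group(1), values, depth + 1), raw)


def report(text):
    """Return the expanded value of each setting of interest."""
    values = parse_config(text)
    return {name: expand(name, values) for name in SETTINGS}


# Hypothetical config contents, for illustration only.
sample = """
CONDOR_HOST = central.example.org
COLLECTOR_HOST = $(CONDOR_HOST)
FLOCK_TO = other-pool.example.org
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
"""

result = report(sample)
for name, value in result.items():
    print(f"{name} = {value!r}")
```

Any hostname that turns up this way can then be resolved and compared against the address in the SchedLog. On a live machine, condor_config_val (for example `condor_config_val COLLECTOR_HOST`) reports the value Condor itself sees and is the more reliable check; the script is only meant to show how the indirection works.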

-Erik