
Re: [Condor-users] Strange condor.boot message and collector not found



OK, the first problem is still there, but the second problem was caused by a missing CONDOR_HOST line in the "master" condor_config file.  I am surprised the condor_install script did not add it.  Once I added that line (so that the default COLLECTOR_HOST = $(CONDOR_HOST) resolves properly), condor_status works fine.
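
For anyone hitting the same "can't find collector" error, a minimal sketch of the relevant lines in the site-wide config follows. The host name value is an assumption: abitibi is used only because the trace below shows the collector and negotiator running there, and a real site would normally put the fully qualified name of its central manager.

```
# Site-wide ("master") condor_config
# Assumption: abitibi is the central manager; substitute your own host name.
CONDOR_HOST = abitibi
# Stock default: the collector lives on the central manager.
COLLECTOR_HOST = $(CONDOR_HOST)
```

Running condor_config_val COLLECTOR_HOST on a node prints the value the daemons and tools will actually use, which is a quick way to confirm the setting took effect.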

I don't think I understand how the condor_install command works very well.  My understanding was that something like:

cd /se/app/shared/condor
./condor_install --type=execute --local-dir=/osg-local/condor

would set up all the necessary *local* Condor directories for the given host.  I could then repeat that on several hosts, all of which use the same "site" install of Condor (with the "master" config file in condor/etc/condor_config).  Provided CONDOR_CONFIG pointed to the master config, which in turn pointed to a consistent directory for the local config file, the local settings would override the "site" settings in the "master" config file.  In fact, I discovered that condor_install only worked the first time I ran it, and furthermore it did unexpected things, such as using the --type setting to update the "master" config file rather than the local file.
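
The layering described above is usually expressed with the LOCAL_DIR and LOCAL_CONFIG_FILE macros. The fragment below is only a sketch under assumptions: the file name condor_config.local is the conventional default rather than something confirmed in this thread, and the DAEMON_LIST value is a guess at what an execute-only node would want.

```
# In the shared "master" file (condor/etc/condor_config)
LOCAL_DIR = /osg-local/condor
# Assumption: conventional local file name, same path on every host.
LOCAL_CONFIG_FILE = $(LOCAL_DIR)/condor_config.local

# In each host's /osg-local/condor/condor_config.local
# Assumption: an execute-only node runs just the master and startd.
DAEMON_LIST = MASTER, STARTD
```

Because the local file is read after the master file, a setting repeated there replaces the site-wide value for that host only.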

Cheers,

Ian


Ian Stokes-Rees wrote:
On an execute node, I can run condor_master from the command line without any problem, but my init script condor.boot generates an error.  A trace is below.

# shows that CONDOR_CONFIG is set and points to a file which exists and is not empty
[root@mackenzie condor]# ls -Fla $CONDOR_CONFIG
-rw-r--r-- 1 root root 93644 Mar 20  2008 /se/app/shared/condor-7.0.1/etc/condor_config

# shows failed startup script
[root@mackenzie condor]# service condor start
Starting up Condor

Neither the environment variable CONDOR_CONFIG,
/etc/condor/, nor ~condor/ contain a condor_config source.
Either set CONDOR_CONFIG to point to a valid config source,
or put a "condor_config" file in /etc/condor or ~condor/
Exiting.

# shows that condor_master from the command line works
[root@mackenzie sbin]# ./condor_master

[root@nahanni sbin]# ps -ef | grep condor
condor    5990     1  0 17:38 ?        00:00:00 ./condor_master
condor    5991  5990 82 17:38 ?        00:00:02 condor_startd -f
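
One possible explanation, not confirmed anywhere in this thread, is that the service wrapper on Red Hat style systems starts init scripts with a mostly empty environment, so a CONDOR_CONFIG exported in root's interactive shell never reaches condor.boot. The sketch below only illustrates that mechanism with env -i; it does not call any Condor tools, and the variable names inherited and scrubbed are made up for the demonstration.

```shell
# Export the variable the same way an interactive root shell would have it.
CONDOR_CONFIG=/se/app/shared/condor-7.0.1/etc/condor_config
export CONDOR_CONFIG

# A normal child process inherits the exported variable.
inherited=$(sh -c 'echo "${CONDOR_CONFIG:-unset}"')

# A child started through env -i (only PATH passed on) does not see it.
scrubbed=$(env -i PATH="$PATH" sh -c 'echo "${CONDOR_CONFIG:-unset}"')

echo "inherited: $inherited"
echo "scrubbed:  $scrubbed"
```

If this is the cause, setting and exporting CONDOR_CONFIG inside the init script itself, or placing the config where Condor looks by default (/etc/condor/condor_config), should make the two launch paths behave the same.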

On the "head" node, when I run condor_status I get an error that the collector cannot be found, even though it is running.

[root@abitibi sbin]# condor_status
Error:  Could not fetch ads --- can't find collector

[root@abitibi sbin]# ps -ef | grep condor
condor   28500     1  1 17:45 ?        00:00:00 ./condor_master
condor   28501 28500  0 17:45 ?        00:00:00 condor_collector -f
condor   28503 28500  1 17:45 ?        00:00:00 condor_negotiator -f
condor   28504 28500  1 17:45 ?        00:00:00 condor_schedd -f
condor   28505 28500 86 17:45 ?        00:00:01 condor_startd -f
root     28506 28504  1 17:45 ?        00:00:00 condor_procd -A /tmp/condor-lock.abitibi0.0513363986547155/procd_pipe.SCHEDD -S 60 -C 9422

Any hints as to what might be going wrong would be greatly appreciated.  It seems like very strange behavior.

-- 
Ian Stokes-Rees                            W: http://sbgrid.org
ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1 617 418-4168
SBGrid, Harvard Medical School             F: +1 617 432-5600

  

