
Re: [Condor-users] Daemon problems



On Thu June 16 2005 5:13 am, Alexandre Badez wrote:
> Good Morning !
Hello,

> I'm running a little test cluster of 6 machines, with redhat 3. They are
> named node1 to node6 (ip @ 10.2.4.11 to 10.2.4.16), and my domain name is
> *.mop.ibm.com
> I've set up the 6 machines with the rpm available on the download pages
> (Condor 6.6.9).
> My central manager is node1, all others are execution hosts.
>
> My problem seems to be on node1, where there is no negotiator:
>
> [root@node1 root]# condor_master
> [root@node1 root]# ps ax | grep condor
>  5137 ?        S      0:00 condor_master
>  5138 ?        S      0:00 condor_collector -f
>  5139 ?        R      0:03 condor_startd -f
>  5142 ?        S      0:00 condor_schedd -f
>  5149 pts/0    S      0:00 grep condor
> [root@node1 root]#

I don't know much about how our RPMs configure Condor, but something is
clearly wrong here.  Your central manager (node1) should be running both the
collector and the negotiator.  Look at the DAEMON_LIST setting in
condor_config (or condor_config.local), and make sure that both COLLECTOR and
NEGOTIATOR are in the list.

Also, if you don't want to be running jobs on this machine, remove STARTD from 
the list.  Similarly, if you aren't going to be submitting jobs from this 
host, remove SCHEDD from the list.
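
As a rough sketch of what that could look like on node1 (assuming it should
act purely as the central manager, and assuming you keep local overrides in
condor_config.local; I haven't checked what the RPM actually ships):

```
# condor_config.local on node1 (central manager) -- illustrative only
# MASTER is normally part of the list.  Add STARTD and/or SCHEDD back if
# node1 should also run or submit jobs.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR
```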

> Moreover there is a negotiator on each execution node:
>
> [root@node2 root]# condor_master
> [root@node2 root]# ps ax | grep condor
> 29704 ?        S      0:00 condor_master
> 29705 ?        S      0:00 condor_collector -f
> 29706 ?        S      0:00 condor_negotiator -f
> 29707 ?        S      0:06 condor_startd -f
> 29708 ?        S      0:00 condor_schedd -f
> 29717 pts/0    R      0:00 grep condor
> [root@node2 root]#

Again, edit your condor_config on the execution node(s), and remove COLLECTOR 
and NEGOTIATOR from the DAEMON_LIST.

As above, I'll note that you're running the schedd here, which allows you to 
submit jobs from this host.  If this is not what you intended, then remove 
SCHEDD from the list.
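
Again only as a sketch, assuming node2 through node6 are meant to be plain
execution hosts that may also submit jobs, the corresponding setting could
look like this:

```
# condor_config.local on node2 .. node6 (execution hosts) -- illustrative only
# Drop SCHEDD as well if jobs will never be submitted from these hosts.
DAEMON_LIST = MASTER, STARTD, SCHEDD
```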

You'll need to restart Condor on the affected nodes for these changes to take
effect, for example with "condor_restart -master node1" or
"/etc/init.d/condor restart" (or similar).
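
Once Condor is back up, the same "ps ax | grep condor" check from your message
shows whether the right daemons started.  If you want to automate the
comparison, here is a small sketch; missing_daemons is a hypothetical helper
name of mine, not a Condor tool:

```shell
# Hypothetical helper (not part of Condor): print each expected daemon
# that does not appear in a process listing.
# usage: missing_daemons "<ps output>" daemon...
missing_daemons() {
    listing=$1
    shift
    for d in "$@"; do
        # Daemon processes show up as condor_<daemon> in the ps output.
        printf '%s\n' "$listing" | grep -q "condor_$d" || printf '%s\n' "$d"
    done
}

# On the central manager:
#   missing_daemons "$(ps ax)" master collector negotiator
# On an execution host:
#   missing_daemons "$(ps ax)" master startd
```

Fed the node1 listing from your message, the first call would print
"negotiator"; no output means everything expected is running.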

> Is it normal? After re-reading the installation manual, it doesn't seem
> so...

Nope.  See above.  I don't know _why_ they're set as they are, but it's 
obviously wrong.

> I can also send the config and config local files if you need them.

Try the above first -- it'll probably solve the problems that you're seeing.  
If not, we can pursue it further.

> Thanks for your help.

Glad to help!

-Nick

-- 
           <<< Welcome to the real world. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences