
[Condor-users] Independent condor configuration files.



I have a couple of condor pools configured
with all of their condor configuration files on the local
disk of each machine.  There are a lot of compute nodes running
only master and startd.  This morning, after a power outage,
the compute nodes came up before the node that runs the
collector/negotiator.  They started the master and startd all
right; the StartLog looks like this:

6/7 07:29:10 ******************************************************
6/7 07:29:10 ** condor_startd (CONDOR_STARTD) STARTING UP
6/7 07:29:10 ** /opt/condor-6.7.18/sbin/condor_startd
6/7 07:29:10 ** $CondorVersion: 6.7.18 Mar 22 2006 $
6/7 07:29:10 ** $CondorPlatform: I386-LINUX_RH9 $
6/7 07:29:10 ** PID = 3106
6/7 07:29:10 ******************************************************
6/7 07:29:10 Using config file: /etc/condor/condor_config
6/7 07:29:10 Using local config files: /opt/condor/etc/group_params.config /opt/condor/local/condor_config.local
6/7 07:29:10 DaemonCore: Command Socket at <131.225.167.91:32771>
6/7 07:32:39 vm1: New machine resource allocated
6/7 07:32:39 vm2: New machine resource allocated
6/7 07:32:39 About to run initial benchmarks.
6/7 07:32:44 Completed initial benchmarks.
6/7 07:32:44 vm1: State change: IS_OWNER is false
6/7 07:32:44 vm1: Changing state: Owner -> Unclaimed
6/7 07:32:44 vm2: State change: IS_OWNER is false
6/7 07:32:44 vm2: Changing state: Owner -> Unclaimed
6/7 07:32:48 vm1: Error sending update to collector(s)
6/7 07:32:49 vm2: Error sending update to collector(s)
6/7 07:37:48 vm1: Error sending update to collector(s)
6/7 07:37:49 vm2: Error sending update to collector(s)
6/7 07:42:48 vm1: Error sending update to collector(s)
6/7 07:42:49 vm2: Error sending update to collector(s)
6/7 07:47:48 vm1: Error sending update to collector(s)
6/7 07:47:49 vm2: Error sending update to collector(s)
6/7 07:52:48 vm1: Error sending update to collector(s)


and so forth.  These errors sending updates to the collector
continued well after the collector was back up and working.
I had to stop and restart condor on all of these nodes
to get them to show up in condor_status on the collector.
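For reference, the restart step can be scripted from the central manager rather than logging into each node, assuming administrator access is allowed from that host (HOSTALLOW_ADMINISTRATOR).  This is only a sketch; the node names below are placeholders, not the actual hosts in this pool:

```shell
#!/bin/sh
# Sketch: restart just the startd on each compute node remotely,
# assuming administrator commands are permitted from this host.
# NODES is a hypothetical placeholder list.
NODES="node01 node02 node03"

restart_cmds() {
    for n in $NODES; do
        # condor_restart -startd restarts the startd on the named host
        echo "condor_restart -startd -name $n"
    done
}

# Print the per-node commands (pipe to sh to actually run them):
restart_cmds
```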

Is this expected behavior?  Has anyone configured a pool like
this so that the compute nodes recover on their own, without
having to restart condor on every node after such an outage?

Steve Timm


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team