
Re: [HTCondor-users] upgrading central manager from condor 7.6.6 to 8.2.10



Hi there, just checking in to see if you or anyone else has any feedback on my last message.

Thanks!


From: ade kc
Sent: Wednesday, May 25, 2016 10:37:10 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] upgrading central manager from condor 7.6.6 to 8.2.10
 

Hello Michael,

Thanks for the prompt response. My execute nodes are mainly running 8.2.X versions, with a handful on 8.4.X.


That IP address, 192.168.122.1:35985, is actually a generic virtual bridge private IP set up for my Docker containers. Other ports on this IP have no issues.

In any case, Docker is no longer using these private IP addresses, and when I tried it by hand I got a telnet timeout:


telnet 192.168.122.1 35985
Trying 192.168.122.1...
telnet: Unable to connect to remote host: Connection timed out
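
The same reachability check can be scripted if it needs repeating across several hosts. This is only a rough sketch using the Python standard library; the function name `port_is_open` and the 5-second default timeout are my own choices, not anything HTCondor provides.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP connect, much like `telnet host port`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts alike.
        return False
```

Calling `port_is_open("192.168.122.1", 35985)` should come back False in the situation above, since the telnet attempt timed out. Note that it only says whether something accepts TCP connections on that port, not whether it is a Condor daemon.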

Yeah, this IP doesn't show up in condor_status -long. I'm not worried about it anyway, since this container IP is no longer in use.

I can share what my current config file looks like. If possible, maybe you can let me know what I need to remove that may not be compatible with 8.2.10.


Please see below.


##  What machine is your central manager?

CONDOR_HOST = condor.myhost.com
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxx"

##  Pathnames:
##  Where have you installed the bin, sbin and lib condor directories?

RELEASE_DIR = /home/condor


##  Where is the local condor directory for each host?
##  This is where the local config file(s), logs and
##  spool/execute directories are located

LOCAL_DIR = /condor/local


##  Mail parameters:
##  When something goes wrong with condor at your site, who should get
##  the email?

CONDOR_ADMIN = condor@xxxxxxxxxxxxxxxxx


##  Full path to a mail delivery program that understands that "-s"
##  means you want to specify a subject:

MAIL = /bin/mailx


##  Network domain parameters:
##  Internet domain of machines sharing a common UID space.  If your
##  machines don't share a common UID space, set it to
##  UID_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own UID space.

UID_DOMAIN = myhost.com


##  Internet domain of machines sharing a common file system.
##  If your machines don't use a network file system, set it to
##  FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own file system.

FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)


##  The user/group ID <uid>.<gid> of the "Condor" user.
##  (this can also be specified in the environment)
##  Note: the CONDOR_IDS setting is ignored on Win32 platforms

CONDOR_IDS = 20077.10000

##  Condor needs to create a few lock files to synchronize access to
##  various log files.  Because of problems we've had with network
##  filesystems and file locking over the years, we HIGHLY recommend
##  that you put these lock files on a local partition on each
##  machine.  If you don't have your LOCAL_DIR on a local partition,
##  be sure to change this entry.  Whatever user (or group) condor is
##  running as needs to have write access to this directory.  If
##  you're not running as root, this is whatever user you started up
##  the condor_master as.  If you are running as root, and there's a
##  condor account, it's probably condor.  Otherwise, it's whatever
##  you've set in the CONDOR_IDS environment variable.  See the Admin
##  manual for details on this.

LOCK = /tmp/condor-lock.$(HOSTNAME)0.449911842894839

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD

ENABLE_PERSISTENT_CONFIG = True
PERSISTENT_CONFIG_DIR = /condor/local
SETTABLE_ATTRS_CONFIG = VERTICA_*, ParallelSchedulingGroup

######################################################################
######################################################################
##  Settings you should leave alone, but that must be defined
######################################################################
######################################################################

##  Path to the special version of rsh that's required to spawn MPI
##  jobs under Condor.  WARNING: This is not a replacement for rsh,
##  and does NOT work for interactive use.  Do not use it directly!
MPI_CONDOR_RSH_PATH = $(LIBEXEC)

##  Path to OpenSSH server binary
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
CONDOR_SSHD = /usr/sbin/sshd

##  Path to OpenSSH keypair generator.
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/bin, but may be in /usr/local/bin
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen


SEC_DEFAULT_AUTHENTICATION_METHODS = FS_REMOTE
FS_REMOTE_DIR = /home/condor/condor_auth

#NEGOTIATOR_DEBUG = D_FULLDEBUG D_NEGOTIATE
# needed after upgrade to 8.2.10
#SEC_DEFAULT_NEGOTIATION = NEVER


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
Sent: Tuesday, May 24, 2016 3:37:23 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] upgrading central manager from condor 7.6.6 to 8.2.10
 
From: ade kc <kcbobo@xxxxxxxxxxx>
To: "htcondor-users@xxxxxxxxxxx" <htcondor-users@xxxxxxxxxxx>
Date: 05/24/2016 03:59 PM
> Hi all,
>
> I recently attempted an upgrade of my central manager node from 7.6.6 to
> 8.2.10 on a CentOS 5 machine.
>
> I did the hot upgrade route by just copying all the new binaries over from
> the release directory (i.e. the bin, include, sbin, libexec and lib folders)
> into the current location of each of these folders, while backing up
> the old binaries. Condor automatically restarted once it noticed the new
> binaries in place.


The philosophy of the configuration file changed quite a bit even between
7.8 and 8.0, let alone between 7.6 and 8.2.

Unlike before, there's hardly anything that's actually required to be present
in the config file - nearly everything is built in as a default, and so the
only thing you need to do is override whatever you want to be different from
the defaults, and this is usually a very small list of items.

And so a 7.6 config file will invariably have a lot of extraneous stuff in it.
When I shifted over to 8, I wound up reconstructing the config by taking the
8.0 default and adding only the local customization such as startd_cron and
policy expressions via the config.d, and discarding essentially everything
else that had been in the 7.6 and 7.8 config files.
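
One way to find the "local customization" part of an old config is to compare its active settings against a default file and keep only what differs. The following is a rough sketch of that idea, not an HTCondor tool: the helper names `parse_settings` and `local_overrides` are made up for illustration, names are compared exactly as written, and backslash line continuations and multi-file includes are not handled.

```python
def parse_settings(text):
    """Collect NAME = value assignments, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def local_overrides(old_text, default_text):
    """Return settings from old_text that are missing from, or differ in, default_text."""
    old = parse_settings(old_text)
    defaults = parse_settings(default_text)
    return {name: value for name, value in old.items() if defaults.get(name) != value}


if __name__ == "__main__":
    # The "defaults" here are a made-up stand-in, not the real 8.x defaults.
    old_config = "CONDOR_HOST = condor.myhost.com\nMAIL = /bin/mailx\n#NEGOTIATOR_DEBUG = D_FULLDEBUG\n"
    default_config = "MAIL = /bin/mailx\n"
    for name, value in local_overrides(old_config, default_config).items():
        print(f"{name} = {value}")
```

Whatever the comparison reports is only a list of candidates for a small config.d file; each entry would still need checking against the documentation for the new version before carrying it over.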

Are your exec nodes upgraded, or still running 7.6 at this point?

The error indicates that the daemon that's supposed to be at port 35985
on host 192.168.122.1 may not be listening. Can you telnet to that IP:port
by hand? What does that 122.1 machine show in its condor_status -long?


        -Michael Pelletier.