
Re: [Condor-users] Configuration a condor pool with two machines



hi sean...

i'm sending you a copy of the first two sections of the condor_config file
that i use. i have a copy of this file (with minor mods) on the machines
in my condor network. you might want to compare this with what you have
and make changes accordingly.

to make things simple, i have a condor user/group on each machine with the
same uid/gid. i also have only one central manager.


-bruce

##  What machine is your central manager?
CONDOR_HOST             = lserver5.tmesa.com

##--------------------------------------------------------------------
##  Pathnames:
##--------------------------------------------------------------------
##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR             = /home/condor

##  Where is the local condor directory for each host?
##  This is where the local config file(s), logs and
##  spool/execute directories are located
LOCAL_DIR               = $(TILDE)
#LOCAL_DIR              = $(RELEASE_DIR)/hosts/$(HOSTNAME)

##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /home/condor/local.lserver5/condor_config.local
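
##  (the path above is hard-coded for lserver5; if you share this
##  global file across machines, a per-host pattern like the commented
##  example below -- assuming the local.<hostname> directory naming
##  used above -- avoids editing the path on each host)
#LOCAL_CONFIG_FILE = /home/condor/local.$(HOSTNAME)/condor_config.local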

## If the local config file is not present, is it an error?
## WARNING: This is a potential security issue.
## If not specified, the default is True.
#REQUIRE_LOCAL_CONFIG_FILE = TRUE

##--------------------------------------------------------------------
##  Mail parameters:
##--------------------------------------------------------------------
##  When something goes wrong with condor at your site, who should get
##  the email?
CONDOR_ADMIN            = bedouglas@xxxxxxxxxxxxx

##  Full path to a mail delivery program that understands that "-s"
##  means you want to specify a subject:
MAIL                    = /usr/bin/mail

##--------------------------------------------------------------------
##  Network domain parameters:
##--------------------------------------------------------------------
##  Internet domain of machines sharing a common UID space.  If your
##  machines don't share a common UID space, set it to
##  UID_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own UID space.
UID_DOMAIN              = $(FULL_HOSTNAME)

##  Internet domain of machines sharing a common file system.
##  If your machines don't use a network file system, set it to
##  FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own file system.
FILESYSTEM_DOMAIN       = $(FULL_HOSTNAME)
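
##  For a small pool like the two-machine setup discussed here, where
##  the hosts share account UIDs, a common domain string can be used
##  instead (example value assumed); share FILESYSTEM_DOMAIN only if
##  the hosts really mount a common file system:
#UID_DOMAIN = tmesa.com
#FILESYSTEM_DOMAIN = tmesa.com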

##  This macro is used to specify a short description of your pool.
##  It should be about 20 characters long. For example, the name of
##  the UW-Madison Computer Science Condor Pool is ``UW-Madison CS''.
COLLECTOR_NAME          = College

######################################################################
######################################################################
##
##  ######                                   #####
##  #     #    ##    #####    #####         #     #
##  #     #   #  #   #    #     #                 #
##  ######   #    #  #    #     #            #####
##  #        ######  #####      #           #
##  #        #    #  #   #      #           #
##  #        #    #  #    #     #           #######
##
##  Part 2:  Settings you may want to customize:
##  (it is generally safe to leave these untouched)
######################################################################
######################################################################

##
##  The user/group ID <uid>.<gid> of the "Condor" user.
##  (this can also be specified in the environment)
##  Note: the CONDOR_IDS setting is ignored on Win32 platforms
CONDOR_IDS              = 700.700
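##  e.g., set in the environment of whoever starts the master:
##    export CONDOR_IDS=700.700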

##--------------------------------------------------------------------
##  Flocking: Submitting jobs to more than one pool
##--------------------------------------------------------------------
##  Flocking allows you to run your jobs in other pools, or lets
##  others run jobs in your pool.
##
##  To let others flock to you, define FLOCK_FROM.
##
##  To flock to others, define FLOCK_TO.

##  FLOCK_FROM defines the machines where you would like to grant
##  people access to your pool via flocking. (i.e. you are granting
##  access to these machines to join your pool).
FLOCK_FROM =
##  An example of this is:
#FLOCK_FROM = somehost.friendly.domain, anotherhost.friendly.domain

##  FLOCK_TO defines the central managers of the pools that you want
##  to flock to. (i.e. you are specifying the machines that you
##  want your jobs to be negotiated at -- thereby specifying the
##  pools they will run in.)
FLOCK_TO =
##  An example of this is:
#FLOCK_TO = central_manager.friendly.domain, condor.cs.wisc.edu

##  FLOCK_COLLECTOR_HOSTS should almost always be the same as
##  FLOCK_NEGOTIATOR_HOSTS (as shown below).  The only reason it would be
##  different is if the collector and negotiator in the pool that you are
##  flocking to are running on different machines (not recommended).
##  The collectors must be specified in the same corresponding order as
##  the FLOCK_NEGOTIATOR_HOSTS list.
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
## An example of having the negotiator and the collector on different
## machines is:
#FLOCK_NEGOTIATOR_HOSTS = condor.cs.wisc.edu, condor-negotiator.friendly.domain
#FLOCK_COLLECTOR_HOSTS  = condor.cs.wisc.edu, condor-collector.friendly.domain

##--------------------------------------------------------------------
##  Host/IP access levels
##--------------------------------------------------------------------
##  Please see the administrator's manual for details on these
##  settings, what they're for, and how to use them.

##  What machines have administrative rights for your pool?  This
##  defaults to your central manager.  You should set it to the
##  machine(s) where your condor administrator(s) work (assuming you
##  trust all the users who log into those machines, since this is
##  machine-wide access you're granting).
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)

##  If there are no machines that should have administrative access
##  to your pool (for example, there's no machine where only trusted
##  users have accounts), you can uncomment this setting.
##  Unfortunately, this will mean that administering your pool will
##  be more difficult.
#HOSTDENY_ADMINISTRATOR = *

##  What machines should have "owner" access to your machines, meaning
##  they can issue commands that a machine owner should be able to
##  issue to their own machine (like condor_vacate).  This defaults to
##  machines with administrator access, and the local machine.  This
##  is probably what you want.
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)

##  Read access.  Machines listed as allow (and/or not listed as deny)
##  can view the status of your pool, but cannot join your pool
##  or run jobs.
##  NOTE: By default, without these entries customized, you
##  are granting read access to the whole world.  You may want to
##  restrict that to hosts in your domain.  If possible, please also
##  grant read access to "*.cs.wisc.edu", so the Condor developers
##  will be able to view the status of your pool and more easily help
##  you install, configure or debug your Condor installation.
##  It is important to have this defined.
HOSTALLOW_READ = 192.168.1.*
#HOSTALLOW_READ = *.your.domain, *.cs.wisc.edu
#HOSTDENY_READ = *.bad.subnet, bad-machine.your.domain, 144.77.88.*

##  Write access.  Machines listed here can join your pool, submit
##  jobs, etc.  Note: Any machine which has WRITE access must
##  also be granted READ access.  Granting WRITE access below does
##  not also automatically grant READ access; you must change
##  HOSTALLOW_READ above as well.
##
##  You must set this to something else before Condor will run.
##  The simplest option is:
##    HOSTALLOW_WRITE = *
##  but note that this will allow anyone to submit jobs or add
##  machines to your pool and is a serious security risk.
HOSTALLOW_WRITE = 192.168.1.*
#HOSTALLOW_WRITE = *.your.domain, your-friend's-machine.other.domain
#HOSTDENY_WRITE = bad-machine.your.domain

##  Negotiator access.  Machines listed here are trusted central
##  managers.  You should normally not have to change this.
HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST)
##  Now, with flocking we need to let the SCHEDD trust the other
##  negotiators we are flocking with as well.  You should normally
##  not have to change this either.
HOSTALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)

##  Config access.  Machines listed here can use the condor_config_val
##  tool to modify all daemon configurations except those specified in
##  the condor_config.root file.  This level of host-wide access
##  should only be granted with extreme caution.  By default, config
##  access is denied from all hosts.
#HOSTALLOW_CONFIG = trusted-host.your.domain

##  Flocking Configs.  These are the real things that Condor looks at,
##  but we set them from the FLOCK_FROM/TO macros above.  It is safe
##  to leave these unchanged.
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD    = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR  = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD     = $(HOSTALLOW_READ), $(FLOCK_FROM)


##--------------------------------------------------------------------
##  Security parameters for setting configuration values remotely:
##--------------------------------------------------------------------
##  These parameters define the list of attributes that can be set
##  remotely with condor_config_val for the security access levels
##  defined above (for example, WRITE, ADMINISTRATOR, CONFIG, etc).
##  Please see the administrator's manual for further details on these
##  settings, what they're for, and how to use them.  There are no
##  default values for any of these settings.  If they are not
##  defined, no attributes can be set with condor_config_val.

## Do you want to allow condor_config_val -rset to work at all?
## This feature is disabled by default, so to enable, you must
## uncomment the following setting and change the value to "True".
## Note: changing this requires a restart not just a reconfig.
#ENABLE_RUNTIME_CONFIG = False

## Do you want to allow condor_config_val -set to work at all?
## This feature is disabled by default, so to enable, you must
## uncomment the following setting and change the value to "True".
## Note: changing this requires a restart not just a reconfig.
#ENABLE_PERSISTENT_CONFIG = False

## Directory where daemons should write persistent config files (used
## to support condor_config_val -set).  This directory should *ONLY*
## be writable by root (or the user the Condor daemons are running as
## if non-root).  There is no default, administrators must define this.
## Note: changing this requires a restart not just a reconfig.
#PERSISTENT_CONFIG_DIR = /full/path/to/root-only/local/directory
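
##  A sketch of how the pieces combine once enabled (hostname assumed;
##  the calling host must also be granted CONFIG access):
##    condor_config_val -name lserver5 -set "MAX_JOBS_RUNNING = 100"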

##  Attributes that can be set by hosts with "CONFIG" permission (as
##  defined with HOSTALLOW_CONFIG and HOSTDENY_CONFIG above).
##  The commented-out value here was the default behavior of Condor
##  prior to version 6.3.3.  If you don't need this behavior, you
##  should leave this commented out.
#SETTABLE_ATTRS_CONFIG = *

##  Attributes that can be set by hosts with "ADMINISTRATOR"
##  permission (as defined above)
#SETTABLE_ATTRS_ADMINISTRATOR = *_DEBUG, MAX_*_LOG

##  Attributes that can be set by hosts with "OWNER" permission (as
##  defined above) NOTE: any Condor job running on a given host will
##  have OWNER permission on that host by default.  If you grant this
##  kind of access, Condor jobs will be able to modify any attributes
##  you list below on the machine where they are running.  This has
##  obvious security implications, so only grant this kind of
##  permission for custom attributes that you define for your own use
##  at your pool (custom attributes about your machines that are
##  published with the STARTD_EXPRS setting, for example).
#SETTABLE_ATTRS_OWNER = your_custom_attribute, another_custom_attr
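
##  A minimal sketch, with a made-up attribute name:
#HasScratchDisk = True
#STARTD_EXPRS = HasScratchDisk
#SETTABLE_ATTRS_OWNER = HasScratchDisk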

##  You can also define daemon-specific versions of each of these
##  settings.  For example, to define settings that can only be
##  changed in the condor_startd's configuration by hosts with OWNER
##  permission, you would use:
#STARTD_SETTABLE_ATTRS_OWNER = your_custom_attribute_name


##--------------------------------------------------------------------
##  Network filesystem parameters:
##--------------------------------------------------------------------
##  Do you want to use NFS for file access instead of remote system
##  calls?
USE_NFS         = True

##  Do you want to use AFS for file access instead of remote system
##  calls?
#USE_AFS                = False

##--------------------------------------------------------------------
##  Checkpoint server:
##--------------------------------------------------------------------
##  Do you want to use a checkpoint server if one is available?  If a
##  checkpoint server isn't available or USE_CKPT_SERVER is set to
##  False, checkpoints will be written to the local SPOOL directory on
##  the submission machine.
#USE_CKPT_SERVER        = True

##  What's the hostname of this machine's nearest checkpoint server?
#CKPT_SERVER_HOST       = checkpoint-server-hostname.your.domain

##  Do you want the starter on the execute machine to choose the
##  checkpoint server?  If False, the CKPT_SERVER_HOST set on
##  the submit machine is used.  Otherwise, the CKPT_SERVER_HOST set
##  on the execute machine is used.  The default is true.
#STARTER_CHOOSES_CKPT_SERVER = True

##--------------------------------------------------------------------
##  Miscellaneous:
##--------------------------------------------------------------------
##  Try to save this much swap space by not starting new shadows.
##  Specified in megabytes.
#RESERVED_SWAP          = 5

##  What's the maximum number of jobs you want a single submit machine
##  to spawn shadows for?
#MAX_JOBS_RUNNING       = 200

##  Condor needs to create a few lock files to synchronize access to
##  various log files.  Because of problems we've had with network
##  filesystems and file locking over the years, we HIGHLY recommend
##  that you put these lock files on a local partition on each
##  machine.  If you don't have your LOCAL_DIR on a local partition,
##  be sure to change this entry.  Whatever user (or group) condor is
##  running as needs to have write access to this directory.  If
##  you're not running as root, this is whatever user you started up
##  the condor_master as.  If you are running as root, and there's a
##  condor account, it's probably condor.  Otherwise, it's whatever
##  you've set in the CONDOR_IDS environment variable.  See the Admin
##  manual for details on this.
LOCK            = $(LOG)
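
##  e.g., if your LOCAL_DIR ends up on NFS, point this at a local
##  directory instead (path assumed; it must be writable by the user
##  condor runs as):
#LOCK = /var/lock/condor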

##  If you don't use a fully qualified name in your /etc/hosts file
##  (or NIS, etc.) for either your official hostname or as an alias,
##  Condor wouldn't normally be able to use fully qualified names in
##  places that it'd like to.  You can set this parameter to the
##  domain you'd like appended to your hostname, if changing your host
##  information isn't a good option.  This parameter must be set in
##  the global config file (not the LOCAL_CONFIG_FILE from above).
#DEFAULT_DOMAIN_NAME = your.domain.name

##  If you don't have DNS set up, Condor will normally fail in many
##  places because it can't resolve hostnames to IP addresses and
##  vice-versa. If you enable this option, Condor will use
##  pseudo-hostnames constructed from a machine's IP address and the
##  DEFAULT_DOMAIN_NAME (e.g., 192.168.1.5 would become
##  192-168-1-5.your.domain.name). Both NO_DNS and DEFAULT_DOMAIN_NAME
##  must be set in your top-level config file for this mode of
##  operation to work properly.
#NO_DNS = True

##  Condor can be told whether or not you want the Condor daemons to
##  create a core file if something really bad happens.  This just
##  sets the resource limit for the size of a core file.  By default,
##  we don't do anything, and leave in place whatever limit was in
##  effect when you started the Condor daemons.  If this parameter is
##  set and "True", we increase the limit to as large as it gets.  If
##  it's set to "False", we set the limit at 0 (which means that no
##  core files are even created).  Core files greatly help the Condor
##  developers debug any problems you might be having.
#CREATE_CORE_FILES      = True

##  Condor Glidein downloads binaries from a remote server for the
##  machines into which you're gliding. This saves you from manually
##  downloading and installing binaries for every architecture you
##  might want to glidein to. The default server is one maintained at
##  The University of Wisconsin. If you don't want to use the UW
##  server, you can set up your own and change the following values to
##  point to it, instead.
GLIDEIN_SERVER_URLS = \
  http://www.cs.wisc.edu/condor/glidein/binaries \
  gsiftp://gridftp.cs.wisc.edu/p/condor/public/binaries/glidein

## List the sites you want to GlideIn to in GLIDEIN_SITES. For example,
## if you'd like to GlideIn to some Alliance GiB resources,
## uncomment the line below.
## Make sure that $(GLIDEIN_SITES) is included in HOSTALLOW_READ and
## HOSTALLOW_WRITE, or else your GlideIns won't be able to join your pool.
#GLIDEIN_SITES = *.ncsa.uiuc.edu, *.cs.wisc.edu, *.mcs.anl.gov
GLIDEIN_SITES =

##  If your site needs to use UID_DOMAIN settings (defined above) that
##  are not real Internet domains that match the hostnames, you can
##  tell Condor to trust whatever UID_DOMAIN a submit machine gives to
##  the execute machine and just make sure the two strings match.  The
##  default for this setting is False, since it is more secure this
##  way.
TRUST_UID_DOMAIN = True

## If you would like to be informed in near real-time via condor_q when
## a vanilla/standard/java job is in a suspension state, set this
## attribute to TRUE. However, this real-time update of the
## condor_schedd by the shadows could cause performance issues if there
## are thousands of concurrently running vanilla/standard/java jobs
## under a single condor_schedd and they are allowed to suspend and
## resume.
#REAL_TIME_JOB_SUSPEND_UPDATES = False

## A standard universe job can perform arbitrary shell calls via the
## libc 'system()' function. This function call is routed back to the
## shadow, which performs the actual system() invocation in the
## initialdir of the running program and as the user who submitted the
## job. However, since the user job can request ARBITRARY shell
## commands to be run by the shadow, this is a generally unsafe
## practice. This should only be made available if it is actually
## needed. If this attribute is not defined, then it is the same as it
## being defined to False. Set it to True to allow the shadow to
## execute arbitrary shell code from the user job.
#SHADOW_ALLOW_UNSAFE_REMOTE_EXEC = False

## KEEP_OUTPUT_SANDBOX is an optional feature to tell Condor-G to not
## remove the job spool when the job leaves the queue.  To use, just
## set to TRUE.  Since you will be operating Condor-G in this manner,
## you may want to put leave_in_queue = false in your job submit
## description files, to tell Condor-G to simply remove the job from
## the queue immediately when the job completes (since the output files
## will stick around no matter what).
#KEEP_OUTPUT_SANDBOX = False

## This setting tells the negotiator to ignore user priorities.  This
## avoids problems where jobs from different users won't run when using
## condor_advertise instead of a full-blown startd (some of the user
## priority system in Condor relies on information from the startd --
## we will remove this reliance when we support the user priority
## system for grid sites in the negotiator; for now, this setting will
## just disable it).
#NEGOTIATOR_IGNORE_USER_PRIORITIES = False

## These are the directories used to locate classad plug-in functions
#CLASSAD_SCRIPT_DIRECTORY =
#CLASSAD_LIB_PATH =

## This setting tells Condor whether to delegate or copy GSI X509
## credentials when sending them over the wire between daemons.
## Delegation can take up to a second, which is very slow when
## submitting a large number of jobs. Copying exposes the credential
## to third parties if Condor isn't set to encrypt communications.
## By default, Condor will delegate rather than copy.
DELEGATE_JOB_GSI_CREDENTIALS = True

##--------------------------------------------------------------------
##  Settings that control the daemon's debugging output:
##--------------------------------------------------------------------

##
## The flags given in ALL_DEBUG are shared between all daemons.
##

ALL_DEBUG               =
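##  e.g., while debugging a new pool you might enable verbose logging
##  everywhere (very chatty; revert once things work):
#ALL_DEBUG = D_FULLDEBUG D_COMMAND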

MAX_COLLECTOR_LOG       = 1000000
COLLECTOR_DEBUG         =

MAX_KBDD_LOG            = 1000000
KBDD_DEBUG              =

MAX_NEGOTIATOR_LOG      = 1000000
NEGOTIATOR_DEBUG        = D_MATCH
MAX_NEGOTIATOR_MATCH_LOG = 1000000

MAX_SCHEDD_LOG          = 1000000
SCHEDD_DEBUG            = D_COMMAND D_PID

MAX_SHADOW_LOG          = 1000000
SHADOW_DEBUG            =

MAX_STARTD_LOG          = 1000000
STARTD_DEBUG            = D_COMMAND

MAX_STARTER_LOG         = 1000000
STARTER_DEBUG           = D_NODATE

MAX_MASTER_LOG          = 1000000
MASTER_DEBUG            = D_COMMAND
##  When the master starts up, should it truncate its log file?
#TRUNC_MASTER_LOG_ON_OPEN        = False

## The daemons touch their log file periodically, even when they have
## nothing to write. When a daemon starts up, it prints the last time
## the log file was modified. This lets you estimate when a previous
## instance of a daemon stopped running. This parameter controls how
## often the daemons touch the file (in seconds).
TOUCH_LOG_INTERVAL = 60



-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Condor Digger
Sent: Thursday, January 11, 2007 11:16 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Configuration a condor pool with two machines


Hi all,
    I am a newbie. I am trying to configure a condor pool with two
machines (M1, M2). The two machines are running the same Linux OS, and
both machines can run as a master. Currently, when I use the following
example from the condor community, the job only runs on the submitter
machine (M1).
  Example submit profile:
  ########################
  # Submit description file for hello program
  ########################
  Executable     = hello
  Universe       = standard
  Output         = hello.out
  Log            = hello.log
  Queue

I would like it to run on the second machine. I searched this mailing
list and found this solution: add the statement
    requirements = Name == "M2"
to the profile. The job is submitted to the queue successfully, but it
just stays in the queue and never gets run.
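
So the full submit file I am running now looks roughly like this (with
"M2" standing in for however that machine is named in the pool):

  # hello submit file plus the requirement described above;
  # "M2" is a placeholder for the machine's name as Condor publishes it
  Executable     = hello
  Universe       = standard
  Output         = hello.out
  Log            = hello.log
  Requirements   = Name == "M2"
  Queue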
Both M1 and M2 are on a local network with local IP addresses. In M1's
and M2's condor_config files, I have added both machines to
HOSTALLOW_WRITE with their local IPs. I have also added
MASTER_NAME = M2's IP address to M1's config file.
When I try to run the code, I run the command "condor_master" on both
machines, then run "condor_submit xxxx" on M1. I have compiled the
source code with condor_compile. But the job just stays in the queue
when I run condor_q on M1, and never runs (I tested for several hours).
What is wrong with my setup? What else do I need to do to make a job
submitted on M1 run on M2? Thanks a lot! Your help will be greatly
appreciated!

Sean