
[HTCondor-users] Desktop Policy renders workstations unusable



Dear HTCondor-Team/Users,

We use HTCondor at our institute but keep running into conflicts between machine owners and HTC users. Workstations repeatedly become unusably slow from resource exhaustion (CPU, I/O, memory, ...) while they are running HTC jobs. We are using the default DESKTOP policy, but jobs are apparently not preempted by keyboard use or other user activity. Ideally, we would like jobs to be preempted on any sign of non-HTC activity on the machine, preferably including SSH connections. Are there recommendations for such a configuration?

One ugly workaround our users resort to in order to "free" their PCs from Condor is running commands such as `watch -n1 condor_vacate`, but in my opinion this should not be necessary with the default desktop policy. Is this a known problem, or just a configuration error on our side? You can find our config files attached.

Best,
Alexander

--
Universität Münster
Institut für Theoretische Physik

Wilhelm-Klemm-Straße 9
48149 Münster

Tel: +49 251 83-34527
E-Mail:itpadmins@xxxxxxxxxxxxxxx
Web:https://www.uni-muenster.de/Physik.TP/

# Job submission and execution is allowed on this machine
use ROLE : Submit, Execute
# Desktop usage is prioritised
use POLICY : Desktop

use security : host_based

CONDOR_HOST = CONDORHOST.uni-muenster.de
##  Internet domain of machines sharing a common UID space.  If your
##  machines don't share a common UID space, set it to 
##  UID_DOMAIN = $(FULL_HOSTNAME)
##  to specify that each machine has its own UID space.
UID_DOMAIN              = uni-muenster.de

##  What machines have administrative rights for your pool?  This
##  defaults to your central manager.  You should set it to the
##  machine(s) where whoever is the condor administrator(s) works
##  (assuming you trust all the users who log into that/those
##  machine(s), since this is machine-wide access you're granting).
ALLOW_ADMINISTRATOR = $(CONDOR_HOST), 127.0.0.1, CONDORHOST

##  Read access.  Machines listed as allow (and/or not listed as deny)
##  can view the status of your pool, but cannot join your pool 
##  or run jobs.
##  NOTE: By default, without these entries customized, you
##  are granting read access to the whole world.  You may want to
##  restrict that to hosts in your domain.  If possible, please also
##  grant read access to "*.cs.wisc.edu", so the Condor developers
##  will be able to view the status of your pool and more easily help
##  you install, configure or debug your Condor installation.
##  It is important to have this defined.
ALLOW_READ = OUR_IPS, 127.0.0.1, $(CONDOR_HOST)
##  Write access.  Machines listed here can join your pool, submit
##  jobs, etc.  Note: Any machine which has WRITE access must
##  also be granted READ access.  Granting WRITE access below does
##  not also automatically grant READ access; you must change
##  ALLOW_READ above as well.
##
##  You must set this to something else before Condor will run.
##  The simplest option is:
##    ALLOW_WRITE = *
##  but note that this will allow anyone to submit jobs or add
##  machines to your pool and is a serious security risk.
ALLOW_WRITE = OUR_IPS, 127.0.0.1, $(CONDOR_HOST)

## HIGHPORT and LOWPORT let you set the range of ports that Condor
## will use. This may be useful if you are behind a firewall. By
## default, Condor uses port 9618 for the collector, 9614 for the
## negotiator, and system-assigned (apparently random) ports for
## everything else. HIGHPORT and LOWPORT only affect these
## system-assigned ports, but will restrict them to the range you
## specify here. If you want to change the well-known ports for the
## collector or negotiator, see COLLECTOR_HOST or NEGOTIATOR_HOST.
## Note that both LOWPORT and HIGHPORT must be at least 1024 if you
## are not starting your daemons as root.  You may also specify
## different port ranges for incoming and outgoing connections by
## using IN_HIGHPORT/IN_LOWPORT and OUT_HIGHPORT/OUT_LOWPORT.
HIGHPORT = 11200 
LOWPORT = 9200
## important: when using a range other than 9000-11000, don't forget
## to change the firewall settings in files/usr/local/sbin/setup_pf

##  How do you want preen to behave?  The "-m" means you want email
##  about files preen finds that it thinks it should remove.  The "-r"
##  means you want preen to actually remove these files.  If you don't
##  want either of those things to happen, just remove the appropriate
##  one from this setting.
#PREEN_ARGS                      = -m -r
PREEN_ARGS                      = -r
# note: disabled mails from condor_preen, they can be a bit annoying

##  disable reporting to cs.wisc.edu
CONDOR_DEVELOPERS_COLLECTOR	= NONE
CONDOR_DEVELOPERS		= NONE

## disable schedd restart reports (spam mails)
SCHEDD_RESTART_REPORT =
MOUNT_UNDER_SCRATCH=
######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr

##  Where is the local condor directory for each host?  This is where the local config file(s), logs and
##  spool/execute directories are located. This is the default for Linux and Unix systems.
LOCAL_DIR = /var

##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = false

##  The normal way to do configuration with RPM and Debian packaging is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
##  Multiple directories may be specified, separated by commas; directories
##  are read in left-to-right order.
LOCAL_CONFIG_DIR = /usr/share/condor/config.d,/etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

##
## Do NOT use host-based security by default.
##
## This was the default for the 8.8 series (and earlier), but it is
## intrinsically insecure.  To make the 9.0 series secure by default, we
## commented it out.
##
## You should seriously consider improving your security configuration.
##
## To continue to use your old security configuration, knowing that it is
## insecure, add the line 'use SECURITY:HOST_BASED' to your local
## configuration directory.  Don't just uncomment the final line in this
## comment block; changes in this file may be lost during your next upgrade.
## The following shell command will make the change on most Linux systems.
##
## echo 'use SECURITY:HOST_BASED' >> $(condor_config_val LOCAL_CONFIG_DIR)/00-insecure.config
##

##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu

##--------------------------------------------------------------------
## Values set by the debian patch script:
##--------------------------------------------------------------------

## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password

##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/spool/condor
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
CRED_STORE_DIR = $(LOCAL_DIR)/lib/condor/cred_dir
ETC     = /etc/condor
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor
GANGLIA_LIB64_PATH = /lib,/usr/lib,/usr/local/lib

# Account for different pki locations for Debian
AUTH_SSL_SERVER_CERTFILE = /etc/ssl/certs/ssl-cert-snakeoil.pem
AUTH_SSL_SERVER_KEYFILE  = /etc/ssl/private/ssl-cert-snakeoil.key

##  Install the minihtcondor package to run HTCondor on a single node
