
Re: [HTCondor-users] Dedicated Scheduler Config to enable Parallel Jobs.



Hi John,


Thank you for your reply. 


Even if I have 2 dedicated schedulers and split my nodes between them, I will still have to log in to them and submit from there, correct? I'm trying to avoid this and be able to submit multi-process runs from my local machine, the same way I submit non-parallel-universe jobs.
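
(I know condor_submit can target a remote schedd and spool the input files there, something along these lines, with sleep.sub standing in for my submit file:

condor_submit -remote <FQDN of master> sleep.sub

but I'd like ordinary submits from my machine to work the way they already do for non-parallel jobs.)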


 Thank you,

Sofya


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of John M Knoeller <johnkn@xxxxxxxxxxx>
Sent: Wednesday, October 3, 2018 12:17:02 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Dedicated Scheduler Config to enable Parallel Jobs.
 

This is normal for the parallel universe.  The reason is that the execute nodes must be configured to respond to a single dedicated scheduler, so only jobs submitted to that scheduler will ever run there.
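
In config terms, each execute node names exactly one dedicated scheduler, just as in your config (a sketch; schedd.domain.com is a placeholder for your schedd's FQDN):

DedicatedScheduler = "DedicatedScheduler@schedd.domain.com"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler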

 

You would split your execute nodes up by configuring half of them to use schedd A as the dedicated scheduler, and half to use schedd B as the dedicated scheduler.  Then you could submit jobs to either schedd A or schedd B, but those jobs would never be able to use more than half of the execute nodes.
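
Sketched out, with schedd-a and schedd-b as placeholder host names:

# condor_config on the first half of the execute nodes
DedicatedScheduler = "DedicatedScheduler@schedd-a.domain.com"

# condor_config on the second half
DedicatedScheduler = "DedicatedScheduler@schedd-b.domain.com"

You would then pick which half of the pool a job can use at submit time, e.g. condor_submit -name schedd-a.domain.com.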

 

This is the same whether your schedd and/or execute nodes are Windows or Linux.

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Sofya Urbaniec
Sent: Tuesday, October 2, 2018 1:13 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Dedicated Scheduler Config to enable Parallel Jobs.

 

I can run multi-processor jobs now, but in order to submit a multi-CPU parallel job I have to submit it from the dedicated scheduler, in this case the master. That means I have to log in to the remote machine and submit from there.

 

Is this behavior expected? Could it be because I'm running it on Windows and Windows has some limitations?

 

Thank you.


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Sofya Urbaniec <Sofya.Urbaniec@xxxxxxxxxx>
Sent: Wednesday, September 26, 2018 7:41:35 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Dedicated Scheduler Config to enable Parallel Jobs.

 

Hello,

 

I'm trying to configure a dedicated scheduler to enable parallel jobs on an HTCondor pool running on Windows.

 

I'm using Condor version 8.4.1.

 

My condor_config on master:

 

 

######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = C:\condor
LOCAL_DIR = $(RELEASE_DIR)

LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
REQUIRE_LOCAL_CONFIG_FILE = TRUE
LOCAL_CONFIG_DIR = $(LOCAL_DIR)

SETTABLE_ATTRS_CONFIG = *
SETTABLE_ATTRS_OWNER = TDVERS
STARTD_ATTRS = COLLECTOR_HOST_STRING, TDVERS

CONDOR_HOST = $(FULL_HOSTNAME)
COLLECTOR_NAME = thermal
UID_DOMAIN = domain.com
CONDOR_ADMIN = condor_admin_svc@xxxxxxxxxx
SMTP_SERVER = smtp.domain.com
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *.domain.com
ALLOW_ADMINISTRATOR = $(IP_ADDRESS), *.domain.com
JAVA = C:\PROGRA~2\Java\JRE18~1.0_6\bin\java.exe
START = FALSE
WANT_VACATE = FALSE
WANT_SUSPEND = TRUE

#  Dedicated Scheduler Config to enable Parallel Jobs.
DedicatedScheduler = "DedicatedScheduler@<FQDN of master>"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR

# Space X Additional Configuration
MAX_JOBS_RUNNING = 225
START_SCHEDULER_UNIVERSE = TotalSchedulerJobsRunning < 225
START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 225

CREDD_HOST = <FQDN of master>
CREDD_CACHE_LOCALLY = True

STARTER_ALLOW_RUNAS_OWNER = True
ALLOW_CONFIG = condor_admin_svc@*
HOSTALLOW_CONFIG = *.domain.com
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
 
I ran condor_reconfig -all and condor_restart.
I changed condor_config on two nodes out of 9 to see if it works. Here is the condor_config from one of those nodes:
######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################
 
##  Where have you installed the bin, sbin and lib condor directories?   
RELEASE_DIR = E:\condor
 
##  Where is the local condor directory for each host?  This is where the local config file(s), logs and
##  spool/execute directories are located. This is the default for Linux and Unix systems.
#LOCAL_DIR = $(TILDE)
##  This is the default on Windows systems
LOCAL_DIR = $(RELEASE_DIR)
 
##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)\etc\$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = FALSE
 
##  The normal way to do configuration with RPMs is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = $(LOCAL_DIR)\config
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$
 
##  Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed
use SECURITY : HOST_BASED
##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
FLOCK_TO = <FQDN of Master>
 
##--------------------------------------------------------------------
## Values set by the condor_configure script:
##--------------------------------------------------------------------
JAVA = C:\Program Files (x86)\Java\jre7\bin\java.exe
CONDOR_HOST = <FQDN of Master>
 
UID_DOMAIN = domain.com
CONDOR_ADMIN = condor_admin_svc@xxxxxxxxxx
SMTP_SERVER = smtp.domain.com
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *.domain.com
ALLOW_ADMINISTRATOR = $(IP_ADDRESS)
JAVA = C:\PROGRA~2\Java\JRE18~1.0_6\bin\java.exe
DAEMON_LIST = MASTER SCHEDD STARTD KBDD

# Dedicated Scheduler
DedicatedScheduler = "DedicatedScheduler@<FQDN of Master>"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
RANK_FACTOR = 10000
RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR))

# Space X Additional Configuration
CREDD_HOST = <FQDN of Master>
CREDD_CACHE_LOCALLY = True
STARTER_ALLOW_RUNAS_OWNER = True
ALLOW_CONFIG = condor_admin_svc@*
HOSTALLOW_CONFIG = $(IP_ADDRESS), *.domain.com
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED

SLOTS_CONNECTED_TO_CONSOLE = 2
SLOTS_CONNECTED_TO_KEYBOARD = 2
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
HighLoad = 1.0
BgndLoad = 0.3
CPU_Busy = ($(NonCondorLoadAvg) >= $(HighLoad))
CPU_Idle = ($(NonCondorLoadAvg) <= $(BgndLoad))
KeyboardBusy = (KeyboardIdle < 10)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
START = $(CPU_Idle) && KeyboardIdle > 300
SUSPEND = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
PREEMPT = (Activity == "Suspended") && $(ActivityTimer) > 300
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
KILL = $(ActivityTimer) > 300

SETTABLE_ATTRS_CONFIG = *
SETTABLE_ATTRS_OWNER = TDVERS
STARTD_ATTRS = COLLECTOR_HOST_STRING, TDVERS
TDVERS = "5.8"
I ran condor_reconfig -all and condor_restart.
 
But when I submit a parallel job, it stays stuck in the idle state forever.
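
Is something like the following the right way to check what's wrong? (<job id> stands in for the idle job's cluster id; the second command is meant to confirm the startd is actually advertising the DedicatedScheduler attribute.)

condor_q -better-analyze <job id>
condor_status -long <execute node> | findstr DedicatedScheduler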
 
This is an example of the job:
universe = parallel
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
notify_user = <email address>
machine_count = 1
request_cpus = 2
notification = Always
run_as_owner = true
getenv = true
log = sleep_log.txt
output = sleep_stdout.txt
error = sleep_stderr.txt
 
executable = sleep.bat
queue
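
sleep.bat itself is just a stub along these lines (the exact contents shouldn't matter):

:: wait roughly 60 seconds
ping -n 61 127.0.0.1 > nul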
Please advise.
Thank you.