
Re: [HTCondor-users] Duplicated nodes on master?



Hmmm... I already have that one set...

My condor_config is:

FULL_HOSTNAME = 192.168.0.2

## Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr

## Where is the local condor directory for each host? This is where the local config file(s), logs and
## spool/execute directories are located. This is the default for Linux and Unix systems.
LOCAL_DIR = /var

## Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
## If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
## If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = false

## The normal way to do configuration with RPMs is to read all of the
## files in a given directory that don't match a regex as configuration files.
## Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = /etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

## Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed.
use SECURITY : HOST_BASED
## To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
## FLOCK_FROM defines the machines that grant access to your pool via flocking (i.e. these machines can join your pool).
#FLOCK_FROM =
## FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu

##--------------------------------------------------------------------
## Values set by the debian patch script:
##--------------------------------------------------------------------

## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password

## Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/lib/condor/libexec
SHARE   = $(RELEASE_DIR)/share/condor
GANGLIA_LIB64_PATH = /lib,/usr/lib,/usr/local/lib

PROCD_ADDRESS = $(RUN)/procd_pipe

## What machine is your central manager?

CONDOR_HOST = $(FULL_HOSTNAME)
NEGOTIATOR = $(SBIN)/condor_negotiator
COLLECTOR = $(SBIN)/condor_collector
USE_CKPT_SERVER = FALSE
#CKPT_SERVER = $(SBIN)/condor_ckpt_server
#CKPT_SERVER_HOST = 192.168.0.2

NETWORK_INTERFACE = 192.168.0.2


## This macro determines what daemons the condor_master will start and keep its watchful eyes on.
## The list is a comma- or space-separated list of subsystem names.

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

NUM_CPUS=2
START=TRUE
SUSPEND=FALSE
CONTINUE=TRUE
PREEMPT=FALSE
KILL=FALSE

ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)
ALLOW_READ = *
ALLOW_WRITE = *
ALLOW_NEGOTIATOR = *
#$(COLLECTOR_HOST)
ALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_WRITE_STARTD    = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_READ_COLLECTOR  = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_READ_STARTD     = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_CLIENT = *

HOSTALLOW_CONFIG = $(CONDOR_HOST)

SEC_DEFAULT_AUTHENTICATION_METHODS = FS, PASSWORD


------------------------------------------------------------------------------------------------------------------------
Prof. Dr. Roberto Fernandes Tavares Neto
Departamento de Engenharia de Produção / Industrial Engineering Department
Universidade Federal de São Carlos
tavares@xxxxxxxxxxxxx   tel +55 16 3351-9532
http://www.dep.ufscar.br/tavares
------------------------------------------------------------------------------------------------------------------------

On Wed, Dec 13, 2017 at 6:51 PM, Edier Zapata <edalzap@xxxxxxxxx> wrote:
Hi,
NETWORK_INTERFACE = 192.168.xxx.yyy (IP of the master node on the 192.168 network)
With that, Condor will accept jobs only from the 192.168 network.
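
A minimal condor_config fragment along those lines might look like this (the 192.168.0.2 address is taken from the config quoted earlier in the thread; BIND_ALL_INTERFACES is an extra, optional knob I'm suggesting here, not something the original messages mention):

  # Advertise and listen only on the private 192.168 interface
  NETWORK_INTERFACE = 192.168.0.2
  # Optionally refuse to bind any other interface at all
  BIND_ALL_INTERFACES = False

After changing the file, run condor_reconfig (or restart the master) so the daemons pick it up.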

Bye

On Wed, Dec 13, 2017 at 3:46 PM, Roberto Tavares <tavares@xxxxxxxxxxxxx> wrote:
Hello,

I figure it out:

I have two network interfaces (one for my external network, IP 200.xxxxx, and another for the Condor network, 192.168xxxxx). Condor is creating nodes for both interfaces.

How can I stop Condor from using interface eth0 (IP 200.xxxxx)?

Thank you!



On Tue, Dec 5, 2017 at 11:33 AM, Edier Zapata <edalzap@xxxxxxxxx> wrote:
Hi Roberto,
try this:
condor_status -af:h Name OpSys Arch Memory Cpus
You will get the full name of each slot, plus its operating system, architecture, memory, and cores (CPUs).
Another way is to check the Collector's log (/var/log/condor/CollectorLog) and the StartdLog (same path).
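
If the default log level doesn't show which address each startd ad arrived from, raising the collector's debug level can help (D_FULLDEBUG and D_HOSTNAME are standard HTCondor debug flags; treat this as a sketch, not something tested in this thread):

  # On the central manager, in condor_config or config.d, then run condor_reconfig
  COLLECTOR_DEBUG = D_FULLDEBUG D_HOSTNAME

The CollectorLog will then record more detail about each ad it receives, which makes duplicated machine ads easier to spot.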

Bye

On Tue, Dec 5, 2017 at 6:30 AM, Roberto Tavares <tavares@xxxxxxxxxxxxx> wrote:
Hello,


I've configured HTCondor on two machines: Node0 (COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD) and Node1 (MASTER, STARTD).

The communication seems to be fine and I can use all nodes. However, at some point I start getting duplicated nodes on Node0. condor_status gives me:

$ condor_status
Name                OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:21
slot2@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:23
slot3@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:24
slot4@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:24
slot5@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:26
slot6@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:27
slot7@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:27
slot8@Node1         LINUX      X86_64 Unclaimed Idle      0.000 1992  0+00:00:21
slot1@Node0         LINUX      X86_64 Unclaimed Idle      0.000 3943  0+00:00:01
slot1@Node0         LINUX      X86_64 Unclaimed Idle      0.260 3943  0+00:00:01
slot2@Node0         LINUX      X86_64 Unclaimed Idle      0.240 3943  0+00:00:03
slot2@Node0         LINUX      X86_64 Unclaimed Idle      0.000 3943  0+00:00:02

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    12     0       0        12       0          0        0

               Total    12     0       0        12       0          0        0

In my config file for Node0, I have:

$ cat /etc/condor/condor_config |grep NUM_CPUS
NUM_CPUS=2

How can I trace why I got two slot1@Node0 and two slot2@Node0?
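
One quick check (a suggestion from the editor, not from the thread itself) is to ask the collector for each slot's address; the -af (autoformat) option and the MyAddress attribute are standard condor_status features:

  condor_status -af Name MyAddress

If the two slot1@Node0 entries show different IP addresses, the same startd is being advertised on two network interfaces.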

I'm running condor_8.4.12-409562-ubuntu14_amd64

Thank you!!!

Roberto

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxx.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


