
Re: [HTCondor-users] Problem with REPLICATION_USE_SHARED_PORT



I forgot to mention the HTCondor version:

condor-8.6.0-1.el6.x86_64

Cheers,
Andrea


On 10/03/2017 17:07, Andrea Sartirana wrote:
Hi,

at GRIF we are currently testing HAD and Replication.
Things work fine when we declare dedicated ports for HAD and REPLICATION. But when we set REPLICATION_USE_SHARED_PORT to TRUE, the replication service refuses to start, and I see errors like these in the Master log of the two master servers:

03/10/17 17:01:26 ERROR: SharedPortEndpoint: failed to bind to 15f287e5db818c2dbce9638b70a6dc044992f0be80d2dc43848c983c1fc43fa5/MASTER: Address already in use
03/10/17 17:01:26 ERROR: Create_Process failed trying to start /usr/sbin/condor_replication
03/10/17 17:01:26 restarting /usr/sbin/condor_replication in 265 seconds

My HAD/REPLICATION configuration is below [1].
.... What am I doing wrong?

Thanks,
Andrea

[1]
HAD_USE_SHARED_PORT = TRUE
REPLICATION_USE_SHARED_PORT = TRUE
REPLICATION_LIST = lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)
HAD_LIST = lpnhe-gs9088.in2p3.fr:$(SHARED_PORT_PORT) llrmpicream.in2p3.fr:$(SHARED_PORT_PORT)

HAD_CONTROLLEE          = NEGOTIATOR
HAD_CONNECTION_TIMEOUT = 10
HAD_USE_PRIMARY = true

DAEMON_LIST = $(DAEMON_LIST) HAD REPLICATION

HAD_USE_REPLICATION    = true

STATE_FILE = $(SPOOL)/Accountantnew.log

REPLICATION_INTERVAL                 = 300

MAX_TRANSFER_LIFETIME                = 300

HAD_UPDATE_INTERVAL = 300

MASTER_NEGOTIATOR_CONTROLLER    = HAD
MASTER_HAD_BACKOFF_CONSTANT     = 360
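
For anyone comparing the Master logs of the two servers, the shared-port bind failures quoted above can be pulled out with a short script. This is only a rough sketch in Python: the line format is assumed from the two log lines in this message, and the names BIND_FAILURE and find_bind_failures are made up for illustration; they are not part of HTCondor. It reports the timestamp, the last component of the socket path (MASTER in the line above), and the reason.

```python
import re

# Assumed line shape, inferred only from the log excerpt in this thread:
#   MM/DD/YY HH:MM:SS ERROR: SharedPortEndpoint: failed to bind to <dir>/<name>: <reason>
BIND_FAILURE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"ERROR: SharedPortEndpoint: failed to bind to "
    r"(?P<path>\S+): (?P<reason>.+)$"
)


def find_bind_failures(lines):
    """Return (timestamp, socket_name, reason) for each bind-failure line."""
    failures = []
    for line in lines:
        match = BIND_FAILURE.match(line.strip())
        if match:
            # Keep only the last path component, e.g. "MASTER".
            socket_name = match.group("path").rsplit("/", 1)[-1]
            failures.append(
                (match.group("stamp"), socket_name, match.group("reason"))
            )
    return failures


if __name__ == "__main__":
    # The lines quoted in this message, one log entry per line.
    sample = [
        "03/10/17 17:01:26 ERROR: SharedPortEndpoint: failed to bind to "
        "15f287e5db818c2dbce9638b70a6dc044992f0be80d2dc43848c983c1fc43fa5/MASTER: "
        "Address already in use",
        "03/10/17 17:01:26 ERROR: Create_Process failed trying to start "
        "/usr/sbin/condor_replication",
        "03/10/17 17:01:26 restarting /usr/sbin/condor_replication in 265 seconds",
    ]
    for stamp, socket_name, reason in find_bind_failures(sample):
        print(stamp, socket_name, reason)
```

To run it against a real log, pass an open file object instead of the sample list; the location of the Master log depends on the local LOG setting.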