[Condor-users] Struggling with Condor Flocking



Hi,
I'm trying to set up flocking between two Condor pools. I'm getting this
error in the CollectorLog: "DC_AUTHENTICATE: attempt to open invalid session
cm.mygrid:27485:1119363742:5, failing", where cm is the host name of the
central manager of the pool that flocks to the other pool.
Flocking never happens.
Here is the scenario:
I'm using Red Hat Linux and Condor 6.6.9.
There are two central managers, cm.mygrid.com and cm1.mygrid.com, one for
each of the two Condor pools.
condor_config is on a shared file system.
Here is the configuration:
======================================================================
$LOCAL_DIR/condor_config.local for cm.mygrid.com
======================================================================
COLLECTOR = $(SBIN)/condor_collector
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR_NAME = Collector at alpha
CONTINUE = True
FILESYSTEM_DOMAIN = cm.mygrid.com
FLOCK_FROM = *.hclgrid.com
FLOCK_TO = cm1.mygrid.com
PREEMPT = FALSE
SUSPEND = FALSE
LOCK = /tmp/condor-lock.$(HOSTNAME)0.885447545050742
UID_DOMAIN = cm.mygrid.com
NEGOTIATOR = $(SBIN)/condor_negotiator
VACATE = FALSE
CONDOR_ADMIN = root@xxxxxxxxxxxxx
START = TRUE
MAIL = /bin/mail
CONDOR_IDS = 504.504
RELEASE_DIR = /usr/local/condor
CONDOR_HOST = cm.mygrid.com
LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)

======================================================================
$LOCAL_DIR/condor_config.local for cm1.mygrid.com
======================================================================
COLLECTOR = $(SBIN)/condor_collector
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
COLLECTOR_NAME = Collector at microgrid
CONTINUE = True
FILESYSTEM_DOMAIN = cm1.mygrid.com
FLOCK_FROM = *.hclgrid.com
FLOCK_TO = Not defined
PREEMPT = FALSE
SUSPEND = FALSE
LOCK = /tmp/condor-lock.$(HOSTNAME)0.505710791637288
UID_DOMAIN = cm1.mygrid.com
NEGOTIATOR = $(SBIN)/condor_negotiator
VACATE = FALSE
CONDOR_ADMIN = root@xxxxxxxxxxxxxx
START = TRUE
MAIL = /bin/mail
CONDOR_IDS = 504.504
RELEASE_DIR = /usr/local/condor
CONDOR_HOST = cm1.mygrid.com
LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)
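
One thing that can be checked mechanically from the settings above is
whether the submitting host's name matches the FLOCK_FROM pattern set on
the pool being flocked to. Below is a minimal sketch of such a check. It
assumes shell-style wildcard matching is a fair approximation of how Condor
compares host patterns, which is an assumption and not Condor's actual
code; host_matches is a hypothetical helper name. The host names are the
ones written in this post; if they were altered for posting, the real ones
would need to be substituted.

```python
from fnmatch import fnmatchcase


def host_matches(host, patterns):
    """Return True if host matches any comma-separated wildcard pattern.

    Hypothetical helper: approximates host-pattern matching with
    shell-style wildcards, compared case-insensitively.
    """
    host = host.lower()
    return any(
        fnmatchcase(host, pattern.strip().lower())
        for pattern in patterns.split(",")
        if pattern.strip()
    )


# FLOCK_FROM as set in condor_config.local on cm1.mygrid.com
flock_from_on_cm1 = "*.hclgrid.com"
# Host running the schedd in the pool that flocks
submit_host = "cm.mygrid.com"

print(host_matches(submit_host, flock_from_on_cm1))
```

If this prints False for the real host names, the FLOCK_FROM value on the
target pool may be worth a second look.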
======================================================================
CollectorLog at cm1.mygrid.com
======================================================================
6/21 20:03:42 WARNING:  No master ad for < cm.mygrid.com >
6/21 20:03:42 ScheddAd    : Inserting ** "< cm.mygrid.com , 10.100.207.10 >"
6/21 20:03:42 stats: Inserting new hashent for
'Schedd':'cm.mygrid.com':'10.100.207.10'
6/21 20:03:42 SubmittorAd  : Inserting ** "< condor@xxxxxxxxxxxxx ,
10.100.207.10 >"
6/21 20:03:42 stats: Inserting new hashent for
'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
6/21 20:04:09 Got QUERY_STARTD_ADS
6/21 20:04:09 (Sent 1 ads in response to query)
6/21 20:06:33 (Sent 5 ads in response to query)
6/21 20:06:33 Got QUERY_STARTD_PVT_ADS
6/21 20:06:33 (Sent 1 ads in response to query)
======================================================================
CollectorLog at cm.mygrid.com
======================================================================
6/21 20:02:21 DC_AUTHENTICATE: attempt to open invalid session
alpha:27485:1119363742:5, failing.
6/21 20:03:21 SubmittorAd  : Inserting ** "< condor@xxxxxxxxxxxxx ,
10.100.207.10 >"
6/21 20:03:21 stats: Inserting new hashent for
'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
6/21 20:03:21 (Sent 4 ads in response to query)
6/21 20:03:21 Got QUERY_STARTD_PVT_ADS
6/21 20:03:21 (Sent 1 ads in response to query)
6/21 20:03:41 (Sent 4 ads in response to query)
6/21 20:03:41 Got QUERY_STARTD_PVT_ADS
6/21 20:03:41 (Sent 1 ads in response to query)
6/21 20:03:49 DC_AUTHENTICATE: attempt to open invalid session
alpha:27485:1119363829:6, failing.

======================================================================
ScheddLog at cm.mygrid.com
======================================================================
6/21 20:03:20 DaemonCore: Command received via UDP from host
<10.100.207.10:32990>
6/21 20:03:20 DaemonCore: received command 421 (RESCHEDULE), calling handler
(reschedule_negotiator)
6/21 20:03:21 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:03:21 Called reschedule_negotiator()
6/21 20:03:21 DaemonCore: Command received via TCP from host
<10.100.207.10:41493>
6/21 20:03:21 DaemonCore: received command 416 (NEGOTIATE), calling handler
(negotiate)
6/21 20:03:21 Negotiating for owner: condor@xxxxxxxxxxxxx
6/21 20:03:21 Checking consistency running and runnable jobs
6/21 20:03:21 Tables are consistent
6/21 20:03:21 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
6/21 20:03:21 DaemonCore: Command received via UDP from host
<10.100.207.10:32991>
6/21 20:03:21 DaemonCore: received command 421 (RESCHEDULE), calling handler
(reschedule_negotiator)
6/21 20:03:21 Called reschedule_negotiator()
6/21 20:03:23 Started shadow for job 558.0 on "<10.100.207.10:41475>",
(shadow pid = 27559)
6/21 20:03:25 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:03:42 Activity on stashed negotiator socket
6/21 20:03:42 Negotiating for owner: condor@xxxxxxxxxxxxx
6/21 20:03:42 Checking consistency running and runnable jobs
6/21 20:03:42 Tables are consistent
6/21 20:03:42 Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
6/21 20:03:42 Increasing flock level for condor@xxxxxxxxxxxxx to 1.
6/21 20:03:42 Sent ad to central manager for condor@xxxxxxxxxxxxx
6/21 20:04:34 Shadow pid 27559 for job 558.0 exited with status 100
6/21 20:04:34 Started shadow for job 559.0 on "<10.100.207.10:41475>",
(shadow pid = 27569)
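
To pick the flocking-related entries out of a longer ScheddLog, a small
filter like the following could be used. It is only a sketch: the regular
expression is written against the "Increasing flock level" line format
seen in the excerpt above, and flock_level_changes is a hypothetical
helper name, not part of Condor.

```python
import re

# Matches lines such as:
# 6/21 20:03:42 Increasing flock level for condor@xxxxxxxxxxxxx to 1.
FLOCK_RE = re.compile(
    r"^(\d+/\d+ \d+:\d+:\d+) Increasing flock level for (\S+) to (\d+)\."
)


def flock_level_changes(lines):
    """Return (timestamp, owner, new_level) for each flock-level increase."""
    changes = []
    for line in lines:
        match = FLOCK_RE.match(line)
        if match:
            timestamp, owner, level = match.groups()
            changes.append((timestamp, owner, int(level)))
    return changes


sample = [
    "6/21 20:03:42 Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected",
    "6/21 20:03:42 Increasing flock level for condor@xxxxxxxxxxxxx to 1.",
    "6/21 20:03:42 Sent ad to central manager for condor@xxxxxxxxxxxxx",
]

for change in flock_level_changes(sample):
    print(change)
```

On the three sample lines this reports the single increase to flock level
1 at 20:03:42, which is the point where the schedd starts trying the
other pool.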

Could you please help me find what I'm missing?

Regards
Sai