
RE: [Condor-users] Struggling with Condor Flocking



Rajesh,
Thanks for your quick response. Even though I've corrected the FLOCK_FROM and
FLOCK_TO values, I'm getting the same error.
The HOSTALLOW entries in the condor_config file are as follows:
HOSTALLOW_READ = cm.mygrid.com, cm1.mygrid.com
HOSTALLOW_WRITE = cm.mygrid.com, cm1.mygrid.com
I left all the remaining entries as they are (since $(CONDOR_HOST),
$(FLOCK_FROM), etc. are already defined).
Any more ideas about where I'm going wrong?

Thanks much
Sai

-----Original Message-----
From: Rajesh Rajamani [mailto:raj@xxxxxxxxxx]
Sent: Tuesday, June 21, 2005 9:51 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Struggling with Condor Flocking


Sai,
I think FLOCK_FROM on cm1.mygrid.com must be

FLOCK_FROM = *.mygrid.com

and NOT

FLOCK_FROM = *.hclgrid.com

Please also check that the FLOCK_TO variable is set in the Condor config
files of the schedd machine.
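A minimal sketch of those two settings follows. It assumes, as in Sai's message, that cm.mygrid.com runs the schedd whose jobs should flock and that cm1.mygrid.com is the central manager of the pool that accepts them. The host names come from this thread and would need adjusting for any other setup:

```
## condor_config.local on cm.mygrid.com (submit side, runs the schedd)
## Central manager(s) of the pool(s) this schedd may flock to when its
## jobs cannot be matched in the local pool.
FLOCK_TO = cm1.mygrid.com

## condor_config.local on cm1.mygrid.com (pool that accepts flocked jobs)
## Submit machines allowed to flock into this pool. The pattern has to
## match cm.mygrid.com, which *.hclgrid.com does not.
FLOCK_FROM = *.mygrid.com
```

After changing these, run condor_reconfig on both machines (or restart Condor there) so the daemons pick up the new values.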

Also, please make sure that the HOSTALLOW* entries in the config files
are set appropriately.  For details, consult
http://www.cs.wisc.edu/condor/manual/v6.7/5_2Connecting_Condor.html
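As a rough guide, the stock condor_config shipped with releases of this era feeds FLOCK_FROM and FLOCK_TO into the per-daemon HOSTALLOW settings with lines similar to the ones below. Treat them as an assumption to compare against your own condor_config, not as an authoritative copy. If they are missing or overridden, the remote collector and startds may refuse the flocked schedd:

```
## Used on the submit side (cm.mygrid.com): allow the negotiator of each
## FLOCK_TO pool to contact the local schedd.
FLOCK_NEGOTIATOR_HOSTS      = $(FLOCK_TO)
FLOCK_COLLECTOR_HOSTS       = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(NEGOTIATOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)

## Used on the accepting side (cm1.mygrid.com): allow FLOCK_FROM hosts to
## read from and write to the collector and the startds.
HOSTALLOW_READ_COLLECTOR    = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_WRITE_COLLECTOR   = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD       = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD      = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
```

You can check the effective, macro-expanded value on a given host with condor_config_val. For example, run "condor_config_val HOSTALLOW_WRITE_COLLECTOR" on cm1.mygrid.com.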

Let me know if that helps.
-- 
Rajesh Rajamani
Senior Member of Technical Staff
Direct : +1.408.321.9000
Fax    : +1.408.904.5992
Mobile : +1.408.321.9030
raj@xxxxxxxxxx


Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
www.optena.com




Aln Sai Srinivas - CTD, Chennai wrote:
> Hi,
> I'm trying to set up flocking between Condor pools. I'm getting an error in
> the CollectorLog: "DC_AUTHENTICATE: attempt to open invalid session
> cm.mygrid:27485:1119363742:5, failing", where cm is the host name of the
> central manager of the pool that flocks to the other pool.
> Flocking never happened.
> Here is the scenario:
> I'm using Red Hat Linux and Condor 6.6.9.
> There are two central managers, cm.mygrid.com and cm1.mygrid.com, each
> managing its own Condor pool.
> condor_config is on a shared file system.
> Here is the configuration:
> ======================================================================
> $LOCAL_DIR/condor_config.local for cm.mygrid.com
> ======================================================================
> COLLECTOR = $(SBIN)/condor_collector
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
> COLLECTOR_NAME = Collector at alpha
> CONTINUE = True
> FILESYSTEM_DOMAIN = cm.mygrid.com
> FLOCK_FROM = *.hclgrid.com
> FLOCK_TO = cm1.mygrid.com
> PREEMPT = FALSE
> SUSPEND = FALSE
> LOCK = /tmp/condor-lock.$(HOSTNAME)0.885447545050742
> UID_DOMAIN = cm.mygrid.com
> NEGOTIATOR = $(SBIN)/condor_negotiator
> VACATE = FALSE
> CONDOR_ADMIN = root@xxxxxxxxxxxxx
> START = TRUE
> MAIL = /bin/mail
> CONDOR_IDS = 504.504
> RELEASE_DIR = /usr/local/condor
> CONDOR_HOST = cm.mygrid.com
> LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)
>
> ======================================================================
> $LOCAL_DIR/condor_config.local for cm1.mygrid.com
> ======================================================================
> COLLECTOR = $(SBIN)/condor_collector
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
> COLLECTOR_NAME = Collector at microgrid
> CONTINUE = True
> FILESYSTEM_DOMAIN = cm1.mygrid.com
> FLOCK_FROM = *.hclgrid.com
> FLOCK_TO = Not defined
> PREEMPT = FALSE
> SUSPEND = FALSE
> LOCK = /tmp/condor-lock.$(HOSTNAME)0.505710791637288
> UID_DOMAIN = cm1.mygrid.com
> NEGOTIATOR = $(SBIN)/condor_negotiator
> VACATE = FALSE
> CONDOR_ADMIN = root@xxxxxxxxxxxxxx
> START = TRUE
> MAIL = /bin/mail
> CONDOR_IDS = 504.504
> RELEASE_DIR = /usr/local/condor
> CONDOR_HOST = cm1.mygrid.com
> LOCAL_DIR = /usr/local/condor/local.$(HOSTNAME)
>
> ======================================================================
> CollectorLog at cm1.mygrid.com
> ======================================================================
> 6/21 20:03:42 WARNING:  No master ad for < cm.mygrid.com >
> 6/21 20:03:42 ScheddAd    : Inserting ** "< cm.mygrid.com , 10.100.207.10 >"
> 6/21 20:03:42 stats: Inserting new hashent for 'Schedd':'cm.mygrid.com':'10.100.207.10'
> 6/21 20:03:42 SubmittorAd  : Inserting ** "< condor@xxxxxxxxxxxxx , 10.100.207.10 >"
> 6/21 20:03:42 stats: Inserting new hashent for 'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
> 6/21 20:04:09 Got QUERY_STARTD_ADS
> 6/21 20:04:09 (Sent 1 ads in response to query)
> 6/21 20:06:33 (Sent 5 ads in response to query)
> 6/21 20:06:33 Got QUERY_STARTD_PVT_ADS
> 6/21 20:06:33 (Sent 1 ads in response to query)
>
> ======================================================================
> CollectorLog at cm.mygrid.com
> ======================================================================
> 6/21 20:02:21 DC_AUTHENTICATE: attempt to open invalid session alpha:27485:1119363742:5, failing.
> 6/21 20:03:21 SubmittorAd  : Inserting ** "< condor@xxxxxxxxxxxxx , 10.100.207.10 >"
> 6/21 20:03:21 stats: Inserting new hashent for 'Submittor':'condor@xxxxxxxxxxxxx':'10.100.207.10'
> 6/21 20:03:21 (Sent 4 ads in response to query)
> 6/21 20:03:21 Got QUERY_STARTD_PVT_ADS
> 6/21 20:03:21 (Sent 1 ads in response to query)
> 6/21 20:03:41 (Sent 4 ads in response to query)
> 6/21 20:03:41 Got QUERY_STARTD_PVT_ADS
> 6/21 20:03:41 (Sent 1 ads in response to query)
> 6/21 20:03:49 DC_AUTHENTICATE: attempt to open invalid session alpha:27485:1119363829:6, failing.
>
> ======================================================================
> ScheddLog at cm.mygrid.com
> ======================================================================
> 6/21 20:03:20 DaemonCore: Command received via UDP from host <10.100.207.10:32990>
> 6/21 20:03:20 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
> 6/21 20:03:21 Sent ad to central manager for condor@xxxxxxxxxxxxx
> 6/21 20:03:21 Called reschedule_negotiator()
> 6/21 20:03:21 DaemonCore: Command received via TCP from host <10.100.207.10:41493>
> 6/21 20:03:21 DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
> 6/21 20:03:21 Negotiating for owner: condor@xxxxxxxxxxxxx
> 6/21 20:03:21 Checking consistency running and runnable jobs
> 6/21 20:03:21 Tables are consistent
> 6/21 20:03:21 Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
> 6/21 20:03:21 DaemonCore: Command received via UDP from host <10.100.207.10:32991>
> 6/21 20:03:21 DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
> 6/21 20:03:21 Called reschedule_negotiator()
> 6/21 20:03:23 Started shadow for job 558.0 on "<10.100.207.10:41475>", (shadow pid = 27559)
> 6/21 20:03:25 Sent ad to central manager for condor@xxxxxxxxxxxxx
> 6/21 20:03:42 Activity on stashed negotiator socket
> 6/21 20:03:42 Negotiating for owner: condor@xxxxxxxxxxxxx
> 6/21 20:03:42 Checking consistency running and runnable jobs
> 6/21 20:03:42 Tables are consistent
> 6/21 20:03:42 Out of servers - 0 jobs matched, 1 jobs idle, 1 jobs rejected
> 6/21 20:03:42 Increasing flock level for condor@xxxxxxxxxxxxx to 1.
> 6/21 20:03:42 Sent ad to central manager for condor@xxxxxxxxxxxxx
> 6/21 20:04:34 Shadow pid 27559 for job 558.0 exited with status 100
> 6/21 20:04:34 Started shadow for job 559.0 on "<10.100.207.10:41475>", (shadow pid = 27569)
> 
> Could you please help me find what I'm missing?
> 
> Regards
> Sai
> 
> 
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 


