
[Condor-users] BIND_ALL_INTERFACES not working?



I have a multi-homed central manager (CM) that also runs the schedd, with eth0 set to 192.168.1.254 (cmprivate.bestsystems.co.jp) and eth1 set to 172.16.10.191 (cmpublic.bestsystems.co.jp). With 'NETWORK_INTERFACE = 192.168.1.254' and 'BIND_ALL_INTERFACES = True' set, I can submit jobs from the CM and they execute OK.

Both cmpublic.bestsystems.co.jp and client.bestsystems.co.jp are defined in /etc/hosts on each machine. However, when I run 'condor_q -pool cmpublic.bestsystems.co.jp -name cmpublic.bestsystems.co.jp' from 172.16.10.70 (client.bestsystems.co.jp), I get the following error:

   $ condor_q -pool cmpublic.bestsystems.co.jp -name cmpublic.bestsystems.co.jp

   -- Failed to fetch ads from: <192.168.1.254:26758> : cmpublic.bestsystems.co.jp
   CEDAR:6001:Failed to connect to <192.168.1.254:26758>
   $

192.168.1.254 is advertised as the ScheddIpAddr in the schedd ClassAd:

   $ condor_status -pool cmpublic.bestsystems.co.jp -schedd -l
   MyType = "Scheduler"
   TargetType = ""
   CondorVersion = "$CondorVersion: 6.7.18 Mar 22 2006 $"
   CondorPlatform = "$CondorPlatform: I386-LINUX_RH9 $"
   Machine = "cmpublic.bestsystems.co.jp"
   QuillEnabled = FALSE
   ScheddIpAddr = "<192.168.1.254:26758>"
   MyAddress = "<192.168.1.254:26758>"
   NumUsers = 0
   MaxJobsRunning = 200
   StartLocalUniverse = TRUE
   StartSchedulerUniverse = TRUE
   Name = "cmpublic.bestsystems.co.jp"
   VirtualMemory = 4192672
   TotalIdleJobs = 0
   TotalRunningJobs = 0
   TotalJobAds = 0
   TotalHeldJobs = 0
   TotalFlockedJobs = 0
   TotalRemovedJobs = 0
   MonitorSelfTime = 1147926852
   MonitorSelfCPUUsage = 0.000000
   MonitorSelfImageSize = 8304.000000
   MonitorSelfResidentSetSize = 3588
   MonitorSelfAge = 0
   WantResAd = TRUE
   DaemonStartTime = 1147926852
   UpdateSequenceNumber = 2
   ServerTime = 1147926997
   LastHeardFrom = 1147926997
   UpdatesTotal = 3
   UpdatesSequenced = 2
   UpdatesLost = 0
   UpdatesHistory = "0x00000000000000000000000000000000"


   $
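
To make the mismatch explicit, here is a small Python sketch (my own helper, not part of Condor) that pulls the advertised addresses out of the ad above and checks whether they lie on the client's network. The /24 prefix length is an assumption, since I have not listed the netmasks here, and the function names are purely illustrative.

```python
import ipaddress
import re

# Attribute lines copied from the schedd ClassAd shown above.
AD_TEXT = '''
ScheddIpAddr = "<192.168.1.254:26758>"
MyAddress = "<192.168.1.254:26758>"
'''

# Assumption: the client (172.16.10.70) sits on a /24 network.
CLIENT_NET = ipaddress.ip_network("172.16.10.0/24")


def advertised_addresses(ad_text):
    """Return {attribute: (ip, port)} for every '<ip:port>' value in the ad."""
    pattern = re.compile(
        r'^\s*(\w+)\s*=\s*"<(\d+(?:\.\d+){3}):(\d+)>"\s*$', re.M
    )
    return {
        name: (ipaddress.ip_address(ip), int(port))
        for name, ip, port in pattern.findall(ad_text)
    }


def on_client_network(ip, network=CLIENT_NET):
    """True if the address belongs to the client's (assumed) network."""
    return ip in network


if __name__ == "__main__":
    for name, (ip, port) in advertised_addresses(AD_TEXT).items():
        status = "on" if on_client_network(ip) else "NOT on"
        print(f"{name}: {ip}:{port} is {status} {CLIENT_NET}")
```

Both attributes carry the eth0 address, which is the address the failing condor_q from the client tried to connect to.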

So why isn't BIND_ALL_INTERFACES taking effect?

Now, if I change the setting to 'NETWORK_INTERFACE = 172.16.10.191', condor_q works both on the CM and from client.bestsystems.co.jp (as 'condor_q -pool cmpublic.bestsystems.co.jp -name cmpublic.bestsystems.co.jp'), but I cannot get jobs to run. When I submit a job from the CM, the SchedLog shows 'DaemonCore: PERMISSION DENIED to unknown user from host <172.16.10.191:26926> for command 493 (NEGOTIATE_WITH_SIGATTRS)', and the NegotiatorLog shows the following errors:

   5/18 13:53:52 Socket to <172.16.10.191:26882> not in cache, creating one
   5/18 13:53:52 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
   5/18 13:53:52 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
   5/18 13:53:52 SocketCache:  Found unused slot 0
   5/18 13:53:52 condor_write(): Socket closed when trying to write buffer, fd is 6
   5/18 13:53:52 Buf::write(): condor_write() failed
   5/18 13:53:52     Failed to send scheddName/eom
   5/18 13:53:52   Error: Ignoring schedd for this cycle
   5/18 13:53:52 ---------- Finished Negotiation Cycle ----------

So, where do I go from here?

The CM is running 6.7.18 on SuSE 9.3 and the client is running 6.7.17 on SuSE 8.2.

Cheers,
Andrew

--
Andrew Stubbings
BestSystems, Inc.
Tel: +81 29 860 7080
E-mail: ajs@xxxxxxxxxxxxxxxxx
www.bestsystems.co.jp