
Re: [Condor-users] strange condor_advertise behavior.



Hi,
I had the same problem, and it seems to be caused by the version of the
condor_advertise binary. If you update it, the problem will go away.
Alternatively, you can add MyAddress by hand to the classad you are
trying to advertise.
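
A minimal sketch of the by-hand workaround, assuming the classad file is plain "Name = value" lines as in your GCclassad.txt. The function name ensure_myaddress and the IP in the example are hypothetical, and port 0 is only copied from the MyAddress values in the ads your collector accepted; I have not checked it against every collector version.

```shell
#!/bin/sh
# Hypothetical helper: append a MyAddress attribute to a classad file
# if the file does not already define one.
#   $1 = classad file
#   $2 = IP address to advertise (port 0 is an assumption, see above)
ensure_myaddress() {
    ad_file=$1
    ip=$2
    # Classad attribute names are case-insensitive, so match loosely.
    if ! grep -qi '^[[:space:]]*MyAddress[[:space:]]*=' "$ad_file"; then
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$ad_file")" ]; then
            echo >> "$ad_file"
        fi
        printf 'MyAddress = "<%s:0>"\n' "$ip" >> "$ad_file"
    fi
}

# Example use (the IP is a placeholder; substitute the node's own address):
# ensure_myaddress GCclassad.txt 192.0.2.10
# condor_advertise -pool fermigrid1.fnal.gov UPDATE_STARTD_AD GCclassad.txt
```

Calling the helper just before condor_advertise in the forwarding script leaves ads that already carry MyAddress untouched.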

Cheers,
Rod.

On Fri, 14 Oct 2005, Steven Timm wrote:

> 
> I have a simple shell script (attached) to forward a classad from
> a number of clusters to a central collector/negotiator, from there
> to do matchmaking with Condor-G.
> 
> On the first 2 clusters I tried, it worked and I can see the classAd.
> 
> It is executing the command
> 
> condor_advertise -pool fermigrid1.fnal.gov UPDATE_STARTD_AD GCclassad.txt
> 
> and the contents of GCclassad.txt look like this:
> 
> MyType = "Machine"
> Name = "fnpcg.fnal.gov:2119/jobmanager-condor"
> gatekeeper_url = "fnpcg.fnal.gov:2119/jobmanager-condor"
> TargetType = "Job"
> Requirements = TRUE
> Rank = 0.000000
> CurrentRank = 0.000000
> WantAdRevaluate = TRUE
> CurMatches = 0
> UpdateSequenceNumber = 1129319101
> gluehostapplicationsoftwareruntimeenvironment = "VO-atlas-release-9.0.3 VO-atlas-lcg-release-0.0.2"
> glueceinfohostname = "fnal.gov"
> gluesubclustername = "fnal.gov"
> gluecestatestatus = "Production"
> gluecepolicymaxcputime = 2880
> gluecepolicymaxwallclocktime = 2880
> glueceaccesscontrolbaserule = "VO:*"
> GlueCEStateTotalCPUs = 27
> gluecestatefreecpus = 0
> GlueCEStateRunningJobs =       0
> GlueCEStateWaitingJobs =       0
> gluecestateestimatedresponsetime = 0
> 
> So on the central collector/negotiator, condor_status looks like this:
> 
> 
> fngp-osg.fnal [?????????] [????] [????????] [???]  [??]   [Unknown]
> fnpcg.fnal.go [?????????] [????] [????????] [???]  [??]   [Unknown]
> vm1@fermigrid LINUX       INTEL  Unclaimed  Idle       0.000   997  0+00:01:51
> vm2@fermigrid LINUX       INTEL  Unclaimed  Idle       0.490   997  0+01:35:23
> vm3@fermigrid LINUX       INTEL  Unclaimed  Idle       0.000   997  0+01:35:14
> vm4@fermigrid LINUX       INTEL  Unclaimed  Idle       0.000   997  0+01:35:11
> 
>                       Machines Owner Claimed Unclaimed Matched Preempting
> 
>           INTEL/LINUX        4     0       0         4       0          0
> 
>                 Total        4     0       0         4       0          0
> 
>                      (Omitted 2 malformed ads in computed attribute totals)
> 
> -------------------------
> 
> If I do the following:
> 
> MyAddress = "<131.225.166.93:0>"
> LastHeardFrom = 1129319400
> UpdatesTotal = 4
> UpdatesSequenced = 0
> UpdatesLost = 0
> UpdatesHistory = "0x0000000000000000000000000000000
> 
> I see that the two classads which the collector successfully receives
> have a field called MyAddress appended to the classad, a field which
> is not in the classad file.
> 
> There is a third node on which I am trying to run the same script.
> I do not see this one show up in the collector.  Instead I see:
> 
> 10/13 09:44:00 Got IP = '(null)'
> 10/13 09:44:00 No IP address in classAd
> 10/13 09:44:00 Error: Invalid StartAd
> 10/13 09:44:00 Could not make hashkey --- ignoring ad
> 10/13 09:44:00 Received malformed ad from command (0). Ignoring.
> 
> 
> I'm guessing from this that the condor schedd on that node,
> which is an earlier version, 6.7.6, is configured slightly differently
> and is not including the MyAddress field in the classad for whatever 
> reason.
> 
> Any idea what the magic configuration tweak is to make it include
> MyAddress in the classad?  Thanks for any help.
> 
> Steve Timm
> 
> 

-- 
Rod Walker +1 6042913051