
Re: [Condor-users] Malformed ClassAd



Steve,
I wouldn't mind knowing the answer myself, but I can assure you it doesn't 
harm the matchmaking in any way. In the LCG collector, all 763 classads 
are malformed.
The stats table at the bottom of the condor_status output is fairly 
meaningless for gatekeeper classads, and in particular the "Claimed" 
mechanism doesn't make sense when the 'machine' is a 500 node cluster. So 
I presume this is why the classad is deemed malformed.

Incidentally, this Claimed issue is also what prevents accounting and 
fair-share from working for grid resources. I'm told it should be fairly 
straightforward to weight Claimed by the number of CPUs being used.
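
To make the weighting idea concrete, here is a rough sketch of how the totals might be computed if each gatekeeper ad counted by its CPUs rather than as a single machine. This is not how condor_status works; the function name `weighted_totals` is hypothetical, and it assumes that one running job occupies one CPU and that attribute names have been lowercased (ClassAd attribute names are case-insensitive).

```python
def weighted_totals(ads):
    """Return (machines, claimed), counting a gatekeeper ad by its CPUs.

    Assumption: an ad carrying GlueCEStateTotalCPUs describes a whole
    cluster, and GlueCEStateRunningJobs approximates its CPUs in use.
    Any other ad is treated as one ordinary startd slot.
    """
    machines = 0
    claimed = 0
    for ad in ads:
        if "gluecestatetotalcpus" in ad:
            machines += int(ad["gluecestatetotalcpus"])
            claimed += int(ad.get("gluecestaterunningjobs", 0))
        else:
            machines += 1
            if ad.get("state") == "Claimed":
                claimed += 1
    return machines, claimed


# Values taken from the ads quoted in this thread.
gatekeeper = {
    "name": "fngp-osg.fnal.gov:2119/jobmanager-condor",
    "gluecestatetotalcpus": "80",
    "gluecestaterunningjobs": "26",
}
slots = [{"name": "vm%d@fermigrid" % i, "state": "Claimed"} for i in range(1, 5)]

print(weighted_totals([gatekeeper] + slots))
```

With the numbers from Steve's pool this counts 84 machines, 30 of them claimed, instead of omitting the gatekeeper ad from the totals.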

As for the question marks in the output:
Name    OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

CE.pakgrid.or [?????????] [????] [????????] [???]  [??]   [Unknown]

these would be filled in if you had corresponding attributes in your 
classad, although again not all of them make sense for grid resources.
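
If you want to see which attributes are missing before advertising, something along these lines would do. It is only a sketch: the attribute names in `DISPLAY_ATTRS` are my assumption about what the default condor_status columns read, so check them against your Condor version, and `parse_classad` and `missing_display_attrs` are hypothetical helper names.

```python
# Assumed attributes behind the default condor_status columns
# (OpSys, Arch, State, Activity, LoadAv, Mem). Verify for your version.
DISPLAY_ATTRS = ["OpSys", "Arch", "State", "Activity", "LoadAvg", "Memory"]


def parse_classad(text):
    """Parse 'Name = value' lines, as printed by condor_status -long,
    into a dict keyed by lowercased attribute name."""
    ad = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        ad[name.strip().lower()] = value.strip()
    return ad


def missing_display_attrs(ad):
    """List the assumed display attributes that the ad does not define."""
    return [attr for attr in DISPLAY_ATTRS if attr.lower() not in ad]


# A few lines from the ad quoted in this thread.
sample = """\
MyType = "Machine"
TargetType = "Job"
Name = "fngp-osg.fnal.gov:2119/jobmanager-condor"
Requirements = TRUE
GlueCEStateTotalCPUs = 80
"""

print(missing_display_attrs(parse_classad(sample)))
```

For the sample ad every name in the list comes back as missing, which matches the row of question marks.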

Cheers,
Rod.


On Wed, 13 Jul 2005, Steven Timm wrote:

> 
> I am trying to make a ClassAd for condor-G based matching.
> 
> Machine A: to be the machine which receives the classads from N
> different clusters B1...BN
> 
> The idea is that you submit a condor-G job on machineA, and it does
> matching based on the classAds and forwards the job respectively.
> 
> I copied a shell script that someone else had, and modified
> the script to make what I thought was a good condor ClassAd:
> 
> Contents of Classad for cluster B1:
> 
> [timm@fermigrid1 ~]$ condor_status -long fngp-osg.fnal.gov:2119/jobmanager-condor
> MyType = "Machine"
> TargetType = "Job"
> Name = "fngp-osg.fnal.gov:2119/jobmanager-condor"
> gatekeeper_url = "fngp-osg.fnal.gov:2119/jobmanager-condor"
> Requirements = TRUE
> Rank = 0.000000
> CurrentRank = 0.000000
> WantAdRevaluate = TRUE
> CurMatches = 0
> UpdateSequenceNumber = 1121277498
> gluehostapplicationsoftwareruntimeenvironment = "VO-atlas-release-9.0.3 VO-atlas-lcg-release-0.0.2"
> glueceinfohostname = "fnal.gov"
> gluesubclustername = "fnal.gov"
> gluecestatestatus = "Production"
> gluecepolicymaxcputime = 2880
> gluecepolicymaxwallclocktime = 2880
> glueceaccesscontrolbaserule = "VO:*"
> GlueCEStateTotalCPUs = 80
> gluecestatefreecpus = 0
> GlueCEStateRunningJobs = 26
> GlueCEStateWaitingJobs = 0
> gluecestateestimatedresponsetime = 0
> MyAddress = "<131.225.167.42:0>"
> LastHeardFrom = 1121277499
> UpdatesTotal = 1
> UpdatesSequenced = 0
> UpdatesLost = 0
> UpdatesHistory = "0x00000000000000000000000000000000"
> 
> The ClassAd is sent to machineA via condor_advertise.
> (above is the output of condor_status -long).
> MachineA sees the ClassAd but claims that it's malformed.
> 
> 
> [timm@fermigrid1 ~]$ condor_status
> 
> Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime
> 
> fngp-osg.fnal [?????????] [????] [????????] [???]  [??]   [Unknown]
> vm1@fermigrid LINUX       INTEL  Claimed    Busy       1.170   997  0+02:30:58
> vm2@fermigrid LINUX       INTEL  Claimed    Busy       1.420   997  0+01:22:47
> vm3@fermigrid LINUX       INTEL  Claimed    Busy       1.170   997  0+16:14:03
> vm4@fermigrid LINUX       INTEL  Claimed    Busy       1.170   997  0+16:14:03
> 
>                       Machines Owner Claimed Unclaimed Matched Preempting
> 
>           INTEL/LINUX        4     0       4         0       0          0
> 
>                 Total        4     0       4         0       0          0
> 
>                      (Omitted 1 malformed ads in computed attribute totals)
> 
> 
> So 3 questions:
> 
> 1) Is it legal and/or advisable to try to have both job
> execution slots from a startd, and a pool ad, in the same condor pool,
> as I have above? E.g., condor_status shows 1 remote cluster and 4
> CPUs on this machine.
> 
> 2) what's malformed about the classad as included above?
> 
> 3) Is there a shortcut condor mechanism to have condor itself create
> the classad for condor_g type matching.
> 
> Steve
> 
> 
> 
> --------------------------------------------------------------------
> Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
> Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
> Assistant Group Leader, Farms and Clustered Systems Group
> Lead of Computing Farms Team
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 

-- 
Rod Walker +1 6042913051