Re: [Condor-users] Malformed ClassAd
- Date: Thu, 14 Jul 2005 00:27:11 -0700 (PDT)
- From: Rod Walker <rwalker@xxxxxx>
- Subject: Re: [Condor-users] Malformed ClassAd
Steve,
I wouldn't mind knowing the answer myself, but I can assure you it doesn't
harm the matchmaking in any way. In the LCG collector, all 763 classads
are malformed.
The stats table at the bottom of the condor_status output is fairly
meaningless for gatekeeper classads, and in particular the "Claimed"
mechanism doesn't make sense when the 'machine' is a 500-node cluster. So
I presume this is why the classad is deemed malformed.
Incidentally, this Claimed issue is also what prevents accounting and
fair-share from working for grid resources. I'm told it should be fairly
straightforward to weight Claimed by the number of cpus being used.
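
To make the weighting idea concrete, here is a rough sketch of what I mean.
It is not how Condor computes anything today. It assumes each running job on
a gatekeeper uses one cpu, takes GlueCEStateRunningJobs and
GlueCEStateTotalCPUs from the ad quoted below, and counts an ordinary claimed
slot as one cpu. The function name claimed_cpus and the ad dictionaries are
made up for illustration.

```python
def claimed_cpus(ad):
    """Return a cpu-weighted 'Claimed' count for one ad (illustrative only)."""
    # Gatekeeper ad: assume one cpu per running job, capped at the total.
    if "GlueCEStateRunningJobs" in ad:
        running = ad["GlueCEStateRunningJobs"]
        total = ad.get("GlueCEStateTotalCPUs", running)
        return min(running, total)
    # Ordinary startd slot: a claimed slot counts as exactly one cpu.
    return 1 if ad.get("State") == "Claimed" else 0


# Hypothetical ads mirroring the quoted condor_status output.
ads = [
    {
        "Name": "fngp-osg.fnal.gov:2119/jobmanager-condor",
        "GlueCEStateTotalCPUs": 80,
        "GlueCEStateRunningJobs": 26,
    },
    {"Name": "vm1@fermigrid", "State": "Claimed"},
    {"Name": "vm2@fermigrid", "State": "Claimed"},
    {"Name": "vm3@fermigrid", "State": "Claimed"},
    {"Name": "vm4@fermigrid", "State": "Claimed"},
]

total_claimed = sum(claimed_cpus(ad) for ad in ads)
print(total_claimed)
```

With a count like this, a busy 500-node cluster would no longer weigh the
same as a single claimed slot in the accounting.
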
As for the question marks in the output
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
CE.pakgrid.or [?????????] [????] [????????] [???] [??] [Unknown]
these would be filled in if you had corresponding attributes in your
classad, although again not all of them make sense for grid resources.
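
For example, something along these lines would generate an ad with the
display columns filled in. This is only a sketch: I am assuming the columns
map to the attributes OpSys, Arch, State, Activity, LoadAvg and Memory, the
values are placeholders rather than real measurements, and render_classad is
a made-up helper, not part of Condor.

```python
def render_classad(attrs):
    """Render a dict as old-style 'Name = value' ClassAd lines."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, str):
            # Strings are double-quoted; escape embedded quotes.
            rendered = '"' + value.replace('"', '\\"') + '"'
        else:
            rendered = str(value)
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines)


gatekeeper_ad = {
    "MyType": "Machine",
    "TargetType": "Job",
    "Name": "fngp-osg.fnal.gov:2119/jobmanager-condor",
    "Requirements": True,
    # Assumed display attributes; placeholder values only.
    "OpSys": "LINUX",
    "Arch": "INTEL",
    "State": "Unclaimed",
    "Activity": "Idle",
    "LoadAvg": 0.0,
    "Memory": 997,
}

ad_text = render_classad(gatekeeper_ad)
print(ad_text)
```

The resulting text could then be sent with condor_advertise in the same way
as the original ad.
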
Cheers,
Rod.
On Wed, 13 Jul 2005, Steven Timm wrote:
>
> I am trying to make a ClassAd for condor-G based matching.
>
> Machine A: to be the machine which receives the classads from N
> different clusters B1...BN
>
> The idea is that you submit a condor-G job on machineA, and it does
> matching based on the classAds and forwards the job respectively.
>
> I copied a shell script that someone else had, and modified
> the script to make what I thought was a good condor ClassAd:
>
> Contents of Classad for cluster B1:
>
> [timm@fermigrid1 ~]$ condor_status -long
> fngp-osg.fnal.gov:2119/jobmanager-condor
> MyType = "Machine"
> TargetType = "Job"
> Name = "fngp-osg.fnal.gov:2119/jobmanager-condor"
> gatekeeper_url = "fngp-osg.fnal.gov:2119/jobmanager-condor"
> Requirements = TRUE
> Rank = 0.000000
> CurrentRank = 0.000000
> WantAdRevaluate = TRUE
> CurMatches = 0
> UpdateSequenceNumber = 1121277498
> gluehostapplicationsoftwareruntimeenvironment = "VO-atlas-release-9.0.3 VO-atlas-lcg-release-0.0.2"
> glueceinfohostname = "fnal.gov"
> gluesubclustername = "fnal.gov"
> gluecestatestatus = "Production"
> gluecepolicymaxcputime = 2880
> gluecepolicymaxwallclocktime = 2880
> glueceaccesscontrolbaserule = "VO:*"
> GlueCEStateTotalCPUs = 80
> gluecestatefreecpus = 0
> GlueCEStateRunningJobs = 26
> GlueCEStateWaitingJobs = 0
> gluecestateestimatedresponsetime = 0
> MyAddress = "<131.225.167.42:0>"
> LastHeardFrom = 1121277499
> UpdatesTotal = 1
> UpdatesSequenced = 0
> UpdatesLost = 0
> UpdatesHistory = "0x00000000000000000000000000000000"
>
> The ClassAd is sent to machineA via condor_advertise.
> (above is the output of condor_status -long).
> MachineA sees the ClassAD but claims that it's malformed.
>
>
> [timm@fermigrid1 ~]$ condor_status
>
> Name OpSys Arch State Activity LoadAv Mem ActvtyTime
>
> fngp-osg.fnal [?????????] [????] [????????] [???] [??] [Unknown]
> vm1@fermigrid LINUX INTEL Claimed Busy 1.170 997 0+02:30:58
> vm2@fermigrid LINUX INTEL Claimed Busy 1.420 997 0+01:22:47
> vm3@fermigrid LINUX INTEL Claimed Busy 1.170 997 0+16:14:03
> vm4@fermigrid LINUX INTEL Claimed Busy 1.170 997 0+16:14:03
>
> Machines Owner Claimed Unclaimed Matched Preempting
>
> INTEL/LINUX 4 0 4 0 0 0
>
> Total 4 0 4 0 0 0
>
> (Omitted 1 malformed ads in computed attribute totals)
>
>
> So 3 questions:
>
> 1) Is it legal and/or advisable to try to have both job
> execution slots from a startd, and a pool ad, in the same condor pool,
> as I have above... e.g., condor_status shows 1 remote cluster and 4
> cpu's on this machine
>
> 2) what's malformed about the classad as included above?
>
> 3) Is there a shortcut condor mechanism to have condor itself create
> the classad for condor_g type matching.
>
> Steve
>
>
>
> --------------------------------------------------------------------
> Steven C. Timm, Ph.D (630) 840-8525 timm@xxxxxxxx http://home.fnal.gov/~timm/
> Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
> Assistant Group Leader, Farms and Clustered Systems Group
> Lead of Computing Farms Team
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
--
Rod Walker +1 6042913051