
[HTCondor-users] collecting job statistics



Hi all,

I am trying to collect job statistics in an exit hook.
Since I have not yet found a way to collect job statistics, i.e. the
job's ClassAd, from within the job's context, I am taking somewhat of a
detour: effectively, I query the collector for all schedds and then
query these for the job I am on, to get the job's ClassAd:

    import htcondor

    condorColl = htcondor.Collector()
    allSchedds = condorColl.locateAll(htcondor.DaemonTypes.Schedd)

    # eventually loop over all schedds ~~> for schedd in allSchedds:
    scheddAddr = condorColl.locate(htcondor.DaemonTypes.Schedd,
                                   allSchedds[1]['Name'])
    condorSchedd = htcondor.Schedd(scheddAddr)
    jobAds = condorSchedd.query('GlobalJobId =?= "%s"' % globalJobID)[0]
    # (globalJobID and the schedd index are hard-coded for testing)


Since going all the way upstream like this to query job statistics is a
bit cumbersome, I wonder if there is a more direct way to get a job's
ClassAd (the code runs in an exit hook, so in the job's context, I
suppose)?
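One thing I have not tried yet: inside a running job, the starter is
supposed to expose the job's ClassAd as a file, with the _CONDOR_JOB_AD
environment variable pointing at it (a copy should also land as .job.ad
in the scratch directory; for startd job hooks, HOOK_JOB_EXIT should
get a copy of the job ad on stdin). An untested sketch with a naive
line-based parser -- classad.parseOne() from the bindings would be the
cleaner tool:

```python
import os

def read_job_ad(path=None):
    # Parse the starter-provided job ad file into a plain dict.
    # Defaults to the file named by _CONDOR_JOB_AD, which the starter
    # sets in the job's environment.  Naive parser: one "Attr = value"
    # per line; does not handle multi-line or expression values.
    path = path or os.environ['_CONDOR_JOB_AD']
    ad = {}
    with open(path) as fh:
        for line in fh:
            attr, sep, value = line.partition('=')
            if not sep:
                continue  # skip blank or malformed lines
            ad[attr.strip()] = value.strip().strip('"')
    return ad
```

If that file is only a snapshot from job start, though, it would not
help with live statistics.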


Actually, I also tried to get the startd hosting the job directly via
    allStartds = condorColl.locateAll(htcondor.DaemonTypes.Startd)
but since we have dynamic slotting, it is difficult to identify the
correct slot among the resources, and there is no way to query startds
directly, as there is no Startd class comparable to the Schedd class.
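One more idea I have not verified: the collector itself already holds
the startd/slot ads, and if I read the machine-ad attribute list right,
claimed (dynamic) slot ads advertise the job's GlobalJobId. So
Collector.query() with a constraint on that attribute might pick out
the right slot without needing any Startd class. Untested sketch
(startd_constraint and find_my_slot are just helper names of mine):

```python
def startd_constraint(global_job_id):
    # Claimed slot ads should carry the job's GlobalJobId, so match on
    # that instead of guessing the right dynamic slot by name.
    return 'GlobalJobId =?= "%s"' % global_job_id

def find_my_slot(global_job_id):
    # Ask the collector for the slot ad directly -- no per-startd
    # query needed, the collector already has the ads.
    import htcondor  # requires the HTCondor python bindings
    coll = htcondor.Collector()
    ads = coll.query(htcondor.AdTypes.Startd,
                     startd_constraint(global_job_id))
    return ads[0] if ads else None
```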


Cheers and thanks,
  Thomas

ps: along the way I noticed that Local{User,Sys}Cpu actually stays at
zero for a running job at our site, while the remote values change. Is
something missing on our side, so that these statistics are not
recorded/updated?

>>> jobAds['JobStatus']
2L
>>> jobAds['LocalUserCpu']
0.0
>>> jobAds['LocalSysCpu']
0.0
>>> jobAds['RemoteSysCpu']
201.0
>>> jobAds['RemoteUserCpu']
8986.0
>>> jobAds['RemoteWallClockTime']
0.0
