
Re: [Condor-users] Condorview issues with Job Stats



 
The docs suggest that a standalone condorview server only needs a
collector running, not a negotiator as well. Is that right?
We tried running the negotiator too, but it appeared to make
no difference to what the condorview server was seeing.

We've done some more testing so perhaps a simplified example of
what's happening will bring some more suggestions/answers.

Condor setup with 3 pools: A, B and C. The pools are located in 3
different Australian states, with routers, etc. in between.

Condorview server in pool A.

Submit machine geographically in region A and in condor pool A.
Submit jobs configured to only run in pool A. All OK. Condorview
shows running jobs.

Submit machine geographically in region A and in condor pool A.
Submit jobs configured to only run in pool B (or C). Running jobs
not showing up in condorview stats/graphs. Show up OK as running
using condor_q on submit machine.

Submit machine geographically in region A and in condor pool B.
Submit jobs configured to only run in pool B. All OK. Condorview
shows running jobs.

Submit machine geographically in region A and in condor pool B.
Submit jobs configured to only run in pool A (or C). Running jobs
not showing up in condorview stats/graphs. Show up OK as running
using condor_q on submit machine.

Submit machine geographically in region A and in condor pool C.
Submit jobs configured to only run in pool C. All OK. Condorview
shows running jobs.

Submit machine geographically in region A and in condor pool C.
Submit jobs configured to only run in pool A (or B). Running jobs
not showing up in condorview stats/graphs. Show up OK as running
using condor_q on submit machine.

i.e. it appears as though jobs that have flocked to, and are running
in, a different pool to the one in which they were submitted are
not being "seen" by the condorview server, even though the submitting
schedd knows that they are running.
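
To make the pattern explicit, here is a minimal Python sketch of what we
observe. It only encodes the six test cases above; the function name
seen_in_condorview is made up for illustration, and the rule it applies is
our observation, not a statement of how Condor decides what to report.

```python
# Pools in our setup; the condorview server lives in pool A.
POOLS = ["A", "B", "C"]


def seen_in_condorview(submit_pool, run_pool):
    """Observed behaviour only: running jobs show up in the condorview
    graphs when they run in the pool they were submitted from, and do
    not show up when they have flocked to another pool."""
    return submit_pool == run_pool


# Print the full submit-pool / run-pool matrix.
for submit_pool in POOLS:
    for run_pool in POOLS:
        status = "shown" if seen_in_condorview(submit_pool, run_pool) else "missing"
        print(f"submitted in {submit_pool}, running in {run_pool}: {status}")
```

Note that the location of the condorview server (pool A) does not appear
in the rule at all: jobs submitted and run in B or C are shown, so the
server is receiving data from all three pools.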

We realize that the condorview collector is just getting info
forwarded to it from the collectors on the central managers.
But what info is forwarded, and which daemons report whether a job
is running?

Thanks for any further insights you might have.

Cheers

Greg


-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven Timm
Sent: Friday, 13 February 2009 12:29 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Condorview issues with Job Stats

On Fri, 13 Feb 2009, Greg.Hitchen@xxxxxxxx wrote:

> Hi All
>
> Revisiting an issue that we've asked about before over the past couple
> of years but have never really solved. It relates to the User Job Statistics
> part of condorview.
>
> A first general question to the condor developers would be where is the
> condorview server getting this data from (obviously forwarded on from the
> central managers). Is it the schedd of the submitting nodes? That was my
> assumption, or are the starter and shadow involved as well.

The condorview server gets all of its statistics from the collector
and negotiator.  Schedd, starter, and shadow are not involved
at all.  Most of the information can be considered to be
snapshots of what you get from condor_userprio.

So it is counting jobs as running from the time the node is claimed
until the time the claim is released.   As such condorview
will never tell you how many independent jobs have started
and finished, only the aggregate hours used.
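
As a rough illustration of why only aggregate hours can be recovered,
here is a minimal Python sketch. It assumes the view data can be treated
as periodic snapshots of the number of claimed slots; the sample values
and the function name claimed_hours are hypothetical, not taken from
Condor itself.

```python
def claimed_hours(snapshots):
    """Integrate claimed-slot counts over time (left Riemann sum).

    snapshots: list of (time_in_hours, claimed_slots), sorted by time.
    Each count is assumed to hold until the next snapshot.
    """
    total = 0.0
    for (t0, n0), (t1, _) in zip(snapshots, snapshots[1:]):
        total += n0 * (t1 - t0)
    return total


# Hypothetical samples every 30 minutes: 20 slots claimed for 2 hours.
samples = [(0.0, 0), (0.5, 20), (1.0, 20), (1.5, 20), (2.0, 20), (2.5, 0)]
print(claimed_hours(samples))  # 40.0 slot-hours
```

The same 40 slot-hours would result from 20 two-hour jobs or from 40
one-hour jobs run back to back on the same claims, so the number of
individual jobs cannot be read back out of the snapshots.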

Steve Timm



>
> To illustrate the problem we are having I have attached two jpg's from our
> condorview machine when I was testing things by running 100 jobs that
> take 2 hours to run. In the first example (condor_test.jpg) jobs start
> running ~ 19:07 and the number of idle jobs drops rapidly straight away.
> However, there is a gap of ~30 mins before running jobs are seen. Even
> then there later appears a large gap in the red running jobs of ~1hr.
> In the second example (condor_test1.jpg), I restricted the jobs to our local pool,
> to eliminate issues due to routers, etc. as we have several pools in our
> organisation spread around the country in different states. The same problem
> of the red running jobs not showing up occurs, although in this case only ~20
> jobs run at a time because they are not being flocked to other pools.
>
> Sorry for the long email but we would like to sort this out, because to get the
> "correct" total number of job running hours we need to get the history
> file from each submitter, run them through condor_history, and manually
> figure it out in Excel. This always gives a number ~3-4X that shown in condorview.
>
> BTW our CMs and condorview server are linux and the submit and execute
> nodes are winxp.
>
> Thanks for any help/info.
>
> Cheers
>
> Greg
> ----------------------------------------------------------------------
> Greg Hitchen                                  greg.hitchen@xxxxxxxx
> CSIRO IM&T Advanced Scientific Computing      phone: +61 8 6436 8663
> Australian Resources Research Centre (ARRC)   fax:   +61 8 6436 8555
> Postal address:                               mob:   0407 952 748
> PO Box 1130, Bentley WA 6102, Australia
> Street Address:
> 26 Dick Perry Avenue, Kensington WA 6151
> ----------------------------------------------------------------------
>

-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.