
Re: [Condor-users] Condorview issues with Job Stats



 
Hi again Steve

Thanks for your responses and for taking the time to help us.

>> But surely that info as seen by the collectors has to come from
>> Somewhere originally, i.e. from the submit schedd, or elsewhere

>It's coming from each of the submit schedd's advertising to
>its respective collector which then forwards to the view server.

OK, that makes sense. That is how I thought/assumed it should work.
That's why I can't understand what's wrong. As you say, the schedd
updates the collector on its local central manager, which forwards this
on to the view_server collector.

>>> If you want the condorview server to show all three of pools,
>>> then VIEW_SERVER on pool B and pool C should be set to be
>>> the same as the VIEW_SERVER on pool A.  You can, and many
>>> do, aggregate the output of many collectors into one VIEW_SERVER.
>>
>> This is our setup. Using the previous example we have all 3
>> collectors in pools A, B and C reporting to our only condorview server
>> which resides in pool A.
>
>From what you described in the original message, only pool A is
>in fact reporting to condorview. The other 2 are not.
>Check the logs of collector startup on B and C---if they are reporting
>it would say so.

In fact, it appears that only jobs running in the same pool as the
submit machine are getting correctly reported, regardless of which
pool the submit node is in. It is jobs that flock to another pool
that are not getting reported.

Perhaps my original email didn't describe things well.

All collectors in pools A, B and C are configured to report to the
view_server collector (which just happens to reside in pool A).

If a submit machine in A runs jobs in A then the view server reports
running jobs as expected.

If a submit machine in A runs jobs in B or C then the view server
does not report the jobs as running (condor_q shows them running, though).

If a submit machine in B runs jobs in B then the view server reports
the jobs as running as expected.

If a submit machine in B runs jobs in A or C then the view server
does not report the jobs as running (condor_q and the schedd do).

The exact same trends occur for a submit machine in C.
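
To put the pattern compactly, here is a small Python sketch of what we
observe. It is purely illustrative: the helper name is made up, nothing
here is a Condor API, and it encodes only the symptoms described above,
not an explanation of the cause.

```python
from itertools import product

POOLS = ["A", "B", "C"]


def view_server_reports_running(submit_pool: str, execute_pool: str) -> bool:
    """Hypothetical helper summarising the observed symptom only.

    The view server shows a job as running only when the job runs in
    the same pool as the machine it was submitted from.
    """
    return submit_pool == execute_pool


# Enumerate every submit/execute combination across the three pools.
for submit_pool, execute_pool in product(POOLS, POOLS):
    reported = view_server_reports_running(submit_pool, execute_pool)
    status = "reported" if reported else "NOT reported (flocked job)"
    print(f"submit in {submit_pool}, run in {execute_pool}: {status}")
```

In every case condor_q on the submit machine shows the jobs as running;
it is only the view server that misses the flocked ones.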

We're about to bite the bullet and try updates with TCP, even though
the manual doesn't exactly sound encouraging! :)

Cheers

Greg