
Re: [Condor-users] condor_status -schedd issue



On Thu, Dec 16, 2010 at 9:52 AM, Michael O'Donnell <odonnellm@xxxxxxxx> wrote:

I am running Condor 7.4.3 on a Windows pool and when I execute condor_status -schedd it returns inconsistent results. Sometimes the output is accurate and other times it reports that no jobs are running.

This is expected: the collector's view of the scheduler status is delayed at best, mostly wrong at worst. It's sort of a light-weight, unobtrusive way to check on running jobs, and it's definitely *not* an authoritative source for running/idle/queued numbers. The schedulers update the collector with this information only periodically, so it does go stale between updates.

The only way to get an accurate, instantaneous picture is to use Quill++. You can round-robin query the schedulers with 'condor_q -name <scheduler>', but even that information can be out of date by the time you parse it. Quill++ history snapshots should be pretty darn accurate for any point in time that has already passed, just not for this precise moment.
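
If you want to try the round-robin approach, here is a rough sketch in Python. Treat it as an illustration rather than a tested tool: the function names are made up for this example, the scheduler names you pass in are your own, and I'm assuming condor_q's -format option and the JobStatus job attribute (1 = idle, 2 = running, 5 = held) behave on your version the way they do on mine. Each result carries the time it was collected, since it starts going stale immediately.

```python
import subprocess
import time
from collections import Counter

# JobStatus codes in the job ClassAd; anything else is lumped into "other".
STATUS_NAMES = {1: "idle", 2: "running", 5: "held"}


def condor_q_command(schedd_name):
    """Build the condor_q call that prints one JobStatus integer per job."""
    return ["condor_q", "-name", schedd_name, "-format", "%d\n", "JobStatus"]


def summarize(output):
    """Count jobs per state from condor_q output (one integer per line)."""
    counts = Counter()
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            counts[STATUS_NAMES.get(int(line), "other")] += 1
    return dict(counts)


def run_command(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.check_output(cmd, universal_newlines=True)


def poll_schedulers(schedd_names, run=run_command):
    """Query each scheduler in turn.

    Returns {name: (collected_at, counts)}. A scheduler that cannot be
    queried gets an {"error": ...} entry instead of job counts.
    """
    results = {}
    for name in schedd_names:
        try:
            counts = summarize(run(condor_q_command(name)))
        except (OSError, subprocess.CalledProcessError) as exc:
            counts = {"error": str(exc)}
        results[name] = (time.time(), counts)
    return results
```

Calling poll_schedulers(["submit1.example.org", "submit2.example.org"]) (placeholder names) returns a dictionary keyed by scheduler. The run argument is there only so the polling logic can be exercised without a live pool.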
 
Sometimes it also includes machines that are not running a schedd (i.e., not submit machines). Has anyone else seen this?

Now that's weird. That I've never seen.

- Ian