Re: [Condor-users] the infamous question mark problem

On Fri, Mar 26, 2010 at 12:44 PM, Nick LeRoy <nleroy@xxxxxxxxxxx> wrote:
> Mag,
>> Once more than 1000 jobs hit the pool, I start to see the question marks.
>> Is there some setting I can look at to fix this?
> We just had a discussion here about this, and we have a number of questions:
> 1. What version of Condor are you running?  A recent performance enhancement
> could be malfunctioning and causing the problem.

We are running Condor 7.2.4.

> 2. Do you know what the jobs are doing during these "events"?  Is there a
> pattern to them?  For example, when you run your 'condor_q -run', do you
> sometimes see all jobs good, and on other runs a grouping of '??????' jobs?

The jobs are heterogeneous: some are simple awk or Perl scripts, and
others run R or Octave.

> 3. I think that it'd be helpful if you could post the following:
> 3a. job log snippet(s) around the window in which you've seen the problem
> 3b. ShadowLog snippet(s) of the same
> Finally, some observations and a window into our thoughts:
> 1. When you run 'condor_q -run', it's equivalent to running:
>  condor_q -const 'JobStatus==2' -format ...

I will try this the next time the problem occurs, which is usually when
the other department lets us use their systems for overnight simulations.

> 2. It's possible that there's a race condition in which the job's status
> (JobStatus) has been set to RUNNING (2) without the RemoteHost attribute being
> set.  This should never happen, but it obviously does.  The answers to the above
> questions may help us to isolate how this is happening.
> Thanks Mag,
> -Nick
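To check for the race described in point 2 the next time the question marks
show up, one option is to save the full job ads with 'condor_q -long > ads.txt'
and scan them for running jobs (JobStatus == 2) that have no RemoteHost. Below
is a rough Python sketch of that scan. It assumes the -long output is a series
of 'Attr = value' lines with a blank line between ads; the script name
(find_suspects.py) and its functions (parse_long_output,
running_without_remote_host) are hypothetical helpers, not part of Condor.

```python
import sys
from typing import Dict, Iterable, Iterator, List


def parse_long_output(text: str) -> Iterator[Dict[str, str]]:
    """Split 'condor_q -long' style text into job ads (assumed format).

    Assumes each ad is a run of 'Attr = value' lines and that ads are
    separated by blank lines. Attribute names are lower-cased because
    ClassAd attribute names are case-insensitive.
    """
    ad: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current ad, if one has started.
            if ad:
                yield ad
                ad = {}
            continue
        if " = " not in line:
            # Skip non-attribute lines such as the "-- Submitter: ..." header.
            continue
        name, value = line.split(" = ", 1)
        ad[name.strip().lower()] = value.strip()
    if ad:
        yield ad


def running_without_remote_host(ads: Iterable[Dict[str, str]]) -> List[str]:
    """Return 'ClusterId.ProcId' for ads with JobStatus == 2 and no usable RemoteHost."""
    suspects = []
    for ad in ads:
        if ad.get("jobstatus") != "2":
            continue  # only running jobs are of interest
        remote = ad.get("remotehost")
        if remote is None or remote.lower() == "undefined":
            suspects.append(f"{ad.get('clusterid', '?')}.{ad.get('procid', '?')}")
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_suspects.py ads.txt   (ads.txt saved from 'condor_q -long')
    with open(sys.argv[1]) as f:
        for job_id in running_without_remote_host(parse_long_output(f.read())):
            print(job_id)
```

If this turns up any jobs, the same filter could probably be applied in
condor_q itself with a constraint such as
'JobStatus == 2 && RemoteHost =?= UNDEFINED', since =?= is the ClassAd
meta-equality operator and, unlike ==, evaluates to true when compared
against UNDEFINED.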
