
Re: [Condor-users] the infamous question mark problem



On Wednesday 17 March 2010, Mag Gam wrote:
> last week we had a minor storage problem in our pool. From then on, we
> see a lot of '???????' for running host field when we do condor_q -run
> -direct schedd
>
> Is there a way to fix this? I see some jobs which it shows the proper
> hostname but I see a lot of '???????' is there a way to free up our
> condor pool?

Mag,

I assume that you know this already, but '???????' is what condor_q displays 
for ClassAd attributes that aren't in the ClassAd.  In your case, I'd *guess* 
that the jobs got evicted from their machines for some reason (without 
understanding your pool layout, it's difficult to speculate what a "minor 
storage problem" could cause), but are still in the "run" state...  This 
makes no sense and AFAIK should never happen, but it nonetheless seems to be 
the case.

I think that you'll have to force the jobs to rematch to a new machine.  
Perhaps 'condor_vacate_job' could be used to accomplish this?
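
If it helps, here is a rough Python sketch of how one might pick out the
affected jobs before vacating them.  It rests on a few assumptions of mine
that you should check against your own pool: that the missing attribute
behind the '???????' is RemoteHost, that JobStatus 2 means "running", and
that the input is text in the "Attr = value" form printed by
'condor_q -long', with a blank line between jobs.  The function names are
made up for illustration; nothing here is part of Condor itself.

```python
def parse_classads(text):
    """Split 'condor_q -long' style text into one dict per job ad."""
    ads = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current ad (assumed separator).
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def running_without_host(ads):
    """Return 'cluster.proc' IDs of ads that claim to be running
    (JobStatus 2, assumed) but carry no RemoteHost attribute (assumed
    to be what shows up as '???????')."""
    ids = []
    for ad in ads:
        if ad.get("JobStatus") == "2" and "RemoteHost" not in ad:
            ids.append(f'{ad["ClusterId"]}.{ad["ProcId"]}')
    return ids


def vacate_commands(job_ids):
    """Build one condor_vacate_job argument list per job ID.
    The commands are only built here, not executed."""
    return [["condor_vacate_job", job_id] for job_id in job_ids]
```

You would feed parse_classads() the saved output of 'condor_q -long', look
over the IDs that running_without_host() returns, and only then run the
resulting commands by hand or via subprocess.  I'd try it on a single job
first to see whether it actually rematches.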

Hope this helps

-Nick

-- 
           <<< The matrix has you. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences