Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] the infamous question mark problem

Date: Sat, 27 Mar 2010 10:07:19 -0700
From: dalonso <dalonso@xxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] the infamous question mark problem

Back in January I also submitted a query about this problem (*1). We "solved" it by backing off to condor 6.4.8 (from 7.4.0). I'm in the process of upgrading to 7.4.1, and am wondering if setting:

NEGOTIATOR_INFORM_STARTD = False         (*2)


will be the fix?
 **** Should this be set just in ~/etc/condor.config?
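
For illustration, the setting in question would look something like the fragment below. This is only a sketch: it assumes the knob is read by the condor_negotiator and therefore belongs in the configuration used on the central manager, and the file named in the comment is an assumption rather than something confirmed in this thread.

```
# Assumed location: the local config file on the central manager
# (for example condor_config.local). Not confirmed in this thread.
NEGOTIATOR_INFORM_STARTD = False
```

After a change like this, running condor_reconfig on that host should make the daemons re-read their configuration.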


(*1) From: dalonso <dalonso@xxxxxxxxxxxxxxxx>
Date: January 26, 2010 10:15:14 AM PST
To: condor-users@xxxxxxxxxxx
Subject: claimed slots are idle

(*2) From: Dan Bradley <dan@xxxxxxxxxxxx>
Date: February 24, 2010 7:56:12 AM PST
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>

Subject: Re: [Condor-users] Condor 7.2.4 / 7.4.1 - "Can't find resource with ClaimId" errors from startd



On Mar 26, 2010, at 3:47 PM, Mag Gam wrote:

OK, I think I am hitting this problem here:
https://lists.cs.wisc.edu/archive/condor-users/2005-March/msg00379.shtml


I see the exact same symptoms: I just rebooted a grid node and it
says it's "Claimed" but Activity is "Idle", and there is nothing running
on that box.
I think I need to set up multiple schedulers. A couple of questions:
Can I run multiple schedulers on the same box? My box is a 16-core,
96 GB RAM system.





On Fri, Mar 26, 2010 at 1:21 PM, Mag Gam <magawake@xxxxxxxxx> wrote:
On Fri, Mar 26, 2010 at 12:44 PM, Nick LeRoy <nleroy@xxxxxxxxxxx> wrote:
Mag,
Once over 1000 jobs hit the pool, I start to see the question marks.
Is there some setting I can look at to fix this?
Just had a discussion here about this, and we have a number of questions.
1. What version of Condor are you running? A recent performance enhancement
could possibly be malfunctioning and causing the problems.
The version we are running is 7.2.4
2. Do you know what the jobs are doing during these "events"? Is there a pattern to them? For example, when you run your 'condor_q -run', do you sometimes see all jobs good, and on other runs a grouping of '??????' jobs?
These jobs are heterogeneous. Some of them are using a simple awk,
perl, R, and Octave.
3. I think that it'd be helpful if you could post the following:
3a. job log snippet(s) around the window in which you've seen the problem
3b. ShadowLog snippet(s) of the same

Finally, some observations and a window into our thoughts:

1. When you run 'condor_q -run', it's equivalent to running:
 condor_q -const 'JobStatus==2' -format ...
I will try this when the problem occurs. This usually occurs when the
other department lets us use their systems for overnight simulations.
2. It's possible that there's a race condition in which the job's status (JobStatus) has been set to RUNNING (2) without the RemoteHost attribute being set. This should never happen, but it obviously is happening. The answers to the above
questions may help us to isolate how this is happening.
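
To make the suspected condition concrete, here is a minimal Python sketch of the check described above: find jobs whose JobStatus is 2 (RUNNING) but which have no RemoteHost. It is only an illustration under stated assumptions. Job ads are modelled as plain dictionaries, the sample data and the function name `running_without_host` are invented, and nothing here talks to a real schedd.

```python
# Sketch only: job ads are modelled as plain dicts. A real pool would
# supply them via condor_q, which this example does not call.
RUNNING = 2  # JobStatus value for a running job, as stated in the message


def running_without_host(job_ads):
    """Return ads that claim to be running but carry no RemoteHost."""
    return [
        ad for ad in job_ads
        if ad.get("JobStatus") == RUNNING and not ad.get("RemoteHost")
    ]


# Hypothetical sample data, invented for illustration.
sample_ads = [
    {"ClusterId": 101, "ProcId": 0, "JobStatus": 2, "RemoteHost": "node01"},
    {"ClusterId": 101, "ProcId": 1, "JobStatus": 2},  # the suspected bad case
    {"ClusterId": 102, "ProcId": 0, "JobStatus": 1},  # idle, not of interest
]

suspects = running_without_host(sample_ads)
for ad in suspects:
    print(f'{ad["ClusterId"]}.{ad["ProcId"]} is RUNNING but has no RemoteHost')
```

If the race condition is real, the jobs this kind of filter picks out would presumably be the ones that show up with question marks in place of a host name.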

Thanks Mag,

-Nick

--
          <<< Welcome to the real world. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences



Darwin O.V. Alonso
dalonso@xxxxxxxxxxxxxxxx
Dept. Biochem. J558(HSB)
University of Washington
1705 NE Pacific St
Seattle WA 98195-7350
