Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

[Condor-users] Ever-increasing userprio.

Date: Wed, 1 Feb 2012 10:59:44 -0600
From: Amy Bush <amy@xxxxxxxxxxxxx>
Subject: [Condor-users] Ever-increasing userprio.

Let me preface this with the obligatory: I just recently took over the
care and feeding of an established condor cluster when its previous
caretaker left. I came in knowing nearly nothing and have managed to
muddle my way through most problems so far. This one has me baffled,
though, and so far searching hasn't turned up anyone reporting a similar
problem, so I come to you guys.

Background:
Yesterday someone reported condor_q failing on one of our submit nodes.
A little investigation showed the scheduler wasn't running on said node,
and was segfaulting/core dumping each time condor was restarted.

After some poking and searching, I eventually followed someone's
brute-force advice and moved the spool's job_queue.log out of the way.
After doing that, I was able to start the scheduler again successfully.

Meanwhile, a user reported that he had 909 jobs in the X state that he
couldn't rm, apparently because the scheduler was down on this submit
node. Once it was back up, I successfully got rid of his X jobs.

However, this whole thing wreaked havoc on said user's userprio. I
manually set it back down to a lower value, and he seemed happy.

Today he reports that his userprio continues to climb, even though he
is not running any jobs.

I've confirmed he's not running any jobs (at least according to
'condor_q -g -submitter'), and I've confirmed that his userprio keeps
growing.

I'm not even sure what to look for to solve this.

Hoping for something obvious that just grants credence to my claims of
ignorance, and will take any suggestions anyone might have.

Thanks!

--
amy

Follow-Ups:
- Re: [Condor-users] Ever-increasing userprio.
  - From: Matthew Farrellee
