Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Condor 6.9.2 hung schedd

Date: Wed, 13 Jun 2007 10:50:38 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] Condor 6.9.2 hung schedd



Steffen Grunewald wrote:

On Mon, Jun 11, 2007 at 09:51:03AM -0500, Dan Bradley wrote:
It is normal for the schedd to temporarily show up as the user id of one of the users with jobs in the queue, because the schedd switches user ids in order to do some operations on the user's behalf.
However, it is not normal for the schedd to get stuck in this state. To find out what is going on, I would suggest using 'gdb' to see the schedd stack when it is in this state. Example:
$ gdb -p <pid of schedd>
(gdb) where
...
(gdb) quit
(gdb) where
#0  0x00002b46f6b2b69a in fcntl () from /lib/libc.so.6
#1  0x000000000058ee53 in flock ()
#2  0x0000000000665261 in lock_file ()
#3  0x000000000060d7e9 in FileLock::obtain ()
#4  0x00000000005c6685 in UserLog::writeEvent ()

Yes. It's the file locking problem that Todd referred to. Is the userlog for this user's job on NFS?
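One way to answer the NFS question is to ask the operating system which filesystem type holds the log file. The following is only a sketch: it assumes GNU coreutils `stat` on Linux, the helper names `fs_type` and `is_nfs_type` are made up for illustration, and the default path `/tmp` is a placeholder for the real userlog location.

```shell
#!/bin/sh
# Print the type of the filesystem that holds the given path.
# Assumes GNU coreutils stat (-f: filesystem status, -c %T: type name).
fs_type() {
    stat -f -c %T -- "$1"
}

# Succeed if a filesystem type name looks like NFS (nfs, nfs4, ...).
is_nfs_type() {
    case "$1" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Placeholder path: pass the job's userlog (or its directory) as $1.
log_path=${1:-/tmp}
fstype=$(fs_type "$log_path")

if is_nfs_type "$fstype"; then
    echo "$log_path is on NFS ($fstype)"
else
    echo "$log_path is on $fstype"
fi
```

If the reported type starts with `nfs`, the `fcntl()` call in frame #0 of the trace is waiting on a lock for a file served over NFS.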

If I now find out how to remove the bad guys from the queue (I cannot while condor_schedd hangs, and if there are bad guys, condor_schedd will hang immediately again)...

You could use condor_qedit to change the value of UserLog for the problematic jobs and then remove them.
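A rough sketch of that sequence follows. The job id `1234.0` and the replacement path `/tmp/job.log` are placeholders, and the helper names `run` and `relocate_log_and_remove` are invented for this example. It assumes the new log path is on a local (non-NFS) disk, and that the string value must be passed to condor_qedit wrapped in double quotes. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Point one job's UserLog at a different file, then remove the job.
#   $1: job id (cluster or cluster.proc)
#   $2: new log path, assumed to be on a local disk
relocate_log_and_remove() {
    job_id=$1
    new_log=$2
    # The attribute value is a string, so it carries its own double quotes.
    run condor_qedit "$job_id" UserLog "\"$new_log\""
    run condor_rm "$job_id"
}

# Placeholder values for illustration only.
relocate_log_and_remove 1234.0 /tmp/job.log
```

Both commands need a schedd that is responding, so they would have to be issued after a restart and before the schedd next tries to write to the problematic log.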


--Dan

