
Re: [Condor-users] Condor 6.9.2 hung schedd



On Mon, Jun 11, 2007 at 09:51:03AM -0500, Dan Bradley wrote:
> 
> It is normal for the schedd to temporarily show up as the user id of one 
> of the users with jobs in the queue, because the schedd switches user 
> ids in order to do some operations on the user's behalf.
> 
> However, it is not normal for the schedd to get stuck in this state.  To 
> find out what is going on, I would suggest using 'gdb' to see the schedd 
> stack when it is in this state.  Example:
> 
> $ gdb -p <pid of schedd>
> (gdb) where
> ...
> (gdb) quit

(gdb) where
#0  0x00002b46f6b2b69a in fcntl () from /lib/libc.so.6
#1  0x000000000058ee53 in flock ()
#2  0x0000000000665261 in lock_file ()
#3  0x000000000060d7e9 in FileLock::obtain ()
#4  0x00000000005c6685 in UserLog::writeEvent ()
#5  0x00000000004d23b8 in Scheduler::WriteReleaseToUserLog ()
#6  0x00000000004d6568 in Scheduler::actOnJobs ()
#7  0x0000000000572889 in DaemonCore::HandleReq ()
#8  0x000000000056f732 in DaemonCore::HandleReq ()
#9  0x000000000056f197 in DaemonCore::Driver ()
#10 0x000000000057c4d9 in main ()

Does this give you more information?
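
Read from the bottom up, the trace shows the schedd handling a request in
Scheduler::actOnJobs(), writing a release event to a job's user log, and
then sitting in fcntl() underneath FileLock::obtain(), i.e. waiting for a
lock on that log file. As a rough way to check from outside whether a given
log file can currently be locked, something like the following Python
sketch could be used. The function name probe_lock and the idea of running
it against a job's log file are assumptions for illustration, not part of
Condor, and it only reports whether another process holds a conflicting
fcntl lock at that moment.

```python
import errno
import fcntl
import os


def probe_lock(path):
    """Return True if an exclusive fcntl lock on `path` can be taken
    without blocking, False if another process holds a conflicting lock.

    Hypothetical helper for illustration; not part of Condor.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # LOCK_NB makes the call fail instead of waiting for the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        # Release the lock again right away; this was only a probe.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

It would have to be run as a user with write access to the log file, and
from a different process than the one holding the lock, since fcntl locks
never conflict within a single process.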

What I still need to find out is how to remove the offending jobs from the 
queue: I cannot do that while condor_schedd hangs, and as long as they are 
in the queue, condor_schedd will immediately hang again...

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html