Re: [Condor-users] Schedd refuse to start with - Detected unterminated log entry in ClassAd



Johnson koil Raj wrote:
Hi,
    My Condor pool's schedd is crashing suddenly. The schedd log is
copied below. The spool directory is on NFS.

   What is the unterminated log entry in job_queue.log, and how did it
suddenly get added to this file?

   Is it a user permission issue?


3/28 18:12:55 (pid:32360) -------- Done starting jobs --------
3/28 18:13:01 (pid:32360) Received TCP command 1111 (QMGMT_CMD) from
<192.168.111.5:9688>, access level READ
3/28 18:13:04 (pid:32360) ERROR "Failed to write real job queue log:
fsync failed (errno 28); no local backup available." at line 529 in file
log_transaction.cpp
3/28 18:13:04 (pid:32360) ScheddCronMgr: Bye
3/28 18:13:04 (pid:32360) CronMgr: bye
3/28 18:13:23 (pid:2266) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found
3/28 18:13:23 (pid:2266) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found
3/28 18:13:23 (pid:2266)

******************************************************
3/28 18:13:23 (pid:2266) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
.
3/28 18:13:23 (pid:2266) ** $CondorVersion: 7.2.0 Dec 19 2008 BuildID:
121001 $
3/28 18:13:23 (pid:2266) ** $CondorPlatform: I386-LINUX_RHEL5 $
.
..
...
3/28 18:13:28 (pid:2266) CronMgr: Doing config (initial)
3/28 18:13:32 (pid:2266) Detected unterminated log entry in ClassAd
Log /mail/condorvm/spool/job_queue.log. Forcing rotation.
3/28 18:13:32 (pid:2266) About to rotate ClassAd
log /mail/condorvm/spool/job_queue.log
3/28 18:13:32 (pid:2266) About to save historical
log /mail/condorvm/spool/job_queue.log.3
3/28 18:13:35 (pid:2266) Removed historical
log /mail/condorvm/spool/job_queue.log.2.
3/28 18:13:39 (pid:2266) ERROR "fsync
of /mail/condorvm/spool/job_queue.log failed, errno = 28" at line 635 in
file classad_log.cpp
3/28 18:13:39 (pid:2266) ScheddCronMgr: Bye
3/28 18:13:39 (pid:2266) CronMgr: bye
3/28 18:14:02 (pid:2272) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found
3/28 18:14:02 (pid:2272) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found


by
Johnson

$ grep 28 /usr/include/asm-generic/errno-base.h
#define	ENOSPC		28	/* No space left on device */

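So errno 28 is ENOSPC: the filesystem holding the spool had no space
left when the schedd tried to fsync job_queue.log. A strerror() call in
classad_log.cpp may be in order, so the log reports the message rather
than only the bare errno.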
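
For anyone without the kernel headers at hand, the same lookup can be
done from Python, along with a quick check of the free space on the
spool filesystem. This is only a minimal sketch: the helper names
describe_errno and free_bytes are made up for illustration, and the
spool path is the one shown in the log above.

```python
import errno
import os
import shutil


def describe_errno(code):
    """Return 'NAME (code): message' for a numeric errno value."""
    # errno.errorcode maps numbers to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror is Python's wrapper around the C strerror() call.
    return f"{name} ({code}): {os.strerror(code)}"


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # 28 is the errno value reported in the schedd log.
    print(describe_errno(28))

    # Spool directory taken from the log; adjust for your pool.
    spool = "/mail/condorvm/spool"
    if os.path.isdir(spool):
        print(f"{spool}: {free_bytes(spool)} bytes free")
    else:
        print(f"{spool} not found on this host")
```

Run it on the machine where the schedd runs, so the free-space figure
is for the filesystem the schedd actually writes to.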

Best,


matt