[Condor-users] "Corrupt" Condor logs - any ideas?



I'm seeing some Condor logs that aren't being written properly, and I really don't know where to begin diagnosing the issue.

One problem is that the logs occasionally miss the "initial tag" (I don't know if it has a better name). Note below how the first entry is missing the leading "005".

...
(11999.126.000) 10/12 13:35:08 Job terminated.
	(1) Normal termination (return value 0)
		Usr 0 00:02:19, Sys 0 00:00:03  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:02:19, Sys 0 00:00:03  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	0  -  Run Bytes Sent By Job
	0  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	0  -  Total Bytes Received By Job
...
005 (11999.124.000) 10/12 13:35:08 Job terminated.
	(1) Normal termination (return value 0)
		Usr 0 00:01:59, Sys 0 00:00:07  -  Run Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
		Usr 0 00:01:59, Sys 0 00:00:07  -  Total Remote Usage
		Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
	0  -  Run Bytes Sent By Job
	0  -  Run Bytes Received By Job
	0  -  Total Bytes Sent By Job
	0  -  Total Bytes Received By Job
...
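To count how often this happens across a whole log, rather than spotting it by eye, a small script can split the log on the "..." separator lines and check that each entry starts with a three-digit event code, the (cluster.proc.subproc) job ID, and a timestamp. This is a rough sketch only: the header pattern is an assumption inferred from the entries quoted above, not taken from the Condor documentation, and `find_bad_headers` is a hypothetical helper name.

```python
import re

# Assumed header shape, inferred from entries like
# "005 (11999.124.000) 10/12 13:35:08 Job terminated."
HEADER_RE = re.compile(r"^\d{3} \(\d+\.\d+\.\d+\) \d{2}/\d{2} \d{2}:\d{2}:\d{2} ")


def split_events(text):
    """Split a user log into events; each event is terminated by a '...' line."""
    events, current = [], []
    for line in text.splitlines():
        if line.strip() == "...":
            if current:
                events.append(current)
            current = []
        elif line.strip():  # ignore blank lines
            current.append(line)
    if current:  # trailing event with no closing '...'
        events.append(current)
    return events


def find_bad_headers(text):
    """Return (event_index, first_line) for events whose first line lacks the code."""
    return [(i, ev[0]) for i, ev in enumerate(split_events(text))
            if not HEADER_RE.match(ev[0])]


if __name__ == "__main__":
    sample = """...
(11999.126.000) 10/12 13:35:08 Job terminated.
\t(1) Normal termination (return value 0)
...
005 (11999.124.000) 10/12 13:35:08 Job terminated.
\t(1) Normal termination (return value 0)
...
"""
    print(find_bad_headers(sample))
    # [(0, '(11999.126.000) 10/12 13:35:08 Job terminated.')]
```

Run against the real log file (e.g. `find_bad_headers(open(path).read())`), the event indices can then be lined up against the timestamps to see whether the bad entries cluster around particular moments.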


Another problem is that two log entries sometimes appear to get intermixed. Note how garbled the second entry below is.


...
005 (11999.231.000) 10/12 13:42:20 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:01:17, Sys 0 00:00:01 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:01:17, Sys 0 00:00:01 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
0 - Total Bytes Received By Job
...
005 (11999.217.000) 10/12 13:42:21 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:02:21, Sys 0 00:00:03 - Run Re Usr 0 00:01 Usr 0 00:00:00, Sys 0 00:00:00 - Run Lo Usr 0 00:0 Usr 0 00:02:21, Sys 0 00:00:03 - Total 0 - Run Byte Usr 0 00:00:0 0 - Run Bytes Received By Job
...
ge
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
0 - Total Bytes Received By Job
...
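For the intermixed entries, one rough way to locate them is to flag every line that matches none of the line shapes a normal "Job terminated" entry contains. This sketch assumes the patterns below, which cover only the body lines seen in the entries quoted in this post; other legitimate lines (abnormal termination details, other event types) would also be reported, so treat the output as a list to inspect by hand. `find_suspect_lines` is a hypothetical helper name.

```python
import re

# Assumption: these cover only the 005 body lines quoted in this post.
BODY_PATTERNS = [
    re.compile(r"\(1\) Normal termination \(return value -?\d+\)"),
    re.compile(r"Usr \d+ \d{2}:\d{2}:\d{2}, Sys \d+ \d{2}:\d{2}:\d{2} - "
               r"(Run|Total) (Remote|Local) Usage"),
    re.compile(r"\d+ - (Run|Total) Bytes (Sent By|Received By) Job"),
]
HEADER_RE = re.compile(r"^\d{3} \(\d+\.\d+\.\d+\) ")


def find_suspect_lines(text):
    """Return (line_number, raw_line) for lines matching no known pattern."""
    suspects = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Collapse tabs and repeated spaces so indented and flattened logs compare equally.
        line = " ".join(raw.split())
        if not line or line == "..." or HEADER_RE.match(line):
            continue
        if not any(p.fullmatch(line) for p in BODY_PATTERNS):
            suspects.append((lineno, raw))
    return suspects


if __name__ == "__main__":
    sample = "\n".join([
        "005 (11999.217.000) 10/12 13:42:21 Job terminated.",
        "(1) Normal termination (return value 0)",
        ("Usr 0 00:02:21, Sys 0 00:00:03 - Run Re Usr 0 00:01 Usr 0 00:00:00, "
         "Sys 0 00:00:00 - Run Lo Usr 0 00:0 Usr 0 00:02:21, Sys 0 00:00:03 - "
         "Total 0 - Run Byte Usr 0 00:00:0 0 - Run Bytes Received By Job"),
        "...",
        "ge",
        "0 - Run Bytes Sent By Job",
    ])
    # Reports line 3 (the interleaved usage line) and line 5 ("ge").
    for lineno, line in find_suspect_lines(sample):
        print(lineno, line)
```

Because a missing event code also fails every pattern, this check catches the first problem as well; comparing the flagged line numbers with nearby timestamps may show whether two events were being written at the same moment.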



ANY ideas on how to even begin diagnosing this?