
[Condor-users] ProcAPI error



I noticed a few jobs in my 6.9.2pre cluster that had been evicted and
restarted even though the cluster is not full.  I finally found an error
message in the StarterLog for one of the jobs (see below).  It seems
that the job was restarted due to a ProcAPI 'Unexpected short scan' error.  Note
that the job had been running for just over 7 hours, which is the
expected normal runtime for this job.

What might be the cause for an error like this?

--Mike

7/23 00:16:23 ******************************************************
7/23 00:16:23 ** condor_starter (CONDOR_STARTER) STARTING UP
7/23 00:16:23 ** /share/apps/condor-6.9.2/sbin/condor_starter
7/23 00:16:23 ** $CondorVersion: 6.9.2 Jan 17 2007 PRE-RELEASE-UWCS $
7/23 00:16:23 ** $CondorPlatform: I386-LINUX_RHEL3 $
7/23 00:16:23 ** PID = 29667
7/23 00:16:23 ** Log last touched 7/23 00:16:21
7/23 00:16:23 ******************************************************
7/23 00:16:23 Using config source: /home/condor/condor_config
7/23 00:16:23 Using local config sources:
7/23 00:16:23    /share/apps/condor/hosts/cithep184/condor_config.local
7/23 00:16:23 DaemonCore: Command Socket at <10.255.255.201:40565>
7/23 00:16:23 Done setting resource limits
7/23 00:16:23 Communicating with shadow <10.255.255.216:46438>
7/23 00:16:23 Submitting machine is "gatekeeper-0-2.local"
7/23 00:16:24 File transfer completed successfully.
7/23 00:16:25 Starting a VANILLA universe job with ID: 61742.0
7/23 00:16:25 IWD: /state/partition1/tmp/cithep184/execute/dir_29667
7/23 00:16:25 Output file: /state/partition1/tmp/cithep184/execute/dir_29667/_condor_stdout
7/23 00:16:25 Error file: /state/partition1/tmp/cithep184/execute/dir_29667/_condor_stderr
7/23 00:16:25 About to exec /state/partition1/tmp/cithep184/execute/dir_29667/condor_exec.exe
7/23 00:16:25 Create_Process succeeded, pid=29671
7/23 07:27:27 ProcAPI: Unexpected short scan on /proc/11191/stat, errno: 3.
7/23 09:53:03 Process exited, pid=29671, status=0
7/23 09:53:03 Got SIGQUIT.  Performing fast shutdown.
7/23 09:53:03 ShutdownFast all jobs.
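
For anyone wanting to check other StarterLogs for the same message, here is a minimal Python sketch that pulls out ProcAPI lines and reports how long after Create_Process each one occurred. It is only an illustration based on the excerpt above: the function names (`parse_time`, `procapi_events`) and the regular expressions are my own assumptions, not part of Condor, and the log carries no year, so one is assumed arbitrarily. The errno number is mapped to its symbolic name with Python's standard `errno` table; on Linux, errno 3 is ESRCH ("No such process"). Note also that the pid in the /proc path (11191) is not the job pid reported by Create_Process (29671).

```python
import errno
import re
from datetime import datetime

# Matches StarterLog lines of the form "7/23 00:16:25 message".
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")
# Matches the ProcAPI message seen in the excerpt (pattern is an assumption
# derived from that single line, not from Condor's source).
PROCAPI_RE = re.compile(r"ProcAPI: .* on (\S+), errno: (\d+)")


def parse_time(stamp, year=2007):
    """Parse an 'M/D HH:MM:SS' stamp; the year is assumed, the log omits it."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def procapi_events(lines):
    """Return (elapsed since Create_Process, /proc path, errno name) tuples."""
    started = None
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or wrapped lines without a timestamp
        when, msg = parse_time(m.group(1)), m.group(2)
        if msg.startswith("Create_Process succeeded"):
            started = when
        hit = PROCAPI_RE.search(msg)
        if hit and started is not None:
            code = int(hit.group(2))
            name = errno.errorcode.get(code, str(code))
            events.append((when - started, hit.group(1), name))
    return events


sample = [
    "7/23 00:16:25 Create_Process succeeded, pid=29671",
    "7/23 07:27:27 ProcAPI: Unexpected short scan on /proc/11191/stat, errno: 3.",
    "7/23 09:53:03 Process exited, pid=29671, status=0",
]

for elapsed, path, name in procapi_events(sample):
    print(elapsed, path, name)
```

To run it against a real log, pass an open file instead of `sample`, e.g. `procapi_events(open("StarterLog"))`; the file name and location depend on the local Condor configuration.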
