
[Condor-users] Evictions - even when TESTINGMODE used



Dear All,

I am trying to run an AMPL job on a 12-way RS6000 SMP machine.

I have tried Condor 6.6.10 and now 6.7.19, with identical (negative)
results.

First, I have set all the options to TESTINGMODE_*, since we are
assuming full use of the machine.

Second, I have set NUM_CPUS=24. We normally run 24 of these jobs in
parallel on this machine outside of Condor, and have done for years.
Now we might want to queue up 200 and run only 24 at a time, which is
one reason (not the only one) for using Condor here.

I start 23 jobs (there's a reason for only 23 for now). All jobs queue
up fine and show up in condor_q, first with I and then with R status.
Of those, only about 15 actually get going: 8 quickly, then the rest
dribbling in over a couple of minutes.

Then I get forcible evictions for no apparent reason.

The machine has 32GB of memory and plenty of virtual memory. Each job
uses no more than about 1GB of memory (I have even played around with
the MEMORY param to allow for that, just in case, with no effect). Is
it possible that I'm seeing a runtime memory problem even though the
matches succeed? topas does not show that the core memory is being
overly taxed.

I have turned on full debugging in Condor, so I can provide the
appropriate logs.

I've included *some* snippets below in case they help immediately,
but by all means ask me for what you think will be helpful.

All help gratefully received.

Kind regards

Derek Jones

------------------------- Included ---------------------------

MatchLog

5/15 14:08:40       Matched 231.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm1@poodle
5/15 14:08:40       Matched 232.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm2@poodle
5/15 14:08:40       Matched 233.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm3@poodle
5/15 14:09:01       Matched 234.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm4@poodle
5/15 14:09:01       Matched 235.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm5@poodle
5/15 14:09:01       Matched 236.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm6@poodle
5/15 14:09:01       Matched 237.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm7@poodle
5/15 14:09:01       Matched 238.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm8@poodle
5/15 14:09:01       Matched 239.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm9@poodle
5/15 14:09:01       Matched 240.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm10@poodle
5/15 14:09:01       Matched 241.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm20@poodle
5/15 14:09:01       Matched 242.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm11@poodle
5/15 14:09:01       Matched 243.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm21@poodle
5/15 14:09:01       Matched 244.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm12@poodle
5/15 14:09:01       Matched 245.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm22@poodle
5/15 14:09:01       Matched 246.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm13@poodle
5/15 14:09:01       Matched 247.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm23@poodle
5/15 14:09:01       Matched 248.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm14@poodle
5/15 14:09:01       Matched 249.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm15@poodle
5/15 14:09:01       Matched 250.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm24@poodle
5/15 14:09:01       Matched 251.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm16@poodle
5/15 14:09:01       Matched 252.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm17@poodle
5/15 14:09:01       Matched 253.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm18@poodle
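
For anyone wanting to check the MatchLog above against the number of
jobs submitted, here is a minimal Python sketch that lists which job
was matched to which vm. It is not part of Condor: the function name
matched_slots and the regular expression are my own assumptions about
the layout shown above, where each entry wraps over two lines.

```python
import re

# Each MatchLog entry above wraps over two lines, so the pattern uses \s+
# to cross the line break between the schedd address and "preempting".
# Assumed layout: Matched <job> <user> <addr> preempting <who> <addr> <slot>
MATCH_RE = re.compile(
    r"Matched\s+(\d+\.\d+)\s+\S+\s+<[^>]+>\s+"
    r"preempting\s+\S+\s+<[^>]+>\s+(\S+)"
)


def matched_slots(log_text):
    """Return a dict mapping job id -> slot name for every match entry."""
    return {job: slot for job, slot in MATCH_RE.findall(log_text)}


# Two entries copied from the MatchLog excerpt above.
sample = """\
5/15 14:08:40       Matched 231.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm1@poodle
5/15 14:09:01       Matched 241.0 derjones@poodle <192.168.1.1:35957>
preempting none <192.168.1.1:35956> vm20@poodle
"""

slots = matched_slots(sample)
for job, slot in sorted(slots.items()):
    print(job, "->", slot)
```

Running it over the whole MatchLog and comparing len(slots) with the
number of submitted jobs shows whether every job received a match.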

Partial ShadowLog
5/15 14:08:42 ******************************************************
5/15 14:08:42 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:08:42 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:08:42 ** $CondorPlatform: PPC-AIX5 $
5/15 14:08:42 ** PID = 1691890
5/15 14:08:42 ** Log last touched 5/15 14:04:01
5/15 14:08:42 ******************************************************
5/15 14:08:42 Using config file: /home/condor/condor_config
5/15 14:08:42 Using local config files:
/home/condor/condor_config.local
5/15 14:08:42 DaemonCore: Command Socket at <192.168.1.1:36041>
5/15 14:08:42 Initializing a VANILLA shadow for job 231.0
5/15 14:08:42 (231.0) (1691890): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:08:45 ******************************************************
5/15 14:08:45 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:08:45 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:08:45 ** $CondorPlatform: PPC-AIX5 $
5/15 14:08:45 ** PID = 1544236
5/15 14:08:45 ** Log last touched 5/15 14:08:42
5/15 14:08:45 ******************************************************
5/15 14:08:45 Using config file: /home/condor/condor_config
5/15 14:08:45 Using local config files:
/home/condor/condor_config.local
5/15 14:08:45 DaemonCore: Command Socket at <192.168.1.1:36074>
5/15 14:08:45 Initializing a VANILLA shadow for job 232.0
5/15 14:08:45 (232.0) (1544236): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:08:46 ******************************************************
5/15 14:08:46 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:08:46 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:08:46 ** $CondorPlatform: PPC-AIX5 $
5/15 14:08:46 ** PID = 1572970
5/15 14:08:46 ** Log last touched 5/15 14:08:45
5/15 14:08:46 ******************************************************
5/15 14:08:46 Using config file: /home/condor/condor_config
5/15 14:08:46 Using local config files:
/home/condor/condor_config.local
5/15 14:08:46 DaemonCore: Command Socket at <192.168.1.1:36079>
5/15 14:08:46 Initializing a VANILLA shadow for job 233.0
5/15 14:08:47 (233.0) (1572970): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:08:53 (231.0) (1691890): Update ad:
MyType = "(unknown type)"
TargetType = "(unknown type)"
DiskUsage = 1
RemoteSysCpu = 100282639
RemoteUserCpu = 582475063
ImageSize = 8118272
JobState = "Running"
NumPids = 3
JobPid = 1917036
JobStartDate = 1147716524
5/15 14:08:53 (231.0) (1691890): --- End of ClassAd ---
5/15 14:08:58 (233.0) (1572970): Update ad:
MyType = "(unknown type)"
TargetType = "(unknown type)"
DiskUsage = 1
RemoteSysCpu = 141414204
RemoteUserCpu = 1118371886
ImageSize = 24162304
JobState = "Running"
NumPids = 4
JobPid = 1974360
JobStartDate = 1147716529
5/15 14:08:58 (233.0) (1572970): --- End of ClassAd ---
5/15 14:08:59 (232.0) (1544236): Update ad:
MyType = "(unknown type)"
TargetType = "(unknown type)"
DiskUsage = 1
RemoteSysCpu = 135754753
RemoteUserCpu = 1211042557
ImageSize = 20013056
JobState = "Running"
NumPids = 4
JobPid = 757996
JobStartDate = 1147716529
5/15 14:08:59 (232.0) (1544236): --- End of ClassAd ---
5/15 14:09:05 ******************************************************
5/15 14:09:05 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:05 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:05 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:05 ** PID = 1470692
5/15 14:09:05 ** Log last touched 5/15 14:08:59
5/15 14:09:05 ******************************************************
5/15 14:09:05 Using config file: /home/condor/condor_config
5/15 14:09:05 Using local config files:
/home/condor/condor_config.local
5/15 14:09:05 DaemonCore: Command Socket at <192.168.1.1:36115>
5/15 14:09:05 Initializing a VANILLA shadow for job 234.0
5/15 14:09:07 ******************************************************
5/15 14:09:07 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:07 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:07 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:07 ** PID = 1269956
5/15 14:09:07 ** Log last touched 5/15 14:09:05
5/15 14:09:07 ******************************************************
5/15 14:09:07 Using config file: /home/condor/condor_config
5/15 14:09:07 Using local config files:
/home/condor/condor_config.local
5/15 14:09:07 DaemonCore: Command Socket at <192.168.1.1:36117>
5/15 14:09:07 Initializing a VANILLA shadow for job 235.0
5/15 14:09:09 ******************************************************
5/15 14:09:09 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:09 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:09 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:09 ** PID = 1863880
5/15 14:09:09 ** Log last touched 5/15 14:09:07
5/15 14:09:09 ******************************************************
5/15 14:09:09 Using config file: /home/condor/condor_config
5/15 14:09:09 Using local config files:
/home/condor/condor_config.local
5/15 14:09:09 DaemonCore: Command Socket at <192.168.1.1:36119>
5/15 14:09:09 Initializing a VANILLA shadow for job 236.0
5/15 14:09:11 ******************************************************
5/15 14:09:11 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:11 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:11 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:11 ** PID = 1032264
5/15 14:09:11 ** Log last touched 5/15 14:09:09
5/15 14:09:11 ******************************************************
5/15 14:09:11 Using config file: /home/condor/condor_config
5/15 14:09:11 Using local config files:
/home/condor/condor_config.local
5/15 14:09:11 DaemonCore: Command Socket at <192.168.1.1:36121>
5/15 14:09:11 Initializing a VANILLA shadow for job 237.0
5/15 14:09:13 ******************************************************
5/15 14:09:13 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:13 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:13 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:13 ** PID = 1192078
5/15 14:09:13 ** Log last touched 5/15 14:09:11
5/15 14:09:13 ******************************************************
5/15 14:09:13 Using config file: /home/condor/condor_config
5/15 14:09:13 Using local config files:
/home/condor/condor_config.local
5/15 14:09:13 DaemonCore: Command Socket at <192.168.1.1:36123>
5/15 14:09:13 Initializing a VANILLA shadow for job 238.0
5/15 14:09:15 (234.0) (1470692): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:15 ** PID = 1474672
5/15 14:09:15 ** Log last touched 5/15 14:09:15
5/15 14:09:15 ******************************************************
5/15 14:09:15 Using config file: /home/condor/condor_config
5/15 14:09:15 Using local config files:
/home/condor/condor_config.local
5/15 14:09:15 DaemonCore: Command Socket at <192.168.1.1:36126>
5/15 14:09:15 Initializing a VANILLA shadow for job 239.0
5/15 14:09:18 ******************************************************
5/15 14:09:18 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:18 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:18 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:18 ** PID = 1073284
5/15 14:09:18 ** Log last touched 5/15 14:09:15
5/15 14:09:18 ******************************************************
5/15 14:09:18 Using config file: /home/condor/condor_config
5/15 14:09:18 Using local config files:
/home/condor/condor_config.local
5/15 14:09:18 DaemonCore: Command Socket at <192.168.1.1:36131>
5/15 14:09:18 Initializing a VANILLA shadow for job 240.0
5/15 14:09:19 ******************************************************
5/15 14:09:19 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:19 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:19 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:19 ** PID = 2224236
5/15 14:09:19 ** Log last touched 5/15 14:09:18
5/15 14:09:19 ******************************************************
5/15 14:09:19 Using config file: /home/condor/condor_config
5/15 14:09:19 Using local config files:
/home/condor/condor_config.local
5/15 14:09:19 DaemonCore: Command Socket at <192.168.1.1:36133>
5/15 14:09:19 Initializing a VANILLA shadow for job 242.0
5/15 14:09:21 ******************************************************
5/15 14:09:21 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:21 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:21 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:21 ** PID = 1441818
5/15 14:09:21 ** Log last touched 5/15 14:09:19
5/15 14:09:21 ******************************************************
5/15 14:09:21 Using config file: /home/condor/condor_config
5/15 14:09:21 Using local config files:
/home/condor/condor_config.local
5/15 14:09:21 DaemonCore: Command Socket at <192.168.1.1:36135>
5/15 14:09:21 Initializing a VANILLA shadow for job 241.0
5/15 14:09:21 (235.0) (1269956): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:23 ******************************************************
5/15 14:09:23 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:23 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:23 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:23 ** PID = 839862
5/15 14:09:23 ** Log last touched 5/15 14:09:21
5/15 14:09:23 ******************************************************
5/15 14:09:23 Using config file: /home/condor/condor_config
5/15 14:09:23 Using local config files:
/home/condor/condor_config.local
5/15 14:09:23 DaemonCore: Command Socket at <192.168.1.1:36142>
5/15 14:09:23 Initializing a VANILLA shadow for job 244.0
5/15 14:09:25 ******************************************************
5/15 14:09:25 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:25 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:25 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:25 ** PID = 1601756
5/15 14:09:25 ** Log last touched 5/15 14:09:23
5/15 14:09:25 ******************************************************
5/15 14:09:25 Using config file: /home/condor/condor_config
5/15 14:09:25 Using local config files:
/home/condor/condor_config.local
5/15 14:09:25 DaemonCore: Command Socket at <192.168.1.1:36144>
5/15 14:09:25 Initializing a VANILLA shadow for job 243.0
5/15 14:09:27 ******************************************************
5/15 14:09:27 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:27 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:27 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:27 ** PID = 1527808
5/15 14:09:27 ** Log last touched 5/15 14:09:25
5/15 14:09:27 ******************************************************
5/15 14:09:27 Using config file: /home/condor/condor_config
5/15 14:09:27 Using local config files:
/home/condor/condor_config.local
5/15 14:09:27 DaemonCore: Command Socket at <192.168.1.1:36146>
5/15 14:09:27 Initializing a VANILLA shadow for job 246.0
5/15 14:09:27 (236.0) (1863880): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:29 ******************************************************
5/15 14:09:29 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:29 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:29 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:29 ** PID = 1077304
5/15 14:09:29 ** Log last touched 5/15 14:09:27
5/15 14:09:29 ******************************************************
5/15 14:09:29 Using config file: /home/condor/condor_config
5/15 14:09:29 Using local config files:
/home/condor/condor_config.local
5/15 14:09:29 DaemonCore: Command Socket at <192.168.1.1:36151>
5/15 14:09:29 Initializing a VANILLA shadow for job 245.0
5/15 14:09:31 ******************************************************
5/15 14:09:31 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:31 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:31 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:31 ** PID = 1491102
5/15 14:09:31 ** Log last touched 5/15 14:09:29
5/15 14:09:31 ******************************************************
5/15 14:09:31 Using config file: /home/condor/condor_config
5/15 14:09:31 Using local config files:
/home/condor/condor_config.local
5/15 14:09:31 DaemonCore: Command Socket at <192.168.1.1:36154>
5/15 14:09:31 Initializing a VANILLA shadow for job 248.0
5/15 14:09:33 ******************************************************
5/15 14:09:33 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:33 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:33 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:33 ** PID = 1876176
5/15 14:09:33 ** Log last touched 5/15 14:09:31
5/15 14:09:33 ******************************************************
5/15 14:09:33 Using config file: /home/condor/condor_config
5/15 14:09:33 Using local config files:
/home/condor/condor_config.local
5/15 14:09:33 DaemonCore: Command Socket at <192.168.1.1:36156>
5/15 14:09:33 Initializing a VANILLA shadow for job 247.0
5/15 14:09:33 (238.0) (1192078): condor_read(): timeout reading
buffer.
5/15 14:09:33 (238.0) (1192078): IO: Failed to read packet header
5/15 14:09:34 (237.0) (1032264): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:35 ******************************************************
5/15 14:09:35 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:35 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:35 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:35 ** PID = 1356012
5/15 14:09:35 ** Log last touched 5/15 14:09:34 
5/15 14:09:35 ******************************************************
5/15 14:09:35 Using config file: /home/condor/condor_config
5/15 14:09:35 Using local config files:
/home/condor/condor_config.local
5/15 14:09:35 DaemonCore: Command Socket at <192.168.1.1:36164>
5/15 14:09:36 Initializing a VANILLA shadow for job 250.0
5/15 14:09:38 ******************************************************
5/15 14:09:38 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:38 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:38 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:38 ** PID = 1175694
5/15 14:09:38 ** Log last touched 5/15 14:09:36 
5/15 14:09:38 ******************************************************
5/15 14:09:38 Using config file: /home/condor/condor_config
5/15 14:09:38 Using local config files:
/home/condor/condor_config.local
5/15 14:09:38 DaemonCore: Command Socket at <192.168.1.1:36168>
5/15 14:09:38 (240.0) (1073284): condor_read(): timeout reading
buffer.
5/15 14:09:38 (240.0) (1073284): IO: Failed to read packet header
5/15 14:09:38 Initializing a VANILLA shadow for job 249.0
5/15 14:09:39 ******************************************************
5/15 14:09:39 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:39 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:39 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:39 ** PID = 1413362
5/15 14:09:39 ** Log last touched 5/15 14:09:38
5/15 14:09:39 ******************************************************
5/15 14:09:39 Using config file: /home/condor/condor_config
5/15 14:09:39 Using local config files:
/home/condor/condor_config.local
5/15 14:09:39 DaemonCore: Command Socket at <192.168.1.1:36171>
5/15 14:09:39 (242.0) (2224236): condor_read(): timeout reading
buffer.
5/15 14:09:39 (242.0) (2224236): IO: Failed to read packet header
5/15 14:09:39 Initializing a VANILLA shadow for job 251.0
5/15 14:09:41 ******************************************************
5/15 14:09:41 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:41 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:41 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:41 ** PID = 831622
5/15 14:09:41 ** Log last touched 5/15 14:09:39
5/15 14:09:41 ******************************************************
5/15 14:09:41 Using config file: /home/condor/condor_config
5/15 14:09:41 Using local config files:
/home/condor/condor_config.local
5/15 14:09:41 DaemonCore: Command Socket at <192.168.1.1:36174>
5/15 14:09:41 (241.0) (1441818): condor_read(): timeout reading
buffer.
5/15 14:09:41 (241.0) (1441818): IO: Failed to read packet header
5/15 14:09:41 Initializing a VANILLA shadow for job 252.0
5/15 14:09:43 ******************************************************
5/15 14:09:43 ** condor_shadow (CONDOR_SHADOW) STARTING UP
5/15 14:09:43 ** $CondorVersion: 6.7.19 May 10 2006 $
5/15 14:09:43 ** $CondorPlatform: PPC-AIX5 $
5/15 14:09:43 ** PID = 1048688
5/15 14:09:43 ** Log last touched 5/15 14:09:41
5/15 14:09:43 ******************************************************
5/15 14:09:43 Using config file: /home/condor/condor_config
5/15 14:09:43 Using local config files:
/home/condor/condor_config.local
5/15 14:09:43 DaemonCore: Command Socket at <192.168.1.1:36178>
5/15 14:09:43 (244.0) (839862): condor_read(): timeout reading
buffer.
5/15 14:09:43 (244.0) (839862): IO: Failed to read packet header
5/15 14:09:43 Initializing a VANILLA shadow for job 253.0
5/15 14:09:44 (239.0) (1474672): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:45 (243.0) (1601756): condor_read(): timeout reading
buffer.
5/15 14:09:45 (243.0) (1601756): IO: Failed to read packet header
5/15 14:09:46 (234.0) (1470692): Update ad:
MyType = "(unknown type)"
TargetType = "(unknown type)"
DiskUsage = 1
RemoteSysCpu = 106714252
RemoteUserCpu = 728073418
ImageSize = 9162752
JobState = "Running"
NumPids = 3
JobPid = 1650746
JobStartDate = 1147716577
5/15 14:09:46 (234.0) (1470692): --- End of ClassAd ---
5/15 14:09:47 (246.0) (1527808): condor_read(): timeout reading
buffer.
5/15 14:09:47 (246.0) (1527808): IO: Failed to read packet header
5/15 14:09:49 (245.0) (1077304): condor_read(): timeout reading
buffer.
5/15 14:09:49 (245.0) (1077304): IO: Failed to read packet header
5/15 14:09:52 (235.0) (1269956): Update ad:
MyType = "(unknown type)"
TargetType = "(unknown type)"
DiskUsage = 1 
RemoteSysCpu = 116030986
RemoteUserCpu = 270382805
ImageSize = 14016512
JobState = "Running"
NumPids = 3
JobPid = 1753326 
JobStartDate = 1147716583
5/15 14:09:52 (235.0) (1269956): --- End of ClassAd ---
5/15 14:09:53 (238.0) (1192078): condor_read(): timeout reading
buffer.
5/15 14:09:53 (238.0) (1192078): IO: Failed to read packet header
5/15 14:09:53 (238.0) (1192078): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:09:53 (238.0) (1192078): Job 238.0 is being evicted
5/15 14:09:53 (247.0) (1876176): condor_read(): timeout reading
buffer.
5/15 14:09:53 (247.0) (1876176): IO: Failed to read packet header
5/15 14:09:56 (250.0) (1356012): condor_read(): timeout reading
buffer.
5/15 14:09:56 (250.0) (1356012): IO: Failed to read packet header
5/15 14:09:58 (240.0) (1073284): condor_read(): timeout reading
buffer.
5/15 14:09:58 (240.0) (1073284): IO: Failed to read packet header
5/15 14:09:58 (240.0) (1073284): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:09:58 (240.0) (1073284): Job 240.0 is being evicted
5/15 14:09:58 (249.0) (1175694): condor_read(): timeout reading
buffer.
5/15 14:09:58 (249.0) (1175694): IO: Failed to read packet header
5/15 14:09:58 (248.0) (1491102): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:58 (236.0) (1863880): Update ad:
MyType = "(unknown type)" 
TargetType = "(unknown type)"
DiskUsage = 1 
RemoteSysCpu = 144096735
RemoteUserCpu = 1099941943 
ImageSize = 15753216
JobState = "Running"
NumPids = 3
JobPid = 1712302 
JobStartDate = 1147716589
5/15 14:09:58 (236.0) (1863880): --- End of ClassAd ---
5/15 14:09:59 (242.0) (2224236): condor_read(): timeout reading
buffer.
5/15 14:09:59 (242.0) (2224236): IO: Failed to read packet header
5/15 14:09:59 (242.0) (2224236): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:09:59 (242.0) (2224236): Job 242.0 is being evicted
5/15 14:09:59 (251.0) (1413362): condor_read(): timeout reading
buffer.
5/15 14:09:59 (251.0) (1413362): IO: Failed to read packet header
5/15 14:10:01 (241.0) (1441818): condor_read(): timeout reading
buffer.
5/15 14:10:01 (241.0) (1441818): IO: Failed to read packet header
5/15 14:10:01 (241.0) (1441818): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:10:01 (241.0) (1441818): Job 241.0 is being evicted
5/15 14:10:01 (252.0) (831622): condor_read(): timeout reading
buffer.
5/15 14:10:01 (252.0) (831622): IO: Failed to read packet header
5/15 14:10:03 (244.0) (839862): condor_read(): timeout reading
buffer.
5/15 14:10:03 (244.0) (839862): IO: Failed to read packet header
5/15 14:10:03 (244.0) (839862): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:10:03 (244.0) (839862): Job 244.0 is being evicted
5/15 14:10:03 (253.0) (1048688): condor_read(): timeout reading
buffer.
5/15 14:10:03 (253.0) (1048688): IO: Failed to read packet header
5/15 14:10:05 (243.0) (1601756): condor_read(): timeout reading
buffer.
5/15 14:10:05 (243.0) (1601756): IO: Failed to read packet header
5/15 14:10:05 (243.0) (1601756): DCStartd::activateClaim: Failed to
receive reply from <192.168.1.1:35956>
5/15 14:10:05 (243.0) (1601756): Job 243.0 is being evicted
5/15 14:10:06 (237.0) (1032264): Update ad:
....
5/15 14:10:38 (249.0) (1175694): condor_read(): timeout reading
buffer.
5/15 14:10:38 (249.0) (1175694): IO: Failed to read packet header
5/15 14:10:38 (249.0) (1175694): logEvictEvent with unknown reason
(108), aborting
5/15 14:10:38 (249.0) (1175694): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 108
5/15 14:10:39 (251.0) (1413362): condor_read(): timeout reading
buffer.
5/15 14:10:39 (251.0) (1413362): IO: Failed to read packet header
5/15 14:10:39 (251.0) (1413362): logEvictEvent with unknown reason
(108), aborting
5/15 14:10:39 (251.0) (1413362): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 108
5/15 14:10:41 (252.0) (831622): condor_read(): timeout reading
buffer.
5/15 14:10:41 (252.0) (831622): IO: Failed to read packet header
5/15 14:10:41 (252.0) (831622): logEvictEvent with unknown reason
(108), aborting
5/15 14:10:41 (252.0) (831622): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 108
5/15 14:10:43 (253.0) (1048688): condor_read(): timeout reading
buffer.
5/15 14:10:43 (253.0) (1048688): IO: Failed to read packet header
5/15 14:10:43 (253.0) (1048688): logEvictEvent with unknown reason
(108), aborting
5/15 14:10:43 (253.0) (1048688): **** condor_shadow (condor_SHADOW)
EXITING WITH STATUS 108
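
Since the ShadowLog interleaves many shadows, a per-job tally makes it
easier to see which jobs were accepted and which were evicted after
read timeouts. The following is a rough Python sketch, not a Condor
tool: the names unwrap and summarize are hypothetical, and the
patterns assume only the message texts that appear in the excerpt
above.

```python
import re
from collections import defaultdict

# A new log entry starts with a "month/day hh:mm:ss " stamp; anything else
# is treated as the wrapped continuation of the previous entry.
STAMP_RE = re.compile(r"^\d+/\d+ \d+:\d+:\d+ ")
# Shadow entries are tagged "(cluster.proc) (shadow pid): message".
TAG_RE = re.compile(r"\((\d+\.\d+)\) \(\d+\): (.*)")


def unwrap(log_text):
    """Join wrapped continuation lines back onto their log entry."""
    entries = []
    for line in log_text.splitlines():
        if STAMP_RE.match(line) or not entries:
            entries.append(line.strip())
        else:
            entries[-1] += " " + line.strip()
    return entries


def summarize(log_text):
    """Count accepted requests, read timeouts and evictions per job."""
    counts = defaultdict(lambda: {"accepted": 0, "timeouts": 0, "evicted": 0})
    for entry in unwrap(log_text):
        m = TAG_RE.search(entry)
        if not m:
            continue
        job, msg = m.groups()
        if msg.startswith("Request to run on") and "was ACCEPTED" in msg:
            counts[job]["accepted"] += 1
        elif msg.startswith("condor_read(): timeout"):
            counts[job]["timeouts"] += 1
        elif msg.endswith("is being evicted"):
            counts[job]["evicted"] += 1
    return dict(counts)


# A few entries copied from the ShadowLog excerpt above.
sample = """\
5/15 14:09:15 (234.0) (1470692): Request to run on
<192.168.1.1:35956> was ACCEPTED
5/15 14:09:33 (238.0) (1192078): condor_read(): timeout reading
buffer.
5/15 14:09:33 (238.0) (1192078): IO: Failed to read packet header
5/15 14:09:53 (238.0) (1192078): condor_read(): timeout reading
buffer.
5/15 14:09:53 (238.0) (1192078): Job 238.0 is being evicted
"""

result = summarize(sample)
for job in sorted(result):
    print(job, result[job])
```

The sketch only counts events; it does not try to explain why the
startd failed to reply.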


Partial StartLog

5/15 14:11:19 condor_write(): Socket closed when trying to write
buffer, fd is 5
5/15 14:11:19 Buf::write(): condor_write() failed
5/15 14:11:19 vm21: Can't send eom to shadow.
5/15 14:11:19 vm8: State change: received RELEASE_CLAIM command
5/15 14:11:19 vm8: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
5/15 14:11:19 vm8: State change: No preempting claim, returning to
owner 
5/15 14:11:19 vm8: Changing state and activity: Preempting/Vacating
-> Owner/Idle
5/15 14:11:19 vm8: State change: IS_OWNER is false
5/15 14:11:19 vm8: Changing state: Owner -> Unclaimed
5/15 14:11:19 condor_write(): Socket closed when trying to write
buffer, fd is 5
5/15 14:11:19 Buf::write(): condor_write() failed
5/15 14:11:19 SECMAN: Error sending response classad!
5/15 14:11:19 Warning: can't find resource with ClaimId
(<192.168.1.1:35956>#1147716252#8)
5/15 14:11:19 vm13: Got activate_claim request from shadow
(<192.168.1.1:36190>)