[Condor-users] log file indicates termination of job, but output file is empty !?!



Hi,

I am testing my Condor pool by submitting a large number of jobs to it:

#---- Condor submit file
Universe   = Vanilla
Executable = sleeper.exe
should_transfer_files = YES
when_to_transfer_output = ON_EXIT

Requirements = (target.Arch == "INTEL") && (target.OpSys == "WINNT51")

output = 005/$(Cluster)_$(PROCESS).out
log = 005/$(Cluster)_$(PROCESS).log
log_xml = true

arguments = "5"
Queue 15000
#----


The 'arguments = "5"' line tells sleeper.exe to sleep for 5 minutes, so I know 
that each job should run for close to 5 minutes on a pool PC.

Most of the jobs complete normally, with the event report in the .log file and 
the job's output in the .out file.

However, some jobs indicate that they have completed (see below), but the output 
file remains empty.
Notice that the "SentBytes" and "TotalSentBytes" at the end of the log file are 
both zero in this case!

Any idea why and how this happens?
Should I investigate further? If yes, how?

Thanks,
Rob.



<c>
    <a n="MyType"><s>SubmitEvent</s></a>
    <a n="EventTypeNumber"><i>0</i></a>
    <a n="EventTime"><s>2010-11-24T08:46:40</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="SubmitHost"><s>&lt;115.125.120.71:60614&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>ExecuteEvent</s></a>
    <a n="EventTypeNumber"><i>1</i></a>
    <a n="EventTime"><s>2010-11-24T13:18:55</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="ExecuteHost"><s>&lt;115.145.228.43:1047&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobImageSizeEvent</s></a>
    <a n="EventTypeNumber"><i>6</i></a>
    <a n="EventTime"><s>2010-11-24T13:19:03</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="Size"><i>756</i></a>
</c>
<c>
    <a n="MyType"><s>JobSuspendedEvent</s></a>
    <a n="EventTypeNumber"><i>10</i></a>
    <a n="EventTime"><s>2010-11-24T13:19:38</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="NumberOfPIDs"><i>1</i></a>
</c>
<c>
    <a n="MyType"><s>JobDisconnectedEvent</s></a>
    <a n="EventTypeNumber"><i>22</i></a>
    <a n="EventTime"><s>2010-11-24T15:19:38</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="StartdAddr"><s>&lt;115.145.228.43:1047&gt;</s></a>
    <a n="StartdName"><s>slot1@06-3</s></a>
    <a n="DisconnectReason"><s>Socket between submit and execute hosts closed unexpectedly</s></a>
    <a n="EventDescription"><s>Job disconnected, attempting to reconnect</s></a>
</c>
<c>
    <a n="MyType"><s>JobReconnectFailedEvent</s></a>
    <a n="EventTypeNumber"><i>24</i></a>
    <a n="EventTime"><s>2010-11-24T15:19:38</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="StartdName"><s>slot1@06-3</s></a>
    <a n="Reason"><s>Job disconnected too long: JobLeaseDuration (1200 seconds) expired</s></a>
    <a n="EventDescription"><s>Job reconnect impossible: rescheduling job</s></a>
</c>
<c>
    <a n="MyType"><s>ExecuteEvent</s></a>
    <a n="EventTypeNumber"><i>1</i></a>
    <a n="EventTime"><s>2010-11-24T15:19:54</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="ExecuteHost"><s>&lt;115.145.228.201:1045&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobSuspendedEvent</s></a>
    <a n="EventTypeNumber"><i>10</i></a>
    <a n="EventTime"><s>2010-11-24T15:20:47</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="NumberOfPIDs"><i>1</i></a>
</c>
<c>
    <a n="MyType"><s>JobUnsuspendedEvent</s></a>
    <a n="EventTypeNumber"><i>11</i></a>
    <a n="EventTime"><s>2010-11-24T15:25:52</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
</c>
<c>
    <a n="MyType"><s>JobEvictedEvent</s></a>
    <a n="EventTypeNumber"><i>4</i></a>
    <a n="EventTime"><s>2010-11-24T15:25:52</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="Checkpointed"><b v="f"/></a>
    <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="RunRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
    <a n="ReceivedBytes"><r>2.373600000000000E+04</r></a>
    <a n="TerminatedAndRequeued"><b v="f"/></a>
    <a n="TerminatedNormally"><b v="f"/></a>
</c>
<c>
    <a n="MyType"><s>ExecuteEvent</s></a>
    <a n="EventTypeNumber"><i>1</i></a>
    <a n="EventTime"><s>2010-11-24T15:26:00</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="ExecuteHost"><s>&lt;115.145.230.95:4146&gt;</s></a>
</c>
<c>
    <a n="MyType"><s>JobTerminatedEvent</s></a>
    <a n="EventTypeNumber"><i>5</i></a>
    <a n="EventTime"><s>2010-11-24T15:26:00</s></a>
    <a n="Cluster"><i>319</i></a>
    <a n="Proc"><i>4146</i></a>
    <a n="Subproc"><i>0</i></a>
    <a n="TerminatedNormally"><b v="t"/></a>
    <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="RunRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="TotalLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="TotalRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
    <a n="ReceivedBytes"><r>2.373600000000000E+04</r></a>
    <a n="TotalSentBytes"><r>0.000000000000000E+00</r></a>
    <a n="TotalReceivedBytes"><r>4.747200000000000E+04</r></a>
</c>
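
To find every affected job among the 15000 logs, the check described above (normal termination but zero "TotalSentBytes") could be automated. The following is only a minimal sketch in Python, not a Condor tool: the function names and the wrapping "log" root element are made up for illustration, and it assumes each .log file is a bare sequence of <c> elements like the one shown. It also reports the time between the last ExecuteEvent and the JobTerminatedEvent, which for a 5-minute sleep should be close to 300 seconds.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def parse_events(log_text):
    """Parse an XML user log (a bare sequence of <c> elements) into dicts."""
    # The log has no single root element, so wrap it in a hypothetical one.
    root = ET.fromstring("<log>" + log_text + "</log>")
    events = []
    for c in root.findall("c"):
        event = {}
        for a in c.findall("a"):
            if len(a) == 0:
                continue  # attribute without a typed value; skip it
            value = a[0]  # typed child: <s>, <i>, <r> or <b v="t|f"/>
            if value.tag == "b":
                event[a.get("n")] = value.get("v") == "t"
            elif value.tag == "i":
                event[a.get("n")] = int(value.text)
            elif value.tag == "r":
                event[a.get("n")] = float(value.text)
            else:
                event[a.get("n")] = value.text
        events.append(event)
    return events


def terminated_without_output(events):
    """True if the job terminated normally but TotalSentBytes is zero."""
    for e in events:
        if e.get("MyType") == "JobTerminatedEvent":
            return bool(e.get("TerminatedNormally")) and e.get("TotalSentBytes") == 0.0
    return False


def last_run_seconds(events):
    """Seconds between the last ExecuteEvent and the JobTerminatedEvent."""
    start = end = None
    for e in events:
        if e.get("MyType") == "ExecuteEvent":
            start = datetime.fromisoformat(e["EventTime"])
        elif e.get("MyType") == "JobTerminatedEvent":
            end = datetime.fromisoformat(e["EventTime"])
    if start is None or end is None:
        return None
    return (end - start).total_seconds()
```

Running parse_events() on the text of each file under 005/ and printing the file names for which terminated_without_output() is true would list the suspicious jobs. For the log above, last_run_seconds() gives 0.0, since the final ExecuteEvent and the JobTerminatedEvent carry the same timestamp.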