Re: [HTCondor-users] Completed job with no history file



Hi Stefano, a couple of ideas for you.

First off: whenever the condor_schedd fails to write a history event,
we expect an error message to show up in the SchedLog. Can you take a
look for one of the following:

ERROR saving to history file ...
ERROR: failed to write job class ad to history file ...

Hopefully you'll find one of these errors explaining why the history
entry isn't getting saved.
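
If it helps, here is a minimal sketch of a script that scans a log for
those two messages. It only does plain substring matching on the text
quoted above; the function names are made up for illustration, and it
assumes you pass in the path to your SchedLog yourself (on most
installs `condor_config_val SCHEDD_LOG` should print it).

```python
# The two messages quoted above, matched as plain substrings.
HISTORY_ERRORS = (
    "ERROR saving to history file",
    "ERROR: failed to write job class ad to history file",
)


def find_history_errors(lines):
    """Return (line_number, text) for each line containing one of the messages."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(message in line for message in HISTORY_ERRORS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Print every matching line of the log file at `path` (hypothetical helper)."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as log:
        for number, text in find_history_errors(log):
            print(f"{number}: {text}")
```

Calling `scan_log()` with the SchedLog path prints each hit with its
line number. If the job finished a while ago, the rotated log (often
SchedLog.old, depending on your rotation settings) may be worth
scanning the same way.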

Also, do you have a log file for one of the jobs that isn't appearing
in the history? I'm guessing there will be some clues related to how
the job exited that will point us in the right direction.

Mark


On Tue, Oct 20, 2020 at 9:47 AM Stefano Dal Pra
<stefano.dalpra@xxxxxxxxxxxx> wrote:
>
> Hello, condor 8.8.9 speaking
>
> I recently noticed that some completed jobs seem to disappear from
> condor_history's point of view,
> and also leave no per-job history file under PER_JOB_HISTORY_DIR.
>
> One example. This job completed apparently with no errors after running
> for ~ 26K seconds:
>
> [root@sn-01 ~]# condor_q -name sn-01 9865068.0 -af:jln LastJobStatus
> JobStatus AcctGroup LastRemoteHost CpusProvisioned
> CumulativeRemoteUserCpu RemoteWallClockTime ExitBySignal ExitCode
> ExitStatus 'abstime(JobStartDate)'
> 'abstime(JobCurrentStartTransferOutputDate)' NumJobStarts
> NumJobCompletions ResidentSetSize_RAW 'abstime(x509UserProxyExpiration)'
> 'abstime(CompletionDate)'
> ID = 9865068.0
>   LastJobStatus = 2
>   JobStatus = 4
>   AcctGroup = virgo
>   LastRemoteHost = slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>   CpusProvisioned = 2
>   CumulativeRemoteUserCpu = 9302.0
>   RemoteWallClockTime = 26025.0
>   ExitBySignal = false
>   ExitCode = 0
>   ExitStatus = 0
>   abstime(JobStartDate) = absTime("2020-10-17T01:58:58+02:00")
>   abstime(JobCurrentStartTransferOutputDate) =
> absTime("2020-10-17T09:12:42+02:00")
>   NumJobStarts = 1
>   NumJobCompletions = 1
>   ResidentSetSize_RAW = 4461780
>   abstime(x509UserProxyExpiration) = absTime("2020-10-17T12:11:11+02:00")
>   abstime(CompletionDate) = absTime("2020-10-17T09:12:43+02:00")
>
>
> However:
> [root@sn-01 ~]# condor_history -lim 1 -name sn-01 9865068.0
>   ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED CMD
>
> Finally,
> I would expect a per-job history file to exist under $(PER_JOB_HISTORY_DIR).
> Several files are there, but none for this job (nor for others like it).
>
> Any idea?
> Thanks
> Stefano



-- 
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison