
Re: [HTCondor-users] Completed job with no history file




Thank you Todd, you were right.

It turns out that these jobs are submitted using -spool, which implies:
<<For this case, the default expression causes the job to be kept in the queue for 10 days after completion.>>

and in fact:
LeaveJobInQueue is set to

   JobStatus == 4 && (CompletionDate is undefined || CompletionDate == 0 || ((time() - CompletionDate) < 864000))
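
To make the logic of that expression easier to follow, here is a minimal Python sketch of the same predicate. It is only an illustration, not HTCondor code: the function name `leave_job_in_queue` and its parameters are hypothetical, and `None` stands in for an undefined CompletionDate.

```python
from typing import Optional

TEN_DAYS = 10 * 24 * 60 * 60  # 864000 seconds, the constant in the expression


def leave_job_in_queue(job_status: int,
                       completion_date: Optional[int],
                       now: int,
                       window: int = TEN_DAYS) -> bool:
    """Illustrative Python rendering of the LeaveJobInQueue expression.

    job_status      -- JobStatus (4 means completed)
    completion_date -- CompletionDate as a Unix timestamp, None if undefined
    now             -- stand-in for time()
    """
    if job_status != 4:
        # Only completed jobs are held back by this expression.
        return False
    if completion_date is None or completion_date == 0:
        # Completion time not recorded yet: keep the job in the queue.
        return True
    # Keep the job while less than `window` seconds have passed.
    return (now - completion_date) < window
```

With these assumptions, a completed job stays in the queue until `now - completion_date` reaches 864000 seconds, i.e. 10 days.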

Apparently, the "10 days quarantine" cannot be altered, as the 864000 seems to be a "hardcoded" value (is it?),
but I wanted to shorten it to two days, so I wrote the following Job Transform rule:

JOB_TRANSFORM_InQueueTwoDays @=end
   REQUIREMENTS True
   if RegExp(" < 864000",unparse(LeaveJobInQueue))
      SET LeaveJobInQueue "JobStatus == 4 && (CompletionDate is undefined || CompletionDate == 0 || ((time() - CompletionDate) < 172800))"
   endif
@end
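
For readers unfamiliar with the transform, the guard-and-replace idea can be sketched in plain Python. This is not how HTCondor evaluates the rule; it only mimics the behaviour under the assumption that unparse() yields the expression text exactly as shown above. The names `OLD_EXPR`, `NEW_EXPR` and `transform_leave_in_queue` are hypothetical.

```python
import re

OLD_EXPR = ("JobStatus == 4 && (CompletionDate is undefined || "
            "CompletionDate == 0 || ((time() - CompletionDate) < 864000))")
NEW_EXPR = ("JobStatus == 4 && (CompletionDate is undefined || "
            "CompletionDate == 0 || ((time() - CompletionDate) < 172800))")


def transform_leave_in_queue(expr_text: str) -> str:
    """Mimic the rule: rewrite only expressions carrying the 10-day constant."""
    # Same test as RegExp(" < 864000", unparse(LeaveJobInQueue)).
    if re.search(r" < 864000", expr_text):
        # Same effect as the SET line: install the two-day expression.
        return NEW_EXPR
    # Any other expression (e.g. a user-supplied one) is left untouched.
    return expr_text


print(transform_leave_in_queue(OLD_EXPR) == NEW_EXPR)
print(transform_leave_in_queue("false"))
```

The point of the guard is that jobs whose LeaveJobInQueue does not contain the 10-day constant keep whatever value they were submitted with.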

This seems to work; however, I don't much like the
if RegExp(" < 864000",unparse(LeaveJobInQueue))
part. Maybe I'm just missing a simpler check?

Thanks again
Stefano


On 20/10/20 22:14, Todd Tannenbaum wrote:
On 10/20/2020 9:47 AM, Stefano Dal Pra wrote:
Hello, condor 8.8.9 speaking

I noticed recently that some completed jobs seem to disappear from the point of view of condor_history,
also leaving no history log file under PER_JOB_HISTORY_DIR.

Jobs do not enter the history file(s) when they complete; they enter the history file(s) when they leave the schedd database.

If you can see the job with condor_q, you will not see it with condor_history. And vice versa.

By default jobs are removed from the schedd whenever they enter the completed state (JobStatus==4) or removed state (JobStatus==3).

However, this can be customized via the "leave_in_queue" statement in the job submit file. See the condor_submit man page for details.

Looks like at your site something is setting leave_in_queue as follows, which means the job will stay in the schedd for 10 days in completed state,
and then after 10 days it will be written into the history file(s):

 LeaveJobInQueue = JobStatus == 4 && (CompletionDate =?= undefined || CompletionDate == 0 || ((time() - CompletionDate) < 864000))
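
As a worked example of the 10-day window, the short Python sketch below takes the CompletionDate reported for job 9865068.0 in the original message and adds 864000 seconds. The helper name `earliest_history_time` is hypothetical. The result is only the earliest point at which the expression stops holding; exactly when the schedd then moves the job to the history file(s) is not covered in this thread.

```python
from datetime import datetime, timedelta


def earliest_history_time(completion_iso: str,
                          window_seconds: int = 864000) -> datetime:
    """Return CompletionDate plus the leave-in-queue window."""
    completion = datetime.fromisoformat(completion_iso)
    return completion + timedelta(seconds=window_seconds)


# CompletionDate of the example job, as printed by condor_q.
print(earliest_history_time("2020-10-17T09:12:43+02:00").isoformat())
# prints 2020-10-27T09:12:43+02:00 (fixed +02:00 offset, no DST adjustment)
```

With the two-day value of 172800 seconds, the same call would give 2020-10-19T09:12:43+02:00.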

Hope the above helps,
Todd






One example. This job completed apparently with no errors after running for ~ 26K seconds:

[root@sn-01 ~]# condor_q -name sn-01 9865068.0 -af:jln LastJobStatus JobStatus AcctGroup LastRemoteHost CpusProvisioned CumulativeRemoteUserCpu RemoteWallClockTime ExitBySignal ExitCode ExitStatus 'abstime(JobStartDate)' 'abstime(JobCurrentStartTransferOutputDate)' NumJobStarts NumJobCompletions ResidentSetSize_RAW 'abstime(x509UserProxyExpiration)' 'abstime(CompletionDate)'
ID = 9865068.0
LastJobStatus = 2
JobStatus = 4
AcctGroup = virgo
LastRemoteHost = slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
CpusProvisioned = 2
CumulativeRemoteUserCpu = 9302.0
RemoteWallClockTime = 26025.0
ExitBySignal = false
ExitCode = 0
ExitStatus = 0
abstime(JobStartDate) = absTime("2020-10-17T01:58:58+02:00")
abstime(JobCurrentStartTransferOutputDate) = absTime("2020-10-17T09:12:42+02:00")
NumJobStarts = 1
NumJobCompletions = 1
ResidentSetSize_RAW = 4461780
abstime(x509UserProxyExpiration) = absTime("2020-10-17T12:11:11+02:00")
abstime(CompletionDate) = absTime("2020-10-17T09:12:43+02:00")


However:
[root@sn-01 ~]# condor_history -lim 1 -name sn-01 9865068.0
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED CMD

Finally,
I would expect a per-job history file to exist under $(PER_JOB_HISTORY_DIR).
Several files are there, but there is none for this job (nor for others like it).

Any idea?
Thanks
Stefano



-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685