[HTCondor-users] HTCondor-CE not purging finished jobs



Hello,
htcondor-ce-3.4.0-1.el7.noarch here.

We have a problem common to all of our CEs:

[root@ce02-htc ~]# condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' -af Owner | sort | uniq -c
   9592 user1
      4 user2
   1114 user3
    575 user4
     44 user5

I have set up a periodic remove rule and a matching remove reason:
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8)
SYSTEM_PERIODIC_REMOVE_REASON = strcat("CE job removed by SYSTEM_PERIODIC_REMOVE due to ", ifThenElse((JobStatus == 5 && CurrentTime - EnteredCurrentStatus > 3600*8), "being in the hold state for 8 hours.", ifThenElse((JobStatus == 5 && isUndefined(RoutedToJobId)), "non-existent route or entry in JOB_ROUTER_ENTRIES.", "input files missing." ) ) )
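
Spelled out as plain shell, the nested ifThenElse in the reason expression picks its suffix roughly as follows. This is only a readability sketch, not ClassAd evaluation: the function name remove_reason and its three positional parameters are made up for illustration, and an empty third argument stands in for an undefined RoutedToJobId.

```shell
# remove_reason STATUS SECONDS_IN_STATUS ROUTED_TO_JOB_ID
# Hypothetical helper mirroring the branch order of SYSTEM_PERIODIC_REMOVE_REASON.
# An empty ROUTED_TO_JOB_ID is this sketch's stand-in for isUndefined(RoutedToJobId).
remove_reason() {
    job_status=$1
    secs_in_status=$2
    routed_to_job_id=$3

    if [ "$job_status" -eq 5 ] && [ "$secs_in_status" -gt $((3600 * 8)) ]; then
        # Same condition as SYSTEM_PERIODIC_REMOVE itself.
        echo "being in the hold state for 8 hours."
    elif [ "$job_status" -eq 5 ] && [ -z "$routed_to_job_id" ]; then
        echo "non-existent route or entry in JOB_ROUTER_ENTRIES."
    else
        echo "input files missing."
    fi
}
```

Because the first branch repeats the remove condition, any job removed by this rule gets the "hold state for 8 hours" text; the other two suffixes are only reachable when the remove condition is false.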

Inspecting these "non-purged" jobs shows that they have a RemoveReason set, yet they are still in the queue in the held state (JobStatus 5):

[root@ce02-htc ~]# condor_ce_q 1679707.0 -af JobStatus RemoveReason
5 CE job removed by SYSTEM_PERIODIC_REMOVE due to being in the hold state for 8 hours.

So far I have found no better way than removing these jobs manually, using something like:

condor_ce_q -cons '(JobStatus == 5 ) && (time() - x509UserProxyExpiration > 4 * 3600)' -af 'strcat(ClusterId,".",ProcId)' | xargs condor_ce_rm
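
The manual cleanup can be wrapped in a small function with a dry-run mode, so the job IDs can be checked before anything is removed. This is a sketch only: the name purge_held and the DRY_RUN variable are made up for illustration, it assumes condor_ce_q and condor_ce_rm are on the PATH, and it reuses the constraint from the one-liner unchanged.

```shell
# Held jobs whose proxy expired more than 4 hours ago (same constraint as the one-liner).
CONSTRAINT='(JobStatus == 5) && (time() - x509UserProxyExpiration > 4 * 3600)'

# purge_held: hypothetical helper, not part of HTCondor-CE.
# DRY_RUN=1 (the default) only prints the matching job IDs;
# DRY_RUN=0 passes them to condor_ce_rm.
purge_held() {
    ids=$(condor_ce_q -cons "$CONSTRAINT" -af 'strcat(ClusterId,".",ProcId)') || return 1

    if [ -z "$ids" ]; then
        echo "nothing to remove"
        return 0
    fi

    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$ids"
    else
        printf '%s\n' "$ids" | xargs condor_ce_rm
    fi
}
```

Running purge_held on its own lists the candidates; DRY_RUN=0 purge_held performs the actual removal.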

Am I missing something obvious?
Cheers,
Stefano