
[HTCondor-users] jobs with undefined JobStatus in limbo



Hi all,

We have a number of jobs whose JobStatus is undefined, as in [1].

These jobs were apparently submitted during a short window in which we
had deployed a broken config (nothing major, merely a broken bracket).
They might be late materialization jobs, but not necessarily, as far as
I can see.

The problem is that we cannot remove or release them from their
undefined limbo, as the schedd(?) does not seem to know about them at
that point. In the logs, only a job transform entry for the schedd
mentions the job ID [2].
Restarting the daemons has not affected these jobs.

The next step would be a reboot of the machine, but maybe somebody has
an idea of how to get rid of these jobs?

Cheers,
  Thomas

[1]
> condor_q -l  10738840.0
...
JobStatus = undefined
LastJobStatus = 1
...
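
To pick the affected jobs out of a saved `condor_q -l` dump, something
like the following might do as a rough sketch. It assumes that the job
ads in the dump are separated by blank lines and that each ad carries
ClusterId and ProcId; the function name is made up for illustration and
is not part of any HTCondor tooling.

```python
import re


def jobs_with_undefined_status(long_output: str) -> list[str]:
    """Return "Cluster.Proc" IDs of ads whose JobStatus is undefined or missing."""
    affected = []
    # Assumption: ads in the long-form dump are separated by blank lines.
    for ad_text in re.split(r"\n\s*\n", long_output.strip()):
        ad = {}
        for line in ad_text.splitlines():
            # Split "Attr = value" at the first " = " only, so expressions
            # containing further "=" characters stay intact in the value.
            key, sep, value = line.partition(" = ")
            if sep:
                ad[key.strip()] = value.strip()
        if not ad:
            continue
        # A missing JobStatus attribute is treated like an explicit undefined.
        status = ad.get("JobStatus", "undefined")
        if status.lower() == "undefined":
            affected.append(f'{ad.get("ClusterId", "?")}.{ad.get("ProcId", "?")}')
    return affected
```

Feeding it the text of the dump returns the IDs of the jobs in limbo,
which can then be compared against what the schedd logs mention.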

[2]
/var/log/condor/SchedLog:07/22/20 12:14:33 (pid:528138) job_transforms for 10738840.0: 12 considered, 10 applied (T01SysDefaultProject,T02JobDefaults,T03JobValues,T04JobEnhance,T05JobClasses,T07AccountingStatusHold,T08DefaultToOS,T10BirdResource,T11ShellEnvironment,T12JobHistory)
