Re: [Condor-users] duplicate jobIDs in the condor_history



Ian Chesal wrote:
So the first question is:

Did you delete the $(SPOOL) directory for the scheduler, the contents of that directory, or the job_queue.log file? If so, you reset the cluster ID counter and that's why you've got duplicates.

If you're certain you haven't wiped the job_queue.log file for the scheduler, is it possible you have multiple schedulers writing to the same history file? If so: that's bad.

Or perhaps you have multiple schedds writing to the same job_queue.log file?? That would also be really bad.

Each scheduler should have its own history file.


I would state a superset of the above: each schedd should have its own private log and spool subdirectories.
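
If you want to double check what each submit machine is actually using, condor_config_val will show the configured locations. Run these on each machine running a schedd; if two schedds report the same path, that's the sharing problem described above:

  condor_config_val SPOOL
  condor_config_val HISTORY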

In any event, I think you can reset the next job id Condor assigns by shutting down your schedd (condor_off -schedd) and appending the following to the end of the spool/job_queue.log file:
  105
  103 0.0 NextClusterNum xxxxx
  106
where xxxxx is the next job cluster id you want to be assigned. Then turn your schedd back on (condor_on -schedd). Note I haven't tried this formula, so buyer beware. And if you haven't fixed the underlying problem that caused the job ids to get reused, it may happen again...
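
For what it's worth, the whole sequence would look something like this (again untested; the spool path and the 50000 are just placeholders, so substitute whatever condor_config_val SPOOL reports on your machine and the cluster number you actually want):

  # stop the schedd so nothing else is writing to the queue log
  condor_off -schedd

  # append one transaction that sets NextClusterNum on ad 0.0, as in the snippet above
  echo "105"                          >> /var/lib/condor/spool/job_queue.log
  echo "103 0.0 NextClusterNum 50000" >> /var/lib/condor/spool/job_queue.log
  echo "106"                          >> /var/lib/condor/spool/job_queue.log

  # bring the schedd back up
  condor_on -schedd

It probably wouldn't hurt to copy job_queue.log somewhere safe first, in case the schedd doesn't like the edited log and you need to roll back.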

Hope the above helps
Todd