Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] duplicate jobIDs in the condor_history

Date: Wed, 24 Nov 2010 23:40:51 +0000
From: Santanu Das <santanu@xxxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] duplicate jobIDs in the condor_history

Hi Todd, Ian,

I didn't delete it, but now I think I may have messed it up. I upgraded to v7.4.4 on the 18th and got a totally broken schedd, which was crashing on every start. During my trial-and-error session, I renamed the current spool directory and reconfigured Condor. I still have the old spool directory (with the job_queue.log file intact). Is there anything that can be done to stop getting duplicate job IDs?


Cheers,
Santanu

Ian Chesal wrote:
So the first question is:
Did you delete the $(SPOOL) directory for the scheduler, the contents of that directory, or the job_queue.log files? If so, you reset the cluster ID counter, and that's why you've got duplicates.
If you're certain you haven't wiped the job_queue.log file for the scheduler, is it possible you have multiple schedulers writing to the same history file? If so: that's bad.
Or perhaps you have multiple schedds writing to the same job_queue.log file? That would also be really bad.
> Each scheduler should have its own history file.
I would state a superset of the above: each schedd should have its own private log and spool subdirectory.
In any event, I think you can reset the next job ID Condor assigns by shutting down your schedd (condor_off -schedd) and appending the following to the end of the spool/job_queue.log file:
  105
  103 0.0 NextClusterNum xxxxx
  106
where xxxxx is the next job cluster ID you want to be assigned. Then turn your schedd back on (condor_on -schedd). Note I haven't tried this formula, so buyer beware. And if you haven't fixed the underlying problem that caused the job IDs to be reused, it may happen again...
Hope the above helps
Todd
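
For anyone wanting to script the job_queue.log append described above, here is a minimal Python sketch. It is not from the original message and is as untested against a real schedd as the formula itself. The helper names (next_cluster_records, append_next_cluster) and the ".bak" backup are hypothetical additions, and it assumes the indentation of the three lines in the email is just mail formatting, so the records are written without leading spaces.

```python
from pathlib import Path


def next_cluster_records(next_cluster_id: int) -> str:
    """Build the three lines quoted in the message, with xxxxx filled in.

    Assumption: the leading spaces in the email are formatting only.
    """
    if next_cluster_id <= 0:
        raise ValueError("next_cluster_id must be a positive integer")
    return f"105\n103 0.0 NextClusterNum {next_cluster_id}\n106\n"


def append_next_cluster(job_queue_log: Path, next_cluster_id: int) -> Path:
    """Back up job_queue.log, then append the records to its end.

    Only meant to be run while the schedd is shut down.
    Returns the path of the backup copy (a hypothetical safety step).
    """
    existing = job_queue_log.read_bytes()

    # Keep an untouched copy next to the original before modifying it.
    backup = job_queue_log.with_name(job_queue_log.name + ".bak")
    backup.write_bytes(existing)

    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if not existing or existing.endswith(b"\n") else "\n"
    with job_queue_log.open("a", newline="\n") as fh:
        fh.write(prefix + next_cluster_records(next_cluster_id))
    return backup
```

Following the steps in the message, this would be run between condor_off -schedd and condor_on -schedd, for example append_next_cluster(Path("spool/job_queue.log"), 5000), where 5000 stands in for a cluster ID higher than any already in your history.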


