
Re: [Condor-users] Duplicate ClusterIds in condor_history



Dan,

Yes, it was restarted right when it started reusing ClusterIds.

I see the file...  I stopped condor, edited job_queue.log, and fixed a
line like

103 0.0 NextClusterNum 4833

replacing 4833 with what the next ClusterId *should* be. That seems to
have done it: I submitted a test job and numbering is back where it
ought to be.
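
For anyone who would rather script that edit than do it by hand, here
is a minimal sketch in Python. It only assumes the line format quoted
above ("103 0.0 NextClusterNum <integer>"); the function names are
hypothetical, and the choice to read and rewrite the *last* matching
line is an assumption about how the log is replayed, not documented
behavior.

```python
import re

# Matches the line form quoted above: "103 0.0 NextClusterNum <integer>".
NEXT_CLUSTER = re.compile(r"^(103 0\.0 NextClusterNum )(\d+)\s*$")


def current_next_cluster_num(log_text):
    """Return the value on the last NextClusterNum line, or None if absent."""
    value = None
    for line in log_text.splitlines():
        m = NEXT_CLUSTER.match(line)
        if m:
            # Assumption: a later entry supersedes an earlier one.
            value = int(m.group(2))
    return value


def set_next_cluster_num(log_text, new_value):
    """Return log_text with the last NextClusterNum line set to new_value."""
    lines = log_text.splitlines(keepends=True)
    for i in range(len(lines) - 1, -1, -1):
        m = NEXT_CLUSTER.match(lines[i])
        if m:
            # Keep the original line ending (LF or CRLF) untouched.
            ending = lines[i][len(lines[i].rstrip("\r\n")):]
            lines[i] = f"{m.group(1)}{new_value}{ending}"
            return "".join(lines)
    raise ValueError("no NextClusterNum line found")
```

As in the manual fix, this should only be run with Condor stopped and
after making a backup copy of job_queue.log: read the file into a
string, pass it through set_next_cluster_num() with a value above the
highest ClusterId already used, and write the result back.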

Thanks,

-Preston

On May 3, 2006, at 5:32 PM, Dan Bradley wrote:

Preston,

The schedd state is contained in $SPOOL/job_queue.log. Did your schedd
restart around the time that this happened?

--Dan

On May 3, 2006, at 3:53 PM, Preston Smith wrote:

I was just checking some condor_history output, and noticed I'd gotten
two jobs returned with the same ClusterId. Looking further, it seems
that Condor on this particular submission host decided to jump backwards
5000-some ClusterIds and is now merrily reusing them.

Where does Condor decide what the next ClusterId is? I did recently see
some filesystem strangeness on ~condor about when the jump-back
occurred, so I wonder if some state file's been corrupted? Any ideas on
how to get it back on track to minimize the duplicates?

-Preston

--
Preston Smith  <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University



_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
