
[Condor-users] Checkpoint server housekeeping



I'm working on cleaning up leftover crud in checkpoint server spool directories, and I see in the manual

http://www.cs.wisc.edu/condor/manual/v7.0/3_8Checkpoint_Server.html#SECTION00482000000000000000

a mention of sbin/condor_cleanckpts, which doesn't seem to exist in any of my installations.
Maybe just a documentation update is needed.


src/condor_ckpt_server/WISDOM suggests housekeeping checkpoint servers with
	find *.*.*.*/* -atime +${time} -exec ls -l {} \; -exec rm {} \;
and crossing one's fingers.
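
A more cautious variant of that command is to list the candidates first and delete only after reviewing the list. The following is a minimal sketch, not anything shipped with Condor: the function name list_stale_ckpts and its two arguments (spool directory, age in days) are made up for illustration, and it assumes the same one-directory-per-IP-address layout that the WISDOM command relies on.

```shell
#!/bin/sh
# Dry-run sketch: print checkpoint files whose last access is more than
# DAYS days old, without deleting anything.
# Hypothetical helper; the name and arguments are not part of Condor.
list_stale_ckpts() {
    spool=$1   # top of the checkpoint server spool (assumed layout)
    days=$2    # age threshold in days, as used by find -atime
    # Same glob as the WISDOM command: one top-level directory per
    # dotted-quad address. -type f skips the directories themselves.
    ( cd "$spool" && find *.*.*.*/* -type f -atime +"$days" -print )
}
```

Something like `list_stale_ckpts /path/to/spool 30 > candidates.txt` gives a list to inspect before running the destructive version. Note that -atime is only meaningful if the filesystem actually records access times; on a noatime mount it will not reflect recent reads.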

It seems to me that one could probably parse the job ID out of each checkpoint file name and query whether that job still exists; if it doesn't, one can be fairly sure the file is safe to remove. Before I write such a tool, does anybody else have any wisdom to share about housekeeping checkpoint servers?
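
A rough sketch of what such a tool might look like follows. It rests on two assumptions that should be checked against a real spool first: that checkpoint file names have the form cluster<N>.proc<M>.subproc<K>, and that asking the local schedd with condor_q is enough to decide whether a job still exists. The function names and the CONDOR_Q override variable are invented for this example. It only reports files; it never deletes.

```shell
#!/bin/sh
# ASSUMPTION: checkpoint files are named cluster<N>.proc<M>.subproc<K>
# (optionally with a suffix). Verify against your own spool.
ckpt_job_id() {
    # Print "<cluster>.<proc>" for a matching file name, nothing otherwise.
    basename "$1" | sed -n \
        's/^cluster\([0-9][0-9]*\)\.proc\([0-9][0-9]*\)\.subproc[0-9][0-9]*.*$/\1.\2/p'
}

job_in_queue() {
    # Succeeds if condor_q prints a ClusterId for job "$1".
    # CONDOR_Q is a hypothetical override so the command can be swapped out.
    # This asks the local schedd only; jobs submitted from other machines
    # would need their own schedd queried.
    # If the query itself exits non-zero, treat the job as present so that
    # an unreachable schedd never marks files as orphans.
    out=$(${CONDOR_Q:-condor_q} -format '%d\n' ClusterId "$1" 2>/dev/null) || return 0
    [ -n "$out" ]
}

report_orphans() {
    # Read file names on stdin; print those whose job is no longer queued.
    # Names that do not match the assumed pattern are skipped, not reported.
    while IFS= read -r f; do
        id=$(ckpt_job_id "$f")
        [ -n "$id" ] || continue
        job_in_queue "$id" || printf '%s\n' "$f"
    done
}
```

Feeding it the output of a find over the spool (for example `find . -type f | report_orphans`) would produce a candidate list for review. Combining it with an age threshold would still be wise, since a job that has left the queue may have only just finished.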

-Preston


--
Preston Smith  <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University