
Re: [HTCondor-users] /var/lib/condor/spool usage



On 4/1/2015 4:08 PM, Dimitri Maziuk wrote:
> On 04/01/2015 02:57 PM, Richard Pieri wrote:
>>
>> If you need to reserve capacity for privileged processes on ext2/3/4
>> then 'tune2fs -m X /dev/DEVICE' will reserve X% of the file system's
>> capacity for root.
>
> My disk use on / went from ~70% @ 11:50 to 100% @ 12:10 this afternoon.
> The node stayed up: ext4 reserves enough blocks for root by default to
> keep it up for some time, but condor daemons keeled over. So the issue
> is, can you tell condor to not kill itself?
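The root reserve that tune2fs -m sizes can be observed from user space. statvfs reports both f_bfree (all free blocks) and f_bavail (blocks an unprivileged process may use), and the gap between them is the space only root can still consume. The following is a minimal Python sketch, not part of HTCondor; the function name reserved_for_root is hypothetical, and it assumes a POSIX system where os.statvfs is available.

```python
import os


def reserved_for_root(path: str) -> dict:
    """Report how much free space on the file system holding `path` is root-only.

    Hypothetical helper for illustration. f_bfree counts every free block;
    f_bavail counts only the blocks unprivileged processes may use. The
    difference is the part of the root reserve that is still unused.
    """
    st = os.statvfs(path)
    frsize = st.f_frsize  # block size that the f_blocks/f_bfree/f_bavail counts use
    total = st.f_blocks * frsize
    free_root = st.f_bfree * frsize
    free_user = st.f_bavail * frsize
    reserved = free_root - free_user
    return {
        "total_bytes": total,
        "free_bytes_root": free_root,
        "free_bytes_user": free_user,
        "reserved_bytes": reserved,
        "reserved_pct": 100.0 * reserved / total if total else 0.0,
    }


if __name__ == "__main__":
    for key, value in reserved_for_root("/").items():
        print(f"{key}: {value}")
```

When free_bytes_user reaches zero while free_bytes_root is still positive, only root-privileged writers can make progress, which matches the situation described above where the node stayed up but unprivileged writers failed.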


I think the only daemons that write to SPOOL are the schedd (and its shadows) and the negotiator. The point is that the condor_master does not write to SPOOL. So if the file system holding SPOOL fills, the schedd and/or negotiator may exit, but in that case the condor_master should keep running and periodically attempt to restart the schedd and/or negotiator.
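The "master keeps running and restarts its children" behaviour can be illustrated with a toy supervisor. This is a hypothetical Python sketch, not the condor_master's actual logic: supervise() and its base_delay, factor and ceiling parameters are assumptions chosen for illustration. It simply reruns a command each time it exits, waiting a little longer before each retry.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, base_delay=1.0, factor=2.0, ceiling=60.0):
    """Run `cmd`; each time it exits, wait and start it again (hypothetical sketch).

    The wait grows geometrically (base_delay * factor**n, capped at `ceiling`),
    so a child that dies immediately, e.g. because its disk is still full,
    is retried less and less often instead of in a tight loop.
    Returns the exit status of every run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.run(cmd)
        exit_codes.append(proc.returncode)
        if attempt == max_restarts:
            break  # give up after the configured number of restarts
        delay = min(base_delay * factor ** attempt, ceiling)
        print(f"child exited with {proc.returncode}; restarting in {delay:.1f}s",
              file=sys.stderr)
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a daemon that cannot write its state and exits with status 1.
    codes = supervise([sys.executable, "-c", "import sys; sys.exit(1)"],
                      max_restarts=2, base_delay=0.1)
    print(codes)  # prints [1, 1, 1]
```

The growing delay is the design point: the supervisor itself never touches the full volume, so it survives, and it keeps probing until the child can start again.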

So now the question becomes: can the schedd and/or negotiator keep running if SPOOL fills? Well, these two daemons have persistent state that must be kept, i.e. the job queue for the schedd and the accountant information for the negotiator. Currently these daemons shut down if they cannot safely write this information (and the condor_master will periodically attempt to restart them). Are you hoping for a mode where, for instance, the schedd would keep running without logging queue information to disk (so that if the schedd restarted, all that job information would be lost)? Perhaps of interest is the JOB_QUEUE_LOG config knob, which allows you to put the job queue on a volume other than SPOOL. We use this to increase performance on submit hosts that have a solid-state drive big enough to hold the frequently written job_queue.log, but not big enough to hold the whole contents of the spool directory.
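The policy of shutting down rather than running without durable state can be sketched as an append-only log in which every record is fsync'ed before the caller proceeds. The Python below is illustrative only: append_durably(), commit_or_exit(), the one-line record format and exit status 4 are hypothetical, and the sketch makes no claim about the on-disk format HTCondor actually uses for job_queue.log.

```python
import os
import sys
import tempfile


def append_durably(path: str, record: str) -> None:
    """Append one record and force it to stable storage before returning.

    Raises OSError (for example ENOSPC when the volume is full) if the
    record could not be written and synced.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        view = memoryview((record + "\n").encode())
        while view:  # os.write may write only part of the buffer
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # do not report success until the data is on disk
    finally:
        os.close(fd)


def commit_or_exit(path: str, record: str) -> None:
    """Hypothetical policy: never keep running with state that is not on disk."""
    try:
        append_durably(path, record)
    except OSError as exc:
        print(f"cannot persist state to {path}: {exc}; exiting", file=sys.stderr)
        sys.exit(4)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "job_queue.log")  # name borrowed from the thread
        commit_or_exit(log, "hypothetical record 1")
        commit_or_exit(log, "hypothetical record 2")
        with open(log) as f:
            print(f.read(), end="")
```

Pointing `path` at a different volume is the same idea as moving the job queue log off SPOOL: only the frequently synced log needs the fast or roomy device.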

Currently the schedd creates a subdirectory in SPOOL for each running job to hold intermediate checkpoint files if the job's submit file sets when_to_transfer_output = ON_EXIT_OR_EVICT. I've long wanted an option to store these intermediate files in the user's home directory instead of SPOOL, so that the space they use comes out of that user's own disk quota...
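To find out which jobs are consuming SPOOL, a per-entry disk-usage report is often enough. This Python sketch is an illustration under stated assumptions: usage_by_entry() and tree_size() are hypothetical helpers, each top-level entry of the spool directory is treated as one unit, and no particular naming scheme for the per-job subdirectories is assumed (you may need to run it one level deeper). The default path is taken from this thread's subject; `condor_config_val SPOOL` prints the location actually configured on a host.

```python
import os
import sys


def tree_size(root: str) -> int:
    """Sum the sizes of all regular files under `root` (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                pass  # file vanished during the walk; jobs come and go
    return total


def usage_by_entry(spool: str) -> list:
    """Return (bytes, name) for each top-level entry of `spool`, largest first."""
    results = []
    with os.scandir(spool) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = tree_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            results.append((size, entry.name))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    spool = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/condor/spool"
    try:
        report = usage_by_entry(spool)
    except OSError as exc:
        print(f"cannot scan {spool}: {exc}", file=sys.stderr)
    else:
        for size, name in report[:20]:
            print(f"{size / 2**20:10.1f} MiB  {name}")
```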

regards,
Todd



_______________________________________________
HTCondor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685