We have Condor installed on campus (around 3500 machines available), and I am trying to submit around 3000 jobs per instance. I have installed Condor on my office machine, which acts as the server that manages the submissions and orchestrates the whole thing. The problem is that my hard disk is not fast enough to keep track of more than 400-500 machines (I checked the disk queue length while Condor was running, and it is rather large). We also have network storage that is extremely fast. How can I store the “spool” directory, which keeps the checkpoints for every job, in my network space instead of on my local machine? I have benchmarked the network storage location and it is fast enough for the job, but I don’t know how to make my machine use it for checkpoint storage instead of the local disk.
I have seen the “checkpoint server” option, but I am not sure whether there is a simpler way to do this.