I am trying to run thousands of jobs on a large Condor grid that shares a single network storage device. We noticed that as the number of jobs increases, the system's performance drops. We discovered that the network drive to which Condor copies the files was overwhelmed by the number of simultaneous connections, and when the device was busy the job was dropped and restarted somewhere else (we are using the vanilla universe under Windows 7).
I am trying to call robocopy from my Fortran simulation (compiled to an .exe) that needs to run on Condor, so that the files are sent to the storage space through a system call instead. However, this does not appear to work on the Condor nodes. I ran various checks and it works fine on physical machines.
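For reference, here is a minimal sketch of the kind of system call I mean. It is not my actual code: the source directory, the UNC share name and the retry settings are placeholders, and it assumes a compiler that supports the Fortran 2008 intrinsic execute_command_line. Printing the exit code should at least show whether robocopy starts at all on a node; as far as I know, robocopy exit codes of 8 or higher mean that at least one copy failed.

```fortran
program copy_results
  implicit none
  ! Hypothetical paths: replace with the real output directory and network share.
  character(len=*), parameter :: src = 'output'
  character(len=*), parameter :: dst = '\\storage\share\results'
  character(len=512) :: cmd
  integer :: exit_code, cmd_status

  exit_code  = 0
  cmd_status = 0

  ! /R:5 retries a failed copy up to five times, /W:30 waits 30 seconds between retries.
  cmd = 'robocopy "' // src // '" "' // dst // '" *.* /R:5 /W:30'

  ! Run robocopy and wait for it to finish before continuing.
  call execute_command_line(trim(cmd), wait=.true., exitstat=exit_code, cmdstat=cmd_status)

  if (cmd_status /= 0) then
     ! The command itself could not be executed.
     print *, 'could not start robocopy, cmdstat = ', cmd_status
  else if (exit_code >= 8) then
     ! Robocopy ran but reported at least one failed copy.
     print *, 'robocopy reported a failure, exit code = ', exit_code
  else
     print *, 'robocopy finished, exit code = ', exit_code
  end if
end program copy_results
```

Running the same executable on a physical machine and on a Condor node and comparing the two printed codes might help narrow down where the copy breaks.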