Dear Condor users,
We are looking to set up HTCondor to manage computation on a few GPU servers, with a separate machine acting as the central manager. Each of these servers has substantial local storage (~30 TB), and we would like to make it available to users
in a seamless fashion. That is, we would like all the local storage to be accessible across all servers as a single entity, so that programs can access it from any server and users do not have to keep track of or create separate copies of
data on each machine. This likely requires a distributed file system such as AFS, but we would like to understand how to set up Condor to work with one and what the potential pros and cons are (e.g., latency and load on the network). We found some places in Condor's
documentation that talk about this, but we would appreciate any direct feedback or pointers on performance and on how to get started before we go deeper into the documentation.
Thank you very much!
Best,