[HTCondor-users] Sharing data across nodes



Dear Condor users,

We are looking to set up HTCondor to manage computation on a few GPU servers, with a separate machine acting as the central manager. Each of these servers has substantial local storage (~30 TB), and we would like to make it available to users in a seamless fashion. That is, we would like all the local storage to be accessible across all servers as a single entity, so that programs can access it from any server and users do not have to know about or create separate copies of data on each machine. This likely requires a distributed file system such as AWS. We would like to understand how to set up Condor to work in such a way, and the potential pros and cons (latency and burden on the network). We found some places in Condor's documentation that talk about this, but we would appreciate any direct feedback or pointers on its performance and how to get started before going deeper into the documentation.

Thank you very much!
Best