
Re: [HTCondor-users] Sharing data across nodes

I think you could benefit from a distributed cache on those servers.
But first, let me ask: where is the data currently stored?
Will the application be read-only?



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Krishna Nanda <krishnaanandk@xxxxxxxxx>
Sent: Tuesday, May 17, 2022 9:55:00 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Sharing data across nodes
Dear Condor users,

We are looking to set up HTCondor to manage computation on a few GPU servers, with a separate machine acting as the central manager. Each of these servers has substantial local storage (~30 TB), and we would like to make it available to users in a seamless fashion. That is, we would like all the local storage to be accessible across all servers as a single entity, so that programs can access it from any server and users do not have to know about, or create, separate copies of the data on each machine. This likely requires a distributed file system such as AWS. But we would like to understand how to set up Condor to work in this way, and what the potential pros and cons are (latency and load on the network). We found some places in Condor's documentation that talk about this, but we would appreciate any direct feedback or pointers on its performance and on how to get started before going deeper into the documentation.

Thank you very much!