
Re: [HTCondor-users] Sharing data across nodes



Hi Krishna,

A distributed filesystem such as Ceph (www.ceph.io) or HDFS (https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) may provide what you need.
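Once such a filesystem is mounted at the same path on every execute node, you can tell HTCondor to rely on it instead of its file-transfer mechanism. A minimal submit-file sketch (the mount point, executable path, and the Ceph choice here are assumptions for illustration, not from your setup):

```
# Sketch: assumes every execute node mounts the shared filesystem
# at /mnt/shared (hypothetical path, e.g. a CephFS mount).
universe              = vanilla
executable            = /mnt/shared/bin/train.sh
should_transfer_files = NO        # rely on the shared filesystem
request_gpus          = 1
log                   = job.log
output                = job.out
error                 = job.err
queue
```

HTCondor decides whether two machines share a filesystem via the FILESYSTEM_DOMAIN configuration setting; giving all execute nodes the same value in their condor_config is what lets jobs with should_transfer_files = NO match and run there.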

--Mike

On 5/17/22 13:55, Krishna Nanda wrote:
Dear Condor users,

We are looking to set up HTCondor to manage computation on a few GPU
servers, with a separate machine acting as the central manager. Each of
these server machines has substantial local storage (~30 TB), and we would
like to make it available to users in a seamless fashion. That is, we would
like all the local storage to be accessible across all servers as a single
entity, so that programs can access data seamlessly across servers and
users do not have to track or create separate copies of data on each
machine. This likely requires a distributed file system (along the lines of
what cloud providers such as AWS offer). We would like to understand how to
set up Condor to work in such a way, and the potential pros and cons
(latency and burden on the network). We found some places in Condor's
documentation that talk about this, but we would appreciate any direct
feedback/pointers on its performance and how to get started before going
deeper into the documentation.

Thank you very much!
Best


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/