
Re: [HTCondor-users] Multi GPUs on multiple nodes



A few questions for clarification:

* Do they need more GPUs for their application than are available on a single node, i.e. more than 4 or 8?

* Are they using model or data parallelism in their training?

Out of the box, PyTorch only uses the GPUs on a single machine. For cross-node training you will need to use something like PyTorch Distributed: https://pytorch.org/tutorials/beginner/dist_overview.html
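
For reference, a minimal sketch of what the PyTorch side of that looks like, using DistributedDataParallel with the NCCL backend. The model is a placeholder, and it assumes the rendezvous environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE, LOCAL_RANK) are provided by whatever launches the processes, e.g. torchrun:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL is the usual backend for multi-GPU, multi-node training; the initial
    # rendezvous happens over TCP via MASTER_ADDR/MASTER_PORT set by the launcher.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with a DistributedSampler and run the training loop ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

You would normally start one copy of this per node with something like "torchrun --nnodes=<N> --nproc_per_node=<GPUs per node> train.py", and torchrun fills in RANK/WORLD_SIZE/LOCAL_RANK for each process.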

In HTCondor this would require the parallel universe. I am not sure what the status of that is with GPUs.
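
If someone wants to experiment, a parallel-universe submit file would presumably look roughly like the sketch below. run_ddp.sh is a hypothetical wrapper that works out MASTER_ADDR and the per-node rank before calling torchrun, and the resource numbers are made up:

universe        = parallel
executable      = run_ddp.sh        # hypothetical wrapper: sets MASTER_ADDR, then runs torchrun
machine_count   = 2                 # number of nodes
request_gpus    = 4                 # GPUs per node
request_cpus    = 8
request_memory  = 64GB
log             = ddp.log
output          = ddp.$(Node).out   # $(Node) is the per-node index in the parallel universe
error           = ddp.$(Node).err
queue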

Benedikt

On Tue, Nov 21, 2023 at 10:04 AM Dudu Handelman <duduhandelman@xxxxxxxxxxx> wrote:
Hi All,
My users are using PyTorch and are considering using multiple GPUs across multiple physical servers.
I think that PyTorch is able to do that out of the box, using TCP between the workers.

I wonder if anyone is doing that on top of HTCondor?

Thanks
David.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Benedikt Riedel
Global Computing Coordinator, IceCube Neutrino Observatory
Technical Coordinator, IceCube Neutrino Observatory
Computing Manager, Wisconsin IceCube Particle Astrophysics Center
University of Wisconsin-Madison