
Re: [HTCondor-users] GPU memory request



You'll want something like this:

require_gpus = GlobalMemoryMb >= 2048

This requests a GPU with at least 2 GB of GPU memory. I believe gpus_minimum_memory is only in the 23.x feature branch, not in the 23.0 LTS or 10.9.
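
As a rough sketch, the require_gpus expression above could sit in a complete submit file like the one below. It reuses the echo test job from earlier in this thread; the file names and the 2048 MB threshold are illustrative, and the closing queue statement is an assumption since it was not shown in the original job:

```
universe     = vanilla
executable   = /usr/bin/echo
arguments    = hello compute
request_gpus = 1
# Only match GPUs that advertise at least 2048 MB of global memory
require_gpus = GlobalMemoryMb >= 2048
output       = h100.txt
error        = h100.err
log          = h100.log
queue
```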

-Zach

________________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Weatherby,Gerard <gweatherby@xxxxxxxx>
Sent: Monday, March 25, 2024 10:40 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] GPU memory request


This seems to work on the 23 nodes:

Universe = vanilla
gpus_minimum_memory = 1MB
request_gpus = 1
Executable   = /usr/bin/echo
Arguments = hello compute
output           = h100.txt
error            = h100.err
Log          = h100.log


However, there's a warning:

WARNING: the line 'gpus_minimum_memory = 1MB' was unused by condor_submit. Is it a typo?

Is that just a condor_submit bug?

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Date: Monday, March 25, 2024 at 12:08 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Todd L Miller <tlmiller@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] GPU memory request

> We're running a 10.9 / 23 cluster and using
>
> use feature: GPUs
>
> How does a user request a certain amount of GPU memory?

        For recent releases:

https://htcondor.readthedocs.io/en/latest/man-pages/condor_submit.html#gpus_minimum_memory

        For older releases, you'll have to write an expression:

https://htcondor.readthedocs.io/en/v10_0/man-pages/condor_submit.html#index-60

-- ToddM