
[HTCondor-users] Dynamically allocating and releasing custom resource.



 

Hello,

 

Could you please let me know whether there is a way to dynamically allocate and release a custom machine resource in Condor? If so, where can I read about it?

 

I am already doing it statically by allocating the resource in the Condor config file and requesting the resource statically per job. But now I want to do it dynamically to use the resource more efficiently.

 

For example, machine learning jobs require GPU RAM. Most of the GPU RAM is needed only during training and can be released as soon as training is complete, so that other processes can start using it to train their own models. Each job can train multiple models per run and does other calculations between trainings, when its GPU RAM usage is much smaller. So, once a model training is complete, it is desirable to notify other processes that they can allocate GPU RAM if they are ready to start training their own models.

 

Is it possible to implement this in Condor (using the Python API, in my case), or do I have to write my own code, using, for example, shared memory, to manage the resource and its current usage?
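
To make the "write my own code" option concrete, here is a rough sketch of the kind of thing that could work. It is not an HTCondor feature and uses no HTCondor API: it is a cooperative ledger kept in a file and guarded by an advisory lock (a simpler stand-in for shared memory), which jobs on the same machine would consult before training. The class name GpuMemoryLedger, the ledger file, the MiB budget, and the polling interval are all hypothetical, and it assumes a Unix host (fcntl) and that every job plays by the rules.

```python
import fcntl  # Unix-only advisory file locking
import json
import os
import time
from contextlib import contextmanager


class GpuMemoryLedger:
    """Hypothetical cooperative budget of GPU memory (MiB) shared by jobs on one machine."""

    def __init__(self, path, total_mib):
        self.path = path            # ledger file visible to all jobs on the machine
        self.total_mib = total_mib  # budget the jobs agree to stay within

    @contextmanager
    def _locked_state(self):
        # Open (creating if needed), take an exclusive lock, and hand out the state.
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o666)
        with os.fdopen(fd, "r+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            raw = f.read()
            state = json.loads(raw) if raw else {}
            yield state
            # Write the (possibly modified) state back before the lock is dropped.
            f.seek(0)
            f.truncate()
            json.dump(state, f)
            f.flush()
        # Closing the file releases the lock.

    def try_acquire(self, job_id, mib):
        """Reserve mib for job_id if it fits in the budget; return True on success."""
        with self._locked_state() as state:
            if sum(state.values()) + mib > self.total_mib:
                return False
            state[job_id] = state.get(job_id, 0) + mib
            return True

    def release(self, job_id):
        """Drop every reservation held by job_id."""
        with self._locked_state() as state:
            state.pop(job_id, None)

    @contextmanager
    def reserve(self, job_id, mib, poll_seconds=1.0):
        """Block (by polling) until mib is available, hold it, then release it."""
        while not self.try_acquire(job_id, mib):
            time.sleep(poll_seconds)
        try:
            yield
        finally:
            self.release(job_id)
```

A job would wrap each training phase in "with ledger.reserve(job_id, mib):" and do its lighter calculations outside that block. The obvious weaknesses are that nothing enforces the budget on the GPU itself, and a job that is killed while holding a reservation leaves a stale entry behind, which is part of why a scheduler-level mechanism would be preferable.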

 

Thank you for your help,

Siarhei.

 
