Re: [Condor-users] GPU and condor?

On Jan 7, 2010, at 9:38 AM, Miron Livny wrote:

To all GPUers out there,

We would be very interested in hearing from you what Condor can do to help you in managing GPU clusters. So far we have not found much we can offer in this space. Any guidance you can provide will be most welcome.



We've done work helping customers set up policies that enable GPU scheduling. Our approach has been to set a custom attribute on GPU-specific jobs and slot types, and to require that the attribute be set before a job can match a GPU-specific slot. Condor handles the scheduling gracefully given this setup.

Most of the work relates to policies. It would be great to get information about the presence of a GPU, its model, and its utilization, but we're not aware of any standard way to do this across GPU vendors and models. For a dedicated cluster, model-specific scripts can be written to advertise this information in the slot ads using Hawkeye/STARTD_CRON. Condor could help by offering concurrency limits for an individual host (e.g. this machine has a GPU_Limit=2 because it has only 2 GPUs), or by making dynamic slots more configurable.
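
As a rough illustration of the kind of model-specific script meant here, the sketch below prints "Attribute = Value" lines on stdout, which is the form a Hawkeye/STARTD_CRON job uses to add attributes to the slot ad. Everything named in it is an assumption made for the example: the attribute names (HasGPU, GPUCount, GPU0_Model, ...), the probe_gpus() helper, and the sample values it returns. A real script would replace probe_gpus() with whatever query the vendor's tools support for that GPU model.

```python
import sys


def classad_value(value):
    """Render a Python value as a ClassAd literal (boolean, number, or quoted string)."""
    if isinstance(value, bool):
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings are double-quoted; escape backslashes and embedded quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def gpu_ad_lines(gpus):
    """Build the attribute lines to advertise for a list of GPU descriptions.

    Each entry in `gpus` is a dict with hypothetical keys "model" and
    "utilization" (percent). The attribute names are illustrative only.
    """
    lines = [
        "HasGPU = " + classad_value(len(gpus) > 0),
        "GPUCount = " + classad_value(len(gpus)),
    ]
    for index, gpu in enumerate(gpus):
        lines.append("GPU%d_Model = %s" % (index, classad_value(gpu["model"])))
        lines.append(
            "GPU%d_UtilizationPercent = %s"
            % (index, classad_value(gpu["utilization"]))
        )
    return lines


def probe_gpus():
    """Hypothetical placeholder: return hard-coded sample data.

    Replace this with a vendor- and model-specific query on a real host.
    """
    return [
        {"model": "ExampleGPU", "utilization": 0},
        {"model": "ExampleGPU", "utilization": 35},
    ]


if __name__ == "__main__":
    sys.stdout.write("\n".join(gpu_ad_lines(probe_gpus())) + "\n")
```

With attributes like these in the slot ad, a job's requirements could refer to them in the same way as the hand-set attributes described above. The script only reports what the probe returns, so it does nothing about the lack of a common detection method across vendors.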

Because of the difficulties with automatic detection and telemetry, using pre-created policies seems to work well.


