Re: [Condor-users] GPU and condor?
On Jan 7, 2010, at 9:38 AM, Miron Livny wrote:
> To all GPUers out there,
> We would be very interested in hearing from you what Condor can do
> to help you in managing GPU clusters. So far we did not find much we
> can offer in this space. Any guidance you can provide will be most
We've helped customers set up policies that enable GPU scheduling.
Our approach has been to set attributes on GPU-specific jobs and
slot types, and to require that the attribute be set for a job to
match a GPU-specific slot. Condor handles the scheduling gracefully
given this setup.
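
As a rough illustration of the two-sided matching described above, here is
a minimal Python sketch. It is a toy model of the policy, not Condor's
matchmaker, and the attribute names HAS_GPU (on the slot) and RequiresGPU
(on the job) are assumptions chosen for the example; they do not come from
any actual site configuration.

```python
# Toy model of attribute-based GPU matching (not Condor's matchmaker).
# HAS_GPU and RequiresGPU are hypothetical attribute names.

def job_accepts(job, slot):
    """Job side: a GPU job only accepts slots that advertise a GPU."""
    return not job.get("RequiresGPU", False) or slot.get("HAS_GPU", False)

def slot_accepts(slot, job):
    """Slot side: a GPU slot only accepts jobs that set the GPU attribute."""
    return not slot.get("HAS_GPU", False) or job.get("RequiresGPU", False)

def matches(job, slot):
    """A match requires both sides to accept, as in two-sided matchmaking."""
    return job_accepts(job, slot) and slot_accepts(slot, job)

if __name__ == "__main__":
    gpu_job = {"RequiresGPU": True}
    cpu_job = {}
    gpu_slot = {"HAS_GPU": True}
    cpu_slot = {}
    for job_name, job in (("gpu_job", gpu_job), ("cpu_job", cpu_job)):
        for slot_name, slot in (("gpu_slot", gpu_slot), ("cpu_slot", cpu_slot)):
            print(job_name, slot_name, matches(job, slot))
```

In this sketch GPU jobs land only on GPU slots, and GPU slots stay
reserved for GPU jobs, which is the effect the pre-created policies aim
for.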
Most of the work relates to policies. It would be great to get
information about the presence of a GPU, its model, and its utilization,
but we're not aware of any standard way to do this across GPU vendors
and models. Model-specific scripts can be written to advertise this
information in the slot ads using Hawkeye/STARTD_CRON on a dedicated
cluster. Condor could help by offering concurrency limits for an
individual host (e.g. this machine has a GPU_Limit=2 because it has
only 2 GPUs), or by making dynamic slots more configurable.
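
To make the STARTD_CRON idea concrete, here is a minimal sketch of such a
probe script in Python. A probe of this kind prints "Attribute = Value"
lines on stdout for the startd to merge into the slot ad. Everything
specific here is an assumption for illustration: the attribute names
(HAS_GPU, GPU_COUNT, GPU_MODEL) are made up, and counting /dev/nvidiaN
device nodes is just one vendor-specific way to guess the GPU count on a
Linux host. The model string is not detected at all and would have to
come from a model-specific tool.

```python
#!/usr/bin/env python3
# Sketch of a vendor-specific GPU probe for use with STARTD_CRON.
# Attribute names and the detection method are assumptions, not a standard.
import glob
import re

def count_gpu_devices(pattern="/dev/nvidia[0-9]*"):
    """Count device nodes such as /dev/nvidia0 (assumes one node per GPU)."""
    return sum(1 for path in glob.glob(pattern)
               if re.search(r"nvidia\d+$", path))

def gpu_attributes(gpu_count, model="unknown"):
    """Build the hypothetical attributes to advertise in the slot ad."""
    return {"HAS_GPU": gpu_count > 0,
            "GPU_COUNT": gpu_count,
            "GPU_MODEL": model}

def format_classad(attrs):
    """Render a dict as 'Name = Value' lines, quoting string values."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, bool):      # check bool before int
            rendered = "True" if value else "False"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append(f"{name} = {rendered}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_classad(gpu_attributes(count_gpu_devices())))
```

On a host with no matching device nodes the script advertises
HAS_GPU = False and GPU_COUNT = 0, so the same probe can run on every
machine in the pool.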
Because of the difficulties with automatic detection and telemetry,
using pre-created policies seems to work well.
Ian D. Alderman
Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools