Tim Blattner and I worked on discovering and advertising GPUs using Hawkeye. For Linux, we have an automated script that detects NVIDIA GPUs and whether CUDA is available, along with its version. That script is what is available on the SourceForge site. Feel free to use and improve it. (Tim has since graduated and moved on to a PhD program, so development is sitting idle right now.)
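For anyone wondering what such a detection script involves, here is a minimal sketch in Python. It is not the script on SourceForge. It assumes `nvidia-smi` (and optionally `nvcc`) is on the PATH, and the attribute names (`HAS_GPU`, `GPU_COUNT`, `GPU0_NAME`, `GPU0_MEMORY_MB`, `CUDA_VERSION`) are hypothetical, chosen only for illustration. The idea is that a Hawkeye/startd cron job prints `Attribute = Value` lines on stdout, which are then merged into the machine ClassAd.

```python
import csv
import io
import re
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory' CSV rows (no header, no units) into dicts."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or unexpected lines
        try:
            memory_mb = int(row[1])
        except ValueError:
            continue  # e.g. a field reported as not available
        gpus.append({"name": row[0].strip(), "memory_mb": memory_mb})
    return gpus


def parse_cuda_version(nvcc_output):
    """Extract 'X.Y' from 'nvcc --version' text, or None if absent."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def to_classad_lines(gpus, cuda_version=None):
    """Format the findings as 'Attribute = Value' lines (names are hypothetical)."""
    lines = [
        "HAS_GPU = %s" % ("True" if gpus else "False"),
        "GPU_COUNT = %d" % len(gpus),
    ]
    for i, gpu in enumerate(gpus):
        lines.append('GPU%d_NAME = "%s"' % (i, gpu["name"]))
        lines.append("GPU%d_MEMORY_MB = %d" % (i, gpu["memory_mb"]))
    if cuda_version is not None:
        lines.append('CUDA_VERSION = "%s"' % cuda_version)
    return lines


def run(cmd):
    """Return stdout of cmd, or '' if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    smi = run(["nvidia-smi", "--query-gpu=name,memory.total",
               "--format=csv,noheader,nounits"])
    nvcc = run(["nvcc", "--version"])
    print("\n".join(to_classad_lines(parse_gpu_csv(smi),
                                     parse_cuda_version(nvcc))))
```

Note that finding `nvcc` only shows that the CUDA toolkit is installed, so treat the reported version as a hint rather than a guarantee that CUDA jobs will run. On a machine without the NVIDIA tools the sketch simply advertises `HAS_GPU = False`.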
Not much of our work has gone into policy decisions or into identifying when jobs have control of a GPU (or GPUs).
A general solution will be tricky, given the current variety of libraries (e.g. CUDA, OpenCL, or OpenGL for older GPGPU codes) and the different kinds of GPU processors (or, more generally, other co-processors) available.
A few other comments. I had Tim investigate what happens to memory on the card when jobs complete or are preempted. It appears that GPU memory is not cleared, and since it's not controlled by the operating system the same way as host memory, it's possible to read leftover data on the card. That bothered us.
Also, it was possible to write a GPU kernel with a simple infinite loop that would prevent Condor from preempting the job, so we weren't entirely convinced GPUs were robust enough to handle an environment with ill-behaved jobs. Has anyone else run into this problem? (Current GPUs may be better than the older ones we used for testing.)
On Jan 7, 2010, at 8:59 AM, Michael O'Donnell wrote:
Our group has been considering this technology as well. I suggest taking a look at these URLs (if you have not seen them already):
The work done so far has been on Linux. Our group has only just begun using Condor and we have not spent any significant time looking into this technology, but I think there could be great potential. We use Condor to support a wide array of research that relies on statistical packages, Geographic Information Science (GIS) applications, Java, and others, so we have not yet determined how restricted we would be in using this technology. Much of our most demanding work is in GIS, and the majority of our applications depend on proprietary software. Our initial concerns therefore include investigating which types of applications and programming languages will work in this environment, as well as what is involved in getting Condor and GPUs to work in a Windows environment.
If you do make any progress, it would be great to hear back from you.
Dear Condor Folks,
Is there anyone in the Condor user community who has built a GPU cluster
based on Condor?
I mean someone who has worker nodes with GPU graphics cards, with job
management done by Condor on top.
We are very interested in this topic and would like to build such an
infrastructure (Condor + GPU worker nodes) for research people in our
In the first phase of this project we'd like to develop a standalone cluster:
- master condor head node
- 5 gpu worker nodes (each worker node 2x nVIDIA GTX295)
- storage element for data
I know there is a lot to find on Google about such experiments, but I
wanted to ask Condor users directly for their
opinions/suggestions/recommendations, since we are serious about building
a Condor GPU cluster and using it in production for our research activities.
If someone has done a similar setup and is willing to share the
knowledge, I would appreciate talking about it! Any URL hints are welcome too...
Thanks and regards,
Condor-users mailing list
Craig A. Struble, Ph.D. | 369 Cudahy Hall | Marquette University
Associate Professor of Computer Science | (414)288-3783
Director, Master of Bioinformatics Program | (414)288-5472 (fax)