
Re: [Condor-users] GPU and condor?



Dear Miron,

Here is a suggestion that may help integrate GPUs, or even new kinds of
computing hardware we have not seen yet, into Condor. I am sorry that I am
not a Condor expert, so what I describe may already have a solution.
Please correct me if that is the case.

On a computing node, is there a way for Condor to run a general shell
script as root when Condor starts up, to inspect the system and then
export some new attributes in the hardware ClassAd?

Taking GPUs as an example, people have asked how to detect the presence
of GPUs in a node. Actually, the easiest way is to run the "deviceQuery"
program provided by the CUDA SDK as root, which reports complete
information about the installed GPUs. Then, with a simple script, we
can parse its output and determine the new attributes to export to Condor.

New hardware will most likely need new library support, just as GPUs
from Nvidia need CUDA, so it is not easy to build everything into
Condor. For us, the easiest way is to have a custom script do the
detection, and then do the run-time configuration.
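
As a rough illustration of the parsing step described above, here is a
minimal Python sketch. It assumes that deviceQuery prints one line per GPU
of the form Device N: "name"; the real output differs between CUDA SDK
versions, so the sample text and the pattern are assumptions. The attribute
names GPU_COUNT and GPU_<n>_NAME are hypothetical and are not attributes
that Condor defines.

```python
import re

# Hypothetical sample of deviceQuery output (assumption: the real format
# varies between CUDA SDK versions and should be checked on the node).
SAMPLE_OUTPUT = '''\
Device 0: "GeForce GTX 295"
  Total amount of global memory:                 939261952 bytes
Device 1: "GeForce GTX 295"
  Total amount of global memory:                 939261952 bytes
'''

# Matches lines such as: Device 0: "GeForce GTX 295"
DEVICE_LINE = re.compile(r'^Device\s+(\d+):\s+"([^"]+)"', re.MULTILINE)


def parse_device_query(text):
    """Return a list of (index, name) tuples, one per GPU found in text."""
    return [(int(idx), name) for idx, name in DEVICE_LINE.findall(text)]


def to_classad_lines(devices):
    """Format the detected GPUs as 'Attribute = value' lines.

    The attribute names are hypothetical, chosen for this sketch only.
    """
    lines = ["GPU_COUNT = %d" % len(devices)]
    for idx, name in devices:
        lines.append('GPU_%d_NAME = "%s"' % (idx, name))
    return lines


if __name__ == "__main__":
    # On a real node the text would come from running deviceQuery; the
    # embedded sample keeps this sketch self-contained.
    for line in to_classad_lines(parse_device_query(SAMPLE_OUTPUT)):
        print(line)
```

The script only prints attribute lines. How they reach the machine ClassAd
(for example through a periodic job run by the startd) is left open here,
since that is exactly the question being asked.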

Cheers,

T.H.Hsieh


2010/1/7 Miron Livny <miron@xxxxxxxxxxx>
To all GPUers out there,

We would be very interested in hearing from you what Condor can do to help you in managing GPU clusters. So far we did not find much we can offer in this space. Any guidance you can provide will be most welcomed.

Miron




Xiang Ni wrote:
Hi Tung-Han Hsieh,

Thanks, your sharing is very helpful!

I'm also interested in this topic and I have a question.

How do you make Condor aware of the existence of GPUs? By modifying Hawkeye?

Thanks!

Regards,

2010/1/7 Tung-Han Hsieh <tunghan.hsieh@xxxxxxxxx>:
Hello,

We have some experience building GPU clusters with Condor.

Currently we have two GPU clusters, used by different research
groups. Each cluster is composed of the following elements:

1. Head node: Runs the Condor server; users log in here to build
              their codes, submit jobs, etc.

2. File servers: The Lustre cluster filesystem is deployed.

3. Computing nodes: Each node has at least one and at most 4 GPUs.
                    Each cluster has more than 64 GPUs installed.

4. Communication: One cluster has an InfiniBand network, and the other
                  uses a Gigabit network.

Condor can allocate multiple GPUs to users. In our implementation
the number of CPU cores in each computing node is not important.
So in the Condor command file, the "machine_count" that users specify
is actually the number of GPUs required, and the number of GPUs in
each node is hard-coded as "NUM_CPUS" in the local Condor config
file on each node.
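
To make the mapping above concrete, here is a small Python sketch that
renders the two pieces of text involved: the local config line that
advertises one slot per GPU, and a command file that requests GPUs through
"machine_count". The function names are hypothetical, and the use of the
parallel universe is an assumption; the actual command files used on these
clusters are not shown in this thread.

```python
def render_local_config(gpus_in_node):
    """Return the local config line that advertises one slot per GPU."""
    if gpus_in_node < 1:
        raise ValueError("gpus_in_node must be at least 1")
    return "NUM_CPUS = %d\n" % gpus_in_node


def render_submit_description(executable, num_gpus):
    """Return a command file that requests num_gpus slots (one per GPU)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    lines = [
        # Assumption: machine_count is used with the parallel universe.
        "universe = parallel",
        "executable = %s" % executable,
        # In this setup one slot stands for one GPU, not one CPU core.
        "machine_count = %d" % num_gpus,
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_local_config(4), end="")
    # "my_gpu_job" is a placeholder executable name.
    print(render_submit_description("my_gpu_job", 2), end="")
```

With this convention a job asking for 2 GPUs simply sets machine_count to 2,
and a node with 4 GPUs advertises 4 slots regardless of how many CPU cores
it has.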

Honestly, we are not Condor experts. Hence we also developed some
code to help Condor with more complicated tasks, such as per-user
quotas on the number of GPUs, GPU assignment, dead job cleanup, etc.
But I guess all of these could be done by Condor itself. We just
don't know how, so we tried the somewhat stupid way of writing code
to do them.

Perhaps we can share our experiences on this subject :)


Cheers,

T.H.Hsieh


2010/1/7 Marian Zvada <zvada@xxxxxxxx>
Dear Condor Folks,

is there someone in the Condor user community who has built a GPU cluster
based on Condor?
I mean someone who has worker-node hardware with GPU graphics cards, with
job management done by Condor on top.

We are very interested in this topic and would like to build such an
infrastructure (Condor + GPU worker nodes) for researchers in our
organization.
In the first phase of this project we'd like to develop a standalone cluster:

- master condor head node
- 5 gpu worker nodes (each worker node 2x nVIDIA GTX295)
- storage element for data

I know there is a lot to find on Google about such experiments, but I
wanted to ask Condor users directly for their
opinions/suggestions/recommendations, since we are serious about building
a Condor GPU cluster and using it in production for our research activities.

If someone has done a similar setup and is willing to share the
knowledge, I would appreciate talking about it! Any URL hints are welcome too...

Thanks and regards,
Marian
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
