Re: [Condor-users] GPU and condor?
- Date: Thu, 7 Jan 2010 23:20:28 +0800
- From: Xiang Ni <nixiang.nn@xxxxxxxxx>
- Subject: Re: [Condor-users] GPU and condor?
Hi Tung-Han Hsieh,
Thanks, your sharing is very helpful!
I'm also interested in this topic and I have a question.
How do you make Condor aware of the existence of GPUs? By modifying Hawkeye?
2010/1/7 Tung-Han Hsieh <tunghan.hsieh@xxxxxxxxx>:
> We have some experience building a GPU cluster using
> Currently we have two GPU clusters, used for different
> groups. Each cluster is composed of the following
> 1. Head node: runs the condor server, for users to log in to
> their codes, submit jobs,
> 2. File servers: The Lustre Cluster filesystems are
> 3. Computing nodes: Each node has at least one, at most 4
> Each cluster has more than 64 GPUs
> 4. Communication: one has infiniband network, and the other
> The condor system can allocate multiple GPUs for users. In
> implementation the number of CPU cores in each computing
> is not important. So in the condor command file, users
> "machine_count" actually specifies the number of GPUs
> And the number of GPUs in each node is hard coded as
> "NUM_CPUS" in the local condor config file in each
> Honestly, we are not condor experts. Hence we also
> some codes to help condor do more complicated tasks, such
> user quota for number of GPUs, GPU assignment, dead job cleaning,
> But I guess all of these could be done by condor itself. We
> don't know how to do it, so we tried the somewhat stupid way of writing
> to do
> Probably we can communicate the experience about this subject
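The scheme described in the quoted message (NUM_CPUS set to the number of GPUs on each node, and machine_count in the submit file used as the number of GPUs requested) might be sketched roughly as below. This is only an illustration, not the poster's actual setup: the helper names `local_config` and `submit_description`, the executable name, and the choice of the parallel universe are assumptions that do not appear in the original message.

```python
def local_config(num_gpus: int) -> str:
    """Return a local config fragment that advertises one slot per GPU.

    Assumption: the node's GPU count is written by hand into NUM_CPUS,
    as the quoted message describes ("hard coded").
    """
    if num_gpus < 1:
        raise ValueError("a GPU node needs at least one GPU")
    return f"NUM_CPUS = {num_gpus}\n"


def submit_description(executable: str, gpus_wanted: int) -> str:
    """Return a submit description in which machine_count stands for GPUs.

    Assumption: the parallel universe is used so that machine_count
    applies; the original message does not say which universe was used.
    """
    if gpus_wanted < 1:
        raise ValueError("request at least one GPU")
    lines = [
        "universe = parallel",
        f"executable = {executable}",
        f"machine_count = {gpus_wanted}",  # read here as "number of GPUs"
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical node with 4 GPUs and a job asking for 8 GPUs in total.
    print(local_config(4))
    print(submit_description("my_gpu_job", 8))
```

Because each slot then stands for one GPU rather than one CPU core, the scheduler's normal slot counting does the GPU accounting; deciding which physical GPU a job actually uses would still need extra logic such as the helper codes mentioned in the message.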
> 2010/1/7 Marian Zvada <zvada@xxxxxxxx>
>> Dear Condor Folks,
>> is there someone in the Condor user community who has built a GPU cluster
>> based on condor?
>> I mean someone who has worker node hardware with GPU graphics cards, with job
>> management done by condor on top.
>> We are very interested in this topic and would like to build such an
>> infrastructure (condor + gpu worker nodes) for research people in our
>> In the first phase of this project we'd like to develop a standalone cluster:
>> - master condor head node
>> - 5 gpu worker nodes (each worker node 2x nVIDIA GTX295)
>> - storage element for data
>> I know there is a lot to see on google about such experiments, but I
>> wanted to ask condor users directly for their
>> opinions/suggestions/recommendations, since we are serious about building a
>> condor gpu cluster and using it in production for our research activities.
>> If there is someone who has done a similar setup and is willing to share the
>> knowledge, I'd appreciate talking about it! Any url hints are welcome too...
>> Thanks and regards,
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> The archives can be found at:
Sino-German Joint Software Institute
Computer Science & Engineering Department of Beihang University