
Re: [Condor-users] GPU and condor?



Hi Tung-Han Hsieh,

Thanks, what you shared is very helpful!

I'm also interested in this topic, and I have a question.

How do you make Condor aware of the existence of the GPUs? By modifying Hawkeye?

Thanks!

Regards,

2010/1/7 Tung-Han Hsieh <tunghan.hsieh@xxxxxxxxx>:
> Hello,
>
> We have some experience building a GPU cluster using Condor.
>
> Currently we have two GPU clusters, used by different research
> groups. Each cluster is composed of the following elements:
>
> 1. Head node: Runs the Condor server; users log in here to
>               build their codes, submit jobs, etc.
>
> 2. File servers: The Lustre cluster filesystem is deployed.
>
> 3. Computing nodes: Each node has at least one and at most 4 GPUs.
>                     Each cluster has more than 64 GPUs installed.
>
> 4. Communication: One cluster has an InfiniBand network, and the
>                   other uses a Gigabit network.
>
> The Condor system can allocate multiple GPUs to users. In our
> implementation the number of CPU cores in each computing node
> is not important. So in the Condor command file, the
> "machine_count" that users specify is actually the number of
> GPUs required. And the number of GPUs in each node is hard-coded
> as "NUM_CPUS" in the local Condor config file on each node.
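
For readers following along, here is a minimal sketch of what the setup
described above might look like. It is a guess at the configuration, not
the actual files from these clusters: the GPU count of 4, the use of the
parallel universe, the file names, and the executable name are all
assumptions made for illustration.

```
# condor_config.local on one compute node (assumed to hold 4 GPUs).
# NUM_CPUS overrides the detected core count, so by default the node
# advertises one slot per GPU instead of one slot per CPU core.
NUM_CPUS = 4

# gpu_job.sub: hypothetical submit description file.
# machine_count requests slots; with the setting above each slot
# stands for one GPU, so this job asks for 2 GPUs.
universe      = parallel
executable    = my_gpu_program
machine_count = 2
output        = gpu_job.$(NODE).out
error         = gpu_job.$(NODE).err
log           = gpu_job.log
queue
```

With this scheme Condor itself only counts slots. Deciding which physical
GPU a job uses is left to site-specific code, as the rest of the quoted
message explains.
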
>
> Honestly, we are not Condor experts. Hence we also developed
> some code to help Condor do more complicated tasks, such as
> per-user quotas on the number of GPUs, GPU assignment, dead job
> cleaning, etc. But I guess all of these could be done by Condor
> itself. We just don't know how, so we took the somewhat stupid
> route of writing our own code to do them.
>
> Perhaps we can exchange experience on this subject :)
>
>
> Cheers,
>
> T.H.Hsieh
>
>
> 2010/1/7 Marian Zvada <zvada@xxxxxxxx>
>>
>> Dear Condor Folks,
>>
>> is there someone in the Condor user community who has built a GPU cluster
>> based on Condor?
>> I mean someone who has worker-node hardware with GPU graphics cards, with
>> job management done by Condor on top.
>>
>> We are very interested in this topic and would like to build such an
>> infrastructure (condor + gpu worker nodes) for researchers in our
>> organization.
>> In the first phase of this project we'd like to develop a standalone cluster:
>>
>> - master condor head node
>> - 5 gpu worker nodes (each worker node 2x nVIDIA GTX295)
>> - storage element for data
>>
>> I know there is a lot to find on Google about such experiments, but I
>> wanted to ask Condor users directly for their
>> opinions/suggestions/recommendations, since we are serious about building
>> a Condor GPU cluster and using it in production for our research activities.
>>
>> If there is someone who has done a similar setup and is willing to share the
>> knowledge, I would appreciate talking about it! Any URL hints are welcome too...
>>
>> Thanks and regards,
>> Marian
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>



-- 
Xiang Ni
Sino-German Joint Software Institute
Computer Science & Engineering Department of Beihang University
100191