
Re: [Condor-users] GPU and condor?



Hello,

For a public reference, please refer to:

http://xxx.lanl.gov/abs/0911.5029

Our first GPU cluster was developed for the TWQCD Collaboration, to do Lattice QCD
research. Another GPU cluster was developed about 2 weeks ago for the CQSE research
group:

http://cqse.ntu.edu.tw/

However, those pages do not describe the hardware setup in much detail, so I will
summarize it here.

1. Our computing nodes are all Intel based, Core i7 or Xeon E5530 (or higher). We
    do not have much experience with AMD CPUs, but with the Intel compilers and the
    current Intel CPUs we found that the performance is quite good.

2. Each computing node has an S1070, GTX-280, or GTX-285 installed. The GTX-280
    and GTX-285 cards are installed in Core i7 based PCs in 4U cases, with two GPUs
    per node. The power supply should be at least 750 Watt, and each node has 12GB
    of memory. In our tests this configuration gives the optimal throughput vs.
    hardware cost for our applications.

3. The S1070 units are connected to Supermicro 2U Twin servers (you can search for
    them on Google). Each 2U chassis contains 4 computing nodes. Each node has dual
    Xeon E5530 CPUs, 24GB of memory, on-board InfiniBand QDR x4, and a PCI-E gen2
    interface controlling half of an S1070 (i.e., each computing node again has
    2 GPUs). Therefore one Supermicro 2U Twin can control 2 units of S1070, for a
    total of 8 GPUs in 4U of rack space. This more advanced installation is intended
    for the larger-scale applications we are going to run.

4. We do not have to transfer any data while jobs are running, because the whole cluster
    shares a common storage space. We installed the Lustre cluster filesystem (please
    search on Google) on the file servers. The hardware configuration here allows many
    choices. For example, we have tried at least:

    a. One large disk array (SCSI to SATA) with 16TB, connected to two I/O servers.

    b. Two powerful I/O servers, each with an 8TB built-in SATA disk array.

    In either case, the storage devices of the I/O servers are united into a single
    logical storage device (with the combined capacity, of course) which is shared by
    the whole cluster. Every node sees the identical filesystem, and data I/O is
    performed in parallel on both I/O nodes. The configuration is expandable: if needed,
    we can add more I/O nodes to scale up both the storage space and the performance.

5. There are two additional servers. The first one is for login, the condor central
    control, and the gateway of the whole cluster; in short, it is the central control
    plus the user interface. The second one is actually part of the Lustre filesystem:
    it is the file meta-data management server. The Lustre manual suggests that the
    proper configuration is to separate the real data storage (described above) and
    the meta-data management onto different servers.

6. Our application currently uses a single GPU, with multi-precision and a data size of
    about 2GB, running almost entirely inside the GPU. Please refer to our paper for
    more details.

7. We have implemented a simple scheme to assign one of the free GPUs in a computing
    node to a newly submitted job. We started our project about one and a half years
    ago, and at that time CUDA did not have advanced features such as the exclusive
    mode (which allows only one code to run on a GPU) available today. We found that
    if 2 codes run on the same GPU, both results go crazy, especially on the GTX-280
    or GTX-285. So we had to implement the GPU assignment ourselves.

    Our method is very simple. We wrote a daemon (in perl) that runs on each computing
    node. When a job is submitted and condor finds that there are free GPUs on one
    of the computing nodes (actually, condor thinks that there are free CPUs, because
    we present the GPUs as CPUs to condor), it sends the job to that node. However,
    that node has several GPUs, and assigning a GPU to the job is the responsibility
    of that node. So the job has to run a command (also written by us) to query a
    GPU_ID from the daemon. The daemon internally keeps a record of which GPU is
    assigned to which condor job, and responds with a GPU_ID. Back in the job, the
    CUDA code picks up that GPU_ID, selects that device, and finally starts running.
    (A sketch of this query protocol is given after this list.)

    Therefore, our daemon is not tightly bound to condor; it can easily be modified
    for other job queuing systems, such as PBS. But it also has disadvantages: it
    cannot force users to follow it. Users who do not follow our scheme still have a
    chance to run GPU codes once condor starts them, and to disturb other GPU jobs
    already running on the computing nodes (e.g., by using a GPU which is already
    assigned to another job). So we strictly require our users to follow our scheme
    when designing their code.

    In the future we will look into, for example, the exclusive mode of CUDA to fix this problem.
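
The daemon and query command themselves are not published, so the following is only a
minimal sketch of the idea in Python (the original daemon is written in perl); the socket
path, message format, and command-line interface below are our own assumptions, not the
actual implementation:

===================================================================
#!/usr/bin/env python3
# Minimal sketch of a per-node GPU assignment daemon and its query client.
# Assumptions (not from the original implementation): Unix socket path,
# "ACQUIRE <job_id>" / "RELEASE <job_id>" message format, 2 GPUs per node.
import os
import socket
import sys

SOCKET_PATH = "/var/run/gpu_assign.sock"   # hypothetical location
NUM_GPUS = 2                               # GPUs installed in this node

def serve():
    """Hand out free GPU IDs to jobs and reclaim them when jobs finish."""
    assigned = {}                          # gpu_id -> condor job id
    if os.path.exists(SOCKET_PATH):
        os.unlink(SOCKET_PATH)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(SOCKET_PATH)
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        cmd, job_id = conn.recv(256).decode().split()
        if cmd == "ACQUIRE":
            free = [g for g in range(NUM_GPUS) if g not in assigned]
            if free:
                assigned[free[0]] = job_id
                conn.sendall(str(free[0]).encode())   # reply with the GPU_ID
            else:
                conn.sendall(b"NONE")                 # no free GPU on this node
        elif cmd == "RELEASE":
            for gpu, owner in list(assigned.items()):
                if owner == job_id:
                    del assigned[gpu]
            conn.sendall(b"OK")
        conn.close()

def query(cmd, job_id):
    """Client side: the command a job runs to obtain (or release) its GPU_ID."""
    cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    cli.connect(SOCKET_PATH)
    cli.sendall(("%s %s" % (cmd, job_id)).encode())
    print(cli.recv(256).decode())          # the job passes this ID to cudaSetDevice()
    cli.close()

if __name__ == "__main__":
    if sys.argv[1] == "serve":
        serve()                            # run as the per-node daemon
    else:
        query(sys.argv[1], sys.argv[2])    # e.g. ./gpu_assign.py ACQUIRE 123.0
===================================================================

In such a scheme the job wrapper would call something like "gpu_assign.py ACQUIRE
$(Cluster).$(Process)" before starting the CUDA code, pass the returned GPU_ID to
cudaSetDevice(), and call "RELEASE" when the job finishes.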

Cheers,

T.H.Hsieh


2010/1/8 Marian Zvada <zvada@xxxxxxxx>
Hi there,

thanks for your idea; that's really what we'd like to set up - the scheme you sent in your first email is also what we have in mind.

However, I was just curious whether someone else has started to do anything about that.
Tung-Han, is there anything more "publicly visible" about your work for a good reference, e.g. what hw do you use, how do you move the data, and what size of data do you process on the gpu(s)?
It sounds interesting that you have more than 64 gpus in your condor pool...

I've designed a simple topology covering our scope of interest. First, build a standalone condor pool with just gpus, and then, based on classads, let the pool have both gpus and cpus for different types of jobs (cpu intensive or gpu intensive).

If there is anything you can offer to give us ideas on how to proceed, that would be great.

Our initial hw setup will look as follows (this is about to be purchased in the coming week):
1) central head node (schedd, col/neg): any xeon machine
2) user interface for job submission: no need for a very strong machine
3) 5x worker nodes: 2x quad-core Opteron, 2x nVIDIA GTX295, 32GB RAM, so that there is 4GB per core
  - we'd like to build such workers in a 4U chassis

I've read several reviews and benchmarks, but I'm still not sure exactly which way to go with the worker-node HW; this setup sounds most acceptable to us as a cheap solution maintainable by our local hw experts... Also, there are a lot of questions about power consumption, etc., but that is what we will measure once the infrastructure is in place; it's going to be an experiment we want to develop by self-grown evolution from the user needs. For now, the GTX295 sounds sufficient; the cluster environment is the big need that is still missing. We hope the condor software will help our people.

I appreciate in advance any useful direction or other advice somebody can offer.

Thanks,
Marian

Tung-Han Hsieh wrote:
Hello Xiang Ni,

Actually, we did not do anything to make condor aware of the existence of GPUs.
What we have done is simple and somewhat stupid: it is hard coded.

Let me post the condor settings of one of our computing nodes, so that our
implementation is clear:

===================================================================
DAEMON_LIST = MASTER, STARTD
NETWORK_INTERFACE = 192.168.2.123
NODE_ID  = 122
NUM_CPUS = 2
MTYPE    = "GPU_2G"
NETTYPE  = "GB"
N_HWCPUS = 2
STARTD_ATTRS = "$(COLLECTOR_HOST)", NODE_ID, MTYPE, NETTYPE, N_HWCPUS
===================================================================

This is the local condor config file of the node 192.168.2.123. In this node we installed 2 GPUs
of model GTX-285, each with 2GB of GPU memory. So we define a new attribute "MTYPE"
with the value "GPU_2G", and force condor to believe that this node has 2 GPUs
(actually, condor thinks that it has 2 CPUs) by setting NUM_CPUS=2.

Therefore, if you want to mix machines, some with GPUs and some without, then in our
simple implementation we just set a different value of MTYPE on each kind of machine,
and ask users to specify the "Requirements" in their condor command file, in order to
submit their jobs to the correct group of nodes (a sketch of such a submit file is below).

In this way, probably any standard condor distribution can be used in a GPU cluster.
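
For illustration, a submit description file for such a pool might look roughly like the
following (this is only a sketch; the executable and file names are placeholders, and
depending on the condor version you may need to write TARGET.MTYPE in the requirements
expression):

===================================================================
# sketch only: executable and file names are placeholders
universe     = vanilla
executable   = my_gpu_code
requirements = (MTYPE == "GPU_2G")
output       = job.out
error        = job.err
log          = job.log
queue
===================================================================

With different MTYPE values on different groups of machines, users steer their jobs to
the GPU (or non-GPU) nodes just by changing this one requirements line.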

Cheers,

T.H.Hsieh


2010/1/7 Xiang Ni <nixiang.nn@xxxxxxxxx>


   Hi Tung-Han Hsieh,

   Thanks, your sharing is very helpful!

   I'm also interested in this topic and I have some confusions.

   How do you make condor aware of the existence of GPUs? By modifying
   Hawkeye?

   Thanks!

   Regards,

   2010/1/7 Tung-Han Hsieh <tunghan.hsieh@xxxxxxxxx>:

    > Hello,
    >
    > We have some experience in building a GPU cluster using condor.
    >
    > Currently we have two GPU clusters, used by different research groups.
    > Each cluster is composed of the following elements:
    >
    > 1. Head node: runs the condor server; users log in here to build their
    >    codes, submit jobs, etc.
    >
    > 2. File servers: the Lustre cluster filesystems are deployed here.
    >
    > 3. Computing nodes: each node has at least one and at most 4 GPUs.
    >    Each cluster has more than 64 GPUs installed.
    >
    > 4. Communication: one cluster has an infiniband network, and the other
    >    uses a Gigabit network.
    >
    > The condor system can allocate multiple GPUs for users. In our
    > implementation the number of CPU cores in each computing node is not
    > important. So when users specify "machine_count" in the condor command
    > file, they are actually specifying the number of GPUs required. And the
    > number of GPUs in each node is hard coded as "NUM_CPUS" in the local
    > condor config file of each node.
    >
    > Honestly, we are not condor experts. Hence we also developed some codes
    > to help condor do more complicated tasks, such as user quotas for the
    > number of GPUs, GPU assignment, dead job cleaning, etc. But I guess all
    > of these could be done by condor itself. We just don't know how, so we
    > tried the somewhat stupid way of writing codes to do those tasks.
    >
    > Probably we can communicate our experiences on this subject :)
    >
    >
    > Cheers,
    >
    > T.H.Hsieh
    >
    >
    > 2010/1/7 Marian Zvada <zvada@xxxxxxxx>

    >>
    >> Dear Condor Folks,
    >>
    >> is there someone in the Condor users' community who has built a GPU
    >> cluster based on condor?
    >> I mean someone who has worker node hw with GPU graphics cards and job
    >> management done by condor on top.
    >>
    >> We are very interested in this topic and would like to build such an
    >> infrastructure (condor + gpu worker nodes) for the research people in
    >> our organization.
    >> In the first epoch of this project we'd like to develop a standalone
    >> cluster:
    >>
    >> - master condor head node
    >> - 5 gpu worker nodes (each worker node 2x nVIDIA GTX295)
    >> - storage element for data
    >>
    >> I know there is a lot to see on google about such experiments, but I
    >> wanted to ask condor users directly about their
    >> opinions/suggestions/recommendations, since we are serious about
    >> building a condor gpu cluster and using it in production for our
    >> research activities.
    >>
    >> If there is someone who has done a similar setup and is willing to
    >> share the knowledge, I'd appreciate talking about it! Any url hints
    >> are welcome too...
    >>
    >> Thanks and regards,
    >> Marian



   --
   Xiang Ni
   Sino-German Joint Software Institute
   Computer Science & Engineering Department of Beihang University
   100191




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/