
[Condor-users] RFC: Adding GPUs into Condor



Hi all,

I'd like to get a few comments on how I boldly went ahead and added GPUs to
Condor. I know several sites have already done this, but so far I have not
found a working "recipe" out there. A note of caution: this is still work in
progress, as I need to figure out how to fit it into the framework of
dynamic slots.

This is the current extra configuration of a box with 4 CPU cores and 4 GPUs 
(beware of long lines):

# start only standard universe jobs or jobs which ask for a GPU
# ("+WantGPU" in the submit file)
START = ( JobUniverse =?= 1 || target.WantGPU =?= true )
# rank GPU jobs much higher to kick out non-GPU jobs
RANK = ( target.WantGPU =?= true ) * 10000000
# these settings are added by a CUDA program identifying the available cards
STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB, \
  GPU_MULTIPROC, GPU_NUMCORES, GPU_CLOCK_GHZ, GPU_CUDA_DRV, \
  GPU_CUDA_RUN
SLOT1_GPU_CUDA_DRV=3.20
SLOT1_GPU_CUDA_RUN=3.20
SLOT1_GPU_DEV=0
SLOT1_GPU_NAME="Tesla C2050"
SLOT1_GPU_CAPABILITY=2.0
SLOT1_GPU_GLOBALMEM_MB=2687
SLOT1_GPU_MULTIPROC=14
SLOT1_GPU_NUMCORES=32
SLOT1_GPU_CLOCK_GHZ=1.15
SLOT2_GPU_CUDA_DRV=3.20
SLOT2_GPU_CUDA_RUN=3.20
SLOT2_GPU_DEV=1
SLOT2_GPU_NAME="Tesla C2050"
SLOT2_GPU_CAPABILITY=2.0
SLOT2_GPU_GLOBALMEM_MB=2687
SLOT2_GPU_MULTIPROC=14
SLOT2_GPU_NUMCORES=32
SLOT2_GPU_CLOCK_GHZ=1.15
SLOT3_GPU_CUDA_DRV=3.20
SLOT3_GPU_CUDA_RUN=3.20
SLOT3_GPU_DEV=2
SLOT3_GPU_NAME="Tesla C2050"
SLOT3_GPU_CAPABILITY=2.0
SLOT3_GPU_GLOBALMEM_MB=2687
SLOT3_GPU_MULTIPROC=14
SLOT3_GPU_NUMCORES=32
SLOT3_GPU_CLOCK_GHZ=1.15
SLOT4_GPU_CUDA_DRV=3.20
SLOT4_GPU_CUDA_RUN=3.20
SLOT4_GPU_DEV=3
SLOT4_GPU_NAME="Tesla C2050"
SLOT4_GPU_CAPABILITY=2.0
SLOT4_GPU_GLOBALMEM_MB=2687
SLOT4_GPU_MULTIPROC=14
SLOT4_GPU_NUMCORES=32
SLOT4_GPU_CLOCK_GHZ=1.15

Most of this configuration is generated at boot-up, when a local service runs
the attached program to query the available cards and adds its output to
Condor's config.
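
The boot-time step could be sketched roughly as follows. This is only an illustration, not the actual local service: the function name write_gpu_config and both of its arguments are placeholders made up here. It assumes that the query program prints the config lines on stdout and that the target file is one Condor already reads.

```shell
#!/bin/sh
# Hypothetical sketch: regenerate the GPU part of the Condor config at boot.
# write_gpu_config QUERY_CMD TARGET_FILE
#   QUERY_CMD   - program printing STARTD_ATTRS / SLOTn_GPU_* lines (assumed)
#   TARGET_FILE - config file that Condor reads (assumed path)
write_gpu_config() {
    query_cmd=$1
    target=$2
    tmp="$target.tmp.$$"

    # Write to a temporary file first, so that a failing query never
    # leaves a half-written config behind.
    if ! "$query_cmd" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Refuse output without any per-slot GPU line (e.g. no CUDA device).
    if ! grep -q '^SLOT[0-9][0-9]*_GPU_DEV=' "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # Only now move the complete file into place.
    mv "$tmp" "$target"
}
```

The init script would call this with the path of the query program and the path of the config file before the Condor daemons are started. If no usable card is found, the function returns non-zero and leaves any existing file untouched.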

A typical submit file might look like this:

Executable     = matrixmult.sh
Arguments      = $$(GPU_DEV) 

# these variables are available on a per slot basis
# $$(GPU_NAME) $$(GPU_CAPABILITY) $$(GPU_GLOBALMEM_MB) $$(GPU_MULTIPROC)
# $$(GPU_NUMCORES) $$(GPU_CLOCK_GHZ) $$(GPU_CUDA_DRV) $$(GPU_CUDA_RUN)

Error   = logs/err.$(Process)
Output  = logs/log.$(Process)
Log = gpu-local.log
+WantGPU=True
Universe = vanilla
Queue 10

with matrixmult.sh being
#!/bin/sh

DEVID=$1

# get the cuda environment on our cluster
. /usr/local/nvidia/sdk-3.2/setup.sh
/usr/local/nvidia/sdk-3.2/C/bin/linux/release/matrixMul --noprompt --device=$DEVID
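
Since the device index only reaches the script through the $$(GPU_DEV) substitution, the wrapper could guard against an empty or malformed argument before it starts the CUDA binary. A minimal sketch of such a check; the helper name is_device_id is made up here and is not part of the setup above:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is a non-negative
# integer, i.e. something that can be a CUDA device index.
is_device_id() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

# Possible use near the top of the wrapper script:
#   is_device_id "$1" || { echo "bad GPU device '$1'" >&2; exit 1; }
```

With this in place, a job that lands on a slot without GPU_DEV fails early with a clear message in its error file, rather than leaving the CUDA program to handle an unusable device argument.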


As you can see here, this is something very specific to our local systems.

My questions:

* is there a better way to do it?
* this is tailored to our Nvidia cards (we don't have any AMD ones in the
cluster environment so far), so a similar beast would need to be slaughtered
for AMD GPUs, or ideally there would be a fully OpenCL'ed version :)
* any other comments?

Cheers

Carsten
/*
 * Copyright 1993-2010 NVIDIA Corporation.  All rights reserved.
 *
 * NVIDIA Corporation and its licensors retain all intellectual property and 
 * proprietary rights in and to this software and related documentation. 
 * Any use, reproduction, disclosure, or distribution of this software 
 * and related documentation without an express license agreement from
 * NVIDIA Corporation is strictly prohibited.
 *
 * Please refer to the applicable NVIDIA end user license agreement (EULA) 
 * associated with this source code for terms and conditions that govern 
 * your use of this NVIDIA software.
 * 
 */

/* This program was derived from the 3.2 SDK version of 
 * deviceQuery.cpp by Carsten Aulbert <carsten.aulbert@xxxxxxxxxx> and
 * hence includes original source code from NVIDIA. I hope to comply
 * with the EULA by placing this statement here:
 *  "This software contains source code provided by NVIDIA Corporation."
 * My personal changes/additions to the original code are hereby placed in 
 * the public domain.
 * No warranty is attached to this code at all.
 */

// utilities and system includes
#include <string>
#include <shrUtils.h>

// CUDA-C includes
#include <cuda_runtime_api.h>

////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main( int argc, const char** argv) 
{
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess) {
        shrLog("cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.\n");
        shrLog("\nFAILED\n");
        shrEXIT(argc, argv);
    }

    // This function call returns 0 if there are no CUDA capable devices.
    if (deviceCount == 0)
        shrLog("There is no device supporting CUDA\n");
    // Build the attribute list piecewise: version-dependent attributes are
    // appended only where they are also emitted below, so none is listed twice.
    shrLog("STARTD_ATTRS = GPU_DEV, GPU_NAME, GPU_CAPABILITY, GPU_GLOBALMEM_MB");
  #if CUDART_VERSION >= 2000
    shrLog(", GPU_MULTIPROC, GPU_NUMCORES");
  #endif
    shrLog(", GPU_CLOCK_GHZ");
  #if CUDART_VERSION >= 2020
    shrLog(", GPU_CUDA_DRV, GPU_CUDA_RUN");
  #endif
    shrLog("\n");

    // Condor slots are numbered from 1, CUDA devices from 0:
    // slot "dev" is mapped to CUDA device "dev-1".
    int dev;
    for (dev = 1; dev <= deviceCount; ++dev) {
        cudaDeviceProp deviceProp;
        int driverVersion=0, runtimeVersion=0;
        cudaGetDeviceProperties(&deviceProp, dev-1);

    #if CUDART_VERSION >= 2020
        cudaDriverGetVersion(&driverVersion);
        cudaRuntimeGetVersion(&runtimeVersion);
        shrLog("SLOT%d_GPU_CUDA_DRV=%d.%d\n", dev, driverVersion/1000, driverVersion%100);
        shrLog("SLOT%d_GPU_CUDA_RUN=%d.%d\n", dev, runtimeVersion/1000, runtimeVersion%100);
    #endif

        shrLog("SLOT%d_GPU_DEV=%d\n", dev, dev-1);
        shrLog("SLOT%d_GPU_NAME=\"%s\"\n", dev, deviceProp.name);
        shrLog("SLOT%d_GPU_CAPABILITY=%d.%d\n", dev, deviceProp.major, deviceProp.minor);
        shrLog("SLOT%d_GPU_GLOBALMEM_MB=%.0f\n", dev, deviceProp.totalGlobalMem/(1024.*1024.));
    #if CUDART_VERSION >= 2000
        shrLog("SLOT%d_GPU_MULTIPROC=%d\n", dev, deviceProp.multiProcessorCount);
        shrLog("SLOT%d_GPU_NUMCORES=%d\n", dev, ConvertSMVer2Cores(deviceProp.major, deviceProp.minor));
    #endif

        // clockRate is reported in kHz
        shrLog("SLOT%d_GPU_CLOCK_GHZ=%.2f\n", dev, deviceProp.clockRate * 1e-6f);
    }
    // csv masterlog info
    // *****************************
    // exe and CUDA driver name 
    std::string sProfileString = "deviceQuery, CUDA Driver = CUDART";        
    
    //    shrLogEx(LOGBOTH | MASTER, 0, sProfileString.c_str());
    
    return 0;
}