
[HTCondor-users] Docker nvidia runtime support


We have been running HTCondor for a while, mainly for Python/MATLAB workloads, and we want to start packaging our applications into container images. However, some of them depend on access to NVIDIA GPUs.

NVIDIA has released a container runtime for Docker that gives containers direct access to the GPU without having to pass the device to the container explicitly. Besides installing this runtime, you have to call docker with --runtime=nvidia.

We could allow users to run their jobs in the vanilla universe and call a job wrapper that eventually calls docker, but this opens our servers to security vulnerabilities that we want to avoid. The docker universe already does everything we need in terms of restricting user permissions and mounting volumes automatically, but it lacks a way to pass additional arguments to docker.

Do you guys think it is possible or feasible to add this option to the docker universe?

If I read the source code correctly, something along the lines of this might work:

    // drop unneeded Linux capabilities
    if (param_boolean("DOCKER_DROP_ALL_CAPABILITIES", true /*default*/,
            true /*do_log*/, &machineAd, &jobAd)) {
        // --no-new-privileges flag appears in docker 1.11
        if (DockerAPI::majorVersion > 1 ||
            DockerAPI::minorVersion > 10) {
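
To make the idea more concrete, here is a minimal, self-contained sketch of the same pattern: a configuration setting controlled by the administrator decides whether --runtime=<name> is added to the docker command line. This is not HTCondor code. The knob name DOCKER_RUNTIME, the Config type and the function buildDockerRunArgs are all made up for illustration, and a std::map stands in for the real configuration lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a configuration lookup; real code would use
// HTCondor's own param system rather than a std::map.
using Config = std::map<std::string, std::string>;

// Build the argument list for `docker run`.
// DOCKER_RUNTIME is an assumed, illustrative knob name, not an existing
// HTCondor setting.
std::vector<std::string> buildDockerRunArgs(const Config &config,
                                            const std::string &image) {
    assert(!image.empty() && "an image name is required");
    std::vector<std::string> args = {"docker", "run"};

    // Only the admin-controlled configuration selects the runtime, so the
    // job submitter cannot inject arbitrary docker arguments.
    auto it = config.find("DOCKER_RUNTIME");
    if (it != config.end() && !it->second.empty()) {
        args.push_back("--runtime=" + it->second);
    }

    args.push_back(image);
    return args;
}
```

With DOCKER_RUNTIME set to nvidia, the sketch yields the arguments docker, run, --runtime=nvidia and the image name. With the setting absent or empty, the command line is unchanged.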

More info: https://github.com/NVIDIA/nvidia-docker


João Baúto
Scientific Computing and Software Platform
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal