[HTCondor-users] Singularity does not start with --nv argument when GPUs are requested




Hi,

I've recently installed condor-8.8.0-1.el7.x86_64 in a small cluster
containing two machines with GPUs.

I've tried to submit a job using a Singularity image that uses
the GPU, but I can see from the StarterLog file that singularity
is not invoked with the --nv argument, which is required to allow the container to access the NVIDIA devices.

Am I missing something in the configuration?


Startd conf:
  ...
  SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
  SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage

Job definition:
  ...
  +SingularityImage = "/path_to_image/image.simg"
  ...


I can always run the singularity image directly:

   ...
   executable = /usr/bin/singularity
   arguments  = "exec --nv -C /path_to_image/image.simg ..."
   ...

but I would prefer to let the condor daemon handle the
Singularity setup.

Since I couldn't find a way to change the condor configuration to
append --nv to the singularity call, I tried to patch
src/condor_starter.V6.1/singularity.cpp
to add the '--nv' flag. Appending the flag unconditionally seems
easy, but I do not know how to add it only when the job requests
at least one GPU:

        sing_args.AppendArg(sing_exec_str.c_str());
        sing_args.AppendArg("exec");
        sing_args.AppendArg("--nv");    // proposed addition

I'm not familiar enough with the condor code to do it myself.

Does anybody have a suggestion for the configuration, or an idea of
how to patch the code to handle this problem?

Thanks,

                       Javier


--
-----------------------------------------------------------------------
| Javier Sanchez                        |  Tel: (+34) 963.543.697      |
| IFIC (Instituto de Fisica Corpuscular)|  Fax: (+34) 963.543.742      |
| CSIC - Universidad de Valencia        |E-Mail:                       |
| Parque Científico                     |  Javier.Sanchez@xxxxxxxxxx   |
| c/ Catedrático José Beltrán, 2        |WWW:                          |
| E-46980 Paterna (Valencia) - SPAIN  |  http://ific.science/~sanchezj |
 -----------------------------------------------------------------------