
Re: [HTCondor-users] Translating GPU device assignments?



Thanks for the info! 
I came up with this after sleeping on it:

Executable = /bin/sh
Transfer_executable = False
+Arguments = "-c '/usr/bin/env FLAGS_gpu=$(DOLLAR)GPU_DEVICE_ORDINAL caffe train ...etc...'"

This way everything is contained within the submit description and no external file is needed. Still kind of gnarly, since it will show up in condor_q as /bin/sh instead of caffe, but oh well...

This does appear to work: when I ssh to the job and check the /proc/<pid>/environ file, the FLAGS_gpu variable shows up in the environment.
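
For anyone who wants to script that check rather than eyeball it, here is a minimal sketch in Python. It relies only on the fact that /proc/<pid>/environ on Linux holds NUL-separated KEY=VALUE records. The function names parse_environ and read_environ are my own placeholders, not part of HTCondor or Caffe.

```python
def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE records found in /proc/<pid>/environ."""
    env = {}
    for record in raw.split(b"\0"):
        # Skip the empty trailing record and anything that is not KEY=VALUE.
        if not record or b"=" not in record:
            continue
        key, _, value = record.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid) -> dict:
    """Read and parse the environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())
```

Run on the execute node as the job's user, read_environ(pid).get("FLAGS_gpu") should return whatever device value the slot was handed, or None if the variable never made it into the environment.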


In addition, I think I'll add an "include command :" line that runs a script to read in the prototxt files for the solver. Its output could inform the transfer_input_files for the net prototxt and provide some fodder for progress monitoring based on the stepsize and max_iter values. The solver_mode line would also let me avoid the GPU environment variable shenanigans unless they're necessary, so I could use the same submit description for any run.

	-Michael Pelletier.

> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
> Of Francisco Pereira
> Sent: Sunday, July 02, 2017 10:48 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Translating GPU device assignments?
> 
> Hi Michael,
> 
> My solution has been to have the job executable be a simple shell script
> wrapper that creates a Caffe command line with the option -gpu (with
> whatever CUDA_VISIBLE_DEVICES is set to). It also works for other
> packages, e.g. those that require starting python.
> 
> cheers,
> Francisco
>