
Re: [HTCondor-users] Translating GPU device assignments?



A little bit of follow-up as I worked on this over the long weekend.

[Michael Pelletier] 
So it turns out that setting CUDA_VISIBLE_DEVICES=2,3 causes the CUDA library to renumber those devices as ordinals 0 and 1 inside the job.

Thus, to get the correct ordinals inside the job, you can't simply reuse the device numbers listed in CUDA_VISIBLE_DEVICES or GPU_DEVICE_ORDINAL.

So it seems that the GPU_DEVICE_ORDINAL variable is being set incorrectly. When it is used in combination with CUDA_VISIBLE_DEVICES, it should be set to 0 through N-1, where N is the number of GPUs requested.
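To illustrate the renumbering described above, here is a minimal Python sketch. The function names are hypothetical, and it assumes CUDA's documented behavior of exposing the listed devices densely from 0 in the order they appear, so "2,3" maps to in-process ordinals 0 and 1. Entries are treated as opaque strings, so GPU UUIDs would be handled the same way as integer indices.

```python
import os


def remap_visible_devices(cuda_visible_devices: str) -> dict[str, int]:
    """Map each CUDA_VISIBLE_DEVICES entry to the ordinal CUDA exposes in-process.

    Hypothetical helper: assumes the listed devices are renumbered densely
    from 0 in the order given, e.g. "2,3" -> {"2": 0, "3": 1}.
    """
    entries = [e.strip() for e in cuda_visible_devices.split(",") if e.strip()]
    return {entry: ordinal for ordinal, entry in enumerate(entries)}


def in_process_ordinals(cuda_visible_devices: str) -> str:
    """Return the value GPU_DEVICE_ORDINAL would need inside the job, e.g. "0,1"."""
    mapping = remap_visible_devices(cuda_visible_devices)
    return ",".join(str(ordinal) for ordinal in mapping.values())


if __name__ == "__main__":
    # Fall back to the example from the post if the variable is not set.
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "2,3")
    print(remap_visible_devices(visible))
    print(in_process_ordinals(visible))
```

Only the count and order of the visible devices matter for the in-process ordinals; the physical device numbers never appear in the result.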

I've worked around this via:


GPU_ORDINAL = $CHOICE(REQGPU_INT, "error", "0", "0,1", "0,1,2", \
    "0,1,2,3", "0,1,2,3,4", "0,1,2,3,4,5", "0,1,2,3,4,5,6", \
    "0,1,2,3,4,5,6,7", "0,1,2,3,4,5,6,7,8", "too_many_gpus_requested")
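The $CHOICE table above enumerates the string for each GPU count by hand. Outside the config language, the same strings can be generated directly. This is a minimal Python sketch with a hypothetical function name, assuming REQGPU_INT holds the integer number of requested GPUs. It roughly mirrors the table: 0 gives "error", 1 through 9 give "0" through "0,...,8", and anything larger gives "too_many_gpus_requested". (The $CHOICE list itself only defines index 10 for the overflow case.)

```python
def gpu_ordinal(request_gpus: int, max_gpus: int = 9) -> str:
    """Build the GPU_DEVICE_ORDINAL string for a given number of requested GPUs.

    Hypothetical helper mirroring the $CHOICE table: non-positive counts map
    to "error", counts above max_gpus map to "too_many_gpus_requested".
    """
    if request_gpus <= 0:
        return "error"
    if request_gpus > max_gpus:
        return "too_many_gpus_requested"
    # Ordinals are always renumbered from 0, so only the count matters.
    return ",".join(str(i) for i in range(request_gpus))


if __name__ == "__main__":
    for n in (0, 1, 2, 9, 10):
        print(n, gpu_ordinal(n))
```

Generating the list avoids having to extend the table by hand when nodes with more GPUs are added; the max_gpus cutoff is kept only to reproduce the table's error behavior.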

And as I mentioned before, it'd be great to have this renumbered ordinal list available as a job attribute as well.

	-Michael Pelletier.