
Re: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment



What GPUs are getting assigned to the slot?

   condor_status -af Name AssignedGPUs

Does CUDA_VISIBLE_DEVICES get set in the environment when you don't use the job wrapper?

-tj

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Beyer, Christoph
Sent: Thursday, December 12, 2019 6:58 AM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] CUDA_VISIBLE_DEVICES not in the environment

Hi,

I am struggling a bit with the parallel usage of GPUs, as I mentioned earlier. Part of my problem results from
CUDA_VISIBLE_DEVICES not being set in the job environment.

I use the GPU feature, which expands as expected to:

[root@batchg010 condor]# condor_config_val use feature:gpus
use FEATURE:GPUs is
	MACHINE_RESOURCE_INVENTORY_GPUs=$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
	ENVIRONMENT_FOR_AssignedGPUs=GPU_DEVICE_ORDINAL=/(CUDA|OCL)//  CUDA_VISIBLE_DEVICES
	ENVIRONMENT_VALUE_FOR_UnAssignedGPUs=10000
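
For reference, here is a rough sketch of what the /(CUDA|OCL)// substitution in ENVIRONMENT_FOR_AssignedGPUs is presumably meant to do: strip the CUDA or OCL prefix from each assigned resource ID so that only the device ordinals remain. The helper name assigned_to_ordinals and the sample value "CUDA0,CUDA1" are made up for illustration. HTCondor performs this rewrite internally, so this only emulates the effect with sed.

```shell
#!/bin/sh
# Illustration only: emulate the /(CUDA|OCL)// rewrite on an AssignedGPUs
# value. assigned_to_ordinals is a hypothetical helper, not an HTCondor tool.
assigned_to_ordinals() {
    # The g flag removes the prefix from every ID in the comma-separated
    # list, which approximates applying the regex to each ID in turn.
    printf '%s\n' "$1" | sed -E 's/(CUDA|OCL)//g'
}

# Sample value (an assumption): a slot that was assigned two CUDA devices.
assigned_to_ordinals "CUDA0,CUDA1"
```

With a slot that has AssignedGPUs = "CUDA0,CUDA1", the job would then be expected to see the ordinals 0,1 in GPU_DEVICE_ORDINAL and CUDA_VISIBLE_DEVICES.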

I am running a job wrapper, but CUDA_VISIBLE_DEVICES is not set in the job wrapper's environment, nor in the job's environment once the job is running.
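
A check along the following lines could be dropped into a job wrapper to make the missing variable visible in the job's output. This is only a sketch: report_gpu_env is a hypothetical helper, NOT_SET is an arbitrary marker, and it assumes the starter exports _CONDOR_MACHINE_AD pointing at a readable copy of the slot ad.

```shell
#!/bin/sh
# Hypothetical debugging helper for a job wrapper: print the GPU-related
# variables as the wrapper sees them, marking unset ones with NOT_SET.
report_gpu_env() {
    for name in CUDA_VISIBLE_DEVICES GPU_DEVICE_ORDINAL; do
        # eval gives portable indirect expansion in POSIX sh
        eval "value=\${$name-NOT_SET}"
        printf '%s=%s\n' "$name" "$value"
    done
}

report_gpu_env

# Assumption: _CONDOR_MACHINE_AD, when set, names a file holding the slot ad.
# Showing its AssignedGPUs line allows a comparison with the environment.
if [ -n "${_CONDOR_MACHINE_AD:-}" ] && [ -r "$_CONDOR_MACHINE_AD" ]; then
    grep -i '^AssignedGPUs' "$_CONDOR_MACHINE_AD" \
        || echo "AssignedGPUs not found in machine ad"
fi
```

If the slot ad lists AssignedGPUs but both variables print as NOT_SET, the environment is being lost somewhere between the starter and the wrapper rather than at GPU assignment time.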

As a result, I get all 4 GPUs in a single GPU slot:

/usr/libexec/condor/condor_gpu_discovery
DetectedGPUs="CUDA0, CUDA1, CUDA2, CUDA3"


Is there an additional trick that I missed?

This is on:

$CondorVersion: 8.9.1 Apr 17 2019 BuildID: 466671 PackageID: 8.9.1-1 $




-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx