
Re: [HTCondor-users] GPU monitoring vanished in my pool :(



Just a follow-up: condor_gpu_utilization itself seems to work fine when run by hand:

[root@batchg003 ~]# /usr/libexec/condor/condor_gpu_utilization
SlotMergeConstraint = StringListMember( "CUDA0", AssignedGPUs )
UptimeGPUsSeconds = 0.000000
UptimeGPUsMemoryPeakUsage = 11
- GPUsSlot0
SlotMergeConstraint = StringListMember( "CUDA0", AssignedGPUs )
UptimeGPUsSeconds = 0.000000
UptimeGPUsMemoryPeakUsage = 11
- GPUsSlot0
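
To compare this output between nodes, it can be split into one dictionary per block. This is only a minimal sketch: the function name parse_monitor_output and the "_label" key are made up for illustration, and it assumes that a line starting with "-" closes one block of attributes, with the text after the dash kept as an opaque label.

```python
def parse_monitor_output(text):
    """Split 'Attr = value' lines into one dict per block.

    Assumption: a line starting with '-' ends the current block; the text
    after the dash is stored unchanged under the hypothetical key '_label'.
    """
    ads, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-"):
            current["_label"] = line[1:].strip()
            ads.append(current)
            current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            # Values are kept as strings; no ClassAd evaluation is attempted.
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


# One block copied from the output above.
SAMPLE = """\
SlotMergeConstraint = StringListMember( "CUDA0", AssignedGPUs )
UptimeGPUsSeconds = 0.000000
UptimeGPUsMemoryPeakUsage = 11
- GPUsSlot0
"""

if __name__ == "__main__":
    for ad in parse_monitor_output(SAMPLE):
        print(ad["_label"], ad["UptimeGPUsSeconds"], ad["UptimeGPUsMemoryPeakUsage"])
```

Feeding it the real output of /usr/libexec/condor/condor_gpu_utilization (for example via subprocess) would give one dictionary per block, which makes it easy to spot a node where the values never change.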


-- 
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

----- Original Message -----
From: "Christoph Beyer" <christoph.beyer@xxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, 5 May 2020 11:37:57
Subject: [HTCondor-users] GPU monitoring vanished in my pool :(

Hi,

I use the two GPU features on my GPU nodes: 

[root@batchg003 ~]# condor_config_val use feature:GPUs
use FEATURE:GPUs is
	MACHINE_RESOURCE_INVENTORY_GPUs=$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
	ENVIRONMENT_FOR_AssignedGPUs=GPU_DEVICE_ORDINAL=/(CUDA|OCL)//  CUDA_VISIBLE_DEVICES
	ENVIRONMENT_VALUE_FOR_UnAssignedGPUs=10000

	use feature : GPUsMonitor
[root@batchg003 ~]# condor_config_val use feature:GPUsMonitor
use FEATURE:GPUsMonitor is
	use feature : Monitor( GPUs, WaitForExit, 1, $(LIBEXEC)/condor_gpu_utilization, SUM:GPUs, PEAK:GPUsMemory )

For a while in the past I was able to check the resulting GPU usage and GPU memory figures of a job, like this: 

condor_history 11262904 -af:l GPUsMemoryUsage GPUsProvisioned  GPUsUsage
> GPUsMemoryUsage = 29261.0 GPUsProvisioned = 4 GPUsUsage = 3.688929331491713

(provided the job used any GPUs, of course). Unfortunately, these attributes have vanished from my history without any changes having been made (at least no intentional ones). 
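
To check many jobs for the missing attributes, the condor_history line above can be parsed with a few lines of Python. This is a rough sketch, not a tested tool: the helper names parse_af_line and missing_attrs are invented here, and it assumes that -af:l prints an attribute that is not in the job ad as "Attr = undefined".

```python
import re

# Matches the 'Name = value' pairs printed by 'condor_history -af:l'.
PAIR = re.compile(r"(\w+) = (\S+)")

WANTED = ["GPUsMemoryUsage", "GPUsProvisioned", "GPUsUsage"]


def parse_af_line(line):
    """Turn one '-af:l' output line into a dict of attribute -> string value."""
    # Drop a leading '> ' quote marker, if any.
    return dict(PAIR.findall(line.lstrip("> ")))


def missing_attrs(line, wanted=WANTED):
    """Return the wanted attributes that are absent or 'undefined' (assumed spelling)."""
    found = parse_af_line(line)
    return [a for a in wanted if found.get(a, "undefined") == "undefined"]


if __name__ == "__main__":
    good = "> GPUsMemoryUsage = 29261.0 GPUsProvisioned = 4 GPUsUsage = 3.688929331491713"
    print(parse_af_line(good))
    print(missing_attrs(good))
```

Running missing_attrs over the output for a range of job IDs should show from which job onward GPUsMemoryUsage and GPUsUsage stopped being recorded.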

I run 8.9.3 on the GPU nodes and 8.9.1 on the schedd, but that should not explain it, right? 

Best
christoph
