
[HTCondor-users] HTCondor resources for an application which manages its own CPU topology



There's a sim application I'm working on integrating into HTCondor, and it's got an interesting way of handling its command line options with respect to its CPU utilization.

You tell it the "number of cpu devices" and the "number of cores," but that's not quite what you're telling it. You're actually specifying the maximum number of *sockets*, and the maximum total number of compute threads across all sockets.

So if you tell it to use two devices and 96 cores, while using a four-socket machine equipped with four 12-core, 24-hyperthread processors, it will use only 24 compute threads total - twelve physical cores on two out of the four sockets.

If you do the same on a two-socket 16-core, 32-hyperthread machine, it will run 16 compute threads, or 100% of the physical cores of the system.
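Both examples are consistent with a simple rule, sketched below in Python. This is only my reading of the observed behavior, not the application's documented logic. The function name and parameters are made up for illustration, and the assumption that the "number of cores" value acts as a plain upper cap is inferred from these two cases.

```python
def expected_compute_threads(max_devices, max_threads, sockets, cores_per_socket):
    """Estimate how many compute threads the application would start.

    Assumption (inferred from the two examples, not from documentation):
    the application uses at most max_devices sockets, runs one thread per
    physical core on each socket it uses, ignores hyperthreads, and never
    exceeds max_threads in total.
    """
    sockets_used = min(max_devices, sockets)
    physical_cores = sockets_used * cores_per_socket
    return min(max_threads, physical_cores)


# Four sockets with 12 physical cores each, asked for 2 devices and 96 cores.
print(expected_compute_threads(2, 96, sockets=4, cores_per_socket=12))  # 24

# Two sockets with 8 physical cores each (16 total), same request.
print(expected_compute_threads(2, 96, sockets=2, cores_per_socket=8))   # 16
```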

What's more, it sets its own CPU affinity on its compute threads based on NUMA domains. I haven't yet figured out what would happen if I turned on HTCondor's affinity enforcement; wish me luck on that.

As a result, the "number of cores" value on the command line isn't really relevant to the request_cpus value in the submit description.

So I think I need to come up with a set of machine benchmark attributes and a request_cpus expression that take the number of sockets, the number of cores, and hyperthreading into account, in order to ensure that the RequestCpus attribute of a submitted job actually matches what the application will decide to use, and that the user can specify both a minimum and a maximum number of compute threads. I'd also like something in the job history to record how many CPU cores the job actually claimed.

Needless to say, Brian Bockelman's dynamic CPU tricks from last year's HTCondor Week sprang immediately to mind. Also, the DETECTED_PHYSICAL_CPUS value from the condor_config_val -dump output would be a useful piece of information to have in the machine ClassAd.

So as part of this little undertaking, I'm wondering if anyone has any scraps of code they'd be willing to share that would allow the startd to advertise the number of processor sockets on a Windows machine? I've got Linux pretty well in hand with /proc/cpuinfo and the "physical id" value, but I'm a bit of a neophyte when it comes to that level of Windows detail.
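For reference, the Linux side can be sketched roughly as follows in Python: count the distinct "physical id" values in /proc/cpuinfo. The function name is made up for illustration, and the sketch assumes a /proc/cpuinfo layout that includes "physical id" lines, as on typical x86 Linux systems. If no such lines are present, it returns 0.

```python
def count_sockets(cpuinfo_text):
    """Count distinct 'physical id' values in /proc/cpuinfo-style text."""
    physical_ids = set()
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key<tab>: value"; skip lines without a colon.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "physical id":
            physical_ids.add(value.strip())
    return len(physical_ids)


# Trimmed sample: two logical processors on socket 0 and two on socket 1.
sample = """\
processor\t: 0
physical id\t: 0
processor\t: 1
physical id\t: 0
processor\t: 2
physical id\t: 1
processor\t: 3
physical id\t: 1
"""
print(count_sockets(sample))  # 2
```

On a real machine the input would be the contents of /proc/cpuinfo, for example `count_sockets(open("/proc/cpuinfo").read())`.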

Thanks for any suggestions!

	-Michael Pelletier.