
Re: [HTCondor-users] check for cpu instruction?



Hi Mike,

What you need is already there: machines advertise a Boolean has_ssse3 attribute, which you can reference as "TARGET.has_ssse3" in a requirements expression. Add it to the job's requirements, and the job will only run on machines that support that instruction set:

Requirements = $(requirements:True) && (TARGET.has_ssse3)

The machine ClassAd includes a variety of other instruction-set attributes, though not every potentially useful processor feature is covered. You can find the current list here:

http://research.cs.wisc.edu/htcondor/manual/v8.6/12_Appendix_A.html#106106

We tend to use has_avx2, since our pools contain only Intel processors, with a mix of AVX and AVX2 machines. One simulation job runs about four times faster on AVX2 than on AVX machines (5 hours versus 20 hours), so there's no point in running it on AVX. The AVX-512 attributes are not yet advertised by default, but you can set up a startd_cron benchmark job that reads the flags line of /proc/cpuinfo and advertises anything else you might need.
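For the /proc/cpuinfo approach, here's a minimal Python sketch of what such a probe might look like. It assumes the script is registered through the STARTD_CRON_JOBLIST and STARTD_CRON_<name>_EXECUTABLE settings (not shown) and that its stdout is read back as machine ClassAd attributes. The has_avx512* attribute names and the choice of flags are my own assumptions, picked to match the built-in has_* naming.

```python
#!/usr/bin/env python3
"""Sketch of a startd_cron probe that advertises AVX-512 flags.

Assumption: the has_<flag> attribute names and the flag selection below
are illustrative choices, not attributes defined by HTCondor itself.
"""

# Flags (spelled as in /proc/cpuinfo) to advertise. The built-in attributes
# already cover ssse3, sse4_1, sse4_2, avx and avx2.
WANTED_FLAGS = ("avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def classad_lines(flags):
    """Build one 'has_<flag> = True/False' line per wanted flag."""
    return ["has_%s = %s" % (f, "True" if f in flags else "False")
            for f in WANTED_FLAGS]


def main(path="/proc/cpuinfo"):
    try:
        with open(path) as fh:
            flags = cpu_flags(fh.read())
    except OSError:
        # If cpuinfo can't be read, advertise every flag as False.
        flags = set()
    # One attribute assignment per line on stdout for the startd to read.
    for line in classad_lines(flags):
        print(line)


if __name__ == "__main__":
    main()
```

Advertising an explicit False (rather than omitting the attribute) keeps a job requirement like TARGET.has_avx512f from evaluating to UNDEFINED on machines without the flag.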

As for auto-detection, a suggestion I received a couple of weeks ago, for automatically looking up users' e-mail addresses and adding them to the submit description, could potentially work for you. In the HTCondor configuration on the submit host, I added:

IsSubmit = False
SUBMIT.IsSubmit = True
if $(IsSubmit)
    include command : $(SITE_LIBEXEC)/username_email -submit $ENV(LOGNAME) 2>/dev/null
    if defined Notify_User
        SUBMIT_ATTRS = $(SUBMIT_ATTRS) NotifyUser JobNotification
        NotifyUser = "$(Notify_User)"
        JobNotification = ifThenElse( (isUndefined(InteractiveJob) || InteractiveJob =!= True) && ProcID + 1 == TotalSubmitProcs, 1, 0)
    endif
endif

When condor_submit reads this configuration, it runs the "username_email" script with the value of the $LOGNAME environment variable as its argument, and the script prints the following submit description line:

notify_user = michael.v.pelletier@xxxxxxxxxxxx

If that line defines Notify_User, the SUBMIT_ATTRS line adds the job ClassAd attributes NotifyUser and JobNotification to the submitted job ClassAd, and the next two lines define their values. Anything explicitly specified in the submit description overrides these settings.
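To make the "include command" contract concrete, here's a hypothetical stand-in for username_email in Python. The lookup table and the example.com domain are made up; the only point is that whatever the script prints on stdout is read back as configuration, and printing nothing leaves Notify_User undefined, so the "if defined" block is simply skipped.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for a site's username_email script.

The ADDRESSES table and example.com domain are invented for illustration;
a real site would query LDAP, a directory file, etc.
"""
import sys

ADDRESSES = {
    "alice": "alice@example.com",
}


def submit_lines(username):
    """Return the lines to print for `username` (empty if unknown)."""
    address = ADDRESSES.get(username)
    if address is None:
        # Print nothing, so 'if defined Notify_User' stays false.
        return []
    return ["notify_user = %s" % address]


def main(argv):
    # Mirrors the invocation in the config: username_email -submit <user>
    if len(argv) == 3 and argv[1] == "-submit":
        for line in submit_lines(argv[2]):
            print(line)


if __name__ == "__main__":
    main(sys.argv)
```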

For your case, you would need the "include command" script to examine the "executable" from the submit description to determine whether it uses any of the relevant instruction sets, and then add a requirements line to the submission instead of a notify_user line, for example:

include command : check_instr_set $(Executable)
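Assuming something has already worked out which instruction sets the executable uses, the output half of a hypothetical check_instr_set could map those names to machine attributes and build the requirements line. Here's a Python sketch of that step; the mapping table is my assumption, limited to has_* attributes the machine ClassAd already advertises.

```python
"""Output half of a hypothetical check_instr_set script.

Maps detected instruction-set names to machine ClassAd attributes and
builds a requirements line that keeps any previously defined requirements.
"""

# Instruction-set name -> machine attribute (assumed, deliberately small).
MACHINE_ATTRS = {
    "ssse3": "has_ssse3",
    "sse4_1": "has_sse4_1",
    "sse4_2": "has_sse4_2",
    "avx": "has_avx",
    "avx2": "has_avx2",
}


def requirements_line(detected):
    """Return the requirements line for `detected`, or None if none apply."""
    # Ignore names we have no attribute for; de-duplicate and sort for
    # stable output.
    attrs = sorted({MACHINE_ATTRS[n] for n in detected if n in MACHINE_ATTRS})
    if not attrs:
        return None
    clauses = " && ".join("(TARGET.%s)" % a for a in attrs)
    return "requirements = $(requirements:True) && %s" % clauses
```

For example, requirements_line(["ssse3"]) returns requirements = $(requirements:True) && (TARGET.has_ssse3), the same expression as the one at the top of this message.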

One possible pitfall, however: if the user specifies their own requirements line without a $(requirements:true) stanza, the instruction-set requirement will be overridden. Maybe some job transform incantation could address that?
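Whatever mechanism ends up applying it, the rule such a transform would need is to AND the instruction-set clause onto the user's expression rather than replace it, and to parenthesize the user's part. Here's a small Python sketch of that string manipulation; add_clause is a hypothetical helper, not an HTCondor API.

```python
def add_clause(existing, clause="TARGET.has_ssse3"):
    """AND `clause` onto an existing requirements expression (hypothetical helper).

    The user's expression is wrapped in parentheses so that 'A || B'
    becomes '(A || B) && (C)' rather than 'A || B && (C)'.
    """
    wrapped = "(%s)" % clause
    if existing is None or not existing.strip():
        return wrapped
    if wrapped in existing:
        # Crude substring check: the clause is already there, leave it alone.
        return existing
    return "(%s) && %s" % (existing.strip(), wrapped)
```

Without the parentheses, a user expression such as A || B would become A || B && (TARGET.has_ssse3), which ClassAd operator precedence reads as A || (B && TARGET.has_ssse3).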

Determining whether a given binary needs specific instruction sets is rather tricky business, apparently. There's a good stab at it here using objdump:

https://superuser.com/questions/726395/how-to-check-if-a-binary-requires-sse4-or-avx-on-linux 
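Along the lines of that answer, here's a rough Python sketch that runs objdump -d (GNU binutils) and looks for a few mnemonics specific to each extension. The mnemonic lists are a small, incomplete sample picked for illustration, and a static scan can mislead in both directions: it flags code paths that are only taken on newer CPUs (for example, compiler-generated CPU dispatch), and it misses anything not in the lists or living in shared libraries.

```python
#!/usr/bin/env python3
"""Rough sketch: guess which SIMD extensions an x86-64 binary uses."""
import subprocess
import sys

# Small, deliberately incomplete samples of extension-specific mnemonics
# (as printed by objdump's default AT&T syntax).
MNEMONICS = {
    "ssse3": {"pshufb", "palignr", "phaddw", "phaddd", "pmaddubsw",
              "pmulhrsw", "pabsb", "pabsw", "pabsd", "psignb"},
    "sse4_1": {"pblendvb", "ptest", "pminsd", "pmaxsd", "pmulld",
               "roundps", "roundpd", "dpps", "pextrd", "pinsrd"},
    "sse4_2": {"pcmpestri", "pcmpistri", "pcmpgtq"},
    "avx": {"vzeroupper", "vbroadcastss", "vinsertf128",
            "vextractf128", "vperm2f128"},
    "avx2": {"vpbroadcastd", "vperm2i128", "vinserti128",
             "vextracti128", "vpermd", "vpgatherdd", "vpsllvd"},
}


def detect(disassembly):
    """Return sorted extension names whose sample mnemonics appear."""
    found = set()
    for line in disassembly.splitlines():
        # Instruction lines look like: "  addr:\t<bytes>\t<mnemonic operands>"
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # headers, labels, or byte-only continuation lines
        words = set(parts[2].split())  # also catches prefixed mnemonics
        for ext, names in MNEMONICS.items():
            if words & names:
                found.add(ext)
    return sorted(found)


def disassemble(path):
    """Return `objdump -d` output for `path` (requires GNU binutils)."""
    result = subprocess.run(["objdump", "-d", path],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, " ".join(detect(disassemble(path))) or "(none found)")
```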

Good luck!

	-Michael Pelletier.

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael Di Domenico
Sent: Wednesday, May 16, 2018 3:00 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] [HTCondor-users] check for cpu instruction?

I came across an unusual job failure today. I have both AMD and Intel CPUs in my pool, some older, some newer. A user compiled a program on their desktop, which has a pretty new Intel chip, using the Intel compiler. As expected, the compiler looked at the local chip and added some extra optimizations; in this case, specifically, it used SSSE3.
When the user submitted the program to Condor, it ran on a large number of nodes but failed on others, specifically the AMD chips that don't support SSSE3.

Is there a way for Condor to check whether a CPU has all the instructions an executable might need before it runs?

Clearly I could put in a ClassAd attribute for ssse3 true/false, but chances are the users are not going to know which CPU instructions their program requires, and accordingly will not set a flag in the submit file. And I certainly don't want to do this for all the possible flags. That all seems pretty messy; hopefully someone else has already solved this issue.