Re: [HTCondor-users] check for cpu instruction?
- Date: Thu, 17 May 2018 08:30:02 -0400
- From: Michael Di Domenico <mdidomenico4@xxxxxxxxx>
- Subject: Re: [HTCondor-users] check for cpu instruction?
On Wed, May 16, 2018 at 5:33 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
> There is no way for condor to do this. To do this completely would require
> solving the halting problem, which is beyond the scope of our research.
i'm not familiar with "the halting problem", but i agree this is a
one-off and probably not high on anyone's (even my) list
> In practice, though, there are some approaches that may help. The program
> that tries to execute an instruction which doesn't exist should get killed
> with SIGILL (Illegal instruction). If this program is the top-level process
> in your job (i.e. there is no wrapper script), Condor will at least see that
> the program got a SIGILL, and you can administratively do something about
> that. (Put the job on hold, retry on a different machine model number etc.)
we do run our jobs through a condor job wrapper. i'm curious how you
would retry the job on a different machine model number, though. is
there some chunk of classad code that the user would have to put in
their submit file, or is there something i can cram into the main config?
this would presumably work and be the shortest path for me; i don't
care if the job restarts a few times.
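
for what it's worth, the wrapper matters here: if it runs the real program as a child process instead of exec-ing it, condor only sees how the wrapper itself ended, not the SIGILL. below is a minimal sketch, in python, of a wrapper that passes the child's fatal signal through by dying from the same signal. the names death_signal and run_and_propagate are made up for illustration and are not part of condor; this assumes a POSIX system and is not how any particular site's wrapper is written.

```python
import os
import signal
import subprocess
import sys


def death_signal(returncode):
    """Return the signal number if the child was killed by a signal, else None.

    subprocess reports death-by-signal as a negative return code on POSIX.
    """
    return -returncode if returncode < 0 else None


def run_and_propagate(argv):
    """Run argv as a child; if a signal killed it, die from the same signal.

    Hypothetical helper: re-raising the signal means the parent of this
    wrapper sees a signal death (e.g. SIGILL) rather than a plain exit code.
    """
    result = subprocess.run(argv)
    sig = death_signal(result.returncode)
    if sig is not None:
        # Restore the default action so the signal really terminates us.
        signal.signal(sig, signal.SIG_DFL)
        os.kill(os.getpid(), sig)
    return result.returncode


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(run_and_propagate(sys.argv[1:]))
```

a shell wrapper would instead see the child's status as 128 plus the signal number (132 for SIGILL on linux), so it would need its own way of reporting that upward.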