
Re: [HTCondor-users] job does not run



On Wed, 2019-06-19 at 07:59 +0200, Valerio Bellizzomi wrote:
> On Tue, 2019-06-18 at 13:46 -0500, Greg Thain wrote:
> > On 6/18/19 1:38 PM, Todd Tannenbaum wrote:
> > >
> > > If you really wanted, you could explicitly work around this by giving
> > > sudo access to whatever user accounts (uids) are being used to run jobs
> > > on your machines.  Then your job could use sudo to perform actions with
> > > root access.  With sudo, you could limit what actions jobs could perform
> > > as root and also have audit logs available.
> > 
> > 
> > Frequently, GPU devices are configured by sites to be writable by 
> > members of some certain Unix group, and either the slot user or the 
> > run-as-owner user are added to that group.
> > 
> > 
> > -greg
> 
> 
> I have added the users to the video group, but condor_startd still displays
> this error:
> 
> ocl.getPlatformIDs returned error=-1001 and 0 platforms
> 
> 
> However, condor_gpu_discovery is working well and detects both GPUs
> (OCL0, OCL1).


Update: I had to be operational quickly, and because HTCondor does not
allow jobs to run as root, this was a blocking issue that had to be
resolved. I have switched to a Slurm-based cluster, because Slurm can be
configured to enable or disable root jobs.
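
For anyone who lands on this thread with the same error: -1001 is
CL_PLATFORM_NOT_FOUND_KHR, meaning the OpenCL ICD loader found no usable
platform. One thing worth ruling out is whether the process that makes the
OpenCL call really has group write access to the GPU device nodes, since
group changes only apply to processes started after the change. Below is a
minimal sketch of such a check, not a confirmed diagnosis of the problem
above. The device paths are assumptions and depend on the driver in use,
and the helper names are made up for illustration.

```python
import os
import stat


def group_can_write(mode, file_gid, process_gids):
    """Return True if the group-write bit is set on the file and the
    process belongs to the file's owning group."""
    return bool(mode & stat.S_IWGRP) and file_gid in process_gids


def check_device(path):
    """Check group write access to `path` for the *current* process."""
    st = os.stat(path)
    # Supplementary groups plus the effective primary group of this process.
    gids = set(os.getgroups()) | {os.getegid()}
    return group_can_write(st.st_mode, st.st_gid, gids)


if __name__ == "__main__":
    # Hypothetical device nodes; adjust to match the installed driver.
    for dev in ("/dev/dri/card0", "/dev/dri/renderD128", "/dev/nvidia0"):
        if os.path.exists(dev):
            print(dev, "group-writable by this process:", check_device(dev))
        else:
            print(dev, "not present")
```

Running this as a job, rather than from a login shell, shows the groups the
job actually has. If it reports no access even though the user was added to
the group, restarting the daemons so they pick up the new membership may be
worth trying.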