Re: [HTCondor-users] job does not run
- Date: Sun, 30 Jun 2019 06:38:50 +0200
- From: Valerio Bellizzomi <valerio@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] job does not run
On Wed, 2019-06-19 at 07:59 +0200, Valerio Bellizzomi wrote:
> On Tue, 2019-06-18 at 13:46 -0500, Greg Thain wrote:
> > On 6/18/19 1:38 PM, Todd Tannenbaum wrote:
> > >
> > > If you really wanted, you could explicitly work around this by giving
> > > sudo access to whatever user accounts (uids) are being used to run jobs
> > > on your machines. Then your job could use sudo to perform actions with
> > > root access. With sudo, you could limit what actions jobs could perform
> > > as root and also have audit logs available.
> > Frequently, sites configure GPU devices to be writable by members of
> > a particular Unix group, and either the slot user or the
> > run-as-owner user is added to that group.
> > -greg
> I have added the users to the video group, but condor_startd still
> displays this error:
> ocl.getPlatformIDs returned error=-1001 and 0 platforms
> However, condor_gpu_discovery is working well and detects both GPUs
> (OCL0, OCL1).
Update: I had to be operational quickly, and because HTCondor does not
allow jobs to run as root, this was a blocking issue that had to be
resolved. I have switched to a Slurm-based cluster, because Slurm can be
configured to enable or disable root jobs.
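
For anyone trying the group-permission approach quoted above, a small
check run as the job's user (or as the condor user) can show which groups
the process actually holds and whether the GPU device nodes are readable
and writable. This is only a sketch: the device paths (/dev/dri/*,
/dev/nvidia*) and the ICD directory /etc/OpenCL/vendors are assumptions
about a typical Linux OpenCL installation, and the function names are
made up for illustration. Error -1001 is CL_PLATFORM_NOT_FOUND_KHR, which
the OpenCL ICD loader returns when it finds no usable platform.

```python
import glob
import grp
import os


def process_group_names():
    """Names of the groups the current process actually holds."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # gid with no entry in the group database: report the number
            names.add(str(gid))
    return names


def access_report(paths):
    """Map each path to (readable, writable) for the current process."""
    return {p: (os.access(p, os.R_OK), os.access(p, os.W_OK)) for p in paths}


def main():
    print("groups held by this process:", sorted(process_group_names()))
    # Assumed locations; adjust for the site and GPU vendor.
    candidates = (
        glob.glob("/dev/dri/*")
        + glob.glob("/dev/nvidia*")
        + glob.glob("/etc/OpenCL/vendors/*.icd")
    )
    for path, (readable, writable) in sorted(access_report(candidates).items()):
        print(path,
              "read" if readable else "no-read",
              "write" if writable else "no-write")


if __name__ == "__main__":
    main()
```

Note that a process only picks up new group memberships when it is
started, so a daemon that was already running when the user was added to
the group will not list that group until it is restarted.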