Re: [HTCondor-users] job does not run



Hi,
is there a way to tell condor_startd to start as root and to launch
programs as root?

Thanks.

On Tue, 2019-06-18 at 07:16 +0200, Valerio Bellizzomi wrote:
> Hi,
> to try to get rid of the -1001 error, I have set up the following on
> the execute node:
> 
> SLOT_TYPE_1 = cpus=1, gpus=1
> NUM_SLOTS_TYPE_1 = 2
> 
> and
> 
> SLOT1_USER = root
> SLOT2_USER = root
> 
> but the error is still present.
> 
> 
> On Mon, 2019-06-17 at 20:08 +0200, Valerio Bellizzomi wrote:
> > Hi,
> > apart from the other issues, I ran a test on the execute node. I think
> > the job remains idle because of an error. I ran condor_startd by hand
> > on machine compute02 and got this error:
> > 
> > ocl.getPlatformIDs returned error=-1001 and 0 platforms
> > 
> > That means the OpenCL ICD is not found (-1001 is
> > CL_PLATFORM_NOT_FOUND_KHR), which is odd because I can run the job
> > locally on the execute node, so OpenCL is installed correctly. The only
> > explanation I can think of is that the process lacks the privileges to
> > access the OpenCL platform, yet I am running condor_startd as root.
> > 
> > 
> > 
> > -------- Forwarded Message --------
> > From: Valerio Bellizzomi <valerio@xxxxxxxxxx>
> > Reply-to: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> > To: htcondor-users@xxxxxxxxxxx
> > Subject: Re: [HTCondor-users] job does not run
> > Date: Mon, 17 Jun 2019 17:31:46 +0200
> > 
> > On Mon, 2019-06-17 at 11:52 +0000, Bockelman, Brian wrote:
> > > 
> > > > On Jun 17, 2019, at 2:28 AM, Steffen Grunewald <steffen.grunewald@xxxxxxxxxx> wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > On Sun, 2019-06-16 at 16:10:00 +0200, Valerio Bellizzomi wrote:
> > > >> Greetings,
> > > >> after submitting a job, the job is in idle state. Diagnostics with
> > > >> condor_q -analyze show "no match found".
> > > >> 
> > > >> In the submit file I have:
> > > >> 
> > > >> RANK = (Machine == "compute02")
> > > > 
> > > > Please verify (using e.g. condor_status -l compute02) that the machine
> > > > name is correct (is there no domain part?)
> > > > 
> > > >> 1) is this sufficient to select the target machine ?
> > > > 
> > > > With the correct string, IMHO yes
> > > 
> > > Do note that you used "RANK" and not "REQUIREMENTS" -- the job will show a preference for "compute02" if there are multiple available compute hosts.  However, it will still be allowed to run on any host.
> > > 
> > > It might be useful to post the output of "condor_q -better-analyze".  Another thing that could be going wrong is that the Machine attribute is using a FQDN ("compute02.example.com") whereas you are only querying the host ("compute02").
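> > > 
> > > A minimal sketch of the hard-constraint version, assuming the Machine attribute really is the placeholder FQDN used above (the real value should be checked with condor_status first):
> > > 
> > > ```
> > > # Submit description fragment (illustrative only; the hostname is a placeholder)
> > > # requirements is a hard constraint: the job can only match this machine
> > > requirements = (Machine == "compute02.example.com")
> > > ```
> > > 
> > > With RANK alone, the same expression only orders the machines that already match.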
> > 
> > Hi,
> > I have verified that the compute02 node has a problem: the ps command
> > shows condor_procd running but not condor_startd. Master and Startd are
> > listed in the configuration, but condor_startd does not start at first.
> > 
> > The second problem, which I found and corrected: the Schedd was not
> > running on the central manager machine. I was using the DAEMON_LIST
> > generated by the condor_configure --type=manager command, and the
> > schedd was not in that list.
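> > 
> > As a rough sketch of what the daemon lists would look like (the exact
> > lists are assumptions for illustration, not copied from the real
> > configuration files):
> > 
> > ```
> > # On the central manager, if it also accepts job submissions (assumed role)
> > DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
> > 
> > # On an execute node such as compute02, in its own local configuration
> > DAEMON_LIST = MASTER, STARTD
> > ```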
> > 
> > > Brian
> > > 
> > > > 
> > > >> 2) where is the htcondor log file for the job ?
> > > > 
> > > > Did you specify a path in your submit file?
> > > > 
> > > > - S
> > > > _______________________________________________
> > > > HTCondor-users mailing list
> > > > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > > > subject: Unsubscribe
> > > > You can also unsubscribe by visiting
> > > > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > > > 
> > > > The archives can be found at:
> > > > https://lists.cs.wisc.edu/archive/htcondor-users/