
[HTCondor-users] Help on running HTCondor as root




Ciao,
I am experimenting with an opportunistic workflow for CMS, in which HTCondor starts in a Docker container using uCERNVM + Parrot.
Basically, the image contains just the kernel; /usr, /bin, etc. are provided by CVMFS through Parrot.
One of the limitations of this environment is that setuid commands do not work (they are trapped by Parrot), so in the end you are root and cannot become any other user.
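
To make the limitation concrete, here is a minimal sketch of how one could check whether a process is able to switch to another user inside the container. It assumes Python is available there; the helper name can_switch_uid is hypothetical and is not part of HTCondor or Parrot. The attempt is made in a forked child so that the calling process keeps its own identity.

```python
import os


def can_switch_uid(target_uid: int) -> bool:
    """Return True if a forked child process can setuid() to target_uid."""
    pid = os.fork()
    if pid == 0:
        # Child: try the switch and report the outcome through the exit status.
        try:
            os.setuid(target_uid)
            ok = os.getuid() == target_uid
        except OSError:
            # Covers PermissionError and any other errno returned by setuid().
            ok = False
        os._exit(0 if ok else 1)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    # 65534 is commonly the "nobody" account; this is an assumption, adjust as needed.
    print("can switch to uid 65534:", can_switch_uid(65534))
```

If this reports False when run as root, the starter's attempt to drop privileges before launching a job can be expected to fail in the same way.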

I have reached the point where the CMS software stack fully works, and I can start HTCondor and connect to the integration HTCondor pool. I can also direct CMS analysis jobs there, and they are seen and reported by the local startd.

The problem is that after the resource has moved to Busy, it returns to Idle soon afterwards. I had a look at the Collector logs, and the error there (from the shadow log) is:

10/15/15 12:34:15 (10807.0) (127491): Job 10807.0 going into Hold state (code 6,666666): Error from slot1@084dee5bd71e: Failed to execute '/root/condor_job_wrapper.sh' with arguments /home/glidein_pilot/dir_9616/condor_exec.exe -a sandbox.tar.gz --sourceURL=https://cmsweb.cern.ch/crabcache --jobNumber=8 --cmsswVersion=CMSSW_7_3_5_patch2 --scramArch=slc6_amd64_gcc491 --inputFile=job_input_f


The script exists and does not show any problem.

We tried again with a simple sleep job, in order to rule out CMS-specific problems, and we get basically the same error.
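
For reference, a test job of this kind could be described roughly as follows. This is only a sketch, not the exact submit file we used: the function name and the output file names are hypothetical, and it assumes /bin/sleep is available to the job.

```python
def sleep_job_submit_description(seconds: int = 60) -> str:
    """Build the text of a minimal HTCondor submit description for a sleep job."""
    lines = [
        "universe   = vanilla",
        "executable = /bin/sleep",
        f"arguments  = {seconds}",
        # Hypothetical file names for the job's stdout, stderr and event log.
        "output     = sleep.out",
        "error      = sleep.err",
        "log        = sleep.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the description; it can be saved to a file and given to condor_submit.
    print(sleep_job_submit_description(), end="")
```

Since such a job has no CMS dependencies, a failure to execute it points at the execute environment rather than at the payload.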

We also saw errors like this in some logs:

child failed because PRIV_USER_FINAL process was still root before exec

which seems to point to a failure to switch user via setuid (which, as said, cannot work in this environment).


Is there a way to remain user "root" after a job is accepted? I tried to play a bit with the configurations, but got a bit lost.

I provide the output of condor_config_val -dump in

https://www.dropbox.com/s/mr0cqqnms6khma2/out_val_condor?dl=0

thanks a lot

tom