
Re: [HTCondor-users] Help on running HTCondor as root



The last log for today (StarterLog.slot1_1) confirms everything we have been saying, I would say, and offers some more insight:

10/21/15 15:49:39 (pid:2776) Starting a VANILLA universe job with ID: 19167.0
10/21/15 15:49:39 (pid:2776) IWD: /home/glidein_pilot/dir_2776
10/21/15 15:49:39 (pid:2776) Output file: /home/glidein_pilot/dir_2776/_condor_stdout
10/21/15 15:49:39 (pid:2776) Error file: /home/glidein_pilot/dir_2776/_condor_stderr
10/21/15 15:49:39 (pid:2776) Using wrapper /root/condor_job_wrapper.sh to exec /home/glidein_pilot/dir_2776/condor_exec.exe -a sandbox.tar.gz --sourceURL=https://cmsweb.cern.ch/crabcache --jobNumber=12 --cmsswVersion=CMSSW_7_3_5_patch2 --scramArch=slc6_amd64_gcc491 --inputFile=job_input_file_list_12.txt --runAndLumis=job_lumis_12.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=None --scriptArgs=[] -o {}
10/21/15 15:49:39 (pid:2776) Setting job's virtual memory rlimit to 0 megabytes
10/21/15 15:49:39 (pid:2776) Running job as user nobody
10/21/15 15:49:39 (pid:2776) Create_Process(/root/condor_job_wrapper.sh): child failed because PRIV_USER_FINAL process was still root before exec()
10/21/15 15:49:39 (pid:2776) Create_Process(/root/condor_job_wrapper.sh,/home/glidein_pilot/dir_2776/condor_exec.exe -a sandbox.tar.gz --sourceURL=https://cmsweb.cern.ch/crabcache --jobNumber=12 --cmsswVersion=CMSSW_7_3_5_patch2 --scramArch=slc6_amd64_gcc491 --inputFile=job_input_file_list_12.txt --runAndLumis=job_lumis_12.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=None --scriptArgs=[] -o {}, ...) failed: (errno=666666: 'Unknown error 666666')
10/21/15 15:49:39 (pid:2776) Failed to start job, exiting
10/21/15 15:49:39 (pid:2776) ShutdownFast all jobs.
10/21/15 15:49:41 (pid:2776) condor_read() failed: recv(fd=12) returned -1, errno = 104 Connection reset by peer, reading 21 bytes from <128.142.242.250:4080>.

So I guess the "Running job as user nobody" message is printed before the switch is actually attempted, and the switch fails soon after ...

tom

On Wed, Oct 21, 2015 at 3:48 PM, Tommaso Boccali <tommaso.boccali@xxxxxxxxxx> wrote:
Hi, since /usr and /bin are under CVMFS in uCERNVM, and are not real filesystems, "condor_master" is also seen via CVMFS and hence needs to run under parrot ....

tom

On Wed, Oct 21, 2015 at 3:43 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
On 10/21/2015 07:22 AM, Tommaso Boccali wrote:

Can you point me to the user change that Condor tries to do in a case like this? Which user does it try to change to?

Condor switches user by calling the setuid system call. In this case it is trying to switch to the condor user.
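
As an illustration only (this is not HTCondor's code), a setuid-based user switch followed by a "still root?" check might look like the sketch below. The function name drop_privileges and its injectable parameters are hypothetical, added so the logic can be exercised without root. A check of this kind is one way to read a "still root before exec()" message in a starter log.

```python
import os


def drop_privileges(uid, gid, *, setgroups=os.setgroups, setgid=os.setgid,
                    setuid=os.setuid, geteuid=os.geteuid):
    """Switch the current process to (uid, gid) and verify the switch.

    Hypothetical helper for illustration. The keyword arguments default to
    the real os functions and exist only so the logic can be tested with
    stand-ins.
    """
    setgroups([])  # clear supplementary groups while still privileged
    setgid(gid)    # change group first; after setuid this is no longer allowed
    setuid(uid)    # the actual user switch
    # Verify the switch took effect before going on to exec the job.
    if uid != 0 and geteuid() == 0:
        raise RuntimeError("process is still root after setuid")
```

When run as root with the defaults, this would switch the calling process to the given uid and gid; in an environment where the setuid call does not take effect, the final check raises instead of letting the job start as root.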

I still don't understand why condor needs to be started as root in this case. In fact, I don't understand why condor needs to be run under parrot. Isn't it just the payload job that needs to see cvmfs? Condor has a feature, USER_JOB_WRAPPER, that lets you write a shell wrapper that runs "around" each end user job. Would it be easier to start Condor as usual, outside of parrot, and just have the USER_JOB_WRAPPER exec the job under parrot?
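
A minimal sketch of such a wrapper, written here in Python rather than shell and assuming that the wrapper receives the job executable and its arguments as its own arguments (as in a starter log line of the form "Using wrapper ... to exec ...") and that parrot_run is on the PATH. The names build_parrot_command and main, and the wrapper path in the comment, are hypothetical.

```python
import os

# Assumed configuration on the execute node (hypothetical path):
#   USER_JOB_WRAPPER = /path/to/parrot_job_wrapper.py


def build_parrot_command(job_argv, parrot="parrot_run"):
    """Prefix the job's command line with parrot_run."""
    if not job_argv:
        raise ValueError("expected the job executable as the first argument")
    return [parrot, *job_argv]


def main(argv):
    """Replace the wrapper process with the job running under parrot."""
    cmd = build_parrot_command(argv[1:])
    # execvp does not return on success: the job keeps the wrapper's pid,
    # so Condor sees the job's own exit status.
    os.execvp(cmd[0], cmd)
```

To use this as a standalone script, the file would end with "import sys" followed by "main(sys.argv)". With this arrangement only the payload runs under parrot, while the Condor daemons start as usual outside of it.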

-greg


