
Re: [HTCondor-users] Jobs that require root permissions



Hi Brian,

I think I am getting a clear picture now -- it seems there are two (potentially unrelated) issues:

On Tue, Mar 19, 2013 at 1:50 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:

On Mar 19, 2013, at 4:47 AM, Michael Hanke <michael.hanke@xxxxxxxxx> wrote:

On Mon, Mar 18, 2013 at 9:38 PM, Brian Bockelman <bbockelm@xxxxxxxxxxx> wrote:
Hi Michael,

I suspect we are chasing an incorrect lead with respect to the job suspension; the fakeroot is being leaked to the mount namespace, not the HTCondor one (so the bug I thought of does not apply here).

However, if you add:

MOUNT_UNDER_SCRATCH=/tmp

it should make those warning/error messages go away.

Could you tell me a little more about why bind-mounting /tmp will disable the warnings? From the documentation it is not obvious to me.

Sorry, I'm too terse sometimes - 

Condor is complaining about sandbox cleaning (I think) because it is finding files owned by root in the job sandbox (there are assumptions littered throughout the code, especially sandbox cleanup, that there is only one UID for files in a sandbox; we hit similar issues when using glexec).

It sounds like the root-owned files are all from filesystems which are remounted / bind-mounted into the sandbox by pbuilder (/proc, /dev/pts).  By enabling MOUNT_UNDER_SCRATCH, HTCondor will put the job in a separate "mount namespace" that makes mounts in the job invisible to the rest of the system; this is required to give the job a private /tmp, but the private /tmp is a side-effect in this case.

Hence, /proc and /dev/pts would be invisible to the condor_starter and wouldn't be cleaned up.
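
To check which sandbox entries would trip the single-UID assumption described above, a minimal Python sketch follows. It is not HTCondor code: the helper name foreign_owned and the idea of comparing st_uid against the job's UID are assumptions made for illustration. It walks a directory tree and reports every entry whose owner differs from the expected UID, which is roughly what root-owned files from /proc or /dev/pts would look like if they were visible in the sandbox.

```python
import os

def foreign_owned(sandbox, expected_uid):
    """Return sorted paths under `sandbox` not owned by `expected_uid`.

    Hypothetical helper for illustration only; it does not reproduce
    the condor_starter's actual cleanup logic.
    """
    hits = []
    for root, dirs, files in os.walk(sandbox):
        for name in dirs + files:
            path = os.path.join(root, name)
            try:
                # lstat so symlinks are reported by their own owner
                st = os.lstat(path)
            except OSError:
                # entry vanished or is unreadable; skip it
                continue
            if st.st_uid != expected_uid:
                hits.append(path)
    return sorted(hits)
```

Run against a job's scratch directory with the job's UID, an empty result would suggest nothing in the visible tree is owned by another user. The path and UID to pass in depend on the local setup.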

This makes sense -- I'll test that out. 

What are your SUSPEND-related attributes set to on that worker node?


% condor_config_val -dump |grep -i suspend
MAXSUSPENDTIME = 10 * $(MINUTE)
SUSPEND = $(UWCS_SUSPEND)
TESTINGMODE_SUSPEND = False
TESTINGMODE_WANT_SUSPEND = False
UWCS_PREEMPT = ( ((Activity == "Suspended") && ($(ActivityTimer) > $(MaxSuspendTime))) || (SUSPEND && (WANT_SUSPEND == False)) )
UWCS_SUSPEND = ( $(KeyboardBusy) || ( (CpuBusyTime > 2 * $(MINUTE)) && $(ActivationTimer) > 90 ) )
UWCS_WANT_SUSPEND = ( $(SmallJob) || $(KeyboardNotBusy) || $(IsVanilla) ) && ( $(SUSPEND) )
VM_SOFT_SUSPEND = True
WANT_SUSPEND = $(UWCS_WANT_SUSPEND)
 
This is a dedicated cluster node -- no keyboard.

Ah - 

What does CpuBusyTime look like?  If there's enough system activity (or if the root-owned processes are not being tracked by the procd and counting as system activity), then the SUSPEND expression could trigger.
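
As a rough illustration of how the UWCS_SUSPEND line in the dump above combines its inputs, here is a small Python sketch. It only mirrors the boolean structure of that one expression; the function name uwcs_suspend and the plain-number arguments are assumptions, and real ClassAd evaluation (including undefined values) is not modeled.

```python
MINUTE = 60  # seconds, matching $(MINUTE) in the config dump

def uwcs_suspend(keyboard_busy, cpu_busy_time, activation_timer):
    """Mirror of: KeyboardBusy || ((CpuBusyTime > 2 * MINUTE) && ActivationTimer > 90).

    Hypothetical helper; arguments are a bool and two durations in seconds.
    """
    cpu_busy = cpu_busy_time > 2 * MINUTE
    return bool(keyboard_busy or (cpu_busy and activation_timer > 90))
```

On a node without a keyboard, only the second clause matters under this reading: the expression turns true once CpuBusyTime exceeds two minutes and the ActivationTimer value has passed 90 seconds.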

Ah right, it makes perfect sense! I'd actually consider that A Good Thing (TM). It seems to happen when several of these jobs get scheduled. There is nothing preventing any of these jobs from utilizing all system resources, hence the suspension. I'll check whether the combination of these two conditions actually causes a problem, or whether the bind-mount workaround/fix addresses both.


In any case: Thanks a lot -- much clearer now!

Michael