Re: [HTCondor-users] Unprivileged cgroups v2 & delegation



Maksim:


Thanks for the reply.  I suspect what was happening in my previous attempts was that ownership had not been set in all three places.  For others following along: HTCondor cares about this on both sides.  First, when HTCondor is the local batch system with rootly privilege, as it is in your case, we want the rootly condor to grant the batch job the ability to subdivide the given resources further, so that it can enforce and measure them.  Second, in the glidein case, where HTCondor runs without privilege under some other batch system, we'd like HTCondor to be able to take advantage of cgroup delegation and set up cgroups for each of the glidein slots.


-greg



For the glidein world, our goal is to have the base batch system (HTCondor in this case), which has rootly privilege

On 9/27/23 9:32 AM, Maksim Melnik Storetvedt wrote:
Hi Greg, Jeff,

I see that there are temporary safeguards in place to disable cgroups v2 if there is no root (by checking can_switch_ids()), but this change should be independent of that.

This would instead be just to allow the executing user in the slot to create new sub-cgroups in case the cgroups tree has already been set up - setting aside the specifics if this was done with/without elevated privileges beforehand.

For this to work, the executing user of the slot must be given ownership of three things in the fs: the newly created cgroup for the slot (this allows the user to create new sub-cgroups), the cgroup.procs file (this allows moving processes in the current cgroup to the new sub-cgroups), and lastly the cgroup.subtree_control file (this allows delegating controllers, e.g. memory, to the sub-cgroups) (example). The remaining files can remain owned by root.
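
For anyone following along, the three ownership changes described above could be sketched roughly as follows. This is only an illustration, not the actual HTCondor change: the function name delegate_cgroup and the example path are hypothetical, and on a real cgroup tree the call has to be made by a process that is allowed to chown those files (normally root).

```python
import os
from pathlib import Path

# The two files named in the message above; the cgroup directory itself
# is the third item whose ownership is handed over.
DELEGATED_FILES = ("cgroup.procs", "cgroup.subtree_control")


def delegate_cgroup(cgroup_dir, uid, gid):
    """Give uid:gid ownership of the slot cgroup and its two delegation files.

    Hypothetical helper: every other file in the cgroup (memory.max and so on)
    is left untouched, so it stays owned by whoever created the cgroup.
    Returns the list of paths whose ownership was changed.
    """
    root = Path(cgroup_dir)
    targets = [root] + [root / name for name in DELEGATED_FILES]

    # Fail early if this does not look like a cgroup v2 directory.
    missing = [str(p) for p in targets if not p.exists()]
    if missing:
        raise FileNotFoundError("missing: " + ", ".join(missing))

    for path in targets:
        os.chown(path, uid, gid)
    return [str(p) for p in targets]
```

A caller with sufficient privilege might use it as delegate_cgroup("/sys/fs/cgroup/<slot cgroup>", slot_uid, slot_gid), where the path and the ids are placeholders for whatever the batch system assigned to the slot.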

The benefit of the above approach is that resources can only be subdivided and delegated down. While it is possible for a sub-cgroup further down to request more resources than given, the cgroup above will still enforce its limit, preventing it from gaining these. The only way to request more resources would be by modifying the cgroup that is a level up -- and the top cgroup, outside the one given to the slot, remains (and should be) owned by root.
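
To make the "resources only go down" argument concrete, here is a small sketch of how the enforced memory limit works out, assuming the usual cgroup v2 behaviour where every ancestor's memory.max applies to its descendants. The function name effective_memory_limit is made up for this example; the strings are what one would read from memory.max at each level, from the top-most cgroup down to the sub-cgroup.

```python
def effective_memory_limit(limits_top_to_leaf):
    """Return the tightest memory.max along a cgroup path, in bytes.

    Each entry is the content of a memory.max file: either the literal
    string "max" (no limit at that level) or a byte count.
    Returns None if no level sets a limit.
    """
    effective = None
    for raw in limits_top_to_leaf:
        raw = raw.strip()
        if raw == "max":
            continue  # this level imposes no limit of its own
        value = int(raw)
        # An ancestor's limit always holds, so keep the smallest value seen.
        effective = value if effective is None else min(effective, value)
    return effective


GIB = 1024 ** 3
# Root unlimited, slot capped at 8 GiB by root, user asks for 32 GiB below it.
print(effective_memory_limit(["max", str(8 * GIB), str(32 * GIB)]) // GIB)  # prints 8
```

So a sub-cgroup that writes a larger value into its own memory.max still ends up bounded by the slot's limit, which the unprivileged user cannot change.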

Best regards,
-Maxim Storetvedt

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jeff Templon <templon@xxxxxxxxx>
Sent: 27 September 2023 09:09
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Irakli Chakaberia <irakli.chakaberia@xxxxxxx>
Subject: Re: [HTCondor-users] Unprivileged cgroups v2 & delegation
 
Hi,

How does this work?  Given ownership of the assigned cgroup, I think that it would be useful (and okay) to allow an unprivileged user to subdivide that assigned cgroup into pieces.  It would definitely NOT be okay to give a user the ability to expand the cgroup past what was assigned (please give me another 32 GB ...)

JT


On 25 Sep 2023, at 20:03, Maksim Melnik Storetvedt <maksim.melnik.storetvedt@xxxxxxx> wrote:

Dear all,

With the arrival of cgroups v2 in recent Linux distributions, there now exists means for having unprivileged cgroups and resource delegation. Is this a feature that could possibly also be added to HTCondor?

HTCondor commonly provides us (ALICE) with the slot where we run our job pilots across the Grid. These pilots have since become highly tasked with managing the resources we have within each slot, so as to best utilise the resources given to us. This process has become increasingly challenging, as we often have several user payloads running in parallel in the same slot (as seen by the BQ), and users often request arbitrary resources (cpu and memory in particular).

However, cgroups v2 provides a means for unprivileged users to delegate controllers (e.g. for memory). This would enable our pilots to further subdivide the resources given to us within each slot, allowing us to better "box-in" each subjob -- a very useful feature in our use-case. The benefit of this approach is that the unprivileged user is not able to request or delegate more resources than what was originally given to the slot, but only to subpartition those existing resources.

For this to work though, the unprivileged user must first be given ownership of the new cgroup given to them by Condor, as well as the cgroup.subtree_control and cgroup.procs files within that cgroup. Is there a chance this could be provided by Condor? As an example, adding the following lines (diff) enables us to use this feature within recent versions.

Best regards,
-Maxim Storetvedt
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

