Use private cgroup namespaces for cgroup v2 #63

Draft: wants to merge 1 commit into base: main

Conversation

@twz123 (Member) commented Dec 18, 2023

Using the host's cgroup namespace together with a writable mount of the entire cgroup fs weakens container isolation considerably. The only reason for that setup is to give containers a writable mount of the cgroup fs, so that init systems can set up their own cgroups.

Use a different approach to achieve the same effect: a private cgroup namespace. Privileged containers automatically get write access, so an explicit read-write mount is performed only for non-privileged containers.

Comment on lines +313 to +337
// FIXME: How to clean this up? Especially when Docker is being run
// on a different machine?
Member Author

I think this is the only remaining concern with this approach. It greatly improves the isolation of bootloose machines, but none of the cgroups created inside those machines will be cleaned up. They would be if we could somehow leverage the Docker-managed cgroups, but the chicken-and-egg problem stated above prevents that.

Member Author

Maybe we could run bootloose itself in a Docker container to do the cleanup? That would be a fairly heavyweight cleanup procedure, but it is the only way I can think of to tackle this.

Signed-off-by: Tom Wieczorek <[email protected]>