Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capabilities, priority, and oom-score-adjustment support on Linux #404

Merged
merged 2 commits into from
Nov 18, 2024

Conversation

q66
Copy link
Contributor

@q66 q66 commented Oct 19, 2024

This is not properly tested yet and does not come with documentation. I mostly want to get an opinion on it for now.

This adds 3 things; when dinit is compiled without support, all of them will raise errors:

  1. The capabilities service field. This is an IAB string; it's parsed with cap_iab_from_string so it can take any value that can parse, e.g. ^cap_sys_time,^cap_net_admin for ambient caps. It supports the += operator, which will append to it, delimiting with a comma.
  2. The secure-bits service field, which implements securebits flags as a companion to the above.
  3. The new option no-new-privs which will use prctl to prevent gaining new privileges across execve (e.g. suid).

For now I want to get an opinion on the overall implementation. One thing I am not sure about for example is the IAB string parsing; I currently do it in load-service.cc because it must be done after all the concats have been completed. I find it to be a bit out of place there, however, so maybe there is a better way. Or if you have any other clues about anything, etc...

Fixes #398

@mobin-2008 mobin-2008 added Enhancement/New Feature Improving things or introduce new feature P-Linux Things related to Linux A-Importance: Normal C-dinit-service Things about dinit-service meson Things about Dinit's Meson build system make Things about Dinit's Make build system labels Oct 19, 2024
@q66
Copy link
Contributor Author

q66 commented Oct 19, 2024

I also wonder if i should add functionality for setting scheduling and i/o priorities while at it, and perhaps oom-killer score adjustment (they are useful to have especially for services that change uid, as any wrapper will already run with decreased privileges)? i could also do it later, but it's simple enough to do...

i guess all 3 of these should also be specific to linux at this point, because the implementation will differ and so will the range? should these also be behind an option like cgroups? or just #ifdef __linux__?

@q66
Copy link
Contributor Author

q66 commented Oct 20, 2024

this is what the setting of the 3 values would look like without the parsing (the input values for nice would go integer -20 to 19, for ionice it would probably be good to parse it properly which would mean none, rt:0 to rt:7, be:0 to be:7 and idle, for oom adjustment it should take -1000 to 1000), and guards:

diff --git a/src/includes/proc-service.h b/src/includes/proc-service.h
index bb7ed42..8d622cd 100644
--- a/src/includes/proc-service.h
+++ b/src/includes/proc-service.h
@@ -43,6 +43,9 @@ struct run_proc_params
     bool on_console;          // whether to run on console
     bool in_foreground;       // if on console: whether to run in foreground
     bool unmask_sigint = false; // if in foreground: whether to unmask SIGINT
+    int nice = INT_MIN;
+    int ionice = INT_MIN;
+    int oom_adj = INT_MIN;
     int wpipefd;              // pipe to which error status will be sent (if error occurs)
     int csfd;                 // control socket fd (or -1); may be moved
     int socket_fd;            // pre-opened socket fd (or -1); may be moved
diff --git a/src/run-child-proc.cc b/src/run-child-proc.cc
index b206b6d..ef9392b 100644
--- a/src/run-child-proc.cc
+++ b/src/run-child-proc.cc
@@ -65,6 +65,9 @@ void base_process_service::run_child_proc(run_proc_params params) noexcept
     const char *working_dir = params.working_dir;
     const char *logfile = params.logfile;
     bool on_console = params.on_console;
+    int nice = params.nice;
+    int ionice = params.ionice;
+    int oom_adj = params.oom_adj;
     int wpipefd = params.wpipefd;
     int csfd = params.csfd;
     int notify_fd = params.notify_fd;
@@ -301,6 +304,38 @@ void base_process_service::run_child_proc(run_proc_params params) noexcept
         if (setrlimit(limit.resource_id, &setlimits) != 0) goto failure_out;
     }
 
+    // nice
+    if (nice != INT_MIN) {
+        if (setpriority(PRIO_PROCESS, getpid(), nice) != 0) goto failure_out;
+        // we usually create a new session leader; that makes nice not very
+        // useful as the Linux kernel will autogroup processes by session id
+        // except when disabled - so also work around this where enabled
+        // the r+ is used in order to avoid creating it where already disabled
+        errno = 0;
+        FILE *ag = std::fopen("/proc/self/autogroup", "r+");
+        if (ag) {
+            std::fprintf(ag, "%d\n", nice);
+            std::fclose(ag);
+        }
+        else if (errno != ENOENT) goto failure_out;
+    }
+
+    // ionice
+    if (ionice != INT_MIN) {
+        #ifdef __NR_ioprio_set
+        if (syscall(__NR_ioprio_set, 1, (int)getpid(), ionice) != 0) goto failure_out;
+        #endif
+    }
+
+    // oom score adjustment
+    if (oom_adj != INT_MIN) {
+        errno = 0;
+        FILE *adj = std::fopen("/proc/self/oom_score_adj", "w");
+        if (!adj) goto failure_out;
+        std::fprintf(adj, "%d\n", oom_adj);
+        std::fclose(adj);
+    }
+
     #if SUPPORT_CGROUPS
     if (params.run_in_cgroup != nullptr && *params.run_in_cgroup != 0) {
         err.stage = exec_stage::ENTER_CGROUP;

@q66 q66 force-pushed the capabilities branch 3 times, most recently from 3679b5c to 16ab75c Compare October 20, 2024 15:27
@q66 q66 changed the title [RFC] Add capabilities support on Linux [RFC] Add capabilities, priority, and oom-score-adjustment support on Linux Oct 20, 2024
@q66
Copy link
Contributor Author

q66 commented Oct 20, 2024

I implemented everything. What's left now is basically

  1. documentation
  2. testing
  3. deciding how to guard the features; whether we should have a separate feature for each mainly
  4. the cap parsing thing
  5. maybe something else? you tell me

@davmac314
Copy link
Owner

No strong objections/opinions from me, though as usual good documentation will be important.

nice/io_nice/oom_adj are probably unnecessary when cgroups are used properly. If they were to be supported, documentation must clearly state the requirements, i.e. /proc must be mounted for oom_adj.

One thing I am not sure about for example is the IAB string parsing; I currently do it in load-service.cc

I think that's fine tbh.

It might be worth gating Linux-specific features behind something other than __linux__, so you can use it in documentation too (and so those that prefer to use cgroups could just disable these features).

Setting for nice should be possible on other OSes, setpriority and nice are both part of POSIX (and I think you can use nice rather than setpriority unless I've missed something). The autogroups hack is unfortunate, I guess there's no way around it though.

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

well, nice will add to the current level, while setpriority actually sets it, so there is a difference - both are posix though...

the main non-portability for that is range checking, as the acceptable range of values may vary depending on systems

maybe i could

  1. implement nice unconditionally - not sure about the range checking part though? setpriority will clamp the input, so if you set it too high or too low, it will use the nearest supported value; but either way, do the autogroups hack only for linux
  2. put the ionice behind its own feature
  3. put the oom-score-adj behind its own feature

@davmac314
Copy link
Owner

well, nice will add to the current level, while setpriority actually sets it, so there is a difference

Ah right, yeah.

the main non-portability for that is range checking, as the acceptable range of values may vary depending on systems

True, but I don't think we need to be overly fussed about checking it ourselves. Perhaps add a flag to indicate whether the value is set rather than relying on INT_MIN as a sentinel value.

maybe i could [...]

Yep, that sounds reasonable to me.

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

ok, i think i should have a cleaned up PR sometime this evening...

@q66
Copy link
Contributor Author

q66 commented Oct 21, 2024

I added documentation, feature flags, and improved the resource management for the capabilities IAB object, which should be a lot cleaner now.

It should be reviewable now, though I still need to test it a bunch more.

@q66 q66 changed the title [RFC] Add capabilities, priority, and oom-score-adjustment support on Linux Add capabilities, priority, and oom-score-adjustment support on Linux Oct 21, 2024
@davmac314
Copy link
Owner

Will review this weekend

Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I actually have a few minor things that I think would be good to change, as noted. Thanks for the PR.

src/includes/load-service.h Outdated Show resolved Hide resolved
src/includes/load-service.h Outdated Show resolved Hide resolved
src/run-child-proc.cc Outdated Show resolved Hide resolved
src/run-child-proc.cc Outdated Show resolved Hide resolved
src/tests/Makefile Outdated Show resolved Hide resolved
configs/mconfig.Linux Outdated Show resolved Hide resolved
src/Makefile Outdated Show resolved Hide resolved
configure Show resolved Hide resolved
src/includes/proc-service.h Show resolved Hide resolved
src/includes/proc-service.h Outdated Show resolved Hide resolved
meson.build Outdated Show resolved Hide resolved
@q66 q66 force-pushed the capabilities branch 4 times, most recently from b5009d7 to 91f0d98 Compare October 29, 2024 23:21
@q66 q66 force-pushed the capabilities branch 6 times, most recently from bdb249d to 428b79b Compare October 29, 2024 23:47
Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all good other than the outstanding comments.

configs/mconfig.Linux Outdated Show resolved Hide resolved
src/load-service.cc Outdated Show resolved Hide resolved
@aanderse
Copy link

aanderse commented Nov 8, 2024

how do i set no-new-privs? simply add to the service file with no =? if so... i think this will be the only service property that doesn't use a =, right?

i think it is worthwhile keeping this as a boolean, requiring no-new-privs = true because it keeps parsing easier using various libraries... what do you think?


ps. this is great! i am super excited to see capabilities option added ❤️

@q66
Copy link
Contributor Author

q66 commented Nov 8, 2024

it goes in options =

@aanderse
Copy link

aanderse commented Nov 8, 2024

oops! ignore me 😅

thanks again

@q66 q66 force-pushed the capabilities branch 2 times, most recently from 0fc9a1d to 93f52ca Compare November 10, 2024 23:18
@q66
Copy link
Contributor Author

q66 commented Nov 10, 2024

I think I addressed all the comments, let me know if there's anything else

Copy link
Owner

@davmac314 davmac314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny format specifier that I think is incorrect, otherwise, looks good. Sorry for taking a little while to get to it.

if (fd < 0) goto failure_out;
// +4: round up, minus sign, newline, nul terminator
char val_str[std::numeric_limits<int>::digits10 + 4];
int num_chars = snprintf(val_str, sizeof(val_str), "%hd\n", oom_adj);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%hd should be just %d, I believe (the value is a plain int, not a short int).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed it to short as that's what it is in other places, i guess i accidentally made it an int there

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. In that case line 354 should have std::numeric_limits<short> instead of std::numeric_limits<int>.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not that it really matters much, just for consistency).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@davmac314 davmac314 merged commit 29c189a into davmac314:master Nov 18, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Importance: Normal C-dinit-service Things about dinit-service Enhancement/New Feature Improving things or introduce new feature make Things about Dinit's Make build system meson Things about Dinit's Meson build system P-Linux Things related to Linux
Projects
None yet
Development

Successfully merging this pull request may close these issues.

capabilities support on linux
4 participants