Translation of execve()

imlk edited this page Aug 23, 2021 · 2 revisions

To load and run an executable, a process typically uses the execve() system call.

int execve(const char *pathname, char *const argv[], char *const envp[]);

The meaning of the arguments is as follows.

  • pathname: the path of the file to be executed.
  • argv[]: an array of pointers to strings passed to the new program as its command-line arguments.
    • By convention, argv[0] is the program name.
  • envp[]: an array of pointers to strings, conventionally of the form key=value, passed to the new program as its environment.
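Both argv[] and envp[] must end with a NULL pointer. As a minimal sketch of that layout (the CStringArray type and its methods are hypothetical names for illustration, not part of proot-rs), the arrays could be built in Rust like this:

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: owns the C strings and exposes a NULL-terminated
/// pointer array, which is the layout execve() expects for argv[] and envp[].
pub struct CStringArray {
    // Keeps the string storage alive for as long as the pointers are used.
    _owned: Vec<CString>,
    ptrs: Vec<*const c_char>,
}

impl CStringArray {
    /// Returns None if any item contains an interior NUL byte.
    pub fn new(items: &[&str]) -> Option<Self> {
        let owned: Vec<CString> = items
            .iter()
            .map(|s| CString::new(*s).ok())
            .collect::<Option<_>>()?;
        let mut ptrs: Vec<*const c_char> = owned.iter().map(|c| c.as_ptr()).collect();
        ptrs.push(ptr::null()); // the terminating NULL entry
        Some(Self { _owned: owned, ptrs })
    }

    /// Pointer suitable for passing as `char *const argv[]` or `envp[]`.
    pub fn as_ptr(&self) -> *const *const c_char {
        self.ptrs.as_ptr()
    }

    /// Number of entries, counting the terminating NULL.
    pub fn len_with_terminator(&self) -> usize {
        self.ptrs.len()
    }
}
```

For example, CStringArray::new(&["/bin/ls", "-l"]) yields three entries: two string pointers followed by NULL.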

Normally, the kernel is responsible for parsing and loading the target executable and, if one exists, its dynamic linker/loader.

Unlike other system calls, execve() needs more than path translation: proot-rs has to load the executable itself instead of leaving that to the kernel. This is because of the following two scenarios.

  • If the executable is an ELF file and contains a program header (segment) of type PT_INTERP, it is a dynamically linked executable. When loading such a program, the kernel also loads the dynamic linker/loader at the path specified by PT_INTERP. This happens inside the kernel, which means that the path to the dynamic linker/loader is not translated. Refer to LWN - How programs get run: ELF binaries
  • If the executable is a script file that starts with a #!interpreter [optional-arg] line (shebang), the kernel replaces the command line with interpreter [optional-arg] script-file arg ... when loading the program. This also happens inside the kernel, which means that the path to the interpreter is not translated. Refer to the man page execve(2)
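The shebang rewrite described above can be sketched as follows. This is a simplified illustration, not the proot-rs implementation: the names Shebang, parse_shebang and rewrite_command_line are hypothetical, and the whole text after the interpreter is treated as a single optional argument, as execve(2) describes for Linux.

```rust
/// Parsed form of a `#!interpreter [optional-arg]` line.
#[derive(Debug, PartialEq)]
pub struct Shebang {
    pub interpreter: String,
    pub optional_arg: Option<String>,
}

/// Parses the first line of a file; returns None if it is not a shebang line.
pub fn parse_shebang(head: &[u8]) -> Option<Shebang> {
    if !head.starts_with(b"#!") {
        return None;
    }
    // Only the first line matters.
    let line = head[2..].split(|&b| b == b'\n').next()?;
    let line = std::str::from_utf8(line).ok()?.trim();
    if line.is_empty() {
        return None;
    }
    // Everything after the interpreter is kept as one argument.
    let (interpreter, arg) = match line.split_once(char::is_whitespace) {
        Some((i, a)) => (i, a.trim()),
        None => (line, ""),
    };
    Some(Shebang {
        interpreter: interpreter.to_string(),
        optional_arg: if arg.is_empty() { None } else { Some(arg.to_string()) },
    })
}

/// Builds `interpreter [optional-arg] script-file arg ...`.
/// The original argv[0] is dropped and replaced by the script path.
pub fn rewrite_command_line(sb: &Shebang, script_path: &str, argv: &[String]) -> Vec<String> {
    let mut out = vec![sb.interpreter.clone()];
    out.extend(sb.optional_arg.clone());
    out.push(script_path.to_string());
    out.extend(argv.iter().skip(1).cloned());
    out
}
```

Doing this rewrite in user space gives the tracer a chance to translate the interpreter path before anything is loaded, which is exactly what the kernel's own handling would skip.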

To avoid missing path translations, proot-rs implements a loading process similar to the one in the kernel, and uses a custom loader-shim to load the required files. Currently, both ELF executables and shebang scripts are supported.
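To give an idea of what the ELF side of such a user-space loader has to do, here is a minimal sketch that extracts the PT_INTERP path from an ELF image held in memory. It assumes a 64-bit little-endian ELF file only; find_interp and read_le are hypothetical helper names, and proot-rs's real parser may be structured quite differently.

```rust
const PT_INTERP: u64 = 3;

/// Reads a little-endian unsigned integer of `width` bytes (at most 8).
fn read_le(bytes: &[u8], off: usize, width: usize) -> Option<u64> {
    let field = bytes.get(off..off.checked_add(width)?)?;
    Some(field.iter().rev().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Returns the path stored in the PT_INTERP segment, if there is one.
/// Assumption: ELFCLASS64 and ELFDATA2LSB only; anything else yields None.
pub fn find_interp(elf: &[u8]) -> Option<String> {
    if !elf.starts_with(b"\x7fELF") || *elf.get(4)? != 2 || *elf.get(5)? != 1 {
        return None;
    }
    // Fields of the ELF64 header.
    let phoff = read_le(elf, 0x20, 8)? as usize; // e_phoff
    let phentsize = read_le(elf, 0x36, 2)? as usize; // e_phentsize
    let phnum = read_le(elf, 0x38, 2)? as usize; // e_phnum
    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if read_le(elf, ph, 4)? != PT_INTERP {
            continue; // p_type is something else
        }
        let offset = read_le(elf, ph.checked_add(8)?, 8)? as usize; // p_offset
        let filesz = read_le(elf, ph.checked_add(32)?, 8)? as usize; // p_filesz
        let raw = elf.get(offset..offset.checked_add(filesz)?)?;
        // The stored path is NUL-terminated.
        let path = raw.split(|&b| b == 0).next()?;
        return String::from_utf8(path.to_vec()).ok();
    }
    None
}
```

A statically linked executable has no PT_INTERP entry, so the function returns None and only the executable itself needs to be loaded.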

Here's a diagram that describes the translation of execve().

syscall-enter-stop of execve()

proot-rs enters this phase when the tracee invokes the execve() system call. At this point the kernel has not yet executed the logic of execve(), so we can still modify its arguments.

proot-rs first tries to load the target executable based on the command line provided to execve(). The command line may be updated in the process (for example, when a shebang script is encountered), which triggers a retry. The number of retries is limited; once the limit is exceeded, an ELOOP error is returned.
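The bounded retry can be sketched like this. It is a toy model under stated assumptions: a HashMap from path to the first line of the file stands in for the real filesystem, MAX_RETRIES is an arbitrary limit chosen for the example, and ExecError::TooManyLevels plays the role of ELOOP. None of these names come from proot-rs.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum ExecError {
    NotFound,
    BadFormat,
    TooManyLevels, // stands in for ELOOP
}

/// Hypothetical limit on command-line rewrites.
const MAX_RETRIES: usize = 4;

/// `argv[0]` is the path to execute. Returns the final command line once a
/// file that is not a shebang script has been reached.
pub fn resolve(
    files: &HashMap<&str, &str>,
    mut argv: Vec<String>,
) -> Result<Vec<String>, ExecError> {
    for _attempt in 0..=MAX_RETRIES {
        let path = argv.first().ok_or(ExecError::NotFound)?;
        let head = files.get(path.as_str()).ok_or(ExecError::NotFound)?;
        let line = match head.strip_prefix("#!") {
            Some(line) => line.trim(),
            None => return Ok(argv), // not a script: load it directly
        };
        // interpreter [optional-arg], followed by the old command line.
        let mut next: Vec<String> = line
            .splitn(2, char::is_whitespace)
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty())
            .collect();
        if next.is_empty() {
            return Err(ExecError::BadFormat);
        }
        next.extend(argv);
        argv = next; // command line updated: retry with the interpreter
    }
    Err(ExecError::TooManyLevels)
}
```

A script whose interpreter is itself (or a chain of scripts longer than the limit) ends in TooManyLevels instead of looping forever.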

The goal of this phase is to generate a LoadInfo, which will be used in the syscall-exit-stop phase.

Most importantly, we need to replace the first argument of execve() with the path to loader-shim. The kernel then loads loader-shim instead of the original target, so code we control runs first after execve() returns.

syscall-exit-stop of execve()

In this phase, loader-shim has already been loaded into the tracee process. We generate the load script from the LoadInfo produced in the previous phase and write it into the tracee's memory space. The load script is a collection of LoadStatements, defined in loader-shim/src/script.rs. Following these statements, loader-shim loads the real dynamic linker/loader and the ELF file that needs to be executed. Once this is done, loader-shim jumps to the program entry point, and its job is finished.
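To make the idea of a load script more concrete, here is a purely illustrative sketch. The statement variants, their fields, and the byte encoding below are assumptions invented for this example; the actual definitions are the ones in loader-shim/src/script.rs and may look nothing like this.

```rust
/// Hypothetical load statements (NOT the real definitions in script.rs).
#[derive(Debug, Clone, PartialEq)]
pub enum LoadStatement {
    /// Open the file that the following mappings refer to.
    Open { path: String },
    /// Map part of the opened file at a fixed address.
    Mmap { addr: u64, length: u64, prot: u64, offset: u64 },
    /// Jump to the entry point; expected to be the last statement.
    Start { entry_point: u64 },
}

fn put(out: &mut Vec<u8>, value: u64) {
    out.extend_from_slice(&value.to_le_bytes());
}

/// Serializes the script into a flat buffer that a tracer could copy into
/// the tracee's memory. Assumed layout: a u64 tag followed by u64 fields,
/// with strings stored as a length followed by the raw bytes.
pub fn encode(script: &[LoadStatement]) -> Vec<u8> {
    let mut out = Vec::new();
    for statement in script {
        match statement {
            LoadStatement::Open { path } => {
                put(&mut out, 0);
                put(&mut out, path.len() as u64);
                out.extend_from_slice(path.as_bytes());
            }
            LoadStatement::Mmap { addr, length, prot, offset } => {
                put(&mut out, 1);
                for value in &[*addr, *length, *prot, *offset] {
                    put(&mut out, *value);
                }
            }
            LoadStatement::Start { entry_point } => {
                put(&mut out, 2);
                put(&mut out, *entry_point);
            }
        }
    }
    out
}
```

The point of such a format is that loader-shim can stay tiny: it only interprets a flat list of statements, while all parsing and path translation has already been done by the tracer.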

Some Useful Links