Can uninitialized memory come from the outside world? #527

Open
ChayimFriedman2 opened this issue Aug 14, 2024 · 4 comments

Comments

@ChayimFriedman2

ChayimFriedman2 commented Aug 14, 2024

For example, suppose we have the following C function:

#include <stdlib.h>

/* Returns freshly allocated memory that is never written to. */
int* get_uninit(void) {
    return malloc(sizeof(int));
}

Which we call from Rust:

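use std::ffi::c_int;

extern "C" {
    fn get_uninit() -> *mut c_int;
}

// Calling a foreign function and dereferencing a raw pointer both require `unsafe`.
let v = unsafe { *get_uninit() };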

Is this code UB? We don't initialize the value, but it comes from C, not Rust.
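Whatever the answer, the read can be made unproblematic by writing through the pointer before reading from it. A minimal sketch, assuming a hypothetical `get_uninit_standin` that uses Rust's own allocator in place of the C `get_uninit` so the example runs without linking any C code:

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ffi::c_int;

/// Hypothetical stand-in for the C `get_uninit`: returns a pointer to
/// freshly allocated memory that has never been written to.
fn get_uninit_standin() -> *mut c_int {
    // SAFETY: the layout of `c_int` has non-zero size.
    let p = unsafe { alloc(Layout::new::<c_int>()) } as *mut c_int;
    assert!(!p.is_null(), "allocation failed");
    p
}

/// Writes `value` through the pointer before reading it back, so the
/// read never observes uninitialized memory.
fn init_then_read(value: c_int) -> c_int {
    let p = get_uninit_standin();
    unsafe {
        p.write(value); // initialize first
        let v = p.read(); // reading is fine now
        dealloc(p as *mut u8, Layout::new::<c_int>());
        v
    }
}

fn main() {
    println!("{}", init_then_read(7));
}
```

This sidesteps the question rather than answering it: the open issue is what happens when the read comes first.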

It's pretty clear to me that this needs to be UB, since (I believe) LLVM will optimize that with LTO. But then, what about cases where LLVM will not optimize? For example, what about assembly?

get_uninit:
    mov rax, rsp
    ret

We don't initialize the value of [rsp], but LLVM has no way to know that: is it UB?

Furthermore, if it is UB, then we have to define what is considered "initialization": if we are sure we called a function that used the stack space of [rsp], does that mean it is initialized? And what if assembly code wrote into it?

After all (assuming the memory is allocated to the process, so no page faults), at the machine level there is no concept of uninitialized memory. So this brings the question, what happens when the machine and the Rust AM intersect?

Inspired by a question on Reddit.

@RalfJung
Member

> So this brings the question, what happens when the machine and the Rust AM intersect?

This is really the core of your question. Or rather, it's slightly worse: this is the C AM and the Rust AM interacting. The answer is "it's complicated", and it's been discussed in a bunch of threads here, and ideally some time someone can write a summary that we can just easily point to. :)

Meanwhile, you can consult #421 and #422.

@ChayimFriedman2
Author

@RalfJung If I understand those threads correctly, this boils down to "the operations in both abstract machines are lowered to a common abstract machine and executed there, and each step must be representable in both AMs". Which means for C it will be UB because the memory is uninitialized in the C AM, while in assembly it will be defined behavior since there is no uninitialized memory in the assembly "abstract machine" (i.e. the real machine). Am I correct?

@RalfJung
Member

When Rust calls a function that is, in assembly, defined as

get_uninit:
    mov rax, rsp
    ret

then that indeed can be "axiomatized" from the Rust side as a function that non-deterministically returns an arbitrary initialized integer. So yeah, that sounds right.
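One way to picture that axiomatization is as a proof obligation: the Rust code has to behave correctly for every value the function might pick, but it may rely on the value being a fixed, initialized bit pattern. A rough sketch, where `check_for_all` and `stable_under_reread` are hypothetical names made up for illustration and the sample list only stands in for "all possible values":

```rust
/// Hypothetical model: the assembly function may hand back any `i32`,
/// so a property of the caller must hold for every candidate value.
fn check_for_all(candidates: &[i32], property: impl Fn(i32) -> bool) -> bool {
    candidates.iter().all(|&v| property(v))
}

/// An initialized value is stable: two copies of it compare equal and
/// XOR to zero, whichever bit pattern was chosen.
fn stable_under_reread(v: i32) -> bool {
    let first = v;
    let second = v;
    first == second && (first ^ second) == 0
}

fn main() {
    let samples = [i32::MIN, -1, 0, 1, i32::MAX];
    assert!(check_for_all(&samples, stable_under_reread));
}
```

Uninitialized memory gives no such guarantee, which is why the distinction between "arbitrary but initialized" and "uninitialized" matters here.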

This reasoning only works without cross-language LTO: with cross-language LTO, the Abstract Machines come together at the LLVM IR level, and at that level uninitialized memory is still a reality.
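On the Rust side, memory that may be uninitialized is normally spelled `MaybeUninit<T>`, and it only becomes a `T` after something has written to it. A minimal sketch of the usual out-parameter pattern, assuming a hypothetical `foreign_fill_standin` written in Rust in place of a real foreign initializer:

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a foreign function that fills an out-parameter.
///
/// # Safety
/// `out` must be valid for writes of an `i32`.
unsafe fn foreign_fill_standin(out: *mut i32, value: i32) {
    unsafe { out.write(value) }
}

/// Reserves an uninitialized slot, lets the "foreign" side initialize it,
/// and only then treats it as an `i32`.
fn init_via_out_param(value: i32) -> i32 {
    let mut slot = MaybeUninit::<i32>::uninit();
    unsafe {
        foreign_fill_standin(slot.as_mut_ptr(), value);
        // Sound only because the call above wrote to the slot.
        slot.assume_init()
    }
}

fn main() {
    println!("{}", init_via_out_param(42));
}
```

Calling `assume_init` without the preceding write would be UB regardless of where the memory came from.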

@chorman0773
Contributor

chorman0773 commented Aug 17, 2024

I would also presume that if you got your hands on an electrically floating register in something used as a parameter/return register, that would count as uninitialized memory to Rust, since the compiler doesn't have to "reset" the register to a defined state, or move it into memory or another register.
Similarly, if asm/C/whatever mmapped a region of memory and then madvise(MADV_FREE)d it, handing a pointer to that region to Rust would give a Rust allocation that contains a bunch of Uninit bytes.

Although I'd be curious what happens if you return NaT from a function.
