Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust-tests: Some RISC-V emulation tests #1988

Draft
wants to merge 3 commits into
base: dev
Choose a base branch
from

Conversation

apparentlymart
Copy link
Contributor

Thanks for creating Unicorn! I've been using it as part of an emulator for the syscall ABI of a toy RISC-V operating system.

I started working on this because I was seeing some strange behavior in my own program (invalid invalid instruction hook not being called) and so wanted to see if I could reproduce it without all of my code in play.

Now there is a happy-path test of successfully executing a single instruction, along with two different ways of hooking an invalid instruction (using either general interrupt hook or invalid instruction hook) and also using general interrupt hook to handle an ecall instruction.

As currently written the emulate_riscv64_invalid_insn_hook test fails, because the hook doesn't seem to run at all and so the emu_start function returns Err(UC_EXCEPTION). I'm not sure whether that's my own bug or if something is not working correctly in the main Unicorn code. 🤔 I've opened this PR as a draft in case someone can identify a mistake I could fix to make the instruction hook test work, in which case I'd propose to add these tests just to help reduce the risk of regressions in the RISC-V platform support.

There were previously no RISC-V emulation tests at all.

Now there is a happy-path test of successfully executing a single
instruction, along with two different ways of hooking an invalid
instruction (using either general interrupt hook or invalid instruction
hook) and also using general interrupt hook to handle an ecall instruction.
@apparentlymart
Copy link
Contributor Author

I've also now added some tests for handling of invalid memory accesses, since I found those weren't working as I expected in my program either. Again, I'm not sure if I'm just making incorrect assumptions about how it's intended to work.

If these tests seem like they are valid then I'd be happy to try to investigate more deeply in the main Unicorn code, but I'd like to first confirm that I understand correctly what the intended behavior is.

Thanks again! 😀

@apparentlymart
Copy link
Contributor Author

Looking again with fresh eyes today I realized I didn't include enough instructions from my real program in the code loaded for the memory hook tests, and so what I had pushed yesterday was not actually exercising a memory access error at all -- it was only preparing the address to load from.

I've pushed a complete version now which better matches the behavior I'm seeing in my real program: emu_start returns Err(EXCEPTION) directly without either calling the memory hook or signalling a memory access fault using the interrupt hook.

@apparentlymart
Copy link
Contributor Author

After reading the relevant code more deeply I've learned that:

  1. The virtual TLB code directly sets the EXCEPTION error when a TLB hook returns no TLB entry, without running any target-specific code, and so there currently isn't any way for that to trigger a RISC-V exception signaled through an interrupt hook.

    static void raise_mmu_exception(CPUState *cs, target_ulong address,
    int rw, uintptr_t retaddr)
    {
    cs->uc->invalid_error = UC_ERR_EXCEPTION;
    cs->uc->invalid_addr = address;
    cpu_exit(cs->uc->cpu);
    cpu_loop_exit_restore(cs, retaddr);
    }

    So the current behavior of the emulate_riscv64_mem_error_interrupt seems to match the intent of the implementation.

    Disabling virtual TLB mode and thus letting the target-specific CPU TLB fill run causes it to fail with Err(READ_UNMAPPED) rather than executing the memory hook, but that makes sense because the emulate_riscv64_mem_error_interrupt test doesn't have any hook for the invalid memory access. (I'd ideally like it to fall back to handling that as a normal CPU exception for my use-case, but I understand that probably isn't the common need for other Unicorn users.)

  2. For the other test using an explicit invalid memory access hook rather than just an interrupt hook, disabling the virtual TLB mode and using CPU TLB mode allowed my hook to run.

    However, I can't seem to cause emu_start to return with Ok(()) in that case:

    • If the hook returns false then emu_start returns Err(READ_UNMAPPED), which makes sense to me because my hook is reporting that it doesn't want to "catch" the error in that case.
    • If the hook calls emu.emu_stop() and then returns true then emu_start returns Err(MAP), whereas I expected it to return Ok(()). However, I suppose Err(MAP) is defensible as it means "nothing is mapped at that address" and so the caller could then map something at that address and call emu_start again.

I'm going to push another commit in a moment that includes the adjustments I described above, because I expect that'll make it easier to understand what I'm talking about. My goal here is still mainly to understand what the expected behavior is so that I can build my program around it.

My use-case would benefit from some way to tell Unicorn to handle all of these fault cases as CPU exceptions passed to a single interrupt hook so that my program can more closely emulate what the real OS would do in its exception handler, but I understand that what I'm doing is unusual so I'm okay with faking that using multiple hooks just as long as I can understand how all of these hooks are intended to work. 😀

A TLB miss in virtual TLB mode causes Err(EXCEPTION) to be returned
immediately, without calling any hooks or giving the target-specific code
a chance to emulate a page fault exception.

Therefore this now tests in the default TLB mode that uses target-specific
TLB building logic.
@apparentlymart
Copy link
Contributor Author

apparentlymart commented Aug 26, 2024

(Sorry for the flood of comments. I'm learning more about Unicorn/QEMU as I investigate and sharing in case it's useful to others in the future.)

I have identified one significant misunderstanding that affects my real program: by default the emulated RISC-V CPU is running in M-Mode (the most privileged mode) which works outside of the scope of the MMU.

That mode-based exception is part of the CPU TLB fill, so it gets skipped when using virtual TLB and so the TLB hook would need to check the privilege mode itself to properly emulate the fact that M-Mode is always identity mapped.


Side-note about that misunderstanding:

I had misunderstood that the CPU was running in U-Mode (the unprivileged user mode) because the ecall instruction causes an exception with the cause set to 8, which is "Environment call from U-Mode", but that's actually Unicorn (accidentally?) exposing an implementation detail of QEMU: it implements ecall by always triggering the U-Mode exception at first, but then fixes that up later:

static bool trans_ecall(DisasContext *ctx, arg_ecall *a)
{
/* always generates U-level ECALL, fixed in do_interrupt handler */
generate_exception(ctx, RISCV_EXCP_U_ECALL);
exit_tb(ctx); /* no chaining */
ctx->base.is_jmp = DISAS_NORETURN;
return true;
}

/* ecall is dispatched as one cause so translate based on mode */
if (cause == RISCV_EXCP_U_ECALL) {
assert(env->priv <= 3);
if (env->priv == PRV_M) {
cause = RISCV_EXCP_M_ECALL;
} else if (env->priv == PRV_S && riscv_cpu_virt_enabled(env)) {
cause = RISCV_EXCP_VS_ECALL;
} else if (env->priv == PRV_S && !riscv_cpu_virt_enabled(env)) {
cause = RISCV_EXCP_S_ECALL;
} else if (env->priv == PRV_U) {
cause = RISCV_EXCP_U_ECALL;
}
}

Unfortunately the Unicorn interrupt hooks are called in cpu_handle_exception rather than cpu_handle_interrupt. The former is called before the latter, and it's the latter that chooses the correct exception cause based on the privilege mode executing ecall, so the interrupt hook would need to check the current privilege mode itself to distinguish the three different ecall exception cause numbers.


As far as I can tell Unicorn does not expose any way to directly read or write the current privilege mode. That makes sense because that information is intentionally not exposed directly in any register in the ISA to allow for software-only virtualization where code designed for supervisor mode is actually running in usermode and then the supervisor emulates instructions that would normally require supervisor mode.

However, that makes it challenging (impossible?) to properly emulate various RISC-V behaviors that are defined to vary based on privilege mode, because there's no way for the caller to discover the current privilege mode.

I've made a local modification to Unicorn to expose a pseudo-register called PRIV which reflects the emulated CPU's current privilege level, and used that for a proof-of-concept that seems to allow me to implement everything I need. Therefore I'm going to try to clean that up and submit it as a separate PR to see if it seems like a reasonable way to fill the gaps I've encountered without significantly changing Unicorn's design.

@wtdcode
Copy link
Member

wtdcode commented Aug 27, 2024

As far as I can tell Unicorn does not expose any way to directly read or write the current privilege mode.

This is interesting. I will have a check too.

@apparentlymart
Copy link
Contributor Author

I have been working today on a PR to expose a "pseudo-register" that can be used to read or set the privilege mode. That design mimics part of the RISC-V Debug specification which recommends that a debugger expose the privilege information from one of the debug CSRs in a "virtual register" called priv. I think this makes sense because the Unicorn API has a similar relationship to the emulated hart as a debugger would have to a real hart.

I have implemented it and written a unit test but I ran out of time today before I had updated all of the bindings to support the new constant. I'm hoping to finish it tomorrow and then I'll open a separate PR for that. I'm not sure yet how that will interact with this PR, but I have at least proven that I can handle invalid instruction and page fault exceptions using an interrupt hook in my real program, without the more specialized hook types, once I got the MMU working by changing the emulator to use U-mode.

@wtdcode
Copy link
Member

wtdcode commented Oct 30, 2024

Again, apologize for being late for your PRs because I'm not a RISC-V expert.

The virtual TLB code directly sets the EXCEPTION error when a TLB hook returns no TLB entry, without running any target-specific code, and so there currently isn't any way for that to trigger a RISC-V exception signaled through an interrupt hook.

Since you are doing full system emulation, you should not use VTLB mode.

If the hook calls emu.emu_stop() and then returns true then emu_start returns Err(MAP), whereas I expected it to return Ok(()). However, I suppose Err(MAP) is defensible as it means "nothing is mapped at that address" and so the caller could then map something at that address and call emu_start again

That's expected.

I have identified one significant misunderstanding that affects my real program: by default the emulated RISC-V CPU is running in M-Mode (the most privileged mode) which works outside of the scope of the MMU.

This is by design because we want the simple 1:1 mapping, i.e. skipping MMU as you discovered. The short reason for this design is that Unicorn was not really designed as a full system emulator, though I tried to bring back full system emulation capabilities in Unicorn 2 and recent releases.

Unfortunately the Unicorn interrupt hooks are called in cpu_handle_exception rather than cpu_handle_interrupt. The former is called before the latter, and it's the latter that chooses the correct exception cause based on the privilege mode executing ecall, so the interrupt hook would need to check the current privilege mode itself to distinguish the three different ecall exception cause numbers

This makes sense. I'm happy to accept a PR for fixing this.

If you can fix CI, I'm happy to see this PR getting into dev branch. We lack unit tests for the uncommon architectures.

@apparentlymart
Copy link
Contributor Author

Thank you for the feedback!

If I recall correctly I wrote this PR before I worked on some others, and at the time I wrote this I was still a little confused about exactly what privilege mode Unicorn was running in and thus what behavior I should expect. The other PRs I submitted came later in my learning when I had (hopefully) understood the system better.

I will try to find some time to finish this and hopefully make it suitable to merge. I will also consider how it might be possible to fix the incorrect exception code for ecall from M-mode or S-mode, and if I find a good solution I'll open a separate pull request for that.

Unfortunately my attention has moved elsewhere now so I can't promise to work on this very soon, but I'll try to find some time. Thanks again!

@wtdcode
Copy link
Member

wtdcode commented Oct 30, 2024

I will also consider how it might be possible to fix the incorrect exception code for ecall from M-mode or S-mode, and if I find a good solution I'll open a separate pull request for that.

Instead of "raising the correct mode", I assume a more proper and easy fix is to pass the correct exception code in unicorn hooks (in the reverse direction of qemu implementation details).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants