Propose a PF/AF optional subset for long-vector implementation #921

sequencer · 2023-10-01T17:58:02Z

Long-vector machines, e.g. those with a VLEN of 4K~16K or more, always implement a Cray-like architecture.

However, in an architecture like this:

vector instructions are always chained;
each load/store is split into multiple uops, which send multiple transactions into the memory subsystem.
Thus it is almost impossible (or the overhead is too high) for such machines to implement precise exceptions or to handle PF/AF exceptions.

For example, in the uarch of chipsalliance/t1, our MMU design might be:

add an AGU in each lane, containing an L1 TLB, and provide an L2 TLB in the Sequencer;
trigger the PF as early as possible, then stall the vector pipeline (flushing it is also possible);
expect the scalar core to handle the PF; after the PF is cleared, the vector unit continues the same vector instruction.

This flow is non-standard; however, I still think the issue needs to be raised so that the RISC-V community can think about how to handle such cases.
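
The stall-and-resume flow described in this comment can be modelled roughly as follows. This is only a toy sketch, not the t1 design: the class `VectorUnitModel`, the helper `scalar_handle_fault`, the 4 KiB page size, and the one-address-per-uop simplification are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class VectorUnitModel:
    """Hypothetical model of a vector unit that stalls in place on a page fault."""
    mapped_pages: Set[int]                 # virtual page numbers that are currently mapped
    next_uop: int = 0                      # index of the next uop to issue
    stalled_on: Optional[int] = None       # page number that caused the stall, if any
    completed: List[int] = field(default_factory=list)

    def run(self, uop_addrs: List[int]) -> bool:
        """Issue uops in order; stop at the first one that hits an unmapped page.

        Returns True when the whole instruction has completed, False when stalled.
        State is kept in place, so calling run() again resumes from the stalled uop.
        """
        while self.next_uop < len(uop_addrs):
            page = uop_addrs[self.next_uop] // PAGE_SIZE
            if page not in self.mapped_pages:
                self.stalled_on = page     # raise the PF early and stall the pipeline
                return False
            self.completed.append(self.next_uop)
            self.next_uop += 1
        self.stalled_on = None
        return True


def scalar_handle_fault(unit: VectorUnitModel) -> None:
    """Hypothetical scalar-core handler: map the faulting page, then clear the stall."""
    if unit.stalled_on is not None:
        unit.mapped_pages.add(unit.stalled_on)
        unit.stalled_on = None
```

In this model, a caller would invoke `run`, call `scalar_handle_fault` whenever it returns `False`, and then invoke `run` again with the same uop list; the uops completed before the fault are not replayed.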

aswaterman · 2023-10-02T23:24:55Z

I disagree with the premise. It implies that address translation and permission checks are tightly coupled to memory access, which is an artificial restriction. It's straightforward to decouple these operations for unit-stride and strided accesses. (The story for indexed accesses is substantially more complicated.)

With that said, the spec already foresees the possibility you mention: https://github.com/riscv/riscv-v-spec/blob/fc76ec73fc1dd4531360ee8f3138f79f02e8b1b0/v-spec.adoc#174-swappable-traps

But this won't be compatible with the V standard extension, won't be compatible with standard OSes, etc.
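
To illustrate the decoupling point for unit-stride and strided accesses: because every element address is known from the base, stride, and vector length alone, the pages an instruction will touch can be checked before any memory transaction is issued. The sketch below is a software illustration of that idea only; the function names, the 4 KiB page size, and the use of a plain set as the "page table" are assumptions, and it does not describe any particular hardware.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def pages_touched(base, stride, vl, eew_bytes, page_size=PAGE_SIZE):
    """Return the sorted virtual page numbers touched by a strided access.

    base, stride and eew_bytes are in bytes; vl is the number of elements.
    Unit-stride is the special case stride == eew_bytes.
    Addresses are assumed to stay non-negative.
    """
    pages = set()
    for i in range(vl):
        addr = base + i * stride
        first = addr // page_size
        last = (addr + eew_bytes - 1) // page_size  # an element may straddle a page boundary
        pages.update(range(first, last + 1))
    return sorted(pages)


def first_faulting_element(base, stride, vl, eew_bytes, mapped_pages,
                           page_size=PAGE_SIZE):
    """Return the index of the first element that touches an unmapped page, or None.

    mapped_pages is a set of mapped virtual page numbers standing in for
    the translation and permission check.
    """
    for i in range(vl):
        addr = base + i * stride
        first = addr // page_size
        last = (addr + eew_bytes - 1) // page_size
        if any(p not in mapped_pages for p in range(first, last + 1)):
            return i
    return None
```

Indexed accesses do not fit this pattern, since their element addresses depend on the contents of the index vector rather than on a base and a stride.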

sequencer · 2023-10-03T03:19:05Z

I’ll think about how to save/restore the microarchitectural state for t1 over the next few days.

sequencer · 2023-10-07T13:12:21Z

I have thought about it over the past few days. Do you think proposing pause and play instructions to stall and unstall the vector unit is a reasonable solution?
Rather than saving the entire outstanding uArch state to memory, I think saving the uArch state in place is a possible solution. This would also align with swappable traps.
