chore[venom]: expand venom docs #4314

Merged
merged 12 commits into from
Oct 25, 2024
254 changes: 254 additions & 0 deletions vyper/venom/README.md
@@ -160,3 +160,257 @@ A number of passes that are planned to be implemented, or are implemented for im
### Function inlining

### Load-store elimination

---

## Structure of a venom program

### IRContext
An `IRContext` consists of multiple `IRFunctions`, with one designated as the main entry point of the program.
Additionally, the `IRContext` maintains its own representation of the data segment.

### IRFunction
An `IRFunction` is composed of a name and multiple `IRBasicBlocks`, with one marked as the entry point to the function.

### IRBasicBlock
An `IRBasicBlock` contains a label and a sequence of `IRInstructions`.
Each `IRBasicBlock` has a single entry point and exit point.
The exit point must be one of the following terminator instructions:
- `jmp`
- `djmp`
- `jnz`
- `ret`
- `return`
- `stop`
- `exit`

A normalized basic block cannot have both multiple predecessors and multiple successors. It has either at most one predecessor and potentially multiple successors, or vice versa.
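
As a rough illustration of the normalization rule, the following Python sketch scans a control-flow graph and reports the blocks that violate it. The `find_non_normalized` helper and the dictionary-of-successors representation are assumptions made for this example; they are not part of the Venom code base.

```python
from collections import defaultdict


def find_non_normalized(successors):
    """Return labels of blocks with both >1 predecessors and >1 successors.

    `successors` maps a block label to the list of labels it can jump to
    (a hypothetical, simplified stand-in for a real control-flow graph).
    """
    predecessors = defaultdict(set)
    for label, targets in successors.items():
        for target in targets:
            predecessors[target].add(label)

    return sorted(
        label
        for label, targets in successors.items()
        if len(set(targets)) > 1 and len(predecessors[label]) > 1
    )


# "join" is reached from two blocks and also branches two ways,
# so it is the only block that breaks the rule.
cfg = {
    "entry": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["body", "exit"],
    "body": ["exit"],
    "exit": [],
}
print(find_non_normalized(cfg))
```

A block such as `exit`, which has two predecessors but no successors, is fine under this rule.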

### IRInstruction
An `IRInstruction` consists of an opcode, a list of operands, and an optional return value.
An operand can be a label, a variable, or a literal.
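
The containment hierarchy described above could be modelled roughly as follows. These dataclasses are simplified stand-ins written for illustration only: the names `Context`, `Function`, `BasicBlock`, `Instruction`, `Label` and `Variable` are hypothetical and do not mirror the real `IRContext`, `IRFunction`, `IRBasicBlock` and `IRInstruction` classes. The terminator set is copied from the list in the `IRBasicBlock` section.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Terminator opcodes, as listed in the IRBasicBlock section.
TERMINATORS = {"jmp", "djmp", "jnz", "ret", "return", "stop", "exit"}


@dataclass(frozen=True)
class Label:
    name: str


@dataclass(frozen=True)
class Variable:
    name: str


# An operand is a label, a variable, or a literal (modelled here as an int).
Operand = Union[Label, Variable, int]


@dataclass
class Instruction:
    opcode: str
    operands: List[Operand]
    output: Optional[Variable] = None  # the optional return value


@dataclass
class BasicBlock:
    label: Label
    instructions: List[Instruction] = field(default_factory=list)

    def is_terminated(self) -> bool:
        """True if the last instruction is one of the terminators."""
        return bool(self.instructions) and self.instructions[-1].opcode in TERMINATORS


@dataclass
class Function:
    name: str
    blocks: List[BasicBlock]  # assumption: blocks[0] is the entry block


@dataclass
class Context:
    functions: List[Function]
    entry: str  # assumption: name of the main entry function


block = BasicBlock(
    Label("entry"),
    [
        Instruction("add", [Variable("%1"), 2], output=Variable("%2")),
        Instruction("jmp", [Label("next")]),
    ],
)
ctx = Context([Function("main", [block])], entry="main")
print(block.is_terminated())
```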

## Instructions

### Special instructions

- `invoke`
- Causes control flow to jump to the function denoted by `label`.
- Return values are passed in the return buffer at the `offset` address.
- In practice, only used for internal functions.
- Effectively translates to `JUMP`, and therefore changes the program counter.
- ```
invoke offset, label
```
- `alloca`
- Allocates memory of a given size at a given offset in memory.
- The output is the offset itself.
- Because the SSA form does not allow changing values of registers, handling mutable variables can be tricky. The `alloca` instruction is meant to simplify that.
- ```
out = alloca size, offset
```
- `palloca`
- Like the `alloca` instruction but only used for parameters of internal functions.
- ```
out = palloca size, offset
```
- `iload`
- Loads the value at `offset` in the immutable section of memory into the `out` variable.
- The operand can be either a literal, which is a statically computed offset, or a variable.
- ```
out = iload offset
```
- `istore`
- Represents a store into the immutable section of memory.
- As in `iload`, the offset operand can be a literal.
- ```
istore offset, value
```
- `phi`
- In SSA form each variable is assigned exactly once, which makes it awkward to express a value that depends on which program path was taken.
- `phi` instructions handle this case. They are placed in basic blocks where control flow paths merge.
- The `out` variable is set to `var_a` if the program entered this block from `label_a`, or to `var_b` if it entered from `label_b`.
- ```
out = phi var_a, label_a, var_b, label_b
```
- `offset`
- Statically computes an offset. Useful for `mstore`, `mload` and similar instructions.
- Equivalent to `label` + `op`.
- ```
ret = offset label, op
```
- `param`
- Represents a function argument passed on the stack.
- The argument is assumed to already be on the stack; `param` binds it to the `out` variable.
- ```
out = param
```
- `store`
- Stores a variable's value or a literal into the `out` variable.
- ```
out = op
```
- `dbname`
- Creates and marks a data segment. (Open question: the context holds a single data segment, so this may really mark a section within it.)
- `db`
- Stores data, possibly a label, into the data segment. (Details to be confirmed.)
- `dloadbytes`
- Alias for `codecopy` for legacy reasons. May be removed in future versions.
- `ret`
- Represents a return from an internal call.
- Jumps to a location given by `op`, hence modifies the program counter.
- ```
ret op
```
- `exit`
- Similar to `stop`, but used for constructor exit. The assembler is expected to jump to a special initcode sequence which returns the runtime code.
Collaborator Author

@sandbubbles sandbubbles Oct 18, 2024


Do you know why it may be in fallback? there is the revert before it anyway, but it confused me as it doesn't seem to do much with constructor exit.

```
fallback:  IN=[__main_entry, 47_if_exit] OUT=[] => {}
    revert 0, 0
    exit
```

Member


I don't think it should be in runtime code...

Collaborator Author


I think it is: --experimental-codegen -f bb_runtime on

totalShares: public(uint256)

# Set up the company.
@deploy
def __init__(_total_shares: uint256):
    pass

Member


yeah i see what you mean. dead code! but it shouldn't be there, i would consider that a bug in our venom generation.

- ```
exit
```
- `sha3_64`
- `assert`
- Asserts that `op` is nonzero. If it is zero, reverts.
- Calls that terminate this way do receive a gas refund.
- ```
assert op
```
- `assert_unreachable`
- Checks that `op` is nonzero. If it is zero, terminates with `0xFE` (the `INVALID` opcode).
- Calls that end this way do not receive a gas refund.
- ```
assert_unreachable op
```
- `log`
- Similar to the `LOGX` instructions in EVM.
- Translates to `LOG0` ... `LOG4` depending on `topic_count`, which must be between 0 and 4.
- The remaining operands correspond to those of the `LOGX` instructions.
- ```
log offset, size, [topic] * topic_count, topic_count
```
- For example
```
log %53, 32, 64, %56, 2
```
would translate to:
```
LOG2 %53, 32, 64, %56
```
- `nop`
- No operation, does nothing.
- ```
nop
```
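
Two of the special instructions above, `phi` and `log`, can be sketched in a few lines of Python. The helpers `resolve_phi` and `lower_log` are hypothetical and exist only for this example. They operate on plain lists of operands in the textual order shown in this document, which is an assumption about the representation rather than the compiler's actual data structures.

```python
def resolve_phi(operands, came_from):
    """Pick the phi input paired with the predecessor label `came_from`.

    `operands` is the flat list from `out = phi var_a, label_a, var_b, label_b`.
    """
    if len(operands) % 2 != 0:
        raise ValueError("phi operands must come in (variable, label) pairs")
    for var, label in zip(operands[0::2], operands[1::2]):
        if label == came_from:
            return var
    raise KeyError(f"no phi input for predecessor {came_from!r}")


def lower_log(operands):
    """Turn `log offset, size, topics..., topic_count` into (mnemonic, args)."""
    *args, topic_count = operands
    if not isinstance(topic_count, int) or not 0 <= topic_count <= 4:
        raise ValueError("topic_count must be an integer from 0 to 4")
    if len(args) != 2 + topic_count:
        raise ValueError("expected offset, size and topic_count topics")
    return f"LOG{topic_count}", args


# Entering from label_b selects var_b.
print(resolve_phi(["%var_a", "label_a", "%var_b", "label_b"], "label_b"))

# The `log %53, 32, 64, %56, 2` example from the `log` entry.
print(lower_log(["%53", 32, 64, "%56", 2]))
```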

### Jump instructions

- `jmp`
- Unconditionally jumps to the code denoted by `label`.
- ```
jmp label
```
- `jnz`
- A conditional jump depending on `op` value.
- Jumps to `label2` when `op` is not zero, otherwise jumps to `label1`.
- ```
jnz label1, label2, op
```
- `djmp`
- Dynamic jump to an address specified by the variable operand.
- The target is not a fixed label but rather a value stored in a variable, making the jump dynamic.
- ```
djmp var
```
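
The three jump instructions can be summarised by a small function that computes where control goes next. This is only a sketch: `next_label` and the `env` mapping of variable names to values are hypothetical, the `jnz` operand order follows the description above, and `djmp` targets are modelled as label names rather than code addresses.

```python
def next_label(opcode, operands, env):
    """Return the label that control transfers to after a jump instruction.

    `env` maps variable names to their current values (an assumption of
    this sketch, not a Venom data structure).
    """
    if opcode == "jmp":
        (label,) = operands
        return label
    if opcode == "jnz":
        label1, label2, op = operands
        value = env[op] if isinstance(op, str) else op
        # Nonzero condition goes to label2, zero goes to label1.
        return label2 if value != 0 else label1
    if opcode == "djmp":
        (var,) = operands
        # The target is whatever the variable currently holds.
        return env[var]
    raise ValueError(f"not a jump instruction: {opcode}")


env = {"%cond": 1, "%target": "case_3"}
print(next_label("jmp", ["exit"], env))
print(next_label("jnz", ["else", "then", "%cond"], env))
print(next_label("djmp", ["%target"], env))
```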

### EVM instructions
The following instructions map one-to-one to [EVM instructions](https://www.evm.codes/).
Operands correspond to stack inputs, in the same order. The stack output, if any, is the instruction's output.
Instructions have the same effects as their EVM counterparts.
- `return`
- `revert`
- `coinbase`
- `calldatasize`
- `calldatacopy`
- `mcopy`
- `calldataload`
- `gas`
- `gasprice`
- `gaslimit`
- `chainid`
- `address`
- `origin`
- `number`
- `extcodesize`
- `extcodehash`
- `extcodecopy`
- `returndatasize`
- `returndatacopy`
- `callvalue`
- `selfbalance`
- `sload`
- `sstore`
- `mload`
- `mstore`
- `tload`
- `tstore`
- `timestamp`
- `caller`
- `blockhash`
- `selfdestruct`
- `signextend`
- `stop`
- `shr`
- `shl`
- `sar`
- `and`
- `xor`
- `or`
- `add`
- `sub`
- `mul`
- `div`
- `smul`
- `sdiv`
- `mod`
- `smod`
- `exp`
- `addmod`
- `mulmod`
- `eq`
- `iszero`
- `not`
- `lt`
- `gt`
- `slt`
- `sgt`
- `create`
- `create2`
- `msize`
- `balance`
- `call`
- `staticcall`
- `delegatecall`
- `codesize`
- `basefee`
- `blobhash`
- `blobbasefee`
- `prevrandao`
- `difficulty`
- `invalid`
- `sha3`
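
To make the operand-order convention concrete, here is a minimal Python sketch that evaluates a handful of the side-effect-free instructions with 256-bit wraparound. The `PURE_OPS` table and the `evaluate` helper are hypothetical names invented for this example; operands are taken in the same order as the EVM stack inputs, as stated above, and only unsigned behaviour is modelled.

```python
UINT256_MOD = 2**256

# Each entry takes its operands in EVM stack-input order (first operand = top of stack).
PURE_OPS = {
    "add": lambda a, b: (a + b) % UINT256_MOD,
    "sub": lambda a, b: (a - b) % UINT256_MOD,
    "mul": lambda a, b: (a * b) % UINT256_MOD,
    "div": lambda a, b: a // b if b != 0 else 0,  # division by zero yields 0
    "mod": lambda a, b: a % b if b != 0 else 0,
    "lt": lambda a, b: int(a < b),
    "gt": lambda a, b: int(a > b),
    "eq": lambda a, b: int(a == b),
    "iszero": lambda a: int(a == 0),
    "not": lambda a: (UINT256_MOD - 1) - a,  # bitwise NOT on 256 bits
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    # Shifts take the shift amount first, then the value.
    "shl": lambda shift, value: (value << shift) % UINT256_MOD if shift < 256 else 0,
    "shr": lambda shift, value: value >> shift if shift < 256 else 0,
}


def evaluate(opcode, *operands):
    """Evaluate one pure instruction on unsigned 256-bit integers."""
    return PURE_OPS[opcode](*operands)


print(evaluate("sub", 1, 2) == UINT256_MOD - 1)  # wraps around
print(evaluate("shl", 1, 3))  # shift the value 3 left by 1 bit
```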
---

### TODO
- Describe the architecture of analyses and passes a bit more. Mention the distinction between an analysis and a pass (optimisation or transformation).
- Mention how to compile into it: `bb` (deploy), `bb_runtime`.
- perhaps add some flag to skip the store expansion pass? for readers of the code
- if it is meant for using venom, then i should mention api for passes and analyses - should i do that?
- analysis by ir_analysis_cache - request, invalidate, force - type of analysis and additional params
- pass - run_pass

Perhaps mention that functions:
- each function starts as if with empty stack
Member


not exactly -- they take (optional) output buffer and return pc

- alloca and palloca(interf) for some args
- param for args by stack

ask harry or someone:
- _mem_deploy_end is it immutable after that??