-
Notifications
You must be signed in to change notification settings - Fork 16
RFC #0001 Compiler and Virtual Machine
To execute a Clever program, the virtual machine must fetch opcodes from a (randomly addressable) stream and execute the respective handler (see more on opcodes). Temporary values are stored in a stack called tmpdata.
Simplified implementation proposal:
void run(SomeContainer<Opcode> opcodes) { pc = 0; while (pc < opcodes.length()) { size_t jump_to = pc; Opcode& opcode = opcodes[pc]; opcode.handler(opcode, &jump_to); if (jump_to == pc) pc++; else pc = jump_to; } } void jmp_handler(Opcode op, size_t *pc) { *pc = op.operand[0].data.number; } void fcall_handler(Opcode op, size_t *pc) { Callable v = op.operand[0].data.value; Value *r; v.call((ValueVector) op.operand[0].data.value, r); if (r) tmpdata.push(r); }
Opcodes or operation codes are identifiers to the instructions that the virtual machine must execute. Almost all opcodes have side effects ranging from variable modification to behavioral changes. They often carry some payload called operand which can serve either as input or output.
Clever VM opcodes are allowed to have at most three operands, these must inform the VM from where or to where the data must go. The possible locations of the data are either the tmpdata stack or some Value*. One special case is for operands that contain only a single signed integer (defined by the rules of the ssize_t C type), which ought to be used as offsets, length indication and similar things.
Opcodes also contain function pointers to their respective handlers, these receive the current opcode and a pointer to the current program counter. The handlers are responsible for the "interpretation" of the opcode. The program counter pointer should only be modified if the handler wants for the VM to jump to some position on the opcode stream.
Simplified implementation proposal:
typedef void (*OpcodeHandler)(Opcode&, size_t*); enum OperandType { OPERAND_NONE = 0x00 OPERAND_VALUE = 0x01, OPERAND_NUMBER = 0x02, OPERAND_STACK = 0x03 }; struct Operand { OperandType type; union { Value* value; ssize_t number; } data; }; enum OpcodeType { OP_NOP = 0x00, /* ... */ }; const size_t OpcodeMaxOperands = 3; struct Opcode { OpcodeType type; OpcodeHandler handler; Operand operands[OpcodeMaxOperands]; };
Opcode | Operands | Description |
---|---|---|
add | out a b | out = a + b |
sub | out a b | out a - b |
mul | out a b | out = a * b |
div | out a b | out = a / b |
mod | out a b | out = a % b |
shr | out a b | out = a << b |
shl | out a b | out = a >> b |
Opcode | Operands | Description |
---|---|---|
bwor | out a b | out = a | b |
bwand | out a b | out = a & b |
bwxor | out a b | out = a ^ b |
bwnot | out a | out = -a |
Most of these are not used for conditionals but for inline code like
a = b == c
.
Opcode | Operands | Description |
---|---|---|
or | out a b | out = a || b |
and | out a b | out = a && b |
not | out a | out = !a |
eq | out a b | out = a == b |
neq | out a b | out = a != b |
lt | out a b | out = a < b |
gt | out a b | out = a > b |
le | out a b | out = a <= b |
ge | out a b | out = a >= b |
Opcode | Operands | Description |
---|---|---|
assign | a b | a = b |
load | a | (not decided) |
store | a | (not decided) |
Opcode | Operands | Description |
---|---|---|
jmp | off | jumps to off
|
jmpnz | off, a | jumps to off if a is not 0 |
jmpz | off, a | jumps to off if a is 0 |
fcall | a args | pushes the current address and calls a and passes
args
|
mcall | a args | same as fcall, but for methods. |
ret | a | pushes a into the stack and returns to the caller |
break | ? | ? |
continue | ? | ? |
Opcode | Operands | Description |
---|---|---|
nop | do nothing | |
discard | n | discards the top n values from the stack |
Listing #1:
if (a || b) { d = c(a); } e = f;
Compiles to:
0: or a, b # a || b 1: jmpz 4, (stack) # leave the if 2: fcall c, [a] # c(a) 3: store d # d = (return from c(a)) 4: assign e, f # e = f
Or:
0: jmpz 4, a 1: jmpz 4, b 2: fcall c, [a] 3: store d 4: assign e, f
Listing #2:
Int a = 30;
Compiles to:
0: assign a 30 # see questions
Listing #4:
a = f() + 10;
Compiles to:
0: fcall f, [] # return value on the stack 1: add (stack), a # stack.pop() + a 2: store a # a = stack.pop()
Listing #5:
f(); /* returns something */
Compiles to:
0: fcall f, [] # calls f 1: discard 1 # discards return value
- How should the compiler/virtual machine handle variable declaration?
- What about constructors?
- Should a copy (shallow, perhaps?) happen when a value is pushed into the stack?
- Should we set up a "value pool" to draw values from and use relative/absolute offsets/addresses instead of value pointers on the opcodes?