Skip to content

RFC #0001 Compiler and Virtual Machine

Higor Eurípedes edited this page Jan 8, 2012 · 9 revisions

RFC #0001 - Compiler and Virtual Machine

Execution model

To execute a Clever program, the virtual machine must fetch opcodes from a (randomly addressable) stream and execute the respective handler (see more on opcodes). Temporary values are stored in a stack called tmpdata.

Simplified implementation proposal:

void run(SomeContainer<Opcode> opcodes) {
        pc = 0;
        while (pc < opcodes.length()) {
                size_t jump_to = pc;
                Opcode& opcode = opcodes[pc];

                opcode.handler(opcode, &jump_to);

                if (jump_to == pc)
                        pc++;
                else
                        pc = jump_to;
        }
}

void jmp_handler(Opcode op, size_t *pc) {
        *pc = op.operand[0].data.number;
}

void fcall_handler(Opcode op, size_t *pc) {
        Callable v = op.operand[0].data.value;
        Value *r;

        v.call((ValueVector) op.operand[0].data.value, r);

        if (r)
                tmpdata.push(r);
}

Opcodes and Operands

Opcodes or operation codes are identifiers to the instructions that the virtual machine must execute. Almost all opcodes have side effects ranging from variable modification to behavioral changes. They often carry some payload called operand which can serve either as input or output.

Clever VM opcodes are allowed to have at most three operands, these must inform the VM from where or to where the data must go. The possible locations of the data are either the tmpdata stack or some Value*. One special case is for operands that contain only a single signed integer (defined by the rules of the ssize_t C type), which ought to be used as offsets, length indication and similar things.

Opcodes also contain function pointers to their respective handlers, these receive the current opcode and a pointer to the current program counter. The handlers are responsible for the "interpretation" of the opcode. The program counter pointer should only be modified if the handler wants for the VM to jump to some position on the opcode stream.

Simplified implementation proposal:

typedef void (*OpcodeHandler)(Opcode&, size_t*);

enum OperandType {
        OPERAND_NONE   = 0x00
        OPERAND_VALUE  = 0x01,
        OPERAND_NUMBER = 0x02,
        OPERAND_STACK  = 0x03
};

struct Operand {
        OperandType type;
        union {
                Value*  value;
                ssize_t number;
        } data;
};

enum OpcodeType {
        OP_NOP = 0x00,
        /* ... */
};

const size_t OpcodeMaxOperands = 3;

struct Opcode {
        OpcodeType    type;
        OpcodeHandler handler;
        Operand       operands[OpcodeMaxOperands];
};

Arithmetic

Opcode Operands Description
add out a b out = a + b
sub out a b out a - b
mul out a b out = a * b
div out a b out = a / b
mod out a b out = a % b
shr out a b out = a << b
shl out a b out = a >> b

Bitwise

Opcode Operands Description
bwor out a b out = a | b
bwand out a b out = a & b
bwxor out a b out = a ^ b
bwnot out a out = -a

Logical

Most of these are not used for conditionals but for inline code like a = b == c.

Opcode Operands Description
or out a b out = a || b
and out a b out = a && b
not out a out = !a
eq out a b out = a == b
neq out a b out = a != b
lt out a b out = a < b
gt out a b out = a > b
le out a b out = a <= b
ge out a b out = a >= b

Variable handling

Opcode Operands Description
assign a b a = b
load a (not decided)
store a (not decided)

Control flow

Opcode Operands Description
jmp off jumps to off
jmpnz off, a jumps to off if a is not 0
jmpz off, a jumps to off if a is 0
fcall a args pushes the current address and calls a and passes args
mcall a args same as fcall, but for methods.
ret a pushes a into the stack and returns to the caller
break ? ?
continue ? ?

Misc

Opcode Operands Description
nop   do nothing
discard n discards the top n values from the stack

Calling convention (Internal)

Sample compilation (needs to be updated)

Listing #1:

if (a || b) {
        d = c(a);
}
e = f;

Compiles to:

0: or a, b          # a || b
1: jmpz 4, (stack)  # leave the if
2: fcall c, [a]     # c(a)
3: store d          # d = (return from c(a))
4: assign e, f      # e = f

Or:

0: jmpz 4, a
1: jmpz 4, b
2: fcall c, [a]
3: store d
4: assign e, f

Listing #2:

Int a = 30;

Compiles to:

0: assign a 30  # see questions

Listing #4:

a = f() + 10;

Compiles to:

0: fcall f, []      # return value on the stack
1: add (stack), a   # stack.pop() + a
2: store a          # a = stack.pop()

Listing #5:

f(); /* returns something */

Compiles to:

0: fcall f, []  # calls f
1: discard 1   # discards return value

Questions

  • How should the compiler/virtual machine handle variable declaration?
  • What about constructors?
  • Should a copy (shallow, perhaps?) happen when a value is pushed into the stack?
  • Should we set up a "value pool" to draw values from and use relative/absolute offsets/addresses instead of value pointers on the opcodes?