oddly accurate

hoc3

hoc4: hoc3 inside an interpreter

hoc5

hoc4 has exactly the same grammar as hoc3, but its execution proceeds in an entirely different way. From hoc1hoc3, our interpreter evaluated expressions immediately, and when an expression terminator was reached, we had a double-valued answer. hoc4, on the other hand, separates the process of parsing hoc statements from the process of executing them.

Instead, hoc4 includes a small “virtual machine” module, which understands very simple instructions, and operates on a stack of data (much like a stack-based calculator).

The parser’s new job will be to convert hoc expressions into machine instructions and install them into the virtual machine. Once an expression is completely parsed, the list production will install some final instructions to print the result and/or stop the machine. At this point, we have a tiny “program” now installed into our machine.

After the program has been installed, we ask the machine to execute it. It evaluates each instruction, in order, using its stack to keep track of intermediate values (much like we used the parsing stack for this purpose in hoc1hoc3.

Introducing the hoc VM

The hoc4 machine

(Note: While the symbol table is still separate from the machine, the machine uses it, and so we include it in our diagrams.)

The hoc4 virtual machine has two separate storage areas. One is used to store instructions, called the “program storage” (or program). Each intermediate value the program generates is stored in a stack until it’s used (by a subsequent instruction).

VM Operation

Using the machine requires two “stages” - we first install instructions, then execute them.

During the installation process, the machine merely appends instructions to its program storage, without any interpretation or verification. When we call execute(), the machine sets its program counter (pc) to the beginning of program storage, executes whatever instruction is stored there, and then moves its pc forward. Each instruction may push data onto the stack or pop data from it. As each instruction executes, the pc will move forward by at least one, though, because some instructions consume more than one “word” of program space, it’s more accurate to say that the pc will be placed at the beginning of the next instruction. Let’s look at two examples to ground the discussion.

Example 1: Printing a Number

As a first example, consider a hoc program to print the number 42. Since expressions are always printed in hoc4, we can just write:

42

(Assuming we have a newline at the end) The same program, written for our virtual machine, looks like:

pushlit
42
print
stop

As it executes each instruction, we can examine how the state of the machine changes.

Instruction
PC
Stack
Comment
N/A
0
Empty
Initial state
pushlit
42
2
42
Push the literal value 42 onto the stack. (Notice PC changed by 2.)
print
3
Empty
Pops a value from the stack, prints it to the screen
stop
3
Empty
Stops the machine; no PC change.

The pushlit instruction is an example of an instruction that takes up more than one “word” of the memory in prog. Every pushlit instruction must be followed by a literal number stored in memory, and executing the instruction will examine that number, then move the PC to the next word of memory.

Example 2: Instructions and Symbols

The program instructions may also include Symbols as part of their variables. Since our language comes with some builtin Symbols already, we’ll use those to illustrate. Consider this one-line program to compute & display the cosine of pi:

cos(PI)

This program corresponds to the instructions below; notice that we first push arguments onto the stack, then the BUILTIN Symbol for cos. This is handy, as that’s also our parsing order!

pushref
PI
eval
pushref
cos
call
print
stop

While we don’t show it here, we know that the symbol table already has values for both PI and cos.

Instruction
PC
Stack
Comment
N/A
0
Empty
Initial state
pushref
PI
2
PI
Like pushlit, the pushref instruction pushes a value onto the stack. In this case, the value is a Symbol. The stack can hold both types of values.
eval
3
3.14159
We don’t want to pass a Symbol to the cos function; it operates on double values. Before a variable’s value can be used, it should be evaluated. This pops the symbol from the stack, looks up its value in the symbol table, and then pushes that value back onto the stack.
pushref
cos
5
cos 3.14159
Push our cos symbol onto the stack, so that we can use the call instruction.
call
6
-1
The call instruction will first pop 1 value from the stack. It must always be a BUILTIN. It checks the number of arguments the builtin uses, and pops that number of values from the stack. It then calls the Symbol's associated function with those arguments. The result of calling the BUILTIN is pushed onto the stack.
print
7
Empty
Pops -1 from stack and prints it to the screen
stop
7
Empty
Stops the machine; no PC change.

The hoc VM Instruction Set

We’ve seen some of the instructions the hoc4 VM understands; here are the rest.

Stack-Manipulation Instructions

We have three instructions for adding values to our stack of data:

pushlit
This instruction is always followed by a literal double value, which is pushed onto our machine’s stack.
pushref
This instruction is followed by a Symbol, which is pushed onto our machine’s stack
pushlast
Pushes the most-recently-popped value back onto the stack (for the @ expression)

We have two instructions that remove values from the stack; we use pop to discard the topmost element, or print to display it (after removal).

Arithmetic

The core arithmetic instructions, implemented in our parser during hoc3, are now machine instructions in hoc4: add sub, mul, and div.

Every instruction requires that there are at least two values on the stack. The two values will be popped from the stack, the operationi s performed, and the result is placed back onto the stack.

Symbol Manipulation

eval
Evaluates the Symbol at the top of the stack, and replaces it with the Symbol’s current value from the symbol table.
assignvar
This instruction pops a double and a VAR-type Symbol from the stack, and assigns the double’s value to the symbol.
assignconst
Operates just like assignvar, but marks the Symbol as const.
call
Examines the (function-valued) Symbol at the top of the stack, pops the number of arguments required to call the function, and pushes the result of evaluating it.

Implementation Part 1: The Machine

Before we change anything about hoc’s language operation, we can build (and presumably test) the machine module independently.

Machine State

The earlier diagram tells us the majority of the state we’ll need.

The hoc4 machine

Instructions

Our virtual machine’s current program is simply an array of MachineCode objects. Conceptually, we can think of MachineCode objects as “words” of our machine’s memory. To ease development and debugging, though, each word is typed. A word may be any one of:

Our MachineCode objects contain the type of the word, chosen from the CodeType enumeration, and an associated value; an anonymous union suffices to hold a value of whichever type is appropriate. We’ll declare these types inside hoc.h.

typedef int (*Inst)(void);

typedef enum {
    CT_INST = 'I',
    CT_LITERAL = 'L',
    CT_SYM_REF = 'S'
} CodeType;

typedef struct {
  CodeType type;
  union {
    double  literal; // type == CT_LITERAL
    Inst    inst;    // type == CT_INST
    Symbol *symbol;  // type == CT_SYM_REF
  };
} MachineCode;

hoc4/hoc.h

The Program

Finally, we’re ready to begin machine.c; we’ll add a static array prog we can use to store the MachineCode that’s installed.

#define PROGSIZE 2000 // Static program limit (for now)
static MachineCode prog[PROGSIZE];

hoc4/machine.c

We’ve already mentioned the pc, which tracks the next word to execute. In addition, we’ll need some state for tracking the size of the currently-installed program instructions, so that we’ll know where to add the next instruction.

static MachineCode *progp;  // Next free location for code generation
static MachineCode *pc;     // the current location in the executing program

hoc4/machine.c

The Stack

The stack of data we build up in our machine is made up of Datum objects; these are much like MachineCode, with two differences:

  1. They don’t track their type information; it should be entirely unambiguous from context what type of value they have, so they can be stored in a single union.
  2. They cannot contain an Inst type, because we do not allow hoc programs to execute code from the stack.
typedef union {
  double  value;
  Symbol *symbol;
} Datum;

hoc4/hoc.h

We use another fixed-size array for the stack as well; notice we include a piece of state to “remember” the most-recently-popped value (for use with the @ expression)

#define STACKSIZE 256
static Datum  stack[STACKSIZE];            // the machine stack
static Datum *stackp;                      // next free spot on the stack
static Datum  last_popped = {.value = 0};  // previous top-of-stack value

hoc4/machine.c

Installing Code into our Program

Our machine will have a single entrypoint for installing code into its program:

/**
 * Installs the given `MachineCode` object into the next available
 * location in the machine's current program.  A runtime error will
 * occur if the program is full
 */
void install_code(MachineCode mc);

hoc4/hoc.h

The machine.c implementation is straightforward.

void install_code(MachineCode mc) {
  if (progp >= prog + PROGSIZE) {
    exec_error("Max program size exceeded");
  }
  *(progp++) = mc;
}

hoc4/machine.c

Working with the stack

We’ll need functions to manipulate the stack to simplify the individual instruction implementations.

/** Push new value onto stack */
static void stack_push(Datum d) {
  if (stackp >= stack + STACKSIZE) {
    exec_error("Stack overflow");
  }
  *(stackp++) = d;  // copy
}

static Datum stack_pop(void) {
  if (stackp <= stack) {
    exec_error("Stack underflow");
  }
  last_popped = *(--stackp);
  return last_popped;
}

hoc4/machine.c

Executing our Program

Before we can implement execution, or even our machine instructions, we need to determine a contract for how the machine’s state will be updated as we move through the program. Updating the program counter is not completely straightforward, since each instruction may be encoded in a different number of “words” of prog.

void execute(MachineCode *addr);

hoc4/hoc.h

We declare the following invariants:

This set of invariants gives us the following properties:

This leads us to the following implementation:


void execute(MachineCode *addr) {
  // this argument isn't used in hoc4
  pc = (addr == NULL) ? prog : addr;

  while (!is_stop_inst(pc)) {
    if (pc->type != CT_INST) {
      exec_error("Programming error, type '%c' was not INST!", pc->type);
    }

    (*(*pc++).inst)();  // execute inst and increment PC
  }
}

hoc4/machine.c

Our STOP instruction & is_stop_inst() are declared and defined in the hoc.h header:

#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)

hoc4/hoc.h

Implementing Instructions

Each instruction is implemented as a int-valued function with no arguments, since all of the input we need is contained in our static variables.

We’ll declare them in hoc.h, since they’ll be installed by our parser.

int inst_add(void);
int inst_assignconst(void);
int inst_assignvar(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_mul(void);
int inst_pop(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);

hoc4/hoc.h

Each of our instructions will follow the same basic structure:

int my_instruction(void) {
  // fetch data from stack
  // perform operation
  // update PC if necessary
  // return 0
}

Stack-Manipulation Instructions

Two push instructions pushlit, pushref are always two words long.

pushlit
42

We must be sure to increment pc again before returning.

int inst_pushlit(void) {
  assert(pc->type == CT_LITERAL); // should always be true

  Datum d = (Datum){.value = pc->literal};
  stack_push(d);

  pc++;  // this instruction is two "words" long
  return 0;
}

hoc4/machine.c

The pushlast instruction doesn’t require an argument, because it’s implemented by using the most-recently-popped value from the stack.

int inst_pushlast(void) {
  stack_push(last_popped);
  return 0;
}

hoc4/machine.c

The pop instructions (pop, print) are completely straightforward; the print instruction is replicated from our formatted print in the list production of hoc3’s parser.

int inst_pop(void) {
  stack_pop();
  return 0;
}

int inst_print(void) {
  Datum d = stack_pop();
  printf("\t%.8g\n", d.value);
  return 0;
}

hoc4/machine.c

Arithmetic Instructions

Our arithmetic instructions (add, sub, etc.) are all one word long. They each have two arguments to be taken from the stack. The only detail to keep in mind is reversing the argument order, since they were pushed left-to-right. For example, the stack for 12 / 3 is constructed as follows:

Stack Before
Machine Code Run
Stack After
Empty
pushlit
12
12 12
pushlit
3
3 12 3 12
div
4

This leads to a straightforward implementation.

int inst_div(void) {
  Datum rhs = stack_pop();
  Datum lhs = stack_pop();

  if (rhs.value == 0) {
    exec_error("Division by zero");
  } else {
    stack_push((Datum){.value = lhs.value / rhs.value});
  }
  return 0;
}

hoc4/machine.c

Assignment & Evaluation

Our machine works with Symbols directly, rather than performing any lookup()s itself; actually installing symbols into the table will still be the responsibility of the lexer (though this isn’t absolutely required).

The assignvar instruction, for example, is a single-word instruction that requires a symbol to be on the top of the stack, followed by its to-be-assigned value. Notice that these expected stack items are in the opposite order from the div instruction. This is due to the order in which we parse the expressions, as we’ll see in a later section.

As an example, the program:

pushlit
42
pushref
x
assign

Will result in the following state during execution:

Instruction
PC
Stack
Comment
N/A
0
Empty
Initial state
pushlit
42
2
42
pushref
x
4
42 Symbol(x, UNDEF)
assignvar
5
Empty

The assignvar instruction merely updates the Symbol’s value, just like the assignment action did in hoc3.

int inst_assignvar(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = VAR;
  stack_push((Datum){.value = rhs.value});
  return 0;
}

hoc4/machine.c

The rest of the instructions are similar.

Implementation Part 2: Connecting our Parser and Machine

In hoc1hoc3, our parser actions focused on producing values, which allowed us to build up our computed results. We used the list production at the top of our grammar to display those results to the user.

Now, we want our parser actions to build small programs for our machine to run. The list production, whenever it reaches the end of an expression, will produce two final instructions: print and stop. It also returns to our main() function, which is then responsible for calling the machine module’s execute() function. The program will run, display its result(s) to the user, and we can return to the parser to wait for more input.

This means that the main changes in hoc.y will be to our grammar actions; rather than generating values, e.g.

expr:  NUMBER
       {
         $$ = $1;  // return literal number
       }

We will generate VM instructions that do the same thing.

expr:  NUMBER
       {
         // Push literal number onto stack
         install_instruction(inst_pushlit);
         install_literal($1);
       }

To understand the implementation changes, we’ll explore example programs to see how they’re generated:

Example 1: Printing a Number

Given the hoc input:

42

We’ve seen in our machine discussion that our parser will need to install a machine program such as:

pushlit
42
print
stop

Step 1: Recognize NUMBER, Reduce to expr

Our lexer, which does not change from hoc3, will recognize the 42 as a NUMBER token. This matches our expr: NUMBER rule, which now installs instructions for the first half of our program:

expr:  NUMBER
       {
         install_instruction(inst_pushlit);
         install_literal($1);
       }

hoc4/hoc.y

Step 2: The list action

Since this is the end of our input, our expression is reduced to a list; the list action always finishes our program.

list: ...
      list expr terminator
      {
        install_instruction(inst_print);
        install_instruction(STOP_INST);
        YYACCEPT;
      }

hoc4/hoc.y

Q: Why do we install a STOP instruction after printing? A: Otherwise, the machine might continue to execute old instructions after executing this expression!

Sidenote: Is that “print” always safe?

It may seem odd that we can add a print instruction at the end of every generated program; how do we know that there will be a value on the stack to print?

But recall that expr makes guarantees; in hoc3, we had a similar “top-level” list action:

list: ...
      | list expr terminator
        {
          printf("\t%.8g\n", remember($2));
        }

Here, we knew we could print, because every expr was required to produce a double, and so the list action could always be sure that $2 had a reasonable value.

In hoc4, the list production installs a “print” instruction, which pops a value from the stack. It can do this without fear of underflow, because we have a similar guarantee: In hoc4, we write our actions so that the instructions installed by an expression action will always place at least one double onto the stack. This is important enough to repeat:

In hoc4, any parser action for an expr nonterminal must install instructions that, when eventually executed, will place (at least) one value onto the machine stack.

Unfortunately, unlike our %type declarations, yacc cannot check this invariant for us; we’ll need to be careful to ensure that all of our parser actions follow this rule. As part of this change, our expressions no longer need to produce values for yacc at all.

 %token  <hoc_value>     NUMBER
 %token  <hoc_symbol>    VAR CONST BUILTIN UNDEF
-%type   <hoc_value>     expr assign call
 %type   <hoc_symbol>    assignable

hoc4/hoc.y

Printing Variables

Evaluating variables requires a very similar set of instructions.

expr: ...
      | VAR
        {
          install_instruction(inst_pushref);
          install_ref($1);
          install_instruction(inst_eval);
        }

hoc4/hoc.y

The eval instruction at the end of the program satisfies our guarantee: after evaluating a single-variable expression, our stack always has a double value on top. Variables are not always immediately evaluated, when used as part of a larger expression, as we’ll see in the next example.

Example 2: Assignment

Consider the statement:

answer = 42

Review: hoc3’s Operation

In hoc3, our parser would first resolve the right side of the assignment, as = is right-associative. This is fine, because tokenizing the left-hand-side (which must happen before the right) installs answer into the symbol table. The symbol itself is returned. The right-hand side is a number, which is parsed as an expression as:

NUMBER    { $$ = $1; }

Since numbers are expressions, we have matched the assign rule, which worked as follows:

Parsing assignments: hoc3 Version

assign:   assignable '=' expr
          {
              if ($1->type == CONST) {
                  exec_error("Cannot reassign constant '%s'", $1->name);
              } else {
                  $1->type = VAR;
                  $$ = ($1->data.val = $3);
              }
          }

We have access to both the assignable symbol $1 and the “right-hand side” value, $3. So, the assign rule can simply modify the value of our symbol.

In hoc4, our actions have the same responsibilities. Again, the assignable rule has no behavior and merely returns the Symbol. But the expr rule must ensure that a value is on the stack for assignment:

expr:   NUMBER
        {
            install_instruction(inst_pushlit);
            install_literal($1);
        }

hoc4/hoc.y

When we see a number used as a standalone expression, we install instructions for pushing it onto the stack. Our program, at this point, contains:

pushlit
42

We then reach the assign rule, which must, at some point, install an assignvar instruction. Recall that assignvar requires that the top of our VM stack looks like (<Symbol, value-to-assign,...>). Luckily, we’ve promised to guarantee expr actions always leave a double value on the top of the stack. The assign rule can takes advantage of that to install:

  1. A pushref, to add the symbol x above the double value on the stack.
  2. An assignvar instruction, which will actually set the symbol’s value, and place it onto the stack.
assign:   assignable '=' expr
          {
              install_instruction(inst_pushref);
              install_ref($1);
              install_instruction(inst_assignvar);
          }

hoc4/hoc.y

Our guarantee lets us proceed with confidence, and our program now looks like:

pushlit
42
pushref
x
assignvar

Finally, we match the list rule, which has a special case for assign. As in hoc3, we don’t want to print the results of assignments. So, we install a pop instruction instead, which discards the value.

list:    list assign terminator
         {
            install_instruction(inst_pop);
            install_instruction(STOP_INST);
            YYACCEPT;
         }

hoc4/hoc.y

We’ve generated the final version of our program:

pushlit
42
pushref
x
assign
pop
STOP

And so this tiny program updates the value of the x Symbol, and does nothing else, just like the assignment in hoc3.

A Slightly Longer Example

We combine all the ideas for our final one-liner:

x = 2 * PI

Our parser encounters the following tree:

Parse tree for x = 2 * y

Let’s compare the parser actions from hoc3 to hoc4.

Before: hoc3 Actions

Evaluated
Reduction
Generated
2(NUMBER)
expr: NUMBER
{
    $$ = $1;
}
2
PI (CONST)
expr: CONST
{
    $$ = $1->data.val;
}
3.14159
*
expr: expr * expr
{
    $$ = $1 * $3;
}
6.28318
x =
assign: assignable = expr
{
    ...
    $$ = ($1->data.val = $3);
}
6.28318
N/A
list: list assign terminator
Discards Value

After: hoc4 Actions

Evaluated
Reduction
Machine Code
2 (NUMBER)
expr: NUMBER
{
    install_instruction(inst_pushlit);
    install_literal($1);
}
pushlit
2
PI(CONST)
expr: CONST
{
    install_instruction(inst_pushref);
    install_ref($1);
    install_instruction(inst_eval)
}
pushref
Symbol(PI)
eval
*

expr: expr * expr
{
    install_instruction(inst_mul)
}
mul
x =
assign: assignable = expr
{
    install_instruction(inst_pushref);
    install_ref($1);
    install_instruction(inst_assignvar)
}
pushref
Symbol(x)
assignvar
N/A
list: list assign terminator
{
    install_instruction(inst_pop);
    install_instruction(STOP_INST)
}
pop
STOP

The rest of the action changes are straightforward variations of the above.

Implementation Part 3: Execution

In our previous parsers, our grammar actions contained all of the business logic of our program. We never returned from yyparse(), because the parser ran until the user sent EOF, or the process was terminated.

But our new execution model is slightly different; we want to use yyparse to parse individual expressions, but then we want to execute them ourselves with the machine interface. It wouldn’t be very helpful to only run the programs when all input has been received.

One problem with this idea is that yyparse() expects to run until the entire input stream is parsed. There are several solutions to this, but the simplest is to use a macro, YYACCEPT, directly in the actions that “complete” an expression. This macro indicates to yacc that, though more input remains, we have finished a parse successfully, and it may return to its caller immediately. This is preferable to returning from an action directly, as that can leave the parser in an undefined state.

We use this macro in our list production, anytime we find an input that we can execute().

list:           /* nothing */
        |       list terminator
        |       list error terminator { yyerrok; }
        |       list assign terminator
                {
                    install_instruction(inst_pop);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list expr terminator
                {
                    install_instruction(inst_print);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }

hoc4/hoc.y

Now each new “list entry” will run as its own program; all of our “state” for hoc programs is tracked through variable values in the symbol table. So, each time we parse an expression successfully, we:

  1. Execute the resulting program
  2. Reset our program/stack state
  3. Try to parse again

We will stop when the parser returns an error result, or we reach the end of stdin.

#define YACC_SUCCESS 0  // Yacc's success result

hoc4/hoc.h

Resetting the Machine

We’ll need to be able to reset the machine’s program storage and pc in between expressions. Remember that all of the “persistent” state is saved in our symbol table.

We’ll add two more functions to our machine interface in hoc.h: machine_startup(), which initializes our machine once at program startup, and machine_reset_program(), to reset prog and pc in between statements.

void machine_startup(void);
void machine_reset_program(void);

hoc4/hoc.y

The functions are very similar; our startup is a handy place to initialize our random number generator.

/** Initialize machine state for startup */
void machine_startup(void) {
  srand(time(NULL));  // use current time as seed for random generator
  machine_reset_program();
}

/** Initialize execution state for a new program */
void machine_reset_program(void) {
  stackp = stack;
  progp = prog;
}

hoc4/machine.c

Adding Reset to our REPL

We can use those in our main loop:

int main(int argc, char *argv[]) {
      ...
      install_hoc_builtins();
+     machine_startup(); // one-time initialization

      setjmp(begin);
+     /* Reset the program after jumping back here on error */
+     machine_reset_program();
+     while(yyparse() == YACC_SUCCESS && !feof(stdin)) {
+         execute(NULL);
+         machine_reset_program();
      }

      return 0;
  }
+

hoc4/hoc.y

Changes from UPE’s hoc4

In the book, the authors do not create the MachineCode abstraction, instead using an array of Inst pointers to hold both Inst and Symbol pointers. Because pointers to different kinds of objects are not guaranteed to be the same size, we use a union of values instead. While the type field in MachineCode is not necessary, it’s quite handy for debugging.

Since the book’s builtins all have exactly one argument, they do not require the handling we have here of multi-argument Symbol functions.

The authors also return a value of 1 directly wherever we use YYACCEPT. Since bison does not recommend bypassing its return macros, we use their documented interface.

What’s Next

hoc4 was a major transition point our operation; hoc5 will build on this foundation, adding flow control and conditional logic to our little virtual machine.

Source Code

Makefile

CFLAGS ?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
YFLAGS ?= -d
LDFLAGS=-lm
objects := builtins.o math.o symbol.o machine.o hoc.o

.PHONY:all
all: hoc

hoc: $(objects)
    cc $(CFLAGS) $(objects) $(LDFLAGS) -o hoc

hoc.c hoc.tab.h: hoc.y
    yacc $(YFLAGS) hoc.y -o hoc.tab.c
    mv hoc.tab.c hoc.c

%.o: hoc.h hoc.tab.h %.c
    cc $(CFLAGS) -c -o $@ $*.c

.PHONY:clean
clean:
    rm -f y.tab.h
    rm -f hoc.tab.h
    rm -f $(objects)
    rm -f hoc.c
    rm -f hoc
builtins.c

#include "hoc.h"
#include "hoc.tab.h"
#include <stdio.h>
#include <stdlib.h>

/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
    Cos(double), Exp(double), Integer(double), Lg(double), Ln(double),
    Log10(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);

static struct {
  const char *name;
  double      value;
} constants[] = {
    {"PI", 3.14159265358979323846},
    {"E", 2.71828182845904523536},
    {NULL, 0},
};

static struct {
  const char *name;
  int         args;
  union {
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
} builtins[] = {
    {"abs", 1, .call1 = Abs},
    {"acos", 1, .call1 = Acos},
    {"atan", 1, .call1 = Atan},
    {"atan2", 2, .call2 = Atan2},
    {"cos", 1, .call1 = Cos},
    {"exp", 1, .call1 = Exp},
    {"int", 1, .call1 = Integer},
    {"lg", 1, .call1 = Lg},
    {"ln", 1, .call1 = Ln},
    {"log10", 1, .call1 = Log10},
    {"pow", 2, .call2 = Pow},
    {"rand", 0, .call0 = Random},
    {"sin", 1, .call1 = Sin},
    {"sqrt", 1, .call1 = Sqrt},
    {NULL, 0, .call0 = NULL},
};

void install_hoc_builtins(void) {
  for (int i = 0; constants[i].name != NULL; i++) {
    install_symbol(constants[i].name, CONST, constants[i].value);
  }
  for (int i = 0; builtins[i].name != NULL; i++) {
    Symbol *s = install_symbol(builtins[i].name, BUILTIN, 0.0);
    s->data.func.args = builtins[i].args;
    switch (builtins[i].args) {
      case 0:
        s->data.func.call0 = builtins[i].call0;
        break;
      case 1:
        s->data.func.call1 = builtins[i].call1;
        break;
      case 2:
        s->data.func.call2 = builtins[i].call2;
        break;
      default:
        exec_error("Cannot start hoc: Argument count error for builtin '%s'",
                   builtins[i].name);
    }
  }
}
hoc.h

#ifndef HOC_INC
#define HOC_INC 1

/*
 * Global types & declarations for use in hoc
 */

#include <stddef.h>

///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------

#define SYMBOL_NAME_LIMIT 100
typedef struct Symbol Symbol;

/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
  int args;
  union {
    // anon union: e.g. use `data.func.call0` directly. */
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
};

struct Symbol {
  short   type;  // Generated from our token types
  char   *name;
  Symbol *next;
  union {
    double             val;   // a double-valued symbol
    struct BuiltinFunc func;  // a callable symbol
  } data;
};

Symbol *install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);

///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------

typedef int (*Inst)(void);

typedef enum { CT_INST = 'I', CT_LITERAL = 'L', CT_SYM_REF = 'S' } CodeType;

typedef struct {
  CodeType type;
  union {
    double  literal;
    Inst    inst;
    Symbol *symbol;
  };
} MachineCode;

#define STOP_INST NULL

///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------

/**
 * Installs the given `MachineCode` object into the next available
 * location in the machine's current program.  A runtime error will
 * occur if the program is full
 */
void install_code(MachineCode mc);

#define install_literal(lit) \
  install_code((MachineCode){.type = CT_LITERAL, .literal = (lit)})

#define install_ref(sp) \
  install_code((MachineCode){.type = CT_SYM_REF, .symbol = (sp)})

#define install_instruction(instr) \
  install_code((MachineCode){.type = CT_INST, .inst = (instr)})

void execute(MachineCode *addr);

///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------

int inst_add(void);
int inst_assignconst(void);
int inst_assignvar(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_mul(void);
int inst_pop(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);

///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------

typedef union {
  double  value;
  Symbol *symbol;
} Datum;

void machine_startup(void);
void machine_reset_program(void);

///----------------------------------------------------------------
/// Runtime functions
///----------------------------------------------------------------

/** global error handling */
__attribute__((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);

/** builtins */
void install_hoc_builtins(void);

#endif  // HOC_INC
hoc.y

%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------

/*
 * "higher-order calculator" - Version 4
 * From "The UNIX Programming Environment"
 */

#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include "hoc.h"

///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------

jmp_buf begin;

///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------

#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);
%}

%union {
  double hoc_value;
  Symbol *hoc_symbol;
}

%token  <hoc_value> NUMBER
%token  <hoc_symbol>    VAR CONST BUILTIN UNDEF
%type   <hoc_symbol>    assignable
%right  '='       /* right-associative, much like C */
%left   '+' '-'   /* left-associative, same precedence */
%left   '*' '/'   /* left-assoc; higher precedence */
%right  '^'       /* exponents */
%left   UNARY_MINUS  /* Even higher precedence
            than mul or div. Can't use '-', as we've used
            it already for subtraction. */
%%
list:           /* nothing */
        |       list terminator
        |       list error terminator { yyerrok; }
        |       list assign terminator
                {
                    install_instruction(inst_pop);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list expr terminator
                {
                    install_instruction(inst_print);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        ;
expr:           NUMBER
                {
                    install_instruction(inst_pushlit);
                    install_literal($1);
                }
        |       '@'
                {
                    install_instruction(inst_pushlast);
                }
        |       assignable
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_eval);
                }
        |       assign
        |       call
        |       expr '+' expr   { install_instruction(inst_add); }
        |       expr '-' expr   { install_instruction(inst_sub); }
        |       expr '*' expr   { install_instruction(inst_mul); }
        |       expr '/' expr   { install_instruction(inst_div); }
        |       expr '^' expr
                {
                    install_instruction(inst_pushref);
                    install_ref(lookup("pow"));
                    install_instruction(inst_call);
                }
        |       '-'expr         %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
                {
                    install_instruction(inst_pushlit);
                    install_literal(-1);
                    install_instruction(inst_mul);
                }
        |       '(' expr ')'
        ;
assign:         assignable '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignvar);
                }
        |       assignable ':' '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignconst);
                }
        ;
call:           BUILTIN '(' ')'
                {
                    check_call_args($1, 0);
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                }
        |       BUILTIN '(' expr ')'
                {
                    check_call_args($1, 1);
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                }
        |       BUILTIN '(' expr ',' expr ')'
                {
                    check_call_args($1, 2);
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                }
        ;
assignable: VAR | CONST
        ;
terminator: '\n' | ';'
        ;
%% // end of grammar

/* error tracking */
char *program_name;
int line_number = 1;

int main(int argc, char *argv[]) {
    (void)argc;
    program_name = argv[0];
    install_hoc_builtins();
    machine_startup(); // one-time initialization

    setjmp(begin);
    /* Reset the program after jumping back here on error */
    machine_reset_program();
    while(yyparse() == YACC_SUCCESS && !feof(stdin)) {
        execute(NULL);
        machine_reset_program();
    }

    return 0;
}

/* our simple, hand-rolled lexer. */
int yylex(void) {
    int c;
    do {
        c = getchar();
    } while (c == ' ' || c == '\t');

    if (c == EOF) {
        return 0;
    }

    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval.hoc_value);
        return NUMBER;
    }

    if (c == '\n') {
        line_number++;
    }


    if (isalpha(c)) {
        Symbol *s;

        char buf[SYMBOL_NAME_LIMIT + 1];
        size_t nread = 0;
        do {
            buf[nread++] = c;
            c = getchar();
        } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));

        // Just in case we exceeded the limit
        while ((isalpha(c) || isdigit(c))) {
            c = getchar();
        }

        // at this point, we have a non-alphanumeric 'c'
        ungetc(c, stdin);

        buf[nread] = '\0';
        if ((s=lookup(buf)) == NULL) {
            s = install_symbol(buf, UNDEF, 0.0);
        }
        yylval.hoc_symbol = s;
        return (s->type == UNDEF ? VAR : s->type);
    }

    return c;
}

void yyerror(const char *s) {
  warning("%s", s);
}

/*
 * Verify Symbol's declared arg count matches actual,
 * or display a helpful error message with mismatch
 */
void check_call_args(const Symbol *s, int actual) {
    int expected = s->data.func.args;
    if (expected != actual) {
        exec_error("Wrong number of arguments for %s: expected %d, got %d",
        s->name, expected, actual);
    }
}

#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)

void warning(const char *msg, ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);
}

void exec_error(const char *msg , ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);

  longjmp(begin, 0);
}
machine.c

#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stddef.h>  // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * MACHINE STATE
 * =============
 *
 * We have our "program" stored as a list of MachineCode
 * objects, each encoding one of:
 *   - an instruction
 *   - a literal value
 *   - a symbol reference
 *
 * We "execute" our program by setting the program counter (pc)
 * pointer to the first MachineCode (which *must* be an instruction),
 * and handling each instruction we find. Within an instruction, the
 * program counter might be updated so that by the *end* of any
 * instruction, the program counter is always pointing to another
 * instruction.
 *
 *
 * The Stack
 * ---------
 * Each instruction manipulates the stack, pushing,
 * popping, or combining values from the stack.  The stack elements
 * are Datum objects, which act much like MachineCodes, but with two
 * differences.
 *
 * 1. Datums cannot represent an instruction; they resolve to a value
 * 2. The type of a datum can always be resolved from their context,
 *    and so no "type" field is necessary.
 *
 * ┌────────────────────────────────────────────────────┐
 * │   == PROG ==                 == STACK ==           │
 * │ ┌─────────────┐◀───┐       ┌─────────────┐         │
 * │ │ MachineCode │    pc      │    Datum    │         │
 * │ ├─────────────┤            ├─────────────┤         │
 * │ │ MachineCode │            │    Datum    │         │
 * │ ├─────────────┤            ├─────────────┤         │
 * │ │ MachineCode │   progp    │    Datum    │ stackp  │
 * │ ├─────────────┤◀────┘      ├─────────────┤ ◀──┘    │
 * │ │     ...     │            │     ...     │         │
 * │ └─────────────┘            └─────────────┘         │
 * └────────────────────────────────────────────────────┘
 */

#define PROGSIZE 2000
static MachineCode prog[PROGSIZE];  // the installed code

static MachineCode *progp;  // Next free location for code generation
static MachineCode *pc;     // Current location in the executing program

#define STACKSIZE 256
static Datum  stack[STACKSIZE];            // the machine stack
static Datum *stackp;                      // next free spot on the stack
static Datum  last_popped = {.value = 0};  // previous top-of-stack value
static void   stack_push(Datum d);
static Datum  stack_pop(void);

#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)

/** Initialize machine state for startup */
void machine_startup(void) {
  srand(time(NULL));  // use current time as seed for random generator
  machine_reset_program();
}

/** Initialize execution state for a new program */
void machine_reset_program(void) {
  stackp = stack;
  progp = prog;
}

///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------

void install_code(MachineCode mc) {
  if (progp >= prog + PROGSIZE) {
    exec_error("Max program size exceeded");
  }

  *(progp++) = mc;
}

///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------

void execute(MachineCode *addr) {
  pc = (addr == NULL) ? prog : addr;

  while (!is_stop_inst(pc)) {
    if (pc->type != CT_INST) {
      exec_error("Programming error, type '%c' was not INST!", pc->type);
    }

    (*(*pc++).inst)();  // increment PC and execute inst
  }
}

/** Push new value onto stack */
static void stack_push(Datum d) {
  if (stackp >= stack + STACKSIZE) {
    exec_error("Stack overflow");
  }
  *(stackp++) = d;  // copy
}

static Datum stack_pop(void) {
  if (stackp <= stack) {
    exec_error("Stack underflow");
  }
  last_popped = *(--stackp);
  return last_popped;
}

int inst_pushlit(void) {
  assert(pc->type == CT_LITERAL);

  Datum d = (Datum){.value = pc->literal};
  stack_push(d);
  pc++;  // this instruction is two "words" long
  return 0;
}

int inst_pushref(void) {
  assert(pc->type == CT_SYM_REF);

  Datum d = (Datum){.symbol = pc->symbol};
  stack_push(d);
  pc++;  // this instruction is two "words" long
  return 0;
}

int inst_pushlast(void) {
  stack_push(last_popped);
  return 0;
}

int inst_eval(void) {
  Datum d = stack_pop();  // must be symbol
  switch (d.symbol->type) {
    case UNDEF:
      exec_error("Undefined variable: '%s'", d.symbol->name);
      break;
    case BUILTIN:
      exec_error("Builtin '%s' cannot be evaluated", d.symbol->name);
      break;
    default:  // VAR,CONST
      stack_push((Datum){.value = d.symbol->data.val});
      break;
  }
  return 0;
}

int inst_call(void) {
  Datum s = stack_pop();  // must be symbol
  if (s.symbol->type != BUILTIN) {
    exec_error("Cannot call non-function '%s'", s.symbol->name);
  }
  double result;
  switch (s.symbol->data.func.args) {
    case 0:
      result = s.symbol->data.func.call0();
      break;
    case 1: {
      Datum arg = stack_pop();
      result = s.symbol->data.func.call1(arg.value);
      break;
    }
    case 2: {
      Datum arg2 = stack_pop();
      Datum arg1 = stack_pop();
      result = s.symbol->data.func.call2(arg1.value, arg2.value);
      break;
    }
  }
  stack_push((Datum){.value = result});
  return 0;
}

int inst_assignvar(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = VAR;
  stack_push((Datum){.value = rhs.value});
  return 0;
}

int inst_assignconst(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (lhs.symbol->type == CONST) {
    exec_error("Cannot reassign constant '%s'", lhs.symbol->name);
  } else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = CONST;
  stack_push((Datum){.value = rhs.value});
  return 0;
}

int inst_pop(void) {
  stack_pop();
  return 0;
}

int inst_print(void) {
  Datum d = stack_pop();
  printf("\t%.8g\n", d.value);
  return 0;
}

int inst_add(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push((Datum){.value = lhs.value + rhs.value});
  return 0;
}

int inst_sub(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push((Datum){.value = lhs.value - rhs.value});
  return 0;
}

int inst_div(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  if (rhs.value == 0) {
    exec_error("Division by zero");
  } else {
    stack_push((Datum){.value = lhs.value / rhs.value});
  }
  return 0;
}

int inst_mul(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value
  stack_push((Datum){.value = lhs.value * rhs.value});
  return 0;
}
math.c

#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>  // rand

#define HOC_RAND_MAX 1000000L

static double check_math_err(double result, const char *op);

double Abs(double x) {
  return check_math_err(fabs(x), "fabs");
}

double Acos(double x) {
  return check_math_err(acos(x), "acos");
}

double Atan(double x) {
  return check_math_err(atan(x), "atan");
}

double Atan2(double x, double y) {
  return check_math_err(atan2(x, y), "atan2");
}

double Cos(double x) {
  return check_math_err(cos(x), "cos");
}

double Exp(double x) {
  return check_math_err(exp(x), "exp");
}

double Integer(double x) {
  return (double)(long)x;
}

double Lg(double x) {
  return check_math_err(log2(x), "lg");
}

double Ln(double x) {
  return check_math_err(log(x), "log");
}

double Log10(double x) {
  return check_math_err(log10(x), "log10");
}

double Pow(double x, double y) {
  return check_math_err(pow(x, y), "exponentiation");
}

double Random(void) {
  // We have HOC_RAND_MAX buckets
  const int bucket_size = RAND_MAX / HOC_RAND_MAX;
  int       result;
  do {
    result = rand() / bucket_size;
  } while (result >= HOC_RAND_MAX);

  return result + 1;
}

double Sin(double x) {
  return check_math_err(sin(x), "sin");
}

double Sqrt(double x) {
  return check_math_err(sqrt(x), "sqrt");
}

static double check_math_err(double result, const char *op) {
  static const char *domain_msg = "argument outside domain";
  static const char *range_msg = "result outside range";
  static const char *other_msg = "floating-point exception occurred";

  const char *msg = NULL;
  if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
    // Special case: Inexact results are not errors.
    if (fetestexcept(FE_INEXACT)) {
      goto done;
    }

    if (fetestexcept(FE_INVALID)) {
      msg = domain_msg;
    } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
      msg = range_msg;
    } else {  // unknown
      msg = other_msg;
    }

    feclearexcept(FE_ALL_EXCEPT);
  } else if (errno) {
    if (errno == EDOM) {
      msg = domain_msg;
    } else if (errno == ERANGE) {
      msg = range_msg;
    } else {
      msg = other_msg;
    }
    errno = 0;
  }

  if (msg != NULL) {
    exec_error("math error during %s: %s", op, msg);
  }

done:
  return result;
}
symbol.c

#include "hoc.h"
#include "hoc.tab.h"  // generated from yacc -d on our grammar
#include <stddef.h>   // NULL
#include <stdlib.h>   // malloc
#include <string.h>   // strcmp

static Symbol *symbol_table = NULL;

void *emalloc(size_t nbytes);

Symbol *lookup(const char *name) {
  Symbol *current = symbol_table;
  while (current != NULL) {
    if (strcmp(current->name, name) == 0) {
      return current;
    }
    current = current->next;
  }

  return NULL;
}

Symbol *install_symbol(const char *name, short type, double value) {
  size_t  name_len = strlen(name);
  Symbol *s = emalloc(sizeof(Symbol)                       // actual symbol
                      + (sizeof(char)) * (name_len + 1));  // room for name

  s->name = (char *)(s + 1);
  strcpy(s->name, name);
  s->name[name_len] = '\0';

  s->type = type;
  s->data.val = value;
  s->next = symbol_table;
  symbol_table = s;
  return s;
}

void *emalloc(size_t nbytes) {
  void *ptr = malloc(nbytes);
  if (ptr == NULL) {
    exec_error("Out of memory!");
  }
  return ptr;
}