oddly accurate

hoc4

hoc5: Flow Control

hoc6

hoc5 builds on the VM model constructed for hoc4, enabling more complex constructs like if/else branching and while loops. As we’ll see, these flow-control constructs will be more useful if we can also perform some limited output, so we’ll add a print keyword as well.

These constructs are the first step to executing larger programs, instead of the single-line “programs” executed by hoc4. We’ll begin with an overview of these new features to better understand how or implementation might change.

Feature: Relational and Boolean Operators

The first feature we’ll need for conditional logic is a set of relational operators, so that we can compare two values. We’ll add the standard set of ==, !=, <, <=, >, and >=.

These operators, like our arithmetic operators, will be compiled into instructions which will pop two values from the stack, then push the value 1 if the given comparison is true, and 0 otherwise.

The boolean operators, && and ||, operate the same way. In hoc5, unlike most languages, we’ll always evaluate both operands, instead of short-circuiting. We do want to inherit C’s approach to boolean precedence, however. Namely, the boolean operators should have a very low precedence, so that expressions like:

x < y && y < z

Are evaluated as:

Parsing boolean operators correctly

Instead of:

Parsing boolean operators incorrectly

Feature: Conditional Logic

The hoc language up to this point is meant for line-by-line interactions. The user enters an expression, and hoc prints a response. This “calculator-style” of interaction does not work as well once we introduce conditional logic. Consider:

x = 10
y = 1
while (x > 0) {
    y = y * x
    x = x - 1
}
lg(y)

hoc code for lg(10!)

The first two lines of this example will execute correctly in hoc4. Each one will be parsed into a tiny “assignment” program, and execute()ed in the virtual machine.

The while loop, however, presents some brand-new challenges. There are at least 3 hoc4 “programs” we can see inside this snippet of code.

Loop Components

Furthermore, some of these programs will be executed a variable number of times:

Obviously, our parser cannot know how many iterations of the loop we will run, so the machine must be able to “jump” between the condition, body, and post-body sections on its own.

We can visualize its behavior as follows:

Jumping around our loop

We’ll need a new set of instructions for this; branching instructions, which allow us to move our pc to a new value and continue executing.

We’ll still need some help from the parser, since our machine needs to be told what address in the program to jump to. We can figure this out at parsing time, but only after the whole loop is parsed and all of its code is installed. So, we’ll need to be able to install the entire while construct, including the conditional expression and loop body, before we execute any of it. This will require some structural changes to our grammar.

Implementing Relational and Boolean Operators

For the most part, implementing relational operations is a straightforward extension of our current arithmetic operations. There is a small additional challenge regarding their syntax: the operators themselves may be 1 or 2 characters, and their prefixes overlap. For example, we must be able to distinguish < from <=, regardless of whitespace and surrounding digits/letters, e.g. 2 <=5 vs 2< 5.

Step 1: Add Operator Tokens

The relational operations are left-associative, just like arithmetic. They are set to a lower priority than any arithmetic operation, since we want x < y + 2 to mean what we would expect. Similarly, we want most operations to be complete before evaluating the OR and AND operators.

  %type   <hoc_symbol>    assignable
  %right  '='       /* right-associative, much like C */
+ %left   OR
+ %left   AND
+ %left   EQ GT GTE LT LTE NEQ
  %left '+' '-'   /* left-associative, same precedence */
  %left '*' '/'   /* left-assoc; higher precedence */
  %left UNARY_MINUS NOT /* Prefix operators have highest priority*

hoc5/hoc.y

Step 2: Updating our Lexer

Next, at the end of the lexing code, we add logic to handle the case where the character we’ve read matches more than one operator.

  /* our simple, hand-rolled lexer. */
  int yylex(void) {
      static bool hit_eof = false;
      int c;

      // Skip space
      // Get numbers
      ...
      if (isalpha(c)) { // Symbols
      ...
+     switch (c) {
+       case '<':
+           return lookahead('=', LTE, LT);
+       case '>':
+           return lookahead('=', GTE, GT);
+       case '=':
+           return lookahead('=', EQ, '=');
+       case '!':
+           return lookahead('=', NEQ, NOT);
+       case '|':
+           return lookahead('|', OR, '|');
+       case '&':
+           return lookahead('&', AND, '&');
+       default:
+           return c;
+     }
+ }
+

hoc5/hoc.y

The lookahead function “peeks” at the next character in our input stream. If it matches a two-character operator, then we consume both characters and return the corresponding token. Otherwise, we put the the character back onto the front of the input stream for the next lexing pass, and return the otherwise token.

/**
 * "Looks ahead 1 character for a possibility. If it's found, returns
 * `match`. If not:
 * - places character back onto input stream for next read
 * - returns the `otherwise` result.
 */
static int lookahead(char expect, int match, int otherwise) {
    int c = getchar();
    if (c == expect) {
        return match;
    } else {
        ungetc(c, stdin);
        return otherwise;
    }
}

hoc5/hoc.y

Step 3: Grammar Changes

The relational operations fit nicely into our existing expr productions. We’ll need to add references to the new instructions into hoc.h as well.

  expr:         NUMBER
...
        |       expr '+' expr   { install_instruction(inst_add); }
        |       expr '-' expr   { install_instruction(inst_sub); }
        |       expr '*' expr   { install_instruction(inst_mul); }
        |       expr '/' expr   { install_instruction(inst_div); }
+       |       expr LT expr    { install_instruction(inst_lt); }
+       |       expr LTE expr   { install_instruction(inst_lte); }
+       |       expr GT expr    { install_instruction(inst_gt); }
+       |       expr GTE expr   { install_instruction(inst_gte); }
+       |       expr EQ expr    { install_instruction(inst_eq); }
+       |       expr NEQ expr   { install_instruction(inst_neq); }
+       |       expr AND expr   { install_instruction(inst_and); }
+       |       expr OR expr    { install_instruction(inst_or); }
...

hoc5/hoc.y

Step 4: Machine Instructions

Each instruction operates just like arithmetic; the operands are already on the stack, in reverse order. We pop both, perform our operation, and push the result.

The <= operator is below; the rest are similar.

int inst_lte(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value <= rhs.value));
  return 0;
}

hoc5/machine.c

You should now be able to test simple relational expressions, such as 1 <= 0 or 42 > 13 && 0 < 1.

Implementing Loops

There are several changes we’ll need to make to the language to incorporate flow-control constructs. We’ll add the functionality bit-by-bit, solving one problem at a time.

Step 1: Adding Keywords

Before we do anything else, we have one easy requirement: we’ll need to be able to recognize some keywords for our language now; hoc5 requires all of: while, if, else, and print.

Luckily, we already have a mechanism for matching multi-character tokens in our language; the symbol table. We will extend our approach to BUILTIN tokens to handle keywords as well.

First, we’ll need to setup tokens for each of the keywords:

 %token  <hoc_value>     NUMBER
 %token  <hoc_symbol>    VAR CONST BUILTIN UNDEF
+%token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW
 %type   <hoc_symbol>    assignable

hoc5/hoc.y

Then, just as we did for our builtin constants and functions, we create a small array of “input data” structs for use during install_hoc_builtins, and install them the same way.

+static struct {
+  const char *name;
+  int         token_type;
+} keywords[] = {
+    {"if", IF_KW},
+    {"else", ELSE_KW},
+    {"print", PRINT_KW},
+    {"while", WHILE_KW},
+    {NULL, 0},
+};

void install_hoc_builtins(void) {
  for (int i = 0; constants[i].name != NULL; i++) {
    install_symbol(constants[i].name, CONST, constants[i].value);
  }
+
+ for (int i = 0; keywords[i].name != NULL; i++) {
+   install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
+ }

  for (int i = 0; builtins[i].name != NULL; i++) {
      ...

hoc5/builtins.c

Step 2: Introducing Statements

The next problem we’ll tackle actually occurs in the “middle” of our loop constructs. Consider the factorial example from above:

x = 10
y = 1
while (x > 0) {
    y = y * x
    x = x - 1
}

Let’s focus on the body of this loop. We know that we will need to parse the entire body before we execute anything. Since our current grammar stops after every terminated expression, there’s no way for our current grammar to parse the body before executing it. We’ll need a new construct for these situations. We could try to add a new kind of expression, such as loop. Then we could match multiple “sub-expressions” inside our loop body, using the same kind of production that we use for our REPL list nonterminal. e.g.

expr:   ...
    |   while '(' expr ')' '{' loop_body '}'


loop_body: /* nothing */
    |   loop_body expr terminator
        loop_body terminator

One possibility: creating a loop expression

There are two questions to answer here:

  1. What do we do with the value left on the stack by the expr in loop_body?
  2. If this construct is an expression, what value does it leave on the stack?

We could certainly account for the first question, by using our action to remove each line’s expression from the stack. The second question raises more new questions than it answers; for example, if this construct is an expression, does that mean we can use it as the right-side of an assignment?

For the following example, ask yourself:

x = 10
y = 1
z = while (x > 0) {
    y = y * x
    x = x - 1
}

While this is perfectly acceptable in some languages, for the sake of adhering to C-like language constructs, (and consistency with UPE), we’ll decide that no, we will not allow this. Instead, we will distinguish expressions, which leave a value on the top of the machine stack, from statements, which do not.

Since we’ll want to combine expressions and statements together in our loop bodies, we’ll say that every expression may be used as a statement, leading us to the following productions:

stmt:           expr { install_instruction(inst_pop); }
        |       '{' stmtlist '}

stmtlist:       /* nothing */
        |       stmtlist stmt terminator
        |       stmtlist terminator

hoc5/hoc.y

Our stmt: expr prodcution allows us to use any expression in our loop bodies, and the stmtlist productions allow as many statements inside braces as we want, including empty lines.

We must also make sure that it’s acceptable for statements to appear as top-level entries in our list production. Unlike expressions, there is no value to print, so our action will merely be to STOP.

list:       /* nothing */
        |   list terminator
+       |   list stmt terminator
+           {
+             install_instruction(STOP_INST);
+             YYACCEPT;
+           }
        |   list assign terminator
            {
                install_instruction(inst_pop);
                install_instruction(STOP_INST);
                YYACCEPT;
            }

hoc5/hoc.y

Review Q: Why do we have to install STOP after a stmt? A: Otherwise, the machine’s execute() function might continue to execute instructions installed by older statements / expressions.

Sadly, this will add another shift/reduce conflict to our compilation output. (stmtlist and list both match terminator-separated statements).

If you’re following along, the following program should now parse correctly; notice that once you type the opening brace, the REPL will not evaluate or print anything until you close it and write an expression on a new line.

{
  x = 12
  y = 14
  z = x + y
}
z

Testing statements

Step 3: Exposing Addresses

Next, let’s add the infrastructure required for specifying and “jumping to” a specific location in our VM program.

In order for our language to include flow control, there must be the concept of an instruction’s address within the current program, and the ability for the machine to change its program counter to a given address.

To represent addresses, we add an Addr typedef, which simply points to a location in our prog array; that is, a MachineCode pointer. So that we can use the Addr type inside our MachineCode definition as well, we need to forward-declare the MachineCode struct, leading to a small rework of the declaration.

+ typedef struct MachineCode MachineCode;
+ typedef MachineCode       *Addr;

- typedef struct MachineCode {
+ struct MachineCode {
+   CodeType type;
+   union {
+     double  literal;
+     Inst    inst;
+     Symbol *symbol;
+     Addr    addr;
+   };  // anonymous union; no need to name it
- } MachineCode;
+ };
+

hoc5/hoc.h

In our diagram of a compiled while statement, we see several GO actions that seem like they would need addresses.

Highlighted actions require addresses

Thus, we’ll need to be able to embed an address as part of an instruction, so we need to allow our MachineCode objects to hold addresses.

 ...
 typedef int (*Inst)(void);

 typedef enum CodeType {
   CT_INST = 'I',
   CT_LITERAL = 'L',
   CT_SYM_REF = 'S',
+  CT_ADDR = 'A',
 } CodeType;

 struct MachineCode {
   CodeType type;
   union {
     double  literal;
     Inst    inst;
     Symbol *symbol;
+    Addr    addr;
   };
 };

hoc5/hoc.h

Accessing Addresses During Parsing

In order to embed addresses into our instructions, they will need to be available at parse time. Luckily, since hoc4 moved value computation from the parser to the machine, our expressions no longer produced any value for use by other actions. Now that larger statements will need to use the addresses of child statements, we can update our productions to ensure that each expression and statement will return the address at which its parsed code block starts.

Address-Aware Installation

First, we need to enable the machine to report its current “installation address” through its code-installation interface. The parsing routines can use these addresses to report where they’ve installed code.

 /**
  * Installs the given `MachineCode` object into the next available
  * location in the machine's current program. A runtime error will
  * occur if the program is full.
+ *
+ * Returns the address at which the code was installed.
  */
- void install_code(MachineCode mc);
+ Addr install_code(MachineCode mc);

+ /** The next available address in the machine's current program */
+ extern Addr progp;

hoc5/hoc.h

The machine implementation changes only slightly; we expose progp externally, and update install_code to return the installation address:

-static MachineCode *progp;  // Next free location for code installation
+Addr progp;  // Next free location for code installation
...

-void install_code(MachineCode mc) {
+Addr install_code(MachineCode mc) {
   if (progp >= prog + PROGSIZE) {
     exec_error("Max program size exceeded");
   }
-  *(progp++) = mc;
+  *progp = mc;
+  return progp++;  // return location of THIS instruction
 }

hoc5/machine.c

Handling Addresses in our Parser

Next, we can update our grammar to require every expression and statement to report their start address. This means any nonterminals used by expr, such as assign or call, will also return addresses.

 %union {
   double hoc_value;
   Symbol *hoc_symbol;
+  Addr hoc_address;
 }
 ...
+ %type   <hoc_address>   assign expr call end stmt stmtlist

hoc5/hoc.y

Most of the rules are straightforward; for example, NUMBER and VAR expressions just return the address of the first instruction they install.

  expr:       NUMBER
              {
-                 install_instruction(inst_pushlit);
                  $$ = install_instruction(inst_pushlit);
                  install_literal($1);
              }
      |       '@'
              {
-                 install_instruction(inst_pushlast);
+                 $$ = install_instruction(inst_pushlast);
              }
      |       assignable
              {
-                 install_instruction(inst_pushref);
+                 $$ = install_instruction(inst_pushref);
                  install_ref($1);
                  install_instruction(inst_eval);
              }

Returning addresses of numbers and variables hoc5/hoc.y

Operator Expressions

Surprisingly, our arithmetic, relational, and boolean operator actions require no changes at all. Why? The default value produced by an action, if we don’t specify one, is $1; that is, all actions act as though they begin with $$ = $1.

When we are handling the operator productions, we’ve already processed the operand expressions. Since all arithmetic/relational/boolean operators are left-associative, their left-hand expression is evaluated first, and it will produce the address corresponding to pushing its value onto the stack.

Example: Generating Addresses

In the example below, the expression 2 * PI will return the PROG address where we start generating that value. In this case, it’ll produce 10. Since this is exactly what we want, we can leave the default behavior.

Installing binary operators

Notice that this behavior is why our machine expects math-operation arguments in right-to-left order. Since the left-hand side is produced first, it’s lower on the stack!

Assignment Expressions

Like our other operators, by the time we are reducing an assignment, we’ve already installed instructions to generate the left and right sides. Unlike the other operators, assignment is right-associative, and so the first instruction executed to generate the assignment actually comes from generating the right-side operand.

assign:         assignable '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignvar);
+                   $$ = $3; /* instructions begin with RHS */
                }
        |       assignable ':' '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignconst);
+                   $$ = $4; /* instructions begin with RHS */
                }

hoc5/hoc.y

Similarly, the call actions must return the address corresponding to producing their first argument

Statement Addresses

We can now fill in the actions for our stmt productions. For a single-expression statement, we have two jobs:

  1. Remove its value from the machine stack, to satisfy the statement guarantee.
  2. Return the address where execution begins for this statement, which is the address where the expression begins.

For a statement list, we have a sequence of zero or more statements. Each statement removes its own stack item, so there’s no need to add any more instructions. But for an empty block, we have no instructions at all; in this case, we’ll produce the address following the block–the next free program address.

stmt:           expr { install_instruction(inst_pop); }
        |       '{' stmtlist '}' { $$ = $2; }

stmtlist:       /* nothing */            { $$ = progp; }
        |       stmtlist stmt terminator { $$ = $1; }
        |       stmtlist terminator      { $$ = $1; }

hoc5/hoc.y

Step 4: Looping Instructions: whilecode

Before we write the parsing logic for while-loop syntax, let’s figure out the structure of the machine instructions, so we have an idea of our “goal output” for the parser.

Our “high-level plan” for while loops was:

  1. Execute the condition, which in most languages may be any expression.
  2. Once the condition produces a result, we must examine the result, and choose one of two addresses: the address of the loop’s body, or the address immediately following it.
  3. Whichever address we choose, we should update the program counter so that we execute code starting at that address.
  4. If the condition was true, then we must go back to the condition’s address to re-evaluate it.

This is a complex set of behaviors, and in most real machines, is achieved by installing “jumping” instructions at appropriate places in the execution flow. We’ll examine that approach in adhoc, but for hoc5, we’ll create a “super-instruction” that handles the looping mechanism itself, directly in C: whilecode.

The Shape of a whilecode instruction

You can think of the whilecode instruction as being a “variable-length” instruction, which includes the instructions for the conditional expression and the loop body.

The whilecode instruction

The whilecode instruction itself is immediately followed by our two addresses: one for the loop body and the other immediately following the body.

After the addresses, we have our condition. Conditions are allowed to be any expression, so we must be able to run all of the instructions that make up the condition, and then stop as soon as we’ve produced the result. We already have a mechanism for doing that: the execute() method. Whwn we execute a whilecode instruction, we’ll treat the condition like a tiny program of its own; we’ll call execute() on its address! This call will simply run all of the instructions from the beginning of cond until it reaches a STOP instruction. (We should be sure to install this while parsing the cond). Likewise, we’ll need to install a STOP instruction after the loop body, so we can again evaluate the condition.

Executing whilecode

Assuming we will eventually install this structure into our machine, we are finally ready to write our execution flow for loops. Upon entering the instruction, our PC is already pointing to the machine code for the body’s address.

int inst_whilecode(void) {
  Addr body_addr = (pc++)->addr;
  Addr end_addr = (pc++)->addr;
  Addr cond_addr = pc;

  while (true) {
    execute(cond_addr);
    if (stack_pop().value) {  // guaranteed: conditions are expressions
      execute(body_addr);
    } else {
      break;
    }
  }

  pc = end_addr;
  return 0;
}

hoc5/machine.c

While we use recursion to execute our loop, we don’t recurse to leave the loop, since that could quickly lead to infinite recursion.

Step 5: Parsing & installing while statements

The syntax of a while statement is familiar:

while (conditional-expression) {
  body
}
(or)
while (conditional-expression) body

We’ll need to parse that syntax into the whilecode instruction we’ve designed.

In particular, we’ll need to install:

This is complicated somewhat by the fact that we won’t know the addresses of the body / post-body instructions until after the body is parsed. To ensure our parse behaviors happen in the appropriate order, we create several “sub-rules” to perform actions at the appropriate times in the parsing process.

  1. Immediately upon recognizing the while keyword, we should reserve space for the whilecode instruction and its addresses. This implies we’ll need a rule just for that keyword
  2. We’ll need rules “in-between” recognizing the condition expression and the body statement, to install a STOP instruction

Below you can see the components of our while loop, and how they correspond to the installed MachineCode.

Populating instructions and addresses for whilecode

Reserving Space: the while Keyword

First off, as soon as we see the while keyword, we should reserve space inside prog for the whilecode instruction and its addresses. This reduction occurs before any other part of the loop is parsed, so these 3 words of program memory come first.


+while: WHILE_KW
+       {
+           $$ = install_instruction(STOP_INST);
+           install_instruction(STOP_INST); // body-addr
+           install_instruction(STOP_INST); // after-body-addr
+       }

hoc5/hoc.y

Don’t forget to mark the whilecode address as the produced value! Our while loops will be statements, so they must produce their starting address.

- %type   <hoc_address>   assign call expr stmt stmtlist
+ %type   <hoc_address>   assign call expr stmt stmtlist while

hoc5/hoc.y

Checking Conditions: the cond Nonterminal

The “conditional” part of our loop is just an expression; the only detail we must take care of is installing a STOP instruction before the loop-body instructions are generated. (Since the condition’s instructions always start two words after the whilecode instruction, there’s no need to report it, so we don’t return anything from a cond nonterminal.)

%%
+cond: '(' expr ')'
+      {
+          install_instruction(STOP_INST);
+      }

hoc5/hoc.y

Installing STOPs: The end rule

A loop body is just a stmt, but like our condition, we must STOP when it’s complete. We need another STOP instruction after the stmt. We use an end nonterminal to do so, as well as to generate the address following the body.

- %type   <hoc_address>   assign call expr stmt stmtlist while
+ %type   <hoc_address>   assign call end expr stmt stmtlist while

hoc5/hoc.y

end is a tiny “helper” used to generate the next available address; this is useful for cases where we might not have a next statement.

end:    /* nothing */
        {
            install_instruction(STOP_INST);
            $$ = progp;
        }

hoc5/hoc.y

Putting it all together: the while statement

We can finally combine all of these nonterminals to create our while statement structure. At the point we’re reducing this rule, all addresses have been completely resolved, and so it’s time to install them into the prog slots reserved for us by the while nonterminal.

 stmt:           expr { install_instruction(inst_pop); }
         |       '{' stmtlist '}' { $$ = $2; }
+        |       while cond stmt end
+                {
+                    $1[0].type = CT_INST;
+                    $1[0].inst = inst_whilecode;
+                    $1[1].type = CT_ADDR;
+                    $1[1].addr = $3;
+                    $1[2].type = CT_ADDR;
+                    $1[2].addr = $4;
+                }

hoc5/hoc.y

Implementing Conditional Logic

Because the if/else statements are so similar to loops, we don’t need to exhaustively cover their implementation. The installed structure is very similar:

The ifcode instruction

Parsing if/else statements

First, the if statement produces an address:

- %type   <hoc_address>   assign call end expr stmt stmtlist while
+ %type   <hoc_address>   assign call end expr if stmt stmtlist while

hoc5/hoc.y

It also reserves space for its addresses, just like while:

if:     IF_KW
        {
            $$ = install_instruction(STOP_INST);
            install_instruction(STOP_INST);
            install_instruction(STOP_INST);
            install_instruction(STOP_INST);
        }

hoc5/hoc.y

Because our else clause is optional, we include two forms in our grammar; they install the same set of addresses , but the if-only variant leaves its else-body address set to the STOP instruction.

stmt:           expr { install_instruction(inst_pop); }
       ...
        |       if cond stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $4;
                }
        |       if cond stmt end ELSE_KW stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[2].type = CT_ADDR;
                    $1[2].addr = $6;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $7;
                }

hoc5/hoc.y

Syntax Sidenote: Braces & Newlines (or skip)

Notice that our syntax has relatively strict requirements for the locations of the braces:

// ALLOWED
x = 42
while (x > 0) x = x - 1

// OR
while (x > 0) {
  x = x - 1
  print x
}

// NOT ALLOWED
while (x > 0)
{
  x = x - 1
  print x
}

These unfortunate requirements are due to the fact that we treat the newline terminator as an “execution point” in our grammar (in the list rule), so that we can have an interactive REPL. This could be resolved by treating newlines as whitespace in a non-REPL version of hoc.

Executing if/else statements

Our machine implementation is also very similar to whilecode, though we do not update our pc as we go; rather, we choose to be specific about which word we are accessing.

int inst_ifcode(void) {
  Addr ifbody_addr = pc[0].addr;
  Addr maybe_elsebody_addr = NULL;
  if (!is_stop_inst(pc + 1)) {
    maybe_elsebody_addr = pc[1].addr;
  }

  Addr end_addr = pc[2].addr;
  Addr cond_addr = pc + 3;
  execute(cond_addr);
  Datum d = stack_pop();  // must be value
  if (d.value) {
    execute(ifbody_addr);
  } else if (maybe_elsebody_addr != NULL) {
    execute(maybe_elsebody_addr);
  }

  pc = end_addr;
  return 0;
}

hoc5/machine.c

Implementing a print Statement

Now that we have statements, there will be times when we want to display a value inside a loop or conditional body. Since statements do not produce values, the REPL cannot output them. We will need to display them ourselves.

We add a print statement via a print keyword and associated prexpr instruction.

 stmt:           expr { install_instruction(inst_pop); }
        ...
+        |       PRINT_KW expr
+                {
+                    install_instruction(inst_prexpr);
+                    $$ = $2;
+                }

hoc5/hoc.y

And the machine implementation is very simple; it’s print without a tab.

int inst_prexpr(void) {
  Datum d = stack_pop();
  printf("%.8g\n", d.value);
  return 0;
}

hoc5/machine.c

What’s Next?

hoc5 finishes the core syntax of logic in hoc; we’ve finished the core vocabulary of the language. What remains to build are tools for abstraction; we need a way to reuse blocks of code, rather than writing a program as one huge top-to-bottom script!

hoc6, the last version of hoc presented in The UNIX Programming Environment, solves that issue. It includes the largest and most interesting new feature of any so far: user-defined subroutines! So grab a coffee (or take a break), and then we’ll continue onward.

Source Code

Makefile

CFLAGS ?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
YFLAGS ?= -d
LDFLAGS=-lm
objects := builtins.o math.o symbol.o machine.o hoc.o

.PHONY:all
all: hoc

hoc: $(objects)
    cc $(CFLAGS) $(objects) $(LDFLAGS) -o hoc

hoc.c hoc.tab.h: hoc.y
    yacc $(YFLAGS) hoc.y -o hoc.tab.c
    mv hoc.tab.c hoc.c

%.o: hoc.h hoc.tab.h %.c
    cc $(CFLAGS) -c -o $@ $*.c

.PHONY:clean
clean:
    rm -f hoc.tab.h
    rm -f $(objects)
    rm -f hoc.c
    rm -f hoc
builtins.c

#include "hoc.h"
#include "hoc.tab.h"
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>  // rand

/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
    Cos(double), Exp(double), Integer(double), Lg(double), Ln(double),
    Log10(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);

static struct {
  const char *name;
  double      value;
} constants[] = {
    {"PI", 3.14159265358979323846},
    {"E", 2.71828182845904523536},
    {NULL, 0},
};

static struct {
  const char *name;
  int         args;
  union {
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
} builtins[] = {
    {"abs", 1, .call1 = Abs},
    {"acos", 1, .call1 = Acos},
    {"atan", 1, .call1 = Atan},
    {"atan2", 2, .call2 = Atan2},
    {"cos", 1, .call1 = Cos},
    {"exp", 1, .call1 = Exp},
    {"int", 1, .call1 = Integer},
    {"lg", 1, .call1 = Lg},
    {"ln", 1, .call1 = Ln},
    {"log10", 1, .call1 = Log10},
    {"pow", 2, .call2 = Pow},
    {"rand", 0, .call0 = Random},
    {"sin", 1, .call1 = Sin},
    {"sqrt", 1, .call1 = Sqrt},
    {NULL, 0, .call0 = NULL},
};

static struct {
  const char *name;
  int         token_type;
} keywords[] = {
    {"if", IF_KW},
    {"else", ELSE_KW},
    {"print", PRINT_KW},
    {"while", WHILE_KW},
    {NULL, 0},
};

extern void command_list(void);
static struct {
  const char *name;
  void (*command_func)(void);
} commands[] = {
    {"LIST", command_list},
    {NULL, 0},
};

void install_hoc_builtins(void) {
  for (int i = 0; constants[i].name != NULL; i++) {
    install_symbol(constants[i].name, CONST, constants[i].value);
  }
  for (int i = 0; builtins[i].name != NULL; i++) {
    Symbol *s = install_symbol(builtins[i].name, BUILTIN, 0.0);
    // TODO: Why not have separate install_value vs install_function?
    s->data.func.args = builtins[i].args;
    switch (builtins[i].args) {
      case 0:
        s->data.func.call0 = builtins[i].call0;
        break;
      case 1:
        s->data.func.call1 = builtins[i].call1;
        break;
      case 2:
        s->data.func.call2 = builtins[i].call2;
        break;
      default:
        fprintf(
            stderr, "Invalid arg count for symbol '%s'\n", builtins[i].name);
        exit(1);
    }
  }

  for (int i = 0; keywords[i].name != NULL; i++) {
    install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
  }

  for (int i = 0; commands[i].name != NULL; i++) {
    Symbol *s = install_symbol(commands[i].name, COMMAND, 0.0);
    s->data.command_func = commands[i].command_func;
  }
}
hoc.h

#ifndef HOC_INC
#define HOC_INC 1

/*
 * Global types & declarations for use in hoc
 */
#include <stddef.h>

///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------

#define SYMBOL_NAME_LIMIT 100
typedef struct Symbol Symbol;

/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
  int args;
  union {
    /* We use an anonymous union for the function pointer; there's
       no meaningful name for the union as a field */
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
};

struct Symbol {
  short   type;  // Generated from our token types
  char   *name;
  Symbol *next;
  union {
    double             val;
    struct BuiltinFunc func;
    void (*command_func)(void);
  } data;
};

Symbol *install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);

///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------

typedef int (*Inst)(void);

typedef enum CodeType {
  CT_INST = 'I',
  CT_LITERAL = 'L',
  CT_SYM_REF = 'S',
  CT_ADDR = 'A'
} CodeType;

typedef struct MachineCode MachineCode;
typedef MachineCode       *Addr;

struct MachineCode {
  CodeType type;
  union {
    double  literal;
    Inst    inst;
    Symbol *symbol;
    Addr    addr;
  };  // anonymous union; no need to name it
};

///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------

/**
 * Installs the given `MachineCode` object into the next available
 * location in the machine's current program. A runtime error will
 * occur if the program is full.
 *
 * Returns the address at which the code was installed.
 */
Addr install_code(MachineCode mc);

/** The next available address in the machine's current program */
extern Addr progp;

#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)

#define install_literal(lit) \
  install_code((struct MachineCode){.type = CT_LITERAL, .literal = lit})

#define install_ref(sp) \
  install_code((struct MachineCode){.type = CT_SYM_REF, .symbol = sp})

#define install_instruction(instr) \
  install_code((struct MachineCode){.type = CT_INST, .inst = instr})

#define install_address(address) \
  install_code((struct MachineCode){.type = CT_ADDR, .addr = address})

void execute(MachineCode *addr);

///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------

int inst_add(void);
int inst_and(void);
int inst_assignvar(void);
int inst_assignconst(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_eq(void);
int inst_gt(void);
int inst_gte(void);
int inst_ifcode(void);
int inst_lt(void);
int inst_lte(void);
int inst_mul(void);
int inst_neq(void);
int inst_not(void);
int inst_or(void);
int inst_pop(void);
int inst_prexpr(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
int inst_whilecode(void);

///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------

void command_list(void);
void machine_run_command(Symbol *);

///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------

typedef union Datum {
  double  value;
  Symbol *symbol;
} Datum;

void machine_startup(void);
void machine_reset_program(void);

///----------------------------------------------------------------
/// Other
///----------------------------------------------------------------

/** global error handling */
__attribute__((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);

/** builtins */
void install_hoc_builtins(void);

#endif  // HOC_INC
hoc.y

%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------

/*
 * "higher-order calculator" - Version 5
 * Adapted from "The UNIX Programming Environment"
 */

#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include "hoc.h"

///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------

jmp_buf begin;

///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------

#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);

/*
 * Grammar points:
 *
 * - EXPRESSIONS provide a value on the top of the satck.
 * - STATEMENTS are not expressions, and do not provide a value.
 */

%}

%union {
  double hoc_value;
  Symbol *hoc_symbol;
    Addr hoc_address;
}

%token  <hoc_value>     NUMBER
%token  <hoc_symbol>    VAR CONST BUILTIN UNDEF
%token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW COMMAND
%type   <hoc_symbol>    assignable
%type   <hoc_address>   assign call end expr if stmt stmtlist while
%right  '='       /* right-associative, much like C */
%left   OR
%left   AND
%left   EQ GT GTE LT LTE NEQ
%left   '+' '-'   /* left-associative, same precedence */
%left   '*' '/'   /* left-assoc; higher precedence */
%right  '^'       /* exponents */
%left   UNARY_MINUS NOT /* Even higher precedence
                        than mul or div. Can't use '-', as we've used
                        it already for subtraction. */
%%
list:           /*  nothing */
        |       list terminator
        |       list stmt terminator
                {
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list assign terminator
                {
                    install_instruction(inst_pop);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list expr terminator
                {
                    install_instruction(inst_print);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list error terminator { yyerrok; }
        ;
stmt:           expr { install_instruction(inst_pop); }
        |       '{' stmtlist '}' { $$ = $2; }
        |       while cond stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_whilecode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[2].type = CT_ADDR;
                    $1[2].addr = $4;
                }
        |       if cond stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $4;
                }
        |       if cond stmt end ELSE_KW stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[2].type = CT_ADDR;
                    $1[2].addr = $6;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $7;
                }
        |       PRINT_KW expr
                {
                    install_instruction(inst_prexpr);
                    $$ = $2;
                }
        |       COMMAND
                {
                    machine_run_command($1);
                    $$ = progp;
                }
        ;
stmtlist:       /* nothing */ { $$ = progp; }
        |       stmtlist stmt terminator { $$ = $1; }
        |       stmtlist terminator { $$ = $1; }
        ;
while:          WHILE_KW
                {
                    $$ = install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                }
        ;
if:             IF_KW
                {
                    $$ = install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                }
        ;
cond:           '(' expr ')'
                {
                    install_instruction(STOP_INST);
                }
        ;
end:            /* nothing */
                {
                    install_instruction(STOP_INST);
                    $$ = progp;
                }
        ;
/*
 * Every expr construct, when *executed*, must push a value onto the
 * stack
 */
expr:           NUMBER
                {
                    $$ = install_instruction(inst_pushlit);
                    install_literal($1);
                }
        |       '@'
                {
                    $$ = install_instruction(inst_pushlast);
                }
        |       assignable
                {
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_eval);
                }
        |       assign
        |       call
        |       expr '+' expr   { install_instruction(inst_add); }
        |       expr '-' expr   { install_instruction(inst_sub); }
        |       expr '*' expr   { install_instruction(inst_mul); }
        |       expr '/' expr   { install_instruction(inst_div); }
        |       expr LT expr    { install_instruction(inst_lt); }
        |       expr LTE expr   { install_instruction(inst_lte); }
        |       expr GT expr    { install_instruction(inst_gt); }
        |       expr GTE expr   { install_instruction(inst_gte); }
        |       expr EQ expr    { install_instruction(inst_eq); }
        |       expr NEQ expr   { install_instruction(inst_neq); }
        |       expr AND expr   { install_instruction(inst_and); }
        |       expr OR expr    { install_instruction(inst_or); }
        |       expr '^' expr
                {
                    install_instruction(inst_pushref);
                    install_ref(lookup("pow"));
                    install_instruction(inst_call);
                }
        |       '-'expr         %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
                {
                    install_instruction(inst_pushlit);
                    install_literal(-1);
                    install_instruction(inst_mul);
                    $$=$2;
                }
        |       NOT expr %prec UNARY_MINUS
                {
                    install_instruction(inst_not);
                    $$=$2;
                }
        |       '(' expr ')' { $$ = $2; }
        ;
assign:         assignable '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignvar);
                    $$ = $3; /* instructions begin with RHS */
                }
        |       assignable ':' '=' expr
                {
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignconst);
                    $$ = $4; /* instructions begin with RHS */
                }
        ;
call:           BUILTIN '(' ')'
                {
                    check_call_args($1, 0);
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                }
        |       BUILTIN '(' expr ')'
                {
                    check_call_args($1, 1);
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                    $$ = $3; /* Block starts with pushing args */
                }
        |       BUILTIN '(' expr ',' expr ')'
                {
                    check_call_args($1, 2);
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                    $$ = $3; /* Block starts with pushing args */
                }
        ;
assignable: VAR | CONST
        ;
terminator: '\n' | ';'
        ;
%% // end of grammar


/* error tracking */
char *program_name;
int line_number = 1;

int main(int argc, char *argv[]) {
    (void)argc;

    program_name = argv[0];
    install_hoc_builtins();
    machine_startup();

    setjmp(begin);
    machine_reset_program();
    while (yyparse() == YACC_SUCCESS && !feof(stdin)) {
        execute(NULL);
        machine_reset_program();
    }
    return 0;
}

/**
 * "Looks ahead 1 character for a possibility. If it's found, returns
 * `match`. If not:
 * - places character back onto input stream for next read
 * - returns the `otherwise` result.
 */
static int lookahead(char expect, int match, int otherwise) {
    int c = getchar();
    if (c == expect) {
        return match;
    } else {
        ungetc(c, stdin);
        return otherwise;
    }
}

/* our simple, hand-rolled lexer. */
int yylex(void) {
    static bool hit_eof = false;
    int c;

    // Skip space
    do {
        c = getchar();
    } while (c == ' ' || c == '\t');


    /* Newline / EOF handling:
     * The first time we see an EOF, we "fake" a newline,
     * so that the list production can recognize any in-progress
     * statement. The next call will return the EOF.
     */
    if (c == EOF && !hit_eof) {
        ungetc(c, stdin);
        hit_eof = true;
        c = '\n';
    }
    if (c == '\n') {
        line_number++;
    } else {}

    // Get numbers
    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval.hoc_value);
        return NUMBER;
    }

    if (isalpha(c)) {
        Symbol *s;

        char buf[SYMBOL_NAME_LIMIT + 1];
        size_t nread = 0;
        do {
            buf[nread++] = c;
            c = getchar();
        } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));

        // Just in case we exceeded the limit
        while ((isalpha(c) || isdigit(c))) {
            c = getchar();
        }

        // at this point, we have a non-alphanumeric 'c'
        ungetc(c, stdin);

        buf[nread] = '\0';
        if ((s=lookup(buf)) == NULL) {
            s = install_symbol(buf, UNDEF, 0.0);
        }
        yylval.hoc_symbol = s;
        return (s->type == UNDEF ? VAR : s->type);
    }

    switch (c) {
      case '<':
          return lookahead('=', LTE, LT);
      case '>':
          return lookahead('=', GTE, GT);
      case '=':
          return lookahead('=', EQ, '=');
      case '!':
          return lookahead('=', NEQ, NOT);
      case '|':
          return lookahead('|', OR, '|');
      case '&':
          return lookahead('&', AND, '&');
      default:
          return c;
    }
}

void yyerror(const char *s) {
  warning("%s", s);
}

/*
 * Verify Symbol's declared arg count matches actual,
 * or display a helpful error message with mismatch
 */
void check_call_args(const Symbol *s, int actual) {
    int expected = s->data.func.args;
    if (expected != actual) {
        exec_error("Wrong number of arguments for %s: expected %d, got %d",
        s->name, expected, actual);
    }
}

#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)

void warning(const char *msg, ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);
}

void exec_error(const char *msg , ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);

  longjmp(begin, 0);
}
machine.c

#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>  // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * MACHINE STATE
 * =============
 *
 * We have our "program" stored as a list of MachineCode
 * objects, each encoding one of:
 *   - an instruction
 *   - a literal value
 *   - a symbol reference
 *
 * We "execute" our program by setting the program counter (pc)
 * pointer to the first MachineCode (which *must* be an instruction),
 * and handling each instruction we find. Within an instruction, the
 * program counter might be updated so that by the *end* of any
 * instruction, the program counter is always pointing to another
 * instruction.
 *
 *
 * The Stack
 * ---------
 *
 * Each instruction manipulates the stack, pushing,
 * popping, or combining values from the stack.  The stack elements
 * are Datum objects, which act much like MachineCodes, but with two
 * differences.
 *
 * 1. Datums cannot represent an instruction; they resolve to a value
 * 2. The type of a datum can always be resolved from their context,
 *    and so no "type" field is necessary.
 *
 * ┌────────────────────────────────────────────────────┐
 * │   == PROG ==                 == STACK ==           │
 * │ ┌─────────────┐◀───┐       ┌─────────────┐         │
 * │ │ MachineCode │    pc      │    Datum    │         │
 * │ ├─────────────┤            ├─────────────┤         │
 * │ │ MachineCode │            │    Datum    │         │
 * │ ├─────────────┤            ├─────────────┤         │
 * │ │ MachineCode │   progp    │    Datum    │ stackp  │
 * │ ├─────────────┤◀────┘      ├─────────────┤ ◀──┘    │
 * │ │     ...     │            │     ...     │         │
 * │ └─────────────┘            └─────────────┘         │
 * └────────────────────────────────────────────────────┘
 *
 */

#define PROGSIZE 2000
static MachineCode prog[PROGSIZE];
Addr               progp;  // Next free location for code installation
static Addr        pc;     // the current location in the executing program
static Addr        highest_addr_written = prog;  // Used in LISTing contents

// stack
#define STACKSIZE 256
static Datum  stack[STACKSIZE];            // the machine stack
static Datum *stackp;                      // next free spot on the stack
static Datum  last_popped = {.value = 0};  // The last value popped
static void   stack_push(Datum d);
static Datum  stack_pop(void);

/** Initialize machine state for startup */
void machine_startup(void) {
  srand(time(NULL));  // use current time as seed for random generator
  machine_reset_program();
}

/** Initialize execution state for a new program */
void machine_reset_program(void) {
  stackp = stack;
  progp = prog;
}

///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------

Addr install_code(MachineCode mc) {
  if (progp >= prog + PROGSIZE) {
    exec_error("Max program size exceeded");
  }

  if (progp > highest_addr_written) {
    highest_addr_written = progp;
  }

  *progp = mc;
  return progp++;  // return location of THIS instruction
}

///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------

void execute(MachineCode *addr) {
  pc = (addr == NULL) ? prog : addr;

  while (!is_stop_inst(pc)) {
    if (pc >= progp) {
      exec_error("PC was past end-of-program");
    }
    if (pc->type != CT_INST) {
      exec_error("Programming error, type '%c' was not INST!", pc->type);
    }

    (*(*pc++).inst)();  // increment PC and execute inst
  }
}

/** Push new value onto stack */
static void stack_push(Datum d) {
  if (stackp >= stack + STACKSIZE) {
    exec_error("Stack overflow");
  }
  *(stackp++) = d;  // copy
}

static Datum stack_pop(void) {
  if (stackp <= stack) {
    exec_error("Stack underflow");
  }
  Datum d = *(--stackp);
  return (last_popped = d);
}

///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------

#define dat(x) ((Datum){.value = x})

int inst_pushlit(void) {
  assert(pc->type == CT_LITERAL);

  Datum d;
  d.value = pc->literal;
  stack_push(d);
  pc++;
  return 0;
}

int inst_pushref(void) {
  assert(pc->type == CT_SYM_REF);

  Datum d;
  d.symbol = pc->symbol;
  stack_push(d);
  pc++;
  return 0;
}

int inst_pushlast(void) {
  stack_push(last_popped);
  return 0;
}

int inst_eval(void) {
  Datum d = stack_pop();  // must be symbol
  switch (d.symbol->type) {
    case UNDEF:
      exec_error("Undefined variable: '%s'", d.symbol->name);
      break;
    case BUILTIN:
      exec_error("Builtin '%s' cannot be evaluated", d.symbol->name);
      break;
    default:  // VAR,CONST
      stack_push(dat(d.symbol->data.val));
      break;
  }
  return 0;
}

int inst_whilecode(void) {
  /*
    ┌─────────────────────────────┐
    │          whilecode          │
    ├─────────────────────────────┤
    │        address(body)        │
    ├─────────────────────────────┤
    │  address(stmt after body)   │
    ├─────────────────────────────┤
    │            cond             │
    │             ..              │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │                             │
    │            body             │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │       stmt after body       │
    └─────────────────────────────┘
  */
  Addr body_addr = (pc++)->addr;
  Addr end_addr = (pc++)->addr;
  Addr cond_addr = pc;

  while (true) {
    execute(cond_addr);
    if (stack_pop().value) {  // guaranteed: conditions are expressions
      execute(body_addr);
    } else {
      break;
    }
  }

  pc = end_addr;
  return 0;
}

int inst_ifcode(void) {
  /*
    ┌─────────────────────────────┐
    │           ifcode            │
    ├─────────────────────────────┤
    │       address(ifbody)       │
    ├─────────────────────────────┤
    │      address(elsebody)      │
    ├─────────────────────────────┤
    │   address(stmt after if)    │
    ├─────────────────────────────┤
    │            cond             │
    │             ..              │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │                             │
    │           ifbody            │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │                             │
    │          elsebody           │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │        stmt after if        │
    └─────────────────────────────┘
  */
  Addr        ifbody_addr = pc[0].addr;
  MachineCode maybe_elsebody = pc[1];
  Addr        end_addr = pc[2].addr;
  Addr        cond_addr = pc + 3;

  execute(cond_addr);
  Datum d = stack_pop();  // must be value
  if (d.value) {
    execute(ifbody_addr);
  } else if (maybe_elsebody.type == CT_ADDR) {
    execute(maybe_elsebody.addr);
  }

  pc = end_addr;
  return 0;
}

int inst_call(void) {
  Datum s = stack_pop();  // must be symbol
  if (s.symbol->type != BUILTIN) {
    exec_error("Cannot call non-function '%s'", s.symbol->name);
  }

  double result;
  switch (s.symbol->data.func.args) {
    case 0:
      result = s.symbol->data.func.call0();
      break;
    case 1: {
      Datum arg = stack_pop();
      result = s.symbol->data.func.call1(arg.value);
      break;
    }
    case 2: {
      Datum arg2 = stack_pop();
      Datum arg1 = stack_pop();
      result = s.symbol->data.func.call2(arg1.value, arg2.value);
      break;
    }
  }

  stack_push(dat(result));
  return 0;
}

int inst_assignvar(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = VAR;
  stack_push(dat(rhs.value));
  return 0;
}

int inst_assignconst(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (lhs.symbol->type == CONST) {
    exec_error("Cannot reassign constant '%s'", lhs.symbol->name);
  } else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = CONST;
  stack_push(dat(rhs.value));
  return 0;
}

int inst_pop(void) {
  stack_pop();
  return 0;
}

int inst_print(void) {
  Datum d = stack_pop();
  printf("\t%.8g\n", d.value);
  return 0;
}

int inst_prexpr(void) {
  Datum d = stack_pop();
  printf("%.8g\n", d.value);
  return 0;
}

int inst_add(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value + rhs.value));
  return 0;
}

int inst_and(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value
  stack_push(dat(lhs.value && rhs.value));
  return 0;
}

int inst_or(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value
  stack_push(dat(lhs.value || rhs.value));
  return 0;
}

int inst_not(void) {
  Datum d = stack_pop();  // must be value
  d.value = !d.value;
  stack_push(d);
  return 0;
}

int inst_lt(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value < rhs.value));
  return 0;
}

int inst_lte(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value <= rhs.value));
  return 0;
}

int inst_gt(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value > rhs.value));
  return 0;
}

int inst_gte(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value >= rhs.value));
  return 0;
}

int inst_eq(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value == rhs.value));
  return 0;
}

int inst_neq(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value != rhs.value));
  return 0;
}

int inst_sub(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value - rhs.value));
  return 0;
}

int inst_div(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  if (rhs.value == 0) {
    exec_error("Division by zero");
  } else {
    stack_push(dat(lhs.value / rhs.value));
  }
  return 0;
}

int inst_mul(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value
  stack_push(dat(lhs.value * rhs.value));
  return 0;
}

///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------

void machine_run_command(Symbol *command_symbol) {
  command_symbol->data.command_func();
}

#define check_inst(inst_name)                                            \
  else if (c.inst == inst_##inst_name) do {                              \
    printf(inst_template, line, (void *)(prog + i), "INST", #inst_name); \
  }                                                                      \
  while (false)

void command_list(void) {
// Lists contents of prog until we hit two stop instructions in a row
#define header_template "%6s | %11s | %10s | %-30s\n"
#define inst_template "%6ld | %11p | %10s | %-30s"
#define rightline_template "%6ld | %11p | %10s | %30s"
#define lit_template "%6ld | %11p | %10s | %30f"
#define addr_template "%6ld | %11p | %10s | %11p         (Line %3ld)"

  printf(header_template, "LINE", "ADDRESS", "TYPE", "CODE");
  printf("------------------------------------------------------------------"
         "\n");
  for (size_t i = 0; prog + i <= highest_addr_written; i++) {
    size_t      line = i + 1;
    MachineCode c = prog[i];
    void       *cp = &prog[i];
    switch (c.type) {
      case (CT_LITERAL):
        printf(lit_template, line, cp, "L", c.literal);
        break;
      case (CT_SYM_REF):
        printf(rightline_template, line, cp, "SYM", c.symbol->name);
        break;
      case (CT_ADDR): {
        size_t addr_line = (c.addr - prog) + 1;
        printf(addr_template, line, cp, "ADDR", (void *)c.addr, addr_line);
        break;
      }
      case (CT_INST):
        if (is_stop_inst(&c)) {
          printf(inst_template,
                 line,
                 cp,
                 "INST",
                 "------------ STOP ------------");
        }
        check_inst(add);
        check_inst(and);
        check_inst(assignvar);
        check_inst(assignconst);
        check_inst(call);
        check_inst(div);
        check_inst(eq);
        check_inst(eval);
        check_inst(gt);
        check_inst(gte);
        check_inst(ifcode);
        check_inst(lt);
        check_inst(lte);
        check_inst(mul);
        check_inst(neq);
        check_inst(not );
        check_inst(or);
        check_inst(pop);
        check_inst(prexpr);
        check_inst(print);
        check_inst(pushlast);
        check_inst(pushlit);
        check_inst(pushref);
        check_inst(sub);
        check_inst(whilecode);
        else {
          printf(inst_template, line, cp, "INST", "UNKNOWN!?");
        }
        break;
      default:
        printf("Unknown Code Type: %c\n", c.type);
    }
    printf("\n");
  }
}
math.c

#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>  // rand

#define HOC_RAND_MAX 1000000L

static double check_math_err(double result, const char *op);

double Abs(double x) {
  return check_math_err(fabs(x), "fabs");
}

double Acos(double x) {
  return check_math_err(acos(x), "acos");
}

double Atan(double x) {
  return check_math_err(atan(x), "atan");
}

double Atan2(double x, double y) {
  return check_math_err(atan2(x, y), "atan2");
}

double Cos(double x) {
  return check_math_err(cos(x), "cos");
}

double Exp(double x) {
  return check_math_err(exp(x), "exp");
}

double Integer(double x) {
  return (double)(long)x;
}

double Lg(double x) {
  return check_math_err(log2(x), "lg");
}

double Ln(double x) {
  return check_math_err(log(x), "log");
}

double Log10(double x) {
  return check_math_err(log10(x), "log10");
}

double Pow(double x, double y) {
  return check_math_err(pow(x, y), "exponentiation");
}

double Random(void) {
  // We have HOC_RAND_MAX buckets
  const int bucket_size = RAND_MAX / HOC_RAND_MAX;
  int       result;
  do {
    result = rand() / bucket_size;
  } while (result >= HOC_RAND_MAX);

  return result + 1;
}

double Sin(double x) {
  return check_math_err(sin(x), "sin");
}

double Sqrt(double x) {
  return check_math_err(sqrt(x), "sqrt");
}

static double check_math_err(double result, const char *op) {
  static const char *domain_msg = "argument outside domain";
  static const char *range_msg = "result outside range";
  static const char *other_msg = "floating-point exception occurred";

  const char *msg = NULL;
  if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
    // Special case: Inexact results are not errors.
    if (fetestexcept(FE_INEXACT)) {
      goto done;
    }

    if (fetestexcept(FE_INVALID)) {
      msg = domain_msg;
    } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
      msg = range_msg;
    } else {  // unknown
      msg = other_msg;
    }

    feclearexcept(FE_ALL_EXCEPT);
  } else if (errno) {
    if (errno == EDOM) {
      msg = domain_msg;
    } else if (errno == ERANGE) {
      msg = range_msg;
    } else {
      msg = other_msg;
    }
    errno = 0;
  }

  if (msg != NULL) {
    exec_error("math error during %s: %s", op, msg);
  }

done:
  return result;
}
symbol.c

#include "hoc.h"
#include "hoc.tab.h"  // generated from yacc -d on our grammar
#include <stddef.h>   // NULL
#include <stdlib.h>   // malloc
#include <string.h>   // strcmp

static Symbol *symbol_table = NULL;

void *emalloc(size_t nbytes);

Symbol *lookup(const char *name) {
  Symbol *current = symbol_table;
  while (current != NULL) {
    if (strcmp(current->name, name) == 0) {
      return current;
    }
    current = current->next;
  }

  return NULL;
}

Symbol *install_symbol(const char *name, short type, double value) {
  size_t  name_len = strlen(name);
  Symbol *s = emalloc(sizeof(Symbol)                       // actual symbol
                      + (sizeof(char)) * (name_len + 1));  // room for name

  s->name = (char *)(s + 1);
  strcpy(s->name, name);
  s->name[name_len] = '\0';

  s->type = type;
  s->data.val = value;
  s->next = symbol_table;
  symbol_table = s;
  return s;
}

void *emalloc(size_t nbytes) {
  void *ptr = malloc(nbytes);
  if (ptr == NULL) {
    exec_error("Out of memory!");
  }
  return ptr;
}