oddly accurate

hoc5

hoc6: Subroutines & Strings

The last version of hoc defined in UPE, hoc6 provides some major features that complete its transition from calculator to minimally-useful scripting language.

User-defined subroutines

Users can now define their own subroutines to create reusable behavior. The implementation changes for this feature are probably—with the exception of our transition from hoc3 to hoc4—the largest overall.

Improved Print Statement & String Literals

The print statement is much improved in hoc6. We can print multiple comma-separated expressions, and even more importantly, string literals!

User Input Support

hoc6 enables hoc programs to read input data from files and/or the console. Together with the enhanced print statement, hoc programs can now prompt the user for input, read pre-calculated data from files, and engage interactively with users.

Feature: User-Defined Subroutines

Subroutines are divided into two categories: functions, which are required to return a value, and procedures, which are required not to. You must decide what kind of subroutine you’re writing up-front, because they use different keywords:

The func keyword defines a function:

func theAnswer() {
    return 42
}

A simple hoc function

Whereas the proc keyword is used for procedures:

proc tellMeTheAnswer() {
    print 42
}

A simple hoc procedure

Procedures are still allowed to use the return keyword in order to exit early, but it must be a “bare return”, with no value.

Subroutines may call other subroutines or themselves; they can also use variables freely. Note, however, that the variables defined inside a subroutine are not scoped; they’ll still be available outside of the routine!

The following example returns the factorial of the global variable x. If x isn’t defined, we’ll get an error when we call it.

func factorialOfX() {
    if (x <= 0) {
      return 1
    }
    f = x
    result = 1
    while (f > 1) {
      result = result * f
      f = f - 1
    }
    return result
}

A hoc function that uses variables

You can see that all of the variables used inside the function are still available after it returns.

factorialOfX()
./hoc6/hoc: Undefined variable x (on line 15)
x = 5
factorialOfX()
    120
x
    5
f
    1
result
    120

Using Parameters

When defining a subroutine in hoc6, there is no list of formal parameters. Instead, the function/procedure body can freely use the syntax $1, $2, etc. to refer to the first, second, etc. argument. We can use this to make a much nicer factorial function:

func fac() {
  if ($1 <= 0) {
    return 1;
  } else {
    return $1 * fac($1-1)
  }
}

fac(k) computes k!

When executing your subroutine, hoc will only verify that such an argument was actually passed when you try to use it.

fac(11)
    39916800
fac()
./hoc6/hoc: Not enough arguments passed to subroutine: fac (on line 10)

Assigning to Parameters

The arguments to a subroutine are passed by value; we are allowed to modify their values without changing them in the caller. Notice how we change $1 at the beginning of compare() below.

func compare() {
  $1 = $1 - $2
  if ($1 < 0) {
    return -1;
  } else if ($1 == 0) {
    return 0;
  }
  return 1;
}

compare(x, y) returns -1, 0, or 1, for x < y, x = y, or x > y.

Feature: Enhanced Print Statements

hoc6 has a much-improved print statement. Not only can multiple expressions be printed, e.g:

print PI, sin(PI), cos(PI)

We can also now print string literals, mixed with arbitrary expressions.

print "The sine value of ", PI, " is ", int(sin(PI)), "."

Importantly, this is the only functionality supported for strings. They may not be:

Feature: Reading User Input

In hoc6, we add a new builtin:

read(varName)

This allows our hoc programs to read user input into a variable and check that the input was valid in a single call.

Specifying Input Sources

First, when starting hoc, we can specify a set of one or more input files, and/or stdin. The read(varName) function will read from one file at a time; when a file is exhausted, hoc transparently moves to the next file.

We’d use the command ./hoc6 datafile to read input from datafile, and ./hoc6 - to read from stdin (presumably, the console).

This can be combined arbitrarily. The following shippet first reads variables from two static files, and once they’re exhausted, will read any remaining data from stdin.

./hoc6 input-file1 input-file2 -
read(): Behavior for Multiple Files

Using Input Data in hoc6

To read a double from our current input sources, you can use the read(myVar) statement. hoc responds to the read request as follows:

  1. hoc will attempt to read a double from the current input source

  2. If the data is valid, it will be stored into the variable named myVar, and read(myVar) will return 1. If we are out of data, myVar is set to 0 and read(myVar) returns 0. If the data was invalid, we get an exec_error.

This allows us to write programs that interactively use stdin during execution, such as:

> ./hoc -
{
  print "Enter some numbers; press EOF when finished: \n"
  sum = 0
  while (read(x)) {
    sum = sum + x
  }
  print "The sum is: ", sum, "\n"
}
Enter some numbers; press EOF when finished:
1 2 3 4 5 6 7 8
<Ctrl-D pressed>
The sum is: 36
<We can continue using hoc here>

Implementation: Defining Subroutines

We can think of a user-defined subroutine as a named set of machine instructions that are permanently installed in the hoc virtual machine. When we define a subroutine, we install the instructions, but we don’t execute them. This is a significant change; until now, hoc has executed all top-level statements as soon as they’re recognized.

We’ll break down the changes into meaningful areas, as described below. step-by-step. Since procedures and functions only differ in their return values, our discussion applies to both.

  1. Definition Syntax - We need to support the syntax for creating named blocks of code
  2. Subroutine Addresses - Because subroutines live in our machine’s program storage area, we need to be able to refer to “addresses” inside the prog area.
  3. Reserving prog space - Previously, our hoc machine overwrote whatever code had been installed in prog as soon as it was executed. We need a way to mark subroutine instructions as “reserved” so that they are not overwritten by new statements.

Step 1: Declaration Syntax

First, let’s add the ability to recognize a subroutine definition. We have two kinds of subroutines, so we should examine their components:

func subrName() {
  <statements>
}

Some components of this syntax we already have; the block of statements can nicely be handled by stmtlist, for example.

However, we’ll need to add the func & proc keywords, as well as adding parser actions to associate the subroutine’s name Symbol with its instructions.

New Keywords

Like loops and conditionals, it’ll be helpful to have new tokens—PROC_KW and FUNC_KW—for our “introductory” keywords proc and func. This will let us take action before any other components of the declaration are parsed.

-%token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW
+%token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW

New keywords for subroutine declarations hoc6/hoc.y

The keywords must also be added to our language builtins.

static struct hoc_keyword {
  const char *name;
  int         token_type;
} keywords[] = {
    {"if", IF_KEYWORD},
    ...
+   {"func", FUNC_KW},
+   {"proc", PROC_KW},
    {NULL, 0},
};

New keywords for subroutines hoc6/builtins.c

Subroutine Names & Symbols

Once we’ve found a func or proc keyword, we want to parse the subroutine’s name, and create an associated Symbol object. To that end, we’ll need a new Symbol type; in fact, we’ll define two, so that we can handle the differences in return-value behavior. For lack of a better alternative, we’ll name them UPROC and UFUNC, for procedures and functions respectively.

  %token  <hoc_value>    NUMBER
- %token  <hoc_symbol>   VAR CONST BUILTIN UNDEF
+ %token  <hoc_symbol>   VAR CONST BUILTIN UNDEF UFUNC UPROC
  %type   <hoc_symbol>   assignable

New token types for subroutine declarations hoc6/hoc.y

The rules for naming a function will be the same as those for naming any other symbol, so we can reuse the VAR token for parsing a subroutine’s name. However, when we eventually call a subroutine, we’ll need to be able to differentiate them from variables and builtin functions, so we create a small wrapper, subr_decl, to override the installed Symbol’s type. This is the same pattern we used for assignable before, to handle VAR vs CONST symbols.

First, we’ll declare that, like our assignable wrapper, subr_decl procuces a Symbol.

  %token  <hoc_value>    NUMBER
- %type   <hoc_symbol>   assignable
+ %type   <hoc_symbol>   assignable subr_decl

subr_decl produces a UFUNC or UPROC hoc6/hoc.y

The production for a subr_decl is easy, since we have our introductory keywords letting us know when a declaration has begun. Our only job is to override the Symbol.type installed by the lexer.

+ subr_decl:      PROC_KW VAR
+                 {
+                     $2->type = UPROC;
+                     $$ = $2;
+                 }
+         |       FUNC_KW VAR
+                 {
+                     $2->type = UFUNC;
+                     $$ = $2;
+                 }
+         ;

The subr_decl production updates Symbol types hoc6/hoc.y

Q: What’s the value of $2->type before this production runs? A: The yylex() function installs it as UNDEF.

Recognizing Subroutine Definitions

We can complete our language support for subroutine definnitions by recognizing their overall shape, including the body.

Modeling ourselves after our if and while constructs, we note that these declarations should be statements, and we can use the stmt production to recognize their bodies.

Like all statements, procedure declarations return the address at which they begin. These declarations are entirely defined by their body, so we’ll return the address of their body stmt.

  stmt:           expr { install_instruction(inst_pop); }
          |       '{' stmtlist '}' { $$ = $2; }
+         |       subr_decl '(' ')' stmt
+                 {
+                     $$ = $4;
+                 }

A subroutine declaration is a new kind of stmt hoc6/hoc.y

Step 2: Subroutine Addresses

Since user-defined subroutines are really just a set of instructions, a Symbol referring to a subroutine is really holding the address of the associated instructions.

It seems like the time to add an Addr member to the Symbol.data union, but unfortunately, this causes a new problem. Remember that the MachineCode struct already contains a Symbol pointer, because that’s how we represent symbol-table references. This causes cyclic dependencies between the Symbol and MachineCode types.

We can solve this by moving our forward declaration of MachineCode & Addr above our Symbol definition in hoc.h, and let the compiler resolve the full struct layout.

+ ///----------------------------------------------------------------
+ /// Addresses (Shared by multiple types)
+ ///----------------------------------------------------------------
+ typedef struct MachineCode MachineCode;
+ typedef MachineCode       *Addr;

...

  struct Symbol {
    char *name;
    short type;  // Generated from our token types
    Symbol *next;
    union {
      double             val;
+     Addr               addr;
      struct BuiltinFunc func;
    } data;
  };

+ ///----------------------------------------------------------------
+ /// Machine Code
+ ///----------------------------------------------------------------
...
- typedef struct MachineCode MachineCode;
- typedef MachineCode       *Addr;

hoc6/hoc.h

Step 3: Installing Subroutines

We need a way to “install” routines into our machine permanently, so that we can execute them multiple times. The biggest obstacle to this process is our machine_reset_program() function, which resets progp every time we execute a parsed statement. We need a way to prevent our subroutine bodies from being overwritten.

One workable method is to mark all space from prog to the end of the procedure as “reserved”. Instead of resetting progp back to the beginning of program memory, we’ll use a new variable, prog_start, which indicates the first non-reserved word that we can use.

  MachineCode prog[PROGSIZE];
+ Addr        prog_start = prog;  // No installation above this address
  Addr        progp;              // Next free location for code installation
  static Addr pc;                 // the current location in the executing program

hoc6/machine.c

The prog* variables have the following relationship:

Our new prog_start variable

Now, when we define a routine, we’ll preserve its code by shifting prog_start to just after its body. This is handled by reserve_subr, a new machine function. That function also updates the given Symbol to point to the location where the subroutine begins.

/**
 * Reserves all currently-installed code as a subroutine, under Symbol
 * `subr_sym`. The value of `subr_sym` will be updated to the address
 * of the installed routine. Ensures that the code will not be
 * overwritten by future installations, even after a machine reset.
 */
void reserve_subr(Symbol *subr_sym);

hoc6/hoc.h

  Addr install_code(MachineCode mc) {
    *progp = mc;
    Addr result = progp++;  // return location of THIS instruction
    return result;
  }

+ void reserve_subr(Symbol *subr_sym) {
+    subr_sym->data.addr = prog_start;
+    prog_start = progp;
+ }

hoc6/machine.c

Then, we make machine_reset_program() aware of prog_start, so that we never again overwrite any code between prog and prog_start.

void machine_reset_program(void) {
  stackp = stack;
- progp = progp;
+ progp = prog_start;

hoc6/machine.c

Finally, we update our parser action for subroutine-declaration statements to call our new reservation function and populate the Symbol value. Now, writing a subroutine declaration will install code, but won’t execute any!

  stmt:      expr { install_instruction(inst_pop); }
        |    '{' stmtlist '}' { $$ = $2; }
        |    subr_decl '(' ')' stmt
             {
+                reserve_subr($1);
                 $$ = $4;
             }

hoc.y: Reserving space for procedures hoc6/hoc.y

Q: This isn’t 100% true. What instruction(s) are read / executed after parsing stmt, and where do they come from? A: The machine will examine at least 1 instruction: STOP. It’s installed by the list: list stmt production, and since prog_start is updated before the list production runs, it will be evaluated and then complete.

Breakpoint: Test your Implementation

If you are following along, now is a good time to test your implementation. We can do that by adding a temporary expr production that understands how to produce a value for an installed declaration.

%%
expr: ...
...      |   UPROC
             {
                 $$ = install_instruction(inst_pushlit);
                 install_literal($1->data.addr - prog);
             }
         |   UFUNC
             {
                 $$ = install_instruction(inst_pushlit);
                 install_literal($1->data.addr - prog);
             }

Now, using the function/procedure name as a variable will evaluate to its address, relative to prog. As expected, a more complex body (like calling a builtin), corresponds to a larger block of instructions.

./hoc6/hoc
proc f1() { x = 123 + 345 }
proc f2() { print sin(cos(x+42)) }
proc f3() { y = 10 }
f1
    0
f2
    10
f3
    24

Implementation: Calling Subroutines

hoc now correctly installs our subroutines into permanent storage. We now focus on calling those subroutines, which requires a coordinated dance between the caller and called code.

The basic idea of executing subroutines in our hoc VM involves a combination of two cooperating instructions:

We’ll add this functionality in stages:

  1. Representing Subroutine Calls & Callers - Whenever we call a procedure, we must keep track of where we will resume execution afterwards. Since subroutines may call other subroutines (or themselves), we need to track an arbitrary stack of callers.

  2. Language Changes

  3. Executing Subroutines

  4. Returning to Callers

Step 1: Representing “Calls”

The bare minimum required for executing a reserved block of code —ignoring, for now, arguments and return values—is the ability to jump to the subroutine’s address, then jump back the caller after executing. The origin / destination data for a call is encoded into a call frame. Unlike most of our objects, the Frame structure has no use outside of the machine module, and so we can define it inside machine.c.

typedef struct Frame {
  Symbol *subr_called;  // the UFUNC/UPROC we have jumped to
  Addr    ret_pc;       // return location
} Frame;

hoc6/machine.c

Below is an example of how relationship between the subrexec instruction, the Frame it creates, and the uprocret instruction, which uses the information to return to the caller.

Relationship between an “execute” call, its Frame, and the Symbol involved.

The Call Stack

Because subroutines may call other subroutines (or themselves), a single Frame of call data is not sufficient. Instead, this information is often organized into a stack of per-call data, imaginatively named the call stack. Since a Frame is pushed onto the stack every time a subroutine is invoked, and popped when a routine returns to its caller, the currently-executing routine always has access to its own data at the top of the stack.

#define FRAMELIMIT 100  // Recursion depth limit
static Frame        frames[FRAMELIMIT];
static const Frame *OVERFLOW_FRAME = frames + FRAMELIMIT;
static Frame       *fp = frames;   // Next frame to use

hoc6/machine.c

We also add some simple functions for manipulating the frame stack.


/**
 * Push a new frame for a subroutine call onto the frame stack, or fail on
 * overflow.
 *
 * The frame is initialized as a call to Symbol `s`, returning to address
 * `caller_pc`. Caller must initialize all other frame data.
 */
static Frame *frame_push(Symbol *s, Addr caller_pc) {
  if (fp == OVERFLOW_FRAME) {
    exec_error("Call-Stack Overflow while calling '%s'", s->name);
  }

  fp->subr_called = s;
  fp->ret_pc = caller_pc;
  return fp++;
}

/** Pop most recent Frame from the frame stack, or fail on underflow */
static Frame *frame_pop(void) {
  if (fp == frames) {
    exec_error("Call-Stack Underflow while attempting to return");
  }
  return --fp;
}

/** Access top of frame stack, or fail if no Frame exists */
static Frame *frame_peek(void) {
  if (fp == frames) {
    exec_error("Call-Stack Underflow while attempting to peek at top frame");
  }
  return fp - 1;
}

hoc6/machine.c

Step 2: The subrexec Instruction

With our new call stack, we can implement the subrexec instruction, which performs the work to actually change our PC location and push a new Frame onto our stack. This instruction is always two words long, since we also need the Symbol which points to our procedure.

The subrexec instruction format

To actually run the subroutine, we use a recursive execute() call, using Symbol->data.addr as the pc value.

int inst_subrexec(void) {
  Symbol *sym = (pc++)->symbol;
  Frame *f = frame_push(sym, pc);

  execute(sym->data.addr);
  return 0;
}

hoc6/machine.c

Sidenote: Why Recurse? (Or skip

While it may seem as though we could simply change the value of the PC in order to “jump” to the subroutine, a nested execute() call is more compatible with with our implementation of ifcode and whilecode. Since those instructions use recursion, a direct-jump subroutine implementation without using recursion would lead to executing code in C stack frames we would not expect.

As an example, consider the following hoc program.

proc sillyCode() {
  if (2) {
    if (3) {
      print 1
      return
    }
    print 4
  }
  print 5
}
{
 if(1) {
   sillyCode()
   print 2
 }
 print 3
}
If you assume a non-recursive implementation of the subrexec and the uprocret instructions, what would this program print?

Most likely, something like:

1
2
4
5
3

Because our ifcode instructions recurse, we’re inside 4 execute() functions by the time we reach print 1:

  1. Main execute() from our REPL
  2. execute() inside inst_ifcode for if (1).
  3. execute() inside inst_ifcode for if (2).
  4. execute() inside inst_ifcode for if (3).

Now, after we finish print 1, we return, and we assume that merely jumps to the instructions for print 2. Since that’s the end of a stmtlist block, the following instruction will be a STOP.

Now things get odd. This will exit the fourth layer of the execute() function, bringing us back to the inst_ifcode we were running for if (3)inside our procedure! And that will begin running print 4! There are many problematic situations we can create for ourselves in this manner.

Step 3: Subroutine-Call Syntax

To install the subrexec instruction, we must add the language rules for calling a function or procedure. For now, we’ll disallow specifying any arguments, so the address we produce for the parser will just be the instruction for calling the procedure / function. Notice that procedure calls, which don’t return a value, must be statements, as they don’t leave a value on the stack.

%%
stmt: ...
     |        UPROC '(' ')'
              {
                  $$ = install_instruction(inst_subrexec);
                  install_ref($1);
              }

hoc.y: Calling procedures hoc6/hoc.y

Function calls are expressions, and so they can be used inside conditions, etc. (The job of placing a value onto the data stack will be handled by the return statement of the function; trying to call a function at this point will cause errors!)

%%
expr: ...
     |        UFUNC '(' ')'
              {
                  $$ = install_instruction(inst_subrexec);
                  install_ref($1);
              }

hoc.y: Calling functions hoc6/hoc.y

Step 4: Returning to our Caller

Our procedures must, at some point, return to their caller. We need a facility to “jump back” to our calling code. Since we’re executing recursively, this means we need to add a way to signal that it’s time to return from our current execute() call, and update the pc value to match our return address.

For hoc6, we will use a simple implementation, adding an is_returning flag that is examined by execute(). Whenever that flag is true, the VM knows it should just be unwinding nested execute calls, so the execute() function returns immediately, without incrementing the pc value again or executing anything.

That flag should stay true until we find ourselves back int the inst_subrexec function, since we’ve returned from the called subroutine. At that point, we can safely set the flag to false, knowing that the return instruction has already updated the PC.

You can see a visual representation of this flow below:

  1. subrexec runs within some execute() call (dotted). It finds the procedure address, and starts its own execute() flow.
  2. This flow runs instructions; eventually we reach a uprocret instruction
  3. uprocret sets pc = frame.ret_pc, and sets the is_returning flag
  4. execute() sees the flag is enabled, and returns to subrexec
  5. subrexec disables is_returning, and returns to its enclosing execute() call. the pc value is still set to the value from uprocret and not updated - execution continues correctly.
Execution Flow for procedures

Machine Changes

First, we must add the flag for is_returning:

  #define FRAMELIMIT 100  // Recursion depth limit
  #define OVERFLOW_FRAME (frames + FRAMELIMIT)
  static Frame  frames[FRAMELIMIT];
  static Frame *fp = frames;   // Next frame to use
+ static bool   is_returning;  // true when returning from a user subroutine

hoc6/machine.c

Next, we’ll add the uprocret instruction; it must:

int inst_uprocret(void) {
  Frame *f = frame_pop();
  is_returning = true;
  pc = f->ret_pc;
  return 0;
}

A “return” instruction hoc6/machine.c

Finally, we update execute to respect this state.

void execute(Addr start_addr) {
  pc = start_addr;
+ is_returning = false;
- while (!is_stop_inst(pc)) {
+ while (!is_stop_inst(pc) && !is_returning) {

hoc6/machine.c

An Issue: Loops and Conditionals

There’s one other aspect of our execution flow that must be considered before we’re done. To see it, consider the following hoc program, and decide what output will result when running it.

tester = 0
proc earlyReturnTest() {
  if (tester > 0) {
    print 2
    return
  }
  print 1
}
Q: What’s the output? A: It will print both 2 and 1!

The reason for this bug lies in our current implementation of the ifcode and whilecode instructions, and the nested execution flow we’re using. Because ifcode and whilecode also overwrite the pc value after their nested execute() calls, they may override the pc value just set by uprocret!

Consider the following flow:

Problems with uprocret and ifcode

Looking back to our example, the return keyword inside our function sets pc and is_returning. We immediately return to the ifcode function, whose nested execute just ended. But the inst_ifcode function also sets the PC value, overriding our return location.

The relevant parts of hoc5’s inst_ifcode() are commented below:

int inst_ifcode(void) {
  Addr ifbody_addr = pc[0].addr;
  ...
  Addr end_addr = pc[2].addr;
  Addr cond_addr = pc + 3;
  execute(cond_addr);
  Datum d = stack_pop();
  if (d.value) {
    execute(ifbody_addr);  // uprocret changes our PC inside here
  }
  pc = end_addr;           // and it's overwritten here...  :(
  return 0;
}

In hoc5, inst_ifcode unconditionally overwrites pc to end_addr

How can we make our return keyword force its way past any enclosing if and while blocks, until it reaches its parent subrexec instruction?

“Bubbling” Returns via is_returning Checks

As the UPE authors suggest, a simple (if hacky) fix, is to make these instructions aware of the our is_returning state, since they are part of our ‘pc-change’ flow. Before making any manual changes to the pc value, we check whether we’re actually returning (and so should stop executing). This lets us “bubble up” from a return statement, through any enclosing ifs or whiles, to the subroutine we’re returning from! Here’s our fixed ifcode flow:

is_returning aware conditional logic

The actual changes are minor:

inst_ifcode
  int inst_ifcode(void) {
    ...
    execute(cond_addr);
    Datum d = stack_pop();  // must be value
    if (d.value) {
      execute(ifbody_addr);
    } else if (maybe_elsebody.type == CT_ADDR) {
      execute(maybe_elsebody.addr);
   }
}

-   pc = end_addr;
+   if (!is_returning) {
+     pc = end_addr;
+   }
    return 0;
  }

hoc6/machine.c

inst_whilecode
  int inst_whilecode(void) {
    Addr body_addr = (pc++)->addr;
    Addr end_addr = (pc++)->addr;
    Addr cond_addr = pc;

    while (true) {
+     if (is_returning) {
+       return 0;  // PC is already set to the appropriate address
+     }

      execute(cond_addr);
      if (stack_pop().value) {
        execute(body_addr);
      } else {
        break;
      }
    }

    // condition false
    pc = end_addr;
    return 0;
  }

hoc6/machine.c

Adding return Statements

To allow users to trigger the return, we add a return keyword to our token declarations and language builtins:

- %token  <hoc_symbol>  IF_KW ... FUNC_KW PROC_KW
+ %token  <hoc_symbol>  IF_KW ... FUNC_KW PROC_KW RETURN_KW

hoc6/hoc.y

static struct {
    const char *name;
    int         token_type;
  } keywords[] = {
      {"if", IF_KW},
      {"else", ELSE_KW},
      {"func", FUNC_KW},
      {"proc", PROC_KW},
      {"print", PRINT_KW},
      {"while", WHILE_KW},
+     {"return", RETURN_KW},
      {NULL, 0},
  };

hoc6/hoc.h

And a return statement to install the uprocret machine instruction. The return statement has an interesting new feature: it should only be used while defining a subroutine.

Adding the keyword itself is straightforward; we add the RET_KEYWORD to hoc.y and the associated "return" string to builtins.c.

But in order to prevent users from writing a return statement outside of a subroutine, we’ll need to know, while executing the parser action, whether we’re currently defining a subroutine or not.

To do this, we’ll add a new piece of parser state: current_subr.

/** Tracks the symbol for the subroutine currently being parsed (if any). */
Symbol *current_subr = NULL;

hoc6/hoc.y

While our parser is parsing a subroutine, we want this variable to track the Symbol of that subroutine. Once we’ve finished parsing the routine, we must reset it to NULL. This is most easily done inside our subr_decl action:

  subr_decl:      PROC_KW VAR
                  {
                      $2->type = UPROC;
+                     current_subr = $2;
                      $$ = $2;
                  }
          |       FUNC_KW VAR
                  {
                      $2->type = UFUNC;
+                     current_subr = $2;
                      $$ = $2;
                  }
          ;

hoc6/hoc.y

And then removing the definition after parsing the body

  stmt: ...
    |             subr_decl '(' ')' stmt
                  {
                     reserve_subr($1);
+                    current_subr = NULL;
                     $$ = $4;
                  }

hoc6/hoc.y

Once we have access to the current subroutine, enforcing that return statements can’t happen outside of them is trivial.

stmt: expr { install_instruction(inst_pop); }
...
+     |  RETURN_KW
+        {
+             if (current_subr == NULL) {
+                 exec_error("Syntax error - "
+                            "`return` keyword used outside of a subroutine");
+             }
+             if (current_subr->type != UPROC) {
+                 exec_error("Syntax error - "
+                            "bare return used outside of a procedure");
+             }
+            $$ = install_instruction(inst_uprocret);
+        }

hoc6/hoc.y

“Fall Through” Returns

It’s all well and good to allow users to return from their procedures, but what if they forget, or their procedure simply runs until it ends? We need to make absolutely sure that our procedures will return to their caller’s location. To do this, we can simply force every procedure to end with a uprocret instruction, during our parser action for procedure declarations. At this point, the procedure body has been installed, so we can safely add one more instruction to the end of the procedure’s body before reserving space for it in our machine.

stmt:  ...
    |   subr_name '(' ')' stmt
        {
+           /*
+            * Note: the order of these two lines matters!
+            * `uprocret` should be part of the subr body,
+            * so reserve space *after* installing it.
+            */
+           install_instruction(inst_uprocret);
            reserve_subr($1);
            current_subr = NULL;
            $$ = $4;
        }

Forcing returns from procedures hoc6/hoc.y

Breakpoint: Test your Implementation

This is another good place to test your implementation; no additional code should be required for defining & calling procedures.

Implementation: Passing Arguments

At this point, we can declare procedures and then call them. Next, we will implement passing data into our subroutines, via arguments, and then accessing that data inside our subroutine bodies.

Arguments to our subroutines will be managed on our data stack, must be sure to add and remove them on entrance and exit.

Step 1: Designing Argument-Passing

First, we must decide how to encode arguments for our machine to find when executing subrexec. We’ve seen one way of passing arguments already in hoc3, when using BUILTIN functions. Those functions had a fixed number of arguments; those values were stored on the data stack, and each inst_callN variant knew how many to remove.

Our subroutines, on the other hand, allow an arbitrary number of arguments. During a call, arguments will be placed onto the (data) stack, and stay there for the entire duration of the call. In order for the subroutines to use the arguments, we must be able to locate them on the stack; furthermore, to avoid invalid memory accesses, we should check that an argument exists before trying to access it.

We’ll encode the number of arguments and their stack location inside our Frame during our subrexec call. The number of arguments will be generated as part of the subrexec instruction, to ensure that we only access paramters that actually exist. The stack location is only known at call time, so we’ll update our subrexec instruction to populate that value.

Then, our subroutine bodies will be able to generate instructions that can lookup argument values, using this data from the current Frame.

Call-Frames with argument information

First, we’ll update our Frame struct with fields to represent arguments passed.

  typedef struct Frame {
    Symbol *subr_called;  // the UFUNC/UPROC we have jumped to
    Addr    ret_pc;       // return location
+   Datum  *argp;         // pointer to first argument
+   int     arg_count;    // number of arguments
  } Frame;

hoc6/machine.c

Step 2: Argument-Passing Syntax

Arguments add two additional responsibilities to our parser. The first is some additional complexity around the addresses of our procedure-call statements. Consider the hoc statement:

callMyProcedure()

Previously, this stmt generated a single, two-word instruction, subrexec, and the stmt production for the call produced the address of the first word.

%%
stmt: ...
     |        UPROC '(' ')'
              {
                  $$ = install_instruction(inst_subrexec);
                  install_ref($1);
              }

Now, if we instead call a subroutine that requires arguments, like:

add3(6, 14, 7)

Then the address of the subrexec instruction is not sufficient. Like BUILTINs, we require the address where our argument expressions begin to be generated.

The second need is our requirement that the parser should tell us how many arguments were passed in a given call, so that it can generate our new version of subrexec, which includes the number of arguments.

You can see the required information in the diagram below:

Required: argument address and counts

We need a new piece of syntax for tracking the start address and number of argument expressions in a call.

Tracking Integer Values

To track a “count” value, we’ll add yet another new value type to our parser – integer. This’ll be useful for cases where we want to refer to an “index” or “count”, rather than trying to use a double to specify a whole number value, such as “3 arguments passed” or “get argument for parameter 2”.

  %union {
    double hoc_value;
    Symbol *hoc_symbol;
      Addr hoc_address;
+     int  hoc_integer;
  }
  %type   <hoc_symbol>    assignable subr_name

hoc6/hoc.y

New Syntax: Argument Lists

Since we need two new pieces of information, we’ll create two new non-terminals to represent them inside our subroutine calls.

First, we’ll create a non-terminal, arglist, that accepts a list of expressions, and produces the number of expressions in that list.

  %type   <hoc_symbol>    assignable subr_decl
+ %type   <hoc_integer>   arglist

hoc6/hoc.y

For our arglist production, the expressions themselves are still installed by the expr nonterminal; the arglist action only cares about how many expressions it has seen. It counts them by using itself recursively; we also include a special case for a single argument.

%%
arglist:        /* nothing */ { $$ = 0; }
        |       expr { $$ = 1; }
        |       arglist ',' expr { $$ = $1 + 1; }

hoc6/hoc.y

Tracking Argument Addresses

Next, we need to track the address where our argument expressions begin to be installed. We’ve done this kind of thing before, with the end production in our if and while statements. We create a new args_start production to to “mark our place” when we start parsing an argument list.

It produces an address:

- %type <hoc_address>   assign call end expr if stmt stmtlist while
- %type <hoc_address>   args_start assign call end expr if stmt stmtlist while

hoc6/hoc.y

And it operates almost like the end production; however, it doesn’t install anything.

%%
args_start:     /* nothing */ { $$ = progp; }
        ;

hoc6/hoc.y

Finally, we can add these nonterminals into our procedure and function call syntax.

Procedure Calls

  stmt:           expr { install_instruction(inst_pop); }
          |       '{' stmtlist '}' { $$ = $2; }
-         |        UPROC '(' ')'
+         |        UPROC '(' args_start arglist ')'
                  {
-                     $$ = install_instruction(inst_subrexec);
+                     install_instruction(inst_subrexec);
                      install_ref($1);
+                     install_literal($4);
+                     $$ = $3;
                  }

hoc6/hoc.y

Function Calls

  expr:           NUMBER
  ...
-         |       UFUNC '(' ')'
+         |       UFUNC '(' args_start arglist ')'
                  {
-                     $$ = install_instruction(inst_subrexec);
+                     install_instruction(inst_subrexec);
                      install_ref($1);
+                     install_literal($4);
+                     $$ = $3;
                  }

hoc6/hoc.y

Locating Arguments on the Stack

Then we’ll need to update inst_subrexec to handle the new information in our instruction format; upon reaching an subrexec call, we’ve got the arguments for the call at the top of the stack, so we set our argp value to point to the first argument’s Datum. Since we’ve got a stack of arguments, the first argument is exactly arg_count indices behind our stackp pointer.

Generating argp in subrexec
  int inst_subrexec(void) {
    Symbol *sym = (pc++)->symbol;
+   double  arg_count = (pc++)->literal;

    Frame *f = frame_push(sym, pc);
+   f->arg_count = arg_count;
+   f->argp = stackp - f->arg_count;

    execute(sym->data.addr);
    is_returning = false;
    return 0;
  }

hoc6/machine.c

Step 3: Evaluating Parameters

In order to actually use our arguments within the body of a procedure or function, we need to transform references like $1, $2, etc. into stack accesses, which either fetch or overwrite a value on the stack.

argget

The basic idea will be to parse a parameter reference $k into a new argget instruction. This instruction is two words long; the second word, k, specifies the parameter number we want to access. The machine will:

The argget instruction
  1. Verify that k is within the current frame’s arg_count
  2. Find the argument value on the stack using the current frame’s argp
  3. Push a copy of the argument value onto the top of the stack
Getting an Argument

argset

Assigning a value to an argument will work similarly; the value at the top of the stack replaces the Datum value for argument k. It’s then put back at the top of the stack, since assignments are expressions.

Setting an Argument
Language Changes: Using Arguments

First, we’ll need a way to recognize parameter references, and turn the reference $k into the literal integer k.

Lucky for us, we just added a new “integer” value-type into our language, so we can associate a new PARAM_NUMBER token with that type. (Remember: It’s capitalized because it’s a terminal.)

  %type   <hoc_symbol>    assignable udef_name
  %type   <hoc_integer>   arglist
+ %token  <hoc_integer>   PARAM_NUMBER

hoc6/hoc.y

Then, we can update our yylex() function to handle parameters as well. A $ with no prefix or other surrounding characters is only valid as a parameter reference, so we can place this conditional logic anywhere before our final switch statement.

If scanf fails to read an integer, or we get a non-positive integer value, we consider it a lexing failure. This allows more informative error messages, but whether to perform this much semantic validation inside your lexer is a matter of opinion.

  int yylex(void) {
    int c;
...

    if (c == '\n') {
      line_number++;
    }

+   // Parse parameters
+   if (c == '$') {
+     if (scanf("%d", &yylval.hoc_integer) != 1) {
+         exec_error("Invalid parameter specification", "");
+     } else if (yylval.hoc_integer <= 0) {
+         exec_error("Parameters must be nonnegative", "");
+     }
+     return PARAM_NUMBER;
+   }

hoc6/hoc.y

We can use our PARAM_NUMBER token to write new expr productions for evaluating or assigning a parameter. Like the productions for return statements, these should only be allowed while defining a procedure.

expr:
...
+    |   PARAM_NUMBER
+        {
+            if (current_subr == NULL) {
+                exec_error("Syntax error - "
+                           "Parameter '$%d' not allowed here", $1);
+            }
+            $$ = install_instruction(inst_argget);
+            install_literal($1);
+        }
+    |   PARAM_NUMBER '=' expr
+        {
+            if (current_subr == NULL) {
+                exec_error("Syntax error - "
+                           "Parameter '$%d' not allowed here", $1);
+            }
+            $$ = install_instruction(inst_argset);
+            install_literal($1);
+        }
     |       expr '+' expr   { install_instruction(inst_add); }

hoc6/hoc.y

Machine Changes: argget and argset

The inst_argget and inst_argset instructions use the Frame.argp pointer to access the correct location in the stack. The only arithmetic required is converting the parameter numbers into zero-indexed offsets.

int inst_argget(void) {
  int    arg_number = (pc++)->literal;
  Frame *current_frame = frame_peek();
  if (current_frame->arg_count < arg_number) {
    exec_error("Subroutine '%s' uses parameter %d, but only got %d.",
               current_frame->subr_called->name,
               arg_number,
               current_frame->arg_count);
  }
  stack_push(current_frame->argp[arg_number - 1]);
  return 0;
}

int inst_argset(void) {
  int    arg_number = (pc++)->literal;
  Frame *current_frame = frame_peek();
  if (current_frame->arg_count < arg_number) {
    exec_error("Subroutine '%s' uses parameter %d, but only got %d.",
               current_frame->subr_called->name,
               arg_number,
               current_frame->arg_count);
  }
  Datum d = stack_pop();
  current_frame->argp[arg_number - 1] = d;
  stack_push(d);
  return 0;
}

hoc6/machine.c

Removing Arguments on Return

There’s one last detail to handle related to argument values. We need to remove them when we return from our subroutines. To that end, we add a small helper function to the machine module, so that both procedures and functions can remove their arguments once complete.


/**
 * Reset any machine state indicated by the given frame. This includes:
 * - Removing any arguments from the stack for the call.
 * - Resetting the program counter
 * - Setting global "returning" flag
 */
static void return_from_frame(Frame *f) {
  // remove args from stack
  while (f->arg_count-- > 0) {
    stack_pop();
  }
  is_returning = true;
  pc = f->ret_pc;
}

hoc6/machine.c

Our uprocret instruction can then be made a little simpler.

int inst_uprocret(void) {
    Frame *f = frame_pop();
-   is_returning = true;
-   pc = f->ret_pc;
+   undo_frame_state(f);
    return 0;
+ }

hoc6/machine.c

Implementation: Functions & Return Values

Finally, we can make the finishing touches that allow us to make functions work. Since calling a function produces a value, a function call must be an expression rather than a statement. Furthermore, a function must include the return keyword (with a value), which we’ll need to check at runtime rather than parse-time.

Step 1: Return Values & the Stack

The ufuncret instruction is used for returning from a function, it should be used just after our return value has been produced and pushed onto the stack. There’s just one problem: there’s a bunch of arguments underneath it, and they need to go! So, the ufuncret instruction removes the top stack value, saves it, then calls return_from_frame to remove the arguments from the stack. After that’s complete, we can re-push the expected return value onto the top of the stack, which will be available to the caller once control returns to them.

int inst_ufuncret(void) {
  Frame *f = frame_pop();
  Datum  result = stack_pop();
  undo_frame_state(f);
  stack_push(result);
  return 0;
}

hoc6/machine.c

Step 2: Adding a value-return statement

The format of a value-return is simply return <expr>, which means that the expression will already have pushed the value onto the data stack by the time we install the return instruction. This makes the action for value-return statements simple; it’s only real job is verifying that we’re actually inside a function; otherwise, we disallow the installation of a ufuncret instruction.

    |       RETURN_KW expr
+         {
+             if (current_subr == NULL) {
+                 exec_error("Syntax error - "
+                            "return keyword used outside of a subroutine");
+             }
+             if (current_subr->type != UFUNC) {
+                 exec_error("Syntax error: -"
+                            "`return <expr>` used outside of a function");
+             }
+             install_instruction(inst_ufuncret);
+             $$ = $2;
+         }
+

hoc6/hoc.y

Step 3: Enforcing Return Values

When a function is installed, our parser will still end it with the uprocret instruction, since we use the same code to install both types of subroutine. Unlike procedures, however, functions should never reach that instruction. We should always hit a ufuncret instruction first.

uprocret; installed for safety

To make sure users understand what they did wrong, we will update the uprocret instruction to raise an error at runtime if it detects that we’re inside a function, since that means we forgot to return a value

int inst_uprocret(void) {
    Frame *f = frame_pop();

+   if (f->def_called->type != UPROC) {
+     exec_error("Function '%s' does not return a value", f->subr_called->name);
+   }

    undo_frame_state(f);
    return 0;
  }

hoc6/machine.c

And with that, our function implementation should be complete!

Implementation: Enhanced Print Statements

We add a (very) minimal implementation of strings for use in print statements. This is their only allowed use; in particular, they cannot be used as variables. In this version of hoc, they also leak memory.

The implementation adds a new string-printing instruction, which has a two-word layout:

The prstr instruction layout

This only requires support for parsing and installing strings into our machine’s memory.

Step 1: Add Dynamic Strings

We begin by writing a (very) small dynamic-string module, hoc6/str.c, based on the common len + capacity implementation. For the sake of brevity (hah), we don’t discuss the implementation here, but we’ll come back to strings in ehoc. Its interface is all we’re interested in.

typedef struct str str;

/** Allocate a string; runtime error on allocation failure */
str *str_new(void);

/** Append a character to the end of a string */
bool str_append(str *s, char c);

/**
 * Get a "view" of string `s` for use as a valid, NUL-terminated C
 * string. The pointer returned is non-owning, and will be
 * invalidated by any manipulations made to the string.
 *
 * In particular, if the caller wishes to store the resulting char
 * buffer for other uses, it should be copied.
 */
const char *str_view(str *s);

/** Clear the string's contents */
void str_clear(str *s);

/** Deallocate `s`, freeing its memory. */
void str_free(str *s);

The dynamic string interfacehoc6/hoc.h

We’ll use this library to allocate storage for strings as they’re parsed.

Step 2: Lexing & Parsing Changes for Strings

The biggest change required for string parsing is recognizing string literals and allocating strings for them during the lexing process.

First we add the STRING token

  %union {
    double hoc_value;
    Symbol *hoc_symbol;
      Addr hoc_address;
      int hoc_integer;
+     str *hoc_string;
  }

  %token  <hoc_value>     NUMBER
  %token  <hoc_symbol>    VAR CONST BUILTIN UNDEF UFUNC UPROC COMMAND
  /* These keyword symbols are efectively singletons */
  %token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW RETURN_KW READ_KW
  %type   <hoc_symbol>    assignable udef_name
  %type   <hoc_integer>   arglist
  %token  <hoc_integer>   PARAM_NUMBER
+ %token  <hoc_string>    STRING

hoc6/hoc.y

We’ll force all strings to be double-quoted (though they may contain escaped quotes), and so once we see a double-quote, we can collect characters until we reach the next one.

  int yylex(void) {
    ...
    // Parse parameters
    ...
+   // Parse strings
+   if (c == '"') {
+     str *s = str_new();
+     while (true) {
+       c = getchar();
+       if (c == '"') {
+         break;
+       }
+       if (c == EOF || c == '\n') {
+         exec_error("String wasn't closed: '%s'", str_view(s));
+       }
+       str_append(s, escaped(c));
+     }
+     yylval.hoc_string = s;
+     return STRING;
+   }
+
    // Get numbers
    ...

hoc6/hoc.y

The only interesting detail is how to escape special characters; the escaped function allows us to handle this transparently.

/**
 * If c is '\': Returns next input character, string-escaped.
 * Otherwise: returns c
 */
int escaped(char c) {
  // Each escaped character is followed by its output char
  static char translations[] = "b\b"
                               "f\f"
                               "n\n"
                               "t\t";

  if (c != '\\') {
    return c;
  }

  c = getchar();
  char *cp = strchr(translations, c);

  // islower() used so that we don't match the output chars or NUL
  if (islower(c) && cp != NULL) {
    return cp[1];
  }

  return c;
}

hoc6/hoc.y

Sidenote: Allowing arbitrary-length symbols

While we’re working on yylex(), we might as well fixup our other use of static strings in hoc: Symbol names. We only allowed 100-character identifiers before, whereas we could easily support more than that. We replace our use of a static buffer with a str object instead. (Note that because install_symbol copies the string passed to it, we are safe to pass it a str_view here.)

If you want to leave the identifier limit in place, you could do so by leaving the hoc.h constant and while condition check in place. Otherwise, you can remove SYMBOL_NAME_LIMIT from hoc.h as well.

We’ll use a single static str for all lexing calls; there’s no reason to allocate & free memory for these repeatedly. Its internal memory use will expand to the maximum identifier length used during hoc’s runtime.

  int yylex(void) {
+   static str *symbol_str = NULL;
    int c;

hoc6/hoc.y

When we find the beginning of a symbol, we’ll reset our string, instead of setting up a buffer.

    // Get numbers
    ...
    if (isalpha(c)) {
      Symbol *s;

-     char buf[SYMBOL_NAME_LIMIT + 1];
-     size_t nread = 0;
+     // reset symbol string for new input
+     if (symbol_str == NULL) {
+       symbol_str = str_new();
+     } else {
+       str_clear(symbol_str);
+     }
  ...

hoc6/hoc.y

Then, we can remove the SYMBOL_NAME_LIMIT checks and replace character updates with str functions.

      do {
-         buf[nread++] = c;
+         str_append(symbol_str, c);
          c = getchar();
-     } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));
+     } while (isalpha(c) || isdigit(c));

-     // Just in case we exceeded the limit
-     while ((isalpha(c) || isdigit(c))) {
-         c = getchar();
-     }
-
      // at this point, we have a non-alphanumeric 'c'
      ungetc(c, stdin);

-     buf[nread] = '\0';
+     const char *buf = str_view(symbol_str);
      if ((s=lookup(buf)) == NULL) {
          s = install_symbol(buf, UNDEF, 0.0);
      }
      yylval.hoc_symbol = s;
      return (s->type == UNDEF ? VAR : s->type);
  }

hoc6/hoc.y

Step 3: Allow MachineCode to hold strings

We’ll add yet another potential type for MachineCode elements, allowing a word of our VM memory to point to a string.

struct MachineCode {
  CodeType type;
  union {
    double  literal;
    Inst    inst;
    Symbol *symbol;
    Addr    addr;
+   str    *string;
  };
};

MachineCode with str addition hoc6/hoc.h

Step 4: The prstr instruction

Once our parser can install prstr instructions, actually printing the string is trivial. We use our str_view method to create a C string from the string pointer; since we print it immediately, we don’t need to do any defensive copying. (Make sure to add a prototype for inst_prstr to hoc.h.)

int inst_prstr(void) {
  str *s = (pc++)->string;
  printf("%s", str_view(s));
  return 0;
}

hoc6/machine.c

(Note the lack of \n in the output format string. This will be useful in the next output feature: printing multiple expressions)

Step 5: Our Improved print Statement

Now to combine our string-printing functionality with the existing expression-printing.

We’ll replace our print expr statement with a generalized prlist nonterminal.

stmt:    ...
-       |     PRINT_KW expr
-             {
-                 install_instruction(inst_prexpr);
-                 $$ = $2;
-             }
+       |     PRINT_KW prlist
+             {
+                 $$ = $2;
+             }
...

hoc6/hoc.y

This nonterminal handles any nonzero number of STRING or expr elements, and prints each of them. We can base our list-handling rules on our arglist production.

%%
prlist:         expr
                {
                    install_instruction(inst_prexpr);
                    $$ = $1;
                }
        |       STRING
                {
                    $$ = install_instruction(inst_prstr);
                    install_string($1);
                }
        |       prlist ',' expr
                {
                    install_instruction(inst_prexpr);
                    $$ = $1;
                }
        |       prlist ',' STRING
                {
                    install_instruction(inst_prstr);
                    install_string($3);
                    $$ = $1;
                }

hoc6/hoc.y

Reading User Input

To complete our minimal I/O implementation, we’ll add features to read data from input files or the user’s terminal. In fact, we’ll allow specifying multiple files on the command line, and proceed from one to the next as they’re exhausted.

The UPE authors add the input-reading code to hoc.y. For hoc6, we’ll instead add a new inputfiles module, though the functionality is the same. We’ll revisit input/output organization in adhoc.

The call-graph for our user-input flow follows the following simplified diagram:

User Input Organization

The main program initializes our input sources based on the command-line arguments, using the init_inputfiles interface:

/**
 * Initializes the inputfiles module with the first `filecount`
 * members of the `filenames` array; Each should be a file path.
 */
void init_inputfiles(char **argv, int filecount);

hoc6/hoc.h

To actually read data, we’ll use a new varread instruction, installed by the parser when we see a read expression. That machine instruction will use the only other function in the inputfiles module: read_double().

/**
 * Returns true if a double value could be read from the runtime
 * input files, false otherwise. `dest` will be set to the value read,
 * if any, or zero.
 *
 * If input data exists but is in an invalid format, causes an
 * execution error.
 */
bool read_double(double *dest);

hoc6/hoc.h

Step 1: User-Input Layer

The user input layer of hoc is built around a set of zero or more input files, specified on the command line. Any one of these files may be stdin, which is specified by the filename -.

As clients use the read_double function to read data, internally, the input layer reads from the the “current inputfile”. Whenever an EOF is read, the layer transparently switches to the next input file and restarts the read, so that the caller is unaware.

State

Since the user specified a list of input files on the command line, most of our state revolves around pointers into the filenames array. We also need a FILE* for the current input file.

static FILE  *curr_inputfile;  // the FILE* we're reading from
static char **next_filename;   // next filename to read
static char **end_namep;       // represents "no more files"

hoc6/inputfiles.c

Initialization

Our inputfiles module must have its global state initialized before it can be used; our initialization function properly sets curr_inputfile and curr_namep.

void init_inputfiles(char **filenames, int filecount) {
  curr_inputfile = NULL;
  next_filename = filenames;
  end_namep = filenames + filecount;
  next_inputfile();  // ensures curr_inputfile is either valid or NULL
}

hoc6/inputfiles.c

At the beginning of main, we call init_inputfiles() with any elements of theargvarray after the program name, assuming all command-line arguments are input files. This ensures that all calls to our input functions have a validcurr_inputfile, and if it'sNULL`, we are out of input.


  int main(int argc, char *argv[]) {
    program_name = argv[0];
+   init_inputfiles(argv + 1, argc-1);
    install_hoc_builtins();
    ...

hoc6/hoc.y

Switching input files

We call next_inputfile to switch to the next input file whenever we read an EOF. It returns true if another input file is found, or false if all specified inputs are exhausted.

/**
 * Change curr_inputfile to point to the next input file, if
 * any. Closes any existing input file (unless that file points to
 * stdin).
 *
 * Returns true if there is more input to read; false otherwise.
 */
static bool next_inputfile(void) {
  if (curr_inputfile != NULL && curr_inputfile != stdin) {
    fclose(curr_inputfile);
  }
  curr_inputfile = NULL;

  do {
    if (next_filename == end_namep) {
      return false;
    }

    // sync curr_inputfile with curr_namep
    if (strcmp(*next_filename, "-") == 0) {
      curr_inputfile = stdin;
    } else {
      curr_inputfile = fopen(*next_filename, "r");
      if (curr_inputfile == NULL) {
        warning("Could not open input filename: '%s'", *next_filename);
      }
    }
    next_filename += 1;
  } while (curr_inputfile == NULL);

  return true;
}

hoc6/inputfiles.c

Reading Literals

Finally, the read_double function is implemented to hide the list of inputs from callers, and instead handles EOF conditions itself.

bool read_double(double *dest) {
  // curr_inputfile is NULL when all input is exhausted
  if (curr_inputfile == NULL) {
    warning("Cannot read input data - No input files remaining");
    return false;
  }

  while (true) {
    switch (fscanf(curr_inputfile, "%lf", dest)) {
      case EOF:
        if (!next_inputfile()) {
          return false;
        }
        break;  // otherwise try again
      case 0:
        exec_error("Could not read double from '%s' - invalid data format",
                   *curr_namep);
      default:
        return true;
    }
  }
}

hoc6/inputfiles.c

Step 2: Language Changes: The read expression

After adding a read keyword to builtins.c, we can add a final expression production for our new syntax: read(destVariable). The varread instruction we install has two words; the second is the destination variable.

  expr:           NUMBER
...
          |       assign
          |       call
+         |       READ_KW '(' VAR ')'
+                 {
+                     $$ = install_instruction(inst_varread);
+                     install_ref($3);
+                 }

hoc6/hoc.y

Step 3: Machine Implementation

Our final instruction for hoc6 is varread, which uses the read_double interface to assign a variable value and push a success boolean onto the stack.

int inst_varread(void) {
  Symbol *dest = (pc++)->symbol;
  bool    success = read_double(&dest->data.val);
  stack_push(dat(success));
  if (!success) {
    dest->data.val = 0;
  }
  dest->type = VAR;
  return 0;
}

hoc6/machine.c

What’s Next

There are a huge number of opportunities to explore in hoc at this point! Naming just a few:

Hopefully you’re as excited as I was upon reaching the end of hoc6 to keep exploring– there is so much low hanging fruit! Can you build a language you’d actually want to use in a real task?

If there’s enough interest, I’ll add more details to the adhoc source code where some of these ideas are explored. Please feel free to drop me a line; feedback is always welcome!

Source Code

Makefile

CFLAGS ?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
YFLAGS ?= -d
LDFLAGS=-lm
objects := builtins.o math.o symbol.o machine.o str.o hoc.o inputfiles.o

.PHONY:all
all: hoc

hoc: $(objects)
    cc $(CFLAGS) $(objects) $(LDFLAGS) -o hoc

hoc.c hoc.tab.h: hoc.y
    yacc $(YFLAGS) hoc.y -o hoc.tab.c
    mv hoc.tab.c hoc.c

%.o: hoc.h hoc.tab.h %.c
    cc $(CFLAGS) -c -o $@ $*.c

.PHONY:clean
clean:
    rm -f hoc.tab.h
    rm -f $(objects)
    rm -f hoc.c
    rm -f hoc
builtins.c

#include "hoc.h"
#include "hoc.tab.h"
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>  // rand

/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
    Cos(double), Exp(double), Integer(double), Lg(double), Ln(double),
    Log10(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);

static struct {
  const char *name;
  double      value;
} constants[] = {
    {"PI", 3.14159265358979323846},
    {"E", 2.71828182845904523536},
    {NULL, 0},
};

static struct {
  const char *name;
  int         args;
  union {
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
} builtins[] = {
    {"abs", 1, .call1 = Abs},
    {"acos", 1, .call1 = Acos},
    {"atan", 1, .call1 = Atan},
    {"atan2", 2, .call2 = Atan2},
    {"cos", 1, .call1 = Cos},
    {"exp", 1, .call1 = Exp},
    {"int", 1, .call1 = Integer},
    {"lg", 1, .call1 = Lg},
    {"ln", 1, .call1 = Ln},
    {"log10", 1, .call1 = Log10},
    {"pow", 2, .call2 = Pow},
    {"rand", 0, .call0 = Random},
    {"sin", 1, .call1 = Sin},
    {"sqrt", 1, .call1 = Sqrt},
    {NULL, 0, .call0 = NULL},
};

static struct {
  const char *name;
  int         token_type;
} keywords[] = {
    {"if", IF_KW},
    {"else", ELSE_KW},
    {"func", FUNC_KW},
    {"proc", PROC_KW},
    {"print", PRINT_KW},
    {"read", READ_KW},
    {"while", WHILE_KW},
    {"return", RETURN_KW},
    {NULL, 0},
};

extern void command_list(void);

static struct {
  const char *name;
  void (*command_func)(void);
} commands[] = {
    {"LIST", command_list},
    {NULL, 0},
};

void install_hoc_builtins(void) {
  for (int i = 0; constants[i].name != NULL; i++) {
    install_symbol(constants[i].name, CONST, constants[i].value);
  }
  for (int i = 0; builtins[i].name != NULL; i++) {
    Symbol *s = install_symbol(builtins[i].name, BUILTIN, 0.0);
    // TODO: Why not have separate install_value vs install_function?
    s->data.func.args = builtins[i].args;
    switch (builtins[i].args) {
      case 0:
        s->data.func.call0 = builtins[i].call0;
        break;
      case 1:
        s->data.func.call1 = builtins[i].call1;
        break;
      case 2:
        s->data.func.call2 = builtins[i].call2;
        break;
      default:
        fprintf(
            stderr, "Invalid arg count for symbol '%s'\n", builtins[i].name);
        exit(1);
    }
  }

  for (int i = 0; keywords[i].name != NULL; i++) {
    install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
  }

  for (int i = 0; commands[i].name != NULL; i++) {
    Symbol *s = install_symbol(commands[i].name, COMMAND, 0.0);
    s->data.command_func = commands[i].command_func;
  }
}
hoc.h

#ifndef HOC_INC
#define HOC_INC 1

/*
 * Global types & declarations for use in hoc
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

///----------------------------------------------------------------
/// strings
///----------------------------------------------------------------

typedef struct str str;

/** Allocate a string; runtime error on allocation failure */
str *str_new(void);

/** Append a character to the end of a string */
bool str_append(str *s, char c);

/**
 * Get a "view" of string `s` for use as a valid, NUL-terminated C
 * string. The pointer returned is non-owning, and will be
 * invalidated by any manipulations made to the string.
 *
 * In particular, if the caller wishes to store the resulting char
 * buffer for other uses, it should be copied.
 */
const char *str_view(str *s);

/** Clear the string's contents */
void str_clear(str *s);

/** Deallocate `s`, freeing its memory. */
void str_free(str *s);

///----------------------------------------------------------------
/// Addresses (Shared by multiple types)
///----------------------------------------------------------------
typedef struct MachineCode MachineCode;
typedef MachineCode       *Addr;

///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------
typedef struct Symbol Symbol;

/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
  int args;
  union {
    /* We use an anonymous union for the function pointer; there's
       no meaningful name for the union as a field */
    double (*call0)(void);
    double (*call1)(double);
    double (*call2)(double, double);
  };
};

struct Symbol {
  short   type;  // Generated from our token types
  char   *name;
  Symbol *next;
  union {
    double             val;
    Addr               addr;
    struct BuiltinFunc func;
    void (*command_func)(void);
  } data;
};

Symbol *install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);

///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------
typedef int (*Inst)(void);

typedef enum CodeType {
  CT_INST = 'I',
  CT_LITERAL = 'L',
  CT_SYM_REF = 'S',
  CT_ADDR = 'A',
  CT_CHRSTR = 'C',
} CodeType;

struct MachineCode {
  CodeType type;
  union {
    double  literal;
    Inst    inst;
    Symbol *symbol;
    Addr    addr;
    str    *string;
  };  // anonymous union; no need to name it
};

#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)

///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------
extern MachineCode prog[];
extern Addr        prog_start;

/**
 * Installs the given `MachineCode` object into the next available
 * location in the machine's current program. A runtime error will
 * occur if the program is full.
 *
 * Returns the address at which the code was installed.
 */
Addr install_code(MachineCode mc);

/** The next available address in the machine's current program */
extern Addr progp;

/**
 * Reserves all currently-installed code as a subroutine, under Symbol
 * `subr_sym`. The value of `subr_sym` will be updated to the address
 * of the installed routine. Ensures that the code will not be
 * overwritten by future installations, even after a machine reset.
 */
void reserve_subr(Symbol *subr_sym);

#define install_literal(lit) \
  install_code((struct MachineCode){.type = CT_LITERAL, .literal = lit})

#define install_ref(sp) \
  install_code((struct MachineCode){.type = CT_SYM_REF, .symbol = sp})

#define install_instruction(instr) \
  install_code((struct MachineCode){.type = CT_INST, .inst = instr})

#define install_address(address) \
  install_code((struct MachineCode){.type = CT_ADDR, .addr = address})

#define install_string(s) \
  install_code((struct MachineCode){.type = CT_CHRSTR, .string = s})

void execute(Addr start_addr);

///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------

int inst_add(void);
int inst_and(void);
int inst_argget(void);
int inst_argset(void);
int inst_assignvar(void);
int inst_assignconst(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_eq(void);
int inst_gt(void);
int inst_gte(void);
int inst_ifcode(void);
int inst_lt(void);
int inst_lte(void);
int inst_mul(void);
int inst_neq(void);
int inst_not(void);
int inst_or(void);
int inst_pop(void);
int inst_prexpr(void);
int inst_print(void);
int inst_prstr(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
int inst_subrexec(void);
int inst_ufuncret(void);
int inst_uprocret(void);
int inst_varread(void);
int inst_whilecode(void);

///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------

void command_list(void);
void machine_run_command(Symbol *);

///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------

typedef union Datum {
  double  value;
  Symbol *symbol;
} Datum;

void machine_startup(void);
void machine_reset_program(void);

///----------------------------------------------------------------
/// Runtime Functions
///----------------------------------------------------------------

/** input files */

/**
 * Initializes the inputfiles module with the first `filecount`
 * members of the `filenames` array; Each should be a file path.
 */
void init_inputfiles(char **filenames, int filecount);

/**
 * Returns true if a double value could be read from the runtime
 * input files, false otherwise. `dest` will be set to the value read,
 * if any, or zero.
 *
 * If input data exists but is in an invalid format, causes an
 * execution error.
 */
bool read_double(double *dest);

/** global error handling */
__attribute__((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);

/** builtins */
void install_hoc_builtins(void);

#endif  // HOC_INC
hoc.y

%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------

/*
 * "higher-order calculator" - Version 5
 * From "The UNIX Programming Environment"
 */

#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include <string.h>
#include "hoc.h"

///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------

/** Tracks the symbol for the subroutine currently being parsed (if any). */
Symbol *current_subr = NULL;

jmp_buf begin;

///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------

#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);

%}

/*
 * Grammar points:
 *
 * - EXPRESSIONS provide a value on the top of the satck.
 * - STATEMENTS are not expressions, and do not provide a value.
 */

%union {
  double hoc_value;
  Symbol *hoc_symbol;
    Addr hoc_address;
    int hoc_integer;
    str *hoc_string;
}

%token  <hoc_value>     NUMBER
%token  <hoc_symbol>    VAR CONST BUILTIN UNDEF UFUNC UPROC COMMAND
/* These keyword symbols are efectively singletons */
%token  <hoc_symbol>    IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW RETURN_KW READ_KW
%type   <hoc_symbol>    assignable subr_decl
%type   <hoc_integer>   arglist
%token  <hoc_integer>   PARAM_NUMBER
%token  <hoc_string>    STRING
%type   <hoc_address>   args_start assign call end expr if prlist stmt stmtlist while
%right  '='       /* right-associative, much like C */
%left   OR
%left   AND
%left   EQ GT GTE LT LTE NEQ
%left   '+' '-'   /* left-associative, same precedence */
%left   '*' '/'   /* left-assoc; higher precedence */
%right  '^'       /* exponents */
%left   UNARY_MINUS NOT /* Even higher precedence
                        than mul or div. Can't use '-', as we've used
                        it already for subtraction. */
%%
start:          list
                {
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        ;
list:       /*  nothing */
        |       list terminator
        |       list stmt terminator
                {
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list assign terminator
                {
                    install_instruction(inst_pop);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list expr terminator
                {
                    install_instruction(inst_print);
                    install_instruction(STOP_INST);
                    YYACCEPT;
                }
        |       list error terminator { yyerrok; }
        ;
assign:         assignable '=' expr
                {
                    $$ = $3;
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignvar);
                }
        |       assignable ':' '=' expr
                {
                    $$ = $4;
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_assignconst);
                }
        ;
arglist:        /* nothing */ { $$ = 0; }
        |       expr { $$ = 1; }
        |       arglist ',' expr { $$ = $1 + 1; }
        ;
stmt:           expr { install_instruction(inst_pop); }
        |       '{' stmtlist '}' { $$ = $2; }
        |       subr_decl '(' ')' stmt
                {
                   /*
                    * Note: the order of these two lines matters!
                    * `uprocret` should be part of the subr body,
                    * so reserve space *after* installing it.
                    */
                    install_instruction(inst_uprocret);
                    reserve_subr($1);
                    current_subr = NULL;
                    $$ = $4;
                }
        |       UPROC '(' args_start arglist ')'
                {
                    install_instruction(inst_subrexec);
                    install_ref($1);
                    install_literal($4);
                    $$ = $3;
                }
        |       RETURN_KW
                {
                    if (current_subr == NULL) {
                        exec_error("Syntax error - "
                                   "`return` keyword used outside of a subroutine");
                    }
                    if (current_subr->type != UPROC) {
                        exec_error("Syntax error - "
                                   "bare return used outside of a procedure");
                    }
                    $$ = install_instruction(inst_uprocret);
                }
        |       RETURN_KW expr
                {
                    if (current_subr == NULL) {
                        exec_error("Syntax error - "
                                   "return keyword used outside of a subroutine");
                    }
                    if (current_subr->type != UFUNC) {
                        exec_error("Syntax error - "
                                   "`return <expr>` used outside of a function");
                    }
                    install_instruction(inst_ufuncret);
                    $$ = $2;
                }
        |       while cond stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_whilecode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[2].type = CT_ADDR;
                    $1[2].addr = $4;
                }
        |       if cond stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $4;
                }
        |       if cond stmt end ELSE_KW stmt end
                {
                    $1[0].type = CT_INST;
                    $1[0].inst = inst_ifcode;
                    $1[1].type = CT_ADDR;
                    $1[1].addr = $3;
                    $1[2].type = CT_ADDR;
                    $1[2].addr = $6;
                    $1[3].type = CT_ADDR;
                    $1[3].addr = $7;
                }
        |       COMMAND
                {
                    machine_run_command($1);
                    $$ = progp;
                }
        |       PRINT_KW prlist
                {
                    $$ = $2;
                }
        ;
prlist:         expr
                {
                    install_instruction(inst_prexpr);
                    $$ = $1;
                }
        |       STRING
                {
                    $$ = install_instruction(inst_prstr);
                    install_string($1);
                }
        |       prlist ',' expr
                {
                    install_instruction(inst_prexpr);
                    $$ = $1;
                }
        |       prlist ',' STRING
                {
                    install_instruction(inst_prstr);
                    install_string($3);
                    $$ = $1;
                }
        ;
stmtlist:       /* nothing */ { $$ = progp; }
        |       stmtlist stmt terminator { $$ = $1; }
        |       stmtlist terminator
        ;
cond:           '(' expr ')'
                {
                    install_instruction(STOP_INST);
                }
        ;
while:          WHILE_KW
                {
                    $$ = install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                }
        ;
if:             IF_KW
                {
                    $$ = install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                    install_instruction(STOP_INST);
                }
        ;
end:            /* nothing */
                {
                    install_instruction(STOP_INST);
                    $$ = progp;
                }
        ;
call:           BUILTIN '(' ')'
                {
                    check_call_args($1, 0);
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                }
        |       BUILTIN '(' expr ')'
                {
                    check_call_args($1, 1);
                    install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                    $$ = $3; /* Block starts with pushing args */
                }
        |       BUILTIN '(' expr ',' expr ')'
                {
                    check_call_args($1, 2);
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_call);
                    $$ = $3; /* Block starts with pushing args */
                }
        ;
subr_decl:      FUNC_KW VAR
                {
                    current_subr = $2;
                    $2->type=UFUNC;
                    $$ = $2;
                }
        |       PROC_KW VAR
                {
                    current_subr = $2;
                    $2->type=UPROC;
                    $$ = $2;
                }
        ;
/*
 * Every expr construct, when *executed*, must push a value onto the
 * stack
 */
expr:           NUMBER
                {
                    $$ = install_instruction(inst_pushlit);
                    install_literal($1);
                }
        |       '@'
                {
                    $$ = install_instruction(inst_pushlast);
                }
        |       assignable
                {
                    $$ = install_instruction(inst_pushref);
                    install_ref($1);
                    install_instruction(inst_eval);
                }
        |       assign
        |       call
        |       READ_KW '(' VAR ')'
                {
                    $$ = install_instruction(inst_varread);
                    install_ref($3);
                }
        |       UFUNC '(' args_start arglist ')'
                {
                    install_instruction(inst_subrexec);
                    install_ref($1);
                    install_literal($4);
                    $$ = $3;
                }
        |       PARAM_NUMBER
                {
                    if (current_subr == NULL) {
                        exec_error("Syntax error - "
                                   "Parameter references not allowed here");
                    }
                    $$ = install_instruction(inst_argget);
                    install_literal($1);
                }
        |       PARAM_NUMBER '=' expr
                {
                    if (current_subr == NULL) {
                        exec_error("Syntax error - "
                                   "Parameter references not allowed here");
                    }
                    $$ = install_instruction(inst_argset);
                    install_literal($1);
                }
        |       expr '+' expr   { install_instruction(inst_add); }
        |       expr '-' expr   { install_instruction(inst_sub); }
        |       expr '*' expr   { install_instruction(inst_mul); }
        |       expr '/' expr   { install_instruction(inst_div); }
        |       expr LT expr    { install_instruction(inst_lt); }
        |       expr LTE expr   { install_instruction(inst_lte); }
        |       expr GT expr    { install_instruction(inst_gt); }
        |       expr GTE expr   { install_instruction(inst_gte); }
        |       expr EQ expr    { install_instruction(inst_eq); }
        |       expr NEQ expr   { install_instruction(inst_neq); }
        |       expr AND expr   { install_instruction(inst_and); }
        |       expr OR expr    { install_instruction(inst_or); }
        |       expr '^' expr
                {
                    install_instruction(inst_pushref);
                    install_ref(lookup("pow"));
                    install_instruction(inst_call);
                }
        |       '-'expr         %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
                {
                    install_instruction(inst_pushlit);
                    install_literal(-1);
                    install_instruction(inst_mul);
                    $$=$2;
                }
        |       NOT expr %prec NOT
                {
                    install_instruction(inst_not);
                    $$=$2;
                }
        |       '(' expr ')' { $$ = $2; }
        ;

args_start:     /* nothing */ { $$ = progp; }
        ;

terminator:     '\n'
        |       ';'
        ;
assignable:     VAR
        |       CONST

%% // end of grammar


/* error tracking */
char *program_name;
int line_number = 1;
bool reached_eof = false;

int main(int argc, char *argv[]) {
  program_name = argv[0];
  init_inputfiles(argv + 1, argc - 1);
  install_hoc_builtins();

  machine_startup();

  setjmp(begin);
  machine_reset_program();
  while (!reached_eof && yyparse() == YACC_SUCCESS) {
    execute(NULL);
    machine_reset_program();
  }
  return 0;
}

/**
 * "Looks ahead 1 character for a possibility. If it's found, returns
 * `match`. If not:
 * - places character back onto input stream for next read
 * - returns the `otherwise` result.
 */
static int lookahead(char expect, int match, int otherwise) {
  int c = getchar();
  if (c == expect) {
    return match;
  } else {
    ungetc(c, stdin);
    return otherwise;
  }
}

/**
 * If c is '\': Returns next input character, string-escaped.
 * Otherwise: returns c
 */
int escaped(char c) {
  // Each escaped character is followed by its output char
  static char translations[] = "b\b"
                               "f\f"
                               "n\n"
                               "t\t";

  if (c != '\\') {
    return c;
  }

  c = getchar();
  char *cp = strchr(translations, c);

  // islower() used so that we don't match the output chars or NUL
  if (islower(c) && cp != NULL) {
    return cp[1];
  }

  return c;
}

/* our simple, hand-rolled lexer. */
int yylex(void) {
  static str *symbol_str = NULL;
  int         c;

  // Skip space
  do {
    c = getchar();
  } while (c == ' ' || c == '\t');

  // Skip comments
  if (c == '/' && lookahead('/', 0, '/') == 0) {
    while (c != '\n' && c != EOF) {
      c = getchar();
    }
    ungetc(c, stdin);
  }

  /* Newline / EOF handling
   *
   * The first time we see an EOF, we "fake" a newline,
   * so that the list production can recognize any in-progress
   * statement. The next call will return the EOF.
   */
  if (c == EOF && !reached_eof) {
    ungetc(c, stdin);
    reached_eof = true;
    c = '\n';
  }

  if (c == '\n') {
    line_number++;
  }

  // Parse parameters
  if (c == '$') {
    if (scanf("%d", &yylval.hoc_integer) != 1) {
      exec_error("Invalid parameter specification");
    } else if (yylval.hoc_integer <= 0) {
      exec_error("Parameter number '%d' invalid - cannot be negative",
                 yylval.hoc_integer);
    }
    return PARAM_NUMBER;
  }

  // Parse strings
  if (c == '"') {
    str *s = str_new();
    while (true) {
      c = getchar();
      if (c == '"') {
        break;
      }
      if (c == EOF || c == '\n') {
        exec_error("String wasn't closed: '%s'", str_view(s));
      }
      str_append(s, escaped(c));
    }
    yylval.hoc_string = s;
    return STRING;
  }

  // Get numbers
  if (c == '.' || isdigit(c)) {
    ungetc(c, stdin);
    scanf("%lf", &yylval.hoc_value);
    return NUMBER;
  }

  if (isalpha(c)) {
    Symbol *s;

    // reset symbol string for new input
    if (symbol_str == NULL) {
      symbol_str = str_new();
    } else {
      str_clear(symbol_str);
    }

    do {
      str_append(symbol_str, c);
      c = getchar();
    } while (isalpha(c) || isdigit(c));

    // Just in case we exceeded the limit
    while ((isalpha(c) || isdigit(c))) {
      c = getchar();
    }

    // at this point, we have a non-alphanumeric 'c'
    ungetc(c, stdin);

    const char *buf = str_view(symbol_str);
    if ((s = lookup(buf)) == NULL) {
      s = install_symbol(buf, UNDEF, 0.0);
    }
    yylval.hoc_symbol = s;
    return (s->type == UNDEF ? VAR : s->type);
  }

  if (isalpha(c)) {
    Symbol     *s;
    static str *sbuf = NULL;

    // setup string for new input
    if (sbuf == NULL) {
      sbuf = str_new();
    } else {
      str_clear(sbuf);
    }
    do {

      c = getchar();
    } while (isalpha(c) || isdigit(c));
    ungetc(c, stdin);

    const char *buf = str_view(sbuf);
    if ((s = lookup(buf)) == NULL) {
      s = install_symbol(buf, UNDEF, 0.0);
    }
    yylval.hoc_symbol = s;
    return (s->type == UNDEF ? VAR : s->type);
  }

  switch (c) {
    case '<':
      return lookahead('=', LTE, LT);
    case '>':
      return lookahead('=', GTE, GT);
    case '=':
      return lookahead('=', EQ, '=');
    case '!':
      return lookahead('=', NEQ, NOT);
    case '|':
      return lookahead('|', OR, '|');
    case '&':
      return lookahead('&', AND, '&');
    default:
      return c;
  }
}

void yyerror(const char *s) {
  warning("%s", s);
  if (current_subr != NULL) {
    current_subr->type = UNDEF;
    current_subr = NULL;
  }
}

/*
 * Verify Symbol's declared arg count matches actual,
 * or display a helpful error message with mismatch
 */
void check_call_args(const Symbol *s, int actual) {
  int expected = s->data.func.args;
  if (expected != actual) {
    exec_error("Wrong number of arguments for %s: expected %d, got %d",
               s->name,
               expected,
               actual);
  }
}

#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)

void warning(const char *msg, ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);
}

void exec_error(const char *msg, ...) {
  va_list args;
  va_start(args, msg);

  print_error_prefix();
  vfprintf(stderr, (msg), args);
  print_error_suffix();

  va_end(args);

  longjmp(begin, 0);
}
inputfiles.c

#include "hoc.h"
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------

static FILE  *curr_inputfile;  // the FILE* we're reading from
static char **next_filename;   // next filename to read
static char **end_namep;       // represents "no more files"

/**
 * Change curr_inputfile to point to the next input file, if
 * any. Closes any existing input file (unless that file points to
 * stdin).
 *
 * Returns true if there is more input to read; false otherwise.
 */
static bool next_inputfile(void) {
  if (curr_inputfile != NULL && curr_inputfile != stdin) {
    fclose(curr_inputfile);
  }
  curr_inputfile = NULL;

  do {
    if (next_filename == end_namep) {
      return false;
    }

    // sync curr_inputfile with curr_namep
    if (strcmp(*next_filename, "-") == 0) {
      curr_inputfile = stdin;
    } else {
      curr_inputfile = fopen(*next_filename, "r");
      if (curr_inputfile == NULL) {
        warning("Could not open input filename: '%s'", *next_filename);
      }
    }
    next_filename += 1;
  } while (curr_inputfile == NULL);

  return true;
}

void init_inputfiles(char **filenames, int filecount) {
  curr_inputfile = NULL;
  next_filename = filenames;
  end_namep = filenames + filecount;
  next_inputfile();  // ensures curr_inputfile is either valid or NULL
}

bool read_double(double *dest) {
  // curr_inputfile is NULL only when all input is exhausted
  if (curr_inputfile == NULL) {
    warning("Cannot read input data - No input files remaining");
    return false;
  }

  while (true) {
    switch (fscanf(curr_inputfile, "%lf", dest)) {
      case EOF:
        if (!next_inputfile()) {
          return false;
        }
        break;  // otherwise try again
      case 0:
        exec_error("Could not read double from '%s' - invalid data format",
                   *next_filename);
      default:
        return true;
    }
  }
}
machine.c

#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>  // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * MACHINE STATE
 * =============
 *
 *
 *    == PROG ==              == FRAMES ==              == STACK ==
 *  ┌─────────────┐◀───┐     ┌─────────────┐          ┌─────────────┐
 *  │ MachineCode │    pc    │    Frame    │          │    Datum    │
 *  ├─────────────┤          ├─────────────┤          ├─────────────┤
 *  │ MachineCode │          │    Frame    │          │    Datum    │
 *  ├─────────────┤          ├─────────────┤          ├─────────────┤
 *  │ MachineCode │   progp  │    Frame    │ framep   │    Datum    │ stackp
 *  ├─────────────┤◀────┘    ├─────────────┤◀───┘     ├─────────────┤ ◀──┘
 *  │     ...     │          │     ...     │          │     ...     │
 *  └─────────────┘          └─────────────┘          └─────────────┘
 *
 *
 * The Program
 * -----------
 * We have our "program" stored as a list of MachineCode
 * objects, each encoding one of:
 *   - an instruction
 *   - a literal value
 *   - a symbol reference
 *
 * We "execute" our program by setting the program counter (pc)
 * pointer to the first MachineCode (which *must* be an instruction),
 * and handling each instruction we find. Within an instruction, the
 * program counter might be updated so that by the *end* of any
 * instruction, the program counter is always pointing to another
 * instruction.
 *
 *
 * The Stack
 * ---------
 *
 * Each instruction manipulates the stack, pushing,
 * popping, or combining values from the stack.  The stack elements
 * are Datum objects, which act much like MachineCodes, but with two
 * differences.
 *
 * 1. Datums cannot represent an instruction; they resolve to a value
 * 2. The type of a datum can always be resolved from their context,
 *    and so no "type" field is necessary.
 *
 *
 * The Frame Stack
 * ---------------
 *
 * When a user-defined subroutine is called, a new Frame is
 * placed on a separate "frame stack". The frames contain enough
 * information to:
 *
 *   - "Jump" to the location of the subroutine
 *       - The `def_called` field is a Symbol with the start Addr
 *
 *   - Locate any arguments on the (data) stack
 *       - The argp/arg_count fields locate the arguments passed
 *
 *   - Return to previous prog location after the subroutine finishes
 *       - Via the `ret_pc` field
 *
 *
 *         PROG                                    STACK
 *    ┌─────────────┐                         ┌─────────────┐
 * ┌─▶│ MachineCode │                         │    Datum    │
 * │  ├─────────────┤      FRAME              ├─────────────┤
 * │  │ MachineCode │     ┌─────────────┐     │    Datum    │
 * │  ├─────────────┤     │def_called:  │     ├─────────────┤
 * │  │     ...     │     │"myfunc"     │     │    Datum    │
 * │  ├─────────────┤     ├─────────────┤     ├─────────────┤
 * │  │ MachineCode │  ┌──┤ret_pc       │  ┌─▶│    Datum    │
 * │  ├─────────────┤  │  ├─────────────┤  │  ├─────────────┤
 * │  │ MachineCode │  │  │argp         ├──┘  │    Datum    │
 * │  ├─────────────┤  │  ├─────────────┤   ─▶├─────────────┤
 * │  │ MachineCode │◀─┘  │arg_count: 2 │─ ┘  │    Datum    │
 * │  ├─────────────┤     └─────────────┘     ├─────────────┤
 * │  │     ...     │                         │      .      │
 * │  └─────────────┘                         │      .      │
 * │                                          └─────────────┘
 * │                       SYMBOL TABLE
 * │  ┌─────────────────────────────────────────────────────┐
 * │  │ ┌──────────────┐                                    │
 * │  │ │name: "myfunc"│                                    │
 * │  │ ├──────────────┤                                    │
 * └──┼─│addr          │                                    │
 *    │ └──────────────┘                                    │
 *    └─────────────────────────────────────────────────────┘
 */

/*
 * Prog
 */
#define PROGSIZE 2000
MachineCode prog[PROGSIZE];
Addr        prog_start = prog;  // No installation above this address
Addr        progp;              // Next free location for code installation
static Addr pc;                 // the current location in the executing program

static Addr highest_addr_written = prog;  // Used in LISTing contents

/*
 * Data stack
 */
#define STACKSIZE 256
static Datum  stack[STACKSIZE];            // the machine stack
static Datum *stackp;                      // next free spot on the stack
static Datum  last_popped = {.value = 0};  // The last value popped
static void   stack_push(Datum d);
static Datum  stack_pop(void);

/*
 * Frames and Frame stack
 */
typedef struct Frame {
  Symbol *subr_called;  // the UFUNC/UPROC we have jumped to
  Addr    ret_pc;       // return location
  Datum  *argp;         // pointer to first argument
  int     arg_count;    // number of arguments
} Frame;

#define FRAMELIMIT 100  // Recursion depth limit
static Frame        frames[FRAMELIMIT];
static const Frame *OVERFLOW_FRAME = frames + FRAMELIMIT;
static Frame       *fp = frames;   // Next frame to use
static bool         is_returning;  // true when returning from a user subroutine

///----------------------------------------------------------------
/// Initialization
///----------------------------------------------------------------

/** Initialize machine state for startup */
void machine_startup(void) {
  srand(time(NULL));  // use current time as seed for random generator
  machine_reset_program();
}

/** Initialize machine state for a new program */
void machine_reset_program(void) {
  is_returning = false;
  stackp = stack;
  progp = prog_start;
}

///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------

Addr install_code(MachineCode mc) {
  if (progp >= prog + PROGSIZE) {
    exec_error("Max program size exceeded");
  }

  *progp = mc;
  Addr result = progp++;  // return location of THIS instruction
  if (result > highest_addr_written) {
    highest_addr_written = result;
  }
  return result;
}

void reserve_subr(Symbol *subr_sym) {
  subr_sym->data.addr = prog_start;
  prog_start = progp;
}

///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------

void execute(Addr start_addr) {
  is_returning = false;
  pc = (start_addr == NULL) ? prog_start : start_addr;
  while (!is_stop_inst(pc) && !is_returning) {
    if (pc >= progp) {
      exec_error("PC was past end-of-program");
    }
    if (pc->type != CT_INST) {
      exec_error("Programming error, type '%c' was not INST!", pc->type);
    }

    (*(*pc++).inst)();  // increment PC and execute inst
  }
}

/** Push new value onto stack */
static void stack_push(Datum d) {
  if (stackp >= stack + STACKSIZE) {
    exec_error("Stack overflow");
  }
  *(stackp++) = d;  // copy
}

/** Pop most recent value from stack, or fail on underflow */
static Datum stack_pop(void) {
  if (stackp <= stack) {
    exec_error("Stack underflow");
  }
  Datum d = *(--stackp);
  return (last_popped = d);
}

/**
 * Push a new frame for a subroutine call onto the frame stack, or fail on
 * overflow.
 *
 * The frame is initialized as a call to Symbol `s`, returning to address
 * `caller_pc`. Caller must initialize all other frame data.
 */
static Frame *frame_push(Symbol *s, Addr caller_pc) {
  if (fp == OVERFLOW_FRAME) {
    exec_error("Call-Stack Overflow while calling '%s'", s->name);
  }

  fp->subr_called = s;
  fp->ret_pc = caller_pc;
  return fp++;
}

/** Pop most recent Frame from the frame stack, or fail on underflow */
static Frame *frame_pop(void) {
  if (fp == frames) {
    exec_error("Call-Stack Underflow while attempting to return");
  }
  return --fp;
}

/** Access top of frame stack, or fail if no Frame exists */
static Frame *frame_peek(void) {
  if (fp == frames) {
    exec_error("Call-Stack Underflow while attempting to peek at top frame");
  }
  return fp - 1;
}

/**
 * Reset any machine state indicated by the given frame. This includes:
 * - Removing any arguments from the stack for the call.
 * - Resetting the program counter
 * - Setting global "returning" flag
 */
static void return_from_frame(Frame *f) {
  for (int i = f->arg_count; i > 0; i--) {
    stack_pop();  // remove arg from stack
  }
  is_returning = true;
  pc = f->ret_pc;
}

///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------

#define dat(x) ((Datum){.value = x})

int inst_pushlit(void) {
  assert(pc->type == CT_LITERAL);

  Datum d;
  d.value = pc->literal;
  stack_push(d);
  pc++;
  return 0;
}

int inst_pushref(void) {
  assert(pc->type == CT_SYM_REF);

  Datum d;
  d.symbol = pc->symbol;
  stack_push(d);
  pc++;
  return 0;
}

int inst_pushlast(void) {
  stack_push(last_popped);
  return 0;
}

int inst_eval(void) {
  Datum d = stack_pop();  // must be symbol
  switch (d.symbol->type) {
    case UNDEF:
      exec_error("Undefined variable: '%s'", d.symbol->name);
      break;
    case BUILTIN:
    case UFUNC:
    case UPROC:
      exec_error("Callable '%s' cannot be evaluated", d.symbol->name);
      break;
    default:  // VAR,CONST
      stack_push(dat(d.symbol->data.val));
      break;
  }
  return 0;
}

int inst_whilecode(void) {
  /*
    ┌─────────────────────────────┐
    │          whilecode          │
    ├─────────────────────────────┤
    │        address(body)        │
    ├─────────────────────────────┤
    │  address(stmt after body)   │
    ├─────────────────────────────┤
    │            cond             │
    │             ..              │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │            body             │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │       stmt after body       │
    └─────────────────────────────┘
  */
  Addr body_addr = (pc++)->addr;
  Addr end_addr = (pc++)->addr;
  Addr cond_addr = pc;

  while (true) {
    if (is_returning) {
      return 0;  // PC is already set to the appropriate address
    }

    execute(cond_addr);
    if (stack_pop().value) {
      execute(body_addr);
    } else {
      break;
    }
  }

  // condition false
  pc = end_addr;
  return 0;
}

int inst_ifcode(void) {
  /*
    ┌─────────────────────────────┐
    │           ifcode            │
    ├─────────────────────────────┤
    │       address(ifbody)       │
    ├─────────────────────────────┤
    │      address(elsebody)      │
    ├─────────────────────────────┤
    │   address(stmt after if)    │
    ├─────────────────────────────┤
    │            cond             │
    │             ..              │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │           ifbody            │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │          elsebody           │
    │            ....             │
    ├─────────────────────────────┤
    │            STOP             │
    ├─────────────────────────────┤
    │        stmt after if        │
    └─────────────────────────────┘
  */
  Addr        ifbody_addr = pc[0].addr;
  MachineCode maybe_elsebody = pc[1];
  Addr        end_addr = pc[2].addr;
  Addr        cond_addr = pc + 3;

  execute(cond_addr);
  Datum d = stack_pop();  // must be value
  if (d.value) {
    execute(ifbody_addr);
  } else if (maybe_elsebody.type == CT_ADDR) {
    execute(maybe_elsebody.addr);
  }

  if (!is_returning) {
    pc = end_addr;
  }

  return 0;
}

int inst_call(void) {
  Datum s = stack_pop();  // must be symbol
  if (s.symbol->type != BUILTIN) {
    exec_error("Cannot call non-builtin '%s'", s.symbol->name);
  }

  double result;
  switch (s.symbol->data.func.args) {
    case 0:
      result = s.symbol->data.func.call0();
      break;
    case 1: {
      Datum arg = stack_pop();
      result = s.symbol->data.func.call1(arg.value);
      break;
    }
    case 2: {
      Datum arg2 = stack_pop();
      Datum arg1 = stack_pop();
      result = s.symbol->data.func.call2(arg1.value, arg2.value);
      break;
    }
  }

  stack_push(dat(result));
  return 0;
}

int inst_assignvar(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = VAR;
  stack_push(dat(rhs.value));
  return 0;
}

int inst_assignconst(void) {
  Datum lhs = stack_pop();  // must be symbol
  Datum rhs = stack_pop();  // must be value
  if (lhs.symbol->type == CONST) {
    exec_error("Cannot reassign constant '%s'", lhs.symbol->name);
  } else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
    exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
  }
  lhs.symbol->data.val = rhs.value;
  lhs.symbol->type = CONST;
  stack_push(dat(rhs.value));
  return 0;
}

int inst_pop(void) {
  stack_pop();
  return 0;
}

int inst_print(void) {
  Datum d = stack_pop();
  printf("\t%.8g\n", d.value);
  fflush(stdout);
  return 0;
}

int inst_prexpr(void) {
  Datum d = stack_pop();
  printf("%.8g", d.value);
  fflush(stdout);
  return 0;
}

int inst_prstr(void) {
  str *s = (pc++)->string;
  printf("%s", str_view(s));
  fflush(stdout);
  return 0;
}

int inst_add(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value + rhs.value));
  return 0;
}

int inst_and(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value && rhs.value));
  return 0;
}

int inst_or(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value || rhs.value));
  return 0;
}

int inst_not(void) {
  Datum d = stack_pop();  // must be value
  d.value = !d.value;
  stack_push(d);
  return 0;
}

int inst_lt(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value < rhs.value));
  return 0;
}

int inst_lte(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value <= rhs.value));
  return 0;
}
int inst_gt(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value > rhs.value));
  return 0;
}

int inst_gte(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value >= rhs.value));
  return 0;
}

int inst_eq(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value == rhs.value));
  return 0;
}

int inst_neq(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value != rhs.value));
  return 0;
}

int inst_sub(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  stack_push(dat(lhs.value - rhs.value));
  return 0;
}

int inst_div(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value

  if (rhs.value == 0) {
    exec_error("Division by zero");
  } else {
    stack_push(dat(lhs.value / rhs.value));
  }
  return 0;
}

int inst_mul(void) {
  Datum rhs = stack_pop();  // must be value
  Datum lhs = stack_pop();  // must be value
  stack_push(dat(lhs.value * rhs.value));
  return 0;
}

int inst_subrexec(void) {
  Symbol *sym = (pc++)->symbol;
  double  arg_count = (pc++)->literal;

  Frame *f = frame_push(sym, pc);
  f->arg_count = (int)arg_count;
  f->argp = stackp - f->arg_count;

  execute(sym->data.addr);
  is_returning = false;
  return 0;
}

int inst_argget(void) {
  int    arg_number = (int)(pc++)->literal;
  Frame *current_frame = frame_peek();
  if (current_frame->arg_count < arg_number) {
    exec_error("Subroutine '%s' uses parameter %d, but only got %d.",
               current_frame->subr_called->name,
               arg_number,
               current_frame->arg_count);
  }
  stack_push(current_frame->argp[arg_number - 1]);
  return 0;
}

int inst_argset(void) {
  int    arg_number = (int)(pc++)->literal;
  Frame *current_frame = frame_peek();
  if (current_frame->arg_count < arg_number) {
    exec_error("Subroutine '%s' uses parameter %d, but only got %d.",
               current_frame->subr_called->name,
               arg_number,
               current_frame->arg_count);
  }
  Datum d = stack_pop();
  current_frame->argp[arg_number - 1] = d;
  stack_push(d);
  return 0;
}

int inst_uprocret(void) {
  Frame *f = frame_pop();
  if (f->subr_called->type != UPROC) {
    exec_error("Function '%s' does not return a value", f->subr_called->name);
  }

  return_from_frame(f);
  return 0;
}

int inst_ufuncret(void) {
  Frame *f = frame_pop();
  Datum  result = stack_pop();
  return_from_frame(f);
  stack_push(result);
  return 0;
}

int inst_varread(void) {
  Symbol *dest = (pc++)->symbol;
  bool    success = read_double(&dest->data.val);
  stack_push(dat(success));
  if (!success) {
    dest->data.val = 0;
  }
  dest->type = VAR;
  return 0;
}

///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------

void machine_run_command(Symbol *command_symbol) {
  command_symbol->data.command_func();
}

#define check_inst(inst_name)                            \
  else if (c.inst == inst_##inst_name) do {              \
    printf(inst_template, line, cp, "INST", #inst_name); \
  }                                                      \
  while (false)

void command_list(void) {
// Lists contents of prog until we hit two stop instructions in a row
#define header_template "%6s | %11s | %10s | %-30s\n"
#define inst_template "%6ld | %11p | %10s | %-30s"
#define rightline_template "%6ld | %11p | %10s | %30s"
#define sym_addr_template "%6ld | %11p | %10s | %-7.7s: %11p (Line %3ld)"
#define lit_template "%6ld | %11p | %10s | %30f"
#define str_template "%6ld | %11p | %10s | %30.30s"
#define addr_template "%6ld | %11p | %10s | %11p         (Line %3ld)"

  printf(header_template, "LINE", "ADDRESS", "TYPE", "CODE");
  printf("------------------------------------------------------------------"
         "\n");
  for (size_t i = 0; prog + i <= highest_addr_written; i++) {
    size_t      line = i + 1;
    MachineCode c = prog[i];
    void       *cp = &prog[i];
    if (prog + i == prog_start && prog != prog_start) {
      printf("---------------------------->"
             " END SUBR "
             "<----------------------------\n");
    }
    switch (c.type) {
      case (CT_LITERAL):
        printf(lit_template, line, cp, "L", c.literal);
        break;
      case (CT_SYM_REF):
        switch (c.symbol->type) {
          case UFUNC:
          case UPROC: {
            size_t addr_line = (c.symbol->data.addr - prog) + 1;
            printf(sym_addr_template,
                   line,
                   cp,
                   "SUBR",
                   c.symbol->name,
                   (void *)c.symbol->data.addr,
                   addr_line);
            break;
          }
          default:
            printf(rightline_template, line, cp, "SYM", c.symbol->name);
        }
        break;
      case (CT_ADDR): {
        size_t addr_line = (c.addr - prog) + 1;
        printf(addr_template, line, cp, "ADDR", (void *)c.addr, addr_line);
        break;
      }
      case (CT_CHRSTR): {
        printf(str_template, line, cp, "CHAR", str_view(c.string));
        break;
      }
      case (CT_INST):
        if (is_stop_inst(&c)) {
          printf(inst_template,
                 line,
                 cp,
                 "INST",
                 "------------ STOP ------------");
        }
        check_inst(add);
        check_inst(and);
        check_inst(argget);
        check_inst(argset);
        check_inst(assignvar);
        check_inst(assignconst);
        check_inst(call);
        check_inst(div);
        check_inst(eq);
        check_inst(eval);
        check_inst(gt);
        check_inst(gte);
        check_inst(ifcode);
        check_inst(lt);
        check_inst(lte);
        check_inst(mul);
        check_inst(neq);
        check_inst(not );
        check_inst(or);
        check_inst(pop);
        check_inst(prexpr);
        check_inst(prstr);
        check_inst(print);
        check_inst(pushlast);
        check_inst(pushlit);
        check_inst(pushref);
        check_inst(sub);
        check_inst(subrexec);
        check_inst(ufuncret);
        check_inst(uprocret);
        check_inst(varread);
        check_inst(whilecode);
        else {
          printf(inst_template, line, cp, "INST", "UNKNOWN!?");
        }
        break;
      default:
        printf("Unknown Code Type: %c\n", c.type);
    }
    printf("\n");
  }
}
math.c

#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>  // rand

#define HOC_RAND_MAX 1000000L

static double check_math_err(double result, const char *op);

double Abs(double x) {
  return check_math_err(fabs(x), "fabs");
}

double Acos(double x) {
  return check_math_err(acos(x), "acos");
}

double Atan(double x) {
  return check_math_err(atan(x), "atan");
}

double Atan2(double x, double y) {
  return check_math_err(atan2(x, y), "atan2");
}

double Cos(double x) {
  return check_math_err(cos(x), "cos");
}

double Exp(double x) {
  return check_math_err(exp(x), "exp");
}

double Integer(double x) {
  return (double)(long)x;
}

double Lg(double x) {
  return check_math_err(log2(x), "lg");
}

double Ln(double x) {
  return check_math_err(log(x), "log");
}

double Log10(double x) {
  return check_math_err(log10(x), "log10");
}

double Pow(double x, double y) {
  return check_math_err(pow(x, y), "exponentiation");
}

double Random(void) {
  // We have HOC_RAND_MAX buckets
  const int bucket_size = RAND_MAX / HOC_RAND_MAX;
  int       result;
  do {
    result = rand() / bucket_size;
  } while (result >= HOC_RAND_MAX);

  return result + 1;
}

double Sin(double x) {
  return check_math_err(sin(x), "sin");
}

double Sqrt(double x) {
  return check_math_err(sqrt(x), "sqrt");
}

static double check_math_err(double result, const char *op) {
  static const char *domain_msg = "argument outside domain";
  static const char *range_msg = "result outside range";
  static const char *other_msg = "floating-point exception occurred";

  const char *msg = NULL;
  if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
    // Special case: Inexact results are not errors.
    if (fetestexcept(FE_INEXACT)) {
      goto done;
    }

    if (fetestexcept(FE_INVALID)) {
      msg = domain_msg;
    } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
      msg = range_msg;
    } else {  // unknown
      msg = other_msg;
    }

    feclearexcept(FE_ALL_EXCEPT);
  } else if (errno) {
    if (errno == EDOM) {
      msg = domain_msg;
    } else if (errno == ERANGE) {
      msg = range_msg;
    } else {
      msg = other_msg;
    }
    errno = 0;
  }

  if (msg != NULL) {
    exec_error("math error during %s: %s", op, msg);
  }

done:
  return result;
}
str.c

#include "hoc.h"

#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Invariants (Given instance `s`):
 * 1. s.len <= s.capacity - 1;
 * 2. sizeof(s.buf) == s.capacity
 *
 * These invariants exist so that anytime we want a view of the string
 * for use with C functions, there is capacity within `s.buf` to
 * append a NUL character.
 *
 * Any function that mutates the string should ensure these invariants.
 */
struct str {
  char  *buf;
  size_t len;
  size_t capacity;
};

#define INIT_CAP 20 /** Initial capacity of a string */

str *str_new(void) {
  str *s = malloc(sizeof(str));
  if (s == NULL) {
    exec_error("Out of memory!");
  }
  s->capacity = INIT_CAP;
  s->buf = malloc(s->capacity);  // Ensures invariant 1
  if (s->buf == NULL) {
    free(s);
    exec_error("Out of memory!");
  }
  s->len = 0;  // Ensure invariant 2
  return s;
}

bool str_append(str *s, char c) {
  if (s->len == s->capacity - 1) {
    /*
     * If we'd break our invariant by adding a character,
     * grow our buffer capacity and update fields to match
     */
    char *tmp = realloc(s->buf, s->capacity * 2);
    if (tmp == NULL) {
      return false;
    }
    s->buf = tmp;
    s->capacity = s->capacity * 2;
  }
  s->buf[s->len++] = c;
  return true;
}

const char *str_view(str *s) {
  s->buf[s->len] = '\0';
  return s->buf;
}

void str_clear(str *s) {
  s->len = 0;
}

void str_free(str *s) {
  free(s->buf);
  free(s);
}
symbol.c

#include "hoc.h"
#include "hoc.tab.h"  // generated from yacc -d on our grammar
#include <stddef.h>   // NULL
#include <stdlib.h>   // malloc
#include <string.h>   // strcmp

static Symbol *symbol_table = NULL;

void *emalloc(size_t nbytes);

Symbol *lookup(const char *name) {
  Symbol *current = symbol_table;
  while (current != NULL) {
    if (strcmp(current->name, name) == 0) {
      return current;
    }
    current = current->next;
  }

  return NULL;
}

Symbol *install_symbol(const char *name, short type, double value) {
  size_t  name_len = strlen(name);
  Symbol *s = emalloc(sizeof(Symbol)                       // actual symbol
                      + (sizeof(char)) * (name_len + 1));  // room for name

  s->name = (char *)(s + 1);
  strcpy(s->name, name);
  s->name[name_len] = '\0';

  s->type = type;
  s->data.val = value;
  s->next = symbol_table;
  symbol_table = s;
  return s;
}

void *emalloc(size_t nbytes) {
  void *ptr = malloc(nbytes);
  if (ptr == NULL) {
    exec_error("Out of memory!");
  }
  return ptr;
}