oddly accurate

hoc: A programming language in 6 steps

Hello! This page is a tutorial on building your own programming language and interpreter in C.

The language, hoc, is largely based on the language of the same name developed in the 1984 classic The UNIX Programming Environment (UPE), by Kernighan and Pike. As done in UPE, we’ll actually build at least six working versions of the langauge—each will build on the previous version, and they’ll get successively more interesting and useful as we go. You only need a basic understanding of the C language to follow along.

If that’s all you need to hear; great! Make sure you have a C compiler, make, and the bison package installed, then go on to hoc1! If you want to know more, then read on..

What’s hoc?

In Chapter 8 of their 1984 classic The UNIX Programming Environment, Kernighan and Pike developed a tiny programming language called hoc, short for higher order calculator. The chapter is a great example of iterative development in C/UNIX, as well as a fun exercise in C development.

hoc in a nutshell

While its design goal was to be a relatively simple calculator, hoc contains enough language constructs to make it both useful and at least as much fun as programming shell scripts. It has functions, loops, conditional logic, and a tiny math library. It uses floating-point numbers as its core data type, as is expected of a calculator. For example, here’s a small program written in hoc6, the last version of the language we’ll build:

// Return star count for given row ($1); 1 is top
func numStarsInRow() {
    return $1 * 2 - 1
}

// prints ($1) spaces to screen
proc printSpaces() {
    while ($1 > 0) {
      print " "
      $1 = $1 - 1
    }
}

// print triangle of height ($1)
proc drawTriangle() {
    currRow = 1
    while (currRow <= $1) {
      numSpaces = $1 - currRow

      printSpaces(numSpaces)
      star = 1
      while (star <= numStarsInRow(currRow)) {
        print "*"; star = star + 1
      }
      printSpaces(numSpaces)

      currRow = currRow + 1
      print "\n"
    }
}

drawTriangle(5)

Running this program produces a triangle:

> ./hoc < drawTriangle.hoc
    *
   ***
  *****
 *******
*********

The stages of hoc

Each stage of hoc grows incrementally; however—as in all good programs—there is one major refactoring step, which occurs between hoc3 and hoc4. At that point, we’ll transition from “direct evaluation” of the language to a “virtual machine” model of execution.

hoc1: The first verison of hoc introduces the core expressions of the language, and serves as an introduction to building language parsers with yacc (or really, its modern equivalent bison). Once it’s complete, you’ll have all that knowledge – plus a really bad calculator!

hoc2: We’ll add single-letter variables, some symbolic calculation, and extend the expression syntax slightly.

hoc3: Our final “direct-evaluation” version of the language adds variables with more than 1 letter (“Ah, luxury”), and adds support for calling “builtin” functions for the language. This version adds most of our tiny math library.

hoc4: This is where the real fun begins; we rewrite the evaluation portion of hoc to run inside a little virtual machine.

hoc5: Adds conditional and looping constructs, as well as allowing us to print (numbers) to stdout.

hoc6: A real whopper– Adds user-defined functions / procedures, user input support, string literals, and comments.

Differences From the Original hoc

The book is fantastic, but its source code contains a few “pre-ANSI C” idioms that are fairly-seriously deprecated or outright disallowed in C23. The goal of my adaptation was compilation without warnings using a modern C standard. Along the way, there were a few changes I wanted to make for clarity, leading to the version in front of you. The code still follows the book’s progression, with a few additions that I’ll try to note along the way.

In my opinion, the book is still a fantastic resource to follow the authors’ thought processes and development approach, and it contains a lot of advice around iterative development and building software. This guide takes a more leisurely approach to the topic, going into more depth in some areas (e.g. LALR parsing, invariants for code generation), and completely ignoring others (Makefiles and lex, for example.)

This is the explanation that I wanted as I was building hoc for the first time, and I hope that some of you find the additional perspective useful as well!

Assumed Prerequisites

Not many; at least a small amount of familiarity with C; if you have written a struct and know how printf/scanf works, you should be fine. Everything else should be covered as we go.

How to follow along:

If you want to type-it-yourself as you follow along, then you can simply start reading at hoc1; each stage has its complete source (and associated Makefile) for you to use directly from the page.

You can clone the hoc repository and browse the complete source as you read the sections; the top-level Makefile can run any version of hoc for you; simply use one of the targets make run-hoc1 through make run-hoc6. For example:

> make run-hoc4
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C hoc4
make[1]: Nothing to be done for `all'.

=== Now Running hoc4 ===
a = 12
b = 2
b / a
    0.16666667

Contact Me / Report an Issue

I’d love to hear about any issues you find as you work through the project–just drop me a line at rick@oddlyaccurate.com.