hoc: A programming language in 6 steps
Hello! This page is a tutorial on building your own programming language and interpreter in C.
The language, hoc
, is largely based on the
language of the same name developed in the 1984 classic The
UNIX Programming Environment (UPE), by Kernighan and Pike.
As done in UPE, we’ll actually build at least
six working versions of the langauge—each will
build on the previous version, and they’ll get successively more
interesting and useful as we go. You only need a basic
understanding of the C language to follow along.
If that’s all you need to hear; great! Make sure you have a C
compiler, make
, and the bison
package
installed, then go on to hoc1
! If you want to
know more, then read on..
What’s hoc
?
In Chapter 8 of their 1984 classic The UNIX Programming
Environment, Kernighan and Pike developed a tiny
programming language called hoc
, short for
higher order
calculator. The chapter is a great example of
iterative development in C/UNIX, as well as a fun exercise in C
development.
hoc
in a nutshell
While its design goal was to be a relatively simple
calculator, hoc
contains enough language constructs
to make it both useful and at least as much fun as
programming shell scripts. It has functions, loops, conditional
logic, and a tiny math library. It uses floating-point numbers
as its core data type, as is expected of a calculator. For
example, here’s a small program written in hoc6
,
the last version of the language we’ll build:
// Return star count for given row ($1); 1 is top
() {
func numStarsInRowreturn $1 * 2 - 1
}
// prints ($1) spaces to screen
() {
proc printSpaceswhile ($1 > 0) {
" "
print 1 = $1 - 1
$}
}
// print triangle of height ($1)
() {
proc drawTriangle= 1
currRow while (currRow <= $1) {
= $1 - currRow
numSpaces
(numSpaces)
printSpaces= 1
star while (star <= numStarsInRow(currRow)) {
"*"; star = star + 1
print }
(numSpaces)
printSpaces
= currRow + 1
currRow "\n"
print }
}
(5) drawTriangle
Running this program produces a triangle:
> ./hoc < drawTriangle.hoc
*
***
*****
*******
*********
The stages of hoc
Each stage of hoc grows incrementally; however—as in all good
programs—there is one major refactoring step, which occurs
between hoc3
and hoc4
. At that point,
we’ll transition from “direct evaluation” of the language to a
“virtual machine” model of execution.
hoc1
:
The first verison of hoc
introduces the core
expressions of the language, and serves as an
introduction to building language parsers with yacc (or really,
its modern equivalent bison). Once it’s
complete, you’ll have all that knowledge – plus a really bad
calculator!
hoc2
:
We’ll add single-letter variables, some symbolic calculation,
and extend the expression syntax slightly.
hoc3
:
Our final “direct-evaluation” version of the language adds
variables with more than 1 letter (“Ah,
luxury”), and adds support for calling
“builtin” functions for the language. This version adds most of
our tiny math library.
hoc4
:
This is where the real fun begins; we rewrite the evaluation
portion of hoc
to run inside a little virtual
machine.
hoc5
:
Adds conditional and looping constructs, as well as allowing us
to print (numbers) to stdout.
hoc6
: A
real whopper– Adds user-defined functions / procedures, user
input support, string literals, and comments.
Differences From the
Original hoc
The book is fantastic, but its source code contains a few “pre-ANSI C” idioms that are fairly-seriously deprecated or outright disallowed in C23. The goal of my adaptation was compilation without warnings using a modern C standard. Along the way, there were a few changes I wanted to make for clarity, leading to the version in front of you. The code still follows the book’s progression, with a few additions that I’ll try to note along the way.
In my opinion, the book is still a fantastic resource to
follow the authors’ thought processes and development approach,
and it contains a lot of advice around iterative development and
building software. This guide takes a more leisurely approach to
the topic, going into more depth in some areas (e.g. LALR
parsing, invariants for code generation), and completely
ignoring others (Makefiles and lex
, for
example.)
This is the explanation that I wanted as I was
building hoc
for the first time, and I hope that
some of you find the additional perspective useful as well!
Assumed Prerequisites
Not many; at least a small amount of familiarity with C; if
you have written a struct and know how
printf
/scanf
works, you should be
fine. Everything else should be covered as we go.
How to follow along:
If you want to type-it-yourself as you follow along, then you can simply start reading at hoc1; each stage has its complete source (and associated Makefile) for you to use directly from the page.
You can clone the
hoc
repository and browse the complete source
as you read the sections; the top-level Makefile
can run any version of hoc for you; simply use one of the
targets make run-hoc1
through make run-hoc6
. For
example:
> make run-hoc4
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C hoc4
make[1]: Nothing to be done for `all'.
=== Now Running hoc4 ===
a = 12
b = 2
b / a
0.16666667
Contact Me / Report an Issue
I’d love to hear about any issues you find as you work through the project–just drop me a line at rick@oddlyaccurate.com.