hoc4
has exactly the same grammar as
hoc3
, but its execution proceeds in an entirely different
way. From hoc1
–hoc3
, our interpreter evaluated
expressions immediately, and when an expression terminator was reached,
we had a double
-valued answer. hoc4
, on the
other hand, separates the process of parsing hoc statements
from the process of executing them.
Instead, hoc4 includes a small “virtual machine” module, which understands very simple instructions, and operates on a stack of data (much like a stack-based calculator).
The parser’s new job will be to convert hoc expressions into machine
instructions and install them into the virtual machine. Once an
expression is completely parsed, the list
production will
install some final instructions to print the result and/or stop the
machine. At this point, we have a tiny “program” now installed into our
machine.
After the program has been installed, we ask the machine to
execute it. It evaluates each instruction, in order,
using its stack to keep track of intermediate values (much like we used
the parsing stack for this purpose in
hoc1
–hoc3
.
hoc
VM(Note: While the symbol table is still separate from the machine, the machine uses it, and so we include it in our diagrams.)
The hoc4
virtual machine has two separate storage areas.
One is used to store instructions, called the “program storage” (or
program). Each intermediate value the program generates is stored in a
stack until it’s used (by a subsequent
instruction).
Using the machine requires two “stages” - we first install instructions, then execute them.
During the installation process, the machine merely appends
instructions to its program storage, without any interpretation or
verification. When we call execute()
, the machine sets its
program counter (pc
) to the beginning of
program storage, executes whatever instruction is stored there, and then
moves its pc
forward. Each instruction may push data onto
the stack or pop data from it. As each instruction executes, the
pc
will move forward by at least one, though,
because some instructions consume more than one “word” of program space,
it’s more accurate to say that the pc
will be placed at the
beginning of the next instruction. Let’s look at two examples
to ground the discussion.
As a first example, consider a hoc program to print the number
42
. Since expressions are always printed in
hoc4
, we can just write:
42
(Assuming we have a newline at the end) The same program, written for our virtual machine, looks like:
pushlit
42
print
stop
As it executes each instruction, we can examine how the state of the machine changes.
pushlit
42
42
onto the stack. (Notice PC
changed by 2.)
print
stop
The pushlit
instruction is an example of an instruction
that takes up more than one “word” of the memory in prog
.
Every pushlit
instruction must be followed by a literal
number stored in memory, and executing the instruction will examine that
number, then move the PC to the next word of memory.
The program
instructions may also include
Symbol
s as part of their variables. Since our language
comes with some builtin Symbols already, we’ll use those to illustrate.
Consider this one-line program to compute & display the cosine of
pi:
cos(PI)
This program corresponds to the instructions below; notice that we
first push arguments onto the stack, then the
BUILTIN
Symbol for cos
. This is handy, as
that’s also our parsing order!
pushref
PI
eval
pushref
cos
call
print
stop
While we don’t show it here, we know that the symbol table already
has values for both PI
and cos
.
pushref
PI
pushlit
, the pushref
instruction pushes a
value onto the stack. In this case, the value is a Symbol
.
The stack can hold both types of values.eval
Symbol
to the cos
function; it operates on double
values. Before a variable’s
value can be used, it should be eval
uated. This pops the
symbol from the stack, looks up its value in the symbol
table, and then pushes that value back onto the stack.pushref
cos
cos
symbol onto the stack, so that we can use the call
instruction.
call
call
instruction will first pop 1 value from the stack.
It must always be a BUILTIN
. It checks the number
of arguments the builtin uses, and pops that number of
values from the stack. It then calls the
Symbol
's associated function with those
arguments. The result of calling the BUILTIN
is pushed onto the stack.
print
-1
from stack and prints it to the screen
stop
hoc
VM Instruction
SetWe’ve seen some of the instructions the hoc4
VM
understands; here are the rest.
We have three instructions for adding values to our stack of data:
pushlit
double
value, which is pushed onto our machine’s stack.
pushref
Symbol
, which is pushed
onto our machine’s stack
pushlast
@
expression)
We have two instructions that remove values from the
stack; we use pop
to discard the
topmost element, or print
to
display it (after removal).
The core arithmetic instructions, implemented in our parser during
hoc3
, are now machine instructions in hoc4: add
sub
, mul
, and div
.
Every instruction requires that there are at least two values on the stack. The two values will be popped from the stack, the operationi s performed, and the result is placed back onto the stack.
eval
assignvar
double
and a VAR
-type
Symbol
from the stack, and assigns the double’s value to
the symbol.
assignconst
assignvar
, but marks the
Symbol
as const
.
call
Symbol
at the top of the
stack, pops the number of arguments required to call the
function, and pushes the result of evaluating it.
Before we change anything about hoc
’s language
operation, we can build (and presumably test) the machine
module independently.
The earlier diagram tells us the majority of the state we’ll need.
Our virtual machine’s current program is simply an
array of MachineCode
objects. Conceptually, we can think of
MachineCode
objects as “words” of our machine’s memory. To
ease development and debugging, though, each word is
typed. A word may be any one of:
Symbol
references.Our MachineCode
objects contain the type of the
word, chosen from the CodeType
enumeration, and an
associated value; an anonymous union suffices to hold a value
of whichever type is appropriate. We’ll declare these types inside
hoc.h
.
typedef int (*Inst)(void);
typedef enum {
= 'I',
CT_INST = 'L',
CT_LITERAL = 'S'
CT_SYM_REF } CodeType;
typedef struct {
;
CodeType typeunion {
double literal; // type == CT_LITERAL
; // type == CT_INST
Inst inst*symbol; // type == CT_SYM_REF
Symbol };
} MachineCode;
Finally, we’re ready to begin machine.c; we’ll add a static
array prog
we can use to store the MachineCode that’s
installed.
#define PROGSIZE 2000 // Static program limit (for now)
static MachineCode prog[PROGSIZE];
We’ve already mentioned the pc
, which tracks the next
word to execute. In addition, we’ll need some state for tracking the
size of the currently-installed program instructions, so that we’ll know
where to add the next instruction.
static MachineCode *progp; // Next free location for code generation
static MachineCode *pc; // the current location in the executing program
The stack of data we build up in our machine is made up of
Datum
objects; these are much like
MachineCode
, with two differences:
union
.Inst
type, because we do not
allow hoc
programs to execute code from the stack.typedef union {
double value;
*symbol;
Symbol } Datum;
We use another fixed-size array for the stack as well; notice we
include a piece of state to “remember” the most-recently-popped value
(for use with the @
expression)
#define STACKSIZE 256
static Datum stack[STACKSIZE]; // the machine stack
static Datum *stackp; // next free spot on the stack
static Datum last_popped = {.value = 0}; // previous top-of-stack value
Our machine will have a single entrypoint for installing code into its program:
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full
*/
void install_code(MachineCode mc);
The machine.c
implementation is straightforward.
void install_code(MachineCode mc) {
if (progp >= prog + PROGSIZE) {
("Max program size exceeded");
exec_error}
*(progp++) = mc;
}
We’ll need functions to manipulate the stack to simplify the individual instruction implementations.
/** Push new value onto stack */
static void stack_push(Datum d) {
if (stackp >= stack + STACKSIZE) {
("Stack overflow");
exec_error}
*(stackp++) = d; // copy
}
static Datum stack_pop(void) {
if (stackp <= stack) {
("Stack underflow");
exec_error}
= *(--stackp);
last_popped return last_popped;
}
Before we can implement execution, or even our machine instructions,
we need to determine a contract for how the machine’s state will be
updated as we move through the program. Updating the program counter is
not completely straightforward, since each instruction may be encoded in
a different number of “words” of prog.
void execute(MachineCode *addr);
We declare the following invariants:
pc
to some
element of the prog
array; probably, the first.pc
value
will be incremented so that it points to the next
element.pc
value has been incremented by however
many additional words the instruction uses. That is,f
the instruction implementations must guarantee that pc
points to a MachineCode element with
type == CT_INST
.This set of invariants gives us the following properties:
pc
at all.pc
will be
pointing to the word following the instruction
(e.g. our first argument).execute()
updates the PC before
calling our instruction, if we have a fancy instruction that needs to
override the PC
value, it can do so.This leads us to the following implementation:
void execute(MachineCode *addr) {
// this argument isn't used in hoc4
= (addr == NULL) ? prog : addr;
pc
while (!is_stop_inst(pc)) {
if (pc->type != CT_INST) {
("Programming error, type '%c' was not INST!", pc->type);
exec_error}
(*(*pc++).inst)(); // execute inst and increment PC
}
}
Our STOP
instruction &
is_stop_inst()
are declared and defined in the
hoc.h
header:
#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)
Each instruction is implemented as a int
-valued function
with no arguments, since all of the input we need is contained in our
static variables.
We’ll declare them in hoc.h
, since they’ll be installed
by our parser.
int inst_add(void);
int inst_assignconst(void);
int inst_assignvar(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_mul(void);
int inst_pop(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
Each of our instructions will follow the same basic structure:
int my_instruction(void) {
// fetch data from stack
// perform operation
// update PC if necessary
// return 0
}
Two push instructions pushlit
, pushref
are
always two words long.
pushlit
42
We must be sure to increment pc
again before
returning.
int inst_pushlit(void) {
(pc->type == CT_LITERAL); // should always be true
assert
= (Datum){.value = pc->literal};
Datum d (d);
stack_push
++; // this instruction is two "words" long
pcreturn 0;
}
The pushlast
instruction doesn’t require an argument,
because it’s implemented by using the most-recently-popped value from
the stack.
int inst_pushlast(void) {
(last_popped);
stack_pushreturn 0;
}
The pop instructions (pop
, print
) are
completely straightforward; the print instruction is replicated from our
formatted print in the list
production of
hoc3
’s parser.
int inst_pop(void) {
();
stack_popreturn 0;
}
int inst_print(void) {
= stack_pop();
Datum d ("\t%.8g\n", d.value);
printfreturn 0;
}
Our arithmetic instructions (add
, sub
,
etc.) are all one word long. They each have two arguments to be
taken from the stack. The only detail to keep in mind is reversing the
argument order, since they were pushed left-to-right. For
example, the stack for 12 / 3
is constructed as
follows:
pushlit
12
pushlit
3
div
This leads to a straightforward implementation.
int inst_div(void) {
= stack_pop();
Datum rhs = stack_pop();
Datum lhs
if (rhs.value == 0) {
("Division by zero");
exec_error} else {
((Datum){.value = lhs.value / rhs.value});
stack_push}
return 0;
}
Our machine works with Symbols
directly, rather than
performing any lookup()
s itself; actually
installing symbols into the table will still be the
responsibility of the lexer (though this isn’t absolutely required).
The assignvar
instruction, for example, is a single-word
instruction that requires a symbol to be on the top of
the stack, followed by its to-be-assigned value. Notice that these
expected stack items are in the opposite order from the
div
instruction. This is due to
the order in which we parse the expressions, as we’ll see in a later section.
As an example, the program:
pushlit
42
pushref
x
assign
Will result in the following state during execution:
pushlit
42
pushref
x
assignvar
The assignvar
instruction merely updates the
Symbol
’s value, just like the assignment action did in
hoc3
.
int inst_assignvar(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = VAR;
lhs((Datum){.value = rhs.value});
stack_pushreturn 0;
}
The rest of the instructions are similar.
In hoc1
–hoc3
, our parser actions focused on
producing values, which allowed us to build up our computed
results. We used the list
production at the top of our grammar to display those results to the
user.
Now, we want our parser actions to build small programs for
our machine to run. The list
production, whenever it reaches the end of an expression, will produce
two final instructions: print
and stop
. It
also returns to our main()
function, which is then
responsible for calling the machine module’s execute()
function. The program will run, display its result(s) to the user, and
we can return to the parser to wait for more input.
This means that the main changes in hoc.y
will be to our
grammar actions; rather than generating values,
e.g.
: NUMBER
expr{
= $1; // return literal number
$$ }
We will generate VM instructions that do the same thing.
: NUMBER
expr{
// Push literal number onto stack
(inst_pushlit);
install_instruction($1);
install_literal}
To understand the implementation changes, we’ll explore example programs to see how they’re generated:
Given the hoc input:
42
We’ve seen in our machine discussion that our parser will need to install a machine program such as:
pushlit
42
print
stop
NUMBER
, Reduce to expr
Our lexer, which does not change from hoc3
, will
recognize the 42
as a NUMBER
token. This
matches our expr: NUMBER
rule, which now installs
instructions for the first half of our program:
: NUMBER
expr{
(inst_pushlit);
install_instruction($1);
install_literal}
list
actionSince this is the end of our input, our expression is reduced to a
list
; the list action always finishes our
program.
: ...
list
list expr terminator{
(inst_print);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
STOP
instruction
after print
ing?
It may seem odd that we can add a print
instruction at
the end of every generated program; how do we know that there
will be a value on the stack to print?
But recall that expr
makes guarantees; in
hoc3
, we had a similar “top-level” list action:
: ...
list| list expr terminator
{
("\t%.8g\n", remember($2));
printf}
Here, we knew we could print, because every expr
was
required to produce a double
, and so the list
action could always be sure that $2
had a reasonable
value.
In hoc4, the list production installs a “print” instruction, which
pops a value from the stack. It can do this without
fear of underflow, because we have a similar guarantee: In
hoc4
, we write our actions so that the instructions
installed by an expression action will always
place at least one double
onto the stack. This is important
enough to repeat:
hoc4
, any parser action for an expr
nonterminal must install instructions that,
when eventually executed, will place (at least) one
value onto the machine stack.
Unfortunately, unlike our %type
declarations,
yacc
cannot check this invariant for us; we’ll need to be
careful to ensure that all of our parser actions follow this rule. As
part of this change, our expressions no longer need to produce values
for yacc
at all.
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF-%type <hoc_value> expr assign call
%type <hoc_symbol> assignable
Evaluating variables requires a very similar set of instructions.
: ...
expr| VAR
{
(inst_pushref);
install_instruction($1);
install_ref(inst_eval);
install_instruction}
The eval
instruction at the end of the program satisfies
our guarantee: after evaluating a single-variable expression, our stack
always has a double
value on top. Variables are not always
immediately evaluated, when used as part of a larger expression, as
we’ll see in the next example.
Consider the statement:
answer = 42
hoc3
’s
OperationIn hoc3
, our parser would first resolve the right side
of the assignment, as =
is right-associative. This is fine,
because tokenizing the left-hand-side (which must
happen before the right) installs answer
into the symbol
table. The symbol itself is returned. The right-hand side is a number,
which is parsed as an expression as:
{ $$ = $1; } NUMBER
Since numbers are expressions, we have matched the
assign
rule, which worked as follows:
Parsing assignments: hoc3
Version
: assignable '=' expr
assign{
if ($1->type == CONST) {
("Cannot reassign constant '%s'", $1->name);
exec_error} else {
1->type = VAR;
$= ($1->data.val = $3);
$$ }
}
We have access to both the assignable symbol $1
and the
“right-hand side” value, $3
. So, the assign
rule can simply modify the value of our symbol.
In hoc4
, our actions have the same responsibilities.
Again, the assignable
rule has no behavior and merely
returns the Symbol
. But the expr
rule must
ensure that a value is on the stack for assignment:
: NUMBER
expr{
(inst_pushlit);
install_instruction($1);
install_literal}
When we see a number used as a standalone expression, we install instructions for pushing it onto the stack. Our program, at this point, contains:
pushlit
42
We then reach the assign
rule, which must, at some
point, install an assignvar
instruction. Recall that
assignvar
requires that the top of our VM stack looks like
(<Symbol, value-to-assign,...>
). Luckily, we’ve
promised to guarantee expr
actions always leave a double
value on the top of the stack. The assign
rule can takes
advantage of that to install:
pushref
, to add the symbol x
above the double value on the stack.assignvar
instruction, which will actually set the
symbol’s value, and place it onto the stack.: assignable '=' expr
assign{
(inst_pushref);
install_instruction($1);
install_ref(inst_assignvar);
install_instruction}
Our guarantee lets us proceed with confidence, and our program now looks like:
pushlit
42
pushref
x
assignvar
Finally, we match the list
rule, which has a special
case for assign
. As in hoc3
, we don’t want to
print the results of assignments. So, we install a
pop
instruction instead, which discards the value.
: list assign terminator
list{
(inst_pop);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
We’ve generated the final version of our program:
pushlit42
pushref
x
assign
pop STOP
And so this tiny program updates the value of the x
Symbol, and does nothing else, just like the assignment in
hoc3
.
We combine all the ideas for our final one-liner:
x = 2 * PI
Our parser encounters the following tree:
Let’s compare the parser actions from hoc3
to
hoc4
.
2(NUMBER)
expr: NUMBER
{
$$ = $1;
}
2
PI
(CONST
)expr: CONST
{
$$ = $1->data.val;
}
3.14159
*
expr: expr * expr
{
$$ = $1 * $3;
}
6.28318
x =
assign: assignable = expr
{
...
$$ = ($1->data.val = $3);
}
6.28318
list: list assign terminator
2 (NUMBER)
expr: NUMBER
{
install_instruction(inst_pushlit);
install_literal($1);
}
pushlit
2
PI(CONST)
expr: CONST
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_eval)
}
pushref
Symbol(PI)
eval
*
expr: expr * expr
{
install_instruction(inst_mul)
}
mul
x =
assign: assignable = expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignvar)
}
pushref
Symbol(x)
assignvar
list: list assign terminator
{
install_instruction(inst_pop);
install_instruction(STOP_INST)
}
pop
STOP
The rest of the action changes are straightforward variations of the above.
In our previous parsers, our grammar actions contained all of the
business logic of our program. We never returned from
yyparse()
, because the parser ran until the user sent
EOF
, or the process was terminated.
But our new execution model is slightly different; we want to use
yyparse
to parse individual expressions, but then
we want to execute them ourselves with the machine interface. It
wouldn’t be very helpful to only run the programs when all input has
been received.
One problem with this idea is that yyparse()
expects to
run until the entire input stream is parsed. There are several solutions
to this, but the simplest is to use a macro, YYACCEPT
,
directly in the actions that “complete” an expression. This macro
indicates to yacc
that, though more input remains, we have
finished a parse successfully, and it may return to its caller
immediately. This is preferable to returning from an action directly, as
that can leave the parser in an undefined state.
We use this macro in our list
production, anytime we
find an input that we can execute()
.
: /* nothing */
list| list terminator
| list error terminator { yyerrok; }
| list assign terminator
{
(inst_pop);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list expr terminator
{
(inst_print);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
Now each new “list entry” will run as its own program; all of our “state” for hoc programs is tracked through variable values in the symbol table. So, each time we parse an expression successfully, we:
We will stop when the parser returns an error result, or we reach the
end of stdin
.
#define YACC_SUCCESS 0 // Yacc's success result
We’ll need to be able to reset the machine’s program storage and
pc
in between expressions. Remember that all of the
“persistent” state is saved in our symbol table.
We’ll add two more functions to our machine interface in
hoc.h
: machine_startup()
, which initializes
our machine once at program startup, and
machine_reset_program()
, to reset prog
and
pc
in between statements.
void machine_startup(void);
void machine_reset_program(void);
The functions are very similar; our startup is a handy place to initialize our random number generator.
/** Initialize machine state for startup */
void machine_startup(void) {
(time(NULL)); // use current time as seed for random generator
srand();
machine_reset_program}
/** Initialize execution state for a new program */
void machine_reset_program(void) {
= stack;
stackp = prog;
progp }
We can use those in our main loop:
int main(int argc, char *argv[]) {
...
install_hoc_builtins();+ machine_startup(); // one-time initialization
setjmp(begin);+ /* Reset the program after jumping back here on error */
+ machine_reset_program();
+ while(yyparse() == YACC_SUCCESS && !feof(stdin)) {
+ execute(NULL);
+ machine_reset_program();
}
return 0;
}+
hoc4
In the book, the authors do not create the MachineCode
abstraction, instead using an array of Inst
pointers to
hold both Inst
and Symbol
pointers. Because
pointers to different kinds of objects are not
guaranteed to be the same size, we use a union of values instead. While
the type
field in MachineCode
is not
necessary, it’s quite handy for debugging.
Since the book’s builtins all have exactly one argument, they do not require the handling we have here of multi-argument Symbol functions.
The authors also return a value of 1
directly wherever
we use YYACCEPT
. Since bison
does not
recommend bypassing its return macros, we use their documented
interface.
hoc4
was a major transition point our operation; hoc5 will build on this foundation, adding flow
control and conditional logic to our little virtual machine.
?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
CFLAGS ?= -d
YFLAGS =-lm
LDFLAGS:= builtins.o math.o symbol.o machine.o hoc.o
objects
.PHONY:all
: hoc
all
: $(objects)
hoc(CFLAGS) $(objects) $(LDFLAGS) -o hoc
cc $
.c hoc.tab.h: hoc.y
hoc(YFLAGS) hoc.y -o hoc.tab.c
yacc $.tab.c hoc.c
mv hoc
%.o: hoc.h hoc.tab.h %.c
(CFLAGS) -c -o $@ $*.c
cc $
.PHONY:clean
:
clean-f y.tab.h
rm -f hoc.tab.h
rm -f $(objects)
rm -f hoc.c
rm -f hoc rm
#include "hoc.h"
#include "hoc.tab.h"
#include <stdio.h>
#include <stdlib.h>
/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
(double), Exp(double), Integer(double), Lg(double), Ln(double),
Cos(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);
Log10
static struct {
const char *name;
double value;
} constants[] = {
{"PI", 3.14159265358979323846},
{"E", 2.71828182845904523536},
{NULL, 0},
};
static struct {
const char *name;
int args;
union {
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
} builtins[] = {
{"abs", 1, .call1 = Abs},
{"acos", 1, .call1 = Acos},
{"atan", 1, .call1 = Atan},
{"atan2", 2, .call2 = Atan2},
{"cos", 1, .call1 = Cos},
{"exp", 1, .call1 = Exp},
{"int", 1, .call1 = Integer},
{"lg", 1, .call1 = Lg},
{"ln", 1, .call1 = Ln},
{"log10", 1, .call1 = Log10},
{"pow", 2, .call2 = Pow},
{"rand", 0, .call0 = Random},
{"sin", 1, .call1 = Sin},
{"sqrt", 1, .call1 = Sqrt},
{NULL, 0, .call0 = NULL},
};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
(constants[i].name, CONST, constants[i].value);
install_symbol}
for (int i = 0; builtins[i].name != NULL; i++) {
*s = install_symbol(builtins[i].name, BUILTIN, 0.0);
Symbol ->data.func.args = builtins[i].args;
sswitch (builtins[i].args) {
case 0:
->data.func.call0 = builtins[i].call0;
sbreak;
case 1:
->data.func.call1 = builtins[i].call1;
sbreak;
case 2:
->data.func.call2 = builtins[i].call2;
sbreak;
default:
("Cannot start hoc: Argument count error for builtin '%s'",
exec_error[i].name);
builtins}
}
}
#ifndef HOC_INC
#define HOC_INC 1
/*
* Global types & declarations for use in hoc
*/
#include <stddef.h>
///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------
#define SYMBOL_NAME_LIMIT 100
typedef struct Symbol Symbol;
/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
int args;
union {
// anon union: e.g. use `data.func.call0` directly. */
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
};
struct Symbol {
short type; // Generated from our token types
char *name;
*next;
Symbol union {
double val; // a double-valued symbol
struct BuiltinFunc func; // a callable symbol
} data;
};
*install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);
Symbol
///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------
typedef int (*Inst)(void);
typedef enum { CT_INST = 'I', CT_LITERAL = 'L', CT_SYM_REF = 'S' } CodeType;
typedef struct {
;
CodeType typeunion {
double literal;
;
Inst inst*symbol;
Symbol };
} MachineCode;
#define STOP_INST NULL
///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full
*/
void install_code(MachineCode mc);
#define install_literal(lit) \
install_code((MachineCode){.type = CT_LITERAL, .literal = (lit)})
#define install_ref(sp) \
install_code((MachineCode){.type = CT_SYM_REF, .symbol = (sp)})
#define install_instruction(instr) \
install_code((MachineCode){.type = CT_INST, .inst = (instr)})
void execute(MachineCode *addr);
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
int inst_add(void);
int inst_assignconst(void);
int inst_assignvar(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_mul(void);
int inst_pop(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------
typedef union {
double value;
*symbol;
Symbol } Datum;
void machine_startup(void);
void machine_reset_program(void);
///----------------------------------------------------------------
/// Runtime functions
///----------------------------------------------------------------
/** global error handling */
((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);
__attribute__
/** builtins */
void install_hoc_builtins(void);
#endif // HOC_INC
%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------
/*
* "higher-order calculator" - Version 4
* From "The UNIX Programming Environment"
*/
#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include "hoc.h"
///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------
;
jmp_buf begin
///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------
#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);
%}
%union {
double hoc_value;
*hoc_symbol;
Symbol }
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF
%type <hoc_symbol> assignable
%right '=' /* right-associative, much like C */
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */
%right '^' /* exponents */
%left UNARY_MINUS /* Even higher precedence
than mul or div. Can't use '-', as we've used
it already for subtraction. */
%%
: /* nothing */
list| list terminator
| list error terminator { yyerrok; }
| list assign terminator
{
(inst_pop);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list expr terminator
{
(inst_print);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
;
: NUMBER
expr{
(inst_pushlit);
install_instruction($1);
install_literal}
| '@'
{
(inst_pushlast);
install_instruction}
| assignable
{
(inst_pushref);
install_instruction($1);
install_ref(inst_eval);
install_instruction}
| assign
| call
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }
| expr '^' expr
{
(inst_pushref);
install_instruction(lookup("pow"));
install_ref(inst_call);
install_instruction}
| '-'expr %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
{
(inst_pushlit);
install_instruction(-1);
install_literal(inst_mul);
install_instruction}
| '(' expr ')'
;
: assignable '=' expr
assign{
(inst_pushref);
install_instruction($1);
install_ref(inst_assignvar);
install_instruction}
| assignable ':' '=' expr
{
(inst_pushref);
install_instruction($1);
install_ref(inst_assignconst);
install_instruction}
;
: BUILTIN '(' ')'
call{
($1, 0);
check_call_args(inst_pushref);
install_instruction($1);
install_ref(inst_call);
install_instruction}
| BUILTIN '(' expr ')'
{
($1, 1);
check_call_args(inst_pushref);
install_instruction($1);
install_ref(inst_call);
install_instruction}
| BUILTIN '(' expr ',' expr ')'
{
($1, 2);
check_call_args(inst_pushref);
install_instruction($1);
install_ref(inst_call);
install_instruction}
;
: VAR | CONST
assignable;
: '\n' | ';'
terminator;
%% // end of grammar
/* error tracking */
char *program_name;
int line_number = 1;
int main(int argc, char *argv[]) {
(void)argc;
= argv[0];
program_name ();
install_hoc_builtins(); // one-time initialization
machine_startup
(begin);
setjmp/* Reset the program after jumping back here on error */
();
machine_reset_programwhile(yyparse() == YACC_SUCCESS && !feof(stdin)) {
(NULL);
execute();
machine_reset_program}
return 0;
}
/* our simple, hand-rolled lexer. */
int yylex(void) {
int c;
do {
= getchar();
c } while (c == ' ' || c == '\t');
if (c == EOF) {
return 0;
}
if (c == '.' || isdigit(c)) {
(c, stdin);
ungetc("%lf", &yylval.hoc_value);
scanfreturn NUMBER;
}
if (c == '\n') {
++;
line_number}
if (isalpha(c)) {
*s;
Symbol
char buf[SYMBOL_NAME_LIMIT + 1];
size_t nread = 0;
do {
[nread++] = c;
buf= getchar();
c } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));
// Just in case we exceeded the limit
while ((isalpha(c) || isdigit(c))) {
= getchar();
c }
// at this point, we have a non-alphanumeric 'c'
(c, stdin);
ungetc
[nread] = '\0';
bufif ((s=lookup(buf)) == NULL) {
= install_symbol(buf, UNDEF, 0.0);
s }
.hoc_symbol = s;
yylvalreturn (s->type == UNDEF ? VAR : s->type);
}
return c;
}
void yyerror(const char *s) {
("%s", s);
warning}
/*
* Verify Symbol's declared arg count matches actual,
* or display a helpful error message with mismatch
*/
void check_call_args(const Symbol *s, int actual) {
int expected = s->data.func.args;
if (expected != actual) {
("Wrong number of arguments for %s: expected %d, got %d",
exec_error->name, expected, actual);
s}
}
#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)
void warning(const char *msg, ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end}
void exec_error(const char *msg , ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end
(begin, 0);
longjmp}
#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stddef.h> // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/*
* MACHINE STATE
* =============
*
* We have our "program" stored as a list of MachineCode
* objects, each encoding one of:
* - an instruction
* - a literal value
* - a symbol reference
*
* We "execute" our program by setting the program counter (pc)
* pointer to the first MachineCode (which *must* be an instruction),
* and handling each instruction we find. Within an instruction, the
* program counter might be updated so that by the *end* of any
* instruction, the program counter is always pointing to another
* instruction.
*
*
* The Stack
* ---------
* Each instruction manipulates the stack, pushing,
* popping, or combining values from the stack. The stack elements
* are Datum objects, which act much like MachineCodes, but with two
* differences.
*
* 1. Datums cannot represent an instruction; they resolve to a value
* 2. The type of a datum can always be resolved from their context,
* and so no "type" field is necessary.
*
* ┌────────────────────────────────────────────────────┐
* │ == PROG == == STACK == │
* │ ┌─────────────┐◀───┐ ┌─────────────┐ │
* │ │ MachineCode │ pc │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ progp │ Datum │ stackp │
* │ ├─────────────┤◀────┘ ├─────────────┤ ◀──┘ │
* │ │ ... │ │ ... │ │
* │ └─────────────┘ └─────────────┘ │
* └────────────────────────────────────────────────────┘
*/
#define PROGSIZE 2000
static MachineCode prog[PROGSIZE]; // the installed code
static MachineCode *progp; // Next free location for code generation
static MachineCode *pc; // Current location in the executing program
#define STACKSIZE 256
static Datum stack[STACKSIZE]; // the machine stack
static Datum *stackp; // next free spot on the stack
static Datum last_popped = {.value = 0}; // previous top-of-stack value
static void stack_push(Datum d);
static Datum stack_pop(void);
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)
/** Initialize machine state for startup */
void machine_startup(void) {
(time(NULL)); // use current time as seed for random generator
srand();
machine_reset_program}
/** Initialize execution state for a new program */
void machine_reset_program(void) {
= stack;
stackp = prog;
progp }
///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------
void install_code(MachineCode mc) {
if (progp >= prog + PROGSIZE) {
("Max program size exceeded");
exec_error}
*(progp++) = mc;
}
///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------
void execute(MachineCode *addr) {
= (addr == NULL) ? prog : addr;
pc
while (!is_stop_inst(pc)) {
if (pc->type != CT_INST) {
("Programming error, type '%c' was not INST!", pc->type);
exec_error}
(*(*pc++).inst)(); // increment PC and execute inst
}
}
/** Push new value onto stack */
static void stack_push(Datum d) {
if (stackp >= stack + STACKSIZE) {
("Stack overflow");
exec_error}
*(stackp++) = d; // copy
}
static Datum stack_pop(void) {
if (stackp <= stack) {
("Stack underflow");
exec_error}
= *(--stackp);
last_popped return last_popped;
}
int inst_pushlit(void) {
(pc->type == CT_LITERAL);
assert
= (Datum){.value = pc->literal};
Datum d (d);
stack_push++; // this instruction is two "words" long
pcreturn 0;
}
int inst_pushref(void) {
(pc->type == CT_SYM_REF);
assert
= (Datum){.symbol = pc->symbol};
Datum d (d);
stack_push++; // this instruction is two "words" long
pcreturn 0;
}
int inst_pushlast(void) {
(last_popped);
stack_pushreturn 0;
}
int inst_eval(void) {
= stack_pop(); // must be symbol
Datum d switch (d.symbol->type) {
case UNDEF:
("Undefined variable: '%s'", d.symbol->name);
exec_errorbreak;
case BUILTIN:
("Builtin '%s' cannot be evaluated", d.symbol->name);
exec_errorbreak;
default: // VAR,CONST
((Datum){.value = d.symbol->data.val});
stack_pushbreak;
}
return 0;
}
int inst_call(void) {
= stack_pop(); // must be symbol
Datum s if (s.symbol->type != BUILTIN) {
("Cannot call non-function '%s'", s.symbol->name);
exec_error}
double result;
switch (s.symbol->data.func.args) {
case 0:
= s.symbol->data.func.call0();
result break;
case 1: {
= stack_pop();
Datum arg = s.symbol->data.func.call1(arg.value);
result break;
}
case 2: {
= stack_pop();
Datum arg2 = stack_pop();
Datum arg1 = s.symbol->data.func.call2(arg1.value, arg2.value);
result break;
}
}
((Datum){.value = result});
stack_pushreturn 0;
}
int inst_assignvar(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = VAR;
lhs((Datum){.value = rhs.value});
stack_pushreturn 0;
}
int inst_assignconst(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (lhs.symbol->type == CONST) {
("Cannot reassign constant '%s'", lhs.symbol->name);
exec_error} else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = CONST;
lhs((Datum){.value = rhs.value});
stack_pushreturn 0;
}
int inst_pop(void) {
();
stack_popreturn 0;
}
int inst_print(void) {
= stack_pop();
Datum d ("\t%.8g\n", d.value);
printfreturn 0;
}
int inst_add(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
((Datum){.value = lhs.value + rhs.value});
stack_pushreturn 0;
}
int inst_sub(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
((Datum){.value = lhs.value - rhs.value});
stack_pushreturn 0;
}
int inst_div(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
if (rhs.value == 0) {
("Division by zero");
exec_error} else {
((Datum){.value = lhs.value / rhs.value});
stack_push}
return 0;
}
int inst_mul(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs ((Datum){.value = lhs.value * rhs.value});
stack_pushreturn 0;
}
#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h> // rand
#define HOC_RAND_MAX 1000000L
static double check_math_err(double result, const char *op);
double Abs(double x) {
return check_math_err(fabs(x), "fabs");
}
double Acos(double x) {
return check_math_err(acos(x), "acos");
}
double Atan(double x) {
return check_math_err(atan(x), "atan");
}
double Atan2(double x, double y) {
return check_math_err(atan2(x, y), "atan2");
}
double Cos(double x) {
return check_math_err(cos(x), "cos");
}
double Exp(double x) {
return check_math_err(exp(x), "exp");
}
double Integer(double x) {
return (double)(long)x;
}
double Lg(double x) {
return check_math_err(log2(x), "lg");
}
double Ln(double x) {
return check_math_err(log(x), "log");
}
double Log10(double x) {
return check_math_err(log10(x), "log10");
}
double Pow(double x, double y) {
return check_math_err(pow(x, y), "exponentiation");
}
double Random(void) {
// We have HOC_RAND_MAX buckets
const int bucket_size = RAND_MAX / HOC_RAND_MAX;
int result;
do {
= rand() / bucket_size;
result } while (result >= HOC_RAND_MAX);
return result + 1;
}
double Sin(double x) {
return check_math_err(sin(x), "sin");
}
double Sqrt(double x) {
return check_math_err(sqrt(x), "sqrt");
}
static double check_math_err(double result, const char *op) {
static const char *domain_msg = "argument outside domain";
static const char *range_msg = "result outside range";
static const char *other_msg = "floating-point exception occurred";
const char *msg = NULL;
if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
// Special case: Inexact results are not errors.
if (fetestexcept(FE_INEXACT)) {
goto done;
}
if (fetestexcept(FE_INVALID)) {
= domain_msg;
msg } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
= range_msg;
msg } else { // unknown
= other_msg;
msg }
(FE_ALL_EXCEPT);
feclearexcept} else if (errno) {
if (errno == EDOM) {
= domain_msg;
msg } else if (errno == ERANGE) {
= range_msg;
msg } else {
= other_msg;
msg }
= 0;
errno }
if (msg != NULL) {
("math error during %s: %s", op, msg);
exec_error}
:
donereturn result;
}
#include "hoc.h"
#include "hoc.tab.h" // generated from yacc -d on our grammar
#include <stddef.h> // NULL
#include <stdlib.h> // malloc
#include <string.h> // strcmp
static Symbol *symbol_table = NULL;
void *emalloc(size_t nbytes);
*lookup(const char *name) {
Symbol *current = symbol_table;
Symbol while (current != NULL) {
if (strcmp(current->name, name) == 0) {
return current;
}
= current->next;
current }
return NULL;
}
*install_symbol(const char *name, short type, double value) {
Symbol size_t name_len = strlen(name);
*s = emalloc(sizeof(Symbol) // actual symbol
Symbol + (sizeof(char)) * (name_len + 1)); // room for name
->name = (char *)(s + 1);
s(s->name, name);
strcpy->name[name_len] = '\0';
s
->type = type;
s->data.val = value;
s->next = symbol_table;
s= s;
symbol_table return s;
}
void *emalloc(size_t nbytes) {
void *ptr = malloc(nbytes);
if (ptr == NULL) {
("Out of memory!");
exec_error}
return ptr;
}