hoc5
builds on the VM model constructed for
hoc4
, enabling more complex constructs like
if/else
branching and while
loops. As we’ll
see, these flow-control constructs will be more useful if we can also
perform some limited output, so we’ll add a print
keyword
as well.
These constructs are the first step to executing larger programs,
instead of the single-line “programs” executed by hoc4
.
We’ll begin with an overview of these new features to better understand
how or implementation might change.
The first feature we’ll need for conditional logic is a set of
relational operators, so that we can compare two
values. We’ll add the standard set of ==
, !=
, <
, <=
, >
, and >=
.
These operators, like our arithmetic operators, will be compiled into
instructions which will pop two values from the stack, then push the
value 1
if the given comparison is true, and 0
otherwise.
The boolean operators, &&
and ||
, operate the same way. In
hoc5
, unlike most languages, we’ll always evaluate both
operands, instead of short-circuiting. We do want to inherit
C’s approach to boolean precedence, however. Namely, the boolean
operators should have a very low precedence, so that
expressions like:
x < y && y < z
Are evaluated as:
Instead of:
The hoc
language up to this point is meant for
line-by-line interactions. The user enters an expression, and
hoc prints a response. This “calculator-style” of interaction does not
work as well once we introduce conditional logic. Consider:
= 10
x = 1
y while (x > 0) {
= y * x
y = x - 1
x }
(y) lg
hoc
code for lg(10!)
The first two lines of this example will execute correctly in
hoc4
. Each one will be parsed into a tiny “assignment”
program, and execute()
ed in the virtual machine.
The while
loop, however, presents some brand-new
challenges. There are at least 3 hoc4
“programs” we can see
inside this snippet of code.
Furthermore, some of these programs will be executed a variable number of times:
Certainly we must always execute the instructions to evaluate the condition, since that will determine whether to enter the loop body or not.
Our loop body, and its multiple internal expressions, will only execute if the condition is true. And if it’s true, we should return to execute it again after the body.
Finally, once the condition evaluates to false, then we must skip execution of the body, and move to the instructions after the loop body.
Obviously, our parser cannot know how many iterations of the loop we will run, so the machine must be able to “jump” between the condition, body, and post-body sections on its own.
We can visualize its behavior as follows:
We’ll need a new set of instructions for this; branching
instructions, which allow us to move our pc
to a
new value and continue executing.
We’ll still need some help from the parser, since our
machine needs to be told what address in the program to
jump to. We can figure this out at parsing time, but only
after the whole loop is parsed and all of its code is
installed. So, we’ll need to be able to install the
entire while
construct, including the conditional
expression and loop body, before we execute any of it.
This will require some structural changes to our grammar.
For the most part, implementing relational operations is a
straightforward extension of our current arithmetic operations. There is
a small additional challenge regarding their syntax: the operators
themselves may be 1 or 2 characters,
and their prefixes overlap. For example, we must be able to distinguish
<
from <=
, regardless of whitespace and
surrounding digits/letters, e.g. 2 <=5
vs 2< 5
.
The relational operations are left-associative, just like arithmetic.
They are set to a lower priority than any arithmetic operation,
since we want x < y + 2
to mean
what we would expect. Similarly, we want most operations to be complete
before evaluating the OR
and AND
operators.
%type <hoc_symbol> assignable
%right '=' /* right-associative, much like C */+ %left OR
+ %left AND
+ %left EQ GT GTE LT LTE NEQ
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */ %left UNARY_MINUS NOT /* Prefix operators have highest priority*
Next, at the end of the lexing code, we add logic to handle the case where the character we’ve read matches more than one operator.
/* our simple, hand-rolled lexer. */
int yylex(void) {
static bool hit_eof = false;
int c;
// Skip space
// Get numbers
...
if (isalpha(c)) { // Symbols
...+ switch (c) {
+ case '<':
+ return lookahead('=', LTE, LT);
+ case '>':
+ return lookahead('=', GTE, GT);
+ case '=':
+ return lookahead('=', EQ, '=');
+ case '!':
+ return lookahead('=', NEQ, NOT);
+ case '|':
+ return lookahead('|', OR, '|');
+ case '&':
+ return lookahead('&', AND, '&');
+ default:
+ return c;
+ }
+ }
+
The lookahead
function “peeks” at the next character in
our input stream. If it matches a two-character operator, then we
consume both characters and return the corresponding token. Otherwise,
we put the the character back onto the front of the input stream for the
next lexing pass, and return the otherwise
token.
/**
* "Looks ahead 1 character for a possibility. If it's found, returns
* `match`. If not:
* - places character back onto input stream for next read
* - returns the `otherwise` result.
*/
static int lookahead(char expect, int match, int otherwise) {
int c = getchar();
if (c == expect) {
return match;
} else {
(c, stdin);
ungetcreturn otherwise;
}
}
The relational operations fit nicely into our existing
expr
productions. We’ll need to add references to the new
instructions into hoc.h
as well.
expr: NUMBER
...
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }+ | expr LT expr { install_instruction(inst_lt); }
+ | expr LTE expr { install_instruction(inst_lte); }
+ | expr GT expr { install_instruction(inst_gt); }
+ | expr GTE expr { install_instruction(inst_gte); }
+ | expr EQ expr { install_instruction(inst_eq); }
+ | expr NEQ expr { install_instruction(inst_neq); }
+ | expr AND expr { install_instruction(inst_and); }
+ | expr OR expr { install_instruction(inst_or); }
...
Each instruction operates just like arithmetic; the operands are already on the stack, in reverse order. We pop both, perform our operation, and push the result.
The <=
operator is below;
the rest are similar.
int inst_lte(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value <= rhs.value));
stack_pushreturn 0;
}
You should now be able to test simple relational expressions, such as
1 <= 0
or 42 > 13 && 0 < 1
.
There are several changes we’ll need to make to the language to incorporate flow-control constructs. We’ll add the functionality bit-by-bit, solving one problem at a time.
Before we do anything else, we have one easy requirement: we’ll need
to be able to recognize some keywords for our language
now; hoc5
requires all of: while
,
if
, else
, and print
.
Luckily, we already have a mechanism for matching multi-character
tokens in our language; the symbol table. We will extend our
approach to BUILTIN
tokens to handle keywords as well.
First, we’ll need to setup tokens for each of the keywords:
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF+%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW
%type <hoc_symbol> assignable
Then, just as we did for our builtin constants and functions, we
create a small array of “input data” structs for use during
install_hoc_builtins
, and install them the same way.
+static struct {
+ const char *name;
+ int token_type;
+} keywords[] = {
+ {"if", IF_KW},
+ {"else", ELSE_KW},
+ {"print", PRINT_KW},
+ {"while", WHILE_KW},
+ {NULL, 0},
+};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
install_symbol(constants[i].name, CONST, constants[i].value);
}+
+ for (int i = 0; keywords[i].name != NULL; i++) {
+ install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
+ }
for (int i = 0; builtins[i].name != NULL; i++) { ...
The next problem we’ll tackle actually occurs in the “middle” of our loop constructs. Consider the factorial example from above:
x = 10
y = 1
while (x > 0) {
y = y * x
x = x - 1
}
Let’s focus on the body of this loop. We know that we will need to
parse the entire body before we execute anything. Since our current
grammar stops after every terminated expression, there’s no way for our
current grammar to parse the body before executing it. We’ll need a new
construct for these situations. We could try to add a new kind of
expression, such as loop
. Then we could match multiple
“sub-expressions” inside our loop body, using the same kind of
production that we use for our REPL list
nonterminal.
e.g.
expr: ...
| while '(' expr ')' '{' loop_body '}'
loop_body: /* nothing */
| loop_body expr terminator
loop_body terminator
One possibility: creating a loop expression
There are two questions to answer here:
expr
in loop_body
?We could certainly account for the first question, by using our action to remove each line’s expression from the stack. The second question raises more new questions than it answers; for example, if this construct is an expression, does that mean we can use it as the right-side of an assignment?
For the following example, ask yourself:
z
have?x = 10
y = 1
z = while (x > 0) {
y = y * x
x = x - 1
}
While this is perfectly acceptable in some languages, for the sake of adhering to C-like language constructs, (and consistency with UPE), we’ll decide that no, we will not allow this. Instead, we will distinguish expressions, which leave a value on the top of the machine stack, from statements, which do not.
Since we’ll want to combine expressions and statements together in our loop bodies, we’ll say that every expression may be used as a statement, leading us to the following productions:
: expr { install_instruction(inst_pop); }
stmt| '{' stmtlist '}
: /* nothing */
stmtlist| stmtlist stmt terminator
| stmtlist terminator
Our stmt: expr
prodcution
allows us to use any expression in our loop bodies, and the stmtlist
productions allow as many
statements inside braces as we want, including empty lines.
We must also make sure that it’s acceptable for statements to appear
as top-level entries in our list
production. Unlike
expressions, there is no value to print, so our action will merely be to
STOP
.
list: /* nothing */
| list terminator+ | list stmt terminator
+ {
+ install_instruction(STOP_INST);
+ YYACCEPT;
+ }
| list assign terminator
{
install_instruction(inst_pop);
install_instruction(STOP_INST);
YYACCEPT; }
STOP
after a stmt
?
execute()
function might continue to execute instructions installed by older
statements / expressions.
Sadly, this will add another shift/reduce conflict to our compilation
output. (stmtlist
and list
both match
terminator-separated statements).
If you’re following along, the following program should now parse correctly; notice that once you type the opening brace, the REPL will not evaluate or print anything until you close it and write an expression on a new line.
{
= 12
x = 14
y = x + y
z }
z
Testing statements
Next, let’s add the infrastructure required for specifying and “jumping to” a specific location in our VM program.
In order for our language to include flow control, there must be the concept of an instruction’s address within the current program, and the ability for the machine to change its program counter to a given address.
To represent addresses, we add an Addr
typedef, which
simply points to a location in our prog
array; that is, a
MachineCode
pointer. So that we can use the
Addr
type inside our MachineCode
definition as
well, we need to forward-declare the MachineCode
struct, leading to a small rework of the declaration.
+ typedef struct MachineCode MachineCode;
+ typedef MachineCode *Addr;
- typedef struct MachineCode {
+ struct MachineCode {
+ CodeType type;
+ union {
+ double literal;
+ Inst inst;
+ Symbol *symbol;
+ Addr addr;
+ }; // anonymous union; no need to name it
- } MachineCode;
+ };
+
In our diagram of a compiled while
statement, we see
several GO
actions that seem like they would need
addresses.
Thus, we’ll need to be able to embed an address as
part of an instruction, so we need to allow our MachineCode
objects to hold addresses.
...
typedef int (*Inst)(void);
typedef enum CodeType {
CT_INST = 'I',
CT_LITERAL = 'L',
CT_SYM_REF = 'S',+ CT_ADDR = 'A',
} CodeType;
struct MachineCode {
CodeType type;
union {
double literal;
Inst inst;
Symbol *symbol;+ Addr addr;
}; };
In order to embed addresses into our instructions, they will need to
be available at parse time. Luckily, since
hoc4
moved value computation from the parser to the
machine, our expressions no longer produced any value for use by other
actions. Now that larger statements will need to use the addresses of
child statements, we can update our productions to ensure that each
expression and statement will return the address at which its
parsed code block starts.
First, we need to enable the machine to report its current “installation address” through its code-installation interface. The parsing routines can use these addresses to report where they’ve installed code.
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full.+ *
+ * Returns the address at which the code was installed.
*/- void install_code(MachineCode mc);
+ Addr install_code(MachineCode mc);
+ /** The next available address in the machine's current program */
+ extern Addr progp;
The machine implementation changes only slightly; we expose
progp
externally, and update install_code
to
return the installation address:
-static MachineCode *progp; // Next free location for code installation
+Addr progp; // Next free location for code installation
...
-void install_code(MachineCode mc) {
+Addr install_code(MachineCode mc) {
if (progp >= prog + PROGSIZE) {
exec_error("Max program size exceeded");
}- *(progp++) = mc;
+ *progp = mc;
+ return progp++; // return location of THIS instruction
}
Next, we can update our grammar to require every expression and
statement to report their start address. This means any
nonterminals used by expr
, such as assign
or
call
, will also return addresses.
%union {
double hoc_value;
Symbol *hoc_symbol;+ Addr hoc_address;
}
...+ %type <hoc_address> assign expr call end stmt stmtlist
Most of the rules are straightforward; for example,
NUMBER
and VAR
expressions just return the
address of the first instruction they install.
expr: NUMBER
{- install_instruction(inst_pushlit);
$$ = install_instruction(inst_pushlit);
install_literal($1);
}
| '@'
{- install_instruction(inst_pushlast);
+ $$ = install_instruction(inst_pushlast);
}
| assignable
{- install_instruction(inst_pushref);
+ $$ = install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_eval); }
Returning addresses of numbers and variables hoc5/hoc.y
Surprisingly, our arithmetic, relational, and boolean operator
actions require no changes at all. Why? The default value produced by an
action, if we don’t specify one, is $1
; that is, all actions act as though
they begin with $$ = $1
.
When we are handling the operator productions, we’ve already processed the operand expressions. Since all arithmetic/relational/boolean operators are left-associative, their left-hand expression is evaluated first, and it will produce the address corresponding to pushing its value onto the stack.
In the example below, the expression 2 * PI
will return the PROG
address where we start generating that value. In this case,
it’ll produce 10. Since this is exactly what we want,
we can leave the default behavior.
Notice that this behavior is why our machine expects math-operation arguments in right-to-left order. Since the left-hand side is produced first, it’s lower on the stack!
Like our other operators, by the time we are reducing an assignment, we’ve already installed instructions to generate the left and right sides. Unlike the other operators, assignment is right-associative, and so the first instruction executed to generate the assignment actually comes from generating the right-side operand.
assign: assignable '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignvar);+ $$ = $3; /* instructions begin with RHS */
}
| assignable ':' '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignconst);+ $$ = $4; /* instructions begin with RHS */
}
Similarly, the call
actions must return the address
corresponding to producing their first argument
We can now fill in the actions for our stmt
productions.
For a single-expression statement, we have two jobs:
For a statement list, we have a sequence of zero or more statements. Each statement removes its own stack item, so there’s no need to add any more instructions. But for an empty block, we have no instructions at all; in this case, we’ll produce the address following the block–the next free program address.
: expr { install_instruction(inst_pop); }
stmt| '{' stmtlist '}' { $$ = $2; }
: /* nothing */ { $$ = progp; }
stmtlist| stmtlist stmt terminator { $$ = $1; }
| stmtlist terminator { $$ = $1; }
whilecode
Before we write the parsing logic for while-loop syntax, let’s figure out the structure of the machine instructions, so we have an idea of our “goal output” for the parser.
Our “high-level plan” for while loops was:
This is a complex set of behaviors, and in most real machines, is
achieved by installing “jumping” instructions at appropriate places in
the execution flow. We’ll examine that approach in adhoc
,
but for hoc5
, we’ll create a “super-instruction” that
handles the looping mechanism itself, directly in C:
whilecode
.
whilecode
instructionYou can think of the whilecode
instruction as being a
“variable-length” instruction, which includes the instructions for the
conditional expression and the loop body.
The whilecode
instruction itself is immediately followed
by our two addresses: one for the loop body and the
other immediately following the body.
After the addresses, we have our condition.
Conditions are allowed to be any expression, so we must
be able to run all of the instructions that make up the condition, and
then stop as soon as we’ve produced the result. We already have
a mechanism for doing that: the execute()
method. Whwn we execute a whilecode
instruction,
we’ll treat the condition like a tiny program of its own; we’ll
call execute()
on its address!
This call will simply run all of the instructions from the beginning of
cond
until it reaches a STOP
instruction. (We
should be sure to install this while parsing the cond
).
Likewise, we’ll need to install a STOP
instruction after
the loop body, so we can again evaluate the condition.
whilecode
Assuming we will eventually install this structure into our machine,
we are finally ready to write our execution flow for loops. Upon
entering the instruction, our PC
is already pointing to the
machine code for the body’s address.
int inst_whilecode(void) {
= (pc++)->addr;
Addr body_addr = (pc++)->addr;
Addr end_addr = pc;
Addr cond_addr
while (true) {
(cond_addr);
executeif (stack_pop().value) { // guaranteed: conditions are expressions
(body_addr);
execute} else {
break;
}
}
= end_addr;
pc return 0;
}
While we use recursion to execute our loop, we don’t recurse to leave the loop, since that could quickly lead to infinite recursion.
while
statementsThe syntax of a while statement is familiar:
while (conditional-expression) {
body}
while (conditional-expression) body
We’ll need to parse that syntax into the whilecode
instruction we’ve designed.
In particular, we’ll need to install:
whilecode
instruction itselfSTOP
STOP
This is complicated somewhat by the fact that we won’t know the addresses of the body / post-body instructions until after the body is parsed. To ensure our parse behaviors happen in the appropriate order, we create several “sub-rules” to perform actions at the appropriate times in the parsing process.
while
keyword, we
should reserve space for the whilecode instruction and
its addresses. This implies we’ll need a rule just for that keywordSTOP
instructionBelow you can see the components of our while loop, and how they
correspond to the installed MachineCode
.
while
KeywordFirst off, as soon as we see the while
keyword, we
should reserve space inside prog
for the
whilecode
instruction and its addresses. This reduction
occurs before any other part of the loop is parsed, so these 3 words of
program memory come first.
+while: WHILE_KW
+ {
+ $$ = install_instruction(STOP_INST);
+ install_instruction(STOP_INST); // body-addr
+ install_instruction(STOP_INST); // after-body-addr
+ }
Don’t forget to mark the whilecode
address as the
produced value! Our while
loops will be statements, so they
must produce their starting address.
- %type <hoc_address> assign call expr stmt stmtlist
+ %type <hoc_address> assign call expr stmt stmtlist while
cond
NonterminalThe “conditional” part of our loop is just an expression; the only
detail we must take care of is installing a STOP
instruction before the loop-body instructions are generated. (Since the
condition’s instructions always start two words after the
whilecode
instruction, there’s no need to report it, so we
don’t return anything from a cond
nonterminal.)
%%
+cond: '(' expr ')'
+ {+ install_instruction(STOP_INST);
+ }
STOP
s:
The end
ruleA loop body is just a stmt
, but like our condition, we
must STOP
when it’s complete. We need another
STOP
instruction after the stmt
. We use an
end
nonterminal to do so, as well as to generate the
address following the body.
- %type <hoc_address> assign call expr stmt stmtlist while
+ %type <hoc_address> assign call end expr stmt stmtlist while
end
is a tiny “helper” used to generate the next
available address; this is useful for cases where we might not
have a next statement.
: /* nothing */
end{
(STOP_INST);
install_instruction= progp;
$$ }
while
statementWe can finally combine all of these nonterminals to create our
while
statement structure. At the point we’re reducing this
rule, all addresses have been completely resolved, and so it’s time to
install them into the prog
slots reserved for us by the
while
nonterminal.
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }+ | while cond stmt end
+ {
+ $1[0].type = CT_INST;
+ $1[0].inst = inst_whilecode;
+ $1[1].type = CT_ADDR;
+ $1[1].addr = $3;
+ $1[2].type = CT_ADDR;
+ $1[2].addr = $4;
+ }
Because the if/else statements are so similar to loops, we don’t need to exhaustively cover their implementation. The installed structure is very similar:
if/else
statementsFirst, the if
statement produces an address:
- %type <hoc_address> assign call end expr stmt stmtlist while
+ %type <hoc_address> assign call end expr if stmt stmtlist while
It also reserves space for its addresses, just like
while
:
if: IF_KW
{
= install_instruction(STOP_INST);
$$ (STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction}
Because our else
clause is optional, we include two
forms in our grammar; they install the same set of addresses , but the
if
-only variant leaves its else-body
address
set to the STOP
instruction.
: expr { install_instruction(inst_pop); }
stmt...
| if cond stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[3].type = CT_ADDR;
$1[3].addr = $4;
$}
| if cond stmt end ELSE_KW stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $6;
$1[3].type = CT_ADDR;
$1[3].addr = $7;
$}
Notice that our syntax has relatively strict requirements for the locations of the braces:
// ALLOWED
= 42
x while (x > 0) x = x - 1
// OR
while (x > 0) {
= x - 1
x
print x}
// NOT ALLOWED
while (x > 0)
{
= x - 1
x
print x}
if
and while
constructs, the opening
brace must be on the same line as the closing )
of the conditional expressionelse
constructs, the else
keyword must
be on the same line as the closing }
of the if body.These unfortunate requirements are due to the fact that we treat the
newline terminator as an “execution point” in our grammar (in the
list
rule), so that we can have an interactive REPL. This
could be resolved by treating newlines as whitespace in a non-REPL
version of hoc.
if/else
statementsOur machine implementation is also very similar to
whilecode
, though we do not update our pc
as we go; rather, we choose to be
specific about which word we are accessing.
int inst_ifcode(void) {
= pc[0].addr;
Addr ifbody_addr = NULL;
Addr maybe_elsebody_addr if (!is_stop_inst(pc + 1)) {
= pc[1].addr;
maybe_elsebody_addr }
= pc[2].addr;
Addr end_addr = pc + 3;
Addr cond_addr (cond_addr);
execute= stack_pop(); // must be value
Datum d if (d.value) {
(ifbody_addr);
execute} else if (maybe_elsebody_addr != NULL) {
(maybe_elsebody_addr);
execute}
= end_addr;
pc return 0;
}
print
StatementNow that we have statements, there will be times when we want to display a value inside a loop or conditional body. Since statements do not produce values, the REPL cannot output them. We will need to display them ourselves.
We add a print
statement via a print keyword and
associated prexpr
instruction.
stmt: expr { install_instruction(inst_pop); }
...+ | PRINT_KW expr
+ {
+ install_instruction(inst_prexpr);
+ $$ = $2;
+ }
And the machine implementation is very simple; it’s
print
without a tab.
int inst_prexpr(void) {
= stack_pop();
Datum d ("%.8g\n", d.value);
printfreturn 0;
}
hoc5
finishes the core syntax of logic in
hoc
; we’ve finished the core vocabulary of the language.
What remains to build are tools for abstraction; we need a way
to reuse blocks of code, rather than writing a program as one huge
top-to-bottom script!
hoc6
, the last version of
hoc
presented in The UNIX Programming Environment,
solves that issue. It includes the largest and most interesting new
feature of any so far: user-defined subroutines! So grab a coffee (or
take a break), and then we’ll continue onward.
?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
CFLAGS ?= -d
YFLAGS =-lm
LDFLAGS:= builtins.o math.o symbol.o machine.o hoc.o
objects
.PHONY:all
: hoc
all
: $(objects)
hoc(CFLAGS) $(objects) $(LDFLAGS) -o hoc
cc $
.c hoc.tab.h: hoc.y
hoc(YFLAGS) hoc.y -o hoc.tab.c
yacc $.tab.c hoc.c
mv hoc
%.o: hoc.h hoc.tab.h %.c
(CFLAGS) -c -o $@ $*.c
cc $
.PHONY:clean
:
clean-f hoc.tab.h
rm -f $(objects)
rm -f hoc.c
rm -f hoc rm
#include "hoc.h"
#include "hoc.tab.h"
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> // rand
/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
(double), Exp(double), Integer(double), Lg(double), Ln(double),
Cos(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);
Log10
static struct {
const char *name;
double value;
} constants[] = {
{"PI", 3.14159265358979323846},
{"E", 2.71828182845904523536},
{NULL, 0},
};
static struct {
const char *name;
int args;
union {
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
} builtins[] = {
{"abs", 1, .call1 = Abs},
{"acos", 1, .call1 = Acos},
{"atan", 1, .call1 = Atan},
{"atan2", 2, .call2 = Atan2},
{"cos", 1, .call1 = Cos},
{"exp", 1, .call1 = Exp},
{"int", 1, .call1 = Integer},
{"lg", 1, .call1 = Lg},
{"ln", 1, .call1 = Ln},
{"log10", 1, .call1 = Log10},
{"pow", 2, .call2 = Pow},
{"rand", 0, .call0 = Random},
{"sin", 1, .call1 = Sin},
{"sqrt", 1, .call1 = Sqrt},
{NULL, 0, .call0 = NULL},
};
static struct {
const char *name;
int token_type;
} keywords[] = {
{"if", IF_KW},
{"else", ELSE_KW},
{"print", PRINT_KW},
{"while", WHILE_KW},
{NULL, 0},
};
extern void command_list(void);
static struct {
const char *name;
void (*command_func)(void);
} commands[] = {
{"LIST", command_list},
{NULL, 0},
};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
(constants[i].name, CONST, constants[i].value);
install_symbol}
for (int i = 0; builtins[i].name != NULL; i++) {
*s = install_symbol(builtins[i].name, BUILTIN, 0.0);
Symbol // TODO: Why not have separate install_value vs install_function?
->data.func.args = builtins[i].args;
sswitch (builtins[i].args) {
case 0:
->data.func.call0 = builtins[i].call0;
sbreak;
case 1:
->data.func.call1 = builtins[i].call1;
sbreak;
case 2:
->data.func.call2 = builtins[i].call2;
sbreak;
default:
(
fprintf, "Invalid arg count for symbol '%s'\n", builtins[i].name);
stderr(1);
exit}
}
for (int i = 0; keywords[i].name != NULL; i++) {
(keywords[i].name, keywords[i].token_type, 0.0);
install_symbol}
for (int i = 0; commands[i].name != NULL; i++) {
*s = install_symbol(commands[i].name, COMMAND, 0.0);
Symbol ->data.command_func = commands[i].command_func;
s}
}
#ifndef HOC_INC
#define HOC_INC 1
/*
* Global types & declarations for use in hoc
*/
#include <stddef.h>
///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------
#define SYMBOL_NAME_LIMIT 100
typedef struct Symbol Symbol;
/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
int args;
union {
/* We use an anonymous union for the function pointer; there's
no meaningful name for the union as a field */
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
};
struct Symbol {
short type; // Generated from our token types
char *name;
*next;
Symbol union {
double val;
struct BuiltinFunc func;
void (*command_func)(void);
} data;
};
*install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);
Symbol
///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------
typedef int (*Inst)(void);
typedef enum CodeType {
= 'I',
CT_INST = 'L',
CT_LITERAL = 'S',
CT_SYM_REF = 'A'
CT_ADDR } CodeType;
typedef struct MachineCode MachineCode;
typedef MachineCode *Addr;
struct MachineCode {
;
CodeType typeunion {
double literal;
;
Inst inst*symbol;
Symbol ;
Addr addr}; // anonymous union; no need to name it
};
///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full.
*
* Returns the address at which the code was installed.
*/
(MachineCode mc);
Addr install_code
/** The next available address in the machine's current program */
extern Addr progp;
#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)
#define install_literal(lit) \
install_code((struct MachineCode){.type = CT_LITERAL, .literal = lit})
#define install_ref(sp) \
install_code((struct MachineCode){.type = CT_SYM_REF, .symbol = sp})
#define install_instruction(instr) \
install_code((struct MachineCode){.type = CT_INST, .inst = instr})
#define install_address(address) \
install_code((struct MachineCode){.type = CT_ADDR, .addr = address})
void execute(MachineCode *addr);
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
int inst_add(void);
int inst_and(void);
int inst_assignvar(void);
int inst_assignconst(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_eq(void);
int inst_gt(void);
int inst_gte(void);
int inst_ifcode(void);
int inst_lt(void);
int inst_lte(void);
int inst_mul(void);
int inst_neq(void);
int inst_not(void);
int inst_or(void);
int inst_pop(void);
int inst_prexpr(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
int inst_whilecode(void);
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void command_list(void);
void machine_run_command(Symbol *);
///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------
typedef union Datum {
double value;
*symbol;
Symbol } Datum;
void machine_startup(void);
void machine_reset_program(void);
///----------------------------------------------------------------
/// Other
///----------------------------------------------------------------
/** global error handling */
((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);
__attribute__
/** builtins */
void install_hoc_builtins(void);
#endif // HOC_INC
%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------
/*
* "higher-order calculator" - Version 5
* Adapted from "The UNIX Programming Environment"
*/
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include "hoc.h"
///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------
;
jmp_buf begin
///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------
#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);
/*
* Grammar points:
*
* - EXPRESSIONS provide a value on the top of the satck.
* - STATEMENTS are not expressions, and do not provide a value.
*/
%}
%union {
double hoc_value;
*hoc_symbol;
Symbol ;
Addr hoc_address}
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF
%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW COMMAND
%type <hoc_symbol> assignable
%type <hoc_address> assign call end expr if stmt stmtlist while
%right '=' /* right-associative, much like C */
%left OR
%left AND
%left EQ GT GTE LT LTE NEQ
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */
%right '^' /* exponents */
%left UNARY_MINUS NOT /* Even higher precedence
than mul or div. Can't use '-', as we've used
it already for subtraction. */
%%
: /* nothing */
list| list terminator
| list stmt terminator
{
(STOP_INST);
install_instruction;
YYACCEPT}
| list assign terminator
{
(inst_pop);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list expr terminator
{
(inst_print);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list error terminator { yyerrok; }
;
: expr { install_instruction(inst_pop); }
stmt| '{' stmtlist '}' { $$ = $2; }
| while cond stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_whilecode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $4;
$}
| if cond stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[3].type = CT_ADDR;
$1[3].addr = $4;
$}
| if cond stmt end ELSE_KW stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $6;
$1[3].type = CT_ADDR;
$1[3].addr = $7;
$}
| PRINT_KW expr
{
(inst_prexpr);
install_instruction= $2;
$$ }
| COMMAND
{
($1);
machine_run_command= progp;
$$ }
;
: /* nothing */ { $$ = progp; }
stmtlist| stmtlist stmt terminator { $$ = $1; }
| stmtlist terminator { $$ = $1; }
;
while: WHILE_KW
{
= install_instruction(STOP_INST);
$$ (STOP_INST);
install_instruction(STOP_INST);
install_instruction}
;
if: IF_KW
{
= install_instruction(STOP_INST);
$$ (STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction}
;
: '(' expr ')'
cond{
(STOP_INST);
install_instruction}
;
: /* nothing */
end{
(STOP_INST);
install_instruction= progp;
$$ }
;
/*
* Every expr construct, when *executed*, must push a value onto the
* stack
*/
: NUMBER
expr{
= install_instruction(inst_pushlit);
$$ ($1);
install_literal}
| '@'
{
= install_instruction(inst_pushlast);
$$ }
| assignable
{
= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_eval);
install_instruction}
| assign
| call
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }
| expr LT expr { install_instruction(inst_lt); }
| expr LTE expr { install_instruction(inst_lte); }
| expr GT expr { install_instruction(inst_gt); }
| expr GTE expr { install_instruction(inst_gte); }
| expr EQ expr { install_instruction(inst_eq); }
| expr NEQ expr { install_instruction(inst_neq); }
| expr AND expr { install_instruction(inst_and); }
| expr OR expr { install_instruction(inst_or); }
| expr '^' expr
{
(inst_pushref);
install_instruction(lookup("pow"));
install_ref(inst_call);
install_instruction}
| '-'expr %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
{
(inst_pushlit);
install_instruction(-1);
install_literal(inst_mul);
install_instruction=$2;
$$}
| NOT expr %prec UNARY_MINUS
{
(inst_not);
install_instruction=$2;
$$}
| '(' expr ')' { $$ = $2; }
;
: assignable '=' expr
assign{
(inst_pushref);
install_instruction($1);
install_ref(inst_assignvar);
install_instruction= $3; /* instructions begin with RHS */
$$ }
| assignable ':' '=' expr
{
(inst_pushref);
install_instruction($1);
install_ref(inst_assignconst);
install_instruction= $4; /* instructions begin with RHS */
$$ }
;
: BUILTIN '(' ')'
call{
($1, 0);
check_call_args= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_call);
install_instruction}
| BUILTIN '(' expr ')'
{
($1, 1);
check_call_args(inst_pushref);
install_instruction($1);
install_ref(inst_call);
install_instruction= $3; /* Block starts with pushing args */
$$ }
| BUILTIN '(' expr ',' expr ')'
{
($1, 2);
check_call_args= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_call);
install_instruction= $3; /* Block starts with pushing args */
$$ }
;
: VAR | CONST
assignable;
: '\n' | ';'
terminator;
%% // end of grammar
/* error tracking */
char *program_name;
int line_number = 1;
int main(int argc, char *argv[]) {
(void)argc;
= argv[0];
program_name ();
install_hoc_builtins();
machine_startup
(begin);
setjmp();
machine_reset_programwhile (yyparse() == YACC_SUCCESS && !feof(stdin)) {
(NULL);
execute();
machine_reset_program}
return 0;
}
/**
* "Looks ahead 1 character for a possibility. If it's found, returns
* `match`. If not:
* - places character back onto input stream for next read
* - returns the `otherwise` result.
*/
static int lookahead(char expect, int match, int otherwise) {
int c = getchar();
if (c == expect) {
return match;
} else {
(c, stdin);
ungetcreturn otherwise;
}
}
/* our simple, hand-rolled lexer. */
int yylex(void) {
static bool hit_eof = false;
int c;
// Skip space
do {
= getchar();
c } while (c == ' ' || c == '\t');
/* Newline / EOF handling:
* The first time we see an EOF, we "fake" a newline,
* so that the list production can recognize any in-progress
* statement. The next call will return the EOF.
*/
if (c == EOF && !hit_eof) {
(c, stdin);
ungetc= true;
hit_eof = '\n';
c }
if (c == '\n') {
++;
line_number} else {}
// Get numbers
if (c == '.' || isdigit(c)) {
(c, stdin);
ungetc("%lf", &yylval.hoc_value);
scanfreturn NUMBER;
}
if (isalpha(c)) {
*s;
Symbol
char buf[SYMBOL_NAME_LIMIT + 1];
size_t nread = 0;
do {
[nread++] = c;
buf= getchar();
c } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));
// Just in case we exceeded the limit
while ((isalpha(c) || isdigit(c))) {
= getchar();
c }
// at this point, we have a non-alphanumeric 'c'
(c, stdin);
ungetc
[nread] = '\0';
bufif ((s=lookup(buf)) == NULL) {
= install_symbol(buf, UNDEF, 0.0);
s }
.hoc_symbol = s;
yylvalreturn (s->type == UNDEF ? VAR : s->type);
}
switch (c) {
case '<':
return lookahead('=', LTE, LT);
case '>':
return lookahead('=', GTE, GT);
case '=':
return lookahead('=', EQ, '=');
case '!':
return lookahead('=', NEQ, NOT);
case '|':
return lookahead('|', OR, '|');
case '&':
return lookahead('&', AND, '&');
default:
return c;
}
}
void yyerror(const char *s) {
("%s", s);
warning}
/*
* Verify Symbol's declared arg count matches actual,
* or display a helpful error message with mismatch
*/
void check_call_args(const Symbol *s, int actual) {
int expected = s->data.func.args;
if (expected != actual) {
("Wrong number of arguments for %s: expected %d, got %d",
exec_error->name, expected, actual);
s}
}
#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)
void warning(const char *msg, ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end}
void exec_error(const char *msg , ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end
(begin, 0);
longjmp}
#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/*
* MACHINE STATE
* =============
*
* We have our "program" stored as a list of MachineCode
* objects, each encoding one of:
* - an instruction
* - a literal value
* - a symbol reference
*
* We "execute" our program by setting the program counter (pc)
* pointer to the first MachineCode (which *must* be an instruction),
* and handling each instruction we find. Within an instruction, the
* program counter might be updated so that by the *end* of any
* instruction, the program counter is always pointing to another
* instruction.
*
*
* The Stack
* ---------
*
* Each instruction manipulates the stack, pushing,
* popping, or combining values from the stack. The stack elements
* are Datum objects, which act much like MachineCodes, but with two
* differences.
*
* 1. Datums cannot represent an instruction; they resolve to a value
* 2. The type of a datum can always be resolved from their context,
* and so no "type" field is necessary.
*
* ┌────────────────────────────────────────────────────┐
* │ == PROG == == STACK == │
* │ ┌─────────────┐◀───┐ ┌─────────────┐ │
* │ │ MachineCode │ pc │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ progp │ Datum │ stackp │
* │ ├─────────────┤◀────┘ ├─────────────┤ ◀──┘ │
* │ │ ... │ │ ... │ │
* │ └─────────────┘ └─────────────┘ │
* └────────────────────────────────────────────────────┘
*
*/
#define PROGSIZE 2000
static MachineCode prog[PROGSIZE];
; // Next free location for code installation
Addr progpstatic Addr pc; // the current location in the executing program
static Addr highest_addr_written = prog; // Used in LISTing contents
// stack
#define STACKSIZE 256
static Datum stack[STACKSIZE]; // the machine stack
static Datum *stackp; // next free spot on the stack
static Datum last_popped = {.value = 0}; // The last value popped
static void stack_push(Datum d);
static Datum stack_pop(void);
/** Initialize machine state for startup */
void machine_startup(void) {
(time(NULL)); // use current time as seed for random generator
srand();
machine_reset_program}
/** Initialize execution state for a new program */
void machine_reset_program(void) {
= stack;
stackp = prog;
progp }
///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------
(MachineCode mc) {
Addr install_codeif (progp >= prog + PROGSIZE) {
("Max program size exceeded");
exec_error}
if (progp > highest_addr_written) {
= progp;
highest_addr_written }
*progp = mc;
return progp++; // return location of THIS instruction
}
///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------
void execute(MachineCode *addr) {
= (addr == NULL) ? prog : addr;
pc
while (!is_stop_inst(pc)) {
if (pc >= progp) {
("PC was past end-of-program");
exec_error}
if (pc->type != CT_INST) {
("Programming error, type '%c' was not INST!", pc->type);
exec_error}
(*(*pc++).inst)(); // increment PC and execute inst
}
}
/** Push new value onto stack */
static void stack_push(Datum d) {
if (stackp >= stack + STACKSIZE) {
("Stack overflow");
exec_error}
*(stackp++) = d; // copy
}
static Datum stack_pop(void) {
if (stackp <= stack) {
("Stack underflow");
exec_error}
= *(--stackp);
Datum d return (last_popped = d);
}
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
#define dat(x) ((Datum){.value = x})
int inst_pushlit(void) {
(pc->type == CT_LITERAL);
assert
;
Datum d.value = pc->literal;
d(d);
stack_push++;
pcreturn 0;
}
int inst_pushref(void) {
(pc->type == CT_SYM_REF);
assert
;
Datum d.symbol = pc->symbol;
d(d);
stack_push++;
pcreturn 0;
}
int inst_pushlast(void) {
(last_popped);
stack_pushreturn 0;
}
int inst_eval(void) {
= stack_pop(); // must be symbol
Datum d switch (d.symbol->type) {
case UNDEF:
("Undefined variable: '%s'", d.symbol->name);
exec_errorbreak;
case BUILTIN:
("Builtin '%s' cannot be evaluated", d.symbol->name);
exec_errorbreak;
default: // VAR,CONST
(dat(d.symbol->data.val));
stack_pushbreak;
}
return 0;
}
int inst_whilecode(void) {
/*
┌─────────────────────────────┐
│ whilecode │
├─────────────────────────────┤
│ address(body) │
├─────────────────────────────┤
│ address(stmt after body) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ body │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after body │
└─────────────────────────────┘
*/
= (pc++)->addr;
Addr body_addr = (pc++)->addr;
Addr end_addr = pc;
Addr cond_addr
while (true) {
(cond_addr);
executeif (stack_pop().value) { // guaranteed: conditions are expressions
(body_addr);
execute} else {
break;
}
}
= end_addr;
pc return 0;
}
int inst_ifcode(void) {
/*
┌─────────────────────────────┐
│ ifcode │
├─────────────────────────────┤
│ address(ifbody) │
├─────────────────────────────┤
│ address(elsebody) │
├─────────────────────────────┤
│ address(stmt after if) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ ifbody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ elsebody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after if │
└─────────────────────────────┘
*/
= pc[0].addr;
Addr ifbody_addr = pc[1];
MachineCode maybe_elsebody = pc[2].addr;
Addr end_addr = pc + 3;
Addr cond_addr
(cond_addr);
execute= stack_pop(); // must be value
Datum d if (d.value) {
(ifbody_addr);
execute} else if (maybe_elsebody.type == CT_ADDR) {
(maybe_elsebody.addr);
execute}
= end_addr;
pc return 0;
}
int inst_call(void) {
= stack_pop(); // must be symbol
Datum s if (s.symbol->type != BUILTIN) {
("Cannot call non-function '%s'", s.symbol->name);
exec_error}
double result;
switch (s.symbol->data.func.args) {
case 0:
= s.symbol->data.func.call0();
result break;
case 1: {
= stack_pop();
Datum arg = s.symbol->data.func.call1(arg.value);
result break;
}
case 2: {
= stack_pop();
Datum arg2 = stack_pop();
Datum arg1 = s.symbol->data.func.call2(arg1.value, arg2.value);
result break;
}
}
(dat(result));
stack_pushreturn 0;
}
int inst_assignvar(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = VAR;
lhs(dat(rhs.value));
stack_pushreturn 0;
}
int inst_assignconst(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (lhs.symbol->type == CONST) {
("Cannot reassign constant '%s'", lhs.symbol->name);
exec_error} else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = CONST;
lhs(dat(rhs.value));
stack_pushreturn 0;
}
int inst_pop(void) {
();
stack_popreturn 0;
}
int inst_print(void) {
= stack_pop();
Datum d ("\t%.8g\n", d.value);
printfreturn 0;
}
int inst_prexpr(void) {
= stack_pop();
Datum d ("%.8g\n", d.value);
printfreturn 0;
}
int inst_add(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value + rhs.value));
stack_pushreturn 0;
}
int inst_and(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs (dat(lhs.value && rhs.value));
stack_pushreturn 0;
}
int inst_or(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs (dat(lhs.value || rhs.value));
stack_pushreturn 0;
}
int inst_not(void) {
= stack_pop(); // must be value
Datum d .value = !d.value;
d(d);
stack_pushreturn 0;
}
int inst_lt(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value < rhs.value));
stack_pushreturn 0;
}
int inst_lte(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value <= rhs.value));
stack_pushreturn 0;
}
int inst_gt(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value > rhs.value));
stack_pushreturn 0;
}
int inst_gte(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value >= rhs.value));
stack_pushreturn 0;
}
int inst_eq(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value == rhs.value));
stack_pushreturn 0;
}
int inst_neq(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value != rhs.value));
stack_pushreturn 0;
}
int inst_sub(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value - rhs.value));
stack_pushreturn 0;
}
int inst_div(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
if (rhs.value == 0) {
("Division by zero");
exec_error} else {
(dat(lhs.value / rhs.value));
stack_push}
return 0;
}
int inst_mul(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs (dat(lhs.value * rhs.value));
stack_pushreturn 0;
}
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void machine_run_command(Symbol *command_symbol) {
->data.command_func();
command_symbol}
#define check_inst(inst_name) \
else if (c.inst == inst_##inst_name) do { \
printf(inst_template, line, (void *)(prog + i), "INST", #inst_name); \
} \
while (false)
void command_list(void) {
// Lists contents of prog until we hit two stop instructions in a row
#define header_template "%6s | %11s | %10s | %-30s\n"
#define inst_template "%6ld | %11p | %10s | %-30s"
#define rightline_template "%6ld | %11p | %10s | %30s"
#define lit_template "%6ld | %11p | %10s | %30f"
#define addr_template "%6ld | %11p | %10s | %11p (Line %3ld)"
(header_template, "LINE", "ADDRESS", "TYPE", "CODE");
printf("------------------------------------------------------------------"
printf"\n");
for (size_t i = 0; prog + i <= highest_addr_written; i++) {
size_t line = i + 1;
= prog[i];
MachineCode c void *cp = &prog[i];
switch (c.type) {
case (CT_LITERAL):
(lit_template, line, cp, "L", c.literal);
printfbreak;
case (CT_SYM_REF):
(rightline_template, line, cp, "SYM", c.symbol->name);
printfbreak;
case (CT_ADDR): {
size_t addr_line = (c.addr - prog) + 1;
(addr_template, line, cp, "ADDR", (void *)c.addr, addr_line);
printfbreak;
}
case (CT_INST):
if (is_stop_inst(&c)) {
(inst_template,
printf,
line,
cp"INST",
"------------ STOP ------------");
}
(add);
check_inst(and);
check_inst(assignvar);
check_inst(assignconst);
check_inst(call);
check_inst(div);
check_inst(eq);
check_inst(eval);
check_inst(gt);
check_inst(gte);
check_inst(ifcode);
check_inst(lt);
check_inst(lte);
check_inst(mul);
check_inst(neq);
check_inst(not );
check_inst(or);
check_inst(pop);
check_inst(prexpr);
check_inst(print);
check_inst(pushlast);
check_inst(pushlit);
check_inst(pushref);
check_inst(sub);
check_inst(whilecode);
check_instelse {
(inst_template, line, cp, "INST", "UNKNOWN!?");
printf}
break;
default:
("Unknown Code Type: %c\n", c.type);
printf}
("\n");
printf}
}
#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h> // rand
#define HOC_RAND_MAX 1000000L
static double check_math_err(double result, const char *op);
double Abs(double x) {
return check_math_err(fabs(x), "fabs");
}
double Acos(double x) {
return check_math_err(acos(x), "acos");
}
double Atan(double x) {
return check_math_err(atan(x), "atan");
}
double Atan2(double x, double y) {
return check_math_err(atan2(x, y), "atan2");
}
double Cos(double x) {
return check_math_err(cos(x), "cos");
}
double Exp(double x) {
return check_math_err(exp(x), "exp");
}
double Integer(double x) {
return (double)(long)x;
}
double Lg(double x) {
return check_math_err(log2(x), "lg");
}
double Ln(double x) {
return check_math_err(log(x), "log");
}
double Log10(double x) {
return check_math_err(log10(x), "log10");
}
double Pow(double x, double y) {
return check_math_err(pow(x, y), "exponentiation");
}
double Random(void) {
// We have HOC_RAND_MAX buckets
const int bucket_size = RAND_MAX / HOC_RAND_MAX;
int result;
do {
= rand() / bucket_size;
result } while (result >= HOC_RAND_MAX);
return result + 1;
}
double Sin(double x) {
return check_math_err(sin(x), "sin");
}
double Sqrt(double x) {
return check_math_err(sqrt(x), "sqrt");
}
static double check_math_err(double result, const char *op) {
static const char *domain_msg = "argument outside domain";
static const char *range_msg = "result outside range";
static const char *other_msg = "floating-point exception occurred";
const char *msg = NULL;
if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
// Special case: Inexact results are not errors.
if (fetestexcept(FE_INEXACT)) {
goto done;
}
if (fetestexcept(FE_INVALID)) {
= domain_msg;
msg } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
= range_msg;
msg } else { // unknown
= other_msg;
msg }
(FE_ALL_EXCEPT);
feclearexcept} else if (errno) {
if (errno == EDOM) {
= domain_msg;
msg } else if (errno == ERANGE) {
= range_msg;
msg } else {
= other_msg;
msg }
= 0;
errno }
if (msg != NULL) {
("math error during %s: %s", op, msg);
exec_error}
:
donereturn result;
}
#include "hoc.h"
#include "hoc.tab.h" // generated from yacc -d on our grammar
#include <stddef.h> // NULL
#include <stdlib.h> // malloc
#include <string.h> // strcmp
static Symbol *symbol_table = NULL;
void *emalloc(size_t nbytes);
*lookup(const char *name) {
Symbol *current = symbol_table;
Symbol while (current != NULL) {
if (strcmp(current->name, name) == 0) {
return current;
}
= current->next;
current }
return NULL;
}
*install_symbol(const char *name, short type, double value) {
Symbol size_t name_len = strlen(name);
*s = emalloc(sizeof(Symbol) // actual symbol
Symbol + (sizeof(char)) * (name_len + 1)); // room for name
->name = (char *)(s + 1);
s(s->name, name);
strcpy->name[name_len] = '\0';
s
->type = type;
s->data.val = value;
s->next = symbol_table;
s= s;
symbol_table return s;
}
void *emalloc(size_t nbytes) {
void *ptr = malloc(nbytes);
if (ptr == NULL) {
("Out of memory!");
exec_error}
return ptr;
}