hoc5 builds on the VM model constructed for
hoc4, enabling more complex constructs like
if/else branching and while loops. As we’ll
see, these flow-control constructs will be more useful if we can also
perform some limited output, so we’ll add a print keyword
as well.
These constructs are the first step to executing larger programs,
instead of the single-line “programs” executed by hoc4.
We’ll begin with an overview of these new features to better understand
how or implementation might change.
The first feature we’ll need for conditional logic is a set of
relational operators, so that we can compare two
values. We’ll add the standard set of ==, !=, <, <=, >, and >=.
These operators, like our arithmetic operators, will be compiled into
instructions which will pop two values from the stack, then push the
value 1 if the given comparison is true, and 0
otherwise.
The boolean operators, && and ||, operate the same way. In
hoc5, unlike most languages, we’ll always evaluate both
operands, instead of short-circuiting. We do want to inherit
C’s approach to boolean precedence, however. Namely, the boolean
operators should have a very low precedence, so that
expressions like:
x < y && y < z
Are evaluated as:
Instead of:
The hoc language up to this point is meant for
line-by-line interactions. The user enters an expression, and
hoc prints a response. This “calculator-style” of interaction does not
work as well once we introduce conditional logic. Consider:
x = 10
y = 1
while (x > 0) {
y = y * x
x = x - 1
}
lg(y)hoc code for lg(10!)
The first two lines of this example will execute correctly in
hoc4. Each one will be parsed into a tiny “assignment”
program, and execute()ed in the virtual machine.
The while loop, however, presents some brand-new
challenges. There are at least 3 hoc4 “programs” we can see
inside this snippet of code.
Furthermore, some of these programs will be executed a variable number of times:
Certainly we must always execute the instructions to evaluate the condition, since that will determine whether to enter the loop body or not.
Our loop body, and its multiple internal expressions, will only execute if the condition is true. And if it’s true, we should return to execute it again after the body.
Finally, once the condition evaluates to false, then we must skip execution of the body, and move to the instructions after the loop body.
Obviously, our parser cannot know how many iterations of the loop we will run, so the machine must be able to “jump” between the condition, body, and post-body sections on its own.
We can visualize its behavior as follows:
We’ll need a new set of instructions for this; branching
instructions, which allow us to move our pc to a
new value and continue executing.
We’ll still need some help from the parser, since our
machine needs to be told what address in the program to
jump to. We can figure this out at parsing time, but only
after the whole loop is parsed and all of its code is
installed. So, we’ll need to be able to install the
entire while construct, including the conditional
expression and loop body, before we execute any of it.
This will require some structural changes to our grammar.
For the most part, implementing relational operations is a
straightforward extension of our current arithmetic operations. There is
a small additional challenge regarding their syntax: the operators
themselves may be 1 or 2 characters,
and their prefixes overlap. For example, we must be able to distinguish
< from <=, regardless of whitespace and
surrounding digits/letters, e.g. 2 <=5 vs 2< 5.
The relational operations are left-associative, just like arithmetic.
They are set to a lower priority than any arithmetic operation,
since we want x < y + 2 to mean
what we would expect. Similarly, we want most operations to be complete
before evaluating the OR and AND operators.
%type <hoc_symbol> assignable
%right '=' /* right-associative, much like C */
+ %left OR
+ %left AND
+ %left EQ GT GTE LT LTE NEQ
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */
%left UNARY_MINUS NOT /* Prefix operators have highest priority*Next, at the end of the lexing code, we add logic to handle the case where the character we’ve read matches more than one operator.
/* our simple, hand-rolled lexer. */
int yylex(void) {
static bool hit_eof = false;
int c;
// Skip space
// Get numbers
...
if (isalpha(c)) { // Symbols
...
+ switch (c) {
+ case '<':
+ return lookahead('=', LTE, LT);
+ case '>':
+ return lookahead('=', GTE, GT);
+ case '=':
+ return lookahead('=', EQ, '=');
+ case '!':
+ return lookahead('=', NEQ, NOT);
+ case '|':
+ return lookahead('|', OR, '|');
+ case '&':
+ return lookahead('&', AND, '&');
+ default:
+ return c;
+ }
+ }
+The lookahead function “peeks” at the next character in
our input stream. If it matches a two-character operator, then we
consume both characters and return the corresponding token. Otherwise,
we put the the character back onto the front of the input stream for the
next lexing pass, and return the otherwise token.
/**
* "Looks ahead 1 character for a possibility. If it's found, returns
* `match`. If not:
* - places character back onto input stream for next read
* - returns the `otherwise` result.
*/
static int lookahead(char expect, int match, int otherwise) {
int c = getchar();
if (c == expect) {
return match;
} else {
ungetc(c, stdin);
return otherwise;
}
}The relational operations fit nicely into our existing
expr productions. We’ll need to add references to the new
instructions into hoc.h as well.
expr: NUMBER
...
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }
+ | expr LT expr { install_instruction(inst_lt); }
+ | expr LTE expr { install_instruction(inst_lte); }
+ | expr GT expr { install_instruction(inst_gt); }
+ | expr GTE expr { install_instruction(inst_gte); }
+ | expr EQ expr { install_instruction(inst_eq); }
+ | expr NEQ expr { install_instruction(inst_neq); }
+ | expr AND expr { install_instruction(inst_and); }
+ | expr OR expr { install_instruction(inst_or); }
...Each instruction operates just like arithmetic; the operands are already on the stack, in reverse order. We pop both, perform our operation, and push the result.
The <= operator is below;
the rest are similar.
int inst_lte(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value <= rhs.value));
return 0;
}You should now be able to test simple relational expressions, such as
1 <= 0 or 42 > 13 && 0 < 1.
There are several changes we’ll need to make to the language to incorporate flow-control constructs. We’ll add the functionality bit-by-bit, solving one problem at a time.
Before we do anything else, we have one easy requirement: we’ll need
to be able to recognize some keywords for our language
now; hoc5 requires all of: while,
if, else, and print.
Luckily, we already have a mechanism for matching multi-character
tokens in our language; the symbol table. We will extend our
approach to BUILTIN tokens to handle keywords as well.
First, we’ll need to setup tokens for each of the keywords:
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF
+%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW
%type <hoc_symbol> assignableThen, just as we did for our builtin constants and functions, we
create a small array of “input data” structs for use during
install_hoc_builtins, and install them the same way.
+static struct {
+ const char *name;
+ int token_type;
+} keywords[] = {
+ {"if", IF_KW},
+ {"else", ELSE_KW},
+ {"print", PRINT_KW},
+ {"while", WHILE_KW},
+ {NULL, 0},
+};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
install_symbol(constants[i].name, CONST, constants[i].value);
}
+
+ for (int i = 0; keywords[i].name != NULL; i++) {
+ install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
+ }
for (int i = 0; builtins[i].name != NULL; i++) {
...The next problem we’ll tackle actually occurs in the “middle” of our loop constructs. Consider the factorial example from above:
x = 10
y = 1
while (x > 0) {
y = y * x
x = x - 1
}
Let’s focus on the body of this loop. We know that we will need to
parse the entire body before we execute anything. Since our current
grammar stops after every terminated expression, there’s no way for our
current grammar to parse the body before executing it. We’ll need a new
construct for these situations. We could try to add a new kind of
expression, such as loop. Then we could match multiple
“sub-expressions” inside our loop body, using the same kind of
production that we use for our REPL list nonterminal.
e.g.
expr: ...
| while '(' expr ')' '{' loop_body '}'
loop_body: /* nothing */
| loop_body expr terminator
loop_body terminator
One possibility: creating a loop expression
There are two questions to answer here:
expr in loop_body?We could certainly account for the first question, by using our action to remove each line’s expression from the stack. The second question raises more new questions than it answers; for example, if this construct is an expression, does that mean we can use it as the right-side of an assignment?
For the following example, ask yourself:
z have?x = 10
y = 1
z = while (x > 0) {
y = y * x
x = x - 1
}
While this is perfectly acceptable in some languages, for the sake of adhering to C-like language constructs, (and consistency with UPE), we’ll decide that no, we will not allow this. Instead, we will distinguish expressions, which leave a value on the top of the machine stack, from statements, which do not.
Since we’ll want to combine expressions and statements together in our loop bodies, we’ll say that every expression may be used as a statement, leading us to the following productions:
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}
stmtlist: /* nothing */
| stmtlist stmt terminator
| stmtlist terminatorOur stmt: expr prodcution
allows us to use any expression in our loop bodies, and the stmtlist productions allow as many
statements inside braces as we want, including empty lines.
We must also make sure that it’s acceptable for statements to appear
as top-level entries in our list production. Unlike
expressions, there is no value to print, so our action will merely be to
STOP.
list: /* nothing */
| list terminator
+ | list stmt terminator
+ {
+ install_instruction(STOP_INST);
+ YYACCEPT;
+ }
| list assign terminator
{
install_instruction(inst_pop);
install_instruction(STOP_INST);
YYACCEPT;
}STOP
after a stmt?
execute()
function might continue to execute instructions installed by older
statements / expressions.
Sadly, this will add another shift/reduce conflict to our compilation
output. (stmtlist and list both match
terminator-separated statements).
If you’re following along, the following program should now parse correctly; notice that once you type the opening brace, the REPL will not evaluate or print anything until you close it and write an expression on a new line.
{
x = 12
y = 14
z = x + y
}
zTesting statements
Next, let’s add the infrastructure required for specifying and “jumping to” a specific location in our VM program.
In order for our language to include flow control, there must be the concept of an instruction’s address within the current program, and the ability for the machine to change its program counter to a given address.
To represent addresses, we add an Addr typedef, which
simply points to a location in our prog array; that is, a
MachineCode pointer. So that we can use the
Addr type inside our MachineCode definition as
well, we need to forward-declare the MachineCode
struct, leading to a small rework of the declaration.
+ typedef struct MachineCode MachineCode;
+ typedef MachineCode *Addr;
- typedef struct MachineCode {
+ struct MachineCode {
+ CodeType type;
+ union {
+ double literal;
+ Inst inst;
+ Symbol *symbol;
+ Addr addr;
+ }; // anonymous union; no need to name it
- } MachineCode;
+ };
+In our diagram of a compiled while statement, we see
several GO actions that seem like they would need
addresses.
Thus, we’ll need to be able to embed an address as
part of an instruction, so we need to allow our MachineCode
objects to hold addresses.
...
typedef int (*Inst)(void);
typedef enum CodeType {
CT_INST = 'I',
CT_LITERAL = 'L',
CT_SYM_REF = 'S',
+ CT_ADDR = 'A',
} CodeType;
struct MachineCode {
CodeType type;
union {
double literal;
Inst inst;
Symbol *symbol;
+ Addr addr;
};
};In order to embed addresses into our instructions, they will need to
be available at parse time. Luckily, since
hoc4 moved value computation from the parser to the
machine, our expressions no longer produced any value for use by other
actions. Now that larger statements will need to use the addresses of
child statements, we can update our productions to ensure that each
expression and statement will return the address at which its
parsed code block starts.
First, we need to enable the machine to report its current “installation address” through its code-installation interface. The parsing routines can use these addresses to report where they’ve installed code.
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full.
+ *
+ * Returns the address at which the code was installed.
*/
- void install_code(MachineCode mc);
+ Addr install_code(MachineCode mc);
+ /** The next available address in the machine's current program */
+ extern Addr progp;The machine implementation changes only slightly; we expose
progp externally, and update install_code to
return the installation address:
-static MachineCode *progp; // Next free location for code installation
+Addr progp; // Next free location for code installation
...
-void install_code(MachineCode mc) {
+Addr install_code(MachineCode mc) {
if (progp >= prog + PROGSIZE) {
exec_error("Max program size exceeded");
}
- *(progp++) = mc;
+ *progp = mc;
+ return progp++; // return location of THIS instruction
}Next, we can update our grammar to require every expression and
statement to report their start address. This means any
nonterminals used by expr, such as assign or
call, will also return addresses.
%union {
double hoc_value;
Symbol *hoc_symbol;
+ Addr hoc_address;
}
...
+ %type <hoc_address> assign expr call end stmt stmtlistMost of the rules are straightforward; for example,
NUMBER and VAR expressions just return the
address of the first instruction they install.
expr: NUMBER
{
- install_instruction(inst_pushlit);
$$ = install_instruction(inst_pushlit);
install_literal($1);
}
| '@'
{
- install_instruction(inst_pushlast);
+ $$ = install_instruction(inst_pushlast);
}
| assignable
{
- install_instruction(inst_pushref);
+ $$ = install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_eval);
}Returning addresses of numbers and variables hoc5/hoc.y
Surprisingly, our arithmetic, relational, and boolean operator
actions require no changes at all. Why? The default value produced by an
action, if we don’t specify one, is $1; that is, all actions act as though
they begin with $$ = $1.
When we are handling the operator productions, we’ve already processed the operand expressions. Since all arithmetic/relational/boolean operators are left-associative, their left-hand expression is evaluated first, and it will produce the address corresponding to pushing its value onto the stack.
In the example below, the expression 2 * PI will return the PROG
address where we start generating that value. In this case,
it’ll produce 10. Since this is exactly what we want,
we can leave the default behavior.
Notice that this behavior is why our machine expects math-operation arguments in right-to-left order. Since the left-hand side is produced first, it’s lower on the stack!
Like our other operators, by the time we are reducing an assignment, we’ve already installed instructions to generate the left and right sides. Unlike the other operators, assignment is right-associative, and so the first instruction executed to generate the assignment actually comes from generating the right-side operand.
assign: assignable '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignvar);
+ $$ = $3; /* instructions begin with RHS */
}
| assignable ':' '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignconst);
+ $$ = $4; /* instructions begin with RHS */
}Similarly, the call actions must return the address
corresponding to producing their first argument
We can now fill in the actions for our stmt productions.
For a single-expression statement, we have two jobs:
For a statement list, we have a sequence of zero or more statements. Each statement removes its own stack item, so there’s no need to add any more instructions. But for an empty block, we have no instructions at all; in this case, we’ll produce the address following the block–the next free program address.
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }
stmtlist: /* nothing */ { $$ = progp; }
| stmtlist stmt terminator { $$ = $1; }
| stmtlist terminator { $$ = $1; }whilecodeBefore we write the parsing logic for while-loop syntax, let’s figure out the structure of the machine instructions, so we have an idea of our “goal output” for the parser.
Our “high-level plan” for while loops was:
This is a complex set of behaviors, and in most real machines, is
achieved by installing “jumping” instructions at appropriate places in
the execution flow. We’ll examine that approach in adhoc,
but for hoc5, we’ll create a “super-instruction” that
handles the looping mechanism itself, directly in C:
whilecode.
whilecode instructionYou can think of the whilecode instruction as being a
“variable-length” instruction, which includes the instructions for the
conditional expression and the loop body.
The whilecode instruction itself is immediately followed
by our two addresses: one for the loop body and the
other immediately following the body.
After the addresses, we have our condition.
Conditions are allowed to be any expression, so we must
be able to run all of the instructions that make up the condition, and
then stop as soon as we’ve produced the result. We already have
a mechanism for doing that: the execute()
method. Whwn we execute a whilecode instruction,
we’ll treat the condition like a tiny program of its own; we’ll
call execute() on its address!
This call will simply run all of the instructions from the beginning of
cond until it reaches a STOP instruction. (We
should be sure to install this while parsing the cond).
Likewise, we’ll need to install a STOP instruction after
the loop body, so we can again evaluate the condition.
whilecodeAssuming we will eventually install this structure into our machine,
we are finally ready to write our execution flow for loops. Upon
entering the instruction, our PC is already pointing to the
machine code for the body’s address.
int inst_whilecode(void) {
Addr body_addr = (pc++)->addr;
Addr end_addr = (pc++)->addr;
Addr cond_addr = pc;
while (true) {
execute(cond_addr);
if (stack_pop().value) { // guaranteed: conditions are expressions
execute(body_addr);
} else {
break;
}
}
pc = end_addr;
return 0;
}While we use recursion to execute our loop, we don’t recurse to leave the loop, since that could quickly lead to infinite recursion.
while statementsThe syntax of a while statement is familiar:
while (conditional-expression) {
body
}while (conditional-expression) bodyWe’ll need to parse that syntax into the whilecode
instruction we’ve designed.
In particular, we’ll need to install:
whilecode instruction itselfSTOPSTOPThis is complicated somewhat by the fact that we won’t know the addresses of the body / post-body instructions until after the body is parsed. To ensure our parse behaviors happen in the appropriate order, we create several “sub-rules” to perform actions at the appropriate times in the parsing process.
while keyword, we
should reserve space for the whilecode instruction and
its addresses. This implies we’ll need a rule just for that keywordSTOP instructionBelow you can see the components of our while loop, and how they
correspond to the installed MachineCode.
while KeywordFirst off, as soon as we see the while keyword, we
should reserve space inside prog for the
whilecode instruction and its addresses. This reduction
occurs before any other part of the loop is parsed, so these 3 words of
program memory come first.
+while: WHILE_KW
+ {
+ $$ = install_instruction(STOP_INST);
+ install_instruction(STOP_INST); // body-addr
+ install_instruction(STOP_INST); // after-body-addr
+ }Don’t forget to mark the whilecode address as the
produced value! Our while loops will be statements, so they
must produce their starting address.
- %type <hoc_address> assign call expr stmt stmtlist
+ %type <hoc_address> assign call expr stmt stmtlist whilecond NonterminalThe “conditional” part of our loop is just an expression; the only
detail we must take care of is installing a STOP
instruction before the loop-body instructions are generated. (Since the
condition’s instructions always start two words after the
whilecode instruction, there’s no need to report it, so we
don’t return anything from a cond nonterminal.)
%%
+cond: '(' expr ')'
+ {
+ install_instruction(STOP_INST);
+ }STOPs:
The end ruleA loop body is just a stmt, but like our condition, we
must STOP when it’s complete. We need another
STOP instruction after the stmt. We use an
end nonterminal to do so, as well as to generate the
address following the body.
- %type <hoc_address> assign call expr stmt stmtlist while
+ %type <hoc_address> assign call end expr stmt stmtlist whileend is a tiny “helper” used to generate the next
available address; this is useful for cases where we might not
have a next statement.
end: /* nothing */
{
install_instruction(STOP_INST);
$$ = progp;
}while statementWe can finally combine all of these nonterminals to create our
while statement structure. At the point we’re reducing this
rule, all addresses have been completely resolved, and so it’s time to
install them into the prog slots reserved for us by the
while nonterminal.
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }
+ | while cond stmt end
+ {
+ $1[0].type = CT_INST;
+ $1[0].inst = inst_whilecode;
+ $1[1].type = CT_ADDR;
+ $1[1].addr = $3;
+ $1[2].type = CT_ADDR;
+ $1[2].addr = $4;
+ }Because the if/else statements are so similar to loops, we don’t need to exhaustively cover their implementation. The installed structure is very similar:
if/else
statementsFirst, the if statement produces an address:
- %type <hoc_address> assign call end expr stmt stmtlist while
+ %type <hoc_address> assign call end expr if stmt stmtlist whileIt also reserves space for its addresses, just like
while:
if: IF_KW
{
$$ = install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
}Because our else clause is optional, we include two
forms in our grammar; they install the same set of addresses , but the
if-only variant leaves its else-body address
set to the STOP instruction.
stmt: expr { install_instruction(inst_pop); }
...
| if cond stmt end
{
$1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[3].type = CT_ADDR;
$1[3].addr = $4;
}
| if cond stmt end ELSE_KW stmt end
{
$1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $6;
$1[3].type = CT_ADDR;
$1[3].addr = $7;
}Notice that our syntax has relatively strict requirements for the locations of the braces:
// ALLOWED
x = 42
while (x > 0) x = x - 1
// OR
while (x > 0) {
x = x - 1
print x
}
// NOT ALLOWED
while (x > 0)
{
x = x - 1
print x
}if and while constructs, the opening
brace must be on the same line as the closing ) of the conditional expressionelse constructs, the else keyword must
be on the same line as the closing } of the if body.These unfortunate requirements are due to the fact that we treat the
newline terminator as an “execution point” in our grammar (in the
list rule), so that we can have an interactive REPL. This
could be resolved by treating newlines as whitespace in a non-REPL
version of hoc.
if/else
statementsOur machine implementation is also very similar to
whilecode, though we do not update our pc as we go; rather, we choose to be
specific about which word we are accessing.
int inst_ifcode(void) {
Addr ifbody_addr = pc[0].addr;
Addr maybe_elsebody_addr = NULL;
if (!is_stop_inst(pc + 1)) {
maybe_elsebody_addr = pc[1].addr;
}
Addr end_addr = pc[2].addr;
Addr cond_addr = pc + 3;
execute(cond_addr);
Datum d = stack_pop(); // must be value
if (d.value) {
execute(ifbody_addr);
} else if (maybe_elsebody_addr != NULL) {
execute(maybe_elsebody_addr);
}
pc = end_addr;
return 0;
}print StatementNow that we have statements, there will be times when we want to display a value inside a loop or conditional body. Since statements do not produce values, the REPL cannot output them. We will need to display them ourselves.
We add a print statement via a print keyword and
associated prexpr instruction.
stmt: expr { install_instruction(inst_pop); }
...
+ | PRINT_KW expr
+ {
+ install_instruction(inst_prexpr);
+ $$ = $2;
+ }And the machine implementation is very simple; it’s
print without a tab.
int inst_prexpr(void) {
Datum d = stack_pop();
printf("%.8g\n", d.value);
return 0;
}hoc5 finishes the core syntax of logic in
hoc; we’ve finished the core vocabulary of the language.
What remains to build are tools for abstraction; we need a way
to reuse blocks of code, rather than writing a program as one huge
top-to-bottom script!
hoc6, the last version of
hoc presented in The UNIX Programming Environment,
solves that issue. It includes the largest and most interesting new
feature of any so far: user-defined subroutines! So grab a coffee (or
take a break), and then we’ll continue onward.
CFLAGS ?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
YFLAGS ?= -d
LDFLAGS=-lm
objects := builtins.o math.o symbol.o machine.o hoc.o
.PHONY:all
all: hoc
hoc: $(objects)
cc $(CFLAGS) $(objects) $(LDFLAGS) -o hoc
hoc.c hoc.tab.h: hoc.y
yacc $(YFLAGS) hoc.y -o hoc.tab.c
mv hoc.tab.c hoc.c
%.o: hoc.h hoc.tab.h %.c
cc $(CFLAGS) -c -o $@ $*.c
.PHONY:clean
clean:
rm -f hoc.tab.h
rm -f $(objects)
rm -f hoc.c
rm -f hoc
#include "hoc.h"
#include "hoc.tab.h"
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> // rand
/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
Cos(double), Exp(double), Integer(double), Lg(double), Ln(double),
Log10(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);
static struct {
const char *name;
double value;
} constants[] = {
{"PI", 3.14159265358979323846},
{"E", 2.71828182845904523536},
{NULL, 0},
};
static struct {
const char *name;
int args;
union {
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
} builtins[] = {
{"abs", 1, .call1 = Abs},
{"acos", 1, .call1 = Acos},
{"atan", 1, .call1 = Atan},
{"atan2", 2, .call2 = Atan2},
{"cos", 1, .call1 = Cos},
{"exp", 1, .call1 = Exp},
{"int", 1, .call1 = Integer},
{"lg", 1, .call1 = Lg},
{"ln", 1, .call1 = Ln},
{"log10", 1, .call1 = Log10},
{"pow", 2, .call2 = Pow},
{"rand", 0, .call0 = Random},
{"sin", 1, .call1 = Sin},
{"sqrt", 1, .call1 = Sqrt},
{NULL, 0, .call0 = NULL},
};
static struct {
const char *name;
int token_type;
} keywords[] = {
{"if", IF_KW},
{"else", ELSE_KW},
{"print", PRINT_KW},
{"while", WHILE_KW},
{NULL, 0},
};
extern void command_list(void);
static struct {
const char *name;
void (*command_func)(void);
} commands[] = {
{"LIST", command_list},
{NULL, 0},
};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
install_symbol(constants[i].name, CONST, constants[i].value);
}
for (int i = 0; builtins[i].name != NULL; i++) {
Symbol *s = install_symbol(builtins[i].name, BUILTIN, 0.0);
// TODO: Why not have separate install_value vs install_function?
s->data.func.args = builtins[i].args;
switch (builtins[i].args) {
case 0:
s->data.func.call0 = builtins[i].call0;
break;
case 1:
s->data.func.call1 = builtins[i].call1;
break;
case 2:
s->data.func.call2 = builtins[i].call2;
break;
default:
fprintf(
stderr, "Invalid arg count for symbol '%s'\n", builtins[i].name);
exit(1);
}
}
for (int i = 0; keywords[i].name != NULL; i++) {
install_symbol(keywords[i].name, keywords[i].token_type, 0.0);
}
for (int i = 0; commands[i].name != NULL; i++) {
Symbol *s = install_symbol(commands[i].name, COMMAND, 0.0);
s->data.command_func = commands[i].command_func;
}
}
#ifndef HOC_INC
#define HOC_INC 1
/*
* Global types & declarations for use in hoc
*/
#include <stddef.h>
///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------
#define SYMBOL_NAME_LIMIT 100
typedef struct Symbol Symbol;
/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
int args;
union {
/* We use an anonymous union for the function pointer; there's
no meaningful name for the union as a field */
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
};
struct Symbol {
short type; // Generated from our token types
char *name;
Symbol *next;
union {
double val;
struct BuiltinFunc func;
void (*command_func)(void);
} data;
};
Symbol *install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);
///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------
typedef int (*Inst)(void);
typedef enum CodeType {
CT_INST = 'I',
CT_LITERAL = 'L',
CT_SYM_REF = 'S',
CT_ADDR = 'A'
} CodeType;
typedef struct MachineCode MachineCode;
typedef MachineCode *Addr;
struct MachineCode {
CodeType type;
union {
double literal;
Inst inst;
Symbol *symbol;
Addr addr;
}; // anonymous union; no need to name it
};
///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full.
*
* Returns the address at which the code was installed.
*/
Addr install_code(MachineCode mc);
/** The next available address in the machine's current program */
extern Addr progp;
#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)
#define install_literal(lit) \
install_code((struct MachineCode){.type = CT_LITERAL, .literal = lit})
#define install_ref(sp) \
install_code((struct MachineCode){.type = CT_SYM_REF, .symbol = sp})
#define install_instruction(instr) \
install_code((struct MachineCode){.type = CT_INST, .inst = instr})
#define install_address(address) \
install_code((struct MachineCode){.type = CT_ADDR, .addr = address})
void execute(MachineCode *addr);
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
int inst_add(void);
int inst_and(void);
int inst_assignvar(void);
int inst_assignconst(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_eq(void);
int inst_gt(void);
int inst_gte(void);
int inst_ifcode(void);
int inst_lt(void);
int inst_lte(void);
int inst_mul(void);
int inst_neq(void);
int inst_not(void);
int inst_or(void);
int inst_pop(void);
int inst_prexpr(void);
int inst_print(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
int inst_whilecode(void);
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void command_list(void);
void machine_run_command(Symbol *);
///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------
typedef union Datum {
double value;
Symbol *symbol;
} Datum;
void machine_startup(void);
void machine_reset_program(void);
///----------------------------------------------------------------
/// Other
///----------------------------------------------------------------
/** global error handling */
__attribute__((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);
/** builtins */
void install_hoc_builtins(void);
#endif // HOC_INC
%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------
/*
* "higher-order calculator" - Version 5
* Adapted from "The UNIX Programming Environment"
*/
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include "hoc.h"
///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------
jmp_buf begin;
///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------
#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);
/*
* Grammar points:
*
* - EXPRESSIONS provide a value on the top of the satck.
* - STATEMENTS are not expressions, and do not provide a value.
*/
%}
%union {
double hoc_value;
Symbol *hoc_symbol;
Addr hoc_address;
}
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF
%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW COMMAND
%type <hoc_symbol> assignable
%type <hoc_address> assign call end expr if stmt stmtlist while
%right '=' /* right-associative, much like C */
%left OR
%left AND
%left EQ GT GTE LT LTE NEQ
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */
%right '^' /* exponents */
%left UNARY_MINUS NOT /* Even higher precedence
than mul or div. Can't use '-', as we've used
it already for subtraction. */
%%
list: /* nothing */
| list terminator
| list stmt terminator
{
install_instruction(STOP_INST);
YYACCEPT;
}
| list assign terminator
{
install_instruction(inst_pop);
install_instruction(STOP_INST);
YYACCEPT;
}
| list expr terminator
{
install_instruction(inst_print);
install_instruction(STOP_INST);
YYACCEPT;
}
| list error terminator { yyerrok; }
;
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }
| while cond stmt end
{
$1[0].type = CT_INST;
$1[0].inst = inst_whilecode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $4;
}
| if cond stmt end
{
$1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[3].type = CT_ADDR;
$1[3].addr = $4;
}
| if cond stmt end ELSE_KW stmt end
{
$1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $6;
$1[3].type = CT_ADDR;
$1[3].addr = $7;
}
| PRINT_KW expr
{
install_instruction(inst_prexpr);
$$ = $2;
}
| COMMAND
{
machine_run_command($1);
$$ = progp;
}
;
stmtlist: /* nothing */ { $$ = progp; }
| stmtlist stmt terminator { $$ = $1; }
| stmtlist terminator { $$ = $1; }
;
while: WHILE_KW
{
$$ = install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
}
;
if: IF_KW
{
$$ = install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
}
;
cond: '(' expr ')'
{
install_instruction(STOP_INST);
}
;
end: /* nothing */
{
install_instruction(STOP_INST);
$$ = progp;
}
;
/*
* Every expr construct, when *executed*, must push a value onto the
* stack
*/
expr: NUMBER
{
$$ = install_instruction(inst_pushlit);
install_literal($1);
}
| '@'
{
$$ = install_instruction(inst_pushlast);
}
| assignable
{
$$ = install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_eval);
}
| assign
| call
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }
| expr LT expr { install_instruction(inst_lt); }
| expr LTE expr { install_instruction(inst_lte); }
| expr GT expr { install_instruction(inst_gt); }
| expr GTE expr { install_instruction(inst_gte); }
| expr EQ expr { install_instruction(inst_eq); }
| expr NEQ expr { install_instruction(inst_neq); }
| expr AND expr { install_instruction(inst_and); }
| expr OR expr { install_instruction(inst_or); }
| expr '^' expr
{
install_instruction(inst_pushref);
install_ref(lookup("pow"));
install_instruction(inst_call);
}
| '-'expr %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
{
install_instruction(inst_pushlit);
install_literal(-1);
install_instruction(inst_mul);
$$=$2;
}
| NOT expr %prec UNARY_MINUS
{
install_instruction(inst_not);
$$=$2;
}
| '(' expr ')' { $$ = $2; }
;
assign: assignable '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignvar);
$$ = $3; /* instructions begin with RHS */
}
| assignable ':' '=' expr
{
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_assignconst);
$$ = $4; /* instructions begin with RHS */
}
;
call: BUILTIN '(' ')'
{
check_call_args($1, 0);
$$ = install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_call);
}
| BUILTIN '(' expr ')'
{
check_call_args($1, 1);
install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_call);
$$ = $3; /* Block starts with pushing args */
}
| BUILTIN '(' expr ',' expr ')'
{
check_call_args($1, 2);
$$ = install_instruction(inst_pushref);
install_ref($1);
install_instruction(inst_call);
$$ = $3; /* Block starts with pushing args */
}
;
assignable: VAR | CONST
;
terminator: '\n' | ';'
;
%% // end of grammar
/* error tracking */
char *program_name;
int line_number = 1;
int main(int argc, char *argv[]) {
(void)argc;
program_name = argv[0];
install_hoc_builtins();
machine_startup();
setjmp(begin);
machine_reset_program();
while (yyparse() == YACC_SUCCESS && !feof(stdin)) {
execute(NULL);
machine_reset_program();
}
return 0;
}
/**
* "Looks ahead 1 character for a possibility. If it's found, returns
* `match`. If not:
* - places character back onto input stream for next read
* - returns the `otherwise` result.
*/
static int lookahead(char expect, int match, int otherwise) {
int c = getchar();
if (c == expect) {
return match;
} else {
ungetc(c, stdin);
return otherwise;
}
}
/* our simple, hand-rolled lexer. */
int yylex(void) {
static bool hit_eof = false;
int c;
// Skip space
do {
c = getchar();
} while (c == ' ' || c == '\t');
/* Newline / EOF handling:
* The first time we see an EOF, we "fake" a newline,
* so that the list production can recognize any in-progress
* statement. The next call will return the EOF.
*/
if (c == EOF && !hit_eof) {
ungetc(c, stdin);
hit_eof = true;
c = '\n';
}
if (c == '\n') {
line_number++;
} else {}
// Get numbers
if (c == '.' || isdigit(c)) {
ungetc(c, stdin);
scanf("%lf", &yylval.hoc_value);
return NUMBER;
}
if (isalpha(c)) {
Symbol *s;
char buf[SYMBOL_NAME_LIMIT + 1];
size_t nread = 0;
do {
buf[nread++] = c;
c = getchar();
} while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));
// Just in case we exceeded the limit
while ((isalpha(c) || isdigit(c))) {
c = getchar();
}
// at this point, we have a non-alphanumeric 'c'
ungetc(c, stdin);
buf[nread] = '\0';
if ((s=lookup(buf)) == NULL) {
s = install_symbol(buf, UNDEF, 0.0);
}
yylval.hoc_symbol = s;
return (s->type == UNDEF ? VAR : s->type);
}
switch (c) {
case '<':
return lookahead('=', LTE, LT);
case '>':
return lookahead('=', GTE, GT);
case '=':
return lookahead('=', EQ, '=');
case '!':
return lookahead('=', NEQ, NOT);
case '|':
return lookahead('|', OR, '|');
case '&':
return lookahead('&', AND, '&');
default:
return c;
}
}
void yyerror(const char *s) {
warning("%s", s);
}
/*
* Verify Symbol's declared arg count matches actual,
* or display a helpful error message with mismatch
*/
void check_call_args(const Symbol *s, int actual) {
int expected = s->data.func.args;
if (expected != actual) {
exec_error("Wrong number of arguments for %s: expected %d, got %d",
s->name, expected, actual);
}
}
#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)
void warning(const char *msg, ...) {
va_list args;
va_start(args, msg);
print_error_prefix();
vfprintf(stderr, (msg), args);
print_error_suffix();
va_end(args);
}
void exec_error(const char *msg , ...) {
va_list args;
va_start(args, msg);
print_error_prefix();
vfprintf(stderr, (msg), args);
print_error_suffix();
va_end(args);
longjmp(begin, 0);
}
#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/*
* MACHINE STATE
* =============
*
* We have our "program" stored as a list of MachineCode
* objects, each encoding one of:
* - an instruction
* - a literal value
* - a symbol reference
*
* We "execute" our program by setting the program counter (pc)
* pointer to the first MachineCode (which *must* be an instruction),
* and handling each instruction we find. Within an instruction, the
* program counter might be updated so that by the *end* of any
* instruction, the program counter is always pointing to another
* instruction.
*
*
* The Stack
* ---------
*
* Each instruction manipulates the stack, pushing,
* popping, or combining values from the stack. The stack elements
* are Datum objects, which act much like MachineCodes, but with two
* differences.
*
* 1. Datums cannot represent an instruction; they resolve to a value
* 2. The type of a datum can always be resolved from their context,
* and so no "type" field is necessary.
*
* ┌────────────────────────────────────────────────────┐
* │ == PROG == == STACK == │
* │ ┌─────────────┐◀───┐ ┌─────────────┐ │
* │ │ MachineCode │ pc │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ │ Datum │ │
* │ ├─────────────┤ ├─────────────┤ │
* │ │ MachineCode │ progp │ Datum │ stackp │
* │ ├─────────────┤◀────┘ ├─────────────┤ ◀──┘ │
* │ │ ... │ │ ... │ │
* │ └─────────────┘ └─────────────┘ │
* └────────────────────────────────────────────────────┘
*
*/
#define PROGSIZE 2000
static MachineCode prog[PROGSIZE];
Addr progp; // Next free location for code installation
static Addr pc; // the current location in the executing program
static Addr highest_addr_written = prog; // Used in LISTing contents
// stack
#define STACKSIZE 256
static Datum stack[STACKSIZE]; // the machine stack
static Datum *stackp; // next free spot on the stack
static Datum last_popped = {.value = 0}; // The last value popped
static void stack_push(Datum d);
static Datum stack_pop(void);
/** Initialize machine state for startup */
void machine_startup(void) {
srand(time(NULL)); // use current time as seed for random generator
machine_reset_program();
}
/** Initialize execution state for a new program */
void machine_reset_program(void) {
stackp = stack;
progp = prog;
}
///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------
Addr install_code(MachineCode mc) {
if (progp >= prog + PROGSIZE) {
exec_error("Max program size exceeded");
}
if (progp > highest_addr_written) {
highest_addr_written = progp;
}
*progp = mc;
return progp++; // return location of THIS instruction
}
///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------
void execute(MachineCode *addr) {
pc = (addr == NULL) ? prog : addr;
while (!is_stop_inst(pc)) {
if (pc >= progp) {
exec_error("PC was past end-of-program");
}
if (pc->type != CT_INST) {
exec_error("Programming error, type '%c' was not INST!", pc->type);
}
(*(*pc++).inst)(); // increment PC and execute inst
}
}
/** Push new value onto stack */
static void stack_push(Datum d) {
if (stackp >= stack + STACKSIZE) {
exec_error("Stack overflow");
}
*(stackp++) = d; // copy
}
static Datum stack_pop(void) {
if (stackp <= stack) {
exec_error("Stack underflow");
}
Datum d = *(--stackp);
return (last_popped = d);
}
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
#define dat(x) ((Datum){.value = x})
int inst_pushlit(void) {
assert(pc->type == CT_LITERAL);
Datum d;
d.value = pc->literal;
stack_push(d);
pc++;
return 0;
}
int inst_pushref(void) {
assert(pc->type == CT_SYM_REF);
Datum d;
d.symbol = pc->symbol;
stack_push(d);
pc++;
return 0;
}
int inst_pushlast(void) {
stack_push(last_popped);
return 0;
}
int inst_eval(void) {
Datum d = stack_pop(); // must be symbol
switch (d.symbol->type) {
case UNDEF:
exec_error("Undefined variable: '%s'", d.symbol->name);
break;
case BUILTIN:
exec_error("Builtin '%s' cannot be evaluated", d.symbol->name);
break;
default: // VAR,CONST
stack_push(dat(d.symbol->data.val));
break;
}
return 0;
}
int inst_whilecode(void) {
/*
┌─────────────────────────────┐
│ whilecode │
├─────────────────────────────┤
│ address(body) │
├─────────────────────────────┤
│ address(stmt after body) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ body │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after body │
└─────────────────────────────┘
*/
Addr body_addr = (pc++)->addr;
Addr end_addr = (pc++)->addr;
Addr cond_addr = pc;
while (true) {
execute(cond_addr);
if (stack_pop().value) { // guaranteed: conditions are expressions
execute(body_addr);
} else {
break;
}
}
pc = end_addr;
return 0;
}
int inst_ifcode(void) {
/*
┌─────────────────────────────┐
│ ifcode │
├─────────────────────────────┤
│ address(ifbody) │
├─────────────────────────────┤
│ address(elsebody) │
├─────────────────────────────┤
│ address(stmt after if) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ ifbody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ │
│ elsebody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after if │
└─────────────────────────────┘
*/
Addr ifbody_addr = pc[0].addr;
MachineCode maybe_elsebody = pc[1];
Addr end_addr = pc[2].addr;
Addr cond_addr = pc + 3;
execute(cond_addr);
Datum d = stack_pop(); // must be value
if (d.value) {
execute(ifbody_addr);
} else if (maybe_elsebody.type == CT_ADDR) {
execute(maybe_elsebody.addr);
}
pc = end_addr;
return 0;
}
int inst_call(void) {
Datum s = stack_pop(); // must be symbol
if (s.symbol->type != BUILTIN) {
exec_error("Cannot call non-function '%s'", s.symbol->name);
}
double result;
switch (s.symbol->data.func.args) {
case 0:
result = s.symbol->data.func.call0();
break;
case 1: {
Datum arg = stack_pop();
result = s.symbol->data.func.call1(arg.value);
break;
}
case 2: {
Datum arg2 = stack_pop();
Datum arg1 = stack_pop();
result = s.symbol->data.func.call2(arg1.value, arg2.value);
break;
}
}
stack_push(dat(result));
return 0;
}
int inst_assignvar(void) {
Datum lhs = stack_pop(); // must be symbol
Datum rhs = stack_pop(); // must be value
if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
}
lhs.symbol->data.val = rhs.value;
lhs.symbol->type = VAR;
stack_push(dat(rhs.value));
return 0;
}
int inst_assignconst(void) {
Datum lhs = stack_pop(); // must be symbol
Datum rhs = stack_pop(); // must be value
if (lhs.symbol->type == CONST) {
exec_error("Cannot reassign constant '%s'", lhs.symbol->name);
} else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
exec_error("Cannot assign to symbol '%s'", lhs.symbol->name);
}
lhs.symbol->data.val = rhs.value;
lhs.symbol->type = CONST;
stack_push(dat(rhs.value));
return 0;
}
int inst_pop(void) {
stack_pop();
return 0;
}
int inst_print(void) {
Datum d = stack_pop();
printf("\t%.8g\n", d.value);
return 0;
}
int inst_prexpr(void) {
Datum d = stack_pop();
printf("%.8g\n", d.value);
return 0;
}
int inst_add(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value + rhs.value));
return 0;
}
int inst_and(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value && rhs.value));
return 0;
}
int inst_or(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value || rhs.value));
return 0;
}
int inst_not(void) {
Datum d = stack_pop(); // must be value
d.value = !d.value;
stack_push(d);
return 0;
}
int inst_lt(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value < rhs.value));
return 0;
}
int inst_lte(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value <= rhs.value));
return 0;
}
int inst_gt(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value > rhs.value));
return 0;
}
int inst_gte(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value >= rhs.value));
return 0;
}
int inst_eq(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value == rhs.value));
return 0;
}
int inst_neq(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value != rhs.value));
return 0;
}
int inst_sub(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value - rhs.value));
return 0;
}
int inst_div(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
if (rhs.value == 0) {
exec_error("Division by zero");
} else {
stack_push(dat(lhs.value / rhs.value));
}
return 0;
}
int inst_mul(void) {
Datum rhs = stack_pop(); // must be value
Datum lhs = stack_pop(); // must be value
stack_push(dat(lhs.value * rhs.value));
return 0;
}
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void machine_run_command(Symbol *command_symbol) {
command_symbol->data.command_func();
}
#define check_inst(inst_name) \
else if (c.inst == inst_##inst_name) do { \
printf(inst_template, line, (void *)(prog + i), "INST", #inst_name); \
} \
while (false)
void command_list(void) {
// Lists contents of prog until we hit two stop instructions in a row
#define header_template "%6s | %11s | %10s | %-30s\n"
#define inst_template "%6ld | %11p | %10s | %-30s"
#define rightline_template "%6ld | %11p | %10s | %30s"
#define lit_template "%6ld | %11p | %10s | %30f"
#define addr_template "%6ld | %11p | %10s | %11p (Line %3ld)"
printf(header_template, "LINE", "ADDRESS", "TYPE", "CODE");
printf("------------------------------------------------------------------"
"\n");
for (size_t i = 0; prog + i <= highest_addr_written; i++) {
size_t line = i + 1;
MachineCode c = prog[i];
void *cp = &prog[i];
switch (c.type) {
case (CT_LITERAL):
printf(lit_template, line, cp, "L", c.literal);
break;
case (CT_SYM_REF):
printf(rightline_template, line, cp, "SYM", c.symbol->name);
break;
case (CT_ADDR): {
size_t addr_line = (c.addr - prog) + 1;
printf(addr_template, line, cp, "ADDR", (void *)c.addr, addr_line);
break;
}
case (CT_INST):
if (is_stop_inst(&c)) {
printf(inst_template,
line,
cp,
"INST",
"------------ STOP ------------");
}
check_inst(add);
check_inst(and);
check_inst(assignvar);
check_inst(assignconst);
check_inst(call);
check_inst(div);
check_inst(eq);
check_inst(eval);
check_inst(gt);
check_inst(gte);
check_inst(ifcode);
check_inst(lt);
check_inst(lte);
check_inst(mul);
check_inst(neq);
check_inst(not );
check_inst(or);
check_inst(pop);
check_inst(prexpr);
check_inst(print);
check_inst(pushlast);
check_inst(pushlit);
check_inst(pushref);
check_inst(sub);
check_inst(whilecode);
else {
printf(inst_template, line, cp, "INST", "UNKNOWN!?");
}
break;
default:
printf("Unknown Code Type: %c\n", c.type);
}
printf("\n");
}
}
#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h> // rand
#define HOC_RAND_MAX 1000000L
static double check_math_err(double result, const char *op);
double Abs(double x) {
return check_math_err(fabs(x), "fabs");
}
double Acos(double x) {
return check_math_err(acos(x), "acos");
}
double Atan(double x) {
return check_math_err(atan(x), "atan");
}
double Atan2(double x, double y) {
return check_math_err(atan2(x, y), "atan2");
}
double Cos(double x) {
return check_math_err(cos(x), "cos");
}
double Exp(double x) {
return check_math_err(exp(x), "exp");
}
double Integer(double x) {
return (double)(long)x;
}
double Lg(double x) {
return check_math_err(log2(x), "lg");
}
double Ln(double x) {
return check_math_err(log(x), "log");
}
double Log10(double x) {
return check_math_err(log10(x), "log10");
}
double Pow(double x, double y) {
return check_math_err(pow(x, y), "exponentiation");
}
double Random(void) {
// We have HOC_RAND_MAX buckets
const int bucket_size = RAND_MAX / HOC_RAND_MAX;
int result;
do {
result = rand() / bucket_size;
} while (result >= HOC_RAND_MAX);
return result + 1;
}
double Sin(double x) {
return check_math_err(sin(x), "sin");
}
double Sqrt(double x) {
return check_math_err(sqrt(x), "sqrt");
}
static double check_math_err(double result, const char *op) {
static const char *domain_msg = "argument outside domain";
static const char *range_msg = "result outside range";
static const char *other_msg = "floating-point exception occurred";
const char *msg = NULL;
if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
// Special case: Inexact results are not errors.
if (fetestexcept(FE_INEXACT)) {
goto done;
}
if (fetestexcept(FE_INVALID)) {
msg = domain_msg;
} else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
msg = range_msg;
} else { // unknown
msg = other_msg;
}
feclearexcept(FE_ALL_EXCEPT);
} else if (errno) {
if (errno == EDOM) {
msg = domain_msg;
} else if (errno == ERANGE) {
msg = range_msg;
} else {
msg = other_msg;
}
errno = 0;
}
if (msg != NULL) {
exec_error("math error during %s: %s", op, msg);
}
done:
return result;
}
#include "hoc.h"
#include "hoc.tab.h" // generated from yacc -d on our grammar
#include <stddef.h> // NULL
#include <stdlib.h> // malloc
#include <string.h> // strcmp
static Symbol *symbol_table = NULL;
void *emalloc(size_t nbytes);
Symbol *lookup(const char *name) {
Symbol *current = symbol_table;
while (current != NULL) {
if (strcmp(current->name, name) == 0) {
return current;
}
current = current->next;
}
return NULL;
}
Symbol *install_symbol(const char *name, short type, double value) {
size_t name_len = strlen(name);
Symbol *s = emalloc(sizeof(Symbol) // actual symbol
+ (sizeof(char)) * (name_len + 1)); // room for name
s->name = (char *)(s + 1);
strcpy(s->name, name);
s->name[name_len] = '\0';
s->type = type;
s->data.val = value;
s->next = symbol_table;
symbol_table = s;
return s;
}
void *emalloc(size_t nbytes) {
void *ptr = malloc(nbytes);
if (ptr == NULL) {
exec_error("Out of memory!");
}
return ptr;
}