The last version of hoc defined in UPE, hoc6
provides some major features that complete its transition from
calculator to minimally-useful scripting language.
Users can now define their own subroutines to create reusable
behavior. The implementation changes for this feature are probably—with
the exception of our transition from hoc3
to
hoc4
—the largest overall.
The print
statement is much
improved in hoc6
. We can print multiple comma-separated
expressions, and even more importantly, string literals!
hoc6
enables hoc programs to read input data from files
and/or the console. Together with the enhanced print
statement, hoc
programs can now prompt the user for input,
read pre-calculated data from files, and engage interactively with
users.
Subroutines are divided into two categories: functions, which are required to return a value, and procedures, which are required not to. You must decide what kind of subroutine you’re writing up-front, because they use different keywords:
The func
keyword defines a
function:
func theAnswer() {
return 42
}
A simple hoc
function
Whereas the proc
keyword is
used for procedures:
proc tellMeTheAnswer() {
print 42
}
A simple hoc
procedure
Procedures are still allowed to use the return
keyword in order to exit early,
but it must be a “bare return”, with no value.
Subroutines may call other subroutines or themselves; they can also use variables freely. Note, however, that the variables defined inside a subroutine are not scoped; they’ll still be available outside of the routine!
The following example returns the factorial of the global variable
x
. If x
isn’t defined, we’ll get an error when
we call it.
func factorialOfX() {
if (x <= 0) {
return 1
}
f = x
result = 1
while (f > 1) {
result = result * f
f = f - 1
}
return result
}
A hoc
function that uses
variables
You can see that all of the variables used inside the function are still available after it returns.
factorialOfX()
./hoc6/hoc: Undefined variable x (on line 15)
x = 5
factorialOfX()
120
x
5
f
1
result
120
When defining a subroutine in hoc6
, there is no list of
formal parameters. Instead, the function/procedure body can freely use
the syntax $1
, $2
, etc. to refer to the first, second,
etc. argument. We can use this to make a much nicer factorial
function:
() {
func facif ($1 <= 0) {
return 1;
} else {
return $1 * fac($1-1)
}
}
fac(k)
computes k!
When executing your subroutine, hoc
will only verify
that such an argument was actually passed when you try to
use it.
fac(11)
39916800
fac()
./hoc6/hoc: Not enough arguments passed to subroutine: fac (on line 10)
The arguments to a subroutine are passed by value; we are allowed to
modify their values without changing them in the caller. Notice
how we change $1
at the beginning
of compare()
below.
() {
func compare1 = $1 - $2
$if ($1 < 0) {
return -1;
} else if ($1 == 0) {
return 0;
}
return 1;
}
compare(x, y)
returns
-1
, 0
, or 1
, for x < y, x = y, or x > y.
hoc6
has a much-improved print statement. Not only can
multiple expressions be printed, e.g:
print PI, sin(PI), cos(PI)
We can also now print string literals, mixed with arbitrary expressions.
"The sine value of ", PI, " is ", int(sin(PI)), "." print
Importantly, this is the only functionality supported for strings. They may not be:
In hoc6
, we add a new builtin:
read(varName)
This allows our hoc
programs to read user input into a
variable and check that the input was valid in a single
call.
First, when starting hoc, we can specify a set of one or more input
files, and/or stdin
. The read(varName)
function will read from one file at a time; when a file is exhausted,
hoc
transparently moves to the next file.
We’d use the command ./hoc6 datafile
to read input from
datafile
, and ./hoc6 -
to read from stdin
(presumably, the console).
This can be combined arbitrarily. The following shippet first reads
variables from two static files, and once they’re exhausted, will read
any remaining data from stdin
.
./hoc6 input-file1 input-file2 -
hoc6
To read a double from our current input sources, you can use the
read(myVar)
statement. hoc responds to the read request as
follows:
hoc
will attempt to read a double
from
the current input source
If the data is valid, it will be stored into the variable named
myVar
, and read(myVar)
will return
1
. If we are out of data,
myVar
is set to 0
and read(myVar)
returns 0
. If the data was invalid, we get
an exec_error
.
This allows us to write programs that interactively use
stdin
during execution, such as:
> ./hoc -
{
print "Enter some numbers; press EOF when finished: \n"
sum = 0
while (read(x)) {
sum = sum + x
}
print "The sum is: ", sum, "\n"
}
Enter some numbers; press EOF when finished:
1 2 3 4 5 6 7 8
<Ctrl-D pressed>
The sum is: 36
<We can continue using hoc here>
We can think of a user-defined subroutine as a named set of machine
instructions that are permanently installed in the
hoc
virtual machine. When we define a subroutine, we
install the instructions, but we don’t execute them. This is a
significant change; until now, hoc
has executed all
top-level statements as soon as they’re recognized.
We’ll break down the changes into meaningful areas, as described below. step-by-step. Since procedures and functions only differ in their return values, our discussion applies to both.
prog
area.prog
space - Previously, our
hoc
machine overwrote whatever code had been installed in
prog
as soon as it was executed. We need a way to mark
subroutine instructions as “reserved” so that they are not overwritten
by new statements.First, let’s add the ability to recognize a subroutine definition. We have two kinds of subroutines, so we should examine their components:
func subrName() {
<statements>
}
Some components of this syntax we already have; the block of
statements can nicely be handled by stmtlist
, for
example.
However, we’ll need to add the func
& proc
keywords, as well as adding parser
actions to associate the subroutine’s name Symbol with
its instructions.
Like loops and conditionals, it’ll be helpful to have new
tokens—PROC_KW
and FUNC_KW
—for our “introductory” keywords
proc
and func
. This will let us take action
before any other components of the declaration are parsed.
-%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW
+%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW
New keywords for subroutine declarations hoc6/hoc.y
The keywords must also be added to our language builtins.
static struct hoc_keyword {
const char *name;
int token_type;
} keywords[] = {
{"if", IF_KEYWORD},
...+ {"func", FUNC_KW},
+ {"proc", PROC_KW},
{NULL, 0}, };
New keywords for subroutines hoc6/builtins.c
Once we’ve found a func
or
proc
keyword, we want to parse the
subroutine’s name, and create an associated Symbol
object. To that end, we’ll need a new Symbol
type; in fact,
we’ll define two, so that we can handle the differences in
return-value behavior. For lack of a better alternative, we’ll name them
UPROC
and UFUNC
, for procedures and functions
respectively.
%token <hoc_value> NUMBER- %token <hoc_symbol> VAR CONST BUILTIN UNDEF
+ %token <hoc_symbol> VAR CONST BUILTIN UNDEF UFUNC UPROC
%type <hoc_symbol> assignable
New token types for subroutine declarations hoc6/hoc.y
The rules for naming a function will be the same as those for naming
any other symbol, so we can reuse the VAR
token for parsing
a subroutine’s name. However, when we eventually call a
subroutine, we’ll need to be able to differentiate them from variables
and builtin functions, so we create a small wrapper, subr_decl
, to override the installed
Symbol’s type. This is the same pattern we used for
assignable
before, to handle VAR
vs
CONST
symbols.
First, we’ll declare that, like our assignable
wrapper,
subr_decl
procuces a Symbol
.
%token <hoc_value> NUMBER- %type <hoc_symbol> assignable
+ %type <hoc_symbol> assignable subr_decl
subr_decl
produces a
UFUNC
or UPROC
hoc6/hoc.y
The production for a subr_decl
is easy, since we have
our introductory keywords letting us know when a declaration has begun.
Our only job is to override the Symbol.type
installed by
the lexer.
+ subr_decl: PROC_KW VAR
+ {
+ $2->type = UPROC;
+ $$ = $2;
+ }
+ | FUNC_KW VAR
+ {
+ $2->type = UFUNC;
+ $$ = $2;
+ }
+ ;
The subr_decl
production
updates Symbol
types hoc6/hoc.y
$2->type
before
this production runs?
yylex()
function installs it as
UNDEF
.
We can complete our language support for subroutine definnitions by recognizing their overall shape, including the body.
Modeling ourselves after our if
and while
constructs, we note that these declarations should be
statements, and we can use the stmt
production to recognize their bodies.
Like all statements, procedure declarations return the address at
which they begin. These declarations are entirely defined by their
body, so we’ll return the address of their body stmt
.
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }+ | subr_decl '(' ')' stmt
+ {
+ $$ = $4;
+ }
A subroutine declaration is a new kind of
stmt
hoc6/hoc.y
Since user-defined subroutines are really just a set of instructions,
a Symbol
referring to a subroutine is really holding the
address of the associated instructions.
It seems like the time to add an Addr
member to the
Symbol.data
union, but unfortunately, this causes a new
problem. Remember that the MachineCode
struct already
contains a Symbol
pointer, because that’s how we represent
symbol-table references. This causes cyclic dependencies between the
Symbol
and MachineCode
types.
We can solve this by moving our forward declaration of
MachineCode
& Addr
above our
Symbol
definition in hoc.h
, and let the
compiler resolve the full struct layout.
+ ///----------------------------------------------------------------
+ /// Addresses (Shared by multiple types)
+ ///----------------------------------------------------------------
+ typedef struct MachineCode MachineCode;
+ typedef MachineCode *Addr;
...
struct Symbol {
char *name;
short type; // Generated from our token types
Symbol *next;
union {
double val;+ Addr addr;
struct BuiltinFunc func;
} data;
};
+ ///----------------------------------------------------------------
+ /// Machine Code
+ ///----------------------------------------------------------------
...- typedef struct MachineCode MachineCode;
- typedef MachineCode *Addr;
We need a way to “install” routines into our machine permanently, so
that we can execute them multiple times. The biggest obstacle to this
process is our machine_reset_program()
function, which
resets progp
every time we execute a parsed statement. We
need a way to prevent our subroutine bodies from being overwritten.
One workable method is to mark all space from prog
to
the end of the procedure as “reserved”. Instead of resetting
progp
back to the beginning of program memory, we’ll use a
new variable, prog_start
, which indicates the first
non-reserved word that we can use.
MachineCode prog[PROGSIZE];+ Addr prog_start = prog; // No installation above this address
Addr progp; // Next free location for code installation static Addr pc; // the current location in the executing program
The prog
* variables have the following relationship:
Now, when we define a routine, we’ll preserve its code by shifting
prog_start
to just after its body. This is
handled by reserve_subr
, a new machine function. That
function also updates the given Symbol to point to the
location where the subroutine begins.
/**
* Reserves all currently-installed code as a subroutine, under Symbol
* `subr_sym`. The value of `subr_sym` will be updated to the address
* of the installed routine. Ensures that the code will not be
* overwritten by future installations, even after a machine reset.
*/
void reserve_subr(Symbol *subr_sym);
Addr install_code(MachineCode mc) {
*progp = mc;
Addr result = progp++; // return location of THIS instruction
return result;
}
+ void reserve_subr(Symbol *subr_sym) {
+ subr_sym->data.addr = prog_start;
+ prog_start = progp;
+ }
Then, we make machine_reset_program()
aware of
prog_start
, so that we never again overwrite any code
between prog
and prog_start
.
void machine_reset_program(void) {
stackp = stack;- progp = progp;
+ progp = prog_start;
Finally, we update our parser action for subroutine-declaration
statements to call our new reservation function and populate the
Symbol
value. Now, writing a subroutine declaration will
install code, but won’t execute any!
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }
| subr_decl '(' ')' stmt
{+ reserve_subr($1);
$$ = $4; }
hoc.y
: Reserving space for
procedures hoc6/hoc.y
stmt
, and where do they come from?
STOP
. It’s installed by the list: list stmt
production, and since
prog_start
is updated before the
list
production runs, it will be evaluated and then
complete.
If you are following along, now is a good time to test your
implementation. We can do that by adding a temporary expr
production that understands how to produce a value for an installed
declaration.
%%
expr: ...
... | UPROC
{$$ = install_instruction(inst_pushlit);
($1->data.addr - prog);
install_literal
}
| UFUNC
{$$ = install_instruction(inst_pushlit);
($1->data.addr - prog);
install_literal }
Now, using the function/procedure name as a variable will evaluate to
its address, relative to prog
. As expected, a more complex
body (like calling a builtin), corresponds to a larger block of
instructions.
./hoc6/hoc
proc f1() { x = 123 + 345 }
proc f2() { print sin(cos(x+42)) }
proc f3() { y = 10 }
f1
0
f2
10
f3
24
hoc
now correctly installs our subroutines into
permanent storage. We now focus on calling those subroutines,
which requires a coordinated dance between the caller and called
code.
The basic idea of executing subroutines in our hoc VM involves a combination of two cooperating instructions:
subrexec
, to enter
a named subroutine, saving our current location
uprocret
, to
return from a procedure to the saved location. (There’s also a
ufuncret
instruction, covered when
we implement return values.)
We’ll add this functionality in stages:
Representing Subroutine Calls & Callers - Whenever we call a procedure, we must keep track of where we will resume execution afterwards. Since subroutines may call other subroutines (or themselves), we need to track an arbitrary stack of callers.
Language Changes
Executing Subroutines
Returning to Callers
The bare minimum required for executing a reserved block of code
—ignoring, for now, arguments and return values—is the ability to jump
to the subroutine’s address, then jump back the caller
after executing. The origin / destination data for a call is encoded
into a call frame. Unlike most of our objects, the
Frame
structure has no use outside of the machine module,
and so we can define it inside machine.c
.
typedef struct Frame {
*subr_called; // the UFUNC/UPROC we have jumped to
Symbol ; // return location
Addr ret_pc} Frame;
Below is an example of how relationship between the subrexec
instruction, the Frame
it creates, and the uprocret
instruction, which uses the
information to return to the caller.
Because subroutines may call other subroutines (or themselves), a
single Frame
of call data is not
sufficient. Instead, this information is often organized into a
stack of per-call data, imaginatively named the call
stack. Since a Frame
is pushed onto the stack
every time a subroutine is invoked, and popped when a routine returns to
its caller, the currently-executing routine always has access to its own
data at the top of the stack.
#define FRAMELIMIT 100 // Recursion depth limit
static Frame frames[FRAMELIMIT];
static const Frame *OVERFLOW_FRAME = frames + FRAMELIMIT;
static Frame *fp = frames; // Next frame to use
We also add some simple functions for manipulating the frame stack.
/**
* Push a new frame for a subroutine call onto the frame stack, or fail on
* overflow.
*
* The frame is initialized as a call to Symbol `s`, returning to address
* `caller_pc`. Caller must initialize all other frame data.
*/
static Frame *frame_push(Symbol *s, Addr caller_pc) {
if (fp == OVERFLOW_FRAME) {
("Call-Stack Overflow while calling '%s'", s->name);
exec_error}
->subr_called = s;
fp->ret_pc = caller_pc;
fpreturn fp++;
}
/** Pop most recent Frame from the frame stack, or fail on underflow */
static Frame *frame_pop(void) {
if (fp == frames) {
("Call-Stack Underflow while attempting to return");
exec_error}
return --fp;
}
/** Access top of frame stack, or fail if no Frame exists */
static Frame *frame_peek(void) {
if (fp == frames) {
("Call-Stack Underflow while attempting to peek at top frame");
exec_error}
return fp - 1;
}
subrexec
InstructionWith our new call stack, we can implement the subrexec
instruction, which performs the
work to actually change our PC location and push a new Frame
onto our stack. This instruction
is always two words long, since we also need the Symbol
which points to our procedure.
To actually run the subroutine, we use a recursive
execute()
call, using Symbol->data.addr
as
the pc value.
int inst_subrexec(void) {
*sym = (pc++)->symbol;
Symbol *f = frame_push(sym, pc);
Frame
(sym->data.addr);
executereturn 0;
}
While it may seem as though we could simply change the value of the
PC in order to “jump” to the subroutine, a nested execute()
call is more compatible with with our implementation of
ifcode
and whilecode
. Since those instructions
use recursion, a direct-jump subroutine implementation without
using recursion would lead to executing code in C stack frames we would
not expect.
As an example, consider the following hoc program.
subrexec
and the uprocret
instructions, what would this
program print?
Most likely, something like:
1
2
4
5
3
Because our ifcode
instructions recurse, we’re inside 4
execute()
functions by the time we reach
print 1
:
execute()
from our REPLexecute()
inside inst_ifcode
for
if (1)
.execute()
inside inst_ifcode
for
if (2)
.execute()
inside inst_ifcode
for
if (3)
.Now, after we finish print 1
, we return
,
and we assume that merely jumps to the instructions for
print 2
. Since that’s the end of a stmtlist
block, the following instruction will be a STOP
.
Now things get odd. This will exit the fourth layer
of the execute()
function, bringing us back to the
inst_ifcode
we were running for if (3)
…
inside our procedure! And that will begin running
print 4
! There are many problematic situations we can
create for ourselves in this manner.
To install the subrexec
instruction, we must add the
language rules for calling a function or procedure. For now, we’ll
disallow specifying any arguments, so the address we produce for the
parser will just be the instruction for calling the procedure /
function. Notice that procedure calls, which don’t return a
value, must be statements, as they don’t leave a value
on the stack.
%%
stmt: ...
'(' ')'
| UPROC
{$$ = install_instruction(inst_subrexec);
($1);
install_ref }
hoc.y
: Calling
procedures hoc6/hoc.y
Function calls are expressions, and so they can be used
inside conditions, etc. (The job of placing a value onto the data stack
will be handled by the return
statement of the function;
trying to call a function at this point will cause errors!)
%%
expr: ...
'(' ')'
| UFUNC
{$$ = install_instruction(inst_subrexec);
($1);
install_ref }
hoc.y
: Calling
functions hoc6/hoc.y
Our procedures must, at some point, return to their caller. We need a
facility to “jump back” to our calling code. Since we’re executing
recursively, this means we need to add a way to signal that it’s time to
return from our current execute()
call, and update the
pc
value to match our return
address.
For hoc6
, we will use a simple implementation, adding an
is_returning
flag that is examined by
execute()
. Whenever that flag is true, the VM knows it
should just be unwinding nested execute
calls, so the
execute()
function returns immediately, without
incrementing the pc
value again or executing anything.
That flag should stay true until we find ourselves back int the
inst_subrexec
function, since we’ve returned from the
called subroutine. At that point, we can safely set the flag to false,
knowing that the return instruction has already updated the
PC
.
You can see a visual representation of this flow below:
subrexec
runs within some execute()
call
(dotted). It finds the procedure address, and starts its own
execute()
flow.uprocret
instructionuprocret
sets pc = frame.ret_pc
, and sets the is_returning
flagexecute()
sees the flag is enabled, and returns to
subrexec
subrexec
disables is_returning
, and
returns to its enclosing execute()
call. the
pc
value is still set to the value from
uprocret
and not updated - execution continues
correctly.First, we must add the flag for is_returning
:
#define FRAMELIMIT 100 // Recursion depth limit
#define OVERFLOW_FRAME (frames + FRAMELIMIT)
static Frame frames[FRAMELIMIT];
static Frame *fp = frames; // Next frame to use+ static bool is_returning; // true when returning from a user subroutine
Next, we’ll add the uprocret
instruction; it must:
Frame
from our frame stack, so we know
our return addresspc
to that return address, so that execution
can resume in our caller.is_returning
true, so that the next
execute()
iteration returns early, and we may return to
subrexec
.int inst_uprocret(void) {
*f = frame_pop();
Frame = true;
is_returning = f->ret_pc;
pc return 0;
}
A “return” instruction hoc6/machine.c
Finally, we update execute
to respect this state.
void execute(Addr start_addr) {
pc = start_addr;+ is_returning = false;
- while (!is_stop_inst(pc)) {
+ while (!is_stop_inst(pc) && !is_returning) {
There’s one other aspect of our execution flow that must be considered before we’re done. To see it, consider the following hoc program, and decide what output will result when running it.
tester = 0
proc earlyReturnTest() {
if (tester > 0) {
print 2
return
}
print 1
}
The reason for this bug lies in our current implementation of the
ifcode
and whilecode
instructions, and the
nested execution flow we’re using. Because ifcode
and
whilecode
also overwrite the
pc
value after their nested execute()
calls,
they may override the pc
value just set by
uprocret
!
Consider the following flow:
Looking back to our example, the return
keyword inside
our function sets pc
and is_returning
. We
immediately return to the ifcode
function, whose nested
execute just ended. But the inst_ifcode
function
also sets the PC value, overriding our return
location.
The relevant parts of hoc5
’s inst_ifcode()
are commented below:
int inst_ifcode(void) {
= pc[0].addr;
Addr ifbody_addr ...
= pc[2].addr;
Addr end_addr = pc + 3;
Addr cond_addr (cond_addr);
execute= stack_pop();
Datum d if (d.value) {
(ifbody_addr); // uprocret changes our PC inside here
execute}
= end_addr; // and it's overwritten here... :(
pc return 0;
}
In hoc5
,
inst_ifcode
unconditionally overwrites pc
to
end_addr
How can we make our return
keyword force its way past any enclosing if
and
while
blocks, until it reaches its parent
subrexec
instruction?
is_returning
ChecksAs the UPE authors suggest, a simple (if hacky) fix, is to make these
instructions aware of the our is_returning
state, since
they are part of our ‘pc-change’ flow. Before making any manual changes
to the pc
value, we check whether we’re actually returning
(and so should stop executing). This lets us “bubble up” from a
return
statement, through any enclosing if
s or
while
s, to the subroutine we’re returning from! Here’s our
fixed ifcode
flow:
The actual changes are minor:
inst_ifcode
int inst_ifcode(void) {
...
execute(cond_addr);
Datum d = stack_pop(); // must be value
if (d.value) {
execute(ifbody_addr);
} else if (maybe_elsebody.type == CT_ADDR) {
execute(maybe_elsebody.addr);
}
}
- pc = end_addr;
+ if (!is_returning) {
+ pc = end_addr;
+ }
return 0; }
inst_whilecode
int inst_whilecode(void) {
Addr body_addr = (pc++)->addr;
Addr end_addr = (pc++)->addr;
Addr cond_addr = pc;
while (true) {+ if (is_returning) {
+ return 0; // PC is already set to the appropriate address
+ }
execute(cond_addr);
if (stack_pop().value) {
execute(body_addr);
} else {
break;
}
}
// condition false
pc = end_addr;
return 0; }
return
StatementsTo allow users to trigger the return, we add a return
keyword to our token declarations and language
builtins:
- %token <hoc_symbol> IF_KW ... FUNC_KW PROC_KW
+ %token <hoc_symbol> IF_KW ... FUNC_KW PROC_KW RETURN_KW
static struct {
const char *name;
int token_type;
} keywords[] = {
{"if", IF_KW},
{"else", ELSE_KW},
{"func", FUNC_KW},
{"proc", PROC_KW},
{"print", PRINT_KW},
{"while", WHILE_KW},+ {"return", RETURN_KW},
{NULL, 0}, };
And a return statement to install the
uprocret
machine instruction. The return statement has an
interesting new feature: it should only be used while
defining a subroutine.
Adding the keyword itself is straightforward; we add the
RET_KEYWORD
to hoc.y
and the associated
"return"
string to builtins.c
.
But in order to prevent users from writing a return
statement outside of a subroutine, we’ll need to know, while executing
the parser action, whether we’re currently defining a
subroutine or not.
To do this, we’ll add a new piece of parser state:
current_subr
.
/** Tracks the symbol for the subroutine currently being parsed (if any). */
*current_subr = NULL; Symbol
While our parser is parsing a subroutine, we want this variable to
track the Symbol
of that subroutine. Once we’ve finished
parsing the routine, we must reset it to NULL
. This is most
easily done inside our subr_decl
action:
subr_decl: PROC_KW VAR
{
$2->type = UPROC;+ current_subr = $2;
$$ = $2;
}
| FUNC_KW VAR
{
$2->type = UFUNC;+ current_subr = $2;
$$ = $2;
} ;
And then removing the definition after parsing the body
stmt: ...
| subr_decl '(' ')' stmt
{
reserve_subr($1);+ current_subr = NULL;
$$ = $4; }
Once we have access to the current subroutine, enforcing that return statements can’t happen outside of them is trivial.
stmt: expr { install_instruction(inst_pop); }
...+ | RETURN_KW
+ {
+ if (current_subr == NULL) {
+ exec_error("Syntax error - "
+ "`return` keyword used outside of a subroutine");
+ }
+ if (current_subr->type != UPROC) {
+ exec_error("Syntax error - "
+ "bare return used outside of a procedure");
+ }
+ $$ = install_instruction(inst_uprocret);
+ }
It’s all well and good to allow users to return from their
procedures, but what if they forget, or their procedure simply runs
until it ends? We need to make absolutely sure that our
procedures will return to their caller’s location. To do this, we can
simply force every procedure to end with a
uprocret
instruction, during our parser action for
procedure declarations. At this point, the procedure body has been
installed, so we can safely add one more instruction to the end of the
procedure’s body before reserving space for it in our machine.
stmt: ...
| subr_name '(' ')' stmt
{+ /*
+ * Note: the order of these two lines matters!
+ * `uprocret` should be part of the subr body,
+ * so reserve space *after* installing it.
+ */
+ install_instruction(inst_uprocret);
reserve_subr($1);
current_subr = NULL;
$$ = $4; }
Forcing returns from procedures hoc6/hoc.y
This is another good place to test your implementation; no additional code should be required for defining & calling procedures.
At this point, we can declare procedures and then call them. Next, we will implement passing data into our subroutines, via arguments, and then accessing that data inside our subroutine bodies.
Arguments to our subroutines will be managed on our data stack, must be sure to add and remove them on entrance and exit.
First, we must decide how to encode arguments for our machine to find
when executing subrexec
. We’ve seen one way of passing
arguments already in hoc3
, when using BUILTIN
functions. Those functions had a fixed number of arguments; those
values were stored on the data stack, and each inst_callN
variant knew how many to remove.
Our subroutines, on the other hand, allow an arbitrary number of arguments. During a call, arguments will be placed onto the (data) stack, and stay there for the entire duration of the call. In order for the subroutines to use the arguments, we must be able to locate them on the stack; furthermore, to avoid invalid memory accesses, we should check that an argument exists before trying to access it.
We’ll encode the number of arguments and their stack
location inside our Frame
during our
subrexec
call. The number of arguments will be generated as
part of the subrexec
instruction, to ensure that we only
access paramters that actually exist. The stack location is
only known at call time, so we’ll update our subrexec
instruction to populate that value.
Then, our subroutine bodies will be able to generate instructions
that can lookup argument values, using this data from the current
Frame
.
First, we’ll update our Frame
struct with fields to
represent arguments passed.
typedef struct Frame {
Symbol *subr_called; // the UFUNC/UPROC we have jumped to
Addr ret_pc; // return location+ Datum *argp; // pointer to first argument
+ int arg_count; // number of arguments
} Frame;
Arguments add two additional responsibilities to our parser. The first is some additional complexity around the addresses of our procedure-call statements. Consider the hoc statement:
callMyProcedure()
Previously, this stmt
generated a single, two-word
instruction, subrexec
, and the stmt
production
for the call produced the address of the first word.
%%
stmt: ...
'(' ')'
| UPROC
{$$ = install_instruction(inst_subrexec);
($1);
install_ref }
Now, if we instead call a subroutine that requires arguments, like:
add3(6, 14, 7)
Then the address of the subrexec
instruction is
not sufficient. Like BUILTIN
s, we require the
address where our argument expressions begin to be
generated.
The second need is our requirement that the parser should tell us how many arguments were passed in a given call, so that it can generate our new version of subrexec, which includes the number of arguments.
You can see the required information in the diagram below:
We need a new piece of syntax for tracking the start address and number of argument expressions in a call.
To track a “count” value, we’ll add yet another new value type to our
parser – integer
. This’ll be
useful for cases where we want to refer to an “index” or “count”, rather
than trying to use a double
to specify a whole number
value, such as “3 arguments passed” or “get argument for parameter
2”.
%union {
double hoc_value;
Symbol *hoc_symbol;
Addr hoc_address;+ int hoc_integer;
} %type <hoc_symbol> assignable subr_name
Since we need two new pieces of information, we’ll create two new non-terminals to represent them inside our subroutine calls.
First, we’ll create a non-terminal, arglist
, that accepts a list of
expressions, and produces the number of expressions in
that list.
%type <hoc_symbol> assignable subr_decl+ %type <hoc_integer> arglist
For our arglist
production, the expressions themselves
are still installed by the expr
nonterminal; the
arglist
action only cares about how many expressions it has
seen. It counts them by using itself recursively; we also include a
special case for a single argument.
%%
arglist: /* nothing */ { $$ = 0; }
$$ = 1; }
| expr { ',' expr { $$ = $1 + 1; } | arglist
Next, we need to track the address where our argument expressions
begin to be installed. We’ve done this kind of thing before, with the
end
production in our if
and
while
statements. We create a new args_start
production to to “mark our place” when we start parsing an argument
list.
It produces an address:
- %type <hoc_address> assign call end expr if stmt stmtlist while
- %type <hoc_address> args_start assign call end expr if stmt stmtlist while
And it operates almost like the end
production; however,
it doesn’t install anything.
%%
args_start: /* nothing */ { $$ = progp; }
;
Finally, we can add these nonterminals into our procedure and function call syntax.
Procedure Calls
stmt: expr { install_instruction(inst_pop); }
| '{' stmtlist '}' { $$ = $2; }- | UPROC '(' ')'
+ | UPROC '(' args_start arglist ')'
{- $$ = install_instruction(inst_subrexec);
+ install_instruction(inst_subrexec);
install_ref($1);+ install_literal($4);
+ $$ = $3;
}
Function Calls
expr: NUMBER
...- | UFUNC '(' ')'
+ | UFUNC '(' args_start arglist ')'
{- $$ = install_instruction(inst_subrexec);
+ install_instruction(inst_subrexec);
install_ref($1);+ install_literal($4);
+ $$ = $3;
}
Then we’ll need to update inst_subrexec
to handle the
new information in our instruction format; upon reaching an
subrexec
call, we’ve got the arguments for the call at the
top of the stack, so we set our argp
value to
point to the first argument’s Datum
. Since we’ve
got a stack of arguments, the first argument is exactly
arg_count
indices behind our stackp
pointer.
int inst_subrexec(void) {
Symbol *sym = (pc++)->symbol;+ double arg_count = (pc++)->literal;
Frame *f = frame_push(sym, pc);+ f->arg_count = arg_count;
+ f->argp = stackp - f->arg_count;
execute(sym->data.addr);
is_returning = false;
return 0; }
In order to actually use our arguments within the body of a
procedure or function, we need to transform references like $1
, $2
, etc. into stack
accesses, which either fetch or
overwrite a value on the stack.
argget
The basic idea will be to parse a parameter
reference $k
into a new
argget
instruction. This
instruction is two words long; the second word, k, specifies the parameter number we
want to access. The machine will:
arg_count
argp
argset
Assigning a value to an argument will work similarly; the value at
the top of the stack replaces the Datum
value for argument
k. It’s then put back
at the top of the stack, since assignments are expressions.
First, we’ll need a way to recognize parameter references, and turn
the reference $k
into the literal
integer k.
Lucky for us, we just added a new “integer” value-type into our
language, so we can associate a new PARAM_NUMBER
token with
that type. (Remember: It’s capitalized because it’s a terminal.)
%type <hoc_symbol> assignable udef_name
%type <hoc_integer> arglist+ %token <hoc_integer> PARAM_NUMBER
Then, we can update our yylex()
function to handle parameters as well. A $
with no prefix or other surrounding
characters is only valid as a parameter reference, so we can place this
conditional logic anywhere before our final switch
statement.
If scanf
fails to read an integer, or we get a
non-positive integer value, we consider it a lexing failure. This allows
more informative error messages, but whether to perform this much
semantic validation inside your lexer is a matter of opinion.
int yylex(void) {
int c;
...
if (c == '\n') {
line_number++;
}
+ // Parse parameters
+ if (c == '$') {
+ if (scanf("%d", &yylval.hoc_integer) != 1) {
+ exec_error("Invalid parameter specification", "");
+ } else if (yylval.hoc_integer <= 0) {
+ exec_error("Parameters must be nonnegative", "");
+ }
+ return PARAM_NUMBER;
+ }
We can use our PARAM_NUMBER
token to write new
expr
productions for evaluating or assigning a parameter.
Like the productions for return
statements, these should
only be allowed while defining a procedure.
expr:
...+ | PARAM_NUMBER
+ {
+ if (current_subr == NULL) {
+ exec_error("Syntax error - "
+ "Parameter '$%d' not allowed here", $1);
+ }
+ $$ = install_instruction(inst_argget);
+ install_literal($1);
+ }
+ | PARAM_NUMBER '=' expr
+ {
+ if (current_subr == NULL) {
+ exec_error("Syntax error - "
+ "Parameter '$%d' not allowed here", $1);
+ }
+ $$ = install_instruction(inst_argset);
+ install_literal($1);
+ }
| expr '+' expr { install_instruction(inst_add); }
argget
and argset
The inst_argget
and inst_argset
instructions use the Frame.argp
pointer to access the
correct location in the stack. The only arithmetic required is
converting the parameter numbers into zero-indexed offsets.
int inst_argget(void) {
int arg_number = (pc++)->literal;
*current_frame = frame_peek();
Frame if (current_frame->arg_count < arg_number) {
("Subroutine '%s' uses parameter %d, but only got %d.",
exec_error->subr_called->name,
current_frame,
arg_number->arg_count);
current_frame}
(current_frame->argp[arg_number - 1]);
stack_pushreturn 0;
}
int inst_argset(void) {
int arg_number = (pc++)->literal;
*current_frame = frame_peek();
Frame if (current_frame->arg_count < arg_number) {
("Subroutine '%s' uses parameter %d, but only got %d.",
exec_error->subr_called->name,
current_frame,
arg_number->arg_count);
current_frame}
= stack_pop();
Datum d ->argp[arg_number - 1] = d;
current_frame(d);
stack_pushreturn 0;
}
There’s one last detail to handle related to argument values. We need
to remove them when we return from our subroutines. To that
end, we add a small helper function to the machine
module,
so that both procedures and functions can remove their arguments once
complete.
/**
* Reset any machine state indicated by the given frame. This includes:
* - Removing any arguments from the stack for the call.
* - Resetting the program counter
* - Setting global "returning" flag
*/
static void return_from_frame(Frame *f) {
// remove args from stack
while (f->arg_count-- > 0) {
();
stack_pop}
= true;
is_returning = f->ret_pc;
pc }
Our uprocret
instruction can then be made a little
simpler.
int inst_uprocret(void) {
Frame *f = frame_pop();- is_returning = true;
- pc = f->ret_pc;
+ undo_frame_state(f);
return 0;+ }
Finally, we can make the finishing touches that allow us to make
functions work. Since calling a function produces a
value, a function call must be an expression rather than a statement.
Furthermore, a function must include the
return
keyword (with a value), which we’ll need to check at
runtime rather than parse-time.
The ufuncret
instruction is used for returning from a
function, it should be used just after our return value has
been produced and pushed onto the stack. There’s just one problem:
there’s a bunch of arguments underneath it, and they need to
go! So, the ufuncret
instruction removes the top stack
value, saves it, then calls return_from_frame
to remove the
arguments from the stack. After that’s complete, we can re-push
the expected return value onto the top of the stack, which will be
available to the caller once control returns to them.
int inst_ufuncret(void) {
*f = frame_pop();
Frame = stack_pop();
Datum result (f);
undo_frame_state(result);
stack_pushreturn 0;
}
The format of a value-return is simply
return <expr>
, which means that the expression will
already have pushed the value onto the data stack by the time we install
the return instruction. This makes the action for value-return
statements simple; it’s only real job is verifying that we’re actually
inside a function; otherwise, we disallow the installation of a
ufuncret
instruction.
| RETURN_KW expr+ {
+ if (current_subr == NULL) {
+ exec_error("Syntax error - "
+ "return keyword used outside of a subroutine");
+ }
+ if (current_subr->type != UFUNC) {
+ exec_error("Syntax error: -"
+ "`return <expr>` used outside of a function");
+ }
+ install_instruction(inst_ufuncret);
+ $$ = $2;
+ }
+
When a function is installed, our parser will still end it
with the uprocret
instruction, since we use the same code
to install both types of subroutine. Unlike procedures, however,
functions should never reach that instruction. We should always
hit a ufuncret
instruction first.
To make sure users understand what they did wrong, we will update the
uprocret
instruction to raise an error at runtime
if it detects that we’re inside a function, since that means we
forgot to return a value
int inst_uprocret(void) {
Frame *f = frame_pop();
+ if (f->def_called->type != UPROC) {
+ exec_error("Function '%s' does not return a value", f->subr_called->name);
+ }
undo_frame_state(f);
return 0; }
And with that, our function implementation should be complete!
We add a (very) minimal implementation of strings for use in
print
statements. This is their only allowed use; in
particular, they cannot be used as variables. In this version of
hoc
, they also leak memory.
The implementation adds a new string-printing instruction, which has a two-word layout:
This only requires support for parsing and installing strings into our machine’s memory.
We begin by writing a (very) small dynamic-string module, hoc6/str.c, based on the common
len
+ capacity
implementation. For the sake of
brevity (hah), we don’t discuss the implementation here,
but we’ll come back to strings in ehoc
. Its interface is
all we’re interested in.
typedef struct str str;
/** Allocate a string; runtime error on allocation failure */
*str_new(void);
str
/** Append a character to the end of a string */
bool str_append(str *s, char c);
/**
* Get a "view" of string `s` for use as a valid, NUL-terminated C
* string. The pointer returned is non-owning, and will be
* invalidated by any manipulations made to the string.
*
* In particular, if the caller wishes to store the resulting char
* buffer for other uses, it should be copied.
*/
const char *str_view(str *s);
/** Clear the string's contents */
void str_clear(str *s);
/** Deallocate `s`, freeing its memory. */
void str_free(str *s);
The dynamic string interfacehoc6/hoc.h
We’ll use this library to allocate storage for strings as they’re parsed.
The biggest change required for string parsing is recognizing string literals and allocating strings for them during the lexing process.
First we add the STRING
token
%union {
double hoc_value;
Symbol *hoc_symbol;
Addr hoc_address;
int hoc_integer;+ str *hoc_string;
}
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF UFUNC UPROC COMMAND
/* These keyword symbols are efectively singletons */
%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW RETURN_KW READ_KW
%type <hoc_symbol> assignable udef_name
%type <hoc_integer> arglist
%token <hoc_integer> PARAM_NUMBER+ %token <hoc_string> STRING
We’ll force all strings to be double-quoted (though they may contain escaped quotes), and so once we see a double-quote, we can collect characters until we reach the next one.
int yylex(void) {
...
// Parse parameters
...+ // Parse strings
+ if (c == '"') {
+ str *s = str_new();
+ while (true) {
+ c = getchar();
+ if (c == '"') {
+ break;
+ }
+ if (c == EOF || c == '\n') {
+ exec_error("String wasn't closed: '%s'", str_view(s));
+ }
+ str_append(s, escaped(c));
+ }
+ yylval.hoc_string = s;
+ return STRING;
+ }
+
// Get numbers ...
The only interesting detail is how to escape special characters; the
escaped
function allows us to handle this
transparently.
/**
* If c is '\': Returns next input character, string-escaped.
* Otherwise: returns c
*/
int escaped(char c) {
// Each escaped character is followed by its output char
static char translations[] = "b\b"
"f\f"
"n\n"
"t\t";
if (c != '\\') {
return c;
}
= getchar();
c char *cp = strchr(translations, c);
// islower() used so that we don't match the output chars or NUL
if (islower(c) && cp != NULL) {
return cp[1];
}
return c;
}
While we’re working on yylex()
, we might as well fixup
our other use of static strings in hoc: Symbol
names. We
only allowed 100-character identifiers before, whereas we could easily
support more than that. We replace our use of a static buffer with a
str
object instead. (Note that because
install_symbol
copies the string passed to
it, we are safe to pass it a str_view
here.)
If you want to leave the identifier limit in place, you could do so
by leaving the hoc.h
constant and while
condition check in place. Otherwise, you can remove
SYMBOL_NAME_LIMIT
from hoc.h
as well.
We’ll use a single static str
for all lexing calls;
there’s no reason to allocate & free memory for these repeatedly.
Its internal memory use will expand to the maximum identifier length
used during hoc
’s runtime.
int yylex(void) {+ static str *symbol_str = NULL;
int c;
When we find the beginning of a symbol, we’ll reset our string, instead of setting up a buffer.
// Get numbers
...
if (isalpha(c)) {
Symbol *s;
- char buf[SYMBOL_NAME_LIMIT + 1];
- size_t nread = 0;
+ // reset symbol string for new input
+ if (symbol_str == NULL) {
+ symbol_str = str_new();
+ } else {
+ str_clear(symbol_str);
+ }
...
Then, we can remove the SYMBOL_NAME_LIMIT
checks and
replace character updates with str
functions.
do {- buf[nread++] = c;
+ str_append(symbol_str, c);
c = getchar();- } while (nread < SYMBOL_NAME_LIMIT && (isalpha(c) || isdigit(c)));
+ } while (isalpha(c) || isdigit(c));
- // Just in case we exceeded the limit
- while ((isalpha(c) || isdigit(c))) {
- c = getchar();
- }
-
// at this point, we have a non-alphanumeric 'c'
ungetc(c, stdin);
- buf[nread] = '\0';
+ const char *buf = str_view(symbol_str);
if ((s=lookup(buf)) == NULL) {
s = install_symbol(buf, UNDEF, 0.0);
}
yylval.hoc_symbol = s;
return (s->type == UNDEF ? VAR : s->type); }
MachineCode
to hold stringsWe’ll add yet another potential type for MachineCode
elements, allowing a word of our VM memory to point to a string.
struct MachineCode {
CodeType type;
union {
double literal;
Inst inst;
Symbol *symbol;
Addr addr;+ str *string;
}; };
MachineCode with str
addition hoc6/hoc.h
prstr
instructionOnce our parser can install prstr
instructions, actually
printing the string is trivial. We use our str_view
method
to create a C string from the string pointer; since we print it
immediately, we don’t need to do any defensive copying. (Make sure to
add a prototype for inst_prstr
to hoc.h
.)
int inst_prstr(void) {
*s = (pc++)->string;
str ("%s", str_view(s));
printfreturn 0;
}
(Note the lack of \n
in the output format string. This
will be useful in the next output feature: printing multiple
expressions)
print
StatementNow to combine our string-printing functionality with the existing expression-printing.
We’ll replace our print expr
statement with a
generalized prlist
nonterminal.
stmt: ...- | PRINT_KW expr
- {
- install_instruction(inst_prexpr);
- $$ = $2;
- }
+ | PRINT_KW prlist
+ {
+ $$ = $2;
+ }
...
This nonterminal handles any nonzero number of STRING
or
expr
elements, and prints each of them. We can base our
list-handling rules on our arglist
production.
%%
prlist: expr
{(inst_prexpr);
install_instruction$$ = $1;
}
| STRING
{$$ = install_instruction(inst_prstr);
($1);
install_string
}',' expr
| prlist
{(inst_prexpr);
install_instruction$$ = $1;
}',' STRING
| prlist
{(inst_prstr);
install_instruction($3);
install_string$$ = $1;
}
To complete our minimal I/O implementation, we’ll add features to read data from input files or the user’s terminal. In fact, we’ll allow specifying multiple files on the command line, and proceed from one to the next as they’re exhausted.
The UPE authors add the input-reading code to hoc.y
. For
hoc6
, we’ll instead add a new inputfiles
module, though the functionality is the same. We’ll revisit input/output
organization in adhoc
.
The call-graph for our user-input flow follows the following simplified diagram:
The main program initializes our input sources based on the
command-line arguments, using the init_inputfiles
interface:
/**
* Initializes the inputfiles module with the first `filecount`
* members of the `filenames` array; Each should be a file path.
*/
void init_inputfiles(char **argv, int filecount);
To actually read data, we’ll use a new varread
instruction, installed by the parser when we see a read
expression. That machine instruction will use the only other function in
the inputfiles
module: read_double()
.
/**
* Returns true if a double value could be read from the runtime
* input files, false otherwise. `dest` will be set to the value read,
* if any, or zero.
*
* If input data exists but is in an invalid format, causes an
* execution error.
*/
bool read_double(double *dest);
The user input layer of hoc is built around a set of zero or more
input files, specified on the command line. Any one of these
files may be stdin
, which is specified by the filename
-
.
As clients use the read_double
function to read data,
internally, the input layer reads from the the “current inputfile”.
Whenever an EOF
is read, the layer transparently switches
to the next input file and restarts the read, so that the caller is
unaware.
Since the user specified a list of input files on the command line,
most of our state revolves around pointers into the
filenames
array. We also need a FILE*
for the
current input file.
static FILE *curr_inputfile; // the FILE* we're reading from
static char **next_filename; // next filename to read
static char **end_namep; // represents "no more files"
Our inputfiles
module must have its global state
initialized before it can be used; our initialization function properly
sets curr_inputfile
and curr_namep
.
void init_inputfiles(char **filenames, int filecount) {
= NULL;
curr_inputfile = filenames;
next_filename = filenames + filecount;
end_namep (); // ensures curr_inputfile is either valid or NULL
next_inputfile}
At the beginning of main
, we
call
init_inputfiles() with any elements of the
argvarray after the program name, assuming all command-line arguments are input files. This ensures that all calls to our input functions have a valid
curr_inputfile, and if it's
NULL`,
we are out of input.
int main(int argc, char *argv[]) {
program_name = argv[0];+ init_inputfiles(argv + 1, argc-1);
install_hoc_builtins(); ...
We call next_inputfile
to switch to the next input file
whenever we read an EOF
. It returns true if another input
file is found, or false if all specified inputs are exhausted.
/**
* Change curr_inputfile to point to the next input file, if
* any. Closes any existing input file (unless that file points to
* stdin).
*
* Returns true if there is more input to read; false otherwise.
*/
static bool next_inputfile(void) {
if (curr_inputfile != NULL && curr_inputfile != stdin) {
(curr_inputfile);
fclose}
= NULL;
curr_inputfile
do {
if (next_filename == end_namep) {
return false;
}
// sync curr_inputfile with curr_namep
if (strcmp(*next_filename, "-") == 0) {
= stdin;
curr_inputfile } else {
= fopen(*next_filename, "r");
curr_inputfile if (curr_inputfile == NULL) {
("Could not open input filename: '%s'", *next_filename);
warning}
}
+= 1;
next_filename } while (curr_inputfile == NULL);
return true;
}
Finally, the read_double
function is implemented to hide
the list of inputs from callers, and instead handles EOF
conditions itself.
bool read_double(double *dest) {
// curr_inputfile is NULL when all input is exhausted
if (curr_inputfile == NULL) {
("Cannot read input data - No input files remaining");
warningreturn false;
}
while (true) {
switch (fscanf(curr_inputfile, "%lf", dest)) {
case EOF:
if (!next_inputfile()) {
return false;
}
break; // otherwise try again
case 0:
("Could not read double from '%s' - invalid data format",
exec_error*curr_namep);
default:
return true;
}
}
}
read
expressionAfter adding a read
keyword to builtins.c
,
we can add a final expression production for our new syntax:
read(destVariable)
. The varread
instruction we
install has two words; the second is the destination variable.
expr: NUMBER
...
| assign
| call+ | READ_KW '(' VAR ')'
+ {
+ $$ = install_instruction(inst_varread);
+ install_ref($3);
+ }
Our final instruction for hoc6
is varread
,
which uses the read_double
interface to assign a variable
value and push a success boolean onto the stack.
int inst_varread(void) {
*dest = (pc++)->symbol;
Symbol bool success = read_double(&dest->data.val);
(dat(success));
stack_pushif (!success) {
->data.val = 0;
dest}
->type = VAR;
destreturn 0;
}
There are a huge number of opportunities to explore in hoc at this point! Naming just a few:
Hopefully you’re as excited as I was upon reaching the end of
hoc6
to keep exploring– there is so much low hanging fruit!
Can you build a language you’d actually want to use in a real task?
If there’s enough interest, I’ll add more details to the
adhoc
source code where some of these ideas are explored.
Please feel free to drop me a
line; feedback is always welcome!
?= -std=c2x -Wall -Wextra -pedantic -Wformat -Wformat-extra-args -Wformat-nonliteral
CFLAGS ?= -d
YFLAGS =-lm
LDFLAGS:= builtins.o math.o symbol.o machine.o str.o hoc.o inputfiles.o
objects
.PHONY:all
: hoc
all
: $(objects)
hoc(CFLAGS) $(objects) $(LDFLAGS) -o hoc
cc $
.c hoc.tab.h: hoc.y
hoc(YFLAGS) hoc.y -o hoc.tab.c
yacc $.tab.c hoc.c
mv hoc
%.o: hoc.h hoc.tab.h %.c
(CFLAGS) -c -o $@ $*.c
cc $
.PHONY:clean
:
clean-f hoc.tab.h
rm -f $(objects)
rm -f hoc.c
rm -f hoc rm
#include "hoc.h"
#include "hoc.tab.h"
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h> // rand
/* Defined in math.c */
extern double Abs(double), Acos(double), Atan(double), Atan2(double, double),
(double), Exp(double), Integer(double), Lg(double), Ln(double),
Cos(double), Pow(double, double), Random(void), Sin(double), Sqrt(double);
Log10
static struct {
const char *name;
double value;
} constants[] = {
{"PI", 3.14159265358979323846},
{"E", 2.71828182845904523536},
{NULL, 0},
};
static struct {
const char *name;
int args;
union {
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
} builtins[] = {
{"abs", 1, .call1 = Abs},
{"acos", 1, .call1 = Acos},
{"atan", 1, .call1 = Atan},
{"atan2", 2, .call2 = Atan2},
{"cos", 1, .call1 = Cos},
{"exp", 1, .call1 = Exp},
{"int", 1, .call1 = Integer},
{"lg", 1, .call1 = Lg},
{"ln", 1, .call1 = Ln},
{"log10", 1, .call1 = Log10},
{"pow", 2, .call2 = Pow},
{"rand", 0, .call0 = Random},
{"sin", 1, .call1 = Sin},
{"sqrt", 1, .call1 = Sqrt},
{NULL, 0, .call0 = NULL},
};
static struct {
const char *name;
int token_type;
} keywords[] = {
{"if", IF_KW},
{"else", ELSE_KW},
{"func", FUNC_KW},
{"proc", PROC_KW},
{"print", PRINT_KW},
{"read", READ_KW},
{"while", WHILE_KW},
{"return", RETURN_KW},
{NULL, 0},
};
extern void command_list(void);
static struct {
const char *name;
void (*command_func)(void);
} commands[] = {
{"LIST", command_list},
{NULL, 0},
};
void install_hoc_builtins(void) {
for (int i = 0; constants[i].name != NULL; i++) {
(constants[i].name, CONST, constants[i].value);
install_symbol}
for (int i = 0; builtins[i].name != NULL; i++) {
*s = install_symbol(builtins[i].name, BUILTIN, 0.0);
Symbol // TODO: Why not have separate install_value vs install_function?
->data.func.args = builtins[i].args;
sswitch (builtins[i].args) {
case 0:
->data.func.call0 = builtins[i].call0;
sbreak;
case 1:
->data.func.call1 = builtins[i].call1;
sbreak;
case 2:
->data.func.call2 = builtins[i].call2;
sbreak;
default:
(
fprintf, "Invalid arg count for symbol '%s'\n", builtins[i].name);
stderr(1);
exit}
}
for (int i = 0; keywords[i].name != NULL; i++) {
(keywords[i].name, keywords[i].token_type, 0.0);
install_symbol}
for (int i = 0; commands[i].name != NULL; i++) {
*s = install_symbol(commands[i].name, COMMAND, 0.0);
Symbol ->data.command_func = commands[i].command_func;
s}
}
#ifndef HOC_INC
#define HOC_INC 1
/*
* Global types & declarations for use in hoc
*/
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
///----------------------------------------------------------------
/// strings
///----------------------------------------------------------------
typedef struct str str;
/** Allocate a string; runtime error on allocation failure */
*str_new(void);
str
/** Append a character to the end of a string */
bool str_append(str *s, char c);
/**
* Get a "view" of string `s` for use as a valid, NUL-terminated C
* string. The pointer returned is non-owning, and will be
* invalidated by any manipulations made to the string.
*
* In particular, if the caller wishes to store the resulting char
* buffer for other uses, it should be copied.
*/
const char *str_view(str *s);
/** Clear the string's contents */
void str_clear(str *s);
/** Deallocate `s`, freeing its memory. */
void str_free(str *s);
///----------------------------------------------------------------
/// Addresses (Shared by multiple types)
///----------------------------------------------------------------
typedef struct MachineCode MachineCode;
typedef MachineCode *Addr;
///----------------------------------------------------------------
/// Symbols
///----------------------------------------------------------------
typedef struct Symbol Symbol;
/** Function descriptor for builtins: these always return doubles */
struct BuiltinFunc {
int args;
union {
/* We use an anonymous union for the function pointer; there's
no meaningful name for the union as a field */
double (*call0)(void);
double (*call1)(double);
double (*call2)(double, double);
};
};
struct Symbol {
short type; // Generated from our token types
char *name;
*next;
Symbol union {
double val;
;
Addr addrstruct BuiltinFunc func;
void (*command_func)(void);
} data;
};
*install_symbol(const char *name, short type, double value);
Symbol *lookup(const char *name);
Symbol
///----------------------------------------------------------------
/// Machine Code
///----------------------------------------------------------------
typedef int (*Inst)(void);
typedef enum CodeType {
= 'I',
CT_INST = 'L',
CT_LITERAL = 'S',
CT_SYM_REF = 'A',
CT_ADDR = 'C',
CT_CHRSTR } CodeType;
struct MachineCode {
;
CodeType typeunion {
double literal;
;
Inst inst*symbol;
Symbol ;
Addr addr*string;
str }; // anonymous union; no need to name it
};
#define STOP_INST NULL
#define is_stop_inst(mc) ((mc)->type == CT_INST && (mc)->inst == STOP_INST)
///----------------------------------------------------------------
/// Machine Program
///----------------------------------------------------------------
extern MachineCode prog[];
extern Addr prog_start;
/**
* Installs the given `MachineCode` object into the next available
* location in the machine's current program. A runtime error will
* occur if the program is full.
*
* Returns the address at which the code was installed.
*/
(MachineCode mc);
Addr install_code
/** The next available address in the machine's current program */
extern Addr progp;
/**
* Reserves all currently-installed code as a subroutine, under Symbol
* `subr_sym`. The value of `subr_sym` will be updated to the address
* of the installed routine. Ensures that the code will not be
* overwritten by future installations, even after a machine reset.
*/
void reserve_subr(Symbol *subr_sym);
#define install_literal(lit) \
install_code((struct MachineCode){.type = CT_LITERAL, .literal = lit})
#define install_ref(sp) \
install_code((struct MachineCode){.type = CT_SYM_REF, .symbol = sp})
#define install_instruction(instr) \
install_code((struct MachineCode){.type = CT_INST, .inst = instr})
#define install_address(address) \
install_code((struct MachineCode){.type = CT_ADDR, .addr = address})
#define install_string(s) \
install_code((struct MachineCode){.type = CT_CHRSTR, .string = s})
void execute(Addr start_addr);
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
int inst_add(void);
int inst_and(void);
int inst_argget(void);
int inst_argset(void);
int inst_assignvar(void);
int inst_assignconst(void);
int inst_call(void);
int inst_div(void);
int inst_eval(void);
int inst_eq(void);
int inst_gt(void);
int inst_gte(void);
int inst_ifcode(void);
int inst_lt(void);
int inst_lte(void);
int inst_mul(void);
int inst_neq(void);
int inst_not(void);
int inst_or(void);
int inst_pop(void);
int inst_prexpr(void);
int inst_print(void);
int inst_prstr(void);
int inst_pushlast(void);
int inst_pushlit(void);
int inst_pushref(void);
int inst_sub(void);
int inst_subrexec(void);
int inst_ufuncret(void);
int inst_uprocret(void);
int inst_varread(void);
int inst_whilecode(void);
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void command_list(void);
void machine_run_command(Symbol *);
///----------------------------------------------------------------
/// Machine Stack
///----------------------------------------------------------------
typedef union Datum {
double value;
*symbol;
Symbol } Datum;
void machine_startup(void);
void machine_reset_program(void);
///----------------------------------------------------------------
/// Runtime Functions
///----------------------------------------------------------------
/** input files */
/**
* Initializes the inputfiles module with the first `filecount`
* members of the `filenames` array; Each should be a file path.
*/
void init_inputfiles(char **filenames, int filecount);
/**
* Returns true if a double value could be read from the runtime
* input files, false otherwise. `dest` will be set to the value read,
* if any, or zero.
*
* If input data exists but is in an invalid format, causes an
* execution error.
*/
bool read_double(double *dest);
/** global error handling */
((format(printf, 1, 2))) void warning(const char *msg, ...);
__attribute__((format(printf, 1, 2))) void exec_error(const char *msg, ...);
__attribute__
/** builtins */
void install_hoc_builtins(void);
#endif // HOC_INC
%{
///----------------------------------------------------------------
/// C Preamble
///----------------------------------------------------------------
/*
* "higher-order calculator" - Version 5
* From "The UNIX Programming Environment"
*/
#include <stdbool.h>
#include <stdio.h>
#include <ctype.h>
#include <signal.h>
#include <setjmp.h>
#include <stdarg.h>
#include <string.h>
#include "hoc.h"
///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------
/** Tracks the symbol for the subroutine currently being parsed (if any). */
*current_subr = NULL;
Symbol
;
jmp_buf begin
///----------------------------------------------------------------
/// local declarations
///----------------------------------------------------------------
#define YACC_SUCCESS 0
int yylex(void);
void yyerror(const char *s);
void check_call_args(const Symbol *s, int actual);
%}
/*
* Grammar points:
*
* - EXPRESSIONS provide a value on the top of the satck.
* - STATEMENTS are not expressions, and do not provide a value.
*/
%union {
double hoc_value;
*hoc_symbol;
Symbol ;
Addr hoc_addressint hoc_integer;
*hoc_string;
str }
%token <hoc_value> NUMBER
%token <hoc_symbol> VAR CONST BUILTIN UNDEF UFUNC UPROC COMMAND
/* These keyword symbols are efectively singletons */
%token <hoc_symbol> IF_KW ELSE_KW PRINT_KW WHILE_KW FUNC_KW PROC_KW RETURN_KW READ_KW
%type <hoc_symbol> assignable subr_decl
%type <hoc_integer> arglist
%token <hoc_integer> PARAM_NUMBER
%token <hoc_string> STRING
%type <hoc_address> args_start assign call end expr if prlist stmt stmtlist while
%right '=' /* right-associative, much like C */
%left OR
%left AND
%left EQ GT GTE LT LTE NEQ
%left '+' '-' /* left-associative, same precedence */
%left '*' '/' /* left-assoc; higher precedence */
%right '^' /* exponents */
%left UNARY_MINUS NOT /* Even higher precedence
than mul or div. Can't use '-', as we've used
it already for subtraction. */
%%
: list
start{
(STOP_INST);
install_instruction;
YYACCEPT}
;
: /* nothing */
list| list terminator
| list stmt terminator
{
(STOP_INST);
install_instruction;
YYACCEPT}
| list assign terminator
{
(inst_pop);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list expr terminator
{
(inst_print);
install_instruction(STOP_INST);
install_instruction;
YYACCEPT}
| list error terminator { yyerrok; }
;
: assignable '=' expr
assign{
= $3;
$$ (inst_pushref);
install_instruction($1);
install_ref(inst_assignvar);
install_instruction}
| assignable ':' '=' expr
{
= $4;
$$ (inst_pushref);
install_instruction($1);
install_ref(inst_assignconst);
install_instruction}
;
: /* nothing */ { $$ = 0; }
arglist| expr { $$ = 1; }
| arglist ',' expr { $$ = $1 + 1; }
;
: expr { install_instruction(inst_pop); }
stmt| '{' stmtlist '}' { $$ = $2; }
| subr_decl '(' ')' stmt
{
/*
* Note: the order of these two lines matters!
* `uprocret` should be part of the subr body,
* so reserve space *after* installing it.
*/
(inst_uprocret);
install_instruction($1);
reserve_subr= NULL;
current_subr = $4;
$$ }
| UPROC '(' args_start arglist ')'
{
(inst_subrexec);
install_instruction($1);
install_ref($4);
install_literal= $3;
$$ }
| RETURN_KW
{
if (current_subr == NULL) {
("Syntax error - "
exec_error"`return` keyword used outside of a subroutine");
}
if (current_subr->type != UPROC) {
("Syntax error - "
exec_error"bare return used outside of a procedure");
}
= install_instruction(inst_uprocret);
$$ }
| RETURN_KW expr
{
if (current_subr == NULL) {
("Syntax error - "
exec_error"return keyword used outside of a subroutine");
}
if (current_subr->type != UFUNC) {
("Syntax error - "
exec_error"`return <expr>` used outside of a function");
}
(inst_ufuncret);
install_instruction= $2;
$$ }
| while cond stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_whilecode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $4;
$}
| if cond stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[3].type = CT_ADDR;
$1[3].addr = $4;
$}
| if cond stmt end ELSE_KW stmt end
{
1[0].type = CT_INST;
$1[0].inst = inst_ifcode;
$1[1].type = CT_ADDR;
$1[1].addr = $3;
$1[2].type = CT_ADDR;
$1[2].addr = $6;
$1[3].type = CT_ADDR;
$1[3].addr = $7;
$}
| COMMAND
{
($1);
machine_run_command= progp;
$$ }
| PRINT_KW prlist
{
= $2;
$$ }
;
: expr
prlist{
(inst_prexpr);
install_instruction= $1;
$$ }
| STRING
{
= install_instruction(inst_prstr);
$$ ($1);
install_string}
| prlist ',' expr
{
(inst_prexpr);
install_instruction= $1;
$$ }
| prlist ',' STRING
{
(inst_prstr);
install_instruction($3);
install_string= $1;
$$ }
;
: /* nothing */ { $$ = progp; }
stmtlist| stmtlist stmt terminator { $$ = $1; }
| stmtlist terminator
;
: '(' expr ')'
cond{
(STOP_INST);
install_instruction}
;
while: WHILE_KW
{
= install_instruction(STOP_INST);
$$ (STOP_INST);
install_instruction(STOP_INST);
install_instruction}
;
if: IF_KW
{
= install_instruction(STOP_INST);
$$ (STOP_INST);
install_instruction(STOP_INST);
install_instruction(STOP_INST);
install_instruction}
;
: /* nothing */
end{
(STOP_INST);
install_instruction= progp;
$$ }
;
: BUILTIN '(' ')'
call{
($1, 0);
check_call_args= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_call);
install_instruction}
| BUILTIN '(' expr ')'
{
($1, 1);
check_call_args(inst_pushref);
install_instruction($1);
install_ref(inst_call);
install_instruction= $3; /* Block starts with pushing args */
$$ }
| BUILTIN '(' expr ',' expr ')'
{
($1, 2);
check_call_args= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_call);
install_instruction= $3; /* Block starts with pushing args */
$$ }
;
: FUNC_KW VAR
subr_decl{
= $2;
current_subr 2->type=UFUNC;
$= $2;
$$ }
| PROC_KW VAR
{
= $2;
current_subr 2->type=UPROC;
$= $2;
$$ }
;
/*
* Every expr construct, when *executed*, must push a value onto the
* stack
*/
: NUMBER
expr{
= install_instruction(inst_pushlit);
$$ ($1);
install_literal}
| '@'
{
= install_instruction(inst_pushlast);
$$ }
| assignable
{
= install_instruction(inst_pushref);
$$ ($1);
install_ref(inst_eval);
install_instruction}
| assign
| call
| READ_KW '(' VAR ')'
{
= install_instruction(inst_varread);
$$ ($3);
install_ref}
| UFUNC '(' args_start arglist ')'
{
(inst_subrexec);
install_instruction($1);
install_ref($4);
install_literal= $3;
$$ }
| PARAM_NUMBER
{
if (current_subr == NULL) {
("Syntax error - "
exec_error"Parameter references not allowed here");
}
= install_instruction(inst_argget);
$$ ($1);
install_literal}
| PARAM_NUMBER '=' expr
{
if (current_subr == NULL) {
("Syntax error - "
exec_error"Parameter references not allowed here");
}
= install_instruction(inst_argset);
$$ ($1);
install_literal}
| expr '+' expr { install_instruction(inst_add); }
| expr '-' expr { install_instruction(inst_sub); }
| expr '*' expr { install_instruction(inst_mul); }
| expr '/' expr { install_instruction(inst_div); }
| expr LT expr { install_instruction(inst_lt); }
| expr LTE expr { install_instruction(inst_lte); }
| expr GT expr { install_instruction(inst_gt); }
| expr GTE expr { install_instruction(inst_gte); }
| expr EQ expr { install_instruction(inst_eq); }
| expr NEQ expr { install_instruction(inst_neq); }
| expr AND expr { install_instruction(inst_and); }
| expr OR expr { install_instruction(inst_or); }
| expr '^' expr
{
(inst_pushref);
install_instruction(lookup("pow"));
install_ref(inst_call);
install_instruction}
| '-'expr %prec UNARY_MINUS /* prec makes us bind tighter than subtraction */
{
(inst_pushlit);
install_instruction(-1);
install_literal(inst_mul);
install_instruction=$2;
$$}
| NOT expr %prec NOT
{
(inst_not);
install_instruction=$2;
$$}
| '(' expr ')' { $$ = $2; }
;
: /* nothing */ { $$ = progp; }
args_start;
: '\n'
terminator| ';'
;
: VAR
assignable| CONST
%% // end of grammar
/* error tracking */
char *program_name;
int line_number = 1;
bool reached_eof = false;
int main(int argc, char *argv[]) {
= argv[0];
program_name (argv + 1, argc - 1);
init_inputfiles();
install_hoc_builtins
();
machine_startup
(begin);
setjmp();
machine_reset_programwhile (!reached_eof && yyparse() == YACC_SUCCESS) {
(NULL);
execute();
machine_reset_program}
return 0;
}
/**
* "Looks ahead 1 character for a possibility. If it's found, returns
* `match`. If not:
* - places character back onto input stream for next read
* - returns the `otherwise` result.
*/
static int lookahead(char expect, int match, int otherwise) {
int c = getchar();
if (c == expect) {
return match;
} else {
(c, stdin);
ungetcreturn otherwise;
}
}
/**
* If c is '\': Returns next input character, string-escaped.
* Otherwise: returns c
*/
int escaped(char c) {
// Each escaped character is followed by its output char
static char translations[] = "b\b"
"f\f"
"n\n"
"t\t";
if (c != '\\') {
return c;
}
= getchar();
c char *cp = strchr(translations, c);
// islower() used so that we don't match the output chars or NUL
if (islower(c) && cp != NULL) {
return cp[1];
}
return c;
}
/* our simple, hand-rolled lexer. */
int yylex(void) {
static str *symbol_str = NULL;
int c;
// Skip space
do {
= getchar();
c } while (c == ' ' || c == '\t');
// Skip comments
if (c == '/' && lookahead('/', 0, '/') == 0) {
while (c != '\n' && c != EOF) {
= getchar();
c }
(c, stdin);
ungetc}
/* Newline / EOF handling
*
* The first time we see an EOF, we "fake" a newline,
* so that the list production can recognize any in-progress
* statement. The next call will return the EOF.
*/
if (c == EOF && !reached_eof) {
(c, stdin);
ungetc= true;
reached_eof = '\n';
c }
if (c == '\n') {
++;
line_number}
// Parse parameters
if (c == '$') {
if (scanf("%d", &yylval.hoc_integer) != 1) {
("Invalid parameter specification");
exec_error} else if (yylval.hoc_integer <= 0) {
("Parameter number '%d' invalid - cannot be negative",
exec_error.hoc_integer);
yylval}
return PARAM_NUMBER;
}
// Parse strings
if (c == '"') {
*s = str_new();
str while (true) {
= getchar();
c if (c == '"') {
break;
}
if (c == EOF || c == '\n') {
("String wasn't closed: '%s'", str_view(s));
exec_error}
(s, escaped(c));
str_append}
.hoc_string = s;
yylvalreturn STRING;
}
// Get numbers
if (c == '.' || isdigit(c)) {
(c, stdin);
ungetc("%lf", &yylval.hoc_value);
scanfreturn NUMBER;
}
if (isalpha(c)) {
*s;
Symbol
// reset symbol string for new input
if (symbol_str == NULL) {
= str_new();
symbol_str } else {
(symbol_str);
str_clear}
do {
(symbol_str, c);
str_append= getchar();
c } while (isalpha(c) || isdigit(c));
// Just in case we exceeded the limit
while ((isalpha(c) || isdigit(c))) {
= getchar();
c }
// at this point, we have a non-alphanumeric 'c'
(c, stdin);
ungetc
const char *buf = str_view(symbol_str);
if ((s = lookup(buf)) == NULL) {
= install_symbol(buf, UNDEF, 0.0);
s }
.hoc_symbol = s;
yylvalreturn (s->type == UNDEF ? VAR : s->type);
}
if (isalpha(c)) {
*s;
Symbol static str *sbuf = NULL;
// setup string for new input
if (sbuf == NULL) {
= str_new();
sbuf } else {
(sbuf);
str_clear}
do {
= getchar();
c } while (isalpha(c) || isdigit(c));
(c, stdin);
ungetc
const char *buf = str_view(sbuf);
if ((s = lookup(buf)) == NULL) {
= install_symbol(buf, UNDEF, 0.0);
s }
.hoc_symbol = s;
yylvalreturn (s->type == UNDEF ? VAR : s->type);
}
switch (c) {
case '<':
return lookahead('=', LTE, LT);
case '>':
return lookahead('=', GTE, GT);
case '=':
return lookahead('=', EQ, '=');
case '!':
return lookahead('=', NEQ, NOT);
case '|':
return lookahead('|', OR, '|');
case '&':
return lookahead('&', AND, '&');
default:
return c;
}
}
void yyerror(const char *s) {
("%s", s);
warningif (current_subr != NULL) {
->type = UNDEF;
current_subr= NULL;
current_subr }
}
/*
* Verify Symbol's declared arg count matches actual,
* or display a helpful error message with mismatch
*/
void check_call_args(const Symbol *s, int actual) {
int expected = s->data.func.args;
if (expected != actual) {
("Wrong number of arguments for %s: expected %d, got %d",
exec_error->name,
s,
expected);
actual}
}
#define print_error_prefix() fprintf(stderr, "%s: ", program_name)
#define print_error_suffix() fprintf(stderr, " (on line %d)\n", line_number)
void warning(const char *msg, ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end}
void exec_error(const char *msg, ...) {
va_list args;
(args, msg);
va_start
();
print_error_prefix(stderr, (msg), args);
vfprintf();
print_error_suffix
(args);
va_end
(begin, 0);
longjmp}
#include "hoc.h"
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
///----------------------------------------------------------------
/// global state
///----------------------------------------------------------------
static FILE *curr_inputfile; // the FILE* we're reading from
static char **next_filename; // next filename to read
static char **end_namep; // represents "no more files"
/**
* Change curr_inputfile to point to the next input file, if
* any. Closes any existing input file (unless that file points to
* stdin).
*
* Returns true if there is more input to read; false otherwise.
*/
static bool next_inputfile(void) {
if (curr_inputfile != NULL && curr_inputfile != stdin) {
(curr_inputfile);
fclose}
= NULL;
curr_inputfile
do {
if (next_filename == end_namep) {
return false;
}
// sync curr_inputfile with curr_namep
if (strcmp(*next_filename, "-") == 0) {
= stdin;
curr_inputfile } else {
= fopen(*next_filename, "r");
curr_inputfile if (curr_inputfile == NULL) {
("Could not open input filename: '%s'", *next_filename);
warning}
}
+= 1;
next_filename } while (curr_inputfile == NULL);
return true;
}
void init_inputfiles(char **filenames, int filecount) {
= NULL;
curr_inputfile = filenames;
next_filename = filenames + filecount;
end_namep (); // ensures curr_inputfile is either valid or NULL
next_inputfile}
bool read_double(double *dest) {
// curr_inputfile is NULL only when all input is exhausted
if (curr_inputfile == NULL) {
("Cannot read input data - No input files remaining");
warningreturn false;
}
while (true) {
switch (fscanf(curr_inputfile, "%lf", dest)) {
case EOF:
if (!next_inputfile()) {
return false;
}
break; // otherwise try again
case 0:
("Could not read double from '%s' - invalid data format",
exec_error*next_filename);
default:
return true;
}
}
}
#include "hoc.h"
#include "hoc.tab.h"
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> // NULL
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/*
* MACHINE STATE
* =============
*
*
* == PROG == == FRAMES == == STACK ==
* ┌─────────────┐◀───┐ ┌─────────────┐ ┌─────────────┐
* │ MachineCode │ pc │ Frame │ │ Datum │
* ├─────────────┤ ├─────────────┤ ├─────────────┤
* │ MachineCode │ │ Frame │ │ Datum │
* ├─────────────┤ ├─────────────┤ ├─────────────┤
* │ MachineCode │ progp │ Frame │ framep │ Datum │ stackp
* ├─────────────┤◀────┘ ├─────────────┤◀───┘ ├─────────────┤ ◀──┘
* │ ... │ │ ... │ │ ... │
* └─────────────┘ └─────────────┘ └─────────────┘
*
*
* The Program
* -----------
* We have our "program" stored as a list of MachineCode
* objects, each encoding one of:
* - an instruction
* - a literal value
* - a symbol reference
*
* We "execute" our program by setting the program counter (pc)
* pointer to the first MachineCode (which *must* be an instruction),
* and handling each instruction we find. Within an instruction, the
* program counter might be updated so that by the *end* of any
* instruction, the program counter is always pointing to another
* instruction.
*
*
* The Stack
* ---------
*
* Each instruction manipulates the stack, pushing,
* popping, or combining values from the stack. The stack elements
* are Datum objects, which act much like MachineCodes, but with two
* differences.
*
* 1. Datums cannot represent an instruction; they resolve to a value
* 2. The type of a datum can always be resolved from their context,
* and so no "type" field is necessary.
*
*
* The Frame Stack
* ---------------
*
* When a user-defined subroutine is called, a new Frame is
* placed on a separate "frame stack". The frames contain enough
* information to:
*
* - "Jump" to the location of the subroutine
* - The `def_called` field is a Symbol with the start Addr
*
* - Locate any arguments on the (data) stack
* - The argp/arg_count fields locate the arguments passed
*
* - Return to previous prog location after the subroutine finishes
* - Via the `ret_pc` field
*
*
* PROG STACK
* ┌─────────────┐ ┌─────────────┐
* ┌─▶│ MachineCode │ │ Datum │
* │ ├─────────────┤ FRAME ├─────────────┤
* │ │ MachineCode │ ┌─────────────┐ │ Datum │
* │ ├─────────────┤ │def_called: │ ├─────────────┤
* │ │ ... │ │"myfunc" │ │ Datum │
* │ ├─────────────┤ ├─────────────┤ ├─────────────┤
* │ │ MachineCode │ ┌──┤ret_pc │ ┌─▶│ Datum │
* │ ├─────────────┤ │ ├─────────────┤ │ ├─────────────┤
* │ │ MachineCode │ │ │argp ├──┘ │ Datum │
* │ ├─────────────┤ │ ├─────────────┤ ─▶├─────────────┤
* │ │ MachineCode │◀─┘ │arg_count: 2 │─ ┘ │ Datum │
* │ ├─────────────┤ └─────────────┘ ├─────────────┤
* │ │ ... │ │ . │
* │ └─────────────┘ │ . │
* │ └─────────────┘
* │ SYMBOL TABLE
* │ ┌─────────────────────────────────────────────────────┐
* │ │ ┌──────────────┐ │
* │ │ │name: "myfunc"│ │
* │ │ ├──────────────┤ │
* └──┼─│addr │ │
* │ └──────────────┘ │
* └─────────────────────────────────────────────────────┘
*/
/*
* Prog
*/
#define PROGSIZE 2000
[PROGSIZE];
MachineCode prog= prog; // No installation above this address
Addr prog_start ; // Next free location for code installation
Addr progpstatic Addr pc; // the current location in the executing program
static Addr highest_addr_written = prog; // Used in LISTing contents
/*
* Data stack
*/
#define STACKSIZE 256
static Datum stack[STACKSIZE]; // the machine stack
static Datum *stackp; // next free spot on the stack
static Datum last_popped = {.value = 0}; // The last value popped
static void stack_push(Datum d);
static Datum stack_pop(void);
/*
* Frames and Frame stack
*/
typedef struct Frame {
*subr_called; // the UFUNC/UPROC we have jumped to
Symbol ; // return location
Addr ret_pc*argp; // pointer to first argument
Datum int arg_count; // number of arguments
} Frame;
#define FRAMELIMIT 100 // Recursion depth limit
static Frame frames[FRAMELIMIT];
static const Frame *OVERFLOW_FRAME = frames + FRAMELIMIT;
static Frame *fp = frames; // Next frame to use
static bool is_returning; // true when returning from a user subroutine
///----------------------------------------------------------------
/// Initialization
///----------------------------------------------------------------
/** Initialize machine state for startup */
void machine_startup(void) {
(time(NULL)); // use current time as seed for random generator
srand();
machine_reset_program}
/** Initialize machine state for a new program */
void machine_reset_program(void) {
= false;
is_returning = stack;
stackp = prog_start;
progp }
///----------------------------------------------------------------
/// Code Installation
///----------------------------------------------------------------
(MachineCode mc) {
Addr install_codeif (progp >= prog + PROGSIZE) {
("Max program size exceeded");
exec_error}
*progp = mc;
= progp++; // return location of THIS instruction
Addr result if (result > highest_addr_written) {
= result;
highest_addr_written }
return result;
}
void reserve_subr(Symbol *subr_sym) {
->data.addr = prog_start;
subr_sym= progp;
prog_start }
///----------------------------------------------------------------
/// Execution
///----------------------------------------------------------------
void execute(Addr start_addr) {
= false;
is_returning = (start_addr == NULL) ? prog_start : start_addr;
pc while (!is_stop_inst(pc) && !is_returning) {
if (pc >= progp) {
("PC was past end-of-program");
exec_error}
if (pc->type != CT_INST) {
("Programming error, type '%c' was not INST!", pc->type);
exec_error}
(*(*pc++).inst)(); // increment PC and execute inst
}
}
/** Push new value onto stack */
static void stack_push(Datum d) {
if (stackp >= stack + STACKSIZE) {
("Stack overflow");
exec_error}
*(stackp++) = d; // copy
}
/** Pop most recent value from stack, or fail on underflow */
static Datum stack_pop(void) {
if (stackp <= stack) {
("Stack underflow");
exec_error}
= *(--stackp);
Datum d return (last_popped = d);
}
/**
* Push a new frame for a subroutine call onto the frame stack, or fail on
* overflow.
*
* The frame is initialized as a call to Symbol `s`, returning to address
* `caller_pc`. Caller must initialize all other frame data.
*/
static Frame *frame_push(Symbol *s, Addr caller_pc) {
if (fp == OVERFLOW_FRAME) {
("Call-Stack Overflow while calling '%s'", s->name);
exec_error}
->subr_called = s;
fp->ret_pc = caller_pc;
fpreturn fp++;
}
/** Pop most recent Frame from the frame stack, or fail on underflow */
static Frame *frame_pop(void) {
if (fp == frames) {
("Call-Stack Underflow while attempting to return");
exec_error}
return --fp;
}
/** Access top of frame stack, or fail if no Frame exists */
static Frame *frame_peek(void) {
if (fp == frames) {
("Call-Stack Underflow while attempting to peek at top frame");
exec_error}
return fp - 1;
}
/**
* Reset any machine state indicated by the given frame. This includes:
* - Removing any arguments from the stack for the call.
* - Resetting the program counter
* - Setting global "returning" flag
*/
static void return_from_frame(Frame *f) {
for (int i = f->arg_count; i > 0; i--) {
(); // remove arg from stack
stack_pop}
= true;
is_returning = f->ret_pc;
pc }
///----------------------------------------------------------------
/// Instructions
///----------------------------------------------------------------
#define dat(x) ((Datum){.value = x})
int inst_pushlit(void) {
(pc->type == CT_LITERAL);
assert
;
Datum d.value = pc->literal;
d(d);
stack_push++;
pcreturn 0;
}
int inst_pushref(void) {
(pc->type == CT_SYM_REF);
assert
;
Datum d.symbol = pc->symbol;
d(d);
stack_push++;
pcreturn 0;
}
int inst_pushlast(void) {
(last_popped);
stack_pushreturn 0;
}
int inst_eval(void) {
= stack_pop(); // must be symbol
Datum d switch (d.symbol->type) {
case UNDEF:
("Undefined variable: '%s'", d.symbol->name);
exec_errorbreak;
case BUILTIN:
case UFUNC:
case UPROC:
("Callable '%s' cannot be evaluated", d.symbol->name);
exec_errorbreak;
default: // VAR,CONST
(dat(d.symbol->data.val));
stack_pushbreak;
}
return 0;
}
int inst_whilecode(void) {
/*
┌─────────────────────────────┐
│ whilecode │
├─────────────────────────────┤
│ address(body) │
├─────────────────────────────┤
│ address(stmt after body) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ body │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after body │
└─────────────────────────────┘
*/
= (pc++)->addr;
Addr body_addr = (pc++)->addr;
Addr end_addr = pc;
Addr cond_addr
while (true) {
if (is_returning) {
return 0; // PC is already set to the appropriate address
}
(cond_addr);
executeif (stack_pop().value) {
(body_addr);
execute} else {
break;
}
}
// condition false
= end_addr;
pc return 0;
}
int inst_ifcode(void) {
/*
┌─────────────────────────────┐
│ ifcode │
├─────────────────────────────┤
│ address(ifbody) │
├─────────────────────────────┤
│ address(elsebody) │
├─────────────────────────────┤
│ address(stmt after if) │
├─────────────────────────────┤
│ cond │
│ .. │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ ifbody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ elsebody │
│ .... │
├─────────────────────────────┤
│ STOP │
├─────────────────────────────┤
│ stmt after if │
└─────────────────────────────┘
*/
= pc[0].addr;
Addr ifbody_addr = pc[1];
MachineCode maybe_elsebody = pc[2].addr;
Addr end_addr = pc + 3;
Addr cond_addr
(cond_addr);
execute= stack_pop(); // must be value
Datum d if (d.value) {
(ifbody_addr);
execute} else if (maybe_elsebody.type == CT_ADDR) {
(maybe_elsebody.addr);
execute}
if (!is_returning) {
= end_addr;
pc }
return 0;
}
int inst_call(void) {
= stack_pop(); // must be symbol
Datum s if (s.symbol->type != BUILTIN) {
("Cannot call non-builtin '%s'", s.symbol->name);
exec_error}
double result;
switch (s.symbol->data.func.args) {
case 0:
= s.symbol->data.func.call0();
result break;
case 1: {
= stack_pop();
Datum arg = s.symbol->data.func.call1(arg.value);
result break;
}
case 2: {
= stack_pop();
Datum arg2 = stack_pop();
Datum arg1 = s.symbol->data.func.call2(arg1.value, arg2.value);
result break;
}
}
(dat(result));
stack_pushreturn 0;
}
int inst_assignvar(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = VAR;
lhs(dat(rhs.value));
stack_pushreturn 0;
}
int inst_assignconst(void) {
= stack_pop(); // must be symbol
Datum lhs = stack_pop(); // must be value
Datum rhs if (lhs.symbol->type == CONST) {
("Cannot reassign constant '%s'", lhs.symbol->name);
exec_error} else if (!(lhs.symbol->type == VAR || lhs.symbol->type == UNDEF)) {
("Cannot assign to symbol '%s'", lhs.symbol->name);
exec_error}
.symbol->data.val = rhs.value;
lhs.symbol->type = CONST;
lhs(dat(rhs.value));
stack_pushreturn 0;
}
int inst_pop(void) {
();
stack_popreturn 0;
}
int inst_print(void) {
= stack_pop();
Datum d ("\t%.8g\n", d.value);
printf(stdout);
fflushreturn 0;
}
int inst_prexpr(void) {
= stack_pop();
Datum d ("%.8g", d.value);
printf(stdout);
fflushreturn 0;
}
int inst_prstr(void) {
*s = (pc++)->string;
str ("%s", str_view(s));
printf(stdout);
fflushreturn 0;
}
int inst_add(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value + rhs.value));
stack_pushreturn 0;
}
int inst_and(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value && rhs.value));
stack_pushreturn 0;
}
int inst_or(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value || rhs.value));
stack_pushreturn 0;
}
int inst_not(void) {
= stack_pop(); // must be value
Datum d .value = !d.value;
d(d);
stack_pushreturn 0;
}
int inst_lt(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value < rhs.value));
stack_pushreturn 0;
}
int inst_lte(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value <= rhs.value));
stack_pushreturn 0;
}
int inst_gt(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value > rhs.value));
stack_pushreturn 0;
}
int inst_gte(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value >= rhs.value));
stack_pushreturn 0;
}
int inst_eq(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value == rhs.value));
stack_pushreturn 0;
}
int inst_neq(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value != rhs.value));
stack_pushreturn 0;
}
int inst_sub(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
(dat(lhs.value - rhs.value));
stack_pushreturn 0;
}
int inst_div(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs
if (rhs.value == 0) {
("Division by zero");
exec_error} else {
(dat(lhs.value / rhs.value));
stack_push}
return 0;
}
int inst_mul(void) {
= stack_pop(); // must be value
Datum rhs = stack_pop(); // must be value
Datum lhs (dat(lhs.value * rhs.value));
stack_pushreturn 0;
}
int inst_subrexec(void) {
*sym = (pc++)->symbol;
Symbol double arg_count = (pc++)->literal;
*f = frame_push(sym, pc);
Frame ->arg_count = (int)arg_count;
f->argp = stackp - f->arg_count;
f
(sym->data.addr);
execute= false;
is_returning return 0;
}
int inst_argget(void) {
int arg_number = (int)(pc++)->literal;
*current_frame = frame_peek();
Frame if (current_frame->arg_count < arg_number) {
("Subroutine '%s' uses parameter %d, but only got %d.",
exec_error->subr_called->name,
current_frame,
arg_number->arg_count);
current_frame}
(current_frame->argp[arg_number - 1]);
stack_pushreturn 0;
}
int inst_argset(void) {
int arg_number = (int)(pc++)->literal;
*current_frame = frame_peek();
Frame if (current_frame->arg_count < arg_number) {
("Subroutine '%s' uses parameter %d, but only got %d.",
exec_error->subr_called->name,
current_frame,
arg_number->arg_count);
current_frame}
= stack_pop();
Datum d ->argp[arg_number - 1] = d;
current_frame(d);
stack_pushreturn 0;
}
int inst_uprocret(void) {
*f = frame_pop();
Frame if (f->subr_called->type != UPROC) {
("Function '%s' does not return a value", f->subr_called->name);
exec_error}
(f);
return_from_framereturn 0;
}
int inst_ufuncret(void) {
*f = frame_pop();
Frame = stack_pop();
Datum result (f);
return_from_frame(result);
stack_pushreturn 0;
}
int inst_varread(void) {
*dest = (pc++)->symbol;
Symbol bool success = read_double(&dest->data.val);
(dat(success));
stack_pushif (!success) {
->data.val = 0;
dest}
->type = VAR;
destreturn 0;
}
///----------------------------------------------------------------
/// Commands
///----------------------------------------------------------------
void machine_run_command(Symbol *command_symbol) {
->data.command_func();
command_symbol}
#define check_inst(inst_name) \
else if (c.inst == inst_##inst_name) do { \
printf(inst_template, line, cp, "INST", #inst_name); \
} \
while (false)
void command_list(void) {
// Lists contents of prog until we hit two stop instructions in a row
#define header_template "%6s | %11s | %10s | %-30s\n"
#define inst_template "%6ld | %11p | %10s | %-30s"
#define rightline_template "%6ld | %11p | %10s | %30s"
#define sym_addr_template "%6ld | %11p | %10s | %-7.7s: %11p (Line %3ld)"
#define lit_template "%6ld | %11p | %10s | %30f"
#define str_template "%6ld | %11p | %10s | %30.30s"
#define addr_template "%6ld | %11p | %10s | %11p (Line %3ld)"
(header_template, "LINE", "ADDRESS", "TYPE", "CODE");
printf("------------------------------------------------------------------"
printf"\n");
for (size_t i = 0; prog + i <= highest_addr_written; i++) {
size_t line = i + 1;
= prog[i];
MachineCode c void *cp = &prog[i];
if (prog + i == prog_start && prog != prog_start) {
("---------------------------->"
printf" END SUBR "
"<----------------------------\n");
}
switch (c.type) {
case (CT_LITERAL):
(lit_template, line, cp, "L", c.literal);
printfbreak;
case (CT_SYM_REF):
switch (c.symbol->type) {
case UFUNC:
case UPROC: {
size_t addr_line = (c.symbol->data.addr - prog) + 1;
(sym_addr_template,
printf,
line,
cp"SUBR",
.symbol->name,
c(void *)c.symbol->data.addr,
);
addr_linebreak;
}
default:
(rightline_template, line, cp, "SYM", c.symbol->name);
printf}
break;
case (CT_ADDR): {
size_t addr_line = (c.addr - prog) + 1;
(addr_template, line, cp, "ADDR", (void *)c.addr, addr_line);
printfbreak;
}
case (CT_CHRSTR): {
(str_template, line, cp, "CHAR", str_view(c.string));
printfbreak;
}
case (CT_INST):
if (is_stop_inst(&c)) {
(inst_template,
printf,
line,
cp"INST",
"------------ STOP ------------");
}
(add);
check_inst(and);
check_inst(argget);
check_inst(argset);
check_inst(assignvar);
check_inst(assignconst);
check_inst(call);
check_inst(div);
check_inst(eq);
check_inst(eval);
check_inst(gt);
check_inst(gte);
check_inst(ifcode);
check_inst(lt);
check_inst(lte);
check_inst(mul);
check_inst(neq);
check_inst(not );
check_inst(or);
check_inst(pop);
check_inst(prexpr);
check_inst(prstr);
check_inst(print);
check_inst(pushlast);
check_inst(pushlit);
check_inst(pushref);
check_inst(sub);
check_inst(subrexec);
check_inst(ufuncret);
check_inst(uprocret);
check_inst(varread);
check_inst(whilecode);
check_instelse {
(inst_template, line, cp, "INST", "UNKNOWN!?");
printf}
break;
default:
("Unknown Code Type: %c\n", c.type);
printf}
("\n");
printf}
}
#include "hoc.h"
#include <errno.h>
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h> // rand
#define HOC_RAND_MAX 1000000L
static double check_math_err(double result, const char *op);
double Abs(double x) {
return check_math_err(fabs(x), "fabs");
}
double Acos(double x) {
return check_math_err(acos(x), "acos");
}
double Atan(double x) {
return check_math_err(atan(x), "atan");
}
double Atan2(double x, double y) {
return check_math_err(atan2(x, y), "atan2");
}
double Cos(double x) {
return check_math_err(cos(x), "cos");
}
double Exp(double x) {
return check_math_err(exp(x), "exp");
}
double Integer(double x) {
return (double)(long)x;
}
double Lg(double x) {
return check_math_err(log2(x), "lg");
}
double Ln(double x) {
return check_math_err(log(x), "log");
}
double Log10(double x) {
return check_math_err(log10(x), "log10");
}
double Pow(double x, double y) {
return check_math_err(pow(x, y), "exponentiation");
}
double Random(void) {
// We have HOC_RAND_MAX buckets
const int bucket_size = RAND_MAX / HOC_RAND_MAX;
int result;
do {
= rand() / bucket_size;
result } while (result >= HOC_RAND_MAX);
return result + 1;
}
double Sin(double x) {
return check_math_err(sin(x), "sin");
}
double Sqrt(double x) {
return check_math_err(sqrt(x), "sqrt");
}
static double check_math_err(double result, const char *op) {
static const char *domain_msg = "argument outside domain";
static const char *range_msg = "result outside range";
static const char *other_msg = "floating-point exception occurred";
const char *msg = NULL;
if ((math_errhandling & MATH_ERREXCEPT) && fetestexcept(FE_ALL_EXCEPT)) {
// Special case: Inexact results are not errors.
if (fetestexcept(FE_INEXACT)) {
goto done;
}
if (fetestexcept(FE_INVALID)) {
= domain_msg;
msg } else if (fetestexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW)) {
= range_msg;
msg } else { // unknown
= other_msg;
msg }
(FE_ALL_EXCEPT);
feclearexcept} else if (errno) {
if (errno == EDOM) {
= domain_msg;
msg } else if (errno == ERANGE) {
= range_msg;
msg } else {
= other_msg;
msg }
= 0;
errno }
if (msg != NULL) {
("math error during %s: %s", op, msg);
exec_error}
:
donereturn result;
}
#include "hoc.h"
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
/*
* Invariants (Given instance `s`):
* 1. s.len <= s.capacity - 1;
* 2. sizeof(s.buf) == s.capacity
*
* These invariants exist so that anytime we want a view of the string
* for use with C functions, there is capacity within `s.buf` to
* append a NUL character.
*
* Any function that mutates the string should ensure these invariants.
*/
struct str {
char *buf;
size_t len;
size_t capacity;
};
#define INIT_CAP 20 /** Initial capacity of a string */
*str_new(void) {
str *s = malloc(sizeof(str));
str if (s == NULL) {
("Out of memory!");
exec_error}
->capacity = INIT_CAP;
s->buf = malloc(s->capacity); // Ensures invariant 1
sif (s->buf == NULL) {
(s);
free("Out of memory!");
exec_error}
->len = 0; // Ensure invariant 2
sreturn s;
}
bool str_append(str *s, char c) {
if (s->len == s->capacity - 1) {
/*
* If we'd break our invariant by adding a character,
* grow our buffer capacity and update fields to match
*/
char *tmp = realloc(s->buf, s->capacity * 2);
if (tmp == NULL) {
return false;
}
->buf = tmp;
s->capacity = s->capacity * 2;
s}
->buf[s->len++] = c;
sreturn true;
}
const char *str_view(str *s) {
->buf[s->len] = '\0';
sreturn s->buf;
}
void str_clear(str *s) {
->len = 0;
s}
void str_free(str *s) {
(s->buf);
free(s);
free}
#include "hoc.h"
#include "hoc.tab.h" // generated from yacc -d on our grammar
#include <stddef.h> // NULL
#include <stdlib.h> // malloc
#include <string.h> // strcmp
static Symbol *symbol_table = NULL;
void *emalloc(size_t nbytes);
*lookup(const char *name) {
Symbol *current = symbol_table;
Symbol while (current != NULL) {
if (strcmp(current->name, name) == 0) {
return current;
}
= current->next;
current }
return NULL;
}
*install_symbol(const char *name, short type, double value) {
Symbol size_t name_len = strlen(name);
*s = emalloc(sizeof(Symbol) // actual symbol
Symbol + (sizeof(char)) * (name_len + 1)); // room for name
->name = (char *)(s + 1);
s(s->name, name);
strcpy->name[name_len] = '\0';
s
->type = type;
s->data.val = value;
s->next = symbol_table;
s= s;
symbol_table return s;
}
void *emalloc(size_t nbytes) {
void *ptr = malloc(nbytes);
if (ptr == NULL) {
("Out of memory!");
exec_error}
return ptr;
}