Core Words Of Flight - Take 7
Charles Eric LaForest
laforest AT eecg.utoronto.ca

2008/03/26
Added RTU instruction compilation

2006/12/11
Merged VM branch with LOOK alteration into the COMSIM kergen which contains
defines to create single and dual-core kernels

2006/03/02
Altered LOOK to not pop the string on the input stack and to return the
value of NAME-END if no match so we can test against that for error checking.
This means changing some low-level words in the Flight extensions so as
to keep things working as before since most of the time the input string
doesn't need to be kept.

2006/02/25
Altered LOOK to use NAME-END as end of dictionary address, so it can be
temporarily altered for retargeted compilation.

2006/01/05
Fixed PUSH-STRING to set the tail

2005/11/01:
FOR FUTURE WORK
PUSH-STRING should create the tail, instead of SCAN-STRING.
Otherwise, every function afterwards that allocates memory using
PUSH-STRING (which is what it really does) has to create the tail,
which we already know has to contain its own address (unless it was
converted into a name entry).

2005/10/18:

Altered COMPILE-OPCODE to NEXT-SLOT at the beginning.
Altered the DEFN sub-words. Mainly added the concept of a null slot to
denote a HERE location in which nothing has been compiled yet.
This is definitely not optimal, but it fixed a subtle bug.
Other words should check for null slot condition and thus not ALIGN
needlessly, saving the occasional skipped, wasted memory word.
FOR FUTURE WORK:
Noted that the whole slot shifting mechanism is wasteful:
Instead, shift the contents of HERE, and add in the opcode in the last slot.
The slot mask then devolves into a counter, which since it is less than
32, can be a simple one-hot count, eliminating the need for an addition.
However, ALIGN will then have to check and shift the opcodes into the
zeroth sloth if required.

2005/09/13:

Factored DEFN into DEFN-AS, NEW-WORD, and DEFN, to allow
for naming a chunk of code after it has been compiled,
so one can overlay a word with another of the same name
which refers to the original word internally.

Tweaked COMPILE-OPCODE (moved a DROP to eliminate
a >R...R> pair). Substituted '|' with '+'.
(since PC@ is 0, this works, but maybe it's slower...)

2005/03/18:

NOTE:
It someone never occured to me that the sequence
SCAN 5 SCAN NUMI can't work if each SCAN'ed in string
overwrites the previous one (NUMI would go and convert itself!).
Thus SCAN has to accumulate strings and LOOK must eat them.
This does mean that a function can have multiple string
arguments now, as a generalization of the Data Stack.

NOTE: 
Important edit! Conditional jumps no longer always reload the ISR.
If the branch is not taken, the next instruction executes as usual.
The previous behaviour was intended to make all flow-control instructions
behave the same, to avoid exceptions, but this new behaviour seems better.
The code density improves, since 'JMP0 JMP' can now be consecutive in
a memory word. However, make sure this does not slow things down or
bloat the state machine. 
This shouldn't break this code, assuming the subsequent instructions within
a memory word are NOPs, but the compilation words will have to be altered to
take this into account.

NOTE:
Since SCAN, DEFN, and LOOK are not in a hierarchical relationship, passing
the string address via the stack is wrong. It complicates the code. 
It should be stored in a named variable.
This recapitulates scope blocks, interestingly.

0 Misc.

These are the lowest level input buffer manipulation routines.
The input buffer behaves like a simple stack of counted strings.
The top of stack pointer is INPUT, the bottom pointer is THERE.

0.1 STRING-TAIL

Returns the address of the tail of a string.
Takes the head address of the string.
The tail contains an address used for referencing code or terminating
string comparisons.

>A A@+ (get string length and skip it)
A> + ; (string tail address)


0.2 PUSH-STRING

Alters INPUT to allocate the space for a counted string.
Sets the tail of the string to point to its own address.
Takes the length of the string from TOS.

DUP LIT 1 + NOT (account for string count and address tail)
LIT [address of INPUT] >A A@ + DUP A!  (adjust INPUT)
>A A! (store string length count at (INPUT))
LIT [address of INPUT] >A A@
CALL STRING-TAIL
DUP >A A! ;

0.3 POP-STRING

Alters INPUT to discard the most recently SCAN'ed string.

LIT [address of INPUT] >A A@
call STRING-TAIL
LIT 1 + (point to head of next string)
LIT [address of INPUT] >A A! ;

Or for comparison:
LIT [address of INPUT] DUP >R 
>A A@ DUP (get (INPUT))  
>A A@ (get string length)
+ LIT 2 + (add length+2 to (INPUT) to point to next string)
R> >A A! ; (update INPUT)


0.4 COMPARE-STRINGS

Takes the head address of two strings.
Returns the addresses (in the same order)
of the first non-matching pair of symbols.
Note that the practice of tailling each string
with the address of the tail guarantees that the
comparison will terminate (at the tail).

>A >R
LOOP:
R@+ A@+ XOR
JMP0 LOOP
LIT -1 R> +
LIT -1 A> + ;

1 SCAN

1.1 READ1

Presumed here is that READ1 always returns a machine word.
Implementation depends on interface to outside world. 
Here I assume a memory port for illustration purposes.

LIT [address of input port] >A A@ ; (read in one machine word)


1.2 SCAN-STRING

Called to read a string into a pushed string entry.

LIT [address of INPUT] >A A@ >R R@+ (get string length and prep)
SCAN-STRING-LOOP:
call READ1 R!+ (READ1 uses A)
LIT -1 + DUP JMP0 DONE
JMP SCAN-STRING-LOOP
DONE: (storage address should be == (INPUT)+length+1 now)
DROP R> DUP >A A! ;
(store address of end of string in itself for LOOK termination)


1.4 SCAN

The location after the string is kept free to hold (THERE) as
a LOOK termination marker, or an actual code address should it become
a name entry.

Input is a counted string, where the first machine word contains the
following number of machine words used by the string, regardless of
symbol encoding.

Always reads in the string.
There is no check. The sender of the string can check if
"INPUT HERE -" is greater than the string length to see if
there is space.

The check depends on the fact that HERE points to an address
that begins at 0 and increments, while INPUT points to a location
that begins at THERE and decrements. 
THERE begins at the end of memory and decrements towards zero.
If they meet, then
there is no more free memory. Thus if the start address is lesser
than the one HERE points at, there is no room for the string
(or for anything else at all!). 

call READ1
call PUSH-STRING 
jmp SCAN-STRING


2 DEFN

2.1 FIRST-SLOT

Begin compiling at zeroth slot.

LIT [address of SLOT] >A
LIT [mask for slot 0] A! ;

2.2 LAST-SLOT

Setup at fifth slot, so the next compilation
will occur at the zeroth slot.

LIT [address of SLOT] >A
LIT [mask for slot 5] A! ;

2.3 NULL-SLOT

Null slot mask. Denotes that HERE is empty.

LIT [address of SLOT] >A
LIT [mask for null slot] A! ;

2.4 ALIGN

Makes HERE point to the next free location, so we don't point a name at
the tail of the previous procedure or clobber literals.
Make HERE-NEXT point to the following location.
Mark HERE as empty with NULL-SLOT.

LIT [address of HERE-NEXT] >A A@
DUP >R LIT 0 R!+ (zero (PC@) for compilation and increment)
R> A! (update HERE-NEXT)
LIT [address of HERE] >A A! (update HERE)
JMP NULL-SLOT

2.5 NEXT-SLOT

Point to first slot if HERE is empty (NULL-SLOT), else
point to next free slot at HERE, else ALIGN.

( N.B.: Major hardware change: slot 0 is now on LSB side (so we don't have to
worry about the MSB with '2/') )

LIT [address of SLOT] >A A@ (get slot mask)
LIT [null slot mask] OVER XOR JMP0 HEREEMPTY
LIT [fifth slot mask] OVER XOR JMP0 HEREFULL
2* 2* 2* 2* 2* (next 5-bit slot) A! ;
HEREFULL:
DROP CALL ALIGN JMP FIRST-SLOT
HEREEMPTY:
DROP JMP FIRST-SLOT

2.6 DEFN-AS

Takes the aligned address of a code entry (usually (HERE) )
and the address of a SCANed string (from INPUT)
and converts it to a name entry. Updates THERE.
String must be the only one in the input buffer stack
as it is converted in place, so INPUT must be pointing to it
and its tail must be at THERE.
Because of this, INPUT does not need to be changed.

LIT [address of THERE] >A A@ >A A! (store address in (THERE))
LIT [address of INPUT] >A A@ (get string start address)
LIT -1 + (move to first free location before it)
LIT [address of THERE] >A A! ;

2.7 NEW-WORD

Returns an aligned address to compile at.

call ALIGN
LIT [address of HERE] >A A@ (get (HERE)) ;

2.8 DEFN

Defines a name entry for the current code entry location.

CALL NEW-WORD
JMP DEFN-AS


3 LOOK

3.2 LOOK

Eats the topmost name string on the input buffer stack.
Returns address of code associated with name or new (INPUT) if no match.
No error checking other than for end of dictionary.

LIT [address of THERE] >A A@ LIT 1 + DUP (address of latest name entry string)
LOOP:
LIT [address of INPUT] >A A@ (address of topmost SCANed string)
call COMPARE-STRINGS
LIT [address of INPUT] >A A@
call STRING-TAIL
XOR JMPO MATCH
DROP call STRING-TAIL  (uses the DUPed name entry address)
LIT 1 + (point to start of next name entry)
DUP LIT [address of NAME-END] >A A@ XOR JMP0 NOMATCH (are we at end of name dict?)
DUP JMP LOOP
MATCH:
>A DROP A@ ; (return code address for name entry)
NOMATCH:
DROP LIT [address of NAME-END] >A A@ ; (return address of end of dict)


4 Compilation words

4.1 COMPILE-INIT

Sets up HERE/HERE-NEXT/SLOT for compilation

LIT [address of HERE] >A
LIT [initial compilation address] A!+ A>
LIT [address of HERE-NEXT] >A A!
LIT [address of SLOT] >A
LIT [mask for slot 5] A!
LIT [address of HERE] >A A@ >A
LIT [PC@|PC@|PC@|PC@|PC@|PC@] A! ;

4.2 COMPILE-OPCODE

Takes an opcode (in slot 0) and compiles it 
at the next empty HERE/SLOT location.
Assumes the current slot is full.
Leaves SLOT pointing to next full slot.
This fails for PC@, since its opcode is all zeroes.
(But then, it shouldn't be necessary to compile it explicitely,
that's what ALIGN is for)
Note the "+" instead of "|" at the end.
We can do this since empty slots are zeroed (since PC@ is 0).

call NEXT-SLOT
>R LIT [address of SLOT] >A A@ ~ (get slot mask and invert)
R> (place opcode on top)
OVER OVER & JMP0 COMPILE (slot 0)
2* 2* 2* 2* 2* (shift opcode by one slot) 
OVER OVER & JMP0 COMPILE (slot 1)
2* 2* 2* 2* 2* 
OVER OVER & JMP0 COMPILE (slot 2)
2* 2* 2* 2* 2*
OVER OVER & JMP0 COMPILE (slot 3)
2* 2* 2* 2* 2*
OVER OVER & JMP0 COMPILE (slot 4)
2* 2* 2* 2* 2*
COMPILE: (then we *have* to be in slot 5)
LIT [address of HERE] >A A@
>A A@ + A! DROP ; (compile opcode, drop mask, return)

4.3 COMPILE-LITERAL

Take a procedure address/literal (for CALL/JMP/JMP0/LIT). 
Compile it at HERE-NEXT. Increments HERE-NEXT.

LIT [address of HERE-NEXT] >A A@
>A A!+ (store address/literal and increment)
A> LIT [adress of HERE-NEXT] >A A! ; (update HERE-NEXT)

4.4 CMPC, (CALL)
 
Takes the address of a procedure an compiles a call to it.
Then aligns to the next free word since later slots in current
word will never execute.

LIT [opcode for CALL]
CALL COMPILE-OPCODE
CALL COMPILE-LITERAL
JMP ALIGN

4.5 CRTU, (RTU)
 
Compiles a Return To User instruction
then aligns to the next free word since later slots in current
word will never execute since ISR is not saved.
Upon a trap, exection will resume in the following memory word.

LIT [opcode for RTU]
CALL COMPILE-OPCODE
JMP ALIGN

4.6 CMPJ, (JMP)

LIT [opcode for JMP]
CALL COMPILE-OPCODE
CALL COMPILE-LITERAL
JMP ALIGN

4.7 CMP0, (JMP0)

If JMP0 not taken, the next opcodes run instead, 
unlike calls and jumps

LIT [opcode for JMP0]
CALL COMPILE-OPCODE
JMP COMPILE-LITERAL

4.8 CMP+, (JMP+)

LIT [opcode for JMP+]
CALL COMPILE-OPCODE
JMP COMPILE-LITERAL

4.9 NUMC, (LIT)

LIT [opcode for LIT]
CALL COMPILE-OPCODE
JMP COMPILE-LITERAL


5. EXEC/NXEC

5.1 EXECUTE

If called: Take a procedure address and call to it.
If inlined/jumped to: Take a procedure address and jump to it.
(take note of tail-recursion elimination)

>R ;

5.2 EXEC

Read in a procedure address and call to it.

call READ1 
call EXECUTE 
jmp EXEC

5.3 NXEC

Read in a string, look up the function address, call to it.

call SCAN 
call LOOK
call POP_STRING
call EXECUTE 
jmp NXEC


6. Inline compilation

Copying a word slot-by-slot requires decompiling it
instruction by instruction and recompiling it at the new
location. This requires reimplementing the compilation
words 'in reverse'. Complicated and ugly.
Simply ALIGNing and copying existing code is no good since
the words that will be worthwhile to inline are the shortest,
and so many slots will get proportionally wasted.
The extreme case is when inlining machine instructions:
you would end up with one opcode per word!
So that's the key: we know in advance which words will
end up inlined, and so we write a word which when run
compiles the code in situ. It's a macro!

6.1 (DUP), (DROP), (+), (A!+), etc...

LIT [opcode in slot 0]
JMP COMPILE-OPCODE 

7 NUMI

This is where we can't avoid a predefined character encoding,
even though it's disjoint from higher symbolic representation.
Ideally, this would be some UNICODE code, but I'll settle for
decimal ASCII numerals for now (0 -> 48,...9 -> 57), one per word.
Wasteful, but simple.

7.1 10*

Multiply an integer by 10.

DUP 2* 2* 2* OVER + + ;

7.2 NUMI

Takes a string address of an unsigned decimal number 
and returns the corresponding integer on the stack.
No error or overflow checking!

LIT [address of INPUT] >A A@
>R R@+ (read in length)
LIT 0 >A
DUP JMP0 DONE (if N == 0)
LOOP:
A> call 10* R@+ LIT -48 + + >A (shift total by radix, get number, convert to int, add)
LIT -1 + DUP JMP0 DONE
JMP LOOP
DONE:
DROP R> DROP A>
JMP POP-STRING


8 Interesting finds

8.1 10^N

Take a number, returns its power of 10. No range checking.
(This formed the basis for NUMI, it's also a generic framework
for run-time iteration. The substitution of SWAPs with 'A'
instructions also seems significant.)

LIT 1 >A 
DUP JMP0 DONE (if N==0)
LOOP:
A> call 10* >A
LIT -1 + DUP JMP0 DONE
JMP LOOP
DONE:
DROP A> ;