Stack Computer Architecture
By "stack architecture", I refer to second-generation designs which use
hardware stacks, separate from main memory, to perform expression evaluation,
parameter passing, and subroutine linkage. These stacks typically make only
their top-most or top two elements accessible, granting faster access than
register files. Typically, such second-generation stack computers have two
stacks: a Data Stack attached to the ALU, and a Return Stack which holds the
return addresses of subroutine calls. Philip Koopman wrote the main text
on stack machines, though it strongly favours
horizontally microcoded designs (which have their own pros and cons) over
Stack computers excel at control-heavy tasks, with fast lightweight subroutine
calls and minimal interrupt latencies. They also require fewer transistors
for a basic useful implementation than plain MIPS-like computers. However,
stack computers perform relatively poorly at numerical processing tasks since
we have to constantly copy and juggle items on the stack. Future designs
should consider instead adding stacks to conventional register files,
granting the best of both worlds. Richard L. Sites outlined this idea in a
1978 one-page paper.
Nonetheless, for conventional two-stack architectures, my research
suggests that with additional improvements
(instruction folding, pipelining, an Instruction Stack) they should reach
roughly the same performance as a plain DLX-like or MIPS-like scalar processor.
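To make the two-stack model concrete, here is a minimal sketch in C of a toy machine with a Data Stack and a separate Return Stack. The opcode names, the stack depth, and the `run`/`demo` helpers are all assumptions made up for illustration; this is not the Gullwing instruction set or any real design. It computes x*x + y*y in a subroutine and counts how many executed instructions only rearrange the stack.

```c
#include <assert.h>

#define DEPTH 16

/* Hypothetical opcodes, chosen for illustration only. */
enum { LIT, DUP, SWAP, ADD, MUL, CALL, RET, HALT };

struct machine {
    int data[DEPTH]; int dsp;   /* Data Stack: operands for the ALU        */
    int ret[DEPTH];  int rsp;   /* Return Stack: subroutine return addresses */
    int steps;                  /* instructions executed                   */
    int shuffles;               /* of which pure stack juggling            */
};

static void push(struct machine *m, int v)
{
    assert(m->dsp < DEPTH);
    m->data[m->dsp++] = v;
}

static int pop(struct machine *m)
{
    assert(m->dsp > 0);
    return m->data[--m->dsp];
}

/* Run code from address pc until HALT; return the top of the Data Stack. */
int run(struct machine *m, const int *code, int pc)
{
    for (;;) {
        int a, b, op = code[pc++];
        m->steps++;
        switch (op) {
        case LIT:  push(m, code[pc++]); break;
        case DUP:  a = pop(m); push(m, a); push(m, a); m->shuffles++; break;
        case SWAP: b = pop(m); a = pop(m); push(m, b); push(m, a);
                   m->shuffles++; break;
        case ADD:  b = pop(m); a = pop(m); push(m, a + b); break;
        case MUL:  b = pop(m); a = pop(m); push(m, a * b); break;
        case CALL: /* a call only pushes one return address */
                   assert(m->rsp < DEPTH);
                   m->ret[m->rsp++] = pc + 1; pc = code[pc]; break;
        case RET:  assert(m->rsp > 0); pc = m->ret[--m->rsp]; break;
        case HALT: return pop(m);
        default:   assert(0 && "unknown opcode"); return -1;
        }
    }
}

/* x*x + y*y as a subroutine: arguments and result stay on the Data Stack. */
static const int program[] = {
    LIT, 3, LIT, 4, CALL, 7, HALT,        /* main:  addresses 0..6  */
    DUP, MUL, SWAP, DUP, MUL, ADD, RET    /* sumsq: addresses 7..13 */
};

/* Reset the machine and run the example program from address 0. */
int demo(struct machine *m)
{
    struct machine zero = {0};
    *m = zero;
    return run(m, program, 0);
}
```

In this toy, the call passes its parameters simply by leaving them on the Data Stack, and the only linkage cost is one Return Stack push and pop. On the other hand, three of the eleven executed instructions (two DUPs and a SWAP) do no arithmetic at all, which is the copying and juggling overhead mentioned above.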
- Philip Koopman, "Stack Computers: the new wave", Ellis Horwood, 1989. (see also his Stack Computers & Forth page)
- Richard L. Sites, "A combined register-stack architecture", ACM SIGARCH Comput. Archit. News, vol. 6, no. 8, pp. 19-19, 1978.
- Charles Eric LaForest, "Second-Generation Stack Computer Architecture", BIS Thesis, University of Waterloo, 2007 (Waterloo copy)
Forth, Stack Machines, and Virtualization
- A comp.lang.forth thread
with much info (and good discussion) about the last bits of stack-machine work I've done. (so far...)
- Fletcher: Final Project Report
(Poster, one section per page), (Poster, all one page)
This was a continuation of my undergraduate work in stack machines: how to extend fast subroutine calls into fast context switching.
The end result was a basic kernel and virtual memory model which granted clean and fast machine virtualization.
(Don't get too excited. It assumes a very different software universe, where code is routinely dynamically generated.)
The Project Design Document and my BIS thesis (see Whence) provide the background details,
but are not necessary prior reading.
- After half a day of writing straight machine code for Gullwing and realizing why people don't do that, I spent a couple days writing
a simple symbolic assembler which abuses the C macro system and uses the C compiler's symbol table to do naming and name resolution.
You write macros which compile to code which, when run, generates the actual binary memory image of your assembly code.
Here's the assembler with the source for the Flight kernel: kergen.c
And its target hardware parameters: params.h.
A more detailed description of the Flight kernel source code might help: core_words_v7.txt
(but it's pretty hard to read by itself. Check my undergrad thesis for useful background.)
- Extensions To The Flight Programming Language
(Provides more explanation than in my BIS thesis.)
The Flight programming language is based on a small kernel of primitives sufficient to linearly allocate memory,
compile stack-based machine code, associate a name to a memory address, look up a memory address from its
name, and execute the code there. Although self-contained, this kernel lacks useful features such as the freeing
of allocated memory, interactive use of the underlying machine, inline compilation of code and strings, string
and number output, flow control constructs, elementary multiplication and division, function composition, lexical
closures, and higher-order functions. These features are added to Flight without additions to the kernel and without
the use of software tools other than Flight itself. The code for these features adds up to about 300 lines of Flight source
and compiles into about 2000 words of 32-bit memory.
- Flight Virtual Machines And Metacompilation
(This eventually led to the Fletcher project (see above), where I actually implemented the hardware
virtualization I speculate about here.)
A number of Gullwing virtual machines are written in Flight and their relative overheads are compared. The sources of the
overhead are explained in terms of the Popek and Goldberg requirements for machine virtualization, which are then used to
derive the necessary changes to the Gullwing architecture to optimally support virtual machines. Some minor alterations are
performed on the Flight language kernel to allow for retargeting its compilation. From this is built a framework for expressing
the Flight kernel in Flight instead of assembly. When combined with the virtual machines, this allows Flight to create a second,
nested instance of itself which cannot affect the first.
- Communicating Instances Of The Flight Programming Language
(This is all crude, but might be interesting since FPGAs have
the same dual-ported memories I use in this report.)
I've altered the Gullwing virtual machine and the Flight language
kernel to allow for multiple processors to share a common memory. These changes
make it possible for an instance of Flight to compile code in memory and have
it executed by another processor under software control. This capability is
then used to connect via serial channels multiple instances of Flight running
on such multiprocessors and use the self-extensible nature of Flight to
remotely compile code and make the system appear as a single instance of Flight.
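The symbolic assembler described earlier (C macros which, when run, generate the binary memory image, with the C compiler's symbol table doing the name resolution) can be sketched roughly as follows. This is not the code from kergen.c: the opcode numbers, macro names, image size, and the `assemble` function are assumptions invented for illustration, and the real Gullwing encoding is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

static uint32_t image[MEM_WORDS];   /* the binary memory image being built */
static uint32_t here = 0;           /* next free word in the image         */

/* Hypothetical opcode numbers, not the real Gullwing encoding. */
enum { OP_LIT = 1, OP_ADD = 2, OP_CALL = 3, OP_RET = 4 };

/* Append one word to the image. */
static void emit(uint32_t w)
{
    assert(here < MEM_WORDS);
    image[here++] = w;
}

/* Each "instruction" is a macro expanding to code that emits its encoding. */
#define LIT(n)      do { emit(OP_LIT); emit((uint32_t)(n)); } while (0)
#define ADD         emit(OP_ADD)
#define RET         emit(OP_RET)
#define CALL(name)  do { emit(OP_CALL); emit(name); } while (0)

/* A label is just a C variable holding the current address, so the C
   compiler's own symbol table does the naming and name resolution. */
#define LABEL(name) uint32_t name = here

/* Running this function generates the image; it returns the entry point. */
uint32_t assemble(void)
{
    here = 0;

    LABEL(add_two);             /* add_two: ( n -- n+2 ) */
        LIT(2); ADD; RET;

    LABEL(start);               /* start: calls add_two with 40 */
        LIT(40); CALL(add_two); RET;

    return start;
}
```

Because labels are ordinary C variables, a misspelled name becomes a compile error for free. The sketch only handles backward references (a label must be defined before it is used); forward references would need some extra patching step, which is left out here.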