Stack Computer Architecture

By "stack architecture", I refer to second-generation designs which use hardware stacks, separate from main memory, to perform expression evaluation, parameter passing, and subroutine linkage. These stacks typically make only their top-most or top two elements accessible, granting faster access than register files. Typically, such second-generation stack computers have two stacks: a Data Stack attached to the ALU, and a Return Stack which holds the return addresses of subroutine calls. Philip Koopman wrote the main text [1] on stack machines, though it strongly favours horizontally microcoded designs (which has its own pros and cons) over hardcoded logic.

Stack computers excel at control-heavy tasks, with fast lightweight subroutine calls and minimal interrupt latencies. They also require fewer transistors for a basic useful implementation than plain MIPS-like computers. However, stack computers perform relatively poorly at numerical processing tasks since we have to constantly copy and juggle items on the stack. Future designs should consider instead adding stacks to conventional register files, granting the best of both worlds. Richard L. Sites outlined this idea in a 1978 one-page paper [2].

Nonetheless, for conventional two-stack architectures, my research [3] suggests that with additional improvements (instruction folding, pipelining, an Instruction Stack) they should reach roughly the same perfomance as a plain DLX-like or MIPS-like scalar processor.

Philip Koopman, "Stack Computers: the new wave", Ellis Horwood, 1989. (see also his Stack Computers & Forth page)
Richard L. Sites, "A combined register-stack architecture", ACM SIGARCH Comput. Archit. News, vol. 6, no. 8, pp. 19-19, 1978.
Charles Eric LaForest, "Second-Generation Stack Computer Architecture", BIS Thesis, University of Waterloo, 2007 (Waterloo copy)

Forth, Stack Machines, and Virtualization

A comp.lang.forth thread with much info (and good discussion) about the last bits of stack-machine work I've done. (so far...)
Fletcher: Final Project Report
(Poster, one section per page), (Poster, all one page)
This was a continuation of my undergraduate work in stack machines: how to extend fast subroutine calls into fast context switching. The end result was a basic kernel and virtual memory model which granted clean and fast machine virtualization. (Don't get too excited. It assumes a very different software universe, where code is routinely dynamically generated.) The Project Design Document and my BIS thesis (see Whence) provide the background details, but are not necessary prior reading.
After half a day of writing straight machine code for Gullwing and realizing why people don't do that, I spent a couple days writing a simple symbolic assembler which abuses the C macro system and uses the C compiler's symbol table to do naming and name resolution. You write macros which compile to code which when run generates the actual binary memory image of your assembly code.
Here's the assembler with the source for the Flight kernel: kergen.c
And its target hardware parameters: params.h.
A more detailed description of the Flight kernel source code might help: core_words_v7.txt
(but it's pretty hard to read by itself. Check my undergrad thesis for useful background.)
Extensions To The Flight Programming Language
(Provides more explanation than in my BIS thesis.) The Flight programming language is based on a small kernel of primitives sufficient to linearly allocate memory, compile stack-based machine code, associate a name to a memory address, look up a memory address from its name, and execute the code there. Although self-contained, this kernel lacks useful features such as the freeing of allocated memory, interactive use of the underlying machine, inline compilation of code and strings, string and number output, flow control constructs, elementary multiplication and division, function composition, lexical closures, and higher-order functions. These features are added to Flight without additions the kernel and without the use of software tools other than Flight itself. The code for these features adds up to about 300 lines of Flight source and compiles into about 2000 words of 32-bit memory.
Flight Virtual Machines And Metacompilation
(This eventually led to the Fletcher project (see above), where I actually implemented the hardware virtualization I speculate about here.) A number of Gullwing virtual machines are written in Flight and their relative overheads are compared. The sources of the overhead are explained in terms of the Popek and Goldberg requirements for machine virtualization, which are then used to derive the necessary changes to the Gullwing architecture to optimally support virtual machines. Some minor alterations are performed on the Flight language kernel to allow for retargeting its compilation. From this is built a framework for expressing the Flight kernel in Flight instead of assembly. When combined with the virtual machines, this allows Flight to create a second, nested instance of itself which cannot affect the first.
Communicating Instances Of The Flight Programming Language
(This is all crude, but might be interesting since FPGAs have the same dual-ported memories I use in this report.) I've altered the Gullwing virtual machine and the Flight language kernel allow for multiple processors to share a common memory. These changes make it possible for an instance of Flight to compile code in memory and have it executed by another processor under software control. This capability is then used to connect via serial channels multiple instances of Flight running on such multiprocessors and use the self-extensible nature of Flight to remotely compile code and make the system appear as a single instance of Flight with parallelism.

fpgacpu.ca