- Code Structure and Optimization
- Some Thoughts On Machine Code Organization
Just a digression on the relationship between iterative, recursive, and threaded machine code, and how the choice of
fundamental parameter-passing structure leads to different programming environments.
- Survey of Loop Transformation Techniques
(Slides: PDF PPT)
A report on all the loop transformation techniques I had to learn in Parallelizing Compilers.
If your algorithm is memory-bound, the speedup can be remarkable: "How We Made Our Face Recognizer 25x Faster"
In C, you'll likely have to use the obscure 'restrict' keyword in some cases to give the compiler a chance;
otherwise, the potential for pointer aliasing makes dependency analysis very hard. The fact that array accesses
are not pointer-based is Fortran's main strength for this kind of work.
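To make this concrete, here is a hedged sketch (not taken from the report; the function name and loop structure are my own illustration) of a row-major matrix-vector product where 'restrict' and a cache-friendly loop order give the compiler what it needs:

```c
#include <stddef.h>

/* y += A * x, with A stored row-major as a flat array.
   Keeping i outer and j inner gives unit-stride access to 'a',
   and 'restrict' promises the compiler that y, a, and x never
   alias, so it can keep the accumulator in a register and
   vectorize the inner loop. */
void matvec(size_t n, float *restrict y,
            const float *restrict a, const float *restrict x)
{
    for (size_t i = 0; i < n; i++) {
        float acc = y[i];
        for (size_t j = 0; j < n; j++)
            acc += a[i * n + j] * x[j];
        y[i] = acc;
    }
}
```

Without 'restrict', the compiler must assume a store through y could change a or x, forcing reloads after every iteration and usually defeating vectorization.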
- Parallelism and Message-Passing
- HashLife on GPU
A mostly-failed attempt at implementing Gosper's HashLife algorithm on a GPU using CUDA.
Hopefully this will help someone else get it right.
- Message-Passing Concurrency on Shared-Memory Multiprocessors
Inspired by Erlang, I tested various locking schemes for a message mailbox.
Even with thousands of threads contending for a single lock, it turns out that a single, simple Pthreads mutex works
best, because it interacts very well with the Linux process scheduler.
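For illustration, here is a minimal sketch of the winning scheme: one plain Pthreads mutex (plus condition variables) guarding a bounded mailbox. The structure, names, and capacity are my own illustration, not the benchmark code from the report.

```c
#include <pthread.h>

#define MAILBOX_CAP 64  /* illustrative capacity */

typedef struct {
    void *slots[MAILBOX_CAP];
    int head, tail, count;
    pthread_mutex_t lock;           /* the single, simple mutex */
    pthread_cond_t not_empty, not_full;
} mailbox_t;

void mailbox_init(mailbox_t *mb)
{
    mb->head = mb->tail = mb->count = 0;
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    pthread_cond_init(&mb->not_full, NULL);
}

/* Block until there is room, then enqueue one message. */
void mailbox_send(mailbox_t *mb, void *msg)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MAILBOX_CAP)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->slots[mb->tail] = msg;
    mb->tail = (mb->tail + 1) % MAILBOX_CAP;
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

/* Block until a message arrives, then dequeue it. */
void *mailbox_recv(mailbox_t *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    void *msg = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_CAP;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}
```

The appeal of this design is exactly its simplicity: a contended pthread_mutex_lock() parks the thread in the kernel, so the scheduler (rather than a spinning user-space loop) arbitrates access.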
- Chaotic Systems and Neural Engineering
- Finding the Largest Lyapunov Exponent
In short: use Rosenstein's algorithm over Wolf's. It's much faster and more accurate. (MATLAB code)
- Second-Order Wiener-Bose Model of a FitzHugh-Nagumo Giant Squid Axon Model
Finding the number of Laguerre functions is easy enough, but scaling them to the available number of data points is tricky.
I initially tried to do it all in MATLAB, but broke down and used LYSIS instead for the modelling.
The MATLAB code contains a fast memoized recursive Nth-order Wiener-Bose filter implementation,
if you need such a thing.
- Asynchronous Circuits
- Burst-Mode Locally-Clocked Asynchronous Circuits
(My initial foray into asynchronous circuits.) I chose Nowick's Burst-Mode
(hazard-free two-level logic) circuits to implement an active and a passive
version of an asynchronous four-phase handshake circuit, then used them to
build self-timed modules which can generate local clocks with a variable
period, exchange data with other asynchronous systems (with early handshake
termination if desired), or act as input/output ports with provisions against
metastability. I also outline the use of these modules to create an
asynchronous implementation of the Gullwing computer architecture.
- Optimization of Burst-Mode Circuits Using Logical Effort Theory
(Contains nice nutshell explanations, calculations, and experiments using Logical Effort theory on control circuits.)
The logical effort and parasitic delay of static CMOS logic gates are determined through simple simulation experiments
based on Logical Effort theory. This calibrated model is then used to calculate near-optimum gate sizes and path delays for an
active/passive pair of Burst-Mode circuits under load. Finally, an extension of Logical Effort theory is applied to the circuits
in order to determine their critical delay under an equal gate delay model and compare them to other asynchronous control