OCTAVO

The Octavo soft-processor is a research CPU aimed at building FPGA overlay architectures. Instead of implementing your whole design in hardware, and waiting hours for it to place-and-route after each change, you implement just the compute-heavy parts in hardware alongside an Octavo instance, and leave the rest to software. Most design cycles now reduce to quick compiles, and both the hardware and software design jobs are simplified. This isn't a new idea, but Octavo has higher software performance and couples more directly to external hardware than previous soft-processors.

Architecture

Octavo's high performance comes from an architecture adapted to the underlying FPGA, so its trade-offs differ from those of an ASIC design. The architecture works best for parallel code, but its more efficient branching and addressing, and its more powerful ALU operations, also help sequential code. Some architectural features include:

Design Evolution

[Figure: Octavo CPU Block Diagram (v1)]

The first version of Octavo, published in 2012, was a proof of concept: can we maximize operating frequency and still issue one instruction per cycle? Octavo v1 had Instruction (I) and Data (A and B) memories, a simple controller (CTL) that could execute a branch based on the contents of a memory location in A, and an ALU that could do addition, subtraction, multiplication, and basic Boolean operations. It reached the 550 MHz limit of the Stratix IV FPGA, but had two limitations: indirect memory accesses required self-modifying code, and the ALU sat idle during a branch.
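To see why the lack of indirect addressing costs extra instructions, the toy Python model below performs an indirect read, dmem[2] <- dmem[dmem[1]], the way v1 code would have to: one instruction first writes the pointer into the source-operand field of a later instruction, which then executes normally. The `Instr` format, the `patch` and `add` opcodes, and the single flat data memory standing in for A and B are all hypothetical simplifications, not Octavo's real encoding or instruction set.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: opcode plus destination and two source addresses.

    This is a toy format for illustration, not Octavo's real encoding.
    """
    op: str
    dst: int
    src_a: int
    src_b: int


def run(imem, dmem):
    """Execute a straight-line program held in imem, operating on dmem."""
    for pc in range(len(imem)):
        ins = imem[pc]  # fetched after any earlier patch has been applied
        if ins.op == "add":
            dmem[ins.dst] = dmem[ins.src_a] + dmem[ins.src_b]
        elif ins.op == "patch":
            # Self-modifying step: take the pointer stored at dmem[src_a] and
            # write it into the src_a field of the instruction at imem[dst].
            imem[ins.dst] = replace(imem[ins.dst], src_a=dmem[ins.src_a])
        else:
            raise ValueError(f"unknown opcode {ins.op!r}")
    return dmem


# Indirect read dmem[2] <- dmem[dmem[1]] using only direct addressing.
imem = [
    Instr("patch", dst=1, src_a=1, src_b=0),  # imem[1].src_a <- dmem[1]
    Instr("add", dst=2, src_a=0, src_b=0),    # after patching: dmem[2] <- dmem[10] + dmem[0]
]
dmem = [0] * 16  # dmem[0] stays 0 and acts as a zero constant
dmem[1] = 10     # the pointer
dmem[10] = 42    # the value it points to
run(imem, dmem)
print(dmem[2])   # 42
```

Every indirect access pays for the extra patch instruction, which is the overhead the Address Offset Module in v2 was introduced to remove.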

[Figure: Octavo CPU Block Diagram (v2)]

The second version of Octavo, published in 2014, addressed the inefficiencies of the first. Octavo v2 keeps the same ALU and the same Instruction (I) and Data (A and B) memories, but adds a Branch Trigger Module (BTM), which calculates one or more branches in parallel with the current instruction, based on the result of the previous instruction. In the common case, branches take zero cycles. The Address Offset Module (AOM) can alter instruction operands before execution, implementing indirect memory access with post-incrementing. Finally, the I/O Predication Module (PRD) manages the I/O ports: if an instruction operand refers to a port that is not ready, the instruction is forced to a no-op and the Controller (CTL) re-fetches the same instruction to retry it. Octavo v2 no longer reached the maximum possible operating frequency, but its improved efficiency more than made up for the loss. Octavo v2 could also be operated in a SIMD configuration with up to 32 datapaths.
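The toy Python model below sketches two of these v2 mechanisms working together: a data pointer that is post-incremented after each access (in the spirit of the AOM) and an input port that, when empty, forces the instruction to a no-op so it is retried on the next cycle (in the spirit of the PRD). The `InPort` class, its arrival schedule, and the one-instruction-per-cycle loop are hypothetical simplifications. The loop-control branch is not modeled, which loosely reflects the zero-cycle branches v2 achieves in the common case.

```python
from collections import deque


class InPort:
    """Hypothetical input port: it is ready only while it holds data."""

    def __init__(self, arrivals):
        # arrivals[c] is the value arriving at cycle c, or None if nothing arrives.
        self.arrivals = arrivals
        self.fifo = deque()

    def tick(self, cycle):
        if cycle < len(self.arrivals) and self.arrivals[cycle] is not None:
            self.fifo.append(self.arrivals[cycle])

    def ready(self):
        return bool(self.fifo)

    def read(self):
        return self.fifo.popleft()


def copy_from_port(port, dmem, base, count, max_cycles=100):
    """Copy `count` port values into dmem[base], dmem[base + 1], ...

    One instruction issues per cycle. If the port is not ready, the
    instruction is annulled (a no-op) and the same instruction is retried
    next cycle. The pointer is post-incremented only when the access executes.
    Returns (cycles used, number of annulled issues).
    """
    ptr, done, annulled, cycle = base, 0, 0, 0
    while done < count and cycle < max_cycles:
        port.tick(cycle)
        if port.ready():
            dmem[ptr] = port.read()
            ptr += 1       # post-increment after a successful access
            done += 1
        else:
            annulled += 1  # forced no-op; retry the same instruction
        cycle += 1
    return cycle, annulled


port = InPort([5, None, None, 7, 9])
dmem = [0] * 8
print(copy_from_port(port, dmem, base=2, count=3))  # (5, 2)
print(dmem[2:5])                                    # [5, 7, 9]
```

The two annulled cycles are the ones where no data had arrived; no software polling loop is needed, because retrying is left to the hardware.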

[Figure: Octavo CPU Block Diagram (v3)]

The third version of Octavo, currently under development, fixes some limitations of Octavo v2, which was written in a hurry. The codebase has been cleaned up and computational overhead further reduced through multi-way branching with priority arbitration over 200+ branch conditions (FC), more flexible indirect addressing (AD), a Literal Pool to reduce duplication in the Data memories (DM), a programmable Opcode Decoder (OD), a new three-operand ALU that supports bitwise parallelism and instruction chaining, and a new addressing mode that moves twice as much data per instruction when data movement dominates computation (AS).
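Priority arbitration over many branch conditions can be pictured as evaluating every condition at once and then letting a fixed-priority encoder choose the winner. The Python sketch below assumes a hypothetical branch table whose list order encodes priority; the three conditions and target addresses are illustrative only. The text above does not describe how v3 actually encodes its 200+ conditions.

```python
from typing import Callable, List, Tuple

# A branch entry: a condition on the previous instruction's result, plus a target PC.
Branch = Tuple[Callable[[int], bool], int]


def select_branch(branches: List[Branch], prev_result: int, fallthrough: int) -> int:
    """Pick the next PC using fixed-priority arbitration.

    All conditions are evaluated (hardware would do this in parallel); the
    lowest-index condition that holds wins. If none holds, fall through.
    """
    hits = [cond(prev_result) for cond, _target in branches]
    for (_cond, target), hit in zip(branches, hits):
        if hit:
            return target
    return fallthrough


# Hypothetical branch table, in priority order.
branches: List[Branch] = [
    (lambda r: r == 0, 100),      # result is zero
    (lambda r: r < 0, 200),       # result is negative
    (lambda r: r % 2 == 1, 300),  # result is odd
]

for r in (0, -3, 5, 4):
    print(r, select_branch(branches, r, fallthrough=11))
```

For a result of -3, both "negative" and "odd" hold (Python's `-3 % 2` is 1), and arbitration resolves the conflict in favour of the higher-priority entry, target 200.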

Clock Frequency

Although Octavo (v1 and v2) was originally aimed at Altera's Stratix IV FPGA, it also performs well on other Altera devices. It generally runs twice as fast as a NiosII/f and gets fairly close to the absolute upper clock frequency limit allowed by the FPGA hardware. The Fmax of Octavo v3 appears to be about 5% lower than that of Octavo v2, but v3 operates with less overhead and a more powerful ISA.

We could port Octavo to Xilinx devices, but the ALU would have to be implemented differently, although the architecture would stay the same. For high-performance solutions on Xilinx FPGAs, see the work of Cheah, Fahmy, and Kapre on the iDEA soft-processor, including its pipelining and forwarding.

Octavo v2 Fmax on Various Altera Devices (tuned to Stratix IV)
Family       Device            Average  Maximum  Avg/Max  Limit  Max/Lim
                               (MHz)    (MHz)    (ratio)  (MHz)  (ratio)
Stratix V    5SGXEA7N2F45C1    508      588      0.864    675    0.871
Stratix IV   EP4S100G5H40I1    470      493      0.953    550    0.896
Arria V      5AGXFB5K4F40I3    272      300      0.907    400    0.750
Cyclone V    5CGXFC7D6F31C6    239      267      0.895    315    0.848
Cyclone IV   EP4CGX30CF19C6    187      197      0.949    315    0.625
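The two ratio columns in the Fmax table are plain quotients of the MHz columns: Avg/Max = Average / Maximum, and Max/Lim = Maximum / Limit, where Limit is the device's upper clock frequency limit. The short Python check below recomputes both ratios from the table's own numbers, rounding to three decimals as the table does. The `ROWS` list and the `ratios` helper are illustrative names, not part of the Octavo sources.

```python
# (family, device, average MHz, maximum MHz, limit MHz), copied from the table.
ROWS = [
    ("Stratix V", "5SGXEA7N2F45C1", 508, 588, 675),
    ("Stratix IV", "EP4S100G5H40I1", 470, 493, 550),
    ("Arria V", "5AGXFB5K4F40I3", 272, 300, 400),
    ("Cyclone V", "5CGXFC7D6F31C6", 239, 267, 315),
    ("Cyclone IV", "EP4CGX30CF19C6", 187, 197, 315),
]


def ratios(avg_mhz, max_mhz, limit_mhz):
    """Return (Avg/Max, Max/Lim), rounded to three decimals as in the table."""
    return round(avg_mhz / max_mhz, 3), round(max_mhz / limit_mhz, 3)


for family, device, avg_mhz, max_mhz, limit_mhz in ROWS:
    avg_over_max, max_over_lim = ratios(avg_mhz, max_mhz, limit_mhz)
    print(f"{family:<11} {device:<15} {avg_over_max:.3f} {max_over_lim:.3f}")
```

Read this way, the Stratix IV row, the device Octavo v2 was tuned for, has the highest Avg/Max ratio (0.953) and the highest Max/Lim ratio (0.896) in the table.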

Publications

Download

You can get the complete source from the Octavo GitHub Repository, updated as work progresses.


fpgacpu.ca