from FPGA Resources by GateForge Consulting Ltd.
This is a continually evolving document. Please send any feedback to email@example.com or @elaforest.
Over time, I've found that most of the difficulty in FPGA system design comes from a lack of modularity, not having a library of building blocks, and from not using the logic optimization work done by the CAD tool to simplify design.
A system design standard matters because, at the system level, the implementation should not. We want to think in terms of the design and its blocks, and not concern ourselves (too much) with the implementation of these blocks. Nonetheless, a block needs to be well-implemented and understood. I describe related implementation practices in the companion Verilog Coding Standard.
This standard mainly describes how to divide a digital design into parts, including all the files required by the CAD tools which do not describe the design itself. These parts aim to be reusable, easy to understand, and cultivate a library of parts and design idioms.
This standard does not define any specification, testing, verification, validation, or documentation methodologies (though I hope you have some!).
Nor does this standard specify an implementation language, method or CAD tool. However, it was written in tandem with a Verilog Coding Standard.
Finally, this standard assumes you are already familiar with digital design in general. It is not a language or tool reference.
A module definition forms part of a general design intent, and a module instance, through its name, parameters, and connections, tells you the specific design intent at that point in the design. Without extensive modularization, a reader who does not already know the design has to continually reverse-engineer the design intent from the logic.
A design composed of modules, when successively viewed in the CAD tool as elaboration, synthesis, and implementation schematics, allows you to check many things:
Without modules, the schematics rapidly become a large mess of random logic, which becomes increasingly hard to relate to the original code as you move from elaboration to synthesis to implementation. Dividing a design into modules preserves the logical design hierarchy (even though in hardware it is usually all flattened).
Within each module, you only have to think about the immediately relevant logic and wires, which gives us the guiding principle for modular decomposition of a design: move unrelated connections into separate modules. If you have lines of logic with connections between them which themselves do not connect to the surrounding logic, then those lines should be, if possible, parameterized and encapsulated into a sub-module. Then, the other connections in the surrounding logic are necessarily more closely related and now encapsulated in the enclosing module.
By adjusting the module parameters, setting some inputs constant, and leaving some outputs unconnected, the CAD tool can optimize the general case logic of a building block, while still keeping the design intent clear in the source. A design then becomes a hierarchy of modules, connected by wires, and any random logic signifies a very local and special case (e.g.: combining two flags into one).
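As a minimal sketch of this idea, the following assumes a hypothetical general-purpose counter (`Counter_Binary`) and a hypothetical special case built from it (`Timer_Tick`). Neither name comes from an existing library. The instance ties the load inputs to constants and leaves the count output unconnected, so the CAD tool is free to remove the unused load logic, while the source still reads as "a 4-bit counter that always runs".

```verilog
// Hypothetical general-case building block: a binary up-counter with
// synchronous clear, load, and a flag raised at the maximum count.
module Counter_Binary
#(
    parameter WORD_WIDTH = 8
)
(
    input  wire                     clock,
    input  wire                     clear,
    input  wire                     run,
    input  wire                     load,
    input  wire [WORD_WIDTH-1:0]    load_value,
    output reg  [WORD_WIDTH-1:0]    count,
    output wire                     at_maximum
);

    initial begin
        count = {WORD_WIDTH{1'b0}};
    end

    assign at_maximum = (count == {WORD_WIDTH{1'b1}});

    // Priority: clear, then load, then run.
    always @(posedge clock) begin
        if (clear) begin
            count <= {WORD_WIDTH{1'b0}};
        end
        else if (load) begin
            count <= load_value;
        end
        else if (run) begin
            count <= count + 1'b1;
        end
    end

endmodule

// Hypothetical special case: a free-running tick, raised once every 16 cycles.
module Timer_Tick
(
    input  wire clock,
    input  wire clear,
    output wire tick
);

    Counter_Binary
    #(
        .WORD_WIDTH (4)
    )
    tick_counter
    (
        .clock      (clock),
        .clear      (clear),
        .run        (1'b1),     // constant: always counting
        .load       (1'b0),     // constant: the load path can be optimized away
        .load_value (4'd0),
        .count      (),         // unconnected: only the flag is used
        .at_maximum (tick)
    );

endmodule
```

The only "random logic" a reader has to understand in `Timer_Tick` is the choice of constants at the instance.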
These building blocks should be quite small, implementing logic functions common to most any design: counters, pulse/level converters, applying Boolean logic to vectors of bits, configurable delay pipelines, universal multiplexers, Clock Domain Crossing synchronizers, static and programmable address decoders, address translators, priority arbiters, AXI transactors, adjustable power-on resets, universal Boolean operators, calculating equalities and inequalities, on-chip memories.
Using building blocks enables a better separation of control, processing, and interface in your design. Divide a sub-system into one module for functionality (processing and storage), one for control (the FSM), and one for interfacing to the rest of the system (memory-mapping the data storage and configuration registers).
This section is taken almost verbatim from Core/Instance/Adapter/Shim Architecture
In a project with multiple FPGAs and boards and connecting paths between them, all designed by separate engineering teams, a common code architecture on each FPGA makes managing various mismatches, errors, and last-minute changes much easier, as well as improving portability. It also helps single-FPGA designs for the same reasons.
I'll start explaining at the bottom and work upwards. Each higher layer instantiates the one below it as a single module, and performs one specific function to support that underlying module. As we move upwards, the functions shift from creating and supporting the application logic to managing the interface to the outside world:
The Core logic of the application contains all the parts needed in the abstract, without concern for interface protocols, size of FPGA, number of instances, etc... It may culminate in a single module or a few. The only FPGA-specific parts are technology-mapping issues like which DSP or BRAM blocks are used. Other than porting to another FPGA family, this code should never change. For example, the core logic could contain the control and data paths of a soft-CPU, and the building blocks for a network-on-chip, but all as a library of parts.
The Instance culminates in a single module which instantiates the Core logic, and specifies the size and number of core logic modules, as well as all the wiring between them. The Instance primarily deals with scaling the Core logic to the specific FPGA device being used. For example, the instance defines how many data paths a soft-CPU will have if it's a SIMD system, and how many CPUs in total and the network-on-chip which connects them. There are still no concerns about particular external interfaces: buses and clocks are simple signals.
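One possible shape for an Instance, as a sketch only: the `Core_Datapath` module below is a hypothetical stand-in for real Core logic, and `System_Instance`, `WORD_WIDTH`, and `LANE_COUNT` are illustrative names. The Instance does nothing but decide how many copies of the Core exist and how they are wired.

```verilog
// Hypothetical stand-in for Core logic: one registered data path.
module Core_Datapath
#(
    parameter WORD_WIDTH = 32
)
(
    input  wire                     clock,
    input  wire [WORD_WIDTH-1:0]    data_in,
    output reg  [WORD_WIDTH-1:0]    data_out
);

    initial begin
        data_out = {WORD_WIDTH{1'b0}};
    end

    always @(posedge clock) begin
        data_out <= data_in + 1'b1;
    end

endmodule

// The Instance: scales the Core logic to the target device.
// Buses and clocks are still plain signals here.
module System_Instance
#(
    parameter WORD_WIDTH = 32,
    parameter LANE_COUNT = 4
)
(
    input  wire                                 clock,
    input  wire [(WORD_WIDTH*LANE_COUNT)-1:0]   lanes_in,
    output wire [(WORD_WIDTH*LANE_COUNT)-1:0]   lanes_out
);

    generate
        genvar i;
        for (i = 0; i < LANE_COUNT; i = i + 1) begin : per_lane
            Core_Datapath
            #(
                .WORD_WIDTH (WORD_WIDTH)
            )
            lane
            (
                .clock      (clock),
                .data_in    (lanes_in  [WORD_WIDTH*i +: WORD_WIDTH]),
                .data_out   (lanes_out [WORD_WIDTH*i +: WORD_WIDTH])
            );
        end
    endgenerate

endmodule
```

Moving to a larger or smaller FPGA then ideally means changing `LANE_COUNT`, not the Core.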
At this point, the design can be simulated to check for logical correctness, and synthesized for a preliminary estimate of its final operating frequency, assuming the instance is surrounded by a temporary test harness of registers to ensure correct static timing analysis.
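A temporary harness of this kind might look like the sketch below. `Instance_Under_Test` is a hypothetical placeholder for a real Instance, and `Timing_Harness` and its port names are assumptions for illustration. The harness registers every input and output, so that each path into and out of the Instance begins and ends at a register rather than at a device pin.

```verilog
// Hypothetical placeholder standing in for a real Instance.
module Instance_Under_Test
#(
    parameter WORD_WIDTH = 16
)
(
    input  wire                     clock,
    input  wire [WORD_WIDTH-1:0]    operand_a,
    input  wire [WORD_WIDTH-1:0]    operand_b,
    output wire [WORD_WIDTH-1:0]    result
);

    reg [WORD_WIDTH-1:0] sum = {WORD_WIDTH{1'b0}};

    always @(posedge clock) begin
        sum <= operand_a + operand_b;
    end

    assign result = sum;

endmodule

// Temporary harness: registers on all inputs and outputs of the Instance.
module Timing_Harness
#(
    parameter WORD_WIDTH = 16
)
(
    input  wire                     clock,
    input  wire [WORD_WIDTH-1:0]    pin_a,
    input  wire [WORD_WIDTH-1:0]    pin_b,
    output reg  [WORD_WIDTH-1:0]    pin_result
);

    reg  [WORD_WIDTH-1:0] a_registered = {WORD_WIDTH{1'b0}};
    reg  [WORD_WIDTH-1:0] b_registered = {WORD_WIDTH{1'b0}};
    wire [WORD_WIDTH-1:0] result_raw;

    initial begin
        pin_result = {WORD_WIDTH{1'b0}};
    end

    always @(posedge clock) begin
        a_registered <= pin_a;
        b_registered <= pin_b;
        pin_result   <= result_raw;
    end

    Instance_Under_Test
    #(
        .WORD_WIDTH (WORD_WIDTH)
    )
    instance_under_test
    (
        .clock      (clock),
        .operand_a  (a_registered),
        .operand_b  (b_registered),
        .result     (result_raw)
    );

endmodule
```

The harness is discarded once the Adapter exists.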
The Adapter instantiates the Instance, defines the final interface ports expected by the board upon which the FPGA resides, and wires up the instance to those ports. The Adapter adds any logic necessary to support the Instance and also satisfy the FPGA CAD tools. For example, the Adapter will contain any clock generation and management hardware, add any differential buffers to signals that need it, even if unused, to satisfy CAD tool constraints, and convert interface protocols as required (e.g.: as AXI masters/slaves, bi-directional or tri-state buffers, etc...).
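As one small illustration of the Adapter's role, the sketch below converts an Instance's separate input, output, and output-enable signals into a single bi-directional board bus. The `System_Instance` here is a hypothetical placeholder with arbitrary behaviour, and all port names and the 8-bit width are assumptions. A real Adapter would also hold clock management and vendor-specific buffers, which are omitted because they depend on the device family.

```verilog
// Hypothetical placeholder Instance: samples the bus on one cycle,
// drives the sampled value back on the next.
module System_Instance
(
    input  wire         clock,
    input  wire [7:0]   bus_in,
    output wire [7:0]   bus_out,
    output wire         bus_out_enable
);

    reg [7:0] sampled = 8'd0;
    reg       driving = 1'b0;

    always @(posedge clock) begin
        driving <= ~driving;
        if (driving == 1'b0) begin
            sampled <= bus_in;
        end
    end

    assign bus_out        = sampled;
    assign bus_out_enable = driving;

endmodule

// The Adapter: presents the ports the board expects.
module Adapter
(
    input  wire         board_clock,
    inout  wire [7:0]   board_bus
);

    wire [7:0]  bus_in;
    wire [7:0]  bus_out;
    wire        bus_out_enable;

    // Tri-state buffer: drive the bus only when the Instance asks to.
    assign board_bus = bus_out_enable ? bus_out : {8{1'bz}};
    assign bus_in    = board_bus;

    System_Instance
    system_instance
    (
        .clock          (board_clock),
        .bus_in         (bus_in),
        .bus_out        (bus_out),
        .bus_out_enable (bus_out_enable)
    );

endmodule
```

The Instance never sees a tri-state signal, so it stays portable and easy to simulate.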
At this point, synthesis of the design should find any final errors, such as insufficient clock buffers or logic resources, expose any missing/violated constraints, and in general point out anything that needs attention. You should, of course, fix things to minimize the number of warnings from the CAD tool.
The Shim instantiates the Adapter, and nothing else. It has the same interface ports as the Adapter, and the Shim's ports are at the top level and physically connect to the FPGA pins. What the Shim does, however, is handle the problems that crop up with PCBs: unusable connections, schematic errors, incompatible voltage standards, etc... For example, if some top-level pins are unusable due to a design or manufacturing error, the Shim can shuffle the signals between the top-level pins and the adapter ports so as to present a uniform, maybe reduced, interface to the Adapter. If you have spare board pins, this is where you use them.
Another example is where, due to how a PCB had been designed, the pins of a bus were not connected to consecutive pins along the FPGA's I/O Banks, scattering the bus (and its associated logic) all over the device. The Shim file translated the logical bus pins from the Adapter to physically consecutive pins on the device, which helped keep related logic grouped together. Of course, any other FPGA on the PCB needs a similar Shim file to re-organize the now scrambled bus signals coming in.
The Shim file is also a good place to annotate the top level pins: which ones are clock-capable, are they usable as differential pairs, which clock region or I/O Bank they reside in, etc... which helps future alterations.
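A Shim can be as small as the sketch below, which assumes a hypothetical board where bits 1 and 2 of a 4-bit bus were swapped on the PCB. The `Adapter` shown is a placeholder, and the pin annotations in the comments are invented examples of the kind of notes worth keeping, not real device data.

```verilog
// Hypothetical placeholder Adapter: registers the bus through.
module Adapter
(
    input  wire         clock,
    input  wire [3:0]   bus_in,
    output reg  [3:0]   bus_out
);

    initial begin
        bus_out = 4'd0;
    end

    always @(posedge clock) begin
        bus_out <= bus_in;
    end

endmodule

// The Shim: same ports as the Adapter, contains only the Adapter
// and the wiring needed to undo board-level problems.
module Shim
(
    input  wire         clock,      // e.g.: clock-capable pin, I/O Bank noted here
    input  wire [3:0]   bus_in,     // e.g.: bits 1 and 2 swapped on the PCB
    output wire [3:0]   bus_out     // e.g.: bits 1 and 2 swapped on the PCB
);

    wire [3:0] adapter_bus_in;
    wire [3:0] adapter_bus_out;

    // Undo the swap on the way in, and re-apply it on the way out,
    // so the Adapter sees a bus in logical bit order.
    assign adapter_bus_in = {bus_in[3], bus_in[1], bus_in[2], bus_in[0]};
    assign bus_out        = {adapter_bus_out[3], adapter_bus_out[1], adapter_bus_out[2], adapter_bus_out[0]};

    Adapter
    adapter
    (
        .clock      (clock),
        .bus_in     (adapter_bus_in),
        .bus_out    (adapter_bus_out)
    );

endmodule
```

If the board is later fixed, only the two `assign` lines change.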
Finally, you now can do a full synthesis to get the final operating frequency, do post-placement synthesis to check things like the I/O delay lines, and do final floorplanning.
Divide different types of CAD tool constraints across different files, leaving only a select few constraints applied directly in the source as attributes on wires, registers, and modules. This keeps the design as portable as possible and easy to manage when the underlying FPGA or the surrounding PCB changes. Some constraint types to group into separate files include:
DONT_TOUCH attribute under Vivado) to keep any unused logic will give you a better ongoing estimate of the final area and timing.
However, some of these constraints may have to be applied in the source code itself, as the CAD tool may not read the other constraints files early enough in the synthesis, placement, or routing phases. This really depends on the CAD tool. And some constraints, like Vivado's ASYNC_REG, which declares registers as part of a clock domain crossing synchronizer, should be placed in the CDC synchronizer module source code so they automatically take effect at each CDC synchronizer module instantiation.
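For instance, a two-stage single-bit synchronizer could carry the attribute as in the sketch below. The module and signal names are hypothetical, and the two-register depth is an assumption. The `(* ASYNC_REG = "TRUE" *)` syntax is how Vivado accepts this attribute in Verilog source.

```verilog
// Hypothetical single-bit CDC synchronizer. The ASYNC_REG attribute
// travels with the module, so every instance is constrained automatically.
module CDC_Bit_Synchronizer
(
    input  wire receiving_clock,
    input  wire bit_in,     // from another clock domain
    output wire bit_out     // synchronized to receiving_clock
);

    (* ASYNC_REG = "TRUE" *) reg sync_stage_1 = 1'b0;
    (* ASYNC_REG = "TRUE" *) reg sync_stage_2 = 1'b0;

    always @(posedge receiving_clock) begin
        sync_stage_1 <= bit_in;
        sync_stage_2 <= sync_stage_1;
    end

    assign bit_out = sync_stage_2;

endmodule
```

No separate constraints file needs to know how many synchronizers the design contains or where they sit in the hierarchy.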