# Logic Design Principles

1. Constants are your friend. They optimize logic without losing generality in the source.
2. Ready/valid handshakes allow for action every clock cycle if possible and compose well. Pulses necessarily limit your max throughput by half. They are a layer of abstraction.
3. Pulse generators and pulse latches remove many FSMs and can fix race conditions (ABA problems).
4. Separate control and data: it allows you to optimize the datapath sooner and more easily. It also avoids having to explicitly describe the Cartesian product of all states and outputs, which makes for tedious code that doesn't describe the design itself, but a behavioural enumeration of it, from which the design must be reverse-engineered.
5. Modularity means debugging more of the *design* than its implementation.
6. Writing smaller modules means a smaller context, which makes bugs less likely and meaningful code/comments easier to write. This benefit is preserved through composition.
7. Hierarchy: pulses, ready/valid, 4-phase, 2-phase (TODO: chapter to introduce/explain them, why/where to use them, etc...)
8. CDC implies asynchrony, which means no notion of time, only of sequence, hence the 2-phase/4-phase handshakes must be present inside non-trivial CDC designs.
9. Don't generate/divide clocks: generate enables for the desired rate, synchronous to the main clock.
10. Make the code reflect the design: don't make the reader reverse-engineer the design from the implementation.
11. Non-blocking assignments for testbench logic, blocking assignments for the testbench clock (avoids races).
12. Design with retiming in mind: it lets you design without worrying about where to pipeline, makes for more readable design source, and tailors the result to devices/widths/etc...
13. Forward retiming works better (re: Vivado phys_opt, and general retiming algorithms).
14. Since the `clear` input, a synchronous reset, is a signal that changes the output, and is derived from some control path logic, it must trigger an "output updated" signal in pulse interfaces like any other `load` or `valid` input signal, though only as a single pulse, since the output only changes once even with a steady `clear`. This also implies that `clear` must be pipelined, separately and in parallel if necessary, and not broadcast to all units, which means similar latency, and better routing and distribution.
15. Amend that last one: don't signal an "output updated" value, as consecutive calculations can have identical results (so a Word Change Detector is no good here), and multiple commands can update a given output. Instead, signal when a command is done. That "done" signal also denotes when the given output is valid to sample. Internally latch the command so we know which one to report as "done" if multiple commands update one internal object (itself with a "done" signal). This does not allow concurrency or queueing by itself.
16. Following up on the last one: using a "done" signal on commands also separates control and data paths.
17. Possible general design principle: connect datapaths with ready/valid handshakes at boundaries, enumerate the ready/valid actions as a 2-input truth table, and the result is computed by one of the 16 dyadic Boolean functions.
18. If a module contains latches which will hold state, then it must have a reset/clear. Otherwise a reset of the surrounding logic would result in an inconsistent system (e.g.: a valid line from a latch staying high after reset).
19. It's possible for a logic path to be too general, with constant inputs that don't change. This makes a slower path. Instead, have parallel, simpler paths which each hardcode one possible value of the unchanging inputs, then select at the end using the actual unchanging input. Pipelining may render this optimization moot. (e.g.: adders which depend on >3 inputs (data and control) are better split as separate simpler adders and a mux)
20. Following on from the above: an over-general logic path may indicate you are not implementing the complete algorithm, and should instead use all possible logic functions of the path.
21. Make signal names evolve with the computation, in lexicographic order, by adding/changing suffixes. Text search becomes easier, and waveform displays will be more organized. This naming scheme also makes it obvious when your code has a harmony, which is usually a sign that the implementation reflects the design.
22. As physical area increases, if there are any cycles in dependency (not a straight pipeline), then area becomes a limiting factor for speed. Also, distributed control signals can have too far to travel.
23. As a design gets large, routing may not get as good a result (congestion, too much effort), so the critical paths become bad routes and bit-reductions that cannot be pipelined, despite using carry-chains for faster calculations.
24. Design paradigm: all modules with ready/valid handshakes at input/output/control to allow half/skid-buffers to be added as necessary to avoid the above timing problems, without altering functionality. Are we at Kahn Networks now?
25. Follow-up on the above: can we make it so the handshake logic adds no latency or optimizes away when possible? (combinational paths without buffering)

Hiding a note here:

- Wakerly, Digital Design Principles and Practices
- Minns, Elliott, FSM-based Digital Design using the Verilog HDL
- Davis, Reese, Finite State Machine Datapath Design, Optimization, and Implementation

And another:

- Dally, Towles, Principles and Practices of Interconnection Networks (Ch.18 Arbitration)
- Dimitrakopoulos, Psarras, Seitanidis, Microarchitecture of Network-on-Chip Routers (Ch.4 Arbitration Logic)
- Dimitrakopoulos, Kachris, Kalligeros, Scalable arbiters and multiplexers for on-FPGA interconnection networks, FPL 2011
- Weber, Arbiters: Design Ideas and Coding Styles, SNUG Boston 2001
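
Principles 3 and 14 both lean on the idea that a steady level should produce exactly one pulse. As a rough illustration, here is a cycle-by-cycle behavioural model in Python (not HDL, and not taken from any particular library); the function name `pulse_generator` and the list-of-bits representation of a signal are assumptions made for this sketch.

```python
def pulse_generator(levels):
    """Emit a one-cycle pulse on each 0-to-1 transition of a level signal.

    `levels` holds one sample of the input per clock cycle (hypothetical
    representation chosen for this sketch).
    """
    previous = 0  # registered copy of the input from the prior cycle
    pulses = []
    for level in levels:
        # Pulse only when the input is high now and was low last cycle.
        pulses.append(1 if (level and not previous) else 0)
        previous = level
    return pulses


# A `clear` held high for three cycles yields a single pulse,
# and a later assertion yields another.
clear = [0, 1, 1, 1, 0, 1]
print(pulse_generator(clear))
```

Note that, per principle 15, such a pulse is better used to mark a command as "done" than to claim the output value changed.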
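
Principle 9 (enables instead of divided clocks) can be sketched with a small behavioural model in Python. This is only an illustration under assumed names (`enable_generator`, `enabled_counter`): every register still updates on the one main clock, and the slower rate comes from a one-cycle enable.

```python
def enable_generator(divisor, cycles):
    """Return one enable bit per main-clock cycle, high once every `divisor` cycles."""
    count = 0
    enables = []
    for _ in range(cycles):
        at_terminal = (count == divisor - 1)
        enables.append(1 if at_terminal else 0)
        count = 0 if at_terminal else count + 1  # wrap after the terminal count
    return enables


def enabled_counter(enables):
    """Model a register clocked every cycle that only updates when enabled."""
    value = 0
    trace = []
    for enable in enables:
        if enable:
            value += 1
        trace.append(value)  # value visible after this clock edge
    return trace


enables = enable_generator(divisor=4, cycles=12)
print(enables)
print(enabled_counter(enables))
```

The slow logic stays synchronous to the main clock, so no second clock domain (and no CDC) is created.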
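
For principle 17, one possible reading is: write down what should happen for each combination of two handshake bits, then identify which of the 16 dyadic Boolean functions that table is. The Python sketch below assumes a particular packing of the table into a 4-bit code (bit `2*a + b` holds the output for inputs `a`, `b`); the names `truth_table_to_code` and `dyadic` are hypothetical.

```python
def truth_table_to_code(table):
    """Pack a 2-input truth table into a 4-bit function code.

    `table` maps (a, b) to the desired output bit. Bit (2*a + b) of the
    returned code holds the output for that input pair (assumed convention).
    """
    code = 0
    for (a, b), out in table.items():
        code |= (out & 1) << (2 * a + b)
    return code


def dyadic(code, a, b):
    """Evaluate the dyadic Boolean function selected by `code` on bits a, b."""
    return (code >> (2 * a + b)) & 1


# Inputs are (valid, ready). A transfer happens only when both are high.
transfer = truth_table_to_code({
    (0, 0): 0,  # nothing offered, sink not ready
    (0, 1): 0,  # sink ready, but nothing to take
    (1, 0): 0,  # data offered, sink stalls
    (1, 1): 1,  # handshake completes
})

# A one-entry buffer can accept new data when it is empty (valid = 0)
# or when it is being drained this cycle (ready = 1).
can_accept = truth_table_to_code({
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 0,
    (1, 1): 1,
})

print(bin(transfer), bin(can_accept))
```

Under this convention the first table is AND and the second is "NOT a OR b", so each handshake decision reduces to a single 2-input function.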
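
The restructuring in principle 19 can be illustrated with an add/subtract unit, chosen here as an assumed example. The Python model below only demonstrates that the two structures compute the same function; whether the split version is actually faster depends on the device and tools, and, as noted above, pipelining may make it moot. `WIDTH`, `general_addsub`, and `split_addsub` are hypothetical names.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def general_addsub(a, b, sub):
    """One adder whose operand and carry-in both depend on the control bit."""
    b_operand = (b ^ MASK) if sub else b  # conditional inversion ahead of the adder
    return (a + b_operand + sub) & MASK   # carry-in is also driven by control


def split_addsub(a, b, sub):
    """Two hardcoded paths computed in parallel, selected at the end."""
    add_path = (a + b) & MASK             # path with `sub` fixed at 0
    sub_path = (a - b) & MASK             # path with `sub` fixed at 1
    return sub_path if sub else add_path  # final 2:1 mux on the control bit


# Exhaustively compare both structures over all 4-bit operands.
mismatches = [
    (a, b, sub)
    for a in range(1 << WIDTH)
    for b in range(1 << WIDTH)
    for sub in (0, 1)
    if general_addsub(a, b, sub) != split_addsub(a, b, sub)
]
print(len(mismatches))
```

In the split form the control bit only touches the final mux, so each adder sees data inputs alone.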
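
Principle 24 claims that buffers can be dropped between ready/valid interfaces without altering functionality. A minimal Python model of that idea follows, assuming a half-buffer is a single register that accepts new data only when empty (so it alternates between filling and draining). The function name `run_half_buffer` and the repeating `ready_pattern` for the sink are assumptions of this sketch.

```python
def run_half_buffer(items, ready_pattern):
    """Push `items` through a one-register half-buffer, cycle by cycle.

    `ready_pattern` is the sink's ready bit per cycle, repeated as needed;
    it must contain at least one 1 or the simulation never finishes.
    Returns the items received by the sink and the number of cycles used.
    """
    pending = list(items)
    data, valid = None, 0
    received = []
    cycle = 0
    while pending or valid:
        out_ready = ready_pattern[cycle % len(ready_pattern)]
        in_valid = 1 if pending else 0
        in_ready = 0 if valid else 1   # accept only when the register is empty
        if valid and out_ready:        # output handshake completes
            received.append(data)
            valid = 0
        if in_valid and in_ready:      # input handshake completes
            data = pending.pop(0)
            valid = 1
        cycle += 1
    return received, cycle


print(run_half_buffer([1, 2, 3], [1]))        # sink always ready
print(run_half_buffer([1, 2, 3], [0, 0, 1]))  # sink ready one cycle in three
```

The data sequence is unchanged whatever the stall pattern; only the timing differs, which is what lets such buffers be added purely for timing closure.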