Rules for Ready/Valid Handshakes and Synchronization

from FPGA Design Elements by Charles Eric LaForest, PhD.

Ready/valid handshakes are a flexible and lightweight way to connect and control modules in composable ways, but as I designed more complex modules, I found some corner cases I couldn't quite fit into the ready/valid handshake model, and some designs started not composing well because I was implementing the handshakes inconsistently. So I worked out a set of rules. I also found new ways to distribute control logic, and I outline the synchronization process that underlies ready/valid handshakes.

AXI is not Fundamental

I originally based these rules on those written in the AMBA AXI4 Specification, Chapter A3, "Single Interface Requirements", and expanded on their meaning and consequences. I have replaced the inappropriate and inaccurate "master/slave" terminology with "source/destination", respectively, when referring to handshake interfaces.

Ready/valid handshakes are the underlying mechanism for AXI interfaces. Each channel of AXI4 is a ready/valid interface, and the TLAST of AXI4-Stream is merely a bit of metadata controlled by the ready/valid handshake. AXI4 and AXI4-Stream interfaces bring in too many assumptions and complications for most designs. It is simpler, smaller, and more flexible to compose your own handshaking interfaces up from ready/valid primitives such as Skid Buffers and Arbiters and Forks and Joins, to only name a few.

Interfaces

There are two complementary interface types: source and destination. Both interfaces have three signals: valid, ready, and data. Ready and valid are single-bit signals, and the data signal is of arbitrary width. All signals are synchronous to the rising edge of the clock.

The source interface outputs valid and data, and takes ready as input.
The destination interface outputs ready, and receives valid and data.

A connection always goes from the source interface to the destination interface. Other pairings cannot work.
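
As a minimal sketch of these signal directions, the hypothetical module below has a destination interface on its input side and a source interface on its output side, and simply wires one to the other. The module name, port names, and the WORD_WIDTH parameter are assumptions made for illustration only.

    module Handshake_Passthrough
    #(
        parameter WORD_WIDTH = 8
    )
    (
        // Destination interface: receives valid and data, outputs ready
        input   wire                        input_valid,
        output  wire                        input_ready,
        input   wire    [WORD_WIDTH-1:0]    input_data,

        // Source interface: outputs valid and data, receives ready
        output  wire                        output_valid,
        input   wire                        output_ready,
        output  wire    [WORD_WIDTH-1:0]    output_data
    );

        // valid and data travel forward, ready travels backward.
        assign output_valid = input_valid;
        assign output_data  = input_data;
        assign input_ready  = output_ready;

    endmodule

The upstream module's source interface connects to the input_* ports, and the downstream module's destination interface connects to the output_* ports.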

Loops

There must be no combinational paths from input to output signals in the source interface (ready to valid), nor in the destination interface (valid to ready). Otherwise combinational loops will form when connecting interfaces.

Even if only one type of interface has such a combinational path, thus avoiding combinational loops, the remaining combinational path will go from one interface to the other and back to the first, which will cause a longer delay. This delay may both limit your design clock frequency and make placement and routing of the modules connected by these interfaces more difficult. It is not worth saving a cycle of latency here at the expense of the performance of the rest of the design. Proper handshake interface design and pipelining will avoid this extra cycle of latency, while improving clock frequency and P&R.

ADDENDUM: Although the above points stand in theory, in practice allowing a combinational path between valid and ready within an interface can be a good tradeoff. Buffering every single source/destination connection can bloat a design and will not improve performance except at the highest speeds. Some pipeline control is most easily implemented combinationally between interfaces, without buffering. However, you must then be careful of combinational loops and of the requirement to avoid livelocks and deadlocks (see below).

Handshake Procedure

At any time, the source interface raises and holds steady valid when data is available, and the destination interface raises ready only when it can accept more data. When both valid and ready are high, the handshake is complete, the destination interface accepts the data in the same cycle, and the ready and valid outputs change state if necessary.

The destination interface can freely assert and deassert ready at any time. However, it is beneficial to have the destination interface assert ready as soon as it can accept data, before the source interface asserts valid, to shorten handshakes to a single cycle. For the same reason, the source interface should assert and hold steady valid as soon as it has data to send.

After a handshake completes, the source interface drops valid if it has no more data, otherwise it keeps valid high for the next handshake, which may complete in the very next clock cycle. Similarly, the destination interface drops ready if it cannot accept more data, otherwise it keeps ready high and can complete the next handshake in the very next clock cycle if valid is also high.
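
The procedure above can be sketched as a single registered stage. This is only one possible arrangement under stated assumptions, not a reference implementation: the module and signal names are hypothetical, the reset ("clear") is assumed synchronous and active-high, and the only combinational path through the module runs from output_ready to input_ready.

    module Handshake_Register
    #(
        parameter WORD_WIDTH = 8
    )
    (
        input   wire                        clock,
        input   wire                        clear,

        input   wire                        input_valid,
        output  reg                         input_ready,
        input   wire    [WORD_WIDTH-1:0]    input_data,

        output  reg                         output_valid,
        input   wire                        output_ready,
        output  reg     [WORD_WIDTH-1:0]    output_data
    );

        initial begin
            output_valid = 1'b0;
            output_data  = {WORD_WIDTH{1'b0}};
        end

        reg input_handshake_complete;
        reg output_handshake_complete;

        always @(*) begin
            output_handshake_complete = (output_valid == 1'b1) && (output_ready == 1'b1);
            // Ready whenever the register is empty, or is being emptied this cycle.
            input_ready               = (output_valid == 1'b0) || (output_handshake_complete == 1'b1);
            input_handshake_complete  = (input_valid == 1'b1) && (input_ready == 1'b1);
        end

        always @(posedge clock) begin
            if (input_handshake_complete == 1'b1) begin
                // New data arrives: present it and keep (or raise) valid.
                output_data  <= input_data;
                output_valid <= 1'b1;
            end
            else if (output_handshake_complete == 1'b1) begin
                // Data left and nothing replaced it: drop valid.
                output_valid <= 1'b0;
            end
            if (clear == 1'b1) begin
                output_valid <= 1'b0;
            end
        end

    endmodule

When both neighbours keep valid and ready high, this stage completes a handshake on each side in consecutive clock cycles, and output_valid only falls after a completed handshake.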

Delayed Handshakes

It is possible to delay a handshake: the destination interface accepts the data at the moment the source interface asserts valid, processes it, and only then signals ready to complete the handshake and move the source interface on to the next data item.

Although this delayed handshake is legal and will work, it interferes with conventional dataflow pipelining since the source interface cannot begin processing and presenting the next data item concurrently with the destination interface processing the current data item. If the source and destination processing times are equal, which is the optimum when pipelining, this delayed handshake will halve the throughput instead of doubling it!

However, when full throughput pipelining is not needed or possible (e.g.: an iterative calculation), a delayed handshake can implement a very useful control mechanism by signalling to the source interface when the next item can be processed. A Half Buffer implements this control mechanism by accepting an item from the source interface, and internally presenting that item and a valid signal at its own source interface. Detecting the rise of valid with a Pulse Generator starts the internal logic, whose control logic now only needs to pulse ready when the calculation is done to complete the handshake. This simplifies control and maintains concurrency.
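
To make the delayed handshake concrete, here is a rough sketch that does not use the Half Buffer or Pulse Generator modules mentioned above. It is a hypothetical stand-in for an iterative calculation: it loads a countdown from input_data as soon as valid is seen, and raises ready only once the count reaches zero. All names, and the countdown itself, are assumptions for illustration.

    module Delayed_Handshake_Countdown
    #(
        parameter WORD_WIDTH = 8
    )
    (
        input   wire                        clock,
        input   wire                        clear,  // synchronous, active-high

        input   wire                        input_valid,
        output  reg                         input_ready,
        input   wire    [WORD_WIDTH-1:0]    input_data
    );

        localparam [WORD_WIDTH-1:0] ZERO = {WORD_WIDTH{1'b0}};

        reg                     busy;
        reg [WORD_WIDTH-1:0]    counter;

        initial begin
            busy    = 1'b0;
            counter = ZERO;
        end

        reg start;
        reg handshake_complete;

        always @(*) begin
            // Begin work as soon as the source presents data.
            start              = (busy == 1'b0) && (input_valid == 1'b1);
            // Raise ready only once the "calculation" is done.
            input_ready        = (busy == 1'b1) && (counter == ZERO);
            handshake_complete = (input_valid == 1'b1) && (input_ready == 1'b1);
        end

        always @(posedge clock) begin
            if (start == 1'b1) begin
                counter <= input_data;
                busy    <= 1'b1;
            end
            else if (handshake_complete == 1'b1) begin
                // The source may now move on to its next item.
                busy    <= 1'b0;
            end
            else if ((busy == 1'b1) && (input_ready == 1'b0)) begin
                counter <= counter - 1'b1;
            end
            if (clear == 1'b1) begin
                busy    <= 1'b0;
            end
        end

    endmodule

The sketch relies on the source interface holding valid and data steady until ready is raised, as the handshake rules require.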

Sampling Changing Data

Typically, the source interface holds data steady alongside valid, changing data only after the handshake completes. Any data value outside of the clock cycle when the handshake completes is lost. It is allowable to have data be a continuously changing value, synchronously to the clock, which will get sampled whenever a handshake completes.

Sampling changing data is best done by having the source interface hold valid high, while the destination interface asserts ready at an interval. The opposite method of sampling, where the source interface periodically asserts valid, does not guarantee a predictable sampling interval and may cause a livelock, since it depends on the destination interface having already asserted ready. It may also cause a deadlock if the destination interface waits for valid before raising ready (see below).
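
A minimal sketch of the recommended method follows: a destination interface that raises ready once every INTERVAL cycles and captures whatever data is present when the handshake completes. The module name, parameters, and ports are hypothetical, and COUNT_WIDTH is assumed wide enough to hold INTERVAL-1.

    module Periodic_Sampler
    #(
        parameter WORD_WIDTH  = 8,
        parameter INTERVAL    = 4,
        parameter COUNT_WIDTH = 2
    )
    (
        input   wire                        clock,
        input   wire                        clear,  // synchronous, active-high

        input   wire                        input_valid,
        output  reg                         input_ready,
        input   wire    [WORD_WIDTH-1:0]    input_data,

        output  reg     [WORD_WIDTH-1:0]    sample
    );

        localparam [COUNT_WIDTH-1:0] COUNT_ZERO = {COUNT_WIDTH{1'b0}};
        localparam [COUNT_WIDTH-1:0] COUNT_LAST = INTERVAL - 1;

        reg [COUNT_WIDTH-1:0] counter;

        initial begin
            counter = COUNT_ZERO;
            sample  = {WORD_WIDTH{1'b0}};
        end

        reg handshake_complete;

        always @(*) begin
            input_ready        = (counter == COUNT_LAST);
            handshake_complete = (input_valid == 1'b1) && (input_ready == 1'b1);
        end

        always @(posedge clock) begin
            // Count up to the sampling point, then hold ready until a handshake.
            if (input_ready == 1'b0) begin
                counter <= counter + 1'b1;
            end
            if (handshake_complete == 1'b1) begin
                sample  <= input_data;
                counter <= COUNT_ZERO;
            end
            if (clear == 1'b1) begin
                counter <= COUNT_ZERO;
            end
        end

    endmodule

If the source interface holds valid high, a sample is taken every INTERVAL clock cycles. If valid is low at the sampling point, ready stays high until valid arrives, so no sample is silently skipped.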

Avoiding Deadlocks and Livelocks

We must constrain the behaviour of the valid and ready signals to prevent deadlocks, where the source and destination interfaces wait forever for each other to respond, and to prevent livelocks, where both interfaces are responding but the handshake never completes.

To prevent deadlocks, the source interface must not wait until the destination interface asserts ready before asserting valid, while the destination interface can wait for the source interface to assert valid before asserting ready. This waiting will extend the handshake to two cycles, completing in the second cycle, but has applications where we want to selectively complete handshakes from multiple source interfaces, as in an Arbiter.

To prevent livelocks, when the source interface asserts valid it must remain asserted until the handshake completes, else we could end up in a situation where the source interface temporarily asserts valid and the destination interface temporarily asserts ready, but they never coincide to complete the handshake.
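
The livelock rule can be checked in simulation. The following is a sketch of a hypothetical, simulation-only monitor (the module name and the "violation" output are assumptions) that flags any cycle where valid falls without a handshake having completed in the previous cycle:

    module Valid_Hold_Checker
    (
        input   wire    clock,
        input   wire    valid,
        input   wire    ready,
        output  reg     violation
    );

        reg valid_previous;
        reg ready_previous;

        initial begin
            valid_previous = 1'b0;
            ready_previous = 1'b0;
            violation      = 1'b0;
        end

        always @(posedge clock) begin
            // valid was high without ready in the previous cycle (no handshake),
            // so valid must still be high in the cycle now ending.
            if ((valid_previous == 1'b1) && (ready_previous == 1'b0) && (valid == 1'b0)) begin
                violation <= 1'b1;
                $display("ERROR: valid dropped before the handshake completed (time %0t)", $time);
            end
            valid_previous <= valid;
            ready_previous <= ready;
        end

    endmodule

Attach one instance to each ready/valid connection in a testbench. The violation flag stays high once set, so it can be checked at the end of the simulation.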

Reset

The reset signal to modules with source and/or destination interfaces can be active high or low, may assert asynchronously, but must deassert synchronously. However, using an active high reset limits the amount of inverted logic to read, which keeps the logic clearer. Also, I strongly recommend using a fully synchronous reset since an asynchronous reset will inhibit register retiming. Any register holding state inside the source or destination interface must be reset, else the interface may remain in the wrong state after reset (e.g. the source interface signalling that data is valid when there isn't any).

Internal State

Any internal state of a module which affects a source or destination interface must only ever change in the same cycle as a completed handshake, when both ready and valid are raised together. Otherwise, data may be lost.

For example, if you are counting down words in a packet passing through a source interface, and you change state as soon as the counter reaches zero, you will lose the last packet word if, by coincidence, the ready signal of the destination interface goes low at the same time as the counter reaches zero. By the time the destination interface raises ready, the source interface has already moved on to the next packet and the previous data word has been lost, corrupting the previous packet and all following packets.

Thus, as a rule, define a line of code like this one:

    always @(*) begin
        handshake_complete = (ready == 1'b1) && (valid == 1'b1);
    end

and use it to gate any state transitions that affect a source or destination interface.
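
For instance, the packet word counter described above could be gated as in the following sketch. The module name, the fixed PACKET_LENGTH parameter, and the last_word output are assumptions for illustration, and COUNT_WIDTH is assumed wide enough to hold PACKET_LENGTH-1.

    module Packet_Word_Counter
    #(
        parameter COUNT_WIDTH   = 8,
        parameter PACKET_LENGTH = 4
    )
    (
        input   wire    clock,
        input   wire    clear,  // synchronous, active-high
        input   wire    valid,
        input   wire    ready,
        output  reg     last_word
    );

        localparam [COUNT_WIDTH-1:0] COUNT_ZERO    = {COUNT_WIDTH{1'b0}};
        localparam [COUNT_WIDTH-1:0] COUNT_INITIAL = PACKET_LENGTH - 1;

        reg [COUNT_WIDTH-1:0] words_remaining;

        initial begin
            words_remaining = COUNT_INITIAL;
        end

        reg handshake_complete;

        always @(*) begin
            handshake_complete = (ready == 1'b1) && (valid == 1'b1);
            last_word          = (words_remaining == COUNT_ZERO);
        end

        always @(posedge clock) begin
            // The count only moves when a word actually transfers, so a stalled
            // last word is never skipped.
            if (handshake_complete == 1'b1) begin
                words_remaining <= (last_word == 1'b1) ? COUNT_INITIAL : words_remaining - 1'b1;
            end
            if (clear == 1'b1) begin
                words_remaining <= COUNT_INITIAL;
            end
        end

    endmodule

If ready goes low just as the counter reaches zero, last_word simply stays high until the final word is accepted, and only then does the counter reload for the next packet.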

Underlying Synchronization

Since we very usefully partition the design of computing machinery into data and control paths, it's natural to think of one module as transferring data to another, and of one module as controlling the operation of another. The ready/valid handshake is a great fit to both data transfers and module control.

However, ready/valid handshakes are only a particular implementation of a more fundamental mechanism of synchronization (a.k.a.: rendez-vous), upon which we then build up any combination of data transfer and control we need. This is another reason "master/slave" terminology (and its analogs) should be discarded: it assumes constraints to the design process which are not fundamental or always necessary. These constraints may be useful in that they help us manage the complexity of a design, but at the cost of an increased mismatch between the desired computation and its implementation, which increases complexity and testing difficulty relative to a more closely matched design.

To illustrate this underlying synchronization mechanism, imagine two modules which operate concurrently, but must sometimes synchronize with each other, for any number of possible reasons (e.g.: minimizing buffering, preventing pipeline stalls, keeping audio and video streams synchronized). Each module has an input named "OK_IN", and an output named "OK_OUT", and we simply connect these outputs to those inputs as expected.

When one module needs to synchronize, it raises and holds high "OK_OUT" and waits. Eventually, the other module will do the same and both modules will simultaneously see ((OK_IN == 1'b1) && (OK_OUT == 1'b1)) == 1'b1 during the same clock cycle, and their control logic can then change state simultaneously, if not identically. The modules have synchronized, no data was transferred, and neither one is in control of the other.
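
One side of such a rendez-vous might be sketched as follows. Only OK_IN and OK_OUT come from the description above. The module name, the "request" pulse from the local control logic, and the "synchronized" output are assumptions for illustration.

    module Rendezvous_Partner
    (
        input   wire    clock,
        input   wire    clear,      // synchronous, active-high
        input   wire    request,    // pulse: local logic wants to synchronize
        input   wire    OK_IN,
        output  reg     OK_OUT,
        output  reg     synchronized
    );

        initial begin
            OK_OUT = 1'b0;
        end

        always @(*) begin
            synchronized = (OK_IN == 1'b1) && (OK_OUT == 1'b1);
        end

        always @(posedge clock) begin
            if (synchronized == 1'b1) begin
                // Both sides see this in the same cycle and release together.
                OK_OUT <= 1'b0;
            end
            else if (request == 1'b1) begin
                // Raise OK_OUT without waiting for OK_IN, and hold it until synchronized.
                OK_OUT <= 1'b1;
            end
            if (clear == 1'b1) begin
                OK_OUT <= 1'b0;
            end
        end

    endmodule

Two instances are cross-connected, with each OK_OUT driving the other's OK_IN. Because OK_OUT is a register that never depends combinationally on OK_IN, the cross-connection forms no combinational loop. In this sketch, a request pulse arriving in the same cycle as a synchronization is ignored.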

(This is similar, but not identical to passing both module outputs through a Pipeline Synchronizer, as that includes the downstream module(s), implementing a 3-way synchronization and data transfer.)

If we then use this synchronization mechanism to transfer data in one direction, we have re-invented ready/valid interfaces. We could also implement a bidirectional simultaneous data transfer the same way, with the understanding that both "OK" signals must follow the same deadlock and livelock avoidance rules mentioned before.

Finally, some readers will notice the similarity between this synchronization mechanism and the 2-phase and 4-phase handshakes used in asynchronous logic, only simplified since we have a clock as an absolute time reference.
