Decouples two sides of a ready/valid handshake to allow back-to-back
transfers without a combinational path between input and output, thus
pipelining the path to improve concurrency and/or timing. The latency from
input to output is 2 clock cycles. Any FIFO depth is allowed, not only
powers-of-2. Raising clear
will return the FIFO to the empty state.
NOTE: This module is not suitable for pipelining long combinational paths since it depends on a central buffer. If you need to pipeline a path to improve timing rather than concurrency, use a Skid Buffer Pipeline instead.
Since a FIFO buffer stores variable amounts of data, it will smooth out irregularities in the transfer rates of the input and output interfaces, and when used in pipeline loops, can store enough data to prevent artificial deadlocks (re: Kahn Process Networks with bounded channels).
This module is a generalization of the Skid Buffer, which acts as a 2-deep FIFO buffer. Go read it to get a deeper treatment of pipeline buffering requirements, and the particular idiom used to implement the control logic.
`default_nettype none module Pipeline_FIFO_Buffer #( parameter WORD_WIDTH = 0, parameter DEPTH = 0, parameter RAMSTYLE = "" ) ( input wire clock, input wire clear, input wire input_valid, output reg input_ready, input wire [WORD_WIDTH-1:0] input_data, output wire output_valid, input wire output_ready, output wire [WORD_WIDTH-1:0] output_data ); initial begin input_ready = 1'b1; // Empty at start, so accept data end
From the FIFO DEPTH
, we derive the bit width of the buffer addresses and
item count, and construct the constants we work with.
`include "clog2_function.vh" localparam WORD_ZERO = {WORD_WIDTH{1'b0}}; localparam ADDR_WIDTH = clog2(DEPTH); localparam ADDR_ONE = {{ADDR_WIDTH-1{1'b0}},1'b1}; localparam ADDR_ZERO = {ADDR_WIDTH{1'b0}}; localparam ADDR_LAST = DEPTH-1;
Since the stored data count has to be able to represent DEPTH
itself, and
not a zero to DEPTH-1
count of that quantity, we need an extra bit, to
guarantee sufficient range.
localparam COUNT_WIDTH = ADDR_WIDTH + 1; localparam COUNT_ONE = {{COUNT_WIDTH-1{1'b0}},1'b1}; localparam COUNT_ZERO = {COUNT_WIDTH{1'b0}}; localparam COUNT_LAST = DEPTH; localparam COUNT_UP = 1'b0; localparam COUNT_DOWN = 1'b1;
The buffer itself is a synchronous dual-port memory: one write port to insert data, and one read port to concurrently remove data. Typically this memory will be a dedicated Block RAM, but can also be built from LUT RAM if the width and depth are small, or even plain registers for very small cases. If your CAD tool does not support inference of memories as registers, you can instead chain together a few Skid Buffers, though the latency will increase by 1 clock cycle per Skid Buffer.
NOTE: There will NEVER be a concurrent read and write to the same
address, so write-forwarding logic is not necessary, as specififed by the
values of the READ_NEW_DATA
and RW_ADDR_COLLISION
parameters. Guide
your CAD tool as necessary to tell it there will never be read/write
address collisions, so you can obtain the highest possible operating
frequency.
We initialize the read/write enables to zero, signifying an idle system.
reg buffer_wren = 1'b0; wire [ADDR_WIDTH-1:0] buffer_write_addr; reg buffer_rden = 1'b0; wire [ADDR_WIDTH-1:0] buffer_read_addr; RAM_Simple_Dual_Port #( .WORD_WIDTH (WORD_WIDTH), .ADDR_WIDTH (ADDR_WIDTH), .DEPTH (DEPTH), .RAMSTYLE (RAMSTYLE), .READ_NEW_DATA (0), .RW_ADDR_COLLISION ("no"), .USE_INIT_FILE (0), .INIT_FILE (), .INIT_VALUE (WORD_ZERO) ) buffer ( .clock (clock), .wren (buffer_wren), .write_addr (buffer_write_addr), .write_data (input_data), .rden (buffer_rden), .read_addr (buffer_read_addr), .read_data (output_data) );
Since the buffer is a synchronous RAM, it takes a clock cycle before the
read data updates, so we pass along the read enable line (buffer_rden
)
through a register to synchronize it and signal valid read data
(output_valid
). However, the register needs separate control: for
example, when the output interface reads out the last item out of the
buffer, the output becomes invalid even though the buffer output is
unchanged.
reg update_output_valid = 1'b0; Register #( .WORD_WIDTH (1), .RESET_VALUE (1'b0) ) output_data_valid ( .clock (clock), .clock_enable (update_output_valid), .clear (clear), .data_in (buffer_rden), .data_out (output_valid) );
The buffer read and write addresses are stored in counters, which both
start at (and clear
to) ADDR_ZERO
. Each counter can only increment by
ADDR_ONE
at each read or write, and will wrap around to ADDR_ZERO
if
incremented past a value of DEPTH-1
, labelled as ADDR_LAST
. The depth
can be any positive number, not only a power-of-2.
NOTE: The counters never pass eachother. If the write counter runs ahead of the read counter enough to wrap-around and reach the read counter from behind, the buffer is full and all writes to the buffer halt until after a read happens. Conversely, if the read counter catches up to the write counter from behind, the buffer is empty and all reads halt until after a write happens. In both the full and empty cases, both counters are equal but they get to that state by different paths, so we will need some extra state further down to make these two cases look different.
reg increment_buffer_write_addr = 1'b0; reg load_buffer_write_addr = 1'b0; Counter_Binary #( .WORD_WIDTH (ADDR_WIDTH), .INCREMENT (ADDR_ONE), .INITIAL_COUNT (ADDR_ZERO) ) write_address ( .clock (clock), .clear (clear), .up_down (1'b0), // 0/1 --> up/down .run (increment_buffer_write_addr), .load (load_buffer_write_addr), .load_count (ADDR_ZERO), .carry_in (1'b0), // verilator lint_off PINCONNECTEMPTY .carry_out (), .carries (), .overflow (), // verilator lint_on PINCONNECTEMPTY .count (buffer_write_addr) ); reg increment_buffer_read_addr = 1'b0; reg load_buffer_read_addr = 1'b0; Counter_Binary #( .WORD_WIDTH (ADDR_WIDTH), .INCREMENT (ADDR_ONE), .INITIAL_COUNT (ADDR_ZERO) ) read_address ( .clock (clock), .clear (clear), .up_down (1'b0), // 0/1 --> up/down .run (increment_buffer_read_addr), .load (load_buffer_read_addr), .load_count (ADDR_ZERO), .carry_in (1'b0), // verilator lint_off PINCONNECTEMPTY .carry_out (), .carries (), .overflow (), // verilator lint_on PINCONNECTEMPTY .count (buffer_read_addr) );
To distinguish between the empty and full cases, which both identically
show as equal read and write counters, we also count the number of words
currently stored in the FIFO. This is not the most compact way of
expressing the difference between the empty and full cases, but it is the
simplest and fastest, as we don't need to track the read and write
addresses to see if they have wrapped around before they came to be equal
to determine if the FIFO is full or empty. All we need to do is count up
or down by COUNT_ONE
.
Put another way, instead of having to compare the whole value of both address counters plus an extra empty/full bit, we only need to compare one counter to two constant values (for the empty and full cases) which is simpler and more local logic.
Put yet another way, we found an equivalence between all possible pairs of read and write address values which represent the same amount of stored data, so we went from having to deal with (for a FIFO of depth N) ceil(22log2(N)+1) possible states (two counters and an empty/full bit), to only ceil(2log2(N)) actual states (the number of stored words).
reg update_buffer_data_count = 1'b0; reg incr_decr_buffer_data_count = 1'b0; wire [COUNT_WIDTH-1:0] buffer_data_count; Counter_Binary #( .WORD_WIDTH (COUNT_WIDTH), .INCREMENT (COUNT_ONE), .INITIAL_COUNT (COUNT_ZERO) ) data_count ( .clock (clock), .clear (clear), .up_down (incr_decr_buffer_data_count), // 0/1 --> up/down .run (update_buffer_data_count), .load (1'b0), .load_count (COUNT_ZERO), .carry_in (1'b0), // verilator lint_off PINCONNECTEMPTY .carry_out (), .carries (), .overflow (), // verilator lint_on PINCONNECTEMPTY .count (buffer_data_count) );
For a depth of N, the FIFO buffer has N states:
The operations which transition between these states are:
+
)-
)+-
)We also descriptively name each transition between states. These names will show up later in the code as datapath transformations.
/--\ +- flow /--\ +- flow | | | | load | v load load | v load ------- + ------- + + ------- + ------- | Empty | ---> | Busy | ---> ---> | Busy | ----> | Full | | 0 | | 1 | . . . | N-1 | | N | | | <--- | | <--- <--- | | <--- | | ------- - ------- - - ------- - ------- unload unload unload unload
We can see from the resulting state diagram that when the datapath is empty, it can only support an insertion, and when it is full, it can only support a removal. If the interfaces try to remove while Empty, or insert while Full, data will be duplicated or lost, respectively.
We can also see that the state of the FIFO is exactly represented by the
number of items currently stored in it, and that number can only stay put,
increment by 1, or decrement by 1. Thus, the data_count
counter will be
our state variable, and we do not need to uniquely distinguish the state
transitions.
reg empty = 1'b0; reg busy = 1'b0; reg full = 1'b0; always @(*) begin empty = (buffer_data_count == COUNT_ZERO); full = (buffer_data_count == COUNT_LAST); busy = (empty == 1'b0) && (full == 1'b0); end
The input interface inserts directly into the buffer, and only signals the other end to stop if the buffer is full.
reg insert = 1'b0; always @(*) begin input_ready = (full == 1'b0); insert = (input_valid == 1'b1) && (input_ready == 1'b1); end
The buffer output is registered, so the output interface is necessarily
pipelined and works in parallel with the input interface. Keeping that
pipeline full and signalling to the read_address
and data_count
counters when an item is removed from the buffer requires a separate little
engine to continually read from the buffer when possible.
Recall from above that output_valid
is a registered version of
buffer_rden
, matching the latency of a buffer read and signalling when
a buffer read has completed and new output_data
is available.
We update output_valid
and try to read from the buffer if the other end
of the output interface is ready (output_ready
is high), or if
output_data
is currently invalid, so even if output_ready
is low, we
try and pre-load the output interface. A buffer read happens if the buffer
is not empty. Otherwise, output_valid
updates from a buffer_rden
which
is necessarily low, and thus output_valid
goes (or remains) low in the
next cycle. If we can read from the buffer, and output_ready
is high,
then we have removed an item from the buffer.
reg remove = 1'b0; always @(*) begin update_output_valid = (output_valid == 1'b0) || (output_ready == 1'b1); buffer_rden = (empty == 1'b0) && (update_output_valid == 1'b1); remove = (buffer_rden == 1'b1) && (output_ready == 1'b1); end
Now that we have defined the possible states and operations of the datapath, we can define the datapath transformations, which are the edges between states. Or put otherwise: in which state(s) can the insert and/or remove operations happen, and what do they mean.
After this point, we can describe the rest of the control logic using only transformations, without reference to states, I/O signals, or operations.
reg load = 1'b0; // Inserts data into buffer. reg unload = 1'b0; // Remove data from buffer. reg flow = 1'b0; // Inserts new data into buffer as stored data is removed. always @(*) begin load = (full == 1'b0) && (insert == 1'b1) && (remove == 1'b0); unload = (empty == 1'b0) && (insert == 1'b0) && (remove == 1'b1); flow = (busy == 1'b1) && (insert == 1'b1) && (remove == 1'b1); end
Calculating the next state after each datapath transformation becomes the
control lines of the data_count
counter, where we increment, decrement,
or leave constant the count of items in the buffer, which defines the state
of the FIFO.
always @(*) begin update_buffer_data_count = (load == 1'b1) || (unload == 1'b1); incr_decr_buffer_data_count = (load == 1'b1) ? COUNT_UP : COUNT_DOWN; end
Here, we increment the read/write addresses as necessary (they only ever increment), enable a write to the buffer as necessary, and wrap-around the read/write addresses when they reach the end of the buffer.
always @(*) begin increment_buffer_write_addr = (load == 1'b1) || (flow == 1'b1); increment_buffer_read_addr = (unload == 1'b1) || (flow == 1'b1); buffer_wren = increment_buffer_write_addr; load_buffer_write_addr = (increment_buffer_write_addr == 1'b1) && (buffer_write_addr == ADDR_LAST [ADDR_WIDTH-1:0]); load_buffer_read_addr = (increment_buffer_read_addr == 1'b1) && (buffer_read_addr == ADDR_LAST [ADDR_WIDTH-1:0]); end endmodule