Pipelines data words through a number of register stages, with both parallel and serial inputs and outputs. Besides the obvious uses for serial/parallel conversion and pipeline alignment, a register pipeline can be part of shift-and-add algorithms such as multiplication through conditional addition.
Each cycle clock_enable
is high, the pipeline shifts by one from LSB to
MSB, or loads a new set of parallel values. Load overrides shift.
pipe_in
feeds the LSB, and pipe_out
read from the MSB.
NOTE: PIPE_DEPTH
must be 1 or greater. (Supporting a depth of zero
would make this code far too messy and leave the parallel input/output
ports unconnected, which will raise CAD warnings. See the Simple Register
Pipeline instead.)
Depending on how you parameterize and use it, a register pipeline can act as a delay pipeline or a shift register:
WORD_WIDTH
to the width of the data word, then
PIPE_DEPTH
to the number of delay stages. This will move whole data words
along the pipeline. WORD_WIDTH
to 1, and PIPE_DEPTH
to the
width of the data word you wish to shift in or out bit-by-bit. Load the
word via parallel_in
, then shift it out through pipe_out
. Or, shift in
PIPE_DEPTH
bits through pipe_in
, then read the data word on
parallel_out
.If no parallel loads are required, hardwire parallel_load
to zero, and
the multiplexers will optimize away, if any, and you'll end up with a pure
shift register (but see the Simple Register
Pipeline if this is your main use-case).
Conversely, hardwire parallel_load
to one, and tie off the pipe_in
input, and you'll end up with a conveniently packaged bank of registers.
The RESET_VALUES
parameter allows each pipeline stage to start loaded
with a known initial value, which can simplify system startup. The pipeline
will also clear to the same values. Set RESET_VALUES
to the concatenation
of all initial/reset values, with the rightmost value being the first one
(at the least-significant bit (LSB)).
`default_nettype none module Register_Pipeline #( parameter WORD_WIDTH = 0, parameter PIPE_DEPTH = 0, // Don't set at instantiation parameter TOTAL_WIDTH = WORD_WIDTH * PIPE_DEPTH, // concatenation of each stage initial/reset value parameter [TOTAL_WIDTH-1:0] RESET_VALUES = 0 ) ( input wire clock, input wire clock_enable, input wire clear, input wire parallel_load, input wire [TOTAL_WIDTH-1:0] parallel_in, output reg [TOTAL_WIDTH-1:0] parallel_out, input wire [WORD_WIDTH-1:0] pipe_in, output reg [WORD_WIDTH-1:0] pipe_out ); localparam WORD_ZERO = {WORD_WIDTH{1'b0}}; initial begin pipe_out = WORD_ZERO; end
Each pipeline state is composed of a Multiplexer feeding a Register, so we can select either the output of the previous Register, or the parallel load data. So we need a set of input and ouput wires for each stage.
wire [WORD_WIDTH-1:0] pipe_stage_in [PIPE_DEPTH-1:0]; wire [WORD_WIDTH-1:0] pipe_stage_out [PIPE_DEPTH-1:0];
The following attributes prevent the implementation of the multiplexer with DSP blocks. This can be a useful implementation choice sometimes, but here it's terrible, since FPGA flip-flops usually have separate data and synchronous load inputs, giving us a 2:1 mux for free. If not, then we should use LUTs instead, or other multiplexers built into the logic blocks.
(* multstyle = "logic" *) // Quartus (* use_dsp = "no" *) // Vivado
We strip out first iteration of module instantiations to avoid having to
refer to index -1 in the generate loop, and also to connect to pipe_in
rather than the output of a previous register.
Multiplexer_Binary_Behavioural #( .WORD_WIDTH (WORD_WIDTH), .ADDR_WIDTH (1), .INPUT_COUNT (2) ) pipe_input_select ( .selector (parallel_load), .words_in ({parallel_in[0 +: WORD_WIDTH], pipe_in}), .word_out (pipe_stage_in[0]) ); Register #( .WORD_WIDTH (WORD_WIDTH), .RESET_VALUE (RESET_VALUES[0 +: WORD_WIDTH]) ) pipe_stage ( .clock (clock), .clock_enable (clock_enable), .clear (clear), .data_in (pipe_stage_in[0]), .data_out (pipe_stage_out[0]) ); always @(*) begin parallel_out[0 +: WORD_WIDTH] = pipe_stage_out[0]; end
Now repeat over the remainder of the pipeline stages, starting at stage 1, connecting each pipeline stage to the output of the previous pipeline stage.
generate genvar i; for(i=1; i < PIPE_DEPTH; i=i+1) begin : pipe_stages (* multstyle = "logic" *) // Quartus (* use_dsp = "no" *) // Vivado Multiplexer_Binary_Behavioural #( .WORD_WIDTH (WORD_WIDTH), .ADDR_WIDTH (1), .INPUT_COUNT (2) ) pipe_input_select ( .selector (parallel_load), .words_in ({parallel_in[WORD_WIDTH*i +: WORD_WIDTH], pipe_stage_out[i-1]}), .word_out (pipe_stage_in[i]) ); Register #( .WORD_WIDTH (WORD_WIDTH), .RESET_VALUE (RESET_VALUES[WORD_WIDTH*i +: WORD_WIDTH]) ) pipe_stage ( .clock (clock), .clock_enable (clock_enable), .clear (clear), .data_in (pipe_stage_in[i]), .data_out (pipe_stage_out[i]) ); always @(*) begin parallel_out[WORD_WIDTH*i +: WORD_WIDTH] = pipe_stage_out[i]; end end endgenerate
And finally, connect the output of the last register to the module pipe output.
always @(*) begin pipe_out = pipe_stage_out[PIPE_DEPTH-1]; end endmodule