//                 /--\ +- flow
//                 |  |
//          load   |  v   fill
// -------    +   ------    +   ------
//|       | ---> |      | ---> |      |
//| Empty |      | Busy |      | Full |
//|       | <--- |      | <--- |      |
// -------    -   ------    -   ------
//         unload         flush
//
// We can see from the resulting state diagram that when the datapath is
// empty, it can only support an insertion, and when it is full, it can only
// support a removal. *These constraints will become very important later on.*
// If the interfaces try to remove while Empty, or insert while Full, data
// will be duplicated or lost, respectively.

// This simple FSM description helped us clarify the problem, but it also
// glossed over the potential complexity of the implementation: 3 states, each
// connected to 2 signals (valid/ready) per interface. With 2 interfaces, that
// makes 4 signals and 2^4 = 16 possible transitions out of each state, or
// 3 * 16 = 48 possible state transitions total.

// We don't want to have to manually enumerate all the transitions to then
// coalesce the equivalent ones and rule out all the impossible or illegal
// ones. Instead, if we express in logic the constraints on removals and
// insertions we determined from the state diagram, and the possible
// transformations on the datapath, we then get the state transition logic and
// datapath control signal logic almost for free.

// Let's describe the possible states of the datapath, and initialize it. This
// code describes a binary state encoding, but the CAD tool can re-encode and
// re-number the states. Usually this is beneficial, but if the states and
// inputs fit in a single LUT, forcing binary encoding reduces area. See what
// works best (i.e., reaches the highest speed) for your given FPGA.

    localparam STATE_BITS = 2;

    localparam [STATE_BITS-1:0] EMPTY = 'd0; // Output and buffer registers empty
    localparam [STATE_BITS-1:0] BUSY  = 'd1; // Output register holds data
    localparam [STATE_BITS-1:0] FULL  = 'd2; // Both output and buffer registers hold data
    // There is no case where only the buffer register would hold data.
    // No handling of erroneous and unreachable state 3.
    // We could check and raise an error flag.

    wire [STATE_BITS-1:0] state;
    reg  [STATE_BITS-1:0] state_next = EMPTY;

// Now, let's express the constraints we figured out from the state diagram:

// * The input interface can only insert when the datapath is not full.
// * The output interface can only remove data when the datapath is not empty.

// We do this by computing the allowable interface handshake signals (input
// `ready` and output `valid`) based on the datapath state. We use
// `state_next` so we can have nice registered outputs: since these registers
// load from `state_next` on the same clock edge as `state`, they always agree
// with the current state. This little bit of code prunes away a large number
// of invalid state transitions. If some other logic seems to be missing,
// first see if this code has made it unnecessary.

// *This tiny bit of code is critical* since it also implies the fundamental
// operating assumption of a skid buffer: that one interface cannot have its
// current state depend on the current state of the other interface, as that
// would be a combinational path between both interfaces.

// Compute `ready` for the input interface

    Register
    #(
        .WORD_WIDTH     (1),
        .RESET_VALUE    (1'b1) // EMPTY at start, so accept data
    )
    input_ready_reg
    (
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (clear),
        .data_in        (state_next != FULL),
        .data_out       (input_ready)
    );

// Compute `valid` for the output interface

    Register
    #(
        .WORD_WIDTH     (1),
        .RESET_VALUE    (1'b0)
    )
    output_valid_reg
    (
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (clear),
        .data_in        (state_next != EMPTY),
        .data_out       (output_valid)
    );

// Next, let's describe the interface signal conditions which implement our
// two basic operations on the datapath: insert and remove. This also weeds
// out a number of possible state transitions.
    reg insert = 1'b0;
    reg remove = 1'b0;

    always @(*) begin
        insert = (input_valid  == 1'b1) && (input_ready  == 1'b1);
        remove = (output_valid == 1'b1) && (output_ready == 1'b1);
    end

// Now that we have our datapath states and operations, let's use them to
// describe the possible transformations to the datapath, and in which state
// they can happen. You'll see that these exactly describe each of the
// 5 edges in the state diagram, and since we've pruned the space of possible
// interface conditions, we only need the minimum logic to describe them. This
// logic gets re-used a lot later on, simplifying the code.

    reg load   = 1'b0; // Empty datapath inserts data into the output register.
    reg flow   = 1'b0; // Insert new data into the output register as the old data is removed.
    reg fill   = 1'b0; // Insert new data into the buffer register. Data not removed from the output register.
    reg flush  = 1'b0; // Move data from the buffer register into the output register. Remove old data. No new data inserted.
    reg unload = 1'b0; // Remove data from the output register, leaving the datapath empty.

    always @(*) begin
        load   = (state == EMPTY) && (insert == 1'b1) && (remove == 1'b0);
        flow   = (state == BUSY)  && (insert == 1'b1) && (remove == 1'b1);
        fill   = (state == BUSY)  && (insert == 1'b1) && (remove == 1'b0);
        flush  = (state == FULL)  && (insert == 1'b0) && (remove == 1'b1);
        unload = (state == BUSY)  && (insert == 1'b0) && (remove == 1'b1);
    end

// And now we simply need to calculate the next state after each datapath
// transformation:

    always @(*) begin
        state_next = (load   == 1'b1) ? BUSY  : state;
        state_next = (flow   == 1'b1) ? BUSY  : state_next;
        state_next = (fill   == 1'b1) ? FULL  : state_next;
        state_next = (flush  == 1'b1) ? BUSY  : state_next;
        state_next = (unload == 1'b1) ? EMPTY : state_next;
    end

    Register
    #(
        .WORD_WIDTH     (STATE_BITS),
        .RESET_VALUE    (EMPTY)         // Initial state
    )
    state_reg
    (
        .clock          (clock),
        .clock_enable   (1'b1),
        .clear          (clear),
        .data_in        (state_next),
        .data_out       (state)
    );

// Similarly, from the datapath transformations, we can compute the necessary
// control signals to the datapath. They are not registered here, since they
// already terminate at registers in the datapath.

    always @(*) begin
        data_out_wren     = (load  == 1'b1) || (flow == 1'b1) || (flush == 1'b1);
        data_buffer_wren  = (fill  == 1'b1);
        use_buffered_data = (flush == 1'b1);
    end

endmodule

// For a 64-bit connection, the resulting skid buffer uses 128 registers for
// the buffers (64 each for the output and buffer registers), plus 4 to 9
// registers (and associated LUTs) for the FSM and interface outputs,
// depending on the particular state encoding chosen by the CAD tool, and
// easily reaches a high operating speed.
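
// The priority chain that computes `state_next` is only correct if at most
// one of the five transformations fires in any given cycle. The following is
// a minimal, simulation-only sketch (not part of the original design) that
// checks this. It restates the insert/remove and transformation logic from
// the module above for every combination of state, `input_valid`, and
// `output_ready`. It derives `input_ready` and `output_valid` directly from
// `state`, which matches their registered values because those registers
// load from `state_next`. The sketch also checks that the unused state 3 is
// never produced. The module name `skid_buffer_fsm_selftest` and the guard
// macro `SKID_BUFFER_FSM_SELFTEST` are hypothetical. To include the check,
// compile with, e.g., `iverilog -DSKID_BUFFER_FSM_SELFTEST`, or move it into
// its own testbench file. It should print PASS.

`ifdef SKID_BUFFER_FSM_SELFTEST
module skid_buffer_fsm_selftest;   // hypothetical name, simulation only

    localparam [1:0] EMPTY = 2'd0;
    localparam [1:0] BUSY  = 2'd1;
    localparam [1:0] FULL  = 2'd2;

    integer s, iv, ordy;    // loop counters: state, input_valid, output_ready
    integer active;         // number of transformations firing at once
    integer errors;

    reg [1:0] state, state_next;
    reg input_ready, output_valid, insert, remove;
    reg load, flow, fill, flush, unload;

    initial begin
        errors = 0;
        for (s = 0; s < 3; s = s + 1) begin
            for (iv = 0; iv < 2; iv = iv + 1) begin
                for (ordy = 0; ordy < 2; ordy = ordy + 1) begin
                    state = s;
                    // Registered in the real design; equal to these values
                    // after reset, since they load from state_next.
                    input_ready  = (state != FULL);
                    output_valid = (state != EMPTY);
                    insert = (iv   == 1) && (input_ready  == 1'b1);
                    remove = (ordy == 1) && (output_valid == 1'b1);

                    // Same transformation logic as in the module above.
                    load   = (state == EMPTY) && (insert == 1'b1) && (remove == 1'b0);
                    flow   = (state == BUSY)  && (insert == 1'b1) && (remove == 1'b1);
                    fill   = (state == BUSY)  && (insert == 1'b1) && (remove == 1'b0);
                    flush  = (state == FULL)  && (insert == 1'b0) && (remove == 1'b1);
                    unload = (state == BUSY)  && (insert == 1'b0) && (remove == 1'b1);

                    // Same priority chain as the next-state logic above.
                    state_next = (load   == 1'b1) ? BUSY  : state;
                    state_next = (flow   == 1'b1) ? BUSY  : state_next;
                    state_next = (fill   == 1'b1) ? FULL  : state_next;
                    state_next = (flush  == 1'b1) ? BUSY  : state_next;
                    state_next = (unload == 1'b1) ? EMPTY : state_next;

                    // Integer LHS widens the 1-bit operands before adding.
                    active = load + flow + fill + flush + unload;
                    if (active > 1) begin
                        $display("FAIL: %0d transformations fire in state %0d", active, s);
                        errors = errors + 1;
                    end
                    if (state_next == 2'd3) begin
                        $display("FAIL: unused state 3 reached from state %0d", s);
                        errors = errors + 1;
                    end
                end
            end
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d errors", errors);
        $finish;
    end

endmodule
`endif

// Restating the logic, rather than instantiating the real module, keeps the
// sketch independent of the module and `Register` port lists, which are not
// shown here. The trade-off is that it must be kept in sync by hand.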