ISERDES bit alignment

Usage

In the clk_main domain, pulse start_alignment high for one cycle, then wait for done_alignment to pulse high for one cycle, signalling that bit alignment is complete. The sync_train output is also in the clk_main domain. All other signals are in the clk_rxio_frame domain, which also runs the SERDES.
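As a minimal sketch of the handshake described above, the following helper turns a request into a one-cycle start_alignment pulse and tracks when done_alignment arrives. The module name alignment_requester and the request and busy signals are assumptions for illustration only, and are not part of the original design.

```verilog
`default_nettype none

// Hypothetical helper, not part of the original design.
// Runs entirely in the clk_main domain.
module alignment_requester
(
    input   wire    clk_main,
    input   wire    request,            // Hypothetical: high to ask for a bit alignment
    input   wire    done_alignment,     // From iserdes_bit_alignment
    output  reg     start_alignment,    // To iserdes_bit_alignment
    output  reg     busy                // High from the start pulse until done_alignment pulses
);

    initial begin
        start_alignment = 1'b0;
        busy            = 1'b0;
    end

    always @(posedge clk_main) begin
        // One-cycle start pulse, only when no alignment is in progress.
        start_alignment <= (request == 1'b1) && (busy == 1'b0);

        if (done_alignment == 1'b1) begin
            busy <= 1'b0;
        end
        else if ((request == 1'b1) && (busy == 1'b0)) begin
            busy <= 1'b1;
        end
    end

endmodule
```

Ignoring requests while busy is a design choice of this sketch: it mirrors the fact that a start pulse sent while training is in progress has no effect.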

Algorithm Sources

Uses the positive and negative deserialized data to measure the region of stable data ("the eye") and then adjust the delay of the data so we sample the data close to the middle of the stable region.

This algorithm is based on "SPI-4.2 Dynamic Phase Alignment" by Robert Le and Kyle Locke at BDTIC (Shenzhen Xingu Integrated Circuit Co., Ltd.). http://www.bdtic.com/download/Xilinx/WP249.pdf https://docs.xilinx.com/v/u/en-US/wp249 (Xilinx WP249 (v1.3) July 6, 2011)

This algorithm has simpler logic than the one described in XAPP855 and XAPP860, but depends on not having to use SERDES width expansion: both positive and negative data SERDES must be independent and in MASTER mode.

This algorithm also enables the possibility of dynamic adjustment of the delay while the SERDES is operating (see XAPP860), which is not implemented here.

Operation

Assume a continuous stream of a single constant training word, of width equal to the SERDES output width. The word bits are sent as differential data, with each polarity deserialized by a dedicated SERDES (P and N). Each deserialized word is framed by a common valid pulse, which is the step-forward signal for the alignment process.

We have two input delay chains, one for the P SERDES and one for the N SERDES.

Init the P delay at tap 0 and the N delay at tap 2 (TAP_OFFSET = 2).

This means the N SERDES sees the data from two tap delays ago relative to the P SERDES seeing the current data. We can imagine this as the N SERDES sampling 2 taps earlier in the data word. Thus, incrementing the P and N delays together moves the sampling points backwards in a data word we imagine to be fixed.

At each step, we compare the results of the P and N SERDES bits for the whole word.

Assuming we begin in a stable area, as we move backwards, the N SERDES will hit the transition area between bits first, causing the N and P SERDES outputs to mismatch, and the P SERDES will be the last one to exit the transition area, at which point the P and N SERDES outputs match again.

If we happen to begin in an unstable area, the logic remains the same: increment the P and N taps until the SERDES outputs match again.

We find the first stable location after a transition area, and initialize a counter of valid taps (V) to zero. We then increment the P and N delays, and the V counter, until the SERDES outputs no longer match, signalling the end of the stable area.

At this point, we now have V+1 valid taps (equal to P plus the one tap before the N tap). Since P now sits at the end of the stable area, we step back by half the count: we re-set the P and N delays to the (nearest) middle of the stable area with P = N = P - (V >> 1) - (TAP_OFFSET - 2).
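The search can be sketched as a simulation-only walk over a made-up pattern of match results, using the same arithmetic as tap_aligned in the code further down. Everything here is an assumption for illustration: the module name, the match_at_tap pattern (P taps 8 to 19 match, all others do not), and the use of zero-delay loops instead of the valid-pulse-driven state machine. The loops also assume the pattern contains at least one transition; the real logic handles the no-transition case separately.

```verilog
`default_nettype none

// Behavioural sketch only, not the original implementation.
module eye_search_sketch;

    localparam TAP_COUNTER_WIDTH = 5;
    localparam TAP_COUNT         = 2**TAP_COUNTER_WIDTH;
    localparam TAP_OFFSET        = 2;

    // Correction term (TAP_OFFSET - 2), sized to the tap counter.
    localparam [TAP_COUNTER_WIDTH-1:0] OFFSET_CORRECTION = TAP_OFFSET - 2;

    // Hypothetical comparison result at each P tap:
    // 1 = P and N words match (stable), 0 = mismatch (transition).
    // Bits 8 to 19 are set.
    reg [TAP_COUNT-1:0]         match_at_tap = 32'h000F_FF00;

    reg [TAP_COUNTER_WIDTH-1:0] tap_p        = 0;
    reg [TAP_COUNTER_WIDTH-1:0] valid_taps   = 0; // V
    reg [TAP_COUNTER_WIDTH-1:0] tap_aligned  = 0;

    initial begin
        // If we start in a stable area, move to the first transition.
        while (match_at_tap[tap_p] == 1'b1) begin
            tap_p = tap_p + 1'b1;
        end

        // Move through the transition to the start of the eye.
        while (match_at_tap[tap_p] == 1'b0) begin
            tap_p = tap_p + 1'b1;
        end

        // Count taps until the end of the eye.
        valid_taps = 0;
        while (match_at_tap[tap_p] == 1'b1) begin
            tap_p      = tap_p      + 1'b1;
            valid_taps = valid_taps + 1'b1;
        end

        // Step back from the end of the eye to its middle.
        tap_aligned = tap_p - (valid_taps >> 1) - OFFSET_CORRECTION;

        $display("end of eye at P tap %0d, V = %0d, aligned tap = %0d", tap_p, valid_taps, tap_aligned);
        $finish;
    end

endmodule
```

With this pattern, the walk ends at P tap 20 with V = 12, giving an aligned tap of 20 - 6 - 0 = 14.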

Corner Cases

This algorithm assumes the total jitter is not so bimodally distributed that a stable area wider than TAP_OFFSET delay taps exists inside the bit transition areas.

All tap counts use their full bit widths and will correctly wrap around. This means the algorithm will keep searching until it finds the eye, even through a wraparound.
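A small worked case of the wraparound, assuming a 5-bit tap counter and made-up values: the eye spans taps 25 through 2 (wrapping past tap 31), so the search ends with P at tap 3 and V at 10. The module and signal names are hypothetical.

```verilog
`default_nettype none

// Simulation-only sketch with made-up values.
module tap_wraparound_sketch;

    localparam TAP_COUNTER_WIDTH = 5;
    localparam TAP_OFFSET        = 2;

    // Correction term (TAP_OFFSET - 2), sized to the tap counter.
    localparam [TAP_COUNTER_WIDTH-1:0] OFFSET_CORRECTION = TAP_OFFSET - 2;

    reg [TAP_COUNTER_WIDTH-1:0] tap_p_end   = 5'd3;  // P tap at the end of the eye, after wrapping
    reg [TAP_COUNTER_WIDTH-1:0] valid_taps  = 5'd10; // V
    reg [TAP_COUNTER_WIDTH-1:0] tap_aligned = 0;

    initial begin
        // All operands are 5 bits wide, so the subtraction is modulo 32:
        // 3 - 5 - 0 = -2, which wraps around to 30.
        tap_aligned = tap_p_end - (valid_taps >> 1) - OFFSET_CORRECTION;
        $display("aligned tap = %0d", tap_aligned);
        $finish;
    end

endmodule
```

Tap 30 lies in the middle of the span from tap 25 to tap 2, so no special handling of the wrap is needed.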

There is no guarantee of finding the minimum working tap value, only a working one. This means you must assume the worst case jitter possibly introduced by the IDELAY tap chain.

`default_nettype none

module iserdes_bit_alignment
#(
    parameter TAP_OFFSET        = 2,  // How much the initial N IDELAY lags the P IDELAY, for bit alignment.
    parameter WORD_WIDTH        = 12,

    // Do not set at instantiation, except in Vivado IPI
    parameter TAP_COUNTER_WIDTH = 5
)
(
    // clk_rxio_frame domain, for SERDES data and control

    input   wire                            clk_rxio_frame,
    input   wire                            rst_rxio_frame_n,

    input   wire                            datain_parallel_valid,  // There is a handshake, but no slack for stalling!
    output  reg                             datain_parallel_ready,  // Must always be ready before valid!
    input   wire    [WORD_WIDTH-1:0]        datain_p_parallel,      // Deserialized positive data, framed by datain_parallel_valid
    input   wire    [WORD_WIDTH-1:0]        datain_n_parallel,      // Deserialized negative data, framed by datain_parallel_valid

    input   wire    [TAP_COUNTER_WIDTH-1:0] tap_p_current,      // Current value of delay tap
    output  reg     [TAP_COUNTER_WIDTH-1:0] tap_p_load_value,   // New value of delay tap
    output  reg                             tap_p_load,         // Load new delay tap value

    input   wire    [TAP_COUNTER_WIDTH-1:0] tap_n_current,      // Current value of delay tap
    output  reg     [TAP_COUNTER_WIDTH-1:0] tap_n_load_value,   // New value of delay tap
    output  reg                             tap_n_load,         // Load new delay tap value

    // System control signals in clk_main domain

    input   wire                            clk_main,           // General logic clock

    output  reg                             sync_train,         // Signal sensor to output training word

    input   wire                            start_alignment,    // Preferably a one-cycle pulse
    output  wire                            done_alignment      // Pulsed high means serdes data is bit-aligned
);

    localparam TAP_ZERO  = {TAP_COUNTER_WIDTH{1'b0}};
    localparam TAP_ONE   = {{TAP_COUNTER_WIDTH-1{1'b0}},1'b1};
    localparam TAP_TWO   = {{TAP_COUNTER_WIDTH-2{1'b0}},2'b10};

    initial begin
        datain_parallel_ready   = 1'b1; // Always ready (no backpressure possible)
        tap_p_load_value        = TAP_ZERO;
        tap_p_load              = 1'b0;
        tap_n_load_value        = TAP_ZERO;
        tap_n_load              = 1'b0;
        sync_train              = 1'b0;
    end

Datapath Operations

Transfer the control signals to/from the SERDES clock domain, and run the bit-alignment logic in the SERDES clock domain. We cannot afford the extra latency of passing the SERDES data into the main clock domain without much complication of the state machine: we could not tell if a new data value was one affected by the latest change in tap delay.

FIXME: It's unclear what the control interface should be here, but the CDC of control signals, not data, is certain.

Transfer the pulse signalling the start of training into the SERDES clock domain. A pulse sent while training is in progress is lost and has no effect.

    wire start_alignment_rxio;

    CDC_Pulse_Synchronizer_2phase
    #(
        .CDC_EXTRA_DEPTH        (0)
    )
    start_alignment_transfer
    (
        .sending_clock          (clk_main),
        .sending_pulse_in       (start_alignment),
        // verilator lint_off PINCONNECTEMPTY
        .sending_ready          (),
        // verilator lint_on  PINCONNECTEMPTY

        .receiving_clock        (clk_rxio_frame),
        .receiving_pulse_out    (start_alignment_rxio)
    );

Transfer the pulse signalling the end of training from the SERDES clock domain into the main system clock domain.

    reg done_alignment_rxio = 1'b0;

    CDC_Pulse_Synchronizer_2phase
    #(
        .CDC_EXTRA_DEPTH        (0)
    )
    done_alignment_transfer
    (
        .sending_clock          (clk_rxio_frame),
        .sending_pulse_in       (done_alignment_rxio),
        // verilator lint_off PINCONNECTEMPTY
        .sending_ready          (),
        // verilator lint_on  PINCONNECTEMPTY

        .receiving_clock        (clk_main),
        .receiving_pulse_out    (done_alignment)
    );

Sensor Training Mode

When we signal to start alignment, turn on and hold sync_train so the sensor sends out a continuous stream of the training word. Once alignment is done, drop sync_train.

    wire sync_train_pulse;

    Pulse_Generator
    sync_train_pulse_generator
    (
        .clock              (clk_main),
        .level_in           (start_alignment),
        .pulse_posedge_out  (sync_train_pulse),
        // verilator lint_off PINCONNECTEMPTY
        .pulse_negedge_out  (),
        .pulse_anyedge_out  ()
        // verilator lint_on  PINCONNECTEMPTY
    );

    wire sync_train_latched;

    Pulse_Latch
    #(
        .RESET_VALUE    (1'b0)
    )
    sync_train_pulse_latch
    (
        .clock          (clk_main),
        .clear          (done_alignment),
        .pulse_in       (sync_train_pulse),
        .level_out      (sync_train_latched)
    );

    always @(*) begin
        sync_train = sync_train_pulse || sync_train_latched;
    end

Check when the SERDES output words differ. Then latch it so we always know what the last match state was. This way we don't have to synchronize some events to datain_parallel_valid.

    reg serdes_outputs_match = 1'b0;

    always @(*) begin
        serdes_outputs_match = (datain_p_parallel == ~datain_n_parallel);
    end

    wire serdes_outputs_match_latched;

    Register
    #(
        .WORD_WIDTH     (1),
        .RESET_VALUE    (1'b0)
    )
    serdes_outputs_latest_state
    (
        .clock          (clk_rxio_frame),
        .clock_enable   (datain_parallel_valid == 1'b1),
        .clear          (~rst_rxio_frame_n),
        .data_in        (serdes_outputs_match),
        .data_out       (serdes_outputs_match_latched)
    );

Count the number of taps in the stable data area.

    reg                             stable_tap_count_increment  = 1'b0;
    reg                             stable_tap_count_load       = 1'b0;
    wire [TAP_COUNTER_WIDTH-1:0]    stable_tap_count;

    Counter_Binary
    #(
        .WORD_WIDTH     (TAP_COUNTER_WIDTH),
        .INCREMENT      (TAP_ONE),
        .INITIAL_COUNT  (TAP_ZERO)
    )
    stable_tap_counter
    (
        .clock          (clk_rxio_frame),
        .clear          (~rst_rxio_frame_n),

        .up_down        (1'b0), // 0/1 --> up/down
        .run            (stable_tap_count_increment),

        .load           (stable_tap_count_load),
        .load_count     (TAP_ZERO),

        .carry_in       (1'b0),
        // verilator lint_off PINCONNECTEMPTY
        .carry_out      (),
        .carries        (),
        .overflow       (),
        // verilator lint_on  PINCONNECTEMPTY

        .count          (stable_tap_count)
    );

Calculate the next tap values and the final aligned tap value.

Normally I'd use Adder_Subtractor modules, but here we know all numbers are unsigned and of the same width, no carry in/out is needed, and wrap-around is expected and desired. There are no corner-cases. So let the CAD tool synthesize the math here.

    reg [TAP_COUNTER_WIDTH-1:0] tap_p_next  = TAP_ZERO;
    reg [TAP_COUNTER_WIDTH-1:0] tap_n_next  = TAP_ZERO;
    reg [TAP_COUNTER_WIDTH-1:0] tap_aligned = TAP_ZERO;

    always @(*) begin
        tap_p_next  = tap_p_current + TAP_ONE;
        tap_n_next  = tap_n_current + TAP_ONE;
        tap_aligned = tap_p_current - (stable_tap_count >> 1) - (TAP_OFFSET [TAP_COUNTER_WIDTH-1:0] - TAP_TWO);
    end

State Logic

    localparam  STATE_WIDTH                         = 2;
    localparam [STATE_WIDTH-1:0] STATE_IDLE         = 'd0;
    localparam [STATE_WIDTH-1:0] STATE_FIND_FIRST   = 'd1;
    localparam [STATE_WIDTH-1:0] STATE_THRU_FIRST   = 'd2;
    localparam [STATE_WIDTH-1:0] STATE_FIND_SECOND  = 'd3;

    wire [STATE_WIDTH-1:0] state;
    reg  [STATE_WIDTH-1:0] state_next = STATE_IDLE;
    
    Register
    #(
        .WORD_WIDTH     (STATE_WIDTH),
        .RESET_VALUE    (STATE_IDLE)
    )
    state_reg
    (
        .clock          (clk_rxio_frame),
        .clock_enable   (1'b1),
        .clear          (~rst_rxio_frame_n),
        .data_in        (state_next),
        .data_out       (state)
    );

Datapath Transformations

    wire all_taps_stable;           // See Pulse_Divider below. Pulses high when all 2**TAP_COUNTER_WIDTH taps have been tried without a transition found.

    reg init_taps           = 1'b0; // Load both P and N IDELAY with the start delay tap values: P is 0 and N is TAP_OFFSET.
    reg init_find_first     = 1'b0; // Starting from a stable area, start looking for the start of the first data transition.
    reg init_thru_first     = 1'b0; // Starting from a transition area, start looking for the end of that first data transition area.

    reg finding_first       = 1'b0; // Currently in stable area, keep incrementing taps.
    reg none_first          = 1'b0; // From inside a stable area, reach the end of possible taps because there is NO transition area (perfect P/N alignment)
    reg found_first         = 1'b0; // From inside a stable area, found the start of the first data transition.

    reg exiting_first       = 1'b0; // Currently in a transition area, keep incrementing taps.
    reg exited_first        = 1'b0; // From inside the first transition area, found the start of the stable area. This is one end of the eye.

    reg finding_second      = 1'b0; // Currently in stable area, keep incrementing taps.
    reg none_second         = 1'b0; // From inside a stable area, reach the end of possible taps because there is NO transition area (perfect P/N alignment)
    reg found_second        = 1'b0; // From the stable area, found the start of the second transition. This is the other end of the eye.

    always @(*) begin
        init_taps           = (state == STATE_IDLE) && (start_alignment_rxio         == 1'b1);
        init_find_first     = (state == STATE_IDLE) && (serdes_outputs_match_latched == 1'b1); // latch, so no sync to valid needed
        init_thru_first     = (state == STATE_IDLE) && (serdes_outputs_match_latched == 1'b0);

        finding_first       = (state == STATE_FIND_FIRST)  && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);
        none_first          = (finding_first == 1'b1) && (all_taps_stable == 1'b1);
        found_first         = (state == STATE_FIND_FIRST)  && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);

        exiting_first       = (state == STATE_THRU_FIRST)  && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);
        exited_first        = (state == STATE_THRU_FIRST)  && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);

        finding_second      = (state == STATE_FIND_SECOND) && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);
        none_second         = (finding_second == 1'b1) && (all_taps_stable == 1'b1);
        found_second        = (state == STATE_FIND_SECOND) && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);
    end

Signal when all possible taps have been tried. If we reach that point without a transition, then the alignment is already perfect (or close enough), so we use the signal to skip searching for transition regions. The system will then naturally find the middle tap as the best one.

    localparam [TAP_COUNTER_WIDTH-1+1:0] TAP_COUNT = 2**TAP_COUNTER_WIDTH;

    Pulse_Divider
    #(
        .WORD_WIDTH         (TAP_COUNTER_WIDTH+1),
        .INITIAL_DIVISOR    (TAP_COUNT)
    )
    all_taps_stable_detector
    (
        .clock          (clk_rxio_frame),
        .restart        (1'b0),
        .divisor        (TAP_COUNT),
        .pulses_in      (finding_first | finding_second),
        .pulse_out      (all_taps_stable),
        // verilator lint_off PINCONNECTEMPTY
        .div_by_zero    ()
        // verilator lint_on  PINCONNECTEMPTY
    );

State Transitions

    always @(*) begin
        state_next = init_taps && init_find_first ? STATE_FIND_FIRST  : state;
        state_next = init_taps && init_thru_first ? STATE_THRU_FIRST  : state_next;
        state_next = found_first                  ? STATE_THRU_FIRST  : state_next;
        state_next = none_first                   ? STATE_FIND_SECOND : state_next;
        state_next = exited_first                 ? STATE_FIND_SECOND : state_next;
        state_next = found_second                 ? STATE_IDLE        : state_next;
        state_next = none_second                  ? STATE_IDLE        : state_next;
    end

Control Signals

    always @(*) begin
        tap_p_load_value            = init_taps                  ? TAP_ZERO    : tap_p_next;
        tap_p_load_value            = found_second | none_second ? tap_aligned : tap_p_load_value;
        tap_p_load                  = init_taps | finding_first | exiting_first | finding_second | found_second | none_second;

        tap_n_load_value            = init_taps                  ? TAP_OFFSET [TAP_COUNTER_WIDTH-1:0] : tap_n_next;
        tap_n_load_value            = found_second | none_second ? tap_aligned : tap_n_load_value;
        tap_n_load                  = init_taps | finding_first | exiting_first | finding_second | found_second | none_second;

        stable_tap_count_load       = exited_first | none_first;
        stable_tap_count_increment  = finding_second;

        done_alignment_rxio         = found_second | none_second;
    end

endmodule
