In the clk_main domain, pulse start_alignment high for one cycle, then wait for done_alignment to pulse high for one cycle, signalling bit-alignment is complete. All other signals are in the clk_rxio_frame domain, which also runs the SERDES.
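As an illustration of the clk_main side of this handshake, here is a minimal sketch of a requester that issues the one-cycle start pulse and then waits for the done pulse. The module name alignment_requester_sketch and the request and busy signals are assumptions made for this example and are not part of the design described here.

```verilog
// Hypothetical clk_main-side requester (illustration only).
// Pulses start_alignment for one cycle, then waits for done_alignment.
`default_nettype none

module alignment_requester_sketch
(
    input  wire clk_main,
    input  wire request,          // Assumed: asks for a (re)alignment
    input  wire done_alignment,   // One-cycle pulse from iserdes_bit_alignment
    output reg  start_alignment,  // One-cycle pulse to iserdes_bit_alignment
    output reg  busy              // High while an alignment is in progress
);

    initial begin
        start_alignment = 1'b0;
        busy            = 1'b0;
    end

    always @(posedge clk_main) begin
        // Default assignment keeps the start pulse one cycle long.
        start_alignment <= 1'b0;

        // Only start when idle, since a start pulse sent mid-training has no effect.
        if ((busy == 1'b0) && (request == 1'b1)) begin
            start_alignment <= 1'b1;
            busy            <= 1'b1;
        end

        // The done pulse ends the wait.
        if ((busy == 1'b1) && (done_alignment == 1'b1)) begin
            busy <= 1'b0;
        end
    end

endmodule
```

If request is held high, this sketch starts a new alignment as soon as the previous one completes, so drive it with a pulse if only one alignment is wanted.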
Uses the positive and negative deserialized data to measure the region of stable data ("the eye") and then adjust the delay of the data so we sample the data close to the middle of the stable region.
This algorithm is based on "SPI-4.2 Dynamic Phase Alignment" by Robert Le and Kyle Locke at BDTIC (Shenzhen Xingu Integrated Circuit Co., Ltd.). http://www.bdtic.com/download/Xilinx/WP249.pdf https://docs.xilinx.com/v/u/en-US/wp249 (Xilinx WP249 (v1.3) July 6, 2011)
This algorithm has simpler logic than the one described in XAPP855 and XAPP860, but depends on not having to use SERDES width expansion: both positive and negative data SERDES must be independent and in MASTER mode.
This algorithm also enables the possibility of dynamic adjustment of the delay while the SERDES is operating (see XAPP860), which is not implemented here.
Assume a constant stream of a single constant training word of width equal to the SERDES output width. We send the word bits as differential data, with each polarity deserialized by a dedicated SERDES (P and N), with each deserialized word framed by a common valid pulse, which is the step forward signal for the alignment process.
We have two input delay chains, one for the P SERDES and one for the N SERDES.
Init the P delay at tap 0 and the N delay at tap 2 (TAP_OFFSET = 2).
This means the N SERDES samples the data two delay taps behind the P SERDES, which sees the current bit. We can imagine this as the N SERDES sampling a point 2 taps earlier in the data word. Thus, incrementing the P and N delays together moves both sampling points backwards in a data word we imagine to be fixed.
At each step, we compare the results of the P and N SERDES bits for the whole word.
Assuming we begin in a stable area, as we move backwards, the N SERDES will hit the transition area between bits first, causing the N and P SERDES outputs to mismatch, and the P SERDES will be the last one to exit the transition area, at which point the P and N SERDES outputs match again.
If we happen to begin in an unstable area, the logic remains the same: increment the P and N taps until the SERDES outputs match again.
We find the first stable location after a transition area, and initialize a counter of valid taps (V) to zero. We then increment the P and N delays, and the V counter, until the SERDES outputs do not match, signalling the end of the stable area.
At this point, we now have V+1 valid taps (equal to P plus the one tap before the N tap). Since P now sits at the far end of the stable area, we step back by half the count: we re-set the P and N delays to the (nearest) middle of the stable area with P = N = P - (V >> 1) - (TAP_OFFSET - 2).
This algorithm does assume the total jitter is not so bimodally distributed that a stable area of width greater than TAP_OFFSET delay taps exists inside the bit transition areas.
All tap counts use their full bit widths and will correctly wrap around. This means the algorithm will keep searching until it finds the eye, even through a wraparound.
There is no guarantee of finding the minimum working tap value, only a working one. This means you must assume the worst case jitter possibly introduced by the IDELAY tap chain.
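To make the search concrete, here is a simulation-only sketch that walks the same three phases over a toy model of the data eye. Everything in the model is an assumption made for illustration: the bit period (PERIOD), the position and width of the transition area (TRANS_START, TRANS_WIDTH), and the rule that the P and N outputs mismatch whenever a transition area lies between the two sample points. It ignores tap wrap-around and the timing of the valid pulse, and it uses the same final arithmetic as the module's tap_aligned expression.

```verilog
// Simulation-only behavioural sketch of the eye search (illustration only).
module eye_search_sketch;

    localparam TAP_OFFSET  = 2;
    localparam PERIOD      = 12;  // Assumed bit period, in taps
    localparam TRANS_START = 4;   // Assumed first tap of a transition area
    localparam TRANS_WIDTH = 3;   // Assumed width of a transition area, in taps

    // 1 if a (non-negative) tap position falls inside a transition area.
    function in_transition;
        input integer tap;
        begin
            in_transition = (((tap - TRANS_START + PERIOD) % PERIOD) < TRANS_WIDTH);
        end
    endfunction

    // Assumed model: P (at tap p) and N (at tap p + TAP_OFFSET) outputs match
    // only if no transition area lies between the two sample points.
    function outputs_match;
        input integer p;
        integer i;
        begin
            outputs_match = 1'b1;
            for (i = p; i <= p + TAP_OFFSET; i = i + 1) begin
                if (in_transition(i)) outputs_match = 1'b0;
            end
        end
    endfunction

    integer p;
    integer v;

    initial begin
        p = 0;
        // Phase 1: from a stable area, find the start of the first transition.
        while (outputs_match(p)) p = p + 1;
        // Phase 2: move through the transition until the outputs match again.
        while (!outputs_match(p)) p = p + 1;
        // Phase 3: count stable taps until the outputs mismatch again.
        v = 0;
        while (outputs_match(p)) begin
            p = p + 1;
            v = v + 1;
        end
        $display("P=%0d V=%0d aligned tap=%0d", p, v, p - (v >> 1) - (TAP_OFFSET - 2));
        $finish;
    end

endmodule
```

With these assumed values the transition areas cover taps 4 to 6 and 16 to 18, so the stable bit spans taps 7 to 15. The search ends with P=14 and V=7, and the aligned tap is 11, the middle of that span.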
`default_nettype none

module iserdes_bit_alignment
#(
    parameter TAP_OFFSET        = 2,    // How much the initial N IDELAY lags the P IDELAY, for bit alignment.
    parameter WORD_WIDTH        = 12,

    // Do not set at instantiation, except in Vivado IPI
    parameter TAP_COUNTER_WIDTH = 5
)
(
    // clk_rxio_frame domain, for SERDES data and control

    input   wire                            clk_rxio_frame,
    input   wire                            rst_rxio_frame_n,

    input   wire                            datain_parallel_valid,  // There is a handshake, but no slack for stalling!
    output  reg                             datain_parallel_ready,  // Must always be ready before valid!
    input   wire [WORD_WIDTH-1:0]           datain_p_parallel,      // Deserialized positive data, framed by datain_parallel_valid
    input   wire [WORD_WIDTH-1:0]           datain_n_parallel,      // Deserialized negative data, framed by datain_parallel_valid

    input   wire [TAP_COUNTER_WIDTH-1:0]    tap_p_current,          // Current value of delay tap
    output  reg  [TAP_COUNTER_WIDTH-1:0]    tap_p_load_value,       // New value of delay tap
    output  reg                             tap_p_load,             // Load new delay tap value

    input   wire [TAP_COUNTER_WIDTH-1:0]    tap_n_current,          // Current value of delay tap
    output  reg  [TAP_COUNTER_WIDTH-1:0]    tap_n_load_value,       // New value of delay tap
    output  reg                             tap_n_load,             // Load new delay tap value

    // System control signals in clk_main domain

    input   wire                            clk_main,               // General logic clock
    output  reg                             sync_train,             // Signal sensor to output training word
    input   wire                            start_alignment,        // Preferably a one-cycle pulse
    output  wire                            done_alignment          // Pulsed high means serdes data is bit-aligned
);

    localparam TAP_ZERO = {TAP_COUNTER_WIDTH{1'b0}};
    localparam TAP_ONE  = {{TAP_COUNTER_WIDTH-1{1'b0}},1'b1};
    localparam TAP_TWO  = {{TAP_COUNTER_WIDTH-2{1'b0}},2'b10};

    initial begin
        datain_parallel_ready   = 1'b1; // Always ready (no backpressure possible)
        tap_p_load_value        = TAP_ZERO;
        tap_p_load              = 1'b0;
        tap_n_load_value        = TAP_ZERO;
        tap_n_load              = 1'b0;
        sync_train              = 1'b0;
    end
Transfer the control signals to/from the SERDES clock domain, and run the bit-alignment logic in the SERDES clock domain. Passing the SERDES data into the main clock domain would add latency and greatly complicate the state machine, since we could not tell whether a new data value was affected by the latest change in tap delay.
FIXME: It's unclear what the control interface should be here, but the CDC of control signals, not data, is certain.
Transfer the pulse signalling start of training into the SERDES clock domain. A pulse sent while training is in progress is lost and has no effect.
    wire start_alignment_rxio;

    CDC_Pulse_Synchronizer_2phase
    #(
        .CDC_EXTRA_DEPTH        (0)
    )
    start_alignment_transfer
    (
        .sending_clock          (clk_main),
        .sending_pulse_in       (start_alignment),
        // verilator lint_off PINCONNECTEMPTY
        .sending_ready          (),
        // verilator lint_on PINCONNECTEMPTY

        .receiving_clock        (clk_rxio_frame),
        .receiving_pulse_out    (start_alignment_rxio)
    );
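As an aside, the source of CDC_Pulse_Synchronizer_2phase is not shown here. To illustrate the general idea of moving a one-cycle pulse between clock domains, here is a generic toggle-based sketch. It is not that module's implementation: the name pulse_cdc_toggle_sketch is hypothetical, there is no ready feedback, and it assumes input pulses are spaced several receiving-clock cycles apart (closer pulses can be merged or lost). ASYNC_REG is a Xilinx synthesis attribute, and other tools may ignore it.

```verilog
// Generic toggle-based pulse CDC (illustration only, not the library module).
module pulse_cdc_toggle_sketch
(
    input  wire sending_clock,
    input  wire sending_pulse_in,     // One-cycle pulse in the sending domain
    input  wire receiving_clock,
    output wire receiving_pulse_out   // One-cycle pulse in the receiving domain
);

    // Sending domain: each pulse flips a level, which is safe to synchronize.
    reg toggle = 1'b0;

    always @(posedge sending_clock) begin
        if (sending_pulse_in == 1'b1) begin
            toggle <= ~toggle;
        end
    end

    // Receiving domain: two-stage synchronizer, plus one register
    // to remember the previous synchronized level.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync = 2'b00;
    reg sync_previous = 1'b0;

    always @(posedge receiving_clock) begin
        sync          <= {sync[0], toggle};
        sync_previous <= sync[1];
    end

    // Any change of the synchronized level becomes a one-cycle pulse.
    assign receiving_pulse_out = sync[1] ^ sync_previous;

endmodule
```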
Transfer the pulse signalling the end of training from the SERDES clock domain into the main system clock domain.
    reg done_alignment_rxio = 1'b0;

    CDC_Pulse_Synchronizer_2phase
    #(
        .CDC_EXTRA_DEPTH        (0)
    )
    done_alignment_transfer
    (
        .sending_clock          (clk_rxio_frame),
        .sending_pulse_in       (done_alignment_rxio),
        // verilator lint_off PINCONNECTEMPTY
        .sending_ready          (),
        // verilator lint_on PINCONNECTEMPTY

        .receiving_clock        (clk_main),
        .receiving_pulse_out    (done_alignment)
    );
When we signal to start alignment, turn on and hold sync_train so the sensor sends out a continuous stream of the training word. Once alignment is done, drop sync_train.
    wire sync_train_pulse;

    Pulse_Generator
    sync_train_pulse_generator
    (
        .clock              (clk_main),
        .level_in           (start_alignment),
        .pulse_posedge_out  (sync_train_pulse),
        // verilator lint_off PINCONNECTEMPTY
        .pulse_negedge_out  (),
        .pulse_anyedge_out  ()
        // verilator lint_on PINCONNECTEMPTY
    );

    wire sync_train_latched;

    Pulse_Latch
    #(
        .RESET_VALUE    (1'b0)
    )
    sync_train_pulse_latch
    (
        .clock          (clk_main),
        .clear          (done_alignment),
        .pulse_in       (sync_train_pulse),
        .level_out      (sync_train_latched)
    );

    always @(*) begin
        sync_train = sync_train_pulse || sync_train_latched;
    end
Check when the SERDES output words differ. Then latch it so we always know what the last match state was. This way we don't have to synchronize some events to datain_parallel_valid.
    reg serdes_outputs_match = 1'b0;

    always @(*) begin
        serdes_outputs_match = (datain_p_parallel == ~datain_n_parallel);
    end

    wire serdes_outputs_match_latched;

    Register
    #(
        .WORD_WIDTH     (1),
        .RESET_VALUE    (1'b0)
    )
    serdes_outputs_latest_state
    (
        .clock          (clk_rxio_frame),
        .clock_enable   (datain_parallel_valid == 1'b1),
        .clear          (~rst_rxio_frame_n),
        .data_in        (serdes_outputs_match),
        .data_out       (serdes_outputs_match_latched)
    );
Count the number of taps in the stable data area.
    reg  stable_tap_count_increment = 1'b0;
    reg  stable_tap_count_load      = 1'b0;
    wire [TAP_COUNTER_WIDTH-1:0] stable_tap_count;

    Counter_Binary
    #(
        .WORD_WIDTH     (TAP_COUNTER_WIDTH),
        .INCREMENT      (TAP_ONE),
        .INITIAL_COUNT  (TAP_ZERO)
    )
    stable_tap_counter
    (
        .clock          (clk_rxio_frame),
        .clear          (~rst_rxio_frame_n),
        .up_down        (1'b0), // 0/1 --> up/down
        .run            (stable_tap_count_increment),
        .load           (stable_tap_count_load),
        .load_count     (TAP_ZERO),
        .carry_in       (1'b0),
        // verilator lint_off PINCONNECTEMPTY
        .carry_out      (),
        .carries        (),
        .overflow       (),
        // verilator lint_on PINCONNECTEMPTY
        .count          (stable_tap_count)
    );
Calculate the next tap values and the final aligned tap value.
Normally I'd use Adder_Subtractor modules, but here we know all numbers are unsigned and of the same width, no carry in/out is needed, and wrap-around is expected and desired. There are no corner-cases. So let the CAD tool synthesize the math here.
    reg [TAP_COUNTER_WIDTH-1:0] tap_p_next  = TAP_ZERO;
    reg [TAP_COUNTER_WIDTH-1:0] tap_n_next  = TAP_ZERO;
    reg [TAP_COUNTER_WIDTH-1:0] tap_aligned = TAP_ZERO;

    always @(*) begin
        tap_p_next  = tap_p_current + TAP_ONE;
        tap_n_next  = tap_n_current + TAP_ONE;
        tap_aligned = tap_p_current - (stable_tap_count >> 1) - (TAP_OFFSET [TAP_COUNTER_WIDTH-1:0] - TAP_TWO);
    end
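As a quick check of the wrap-around behaviour relied on above, this simulation-only sketch evaluates the same tap_aligned arithmetic for two hand-picked cases. The module name, the function wrapper, the OFFSET_ADJUST constant, and the test values are assumptions made for the example.

```verilog
// Simulation-only check of the tap_aligned arithmetic (illustration only).
module tap_aligned_sketch;

    localparam TAP_OFFSET        = 2;
    localparam TAP_COUNTER_WIDTH = 5;

    // (TAP_OFFSET - 2), truncated to the tap counter width.
    localparam [TAP_COUNTER_WIDTH-1:0] OFFSET_ADJUST = TAP_OFFSET - 2;

    // Same expression as in the module, wrapped in a function for testing.
    function [TAP_COUNTER_WIDTH-1:0] tap_aligned;
        input [TAP_COUNTER_WIDTH-1:0] tap_p_current;
        input [TAP_COUNTER_WIDTH-1:0] stable_tap_count;
        begin
            tap_aligned = tap_p_current - (stable_tap_count >> 1) - OFFSET_ADJUST;
        end
    endfunction

    initial begin
        // No wrap-around: 20 - (9 >> 1) - 0 = 16
        $display("%0d", tap_aligned(5'd20, 5'd9));
        // Wrap-around: 3 - (10 >> 1) - 0 = -2, which is 30 modulo 32
        $display("%0d", tap_aligned(5'd3, 5'd10));
        $finish;
    end

endmodule
```

The second case shows why no special handling is needed: when the eye straddles the tap counter roll-over, the unsigned subtraction wraps back to the correct tap.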
State Logic
    localparam STATE_WIDTH = 2;

    localparam [STATE_WIDTH-1:0] STATE_IDLE         = 'd0;
    localparam [STATE_WIDTH-1:0] STATE_FIND_FIRST   = 'd1;
    localparam [STATE_WIDTH-1:0] STATE_THRU_FIRST   = 'd2;
    localparam [STATE_WIDTH-1:0] STATE_FIND_SECOND  = 'd3;

    wire [STATE_WIDTH-1:0] state;
    reg  [STATE_WIDTH-1:0] state_next = STATE_IDLE;

    Register
    #(
        .WORD_WIDTH     (STATE_WIDTH),
        .RESET_VALUE    (STATE_IDLE)
    )
    state_reg
    (
        .clock          (clk_rxio_frame),
        .clock_enable   (1'b1),
        .clear          (~rst_rxio_frame_n),
        .data_in        (state_next),
        .data_out       (state)
    );
Datapath Transformations
    wire all_taps_stable;       // See Pulse_Divider below. Pulses high when all 2**TAP_COUNTER_WIDTH taps have been tried without a transition found.

    reg init_taps       = 1'b0; // Load both P and N IDELAY with the start delay tap values: P is 0 and N is TAP_OFFSET.
    reg init_find_first = 1'b0; // Starting from a stable area, start looking for the start of the first data transition.
    reg init_thru_first = 1'b0; // Starting from a transition area, start looking for the end of that first data transition area.
    reg finding_first   = 1'b0; // Currently in stable area, keep incrementing taps.
    reg none_first      = 1'b0; // From inside a stable area, reach the end of possible taps because there is NO transition area (perfect P/N alignment)
    reg found_first     = 1'b0; // From inside a stable area, found the start of the first data transition.
    reg exiting_first   = 1'b0; // Currently in a transition area, keep incrementing taps.
    reg exited_first    = 1'b0; // From inside the first transition area, found the start of the stable area. This is one end of the eye.
    reg finding_second  = 1'b0; // Currently in stable area, keep incrementing taps.
    reg none_second     = 1'b0; // From inside a stable area, reach the end of possible taps because there is NO transition area (perfect P/N alignment)
    reg found_second    = 1'b0; // From the stable area, found the start of the second transition. This is the other end of the eye.

    always @(*) begin
        init_taps       = (state == STATE_IDLE)         && (start_alignment_rxio         == 1'b1);
        init_find_first = (state == STATE_IDLE)         && (serdes_outputs_match_latched == 1'b1); // latch, so no sync to valid needed
        init_thru_first = (state == STATE_IDLE)         && (serdes_outputs_match_latched == 1'b0);
        finding_first   = (state == STATE_FIND_FIRST)   && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);
        none_first      = (finding_first == 1'b1)       && (all_taps_stable      == 1'b1);
        found_first     = (state == STATE_FIND_FIRST)   && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);
        exiting_first   = (state == STATE_THRU_FIRST)   && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);
        exited_first    = (state == STATE_THRU_FIRST)   && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);
        finding_second  = (state == STATE_FIND_SECOND)  && (serdes_outputs_match == 1'b1) && (datain_parallel_valid == 1'b1);
        none_second     = (finding_second == 1'b1)      && (all_taps_stable      == 1'b1);
        found_second    = (state == STATE_FIND_SECOND)  && (serdes_outputs_match == 1'b0) && (datain_parallel_valid == 1'b1);
    end
Signal when all possible taps have been tried. If we reach that point without a transition, then the alignment is already perfect (or close enough), so we use the signal to skip searching for transition regions. The system will then naturally find the middle tap as the best one.
    localparam [TAP_COUNTER_WIDTH-1+1:0] TAP_COUNT = 2**TAP_COUNTER_WIDTH;

    Pulse_Divider
    #(
        .WORD_WIDTH         (TAP_COUNTER_WIDTH+1),
        .INITIAL_DIVISOR    (TAP_COUNT)
    )
    all_taps_stable_detector
    (
        .clock              (clk_rxio_frame),
        .restart            (1'b0),
        .divisor            (TAP_COUNT),
        .pulses_in          (finding_first | finding_second),
        .pulse_out          (all_taps_stable),
        // verilator lint_off PINCONNECTEMPTY
        .div_by_zero        ()
        // verilator lint_on PINCONNECTEMPTY
    );
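As an aside, the source of Pulse_Divider is not shown here. The idea of detecting that every tap has been tried can be sketched with a plain wrapping counter that flags every 2**WIDTH-th input pulse. This is an assumption-laden illustration, not the Pulse_Divider implementation: the module name pulse_every_n_sketch is hypothetical, and it assumes the output pulse should coincide with the last counted input pulse.

```verilog
// Flags every 2**WIDTH-th input pulse (illustration only, not Pulse_Divider).
module pulse_every_n_sketch
#(
    parameter WIDTH = 5
)
(
    input  wire clock,
    input  wire pulses_in,    // One-cycle pulses to count (one per tap tried)
    output reg  pulse_out     // High together with every 2**WIDTH-th input pulse
);

    // Number of input pulses seen so far, modulo 2**WIDTH.
    reg [WIDTH-1:0] count = {WIDTH{1'b0}};

    always @(posedge clock) begin
        if (pulses_in == 1'b1) begin
            count <= count + 1'b1;   // Wraps back to zero after the last pulse
        end
    end

    // When count is all-ones, the current input pulse is the 2**WIDTH-th one.
    always @(*) begin
        pulse_out = (pulses_in == 1'b1) && (count == {WIDTH{1'b1}});
    end

endmodule
```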
State Transitions
    always @(*) begin
        state_next = init_taps && init_find_first   ? STATE_FIND_FIRST  : state;
        state_next = init_taps && init_thru_first   ? STATE_THRU_FIRST  : state_next;
        state_next = found_first                    ? STATE_THRU_FIRST  : state_next;
        state_next = none_first                     ? STATE_FIND_SECOND : state_next;
        state_next = exited_first                   ? STATE_FIND_SECOND : state_next;
        state_next = found_second                   ? STATE_IDLE        : state_next;
        state_next = none_second                    ? STATE_IDLE        : state_next;
    end
Control Signals
    always @(*) begin
        tap_p_load_value            = init_taps                  ? TAP_ZERO    : tap_p_next;
        tap_p_load_value            = found_second | none_second ? tap_aligned : tap_p_load_value;
        tap_p_load                  = init_taps | finding_first | exiting_first | finding_second | found_second | none_second;
        tap_n_load_value            = init_taps                  ? TAP_OFFSET  : tap_n_next;
        tap_n_load_value            = found_second | none_second ? tap_aligned : tap_n_load_value;
        tap_n_load                  = init_taps | finding_first | exiting_first | finding_second | found_second | none_second;
        stable_tap_count_load       = exited_first | none_first;
        stable_tap_count_increment  = finding_second;
        done_alignment_rxio         = found_second | none_second;
    end

endmodule