Accumulates a power-of-two number of signed integer samples, then divides the total by the same power-of-two, giving us an average without needing a full divider.
Accepts 2POWER_OF_TWO_EXPONENT input samples, then makes the
average available until it is read out. The output is buffered, so a new
average may be started before the previous average is read out. A stall
will happen if two averages are pending read-out. A positive edge on
restart_average
discards the current input and accumulation, and starts
a new average. Adjust EXTRA_PIPE_STAGES
to maintain clock speed if the
Accumulator adder becomes the critical path.
The width of the Accumulator is internally adjusted so it can never
overflow, regardless of the input sample sequence. However, should an
overflow somehow happen, the input_overflow
signal will hold high until
either the average is read out or restarted.
With apologies to Ned Washington regarding the module instance names...
`default_nettype none module Averager_Powers_of_Two #( parameter WORD_WIDTH = 0, parameter POWER_OF_TWO_EXPONENT = 0, parameter EXTRA_PIPE_STAGES = 0 ) ( input wire clock, input wire clear, input wire restart_average, input wire input_valid, output wire input_ready, input wire [WORD_WIDTH-1:0] input_sample, output wire input_overflow, output wire output_valid, input wire output_ready, output wire [WORD_WIDTH-1:0] output_average );
We rarely need this value explicitly, so update it here if your system use a different width for Verilog integers.
localparam VERILOG_INT_WIDTH = 32;
The accumulator needs to be wide enough to hold, in the extreme, the sum of 2**POWER_OF_TWO_EXPONENT samples, all of maximum magnitude.
localparam WORD_ZERO = {WORD_WIDTH{1'b0}}; localparam ACCUMULATOR_WIDTH = WORD_WIDTH + POWER_OF_TWO_EXPONENT; localparam ACCUMULATOR_ZERO = {ACCUMULATOR_WIDTH{1'b0}};
The counter counts samples from 2^N to 1, and we declare the acumulation ready when the counter hits zero, so that's (2^N)+1 cases, so we need an extra bit in the counter.
`include "clog2_function.vh" localparam SAMPLE_COUNT = 2 ** POWER_OF_TWO_EXPONENT; localparam COUNTER_WIDTH = clog2(SAMPLE_COUNT) + 1; localparam COUNTER_ONE = {{COUNTER_WIDTH-1{1'b0}}, 1'b1}; localparam COUNTER_ZERO = {COUNTER_WIDTH{1'b0}};
First, convert the input handshake into a pulse interface to the Accumulator_Binary.
wire [WORD_WIDTH-1:0] input_sample_passed; wire sample_valid; reg input_sample_next = 1'b0; Pipeline_to_Pulse #( .WORD_WIDTH (WORD_WIDTH) ) bring_em_in ( .clock (clock), .clear (clear), // Pipeline input .valid_in (input_valid), .ready_in (input_ready), .data_in (input_sample), // Pulse interface to connected module input .module_data_in (input_sample_passed), .module_data_in_valid (sample_valid), // Signal that the module can accept the next input .module_ready (input_sample_next) );
Then, widen the signed sample to the accumulator width.
wire [ACCUMULATOR_WIDTH-1:0] sample; Width_Adjuster #( .WORD_WIDTH_IN (WORD_WIDTH), .SIGNED (1), .WORD_WIDTH_OUT (ACCUMULATOR_WIDTH) ) widen_em ( .original_input (input_sample_passed), .adjusted_output (sample) );
Then, accumulate the samples together, taking any pipelining latency into
account. We let the Accumulator_Binary set the pace with its done
signals. Although the accumulator is wide enough to never overflow, let's
provide that signal just in case.
reg clear_accumulator = 1'b0; wire clear_done; wire sample_done; wire [ACCUMULATOR_WIDTH-1:0] sample_sum; wire sample_overflow; Accumulator_Binary #( .EXTRA_PIPE_STAGES (EXTRA_PIPE_STAGES), .WORD_WIDTH (ACCUMULATOR_WIDTH), .INITIAL_VALUE (ACCUMULATOR_ZERO) ) add_em_up ( .clock (clock), .clock_enable (1'b1), .clear (clear_accumulator), .clear_done (clear_done), .increment_carry_in (1'b0), .increment_add_sub (1'b0), // 0/1 --> +/- .increment_value (sample), .increment_valid (sample_valid), .increment_done (sample_done), .load_value (ACCUMULATOR_ZERO), .load_valid (1'b0), // verilator lint_off PINCONNECTEMPTY .load_done (), // verilator lint_on PINCONNECTEMPTY .accumulated_value (sample_sum), // verilator lint_off PINCONNECTEMPTY .accumulated_value_carry_out (), .accumulated_value_carries (), // verilator lint_on PINCONNECTEMPTY .accumulated_value_signed_overflow (sample_overflow) );
Each time we accumulate a sample, decrement the counter one step. When the counter reaches zero after a sample is accumulated, we are done.
reg reset_counter = 1'b0; wire [COUNTER_WIDTH-1:0] samples_remaining; Counter_Binary #( .WORD_WIDTH (COUNTER_WIDTH), .INCREMENT (COUNTER_ONE), .INITIAL_COUNT (SAMPLE_COUNT [COUNTER_WIDTH-1:0]) ) count_em_down ( .clock (clock), .clear (reset_counter), .up_down (1'b1), // 0/1 --> up/down .run (sample_valid), .load (1'b0), .load_count (COUNTER_ZERO), .carry_in (1'b0), // verilator lint_off PINCONNECTEMPTY .carry_out (), .carries (), .overflow (), // verilator lint_on PINCONNECTEMPTY .count (samples_remaining) );
Since we allow signed samples, division by a power of two is only mostly a right-shift. There's a little correction required, done here in the Divider_Integer_Signed_by_Powers_of_Two module. Since the exponent is a constant power-of-two here, the divider should reduce to a bit of adder logic, even though we have to extend the exponent to match the accumulator width.
wire [ACCUMULATOR_WIDTH-1:0] EXPONENT_EXTENDED; Width_Adjuster #( .WORD_WIDTH_IN (VERILOG_INT_WIDTH), .SIGNED (0), .WORD_WIDTH_OUT (ACCUMULATOR_WIDTH) ) make_it_wide ( .original_input (POWER_OF_TWO_EXPONENT), .adjusted_output (EXPONENT_EXTENDED) ); wire [ACCUMULATOR_WIDTH-1:0] raw_average; Divider_Integer_Signed_by_Powers_of_Two #( .WORD_WIDTH (ACCUMULATOR_WIDTH) ) split_em_up ( .numerator (sample_sum), .exponent_of_two (EXPONENT_EXTENDED), .quotient (raw_average), // verilator lint_off PINCONNECTEMPTY .remainder () // verilator lint_on PINCONNECTEMPTY );
Then, truncate the result back down to WORD_WIDTH. Because we made sure the accumulator should never overflow, and because we work with power-of-two number of samples, truncation should never lose information.
wire [WORD_WIDTH-1:0] truncated_average; Width_Adjuster #( .WORD_WIDTH_IN (ACCUMULATOR_WIDTH), .SIGNED (1), .WORD_WIDTH_OUT (WORD_WIDTH) ) cut_em_down ( .original_input (raw_average), .adjusted_output (truncated_average) );
Finally, convert the pulse-controlled output to the output pipeline handshake interface.
reg truncated_average_valid = 1'b0; wire average_read_out; Pulse_to_Pipeline #( .WORD_WIDTH (WORD_WIDTH), .OUTPUT_BUFFER_TYPE ("SKID"), // "HALF", "SKID", "FIFO" .OUTPUT_BUFFER_CIRCULAR (0), .FIFO_BUFFER_DEPTH (), // Only for "FIFO" .FIFO_BUFFER_RAMSTYLE () // Only for "FIFO" ) bring_em_out ( .clock (clock), .clear (clear), // Pipeline output .valid_out (output_valid), .ready_out (output_ready), .data_out (output_average), // Pulse interface from connected module .module_data_out (truncated_average), .module_data_out_valid (truncated_average_valid), // Signal that the module can accept the next input .module_ready (average_read_out) );
Firstly, since sample_overflow
is reset by consecutive accumulations,
let's hold it until we start a new average, either by clearing, restarting,
or reading out the current average once ready.
reg clear_overflow = 1'b0; Pulse_Latch #( .RESET_VALUE (1'b0) ) hold_em_high ( .clock (clock), .clear (clear_overflow), .pulse_in (sample_overflow), .level_out (input_overflow) );
Then, catch any interruption of the averaging process by a positive edge on
restart_average
, cleaning it up to a single cycle pulse.
wire restart; Pulse_Generator turn_em_round ( .clock (clock), .level_in (restart_average), .pulse_posedge_out (restart), // verilator lint_off PINCONNECTEMPTY .pulse_negedge_out (), .pulse_anyedge_out () // verilator lint_on PINCONNECTEMPTY );
At reset, when reading out an average, or when restarting, clear the accumulator, any status signals, and the counter.
always @(*) begin clear_accumulator = (clear == 1'b1) || (average_read_out == 1'b1) || (restart == 1'b1); clear_overflow = (clear_accumulator == 1'b1); reset_counter = (clear_accumulator == 1'b1); end
Finally, accept a new sample once either the current sample has been accumulated, or we are done clearing the accumulator. Provide a new average once all samples have been accumulated.
always @(*) begin input_sample_next = ((sample_done == 1'b1) && (samples_remaining != COUNTER_ZERO)) || (clear_done == 1'b1); truncated_average_valid = (sample_done == 1'b1) && (samples_remaining == COUNTER_ZERO); end endmodule