Calculates the signed integer quotient
and remainder
of the signed
integer dividend
and divisor
. This divider can handle large integers
(e.g. 128 bits), without impacting clock speed, at the expense of extra
latency.
22 / 7 = 3 rem 1 -22 / 7 = -3 rem -1 22 / -7 = -3 rem 1 -22 / -7 = 3 rem -1 7 / 22 = 0 rem 7 -7 / -22 = 0 rem -7 -7 / 22 = 0 rem -7 7 / -22 = 0 rem 7 0 / 0 = -1 rem 0 (raises divide_by_zero) 22 / 0 = -1 rem 22 (raises divide_by_zero) -22 / 0 = 1 rem 22 (raises divide_by_zero)
This module implements division as iterated conditional subtraction of
multiples of the divisor
from the dividend
, just as one would do
manually on paper. The algorithm is described in more detail inside the
Remainder and
Quotient modules.
Iterated subtraction is not the fastest algorithm, but all faster algorithms depend on multiplication, which requires a quadratically increasing amount of multiplication hardware as the bit width increases.
The Remainder and Quotient modules synchronize each division step calculation through a Skid Buffer Pipeline and coordinate the start and end of their calculations with Pipeline Fork and Pipeline Join modules. Each division step addition/subtraction is done using a Multiprecision Adder/Subtractor to avoid excessive area and carry-chain critical paths. This division of work allows buffering control and data as necessary to maintain a high clock frequency.
Start a division by completing the input ready/valid handshake, and read
out the results by completing the output handshake. Set WORD_WIDTH
to the
width of your integers, and STEP_WORD_WIDTH
to an equal or smaller
number, which will be the width of the internal adder/subtractors. If the
area is too large, or the carry-chains form the critical path, decrease
STEP_WORD_WIDTH
, which will proportionately decrease the area and the
carry-chain length, and increase the latency. If control between Remainder
and Quotient calculations becomes the critical path, increase
PIPELINE_STAGES_SYNC
, which has a negligible impact on area and
moderately increases latency.
The latency is approximately equal to WORD_WIDTH / STEP_WORD_WIDTH
cycles per bit, plus one cycle per bit for each PIPELINE_STAGES_SYNC
,
plus one cycle overall for the input and output handshakes each.
`default_nettype none module Divider_Integer_Signed #( parameter WORD_WIDTH = 0, parameter STEP_WORD_WIDTH = 0, parameter PIPELINE_STAGES_SYNC = 0 ) ( input wire clock, input wire clear, input wire input_valid, output wire input_ready, input wire [WORD_WIDTH-1:0] dividend, input wire [WORD_WIDTH-1:0] divisor, output wire output_valid, input wire output_ready, output wire [WORD_WIDTH-1:0] quotient, output wire [WORD_WIDTH-1:0] remainder, output wire divide_by_zero ); localparam UNIT_COUNT = 2; localparam TOTAL_INPUT_WIDTH = WORD_WIDTH * UNIT_COUNT; localparam TOTAL_OUTPUT_WIDTH = WORD_WIDTH + 1;
Buffers the input handshake and replicates it to the Remainder and Quotient modules.
wire input_valid_remainder; wire input_ready_remainder; wire input_valid_quotient; wire input_ready_quotient; wire [WORD_WIDTH-1:0] dividend_remainder; wire [WORD_WIDTH-1:0] divisor_remainder; wire [WORD_WIDTH-1:0] dividend_quotient; wire [WORD_WIDTH-1:0] divisor_quotient; Pipeline_Fork_Eager #( .WORD_WIDTH (TOTAL_INPUT_WIDTH), .OUTPUT_COUNT (UNIT_COUNT) ) write_to_units ( .clock (clock), .clear (clear), .input_valid (input_valid), .input_ready (input_ready), .input_data ({dividend, divisor}), .output_valid ({input_valid_remainder, input_valid_quotient}), .output_ready ({input_ready_remainder, input_ready_quotient}), .output_data ({dividend_remainder, divisor_remainder, dividend_quotient, divisor_quotient}) );
At each division step, the Remainder module signals to the Quotient module if the current division step is OK (meaning: the current addition/subtraction left a valid intermediate remainder), and so that the Quotient should be updated.
wire output_valid_remainder; wire output_ready_remainder; wire control_valid_remainder; wire control_ready_remainder; wire [WORD_WIDTH-1:0] remainder_internal; wire divide_by_zero_internal; wire step_ok_remainder; Remainder_Integer_Signed #( .WORD_WIDTH (WORD_WIDTH), .STEP_WORD_WIDTH (STEP_WORD_WIDTH) ) remainder_calc ( .clock (clock), .clear (clear), .input_valid (input_valid_remainder), .input_ready (input_ready_remainder), .dividend (dividend_remainder), .divisor (divisor_remainder), .output_valid (output_valid_remainder), .output_ready (output_ready_remainder), .remainder (remainder_internal), .divide_by_zero (divide_by_zero_internal), .control_valid (control_valid_remainder), .control_ready (control_ready_remainder), .step_ok (step_ok_remainder) );
Since the Remainder and Quotient modules can get physically large, we need to optionally buffer the control path between them.
wire control_valid_quotient; wire control_ready_quotient; wire step_ok_quotient; Skid_Buffer_Pipeline #( .WORD_WIDTH (1), .PIPE_DEPTH (PIPELINE_STAGES_SYNC) ) buffer_sync ( .clock (clock), .clear (clear), .input_valid (control_valid_remainder), .input_ready (control_ready_remainder), .input_data (step_ok_remainder), .output_valid (control_valid_quotient), .output_ready (control_ready_quotient), .output_data (step_ok_quotient) );
wire output_valid_quotient; wire output_ready_quotient; wire [WORD_WIDTH-1:0] quotient_internal; Quotient_Integer_Signed #( .WORD_WIDTH (WORD_WIDTH), .STEP_WORD_WIDTH (STEP_WORD_WIDTH) ) quotient_calc ( .clock (clock), .clear (clear), .input_valid (input_valid_quotient), .input_ready (input_ready_quotient), .dividend_sign (dividend_quotient [WORD_WIDTH-1]), .divisor_sign (divisor_quotient [WORD_WIDTH-1]), .output_valid (output_valid_quotient), .output_ready (output_ready_quotient), .quotient (quotient_internal), .control_valid (control_valid_quotient), .control_ready (control_ready_quotient), .step_ok (step_ok_quotient) );
Synchronizes and buffers the output handshakes of the Remainder and Quotient modules into the output handshake.
A dummy
signal is necessary since we have to use the same data width for
all handshakes, and the Quotient output is short one bit. The dummy wire,
and its associated logic, are not connected to any destination and so will
optimize away. (This may raise a warning in your CAD tools.)
// verilator lint_off UNUSED wire dummy; // verilator lint_on UNUSED Pipeline_Join #( .WORD_WIDTH (TOTAL_OUTPUT_WIDTH), .INPUT_COUNT (UNIT_COUNT) ) read_from_units ( .clock (clock), .clear (clear), .input_valid ({output_valid_remainder, output_valid_quotient}), .input_ready ({output_ready_remainder, output_ready_quotient}), .input_data ({remainder_internal, divide_by_zero_internal, quotient_internal, 1'b0}), .output_valid (output_valid), .output_ready (output_ready), .output_data ({remainder, divide_by_zero, quotient, dummy}) ); endmodule