Power Hammering Test

This circuit maximizes FPGA dynamic power consumption by toggling many long pipelines of registers at a very high frequency, under duty cycle control of an active-high external input with an internal pull-down. It is useful for testing the behaviour of your power supplies and cooling solutions under heavy load.

WARNING: THIS CIRCUIT CAN PERMANENTLY DAMAGE OR DESTROY HARDWARE
YOU MUST READ AND UNDERSTAND THIS SOURCE FILE BEFORE USE.
YOU WILL NEED TO ADAPT IT TO YOUR SPECIFIC DEVICE AND CAD TOOL.

The duty cycle input modulates the amount of switching activity, and so allows run-time control of the power consumption, from idle to max. Adjust the depth and number of pipelines to control the max possible dynamic power usage. You must take care not to exceed (or even equal) the max number of LUTs or FFs on the FPGA, else CAD tool synthesis will take forever, then fail.
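As an illustration of run-time control, the duty cycle input could be driven by a simple PWM source on a separate controller. The following is only a sketch under assumptions: the module name, the counter width, and the `threshold` port are hypothetical and are not part of this design. The fraction of time the output is high is `threshold / 2^COUNTER_WIDTH`, and a threshold of zero keeps the output low, which leaves the circuit idle.

```verilog
// Hypothetical external PWM source for the duty_cycle input.
// NOT part of Power_Hammering: all names and widths are illustrative.

module Duty_Cycle_PWM_Sketch
#(
    parameter COUNTER_WIDTH = 8
)
(
    input  wire                     clock,      // Assumed slow control clock
    input  wire [COUNTER_WIDTH-1:0] threshold,  // 0 = always low (idle)
    output reg                      pwm_out     // Connect to duty_cycle
);

    // Free-running counter, wraps around every 2^COUNTER_WIDTH cycles
    reg [COUNTER_WIDTH-1:0] counter = {COUNTER_WIDTH{1'b0}};

    initial pwm_out = 1'b0; // Start disabled

    always @(posedge clock) begin
        counter <= counter + 1'b1;
        // High for "threshold" cycles out of every 2^COUNTER_WIDTH
        pwm_out <= (counter < threshold);
    end

endmodule
```

Start with a small threshold and raise it gradually while watching supply current and die temperature.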

This particular implementation is specific to AMD/Xilinx devices, but it is very easy to port to other FPGA device families.

Failure Modes and Failsafes

DO NOT OPERATE WITHOUT A FAILSAFE

Excessive power draw into an FPGA can damage/destroy the FPGA and/or its associated power supply. You must have at least one failsafe in place to prevent such a case. Some failure modes initially manifest as a drop in power consumption as the FPGA loses function. This may or may not indicate damage.

Possible Failure Modes

This is not an exhaustive list.

Possible Failsafes

This is not an exhaustive list. Use at least one, preferably two.

Build Issues

This circuit uses almost all of the FPGA resources, and the CAD tool (Vivado, here) does not cope well with being asked to synthesize more logic than the FPGA can hold, resulting in endless build times, segfaults, internal errors, etc.

Thus, the total number of toggling elements must not exceed, or even equal, the number of LUTs or flip-flops on the FPGA. Each toggling element requires 1 LUT and 1 FF, depending on synthesis.

For example, for an xc7a100 (Artix-7 100), 60k entries (as PIPE_COUNT * PIPE_DEPTH) is near the maximum possible. It is also better to have many shallower pipelines: fewer and deeper pipelines (e.g.: 10k elements) will cause Vivado to exceed its stack space and segfault.
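Since an over-sized build can run for a long time before failing, it may help to check the element count in a simulator or linter first. The sketch below is a standalone, hypothetical helper and not part of this design. `MAX_ELEMENTS` and `MAX_DEPTH` are assumptions taken from the example figures above (60k elements, 10k-deep pipelines) and must be adjusted for your device and tool version.

```verilog
// Hypothetical pre-build sanity check; not part of the original source.
// Run it in a simulator or linter before launching a long synthesis run.

module Resource_Budget_Check_Sketch
#(
    parameter PIPE_COUNT    = 200,      // Number of pipelines
    parameter PIPE_DEPTH    = 250,      // Elements per pipeline
    parameter MAX_ELEMENTS  = 60000,    // ASSUMPTION: element budget for your device
    parameter MAX_DEPTH     = 10000     // ASSUMPTION: depth at which the tool misbehaves
)
(
    output wire within_budget           // Constant: 1 if the parameters look safe
);

    localparam TOTAL_ELEMENTS = PIPE_COUNT * PIPE_DEPTH;

    // Strictly less than: equalling the limit is also not allowed
    localparam WITHIN_BUDGET  = (TOTAL_ELEMENTS < MAX_ELEMENTS) && (PIPE_DEPTH < MAX_DEPTH);

    assign within_budget = (WITHIN_BUDGET != 0);

    // Report the problem early in simulation
    initial begin
        if (WITHIN_BUDGET == 0) begin
            $display("Over budget: %0d elements, depth %0d", TOTAL_ELEMENTS, PIPE_DEPTH);
        end
    end

endmodule
```

With the default values, 200 pipelines of 250 elements give 50,000 elements, which is under the assumed budget. Raising the depth to 300 reaches 60,000 and is reported, because equalling the limit counts as over budget.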

Parameters and Ports

`default_nettype none

module Power_Hammering
#(
    parameter PIPE_DEPTH_REG            = 500,
    parameter PIPE_COUNT_REG            = 200,
    parameter PIPE_COUNT_SRL            = 16000,
    parameter PIPE_COUNT_FIFO           = 130,
    parameter EXTRA_DEPTH               = 2,            // Some extra CDC stages since we are close to the spec'ed switching limits
    parameter CLK_GCLK_IBUF_LOW_POWER   = "FALSE",      // Not aiming for low power here...
    parameter CLK_GCLK_IOSTANDARD       = "DEFAULT"
)
(
    input wire      clock,      // 100 MHz reference clock
    (* PULLTYPE = "PULLDOWN" *)
    input wire      duty_cycle  // Control input (drive high to enable). **ADD EXTERNAL PULL-DOWN FOR SAFETY**
);

High-Speed Clock Generation

Take in the 100 MHz reference clock and generate as fast a clock as we can.

    wire clock_buffered;

    IBUF #(
        .IBUF_LOW_PWR   (CLK_GCLK_IBUF_LOW_POWER),      // Low power (TRUE) vs. performance (FALSE) setting for referenced I/O standards
        .IOSTANDARD     (CLK_GCLK_IOSTANDARD)           // Specify the input I/O standard
    ) 
    clock_buffer 
    (
        .O              (clock_buffered),               // Buffer output
        .I              (clock)                         // Buffer input (connect directly to top-level port)
    );

    wire clock_distributed;

    BUFG 
    clock_distribution
    (
        .O              (clock_distributed), // 1-bit output: Clock output port
        .I              (clock_buffered)     // 1-bit input: Clock buffer input driven by an IBUF, MMCM or local interconnect
    );

DS181 specifies a clock buffer max of 628 MHz for an xc7a100-2.

However, 600 MHz will violate the minimum pulse width requirement for SRL FIFOs, and 500 MHz will still have too much clock skew to meet timing. There is a lot of clock skew since we are distributing one signal to many different, non-neighbouring locations across the device.

So let's aim for a nice round 400 MHz via 100 MHz * 12 for 1200 MHz PLL VCO, then divide by 3 for 400 MHz.
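The arithmetic above follows the usual PLL relationships: the VCO runs at the input frequency times the feedback multiplier, divided by the master divider, and each output is the VCO frequency divided by its own output divider. The standalone sketch below only restates that calculation with integer MHz values so it can be checked in simulation. The module and its ports are hypothetical, and it does not check the VCO frequency limits, which you should confirm in DS181 for your speed grade.

```verilog
// Hypothetical helper that restates the PLL frequency arithmetic.
// Integer MHz only; it does NOT validate the VCO range for your device.

module PLL_Frequency_Sketch
#(
    parameter CLKIN_MHZ         = 100,  // Reference clock
    parameter DIVCLK_DIVIDE     = 1,    // Master divider
    parameter CLKFBOUT_MULT     = 12,   // Feedback multiplier
    parameter CLKOUT0_DIVIDE    = 3     // Output divider
)
(
    output wire [15:0] vco_mhz,
    output wire [15:0] clkout0_mhz
);

    // f_VCO = f_CLKIN * CLKFBOUT_MULT / DIVCLK_DIVIDE
    localparam [15:0] VCO_MHZ       = (CLKIN_MHZ * CLKFBOUT_MULT) / DIVCLK_DIVIDE;

    // f_CLKOUT0 = f_VCO / CLKOUT0_DIVIDE
    localparam [15:0] CLKOUT0_MHZ   = VCO_MHZ / CLKOUT0_DIVIDE;

    assign vco_mhz      = VCO_MHZ;
    assign clkout0_mhz  = CLKOUT0_MHZ;

endmodule
```

With the default values this gives a 1200 MHz VCO and a 400 MHz output, matching the settings used in the PLL instance.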

    // Local feedback since we don't have to align to anything else

    wire clock_fb;

    wire clock_max_fast_internal;
    wire clock_max_fast;

    PLLE2_BASE #(
        .BANDWIDTH          ("OPTIMIZED"),  // OPTIMIZED, HIGH, LOW
        .CLKFBOUT_MULT      (12),           // Multiply value for all CLKOUT, (2-64)
        .CLKFBOUT_PHASE     (0.0),          // Phase offset in degrees of CLKFB, (-360.000-360.000).
        .CLKIN1_PERIOD      (10),           // Input clock period in ns to ps resolution (i.e. 33.333 is 30 MHz).
        // CLKOUT0_DIVIDE - CLKOUT5_DIVIDE: Divide amount for each CLKOUT (1-128)
        .CLKOUT0_DIVIDE     (3),
        .CLKOUT1_DIVIDE     (1),
        .CLKOUT2_DIVIDE     (1),
        .CLKOUT3_DIVIDE     (1),
        .CLKOUT4_DIVIDE     (1),
        .CLKOUT5_DIVIDE     (1),
        // CLKOUT0_DUTY_CYCLE - CLKOUT5_DUTY_CYCLE: Duty cycle for each CLKOUT (0.001-0.999).
        .CLKOUT0_DUTY_CYCLE (0.5),
        .CLKOUT1_DUTY_CYCLE (0.5),
        .CLKOUT2_DUTY_CYCLE (0.5),
        .CLKOUT3_DUTY_CYCLE (0.5),
        .CLKOUT4_DUTY_CYCLE (0.5),
        .CLKOUT5_DUTY_CYCLE (0.5),
        // CLKOUT0_PHASE - CLKOUT5_PHASE: Phase offset for each CLKOUT (-360.000-360.000).
        .CLKOUT0_PHASE      (0.0),
        .CLKOUT1_PHASE      (0.0),
        .CLKOUT2_PHASE      (0.0),
        .CLKOUT3_PHASE      (0.0),
        .CLKOUT4_PHASE      (0.0),
        .CLKOUT5_PHASE      (0.0),
        .DIVCLK_DIVIDE      (1),       // Master division value, (1-56)
        .REF_JITTER1        (0.0),      // Reference input jitter in UI, (0.000-0.999).
        .STARTUP_WAIT       ("TRUE" )   // Delay DONE until PLL Locks, ("TRUE"/"FALSE")
    )
    max_fast_clock_generator
    (
        // Clock Outputs: 1-bit (each) output: User configurable clock outputs
        .CLKOUT0            (clock_max_fast_internal),  // 1-bit output: CLKOUT0
        // verilator lint_off PINCONNECTEMPTY
        .CLKOUT1            (),         // 1-bit output: CLKOUT1
        .CLKOUT2            (),         // 1-bit output: CLKOUT2
        .CLKOUT3            (),         // 1-bit output: CLKOUT3
        .CLKOUT4            (),         // 1-bit output: CLKOUT4
        .CLKOUT5            (),         // 1-bit output: CLKOUT5
        // Feedback Clocks: 1-bit (each) output: Clock feedback ports
        .CLKFBOUT           (clock_fb),           // 1-bit output: Feedback clock
        .LOCKED             (),                     // 1-bit output: LOCK
        // verilator lint_on  PINCONNECTEMPTY
        .CLKIN1             (clock_distributed), // 1-bit input: Input clock
        // Control Ports: 1-bit (each) input: PLL control ports
        .PWRDWN             (1'b0),         // 1-bit input: Power-down
        .RST                (1'b0),         // 1-bit input: Reset
        // Feedback Clocks: 1-bit (each) input: Clock feedback ports
        .CLKFBIN            (clock_fb)    // 1-bit input: Feedback clock
    );

    BUFG
    clock_max_fast_buffer
    (
        .O(clock_max_fast),           // 1-bit output: Clock output
        .I(clock_max_fast_internal)   // 1-bit input: Clock input
    );

Duty Cycle Control

The duty cycle control input expects a variable duty cycle, active-high logic input with a minimum pulse width greater than 3.75 ns (1.5x the 2.5 ns period of the 400 MHz clock), which allows a control signal of up to about 133 MHz at 50% duty cycle. In reality, you should not need to go anywhere near that fast.

First, let's cross into the fast clock domain. We do CDC with possibly extra stages since we are running close to the limits of the silicon. The CDC registers will default to a low output (disabling downstream logic) after configuration so we don't get a short burst of activity before the state of the input control pin gets through.
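The CDC_Bit_Synchronizer module is defined elsewhere and is not shown here. As a rough idea of what such a synchronizer typically looks like, here is a minimal standalone sketch, assuming a chain of two flip-flops plus `EXTRA_DEPTH` more, all initialized low. The module name is hypothetical and the real implementation may differ. The port names simply mirror the instantiation that follows.

```verilog
// Hypothetical sketch of a single-bit synchronizer.
// NOT the actual CDC_Bit_Synchronizer source, only its assumed general shape.

module Bit_Synchronizer_Sketch
#(
    parameter EXTRA_DEPTH = 0   // Must be 0 or greater
)
(
    input  wire receiving_clock,
    input  wire bit_in,
    output wire bit_out
);

    localparam DEPTH = 2 + EXTRA_DEPTH;

    // Initialized low, so the output stays low right after configuration.
    // ASYNC_REG asks Vivado to keep the stages together and treat them as a synchronizer.
    (* ASYNC_REG = "TRUE" *)
    reg [DEPTH-1:0] sync_reg = {DEPTH{1'b0}};

    always @(posedge receiving_clock) begin
        // Shift the sampled input towards the output, one stage per clock
        sync_reg <= {sync_reg [DEPTH-2:0], bit_in};
    end

    assign bit_out = sync_reg [DEPTH-1];

endmodule
```

In this sketch, a change on the input reaches the output after `2 + EXTRA_DEPTH` clock edges, so each extra stage trades one cycle of latency for more metastability settling time.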

    wire duty_cycle_synchronized;

    CDC_Bit_Synchronizer
    #(
        .EXTRA_DEPTH        (EXTRA_DEPTH)  // Must be 0 or greater
    )
    duty_cycle_input_sync
    (
        .receiving_clock    (clock_max_fast),
        .bit_in             (duty_cycle),
        .bit_out            (duty_cycle_synchronized)
    );

Bit Toggle

The duty cycle control enables a self-toggling register, which is the source of all the switching activity.

THIS ASSUMES AN INTERNAL/EXTERNAL PULL-DOWN ON THE INPUT PIN, SO THE CIRCUIT IS DISABLED BY DEFAULT.
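The Register_Toggle module is defined elsewhere. For readers without it at hand, the behaviour relied upon here can be sketched as a single flip-flop that inverts itself on every enabled clock edge and holds its value otherwise. This is a hypothetical standalone stand-in with only the ports needed for this use, not the actual Register_Toggle source.

```verilog
// Hypothetical minimal stand-in for the self-toggling register.
// NOT the actual Register_Toggle source.

module Toggle_Sketch
(
    input  wire clock,
    input  wire clock_enable,   // Held low by the pull-down: no toggling
    output reg  toggling_bit
);

    initial toggling_bit = 1'b0;

    always @(posedge clock) begin
        if (clock_enable == 1'b1) begin
            // Invert every enabled cycle: a square wave at half the clock rate
            toggling_bit <= ~toggling_bit;
        end
    end

endmodule
```

While enabled, the output toggles at half the clock frequency. While disabled, it holds its last value, so downstream pipelines stop switching once that constant value has propagated through them.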

    wire toggling_bit_first;

    Register_Toggle
    #(
        .WORD_WIDTH     (1),
        .RESET_VALUE    (1'b0)
    )
    switching_activity_source
    (
        .clock          (clock_max_fast),
        .clock_enable   (duty_cycle_synchronized),
        .clear          (1'b0),
        .toggle         (1'b1),
        .data_in        (1'b0),
        .data_out       (toggling_bit_first)
    );

Dynamic Switching Amplification

Here, a number of plain pipelines of registers propagate the toggling_bit over their whole length, amplifying the switching activity.

We do not modulate the pipeline with the duty_cycle_synchronized since distributing that signal all over the chip may limit how fast we can run. Also, it will let the pipeline fill and empty gradually, which sounds like a nice idea even if it may have no tangible effect.

    integer i; // index to chain registers/SRLs together

    (* DONT_TOUCH = "TRUE" *)
    reg [PIPE_COUNT_REG-1:0] toggling_bit_last_reg;

    generate
    genvar pipeline_count_reg;
        for (pipeline_count_reg = 0; pipeline_count_reg < PIPE_COUNT_REG; pipeline_count_reg = pipeline_count_reg + 1) begin: per_pipeline_reg

            (* DONT_TOUCH = "TRUE" *)
            reg [PIPE_DEPTH_REG-1:0] pipeline_reg;

            always @(posedge clock_max_fast) begin
                pipeline_reg [0] <= toggling_bit_first;
                for (i=1; i < PIPE_DEPTH_REG; i=i+1) begin
                    pipeline_reg [i] <= pipeline_reg [i-1];
                end
                toggling_bit_last_reg [pipeline_count_reg] <= pipeline_reg [PIPE_DEPTH_REG-1]; 
            end
        end
    endgenerate

    (* DONT_TOUCH = "TRUE" *)
    wire [PIPE_COUNT_SRL-1:0] toggling_bit_last_srl;

    generate
    genvar pipeline_count_srl;

        SRLC32E
        #(
            .INIT(32'h00000000) // Initial Value of Shift Register
        ) 
        pipeline_srl_first
        (
            .Q      (),                                             // SRL data output
            .Q31    (toggling_bit_last_srl [0]),                    // SRL cascade output pin
            .A      (5'b11111),                                     // 5-bit shift depth select input
            .CE     (1'b1),                                         // Clock enable input
            .CLK    (clock_max_fast),                               // Clock input
            .D      (toggling_bit_first)                            // SRL data input
        );

        for (pipeline_count_srl = 1; pipeline_count_srl < PIPE_COUNT_SRL; pipeline_count_srl = pipeline_count_srl + 1) begin: per_pipeline_srl
            SRLC32E
            #(
                .INIT(32'h00000000) // Initial Value of Shift Register
            ) 
            pipeline_srl
            (
                .Q      (),                                             // SRL data output
                .Q31    (toggling_bit_last_srl [pipeline_count_srl]),   // SRL cascade output pin
                .A      (5'b11111),                                     // 5-bit shift depth select input
                .CE     (1'b1),                                         // Clock enable input
                .CLK    (clock_max_fast),                               // Clock input
                .D      (toggling_bit_last_srl [pipeline_count_srl-1])  // SRL data input
            );
        end

    endgenerate

localparam PIPE_WIDTH_FIFO = 72; // Hardcoded at FIFO instantiation

(* DONT_TOUCH = "TRUE" *)
wire [PIPE_WIDTH_FIFO-1:0] toggling_bit_last_fifo [PIPE_COUNT_FIFO-1:0];
(* DONT_TOUCH = "TRUE" *)
wire [PIPE_COUNT_FIFO-1:0] fifo_empty;
(* DONT_TOUCH = "TRUE" *)
wire [PIPE_COUNT_FIFO-1+1:0] fifo_full;

assign fifo_full [PIPE_COUNT_FIFO] = 1'b0;

generate
genvar pipeline_count_fifo;

    FIFO36E1 #(
        .ALMOST_EMPTY_OFFSET      (13'h0000),                 // Sets the almost empty threshold
        .ALMOST_FULL_OFFSET       (13'h1fff),                 // Sets almost full threshold
        .DATA_WIDTH               (72),                       // Sets data width to 4-72
        .DO_REG                   (1),                        // Enable output register (1-0) Must be 1 if EN_SYN = FALSE
        .EN_ECC_READ              ("TRUE"),                   // Enable ECC decoder, FALSE, TRUE
        .EN_ECC_WRITE             ("TRUE"),                   // Enable ECC encoder, FALSE, TRUE
        .EN_SYN                   ("TRUE"),                   // Specifies FIFO as Asynchronous (FALSE) or Synchronous (TRUE)
        .FIFO_MODE                ("FIFO36_72"),              // Sets mode to "FIFO36" or "FIFO36_72" 
        .FIRST_WORD_FALL_THROUGH  ("FALSE"),                  // Sets the FIFO FWFT to FALSE, TRUE
        .INIT                     (72'h000000000000000000),   // Initial values on output port
        .SIM_DEVICE               ("7SERIES"),                // Must be set to "7SERIES" for simulation behavior
        .SRVAL                    (72'h000000000000000000)    // Set/Reset value for output port
    )
    pipeline_fifo_first (
        // ECC Signals: 1-bit (each) output: Error Correction Circuitry ports
        .DBITERR                  (),                         // 1-bit output: Double bit error status
        .ECCPARITY                (),                         // 8-bit output: Generated error correction parity
        .SBITERR                  (),                         // 1-bit output: Single bit error status
        // Read Data: 64-bit (each) output: Read output data
        .DO                       (toggling_bit_last_fifo [0] [63:0]),   // 64-bit output: Data output
        .DOP                      (toggling_bit_last_fifo [0] [71:64]),  // 8-bit output: Parity data output
        // Status: 1-bit (each) output: Flags and other FIFO status outputs
        .ALMOSTEMPTY              (),                         // 1-bit output: Almost empty flag
        .ALMOSTFULL               (),                         // 1-bit output: Almost full flag
        .EMPTY                    (fifo_empty [0]),           // 1-bit output: Empty flag
        .FULL                     (fifo_full  [0]),           // 1-bit output: Full flag
        .RDCOUNT                  (),                         // 13-bit output: Read count
        .RDERR                    (),                         // 1-bit output: Read error
        .WRCOUNT                  (),                         // 13-bit output: Write count
        .WRERR                    (),                         // 1-bit output: Write error
        // ECC Signals: 1-bit (each) input: Error Correction Circuitry ports
        .INJECTDBITERR            (1'b0),                     // 1-bit input: Inject a double bit error input
        .INJECTSBITERR            (1'b0),                     // 1-bit input: Inject a single bit error input
        // Read Control Signals: 1-bit (each) input: Read clock, enable and reset input signals
        .RDCLK                    (clock_max_fast),           // 1-bit input: Read clock
        .RDEN                     (fifo_full [1] == 1'b0),    // 1-bit input: Read enable
        .REGCE                    (1'b1),                     // 1-bit input: Clock enable
        .RST                      (1'b0),                     // 1-bit input: Reset
        .RSTREG                   (1'b0),                     // 1-bit input: Output register set/reset
        // Write Control Signals: 1-bit (each) input: Write clock and enable input signals
        .WRCLK                    (clock_max_fast),           // 1-bit input: Rising edge write clock.
        .WREN                     (duty_cycle_synchronized),  // 1-bit input: Write enable
        // Write Data: 64-bit (each) input: Write input data
        .DI                       ({64{toggling_bit_first}}), // 64-bit input: Data input
        .DIP                      ({8{toggling_bit_first}})   // 8-bit input: Parity input
    );

    for (pipeline_count_fifo = 1; pipeline_count_fifo < PIPE_COUNT_FIFO; pipeline_count_fifo = pipeline_count_fifo + 1) begin: per_pipeline_fifo
        FIFO36E1 #(
            .ALMOST_EMPTY_OFFSET      (13'h0000),                 // Sets the almost empty threshold
            .ALMOST_FULL_OFFSET       (13'h1fff),                 // Sets almost full threshold
            .DATA_WIDTH               (72),                       // Sets data width to 4-72
            .DO_REG                   (1),                        // Enable output register (1-0) Must be 1 if EN_SYN = FALSE
            .EN_ECC_READ              ("TRUE"),                   // Enable ECC decoder, FALSE, TRUE
            .EN_ECC_WRITE             ("TRUE"),                   // Enable ECC encoder, FALSE, TRUE
            .EN_SYN                   ("TRUE"),                   // Specifies FIFO as Asynchronous (FALSE) or Synchronous (TRUE)
            .FIFO_MODE                ("FIFO36_72"),              // Sets mode to "FIFO36" or "FIFO36_72" 
            .FIRST_WORD_FALL_THROUGH  ("FALSE"),                  // Sets the FIFO FWFT to FALSE, TRUE
            .INIT                     (72'h000000000000000000),   // Initial values on output port
            .SIM_DEVICE               ("7SERIES"),                // Must be set to "7SERIES" for simulation behavior
            .SRVAL                    (72'h000000000000000000)    // Set/Reset value for output port
        )
        pipeline_fifo (
            // ECC Signals: 1-bit (each) output: Error Correction Circuitry ports
            .DBITERR                  (),                         // 1-bit output: Double bit error status
            .ECCPARITY                (),                         // 8-bit output: Generated error correction parity
            .SBITERR                  (),                         // 1-bit output: Single bit error status
            // Read Data: 64-bit (each) output: Read output data
            .DO                       (toggling_bit_last_fifo [pipeline_count_fifo] [63:0]),   // 64-bit output: Data output
            .DOP                      (toggling_bit_last_fifo [pipeline_count_fifo] [71:64]),  // 8-bit output: Parity data output
            // Status: 1-bit (each) output: Flags and other FIFO status outputs
            .ALMOSTEMPTY              (),                         // 1-bit output: Almost empty flag
            .ALMOSTFULL               (),                         // 1-bit output: Almost full flag
            .EMPTY                    (fifo_empty [pipeline_count_fifo]),           // 1-bit output: Empty flag
            .FULL                     (fifo_full  [pipeline_count_fifo]),           // 1-bit output: Full flag
            .RDCOUNT                  (),                         // 13-bit output: Read count
            .RDERR                    (),                         // 1-bit output: Read error
            .WRCOUNT                  (),                         // 13-bit output: Write count
            .WRERR                    (),                         // 1-bit output: Write error
            // ECC Signals: 1-bit (each) input: Error Correction Circuitry ports
            .INJECTDBITERR            (1'b0),                     // 1-bit input: Inject a double bit error input
            .INJECTSBITERR            (1'b0),                     // 1-bit input: Inject a single bit error input
            // Read Control Signals: 1-bit (each) input: Read clock, enable and reset input signals
            .RDCLK                    (clock_max_fast),           // 1-bit input: Read clock
            .RDEN                     (fifo_full [pipeline_count_fifo+1] == 1'b0),    // 1-bit input: Read enable
            .REGCE                    (1'b1),                     // 1-bit input: Clock enable
            .RST                      (1'b0),                     // 1-bit input: Reset
            .RSTREG                   (1'b0),                     // 1-bit input: Output register set/reset
            // Write Control Signals: 1-bit (each) input: Write clock and enable input signals
            .WRCLK                    (clock_max_fast),           // 1-bit input: Rising edge write clock.
            .WREN                     (fifo_empty [pipeline_count_fifo-1] == 1'b0),  // 1-bit input: Write enable
            // Write Data: 64-bit (each) input: Write input data
            .DI                       (toggling_bit_last_fifo [pipeline_count_fifo-1] [63:0]), // 64-bit input: Data input
            .DIP                      (toggling_bit_last_fifo [pipeline_count_fifo-1] [71:64])   // 8-bit input: Parity input
        );
    end

endgenerate
endmodule

fpgacpu.ca