Verilog Coding Standard

from FPGA Resources by GateForge Consulting Ltd.

This is a continually evolving document. Please send any feedback to eric@fpgacpu.ca or @elaforest.

Over time, I've found that most of the difficulty with Verilog is a problem with programming practice which goes away with a certain coding style, which I describe here. It's not perfect, but it works and makes Verilog programming easier, more pleasant, and more reliable.

It matters to have a Verilog coding standard, because Verilog doesn't matter. We want to quickly get past the nuts-and-bolts of Verilog itself so we can focus on the system we are trying to build. I describe the related system design practice in the companion System Design Standard.

Scope

This standard was written with FPGAs in mind so some details may not apply, or may be missing, in relation to ASICs. Nonetheless, the standard should still largely be applicable to ASIC design.

This standard mainly defines a restricted form of Verilog which uses a limited number of programming idioms. These restrictions reduce bugs, make the code easier to understand, and aim to give consistent synthesis results across CAD tools, whether the code is manually written or machine-generated.

This standard does not define any typesetting conventions such as brace style, indentation, variable naming, comment format, etc... You will see code examples typeset in a style I recommend, but it is not essential. Like regular written language prose, anything that helps readability is good.

Nor does this standard define any testing, simulation, verification, or validation methodologies (though I hope you have some!). However, it will address some pitfalls and low-level details.

Finally, this standard assumes you are already familiar with Verilog and digital design in general. It is not a language or tool reference.

Verilog Language Versions

Use Verilog-2001, specifically its synthesizable subset, as it's well supported across CAD tools. Anything not synthesizable is for simulation/verification and outside the scope of this standard.

The previous version, Verilog-1995, lacks several useful features (named port connections, vector part select, generate blocks, etc...) without which your code will get longer and more difficult. Avoid mixing Verilog-1995 forms into Verilog-2001 code, despite the backwards compatibility.

The following version, SystemVerilog, brings in too many features and thus has uneven support across CAD tools. It does have desirable features (interfaces, parameterizable port instances, structs, enums, etc...), but until support matures and a similarly restricted, synthesizable subset can be agreed upon, it should be avoided or used for simulation/verification (which it is well suited for!).

Definitions and Inclusions

Use `include directives only to bring in common function definitions which, due to a limitation of Verilog-2001, must be defined inside the body of a module. A common example is an integer logarithm function to calculate address width. Place the function in a separate file and include it at the start of the body of modules which need it.
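As a minimal sketch of this pattern, assume a hypothetical file named clog2_function.vh which holds only the function. The file name, the function name clog2, and the Address_Counter module are illustrative and not part of this standard.

```verilog
// File: clog2_function.vh (hypothetical name)
// Ceiling of log2(value), usable in constant expressions.
// Returns 0 when value is 0 or 1.

function integer clog2;
    input integer value;
    integer remaining;
    begin
        remaining = value - 1;
        for (clog2 = 0; remaining > 0; clog2 = clog2 + 1) begin
            remaining = remaining >> 1;
        end
    end
endfunction
```

A module which needs the function would then include the file at the start of its body and call it in a localparam. This sketch assumes DEPTH is at least 2, so the address is at least 1 bit wide.

```verilog
`default_nettype none

module Address_Counter
#(
    parameter DEPTH = 0
)
(
    input  wire clock,
    output reg  wrapped
);

    `include "clog2_function.vh"

    initial begin
        wrapped = 1'b0;
    end

    localparam                  ADDR_WIDTH = clog2(DEPTH);
    localparam                  ADDR_ZERO  = {ADDR_WIDTH{1'b0}};
    localparam [ADDR_WIDTH-1:0] ADDR_ONE   = 'd1;
    localparam [ADDR_WIDTH-1:0] ADDR_LAST  = DEPTH - 1;

    reg [ADDR_WIDTH-1:0] address = ADDR_ZERO;

    // Count from 0 to DEPTH-1, then wrap around and flag it.
    always @(posedge clock) begin
        address <= (address == ADDR_LAST) ? ADDR_ZERO : address + ADDR_ONE;
        wrapped <= (address == ADDR_LAST);
    end

endmodule
```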

Use `define directives only for values which remain valid and constant everywhere and always in the design, don't depend on any other less-constant values, and can't be passed as module parameters (e.g.: opcode encodings for ALU functional units). Place these definitions into a separate file and `include it at the beginning of files which need them. Otherwise, construct constants within a module using localparam.
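For illustration, here is a sketch of such a definitions file, assuming a hypothetical alu_opcodes.vh with made-up encodings. The include guard keeps the definitions from being redefined if several files include it.

```verilog
// File: alu_opcodes.vh (hypothetical name and encodings)
`ifndef ALU_OPCODES_VH
`define ALU_OPCODES_VH

`define ALU_OPCODE_WIDTH 2
`define ALU_OPCODE_ADD   2'd0
`define ALU_OPCODE_SUB   2'd1
`define ALU_OPCODE_AND   2'd2
`define ALU_OPCODE_XOR   2'd3

`endif
```

A file using these definitions (the Tiny_ALU module below is also hypothetical) includes them at the top, and still builds its own constants with localparam:

```verilog
`include "alu_opcodes.vh"
`default_nettype none

module Tiny_ALU
#(
    parameter WORD_WIDTH = 0
)
(
    input  wire [`ALU_OPCODE_WIDTH-1:0] opcode,
    input  wire [WORD_WIDTH-1:0]        operand_a,
    input  wire [WORD_WIDTH-1:0]        operand_b,
    output reg  [WORD_WIDTH-1:0]        result
);

    // Module-local constants stay as localparams
    localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

    initial begin
        result = WORD_ZERO;
    end

    // XOR is the catch-all; each matching opcode overrides the previous value.
    always @(*) begin
        result = operand_a ^ operand_b;
        result = (opcode == `ALU_OPCODE_ADD) ? operand_a + operand_b : result;
        result = (opcode == `ALU_OPCODE_SUB) ? operand_a - operand_b : result;
        result = (opcode == `ALU_OPCODE_AND) ? operand_a & operand_b : result;
    end

endmodule
```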

Basic Types and Values

All variables, modules, constants, etc... must be defined before they are used in a file, going from top to bottom. If you have a wire connecting the output of a later module to the input of an earlier module, define the wire first, before the earliest module using it.

Defaults

Use `default_nettype none at the start of each file, before module definition. This causes any undefined variable to be an error. Otherwise, accidentally undefined variables become 1-bit wires, which may cause subtle bugs during synthesis.

For code you cannot control which assumes a default net type (e.g. vendor-provided models and IP blocks), wrap the code as follows inside the provided source file:

`default_nettype wire

...

`default_nettype none

Initialization and Logic Values

Use only reg and wire types. All signals are registers except where a wire is mandatory: module input ports, connecting module ports together, and inferring tri-state I/O.

Only use 0,1,X,Z values, and avoid X and Z if at all possible. All registers must be initialized (and reset, if necessary) with a value which does not contain X or Z, either at definition or using an initial block. Do not assign X to registers or wires, as X propagation will make testing difficult, and isn't supported in 2-state simulators (e.g.: Verilator). Instead, have a default catch-all or fall-through logic expression which takes effect when all others don't. The same idea applies to Boolean tests. Z is only used to define tri-state I/O.
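As a sketch of the catch-all idea, consider a hypothetical three-input selector where one selector encoding is unused. Rather than assigning X for the unused encoding, the logic starts from a defined default and lets each matching case override it. The module and signal names are illustrative.

```verilog
`default_nettype none

module Three_Way_Select
#(
    parameter WORD_WIDTH = 0
)
(
    input  wire [1:0]            selector,
    input  wire [WORD_WIDTH-1:0] in_0,
    input  wire [WORD_WIDTH-1:0] in_1,
    input  wire [WORD_WIDTH-1:0] in_2,
    output reg  [WORD_WIDTH-1:0] selected
);

    localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

    initial begin
        selected = WORD_ZERO;
    end

    // The unused selector value (2'd3) falls through to WORD_ZERO, never X.
    always @(*) begin
        selected = WORD_ZERO;
        selected = (selector == 2'd0) ? in_0 : selected;
        selected = (selector == 2'd1) ? in_1 : selected;
        selected = (selector == 2'd2) ? in_2 : selected;
    end

endmodule
```

Choosing zero as the default is an assumption of this sketch; any value free of X and Z works, and picking one of the existing inputs as the catch-all can save logic.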

Parameterization

Wherever possible, parameterize the width of your variables rather than using constants. It's more work initially, but makes modules more general and avoids tedious and error-prone edits when something inevitably changes. If the width can be set by the user, then use a module parameter. If it's an internal module constant, use a localparam. Very, very rarely, use a `define constant. Never use a hardcoded literal integer: name it with a localparam instead.

Unfortunately, in Verilog-2001, module parameters and localparams are not valid width specifiers for literal values, only `define constants and literal integers are. This makes initializing registers of parameterized width with values of the exact same width impossible, and it's unclear if the implicit width-extension for values like 'b0 or 'd1 works correctly past 32 or 64 bits width, depending on the CAD tool. As a workaround, use concatenation and replication to create the value you need of the proper bit width.

// Unclear if extended with zeros or Xs past 32 or 64 bits
localparam WORD_WIDTH = 72;
reg [WORD_WIDTH-1:0] foo = 'b0;

// Not legal
localparam WORD_WIDTH = 72;
reg [WORD_WIDTH-1:0] foo = WORD_WIDTH'b0;

// Legal workarounds
localparam WORD_WIDTH     = 72;
localparam WORD_ZERO      = {WORD_WIDTH{1'b0}};
localparam WORD_ONE       = {{WORD_WIDTH-1{1'b0}},1'b1};
localparam WORD_MINUS_ONE = ~WORD_ZERO;

reg [WORD_WIDTH-1:0] foo  = WORD_ZERO;

// Also legal to define bit width
localparam [WORD_WIDTH-1:0] WORD_CONSTANT = `SOME_INTEGER_CONSTANT;

Bit Widths

Always match bit widths of variable assignments. Even with correct implicit zero/sign-extension, if the source and sink have different width, it will raise a pointless warning in the CAD tool, obscuring other more important warnings. And if the extension is incorrect, it will cause subtle bugs. As before, concatenation and replication can avoid this problem, and then any bit width mismatch warnings become significant.

// Common

localparam SOURCE_WIDTH = 16;
localparam SINK_WIDTH   = 32;
localparam PAD_WIDTH    = SINK_WIDTH - SOURCE_WIDTH;
localparam SOURCE_ZERO  = {SOURCE_WIDTH{1'b0}};
localparam SINK_ZERO    = {SINK_WIDTH{1'b0}};
localparam PAD_ZERO     = {PAD_WIDTH{1'b0}};

reg [SOURCE_WIDTH-1:0] source = SOURCE_ZERO;
reg [SINK_WIDTH-1:0]   sink   = SINK_ZERO;

// Zero extension: pad only the missing upper bits

always @(*) begin
    sink <= {PAD_ZERO, source};
end

// Signed extension: replicate the sign bit into the missing upper bits

// Not a localparam since it's not a constant value
reg source_sign = 1'b0;

always @(*) begin
    source_sign = source[SOURCE_WIDTH-1];
    sink        = {{PAD_WIDTH{source_sign}}, source};
end

Concatenations

Use concatenations to pack/unpack values instead of selecting ranges of bits, where possible. Concatenating ranges of bits also works. This idiom is very useful to meaningfully extract fields from raw data words (and to concisely document the field format), to group related signals into a single wide pipeline stage register, or to do fixed permutations of bits.

localparam A_WIDTH      = 1;
localparam B_WIDTH      = 2;
localparam C_WIDTH      = 3;
localparam PACK_WIDTH   = A_WIDTH + B_WIDTH + C_WIDTH;

reg [A_WIDTH-1:0]       foo;
reg [B_WIDTH-1:0]       bar;
reg [C_WIDTH-1:0]       baz;
reg [PACK_WIDTH-1:0]    all;

// Pack fields into a word
always @(*) begin
    all = {foo,bar,baz};
end

// Unpack the fields
always @(*) begin
    {foo,bar,baz} = all;
end

Module Definition

Parameters

When defining a module, give all parameters a default value of zero or an empty string, to both tell the user what kind of value is expected, and also make module elaboration fail if any of the parameters are not set at module instantiation.

Use module parameters to define the bit widths of the module ports, as localparams can only be defined in a module body, which is too late. If a width is globally constant, or calculated from other module parameters and not directly provided by the user, then use another module parameter to compute and hold the value, and place a comment noting that this parameter is not to be used at module instantiation. This is commonly done when a parameterized number of items are passed to a module via concatenation, since Verilog-2001 ports are always present (no conditional instantiation) and simple bit vectors (no arrays of items).

Ports

Always define the direction (input, output, inout), type (wire or reg), and name of each module port to reduce boilerplate code, hint at the structure of the module to help the user, and to avoid some synthesis and simulation problems:

If you have any register output ports, use an initial block immediately after the module port definitions to initialize them to their startup value, as ports cannot be initialized at definition like register variables.

`default_nettype none

module Foo
#(
    parameter INPUT_WIDTH = 0,
    parameter INPUT_COUNT = 0,

    // Do not set at instantiation
    parameter TOTAL_WIDTH = INPUT_WIDTH * INPUT_COUNT,
    parameter CONST_WIDTH = `SOME_GLOBAL_WIDTH
)
(
    input  wire                     clock,
    input  wire [TOTAL_WIDTH-1:0]   this_input,
    output wire [CONST_WIDTH-1:0]   that_output,
    output reg                      another_output,
    ...
);

    initial begin
        another_output = 1'b0;
    end

...

endmodule

When a module is moderately complicated and the clearest name of an internal signal does not match the module port name it connects to, decouple the names as follows, assuming you haven't already implicitly done so via a final pipeline register stage.

// Port must be a reg type instead of wire, which denotes a special case to the reader,
// and must be initial'ized for proper simulation.
always @(*) begin
    port_name <= internal_name;
end

Use port name suffixes to denote if they are inputs or outputs only when there is unavoidable ambiguity (usually for very simple modules with data_in and data_out ports or similar). See a note by @tom_verbeure on why _o and _i port name suffixes can cause problems.

Procedural Blocks and Assignments

Assign Statements

Avoid the use of assign statements, except where demanded by Verilog, as it favours poor coding style. I have seen assigns used as global variables all over a module, and all too often, as an amorphous large number of assigns to define logic followed by a single always block to register all the outputs. These coding styles make it hard to see the underlying design, and make the implementation hard to control for size and performance. There are exactly two cases where you must use assign:

Always Blocks

For synchronous designs, there are only two events normally needed to trigger an always block: @(posedge clock) (clocked) and @(*) (combinational). The only use of @(negedge clock) is for capturing incoming I/O data which is sent out on the posedge of the I/O clock, so as to sample the data close to the center (in the absence of programmable delay lines on the I/O). The one exception where posedge and negedge logic may be (carefully!) mixed is in Clock Domain Crossing (CDC) circuits.
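A minimal sketch of the negedge capture case, assuming the sender launches io_data on the rising edge of io_clock and that the same clock drives the receiving logic. The module and signal names are hypothetical, and real designs must still meet the half-cycle timing between the two registers.

```verilog
`default_nettype none

module Input_Capture
#(
    parameter WORD_WIDTH = 0
)
(
    input  wire                  io_clock,
    input  wire [WORD_WIDTH-1:0] io_data,
    output reg  [WORD_WIDTH-1:0] captured_data
);

    localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

    initial begin
        captured_data = WORD_ZERO;
    end

    reg [WORD_WIDTH-1:0] io_data_negedge = WORD_ZERO;

    // Sample half a cycle after launch, near the center of the data window.
    always @(negedge io_clock) begin
        io_data_negedge <= io_data;
    end

    // Bring the sample back to posedge logic for the rest of the design.
    always @(posedge io_clock) begin
        captured_data <= io_data_negedge;
    end

endmodule
```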

Blocking and Non-Blocking Assignments

By default, all assignments in any always block should be non-blocking (<=). Contrary to common wisdom, and CAD tool warnings, this is always legal. Non-blocking assignments make the parallelism of the assignments easier to think about, and give blocking assignments (=) a specific meaning: to break down complex logic into simpler expressions we can then think about sequentially.

Blocking assignments should only be used in combinational blocks. Although it's legal when used with clocked logic, the CAD tool considers the destination of each blocking assignment as an unused register if the value is later used in the same always block (since the value is used before the next clock edge, thus not from the output of the inferred register), which then raises a useless warning. Thus, while simple logic can be computed and assigned in a single line in a clocked always block, it's often easier to implement more complex logic in a combinational block using blocking assignments and then register the results in an immediately following clocked always block using non-blocking assignments.

Given the above design rules, it's easy to selectively pipeline logic by having the second always block be clocked or not, without altering the logic or the layout of the code. It also gives an easy way to estimate how much logic will be placed between registers, and thus get an early grasp on critical paths.

Do not mix blocking and non-blocking assignments within an always block. It's legal, but not necessary and error-prone.

So we do end up with the same guidelines as common wisdom, but now they have a reason behind them.

// Breaking down nested ternary operators into two simpler lines,
// with unrelated and parallel logic alongside. 

always @(*) begin
    part_one        = (cond1 == 1'b1) ? foo : bar;
    part_two        = (cond2 == 1'b0) ? baz : part_one;
    other_result    = wibble1 ^ wibble2;
end

// If we don't want to register these values,
// simply change the block trigger to @(*).

always @(posedge clock) begin
    part_two_reg        <= part_two;
    other_result_reg    <= other_result;
    another_result      <= blob1 & blob2;
end

Logic Design

Boolean Expressions

Express Boolean values behaviourally as equality/inequality tests against the expected value, which clarifies the intent of the code, removes the need to understand the polarity of each logic signal, and makes the bit width explicit. If you must invert a comparison, be sure to use the logical negation operator (!), which always returns 1 bit, rather than the bitwise negation (~) of all bits.

// Rather than this
always @(*) begin
    C <= A & ~B;
end

// Do this
always @(*) begin
    C <= (A == 1'b1) && (B == 1'b0);
end

if/else vs. ternary operators

Unless otherwise necessary, use ternary operators (?:) instead of if/else statements. There are four main reasons:

On the other hand, if/else is necessary to conditionally instantiate logic in generate blocks, required by some vendor code templates to infer specific hardware, and unavoidable for some reset code.
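To illustrate the generate block case, here is a sketch of a hypothetical module whose output is either registered or combinational depending on a parameter. Only one branch is elaborated, so only one always block drives data_out. The module name and the REGISTERED parameter are assumptions made for this example.

```verilog
`default_nettype none

module Optional_Register
#(
    parameter WORD_WIDTH = 0,
    parameter REGISTERED = 0   // 1: registered output, 0: combinational output
)
(
    input  wire                  clock,
    input  wire [WORD_WIDTH-1:0] data_in,
    output reg  [WORD_WIDTH-1:0] data_out
);

    localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

    initial begin
        data_out = WORD_ZERO;
    end

    generate
        if (REGISTERED == 1) begin : registered_output
            always @(posedge clock) begin
                data_out <= data_in;
            end
        end
        else begin : combinational_output
            always @(*) begin
                data_out <= data_in;
            end
        end
    endgenerate

endmodule
```

When REGISTERED is 0 the clock input is unused, which some CAD tools will report as a warning.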

Never nest ternary operators, where one term of a ternary operator is itself a ternary operator. That makes for unreadable code. Instead, split the logic into two blocking assignments, with the second ternary operator using the output of the first as one of its terms. This style gives you useful intermediate value signals during simulation, extends to an arbitrary number of expressions which we can easily reason about in sequence, and becomes a useful programming pattern for FSMs and other complex logic.

// Rather than this...
always @(*) begin
    result <= (foo == 1'b1) ? ((bar == 1'b1) ? A : B) : C;
end

// ...do this!
always @(*) begin
    partial = (bar == 1'b1) ? A       : B;
    result  = (foo == 1'b1) ? partial : C;
end

Estimating Logic Usage and Speed

When designing logic, keep track of how many unique inputs are needed to generate the output, and match that to the target FPGA Look-Up Tables (LUTs). For example, if the FPGA has 6-input LUTs (6-LUTs), then any logic expression of up to 6 terms can (and usually will) map to a single 6-LUT per bit of output width. If you construct your logic as a series of expressions of 6 or fewer terms with registers in between, then you minimize the logic and interconnect delay, and give the CAD tool more freedom to place and route.

Keeping the number of unique input terms in mind particularly applies to multiplexers. A 4:1 mux has 6 input terms (4 input bits and 2 selector bits) and so maps exactly to one 6-LUT per result bit, and can be registered "for free". If you want to maximize speed, be wary of multiplexers wider than 8:1, and avoid designing logic as a single large selection from many options: better to pipeline a sequence of smaller selections.

I'm glossing over fracturable LUTs and logic packing here, but those are things we can usually take for granted from the CAD tool.
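As a sketch of pipelining a sequence of smaller selections, here is a hypothetical 16:1 multiplexer built as two registered stages of 4:1 multiplexers, so each stage needs at most 6 input terms per output bit. The module name, port names, and the packing of 16 words into one vector (word 0 in the lowest bits) are assumptions of this example. The result appears two clock cycles after the inputs.

```verilog
`default_nettype none

module Mux_16to1_Pipelined
#(
    parameter WORD_WIDTH  = 0,

    // Do not set at instantiation
    parameter TOTAL_WIDTH = WORD_WIDTH * 16
)
(
    input  wire                   clock,
    input  wire [3:0]             selector,
    input  wire [TOTAL_WIDTH-1:0] words_in,
    output reg  [WORD_WIDTH-1:0]  word_out
);

    localparam GROUP_COUNT = 4;
    localparam GROUP_SIZE  = 4;
    localparam STAGE_WIDTH = WORD_WIDTH * GROUP_COUNT;
    localparam WORD_ZERO   = {WORD_WIDTH{1'b0}};
    localparam STAGE_ZERO  = {STAGE_WIDTH{1'b0}};

    initial begin
        word_out = WORD_ZERO;
    end

    reg [STAGE_WIDTH-1:0] first_stage   = STAGE_ZERO;
    reg [1:0]             selector_high = 2'd0;

    // First stage: four 4:1 muxes, each selecting within one group of 4 words
    // using the low selector bits.
    genvar i;

    generate
        for (i = 0; i < GROUP_COUNT; i = i + 1) begin : first_stage_mux
            always @(posedge clock) begin
                first_stage[(WORD_WIDTH*i) +: WORD_WIDTH] <= words_in[(WORD_WIDTH*((GROUP_SIZE*i) + selector[1:0])) +: WORD_WIDTH];
            end
        end
    endgenerate

    // Second stage: one 4:1 mux selecting between the groups using the high
    // selector bits, delayed one cycle to stay aligned with the first stage.
    always @(posedge clock) begin
        selector_high <= selector[3:2];
        word_out      <= first_stage[(WORD_WIDTH*selector_high) +: WORD_WIDTH];
    end

endmodule
```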

State Machines

Separate your FSM from your data processing, which also enables you to break a larger FSM into smaller ones. In other words, don't place the state and data processing logic together into a single large case statement, with one case per FSM state, and nested if/else statements inside each case.

Instead, take advantage of the sequential ordering of blocking assignments and of ternary operators, using the idiom of starting with a register and sequentially testing and passing along its updated value in a combinational always block, and registering the updated value in an immediately following clocked always block. If none of the conditions are met, the register remains unchanged.

To illustrate this idiom, here is a contrived (untested!) example of a controller which takes a start signal, then enables two external modules and waits for them to be done. When they are done, the controller deasserts the enables, raises a reset signal to the external modules, and waits for them to respond by deasserting their done signals. The controller then waits for another start signal. If both of the external modules fail to reset, the controller raises an error and stays put. Each of the external modules would contain a similar FSM module driving a data processing module, and not have to concern itself with the sequencing and error detection this controller module does.

module Controller
// No parameters
(
    input   wire    clock,
    input   wire    reset,

    input   wire    start,
    input   wire    part_1_done,
    input   wire    part_2_done,
    output  reg     enable_out,
    output  reg     reset_out,
    output  reg     error_out
);

// --------------------------------------------------------------------

    initial begin
        enable_out <= 1'b0;
        reset_out  <= 1'b0;
        error_out  <= 1'b0;
    end

// --------------------------------------------------------------------
// Let's define our states. Binary encoding.

    localparam STATE_WIDTH = 2;

    localparam [STATE_WIDTH-1:0] STATE_START  = 'd0;
    localparam [STATE_WIDTH-1:0] STATE_DOING  = 'd1;
    localparam [STATE_WIDTH-1:0] STATE_DONE   = 'd2;
    localparam [STATE_WIDTH-1:0] STATE_ERROR  = 'd3;

    reg [STATE_WIDTH-1:0] state         = STATE_START;
    reg [STATE_WIDTH-1:0] state_next    = STATE_START;

// --------------------------------------------------------------------
// Calculate the control signals

    reg transaction_complete = 1'b0;
    reg enable_next          = 1'b0;
    reg reset_next           = 1'b0;
    reg error_next           = 1'b0;

    always @(*) begin
        transaction_complete = (part_1_done == 1'b1)          && (part_2_done == 1'b1);
        enable_next          = (transaction_complete == 1'b0) && (state == STATE_DOING);
        reset_next           = (transaction_complete == 1'b1) && (state == STATE_DONE);
        error_next           =                                   (state == STATE_ERROR);
    end

    always @(posedge clock) begin
        enable_out <= (reset == 1'b1) ? 1'b0 : enable_next;
        reset_out  <= (reset == 1'b1) ? 1'b0 : reset_next;
        error_out  <= (reset == 1'b1) ? 1'b0 : error_next;
    end

// --------------------------------------------------------------------
// Do our state transitions. Listed in sequence.

    always @(*) begin
        state_next = (state == STATE_START) && (start == 1'b1)                ? STATE_DOING : state;
        state_next = (state == STATE_DOING) && (transaction_complete == 1'b1) ? STATE_DONE  : state_next;
        state_next = (state == STATE_DONE)  && (transaction_complete == 1'b0) ? STATE_START : state_next;
        state_next = (state == STATE_START) && (transaction_complete == 1'b1) ? STATE_ERROR : state_next;
    end 

    always @(posedge clock) begin
        state <= (reset == 1'b1) ? STATE_START : state_next;
    end

endmodule

Resets

(Resets are one place where FPGA and ASIC design practices diverge.)

Make use of the implicit power-on-reset in FPGAs, limit the number of things which need an explicit reset signal, and keep that reset synchronous to the clock. This means passing an external reset through a CDC synchronizer.
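A minimal sketch of such a synchronizer follows, assuming an active-high external reset which stays asserted for longer than one clock period, and assuming two register stages are sufficient for the target device. The module and signal names are hypothetical, and any vendor-specific attributes for synchronizer registers are omitted.

```verilog
`default_nettype none

module Reset_Synchronizer
// No parameters
(
    input  wire clock,
    input  wire reset_external,     // asynchronous to clock
    output reg  reset_synchronized  // synchronous to clock
);

    initial begin
        reset_synchronized = 1'b0;
    end

    reg reset_meta = 1'b0;

    // The first register may go metastable; the second gives it a clock
    // cycle to settle before the rest of the design sees the reset.
    always @(posedge clock) begin
        reset_meta         <= reset_external;
        reset_synchronized <= reset_meta;
    end

endmodule
```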

Asynchronous resets are an exception, when a clock isn't available (e.g.: PLL reset).

The configuration bitstream of an FPGA includes the initial state of all registers and (most) on-chip memories, so most logic does not require a reset signal to properly start operating. Set the initial state of registers by assigning it at declaration, or assigning it inside an initial block for module register output ports. Most on-chip memories can be similarly initialized with a known content via a $readmemh() system task, or some code in an initial block. See Ken Chapman's Get Smart About Reset: Think Local, Not Global for details.
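As a sketch of both kinds of initialization, here is a hypothetical single-port memory whose read register is initialized in an initial block and whose contents are loaded from a hex file. The module name, parameter names, and the assumption that the CAD tool accepts a string parameter as the file name are all specific to this example. Note the if statement for the write enable, which is one of the vendor-template cases where if/else is expected.

```verilog
`default_nettype none

module Initialized_Memory
#(
    parameter WORD_WIDTH = 0,
    parameter ADDR_WIDTH = 0,
    parameter INIT_FILE  = "",

    // Do not set at instantiation
    parameter DEPTH      = 2**ADDR_WIDTH
)
(
    input  wire                  clock,
    input  wire                  write_enable,
    input  wire [ADDR_WIDTH-1:0] address,
    input  wire [WORD_WIDTH-1:0] write_data,
    output reg  [WORD_WIDTH-1:0] read_data
);

    localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

    initial begin
        read_data = WORD_ZERO;
    end

    reg [WORD_WIDTH-1:0] memory [0:DEPTH-1];

    // Becomes part of the initial memory contents in the bitstream.
    initial begin
        $readmemh(INIT_FILE, memory);
    end

    // A read during a write to the same address returns the old data.
    always @(posedge clock) begin
        if (write_enable == 1'b1) begin
            memory[address] <= write_data;
        end
        read_data <= memory[address];
    end

endmodule
```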

The common idiom for resets uses an if/else statement, where each register must be assigned a value in both clauses (else the register holds its value during reset, which turns the reset into a clock-enable for that register), which means all registers must be reset, maximizing the size of the reset tree, which uses more routing and makes timing harder to meet.

// Common idiom, which has a problem...
always @(posedge clock) begin
    if (reset == 1'b1) begin
        foo <= FOO_RESET;
        bar <= BAR_RESET; // but does not need a reset!
    end
    else begin
        foo <= foo_next;
        bar <= bar_next;
    end
end

If you must reset some registers, use one of the following idioms instead to minimize the size of the reset tree:

// Ternary operator assignment for reset
always @(posedge clock) begin
    foo <= (reset == 1'b1) ? FOO_RESET : foo_next;
    bar <= bar_next;
end

// Using "last assignment wins" semantics for reset
always @(posedge clock) begin
    foo <= foo_next;
    bar <= bar_next;

    if (reset == 1'b1) begin
        foo <= FOO_RESET;
    end
end

Credit for the "last assignment wins" idiom goes to Olof Kindgren (@olofkindgren): Resetting reset handling. This idiom also applies to VHDL.

Also see this Twitter thread, which discusses a subtlety of "last assignment wins" involving non-blocking assignments and ternary operators. Credit to Clifford Wolf (@oe1cxw).

Reset Sequencing

Various parts of a design may have to stay in reset for a minimum amount of time to properly initialize (e.g.: 200us for DDR2 RAM), and have to come out of reset in a certain order to avoid receiving undefined signals. You can sequence the resets by creating delayed copies of the power-on-reset with a few counters driven by the main clock, with the count values set as top-level design module parameters.
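One possible sketch of this sequencing uses a single shared counter with two thresholds, rather than separate counters, to release two resets in order after power-on. It relies on the FPGA power-on initial values to start with both resets asserted. The module name, parameter names, and the requirements that SECOND_DELAY is at least FIRST_DELAY and that both fit in COUNTER_WIDTH bits are assumptions of this example.

```verilog
`default_nettype none

module Reset_Sequencer
#(
    parameter COUNTER_WIDTH = 0,
    parameter FIRST_DELAY   = 0,  // in clock cycles
    parameter SECOND_DELAY  = 0   // in clock cycles, >= FIRST_DELAY
)
(
    input  wire clock,
    output reg  reset_first,
    output reg  reset_second
);

    // Both resets are asserted from power-on.
    initial begin
        reset_first  = 1'b1;
        reset_second = 1'b1;
    end

    localparam                     COUNTER_ZERO = {COUNTER_WIDTH{1'b0}};
    localparam [COUNTER_WIDTH-1:0] COUNTER_ONE  = 'd1;
    localparam [COUNTER_WIDTH-1:0] FIRST_COUNT  = FIRST_DELAY;
    localparam [COUNTER_WIDTH-1:0] SECOND_COUNT = SECOND_DELAY;

    reg [COUNTER_WIDTH-1:0] counter      = COUNTER_ZERO;
    reg [COUNTER_WIDTH-1:0] counter_next = COUNTER_ZERO;
    reg                     counting     = 1'b0;

    // Count up to the longest delay, then hold.
    always @(*) begin
        counting     = (counter != SECOND_COUNT);
        counter_next = (counting == 1'b1) ? counter + COUNTER_ONE : counter;
    end

    // Each reset is released once the counter reaches its threshold.
    always @(posedge clock) begin
        counter      <= counter_next;
        reset_first  <= (counter < FIRST_COUNT);
        reset_second <= (counter < SECOND_COUNT);
    end

endmodule
```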

Some Corner Cases

Simulated Clock Generation

In simulation, a race condition can exist at time zero between the initial value assignment of a register and the first clock edge. For example:

reg clock = 1'b0; // Counts as a negedge at time zero! (X -> 0)
reg foo   = 1'b0; // Also does X -> 0 at time zero.

// Simulate the clock
always begin
    #`HALF_PERIOD clock <= ~clock;
end

// It is unclear if the clock edge or the "foo" initialization will happen first,
// so "bar" can get X for one simulation cycle...
always @(negedge clock) begin
    bar <= foo;
end

This race condition is another reason to only use @(posedge clock) in internal logic, but the same race condition will happen if the simulation clock happens to be initialized to 1'b1.

Instead, the following clock generation idiom (credit: Clifford Wolf (@oe1cxw)) avoids the race condition by making use of undefined values and the case equality operator (===) instead of the logical equality operator (==):

// NOTE: clock is left uninitialized, and thus X in most simulators, and will
// not trigger a (X -> 0) edge until after the simulated clock half-period delay.
always begin
    #`HALF_PERIOD clock <= (clock === 1'b0);
end
