Computes one of the 256 possible three-variable (triadic) Boolean operations,
selectable at run-time, on the `A`

, `B`

, and `C`

input words, with optional
dual output.

We could implement a triadic Boolean operator in hardware the same way as we did the dyadic Boolean operator, but with an 8:1 multiplexer instead of a 4:1 multiplexer. However, unlike a 4:1 mux, an 8:1 mux doesn't fit into a 6-LUT on a modern FPGA, which distances our design from the underlying FPGA hardware: we might not be able to pipeline it as required for best operating speed, we are more at the mercy of the register retiming and logic packing of the CAD tool, and as we'll see, we lock ourselves out of a useful break in abstraction which gives a triadic operator special uses.

Instead, we can decompose a triadic Boolean function `f(A,B,C)`

into two
dyadic sub-functions based on the two possible values of `C`

: ```
g(A,B)
= f(A,B,0)
```

and `h(A,B) = f(A,B,1)`

. We can then reconstruct the original
triadic function like so: `f(A,B,C) = (g(A,B) & ~C) | (h(A,B) & C)`

: the
value of `C`

selects one of the two dyadic Boolean functions of `A`

and
`B`

. Note that `C`

selects bit-wise, not word-wise: each bit of `C`

selects the corresponding bit of `g(A,B)`

or `h(A,B)`

. This useful
mathematical identity is known as "Shannon Decomposition" or "Boole's
Expansion Theorem".

We can use a triadic operator to implement very diverse functions:

- Majority:
`AB | AC | BC`

, which is used in Triple Modular Redundancy (TMR). - Minority, which tells you if a TMR unit had a corrected error.
- Given
`A`

,`B`

, and their sum`A+B`

, you can recover the carries into each bit position like so:`A ^ B ^ (A+B)`

. This can allow you to find overflows in packed sub-word parallel arithmetic (e.g.: adding two 32-bit words as two vectors of 4 bytes) - Bitfield masking and swapping between
`A`

and`B`

, as selected by the bitmask`C`

.

When used to implement Boolean functions with 4 or more variables
(tetradic or n-adic), triadic functions approximately halve the number of steps
required compared to using ordinary dyadic functions. This is likely one
reason the NVIDIA Maxwell GPUs implemented triadic Boolean functions with
the `LOP3`

instruction, and Intel CPUs with AVX-512 support with the
`VPTERNLOG`

instruction.

However, if you look at the implementation of `f(A,B,C)`

above, we always
throw away half of the total work done by both dyadic halves `g(A,B)`

and
`h(A,B)`

. Discarded work is a clue that we are missing out on some
computational capacity or efficiency. So let's provide a way to output that
discarded work. We can add a second output multiplexer, driven by the same
functions `g(A,B)`

and `h(A,B)`

, but controlled by an inverted version of
`C`

, called `D`

, which outputs the bits not selected by `C`

. And, instead
of hardcoding `D`

as the inverse of `C`

, we can control it with a 1-bit
`dual`

signal. If `dual`

is not set, then `C`

and `D`

are the same, and
both triadic outputs are identical.

With two outputs, our triadic Boolean operator can now do more. When `dual`

is set:

- We can compute 2 independent dyadic Boolean functions.
- We can set
`g(A,B)`

and`h(A,B)`

to always output`A`

or`B`

, and depending on`C`

, send those outputs either straight through or crossed-over. This acts as a Banyan Switch, which is the building block of a lot of switching networks.

I have not gone through the trouble of defining all 256 triadic Boolean
operations, but given the definitions of the 16 possible dyadic Boolean
operations, you can easily create
triadic definitions as needed: write out the 8-entry truth table for your
desired triadic function, split the table into two 4-entry truth tables
with the most-significant input bit (`C`

, as above) always 1 in one table,
and always 0 in the other, then use those two dyadic definitions as
functions `g(A,B)`

and `h(A,B)`

as explained above.

You can, of course, repeat this process to implement tetradic or n-adic
functions, but the *size* of the truth tables grows exponentially
(2^{n}), and the *number* of truth tables grows super-exponentially
(65,536 possible tetradic functions, or 2^{2n} for
n-adic functions), so there are increasingly fewer functions of general
interest in that space, with some exceptions such as bit
reductions and word reductions.

`include "Dyadic_Boolean_Operations.vh" `default_nettype none module Triadic_Boolean_Operator #( parameter WORD_WIDTH = 0 ) ( input wire [`DYADIC_TRUTH_TABLE_WIDTH-1:0] dyadic_truth_table_1, input wire [`DYADIC_TRUTH_TABLE_WIDTH-1:0] dyadic_truth_table_2, input wire [WORD_WIDTH-1:0] word_A, input wire [WORD_WIDTH-1:0] word_B, input wire [WORD_WIDTH-1:0] word_C, input wire dual, output wire [WORD_WIDTH-1:0] result_1, output wire [WORD_WIDTH-1:0] result_2 ); localparam WORD_ZERO = {WORD_WIDTH{1'b0}};

First dyadic Boolean operation: `g(A,B)`

wire [WORD_WIDTH-1:0] g; Dyadic_Boolean_Operator #( .WORD_WIDTH (WORD_WIDTH) ) first ( .truth_table (dyadic_truth_table_1), .word_A (word_A), .word_B (word_B), .result (g) );

Second dyadic Boolean operation: `h(A,B)`

wire [WORD_WIDTH-1:0] h; Dyadic_Boolean_Operator #( .WORD_WIDTH (WORD_WIDTH) ) second ( .truth_table (dyadic_truth_table_2), .word_A (word_A), .word_B (word_B), .result (h) );

Now we select each bit from either `g(A,B)`

or `h(A,B)`

, giving us the
first result.

Multiplexer_Bitwise_2to1 #( .WORD_WIDTH (WORD_WIDTH) ) select_1 ( .bitmask (word_C), .word_in_0 (g), .word_in_1 (h), .word_out (result_1) );

Then, conditionally invert `word_C`

if `dual`

is set.

reg [WORD_WIDTH-1:0] word_D = WORD_ZERO; always @(*) begin word_D = (dual == 1'b1) ? ~word_C : word_C; end

And select again for the second result, but selected by `word_D`

, giving us
all the bits *not* selected by the previous multiplexer if `dual`

is set,
else the same bits.

Multiplexer_Bitwise_2to1 #( .WORD_WIDTH (WORD_WIDTH) ) select_2 ( .bitmask (word_D), .word_in_0 (g), .word_in_1 (h), .word_out (result_2) ); endmodule