Extracting INT8 Multipliers from INT18 Multipliers Bogdan Pasca, Martin Langhammer, Gregg Baeckler, Sergey Gribok Intel Corporation

Context
• Machine learning → increase density of small-precision arithmetic
• INT8 - commonly used for inferencing
• INT8-based block FP can also be used for training
• Logic-based multiplier for Intel FPGAs investigated in [1]
This work: extracting INT8 multipliers from commonly available INT18 multipliers
[1] High Density and Performance Multiplication for FPGA - Martin Langhammer, Gregg Baeckler - ARITH25 (2018)
Intel Corporation, INTEL PUBLIC, September 9, 2019

General Idea - partial product separation
Input mapping (18-bit multiplier inputs, 6-bit operands):
P = {b5,...,b0, 0,0,0,0,0,0, c5,...,c0} (B at weights 17..12, C at weights 5..0)
Q = {0,...,0, a5,...,a0} (A at weights 5..0)
Output O = P x Q = {o25,...,o0}:
Y = A · C = {y11,...,y0} occupies weights 11..0
Z = A · B = {z11,...,z0} occupies weights 23..12
The two products do not overlap, so each can be read directly from O.
What happens for inputs beyond 6 bits?
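The 6-bit case above can be checked with a few lines of Python. This is a minimal sketch of the packing idea, not the authors' implementation; the function name `pack_and_multiply_6bit` is made up for illustration, and ordinary Python integers stand in for the 18x18 hardware multiplier.

```python
def pack_and_multiply_6bit(a: int, b: int, c: int) -> tuple:
    """Return (Y, Z) = (a*c, a*b) from a single wide multiplication."""
    assert all(0 <= v < 64 for v in (a, b, c)), "inputs must be unsigned 6-bit"
    p = (b << 12) | c   # B at weights 17..12, six zeros, C at weights 5..0
    q = a               # A at weights 5..0, upper bits zero
    o = p * q           # O = Z * 2**12 + Y
    y = o & 0xFFF       # weights 11..0 hold Y = A * C
    z = o >> 12         # weights 23..12 hold Z = A * B
    return y, z
```

Because Y is at most 12 bits wide and Z starts at weight 12, the two slices never interact. Once the operands grow to 8 bits, B can only be shifted by 10 inside an 18-bit input, and the slices start to overlap.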

Unsigned Int8, shared input
• compute Y = A · C and Z = A · B using an 18x18 multiplier
• A, B and C 8-bit unsigned numbers
• the 18x18 multiplier is configured as an unsigned multiplier
• map A, B and C to the Int18 inputs:
P = {b7,...,b0, 0,0, c7,...,c0} (B at weights 17..10, C at weights 7..0)
Q = {0,...,0, a7,...,a0} (A at weights 7..0)
Output O = P x Q = {o25,...,o0}:
Y = {y15,...,y0} occupies weights 15..0
Z = {z15,...,z0} occupies weights 25..10
Y and Z overlap at weights 15..10.
How to obtain the rest of the bits of Y and Z?

Unsigned Int8, shared input
• Observe:
{o25,...,o10} = {y15,...,y10} + {z15,...,z0}
              = {z15,...,z6, y15,...,y10} + {z5,...,z0}
• Therefore:
{z15,...,z6, y15,...,y10} = {o25,...,o10} − {z5,...,z0}
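The identity above can be checked numerically. The sketch below is only an illustration under the slide's input mapping (B shifted left by 10, C in the low bits); `naive_slices` and `check_identity` are hypothetical helper names. It first shows that slicing O directly gives wrong upper bits, then confirms the two equalities.

```python
def naive_slices(a: int, b: int, c: int) -> tuple:
    """Slice O as if Y and Z did not overlap (incorrect for 8-bit inputs)."""
    o = ((b << 10) | c) * a
    return o & 0xFFFF, o >> 10          # candidate Y, candidate Z


def check_identity(a: int, b: int, c: int) -> bool:
    """Check both forms of the slide's identity for one input triple."""
    y, z = a * c, a * b
    o = ((b << 10) | c) * a
    upper = o >> 10                          # {o25,...,o10}
    concat = ((z >> 6) << 6) | (y >> 10)     # {z15,...,z6, y15,...,y10}
    z_low = z & 0x3F                         # {z5,...,z0}
    return upper == (y >> 10) + z and concat == upper - z_low
```

For A = B = C = 255 the naive slices differ from the true products, while the low ten bits of O still equal {y9,...,y0}; only the overlap region needs correcting.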

Unsigned Int8, shared input - architecture
Block diagram:
18x18 mult: inputs P and Q; outputs {o25,...,o10} and {o9,...,o0} = {y9,...,y0}
6x6 LSB mult: inputs {b5,...,b0} and {a5,...,a0}; output {z5,...,z0}
subtractor: {o25,...,o10} − {z5,...,z0} = {z15,...,z6, y15,...,y10}
• {z5,...,z0} = ({a5,...,a0} · {b5,...,b0})[5:0]
• Z[5:0] obtained using truncated (LSB) multiplier
• technique also extends to other multiplier sizes
• the wider the overlap between Y and Z, the larger the area
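Putting the three blocks together, the architecture can be modelled in software as follows. This is a behavioural sketch under the mapping shown on the slides, not RTL and not the authors' code; `dual_uint8_multiply` and the mask constants are illustrative names.

```python
MASK6 = 0x3F     # six low bits
MASK10 = 0x3FF   # ten low bits


def dual_uint8_multiply(a: int, b: int, c: int) -> tuple:
    """Return (Y, Z) = (a*c, a*b) from one 18x18 product plus a small correction."""
    assert all(0 <= v < 256 for v in (a, b, c)), "inputs must be unsigned 8-bit"
    # 18x18 unsigned multiplier: P = {B, 00, C}, Q = A
    o = ((b << 10) | c) * a
    # Truncated 6x6 LSB multiplier: only the low 6 bits of A * B are needed,
    # and they depend only on the low 6 bits of A and B.
    z_lo = ((a & MASK6) * (b & MASK6)) & MASK6
    # Subtractor: {o25..o10} - {z5..z0} = {z15..z6, y15..y10}
    t = (o >> 10) - z_lo
    y = ((t & MASK6) << 10) | (o & MASK10)   # {y15..y10} joined with {y9..y0}
    z = ((t >> 6) << 6) | z_lo               # {z15..z6} joined with {z5..z0}
    return y, z
```

The correction logic is limited to a 6-bit-result multiplier and a 16-bit subtractor, which is where the extra soft-logic area goes.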

Signed Int8, shared input
• compute Y = A · C and Z = A · B with A, B and C 8-bit signed numbers
• 18x18 multiplier is a signed multiplier with pre-adder
• map A, B and C to the multiplier inputs (operation: O = (P+Q)R):
P = {b7,...,b0, c7, c7, c7,...,c0} (B at weights 17..10, C sign-extended to weights 9..0)
Q = {c7, c7, c7, c7, c7, c7, c7, c7, 0,...,0} (c7 replicated at weights 17..10, zeros at weights 9..0)
R = {a7,...,a7, a7,...,a0} (A sign-extended to 18 bits)
O = (P+Q)R = {o25,...,o0}
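One way to read this mapping: the sign extension of C inside P contributes c7 · 2^10, and Q, read as a signed 18-bit value, equals −c7 · 2^10, so the pre-adder output is B · 2^10 + C. The sketch below checks only that arithmetic; it does not cover how the signed Y and Z are then separated. It is a hedged model, not the authors' code: `to_signed` and `signed_packed_product` are illustrative names, and the pre-adder sum is assumed to be kept at full precision (no wraparound to 18 bits).

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def signed_packed_product(a: int, b: int, c: int) -> int:
    """Return O = (P + Q) * R for the signed mapping on the slide."""
    assert all(-128 <= v <= 127 for v in (a, b, c)), "inputs must be signed 8-bit"
    c7 = 1 if c < 0 else 0
    # P: B at weights 17..10, C sign-extended to 10 bits at weights 9..0
    p = to_signed(((b & 0xFF) << 10) | (c & 0x3FF), 18)
    # Q: c7 replicated at weights 17..10, zeros below
    q = to_signed((0xFF if c7 else 0) << 10, 18)
    # R: A sign-extended to 18 bits, so its signed value is simply a
    r = a
    # Assumption: the pre-adder sum p + q is not truncated to 18 bits
    return (p + q) * r
```

Under these assumptions O equals A · B · 2^10 + A · C for every signed 8-bit triple, the signed counterpart of the unsigned packing.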
