SLIDE 1

CMSC5743 L06: Binary/Ternary Network

Bei Yu

(Latest update: November 2, 2020)

Fall 2020


SLIDE 2

These slides contain/adapt materials developed by

◮ Ritchie Zhao et al. (2017). “Accelerating binarized convolutional neural networks with software-programmable FPGAs”. In: Proc. FPGA, pp. 15–24

◮ Mohammad Rastegari et al. (2016). “XNOR-Net: ImageNet classification using binary convolutional neural networks”. In: Proc. ECCV, pp. 525–542

SLIDE 3

Motivation

Binary / Ternary Net: Motivation

[Figure: histogram of trained weight values (x-axis: weight value, roughly −0.05 to 0.05; y-axis: count, up to ~6400) ⇒ its quantized counterpart. Most trained weights cluster near zero, which motivates replacing them with a few binary/ternary levels.]

SLIDE 4

Binarized Neural Networks (BNN)

[Figure: in a CNN, a floating-point input map (e.g. 2.4, 6.2, …) is multiplied by floating-point weights (e.g. 0.8, 0.1, …) to give a floating-point output map (e.g. 5.0, 9.1, …). In a BNN, a binary input map (±1) is multiplied by binary weights (±1) to give an integer output map (e.g. 1, −3, …), which is binarized after batch normalization:]

y_b = +1 if y ≥ 0, −1 otherwise

Key Differences
  1. Inputs are binarized (−1 or +1)
  2. Weights are binarized (−1 or +1)
  3. Results are binarized after batch normalization
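The arithmetic above is small enough to sketch directly. Below is a minimal NumPy sketch of one binarized layer (function names, shapes, and the batch-norm constants are illustrative, not from the slides): binary inputs and weights accumulate to integer outputs, and batch normalization plus re-binarization map them back to ±1.

import numpy as np

def sign_binarize(x):
    # Map real values to {-1, +1}; sign(0) is taken as +1.
    return np.where(x >= 0, 1, -1).astype(np.int8)

def bnn_dense(x_bin, w_bin, gamma, beta, mu, sigma):
    # Binary-times-binary dot products accumulate to integers (the "Integer" map).
    y_int = x_bin.astype(np.int32) @ w_bin.astype(np.int32)
    # Batch normalization on the integer outputs...
    y_bn = gamma * (y_int - mu) / sigma + beta
    # ...followed by re-binarization: +1 if y >= 0 else -1.
    return sign_binarize(y_bn)

x = sign_binarize(np.random.randn(4, 8))   # binarized input map
w = sign_binarize(np.random.randn(8, 3))   # binarized weights
out = bnn_dense(x, w, gamma=1.0, beta=0.0, mu=0.0, sigma=1.0)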

SLIDE 5
BNN CIFAR-10 Architecture [2]

  • 6 conv layers, 3 dense layers, 3 max pooling layers
  • All conv filters are 3x3
  • First conv layer takes in floating-point input
  • 13.4 Mbits total model size (after hardware optimizations)

[Figure: feature map dimensions 32x32 → 16x16 → 8x8 → 4x4; number of feature maps 3 → 128 → 128 → 256 → 256 → 512 → 512 → 1024 → 1024 → 10.]

[2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv:1602.02830, Feb 2016.

SLIDE 6
Advantages of BNN

  • 1. Floating point ops replaced with binary logic ops
    – Encode {+1,−1} as {0,1} → multiplies become XORs
    – Conv/dense layers do dot products → XOR and popcount
    – Operations can map to LUT fabric as opposed to DSPs
  • 2. Binarized weights may reduce total model size
    – Fewer bits per weight may be offset by having more weights

b1   b2   b1 × b2          b1   b2   b1 XOR b2
+1   +1     +1              0    0       0
+1   −1     −1              0    1       1
−1   +1     −1              1    0       1
−1   −1     +1              1    1       0
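A small sketch of the encoding trick above (names are illustrative): mapping +1 → 0 and −1 → 1 turns each ±1 product into an XOR, so a length-n dot product reduces to n − 2 · popcount(a XOR b).

import numpy as np

def dot_pm1(a, b):
    # Reference dot product over {-1,+1} vectors.
    return int(np.dot(a, b))

def dot_xor_popcount(a, b):
    # Encode +1 -> 0, -1 -> 1. A product of -1 corresponds to a bit
    # mismatch, so dot = n - 2 * popcount(a_bits XOR b_bits).
    a_bits = (a < 0).astype(np.uint8)
    b_bits = (b < 0).astype(np.uint8)
    mismatches = int(np.count_nonzero(a_bits ^ b_bits))  # popcount of XOR
    return len(a) - 2 * mismatches

a = np.random.choice([-1, 1], size=64)
b = np.random.choice([-1, 1], size=64)
assert dot_pm1(a, b) == dot_xor_popcount(a, b)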

SLIDE 7

BNN vs CNN Parameter Efficiency

Architecture            Depth   Param Bits (Float)   Param Bits (Fixed-Point)   Error Rate (%)
ResNet [3] (CIFAR-10)    164         51.9M                 13.0M*                  11.26
BNN [2]                    9           —                   13.4M                   11.40

Comparison:
  – Conservative assumption: ResNet can use 8-bit weights
  – BNN is based on VGG (less advanced architecture)
  – BNN seems to hold promise!

* Assuming each float param can be quantized to 8-bit fixed-point

[2] M. Courbariaux et al. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or −1. arXiv:1602.02830, Feb 2016.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Identity Mappings in Deep Residual Networks. ECCV 2016.

SLIDE 8

Overview

◮ Minimize the Quantization Error
◮ Reduce the Gradient Error



SLIDE 16

Training Binary Weight Networks

Naive Solution: 1. Train a network with real-valued weights; 2. Binarize the weights afterwards (WB = sign(W)).

1 Mohammad Rastegari et al. (2016). “XNOR-Net: ImageNet classification using binary convolutional neural networks”. In: Proc. ECCV, pp. 525–542.

SLIDE 17

[Figure: AlexNet Top-1 accuracy (%) on ILSVRC2012 — Full Precision: 56.7, Naive binarization: 0.2.]

SLIDE 18

[Figure: real-valued weight filters W ∈ R are mapped by binarization to binary filters WB ∈ B.]


SLIDE 20

Binary Weight Network

Train for binary weights:

  1. Randomly initialize W
  2. For iter = 1 to N
  3.   Load a random input image X
  4.   WB = sign(W)
  5.   α = ‖W‖ℓ1 / n
  6.   Forward pass with α, WB
  7.   Compute loss function C
  8.   ∂C/∂W = Backward pass with α, WB
  9.   Update W (W = W − ∂C/∂W)
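A minimal NumPy sketch of the loop above for a single dense layer, assuming a toy squared loss and treating sign() as identity in the backward pass (the straight-through behavior implied by step 8; all names, data, and hyperparameters are illustrative):

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 1)) * 0.1    # 1. real-valued weights, kept for updates
lr = 0.01

for it in range(100):                    # 2. training loop
    X = rng.standard_normal((16, 8))     # 3. load a random input batch
    t = X.sum(axis=1, keepdims=True)     #    toy regression target
    WB = np.where(W >= 0, 1.0, -1.0)     # 4. WB = sign(W)
    alpha = np.abs(W).sum() / W.size     # 5. alpha = ||W||_l1 / n
    y = X @ (alpha * WB)                 # 6. forward pass with alpha, WB
    C = 0.5 * np.mean((y - t) ** 2)      # 7. loss function C
    dC_dy = (y - t) / len(y)
    dC_dW = X.T @ dC_dy * alpha          # 8. backward pass with alpha, WB;
                                         #    straight-through: d sign(W)/dW ~ 1
    W -= lr * dC_dW                      # 9. update the real-valued W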


SLIDE 27

[Figure: AlexNet Top-1 accuracy (%) on ILSVRC2012 — Full Precision: 56.7, Naive: 0.2, Binary Weight: 56.8.]


SLIDE 31

(1) Binarizing Weights: real-valued filter W → binary filter B = sign(W), with a scaling factor α.

(2) Binarizing Input: the naive approach computes a separate scaling factor for every sliding window over X, repeating work in the overlapping areas — inefficient. The efficient approach first averages over channels, c = mean_i |X:,:,i|, then convolves c with an average filter, so each window's scaling factor K is computed in one pass.

(3) Convolution with XNOR-Bitcount: with both operands binary, the convolution reduces to XNOR and bitcount, rescaled by α and K.
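A sketch of the efficient input-scaling step, following the XNOR-Net paper's construction (array shapes and names are my own): average |X| over channels once, then filter with a uniform k × k kernel so every window's scaling factor comes from a single pass.

import numpy as np

def input_scales(X, k):
    # X: (H, W, C) real-valued input. Returns one scaling factor per
    # k x k window: A = mean over channels of |X|, K = A * average filter.
    A = np.abs(X).mean(axis=2)                 # c = mean_i |X[:, :, i]|
    H, W = A.shape
    K = np.empty((H - k + 1, W - k + 1))
    for i in range(K.shape[0]):                # correlation with a k x k
        for j in range(K.shape[1]):            # average filter (entries 1/k^2)
            K[i, j] = A[i:i + k, j:j + k].mean()
    return K

X = np.random.randn(8, 8, 3)
K = input_scales(X, k=3)    # shared by all filters; no per-window recompute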


SLIDE 33

[Figure: AlexNet Top-1 accuracy (%) on ILSVRC2012 — Full Precision: 56.7, Naive: 0.2, Binary Weight: 56.8, Binary Weight + Binary Input: 30.5.]

SLIDE 34

Network Structure in XNOR-Networks

[Figure: sign(x) maps activations to −1/+1. A typical block in CNN: Conv → BNorm → Activ → Pool. With binarization before max-pooling, the pool sees only ±1 values:]

✗ Information Loss — max-pooling over binary values mostly returns +1, with multiple maximums per window.


SLIDE 36

Network Structure in XNOR-Networks

[Figure: XNOR-Net reorders the block to BNorm → BinActiv → BinConv → Pool, so max-pooling operates on real-valued convolution outputs rather than on ±1 values:]

✓ Information loss avoided ✓ Multiple-maximums problem avoided


SLIDE 38

[Figure: AlexNet Top-1 accuracy (%) on ILSVRC2012 — Full Precision: 56.7, Naive: 0.2, Binary Weight: 56.8, Binary Weight + Input: 30.5, XNOR-Net: 44.2.]

✓ 32× smaller model — float vs. binary model sizes: AlexNet 245 MB → 7.4 MB, VGG 500 MB → 16 MB, ResNet-18 100 MB → 1.5 MB.

✓ 58× less computation — [plots: speedup vs. number of channels (1 to 1024) and vs. filter size (up to ~20×20), reaching roughly 50–65×.]

SLIDE 39

[Figure: AlexNet Top-1 & Top-5 accuracy (%) on ILSVRC2012 comparison.]

SLIDE 40

Motivation and Intuition

Motivation
◮ Naive methods suffer from accuracy loss (Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David (2015). “BinaryConnect: Training deep neural networks with binary weights during propagations”. In: Advances in Neural Information Processing Systems, pp. 3123–3131; Matthieu Courbariaux, Itay Hubara, et al. (2016). “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1”. In: arXiv preprint arXiv:1602.02830)

Intuition
◮ The quantized parameters should approximate the full-precision parameters as closely as possible

SLIDE 41

ABC-Net

Towards Accurate Binary Convolutional Neural Network


SLIDE 42

ABC-Net

Contribution
◮ Approximate full-precision weights with a linear combination of multiple binary weight bases
◮ Introduce multiple binary activations

SLIDE 43

ABC-Net

Weights Binarization
◮ Weight tensor in one layer: W ∈ R^{w×h×c_in×c_out}
◮ Binary bases: B_1, B_2, …, B_M ∈ {−1,+1}^{w×h×c_in×c_out}

W ≈ α_1 B_1 + α_2 B_2 + … + α_M B_M
B_i = F_{u_i}(W) = sign(W̄ + u_i std(W)), i = 1, 2, …, M

where W̄ = W − mean(W), and u_i is a shift parameter (e.g. u_i = −1 + (i − 1) · 2/(M − 1)).

α can be calculated via min_α J(α) = ‖W − Bα‖²
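The weight approximation translates almost line-for-line into NumPy; a sketch using the slide's example spacing of u_i (all names are illustrative):

import numpy as np

def abc_weight_bases(W, M=3):
    # Approximate W by sum_m alpha_m * B_m with B_m in {-1,+1}.
    Wbar = W - W.mean()
    u = [-1 + (i - 1) * 2.0 / (M - 1) for i in range(1, M + 1)]  # u_i shifts
    Bs = [np.where(Wbar + ui * W.std() >= 0, 1.0, -1.0) for ui in u]
    Bmat = np.stack([B.ravel() for B in Bs], axis=1)             # columns = bases
    alpha, *_ = np.linalg.lstsq(Bmat, W.ravel(), rcond=None)     # min_a ||W - Ba||^2
    return alpha, Bs

W = np.random.randn(3, 3, 16, 32)
alpha, Bs = abc_weight_bases(W, M=3)
W_hat = sum(a * B for a, B in zip(alpha, Bs))   # linear combination approximating W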

SLIDE 44

ABC-Net

Forward and Backward
◮ Forward:

B_1, B_2, …, B_M = F_{u_1}(W), F_{u_2}(W), …, F_{u_M}(W)
solve min_α J(α) = ‖W − Bα‖² for α
O = Σ_{m=1}^{M} α_m Conv(B_m, A)

◮ Backward (STE for the sign function):

∂c/∂W = (∂c/∂O) Σ_{m=1}^{M} α_m (∂O/∂B_m)(∂B_m/∂W)
       ≈ (∂c/∂O) Σ_{m=1}^{M} α_m (∂O/∂B_m)        (STE: ∂B_m/∂W ≈ 1)
       = Σ_{m=1}^{M} α_m (∂c/∂B_m)

SLIDE 45

ABC-Net

Multiple Binary Activations
◮ Bounded activation function with h_v(x) ∈ [0, 1]:

h_v(x) = clip(x + v, 0, 1)

where v is a shift parameter.

◮ Binarization function:

H_v(R) := 2 · I_{h_v(R) ≥ 0.5} − 1
A_1, A_2, …, A_N = H_{v_1}(R), H_{v_2}(R), …, H_{v_N}(R)
R ≈ β_1 A_1 + β_2 A_2 + … + β_N A_N

where R is the real-valued activation.

◮ A_1, A_2, …, A_N are the bases used to represent the real-valued activations.
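The activation side in the same style; the shift parameters v_n and combination weights β_n are trained in ABC-Net, but here they are fixed, assumed values for illustration:

import numpy as np

def h(x, v):
    # Bounded activation h_v(x) = clip(x + v, 0, 1).
    return np.clip(x + v, 0.0, 1.0)

def H(R, v):
    # Binarization H_v(R) = 2 * 1[h_v(R) >= 0.5] - 1.
    return np.where(h(R, v) >= 0.5, 1.0, -1.0)

R = np.random.randn(4, 4)           # real-valued activation
vs = [-0.25, 0.0, 0.25]             # shift parameters v_1..v_N (assumed values)
betas = [0.3, 0.4, 0.3]             # beta_n, learned in ABC-Net; fixed here
As = [H(R, v) for v in vs]          # binary activation bases A_1..A_N
R_hat = sum(b * A for b, A in zip(betas, As))   # R ~ sum_n beta_n A_n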

SLIDE 46

ABC-Net

◮ ApproxConv is expected to approximate the conventional full-precision convolution with a linear combination of binary convolutions
◮ The right part of the figure is the overall block structure of the convolution in ABC-Net. The input is binarized using different functions H_{v_1}, H_{v_2}, H_{v_3}

Conv(W, R) ≈ Conv(Σ_{m=1}^{M} α_m B_m, Σ_{n=1}^{N} β_n A_n) = Σ_{m=1}^{M} Σ_{n=1}^{N} α_m β_n Conv(B_m, A_n)
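Putting the two sets of bases together, the convolution expands into M × N binary convolutions. A 1-D toy sketch with np.convolve standing in for Conv (the bases and scales below are made-up values in the spirit of the two sketches above):

import numpy as np

# toy binary bases and scales for weights (B_m, alpha_m) and inputs (A_n, beta_n)
Bs = [np.array([1., -1., 1.]), np.array([-1., 1., 1.])]
alphas = [0.7, 0.3]
As = [np.array([1., 1., -1., 1., -1.]), np.array([-1., 1., 1., 1., 1.])]
betas = [0.6, 0.4]

# Conv(W, R) ~ sum_m sum_n alpha_m * beta_n * Conv(B_m, A_n)
out = sum(a * b * np.convolve(A, B, mode='valid')
          for a, B in zip(alphas, Bs)
          for b, A in zip(betas, As))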

SLIDE 47

ABC-Net

Read the paper² if you want to learn the specific details of the algorithm.

2 Xiaofan Lin, Cong Zhao, and Wei Pan (2017). “Towards accurate binary convolutional neural network”. In: Advances in Neural Information Processing Systems, pp. 345–353.

SLIDE 48

Overview

◮ Minimize the Quantization Error
◮ Reduce the Gradient Error

SLIDE 49

Motivation and Intuition

Motivation
◮ Although the STE is often adopted to estimate gradients in back-propagation, there is an obvious gradient mismatch between the gradient of the binarization (sign) function and the STE's surrogate
◮ Under the restriction of the STE, parameters outside the range [−1, +1] will not be updated
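Concretely, the STE passes the upstream gradient through sign() unchanged where |x| ≤ 1 and zeroes it elsewhere, which is exactly why parameters outside [−1, +1] stop updating. A minimal sketch (illustrative names):

import numpy as np

def sign_forward(x):
    return np.where(x >= 0, 1.0, -1.0)

def sign_backward_ste(x, grad_out):
    # Straight-through estimator: dL/dx = dL/dy * 1[|x| <= 1].
    # Parameters with |x| > 1 receive zero gradient and freeze.
    return grad_out * (np.abs(x) <= 1.0)

x = np.array([-1.5, -0.3, 0.2, 2.0])
g = np.ones_like(x)                 # upstream gradient dL/dy
print(sign_backward_ste(x, g))      # -> [0., 1., 1., 0.]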

SLIDE 50

Bi-Real

Bi-real net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm


SLIDE 51

Bi-Real

Naive Binarization Function
◮ Recall the partial derivative calculation in back-propagation:

∂L/∂A_r^{l,t} = (∂L/∂A_b^{l,t}) · (∂A_b^{l,t}/∂A_r^{l,t}) = (∂L/∂A_b^{l,t}) · (∂ Sign(A_r^{l,t})/∂A_r^{l,t}) ≈ (∂L/∂A_b^{l,t}) · (∂F(A_r^{l,t})/∂A_r^{l,t})

◮ The Sign function is non-differentiable, so F is a differentiable approximation of the Sign function

SLIDE 52

Bi-Real

∂L/∂A_r^{l,t} = (∂L/∂A_b^{l,t}) · (∂A_b^{l,t}/∂A_r^{l,t}) = (∂L/∂A_b^{l,t}) · (∂ Sign(A_r^{l,t})/∂A_r^{l,t}) ≈ (∂L/∂A_b^{l,t}) · (∂F(A_r^{l,t})/∂A_r^{l,t})

Approximation of the Sign function
◮ Naive approximation: F(x) = clip(x, −1, 1), see fig. (b)
◮ More precise approximation in Bi-Real, see fig. (c):

ApproxSign(x) =  −1         if x < −1
                 2x + x²    if −1 ≤ x < 0
                 2x − x²    if 0 ≤ x < 1
                 1          otherwise

∂ApproxSign(x)/∂x =  2 + 2x    if −1 ≤ x < 0
                     2 − 2x    if 0 ≤ x < 1
                     0         otherwise
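The piecewise definition vectorizes directly; a sketch (illustrative):

import numpy as np

def approx_sign(x):
    # Bi-Real's piecewise-quadratic approximation of sign(x).
    return np.where(x < -1, -1.0,
           np.where(x < 0, 2 * x + x ** 2,
           np.where(x < 1, 2 * x - x ** 2, 1.0)))

def approx_sign_grad(x):
    # dApproxSign/dx: a triangle on [-1, 1], zero elsewhere.
    return np.where((x >= -1) & (x < 0), 2 + 2 * x,
           np.where((x >= 0) & (x < 1), 2 - 2 * x, 0.0))

xs = np.linspace(-2, 2, 9)
print(approx_sign(xs))
print(approx_sign_grad(xs))   # replaces the STE's hard window in backprop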

SLIDE 53

Bi-Real

Read the paper³ if you want to learn the specific details of the algorithm.

3 Zechun Liu et al. (2018). “Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm”. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 722–737.

SLIDE 54

Trained Ternary Quantization⁴

[Figure: overview of the trained ternary quantization procedure.]

4 Chenzhuo Zhu et al. (2017). “Trained ternary quantization”. In: Proc. ICLR.
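The overview figure itself is lost in extraction. As a hedged sketch of the procedure the caption refers to (per Zhu et al. 2017, weights are quantized to {−W_n, 0, +W_p} with a threshold proportional to the largest weight magnitude, and the two scales W_p, W_n are learned; the code is my own illustration, not the authors'):

import numpy as np

def ttq_quantize(w, Wp, Wn, t=0.05):
    # Trained ternary quantization: threshold delta = t * max|w|,
    # then map to {+Wp, 0, -Wn}. Wp and Wn are learned scalars.
    delta = t * np.abs(w).max()
    q = np.zeros_like(w)
    q[w > delta] = Wp
    q[w < -delta] = -Wn
    return q

w = np.random.randn(64)
print(ttq_quantize(w, Wp=1.2, Wn=0.8))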

SLIDE 55

Trained Ternary Quantization⁴

[Figure: ternary weight values (above) and distributions (below) over training iterations for different layers of ResNet-20 on CIFAR-10.]

SLIDE 56

Reading List

◮ Hyeonuk Kim et al. (2017). “A Kernel Decomposition Architecture for Binary-weight Convolutional Neural Networks”. In: Proc. DAC, 60:1–60:6

◮ Jungwook Choi et al. (2018). “PACT: Parameterized clipping activation for quantized neural networks”. In: arXiv preprint arXiv:1805.06085

◮ Dongqing Zhang et al. (2018). “LQ-Nets: Learned quantization for highly accurate and compact deep neural networks”. In: Proc. ECCV, pp. 365–382

◮ Aojun Zhou et al. (2017). “Incremental network quantization: Towards lossless CNNs with low-precision weights”. In: arXiv preprint arXiv:1702.03044

◮ Zhaowei Cai et al. (2017). “Deep learning with low precision by half-wave Gaussian quantization”. In: Proc. CVPR, pp. 5918–5926