Adaptive Mixed Precision Kernel Recursive Least Squares
JunKyu Lee, Hans Vandierendonck, Dimitrios S. Nikolopoulos
Entrans
Key Message from this talk (Two Fold):
Introduction to Transprecision Computing: What? Why? How?
Parallel Computing with m X cores: m X speedup, m X power, 1 X energy. Need some techniques for energy saving?
Transprecision Computing without increasing cores: n X speedup, 1 X power, 1/n X energy. Energy savings!
Transprecision Technique: any (precision) technique that minimises execution time/energy consumption without accuracy loss (or with only minor accuracy loss) and without HW resource increment.
Transprecision Computing: computing that utilises transprecision techniques.
NVidia Pascal GPU (P100) with NVLink:
Half precision: 21.2 TeraFlops
Single precision: 10.6 TeraFlops
Double precision: 5.3 TeraFlops

FPGAs/ASIC: lower ALU precision => smaller ALUs => fewer transistors per ALU => more ALUs in a fixed area, shorter wires, shorter pipelines, higher clock rate.

NVidia Volta GPU (V100) with NVLink:
Half precision (Tensor Core): 125 TeraFlops
Single precision: 15.7 TeraFlops
Double precision: 7.8 TeraFlops

Size of adder: linear increase with precision. Size of multiplier: quadratic increase with precision.
Mixed Precision: static variable precision arithmetic.
Transprecision: dynamic precision, dynamic algorithms, skipping operations, any technique enabled by precision utilisation, exploiting minor accuracy loss, disruptive H/W techniques.
Lower precision also means fast data transfer from/to memory => minimising runtime.
[Figure: input x flows through Computing Components 1 to 4 to output y; each component introduces its own error (e1, e2, e3) that propagates and accumulates into the final error ef.]
TT 1: Explore error propagation for each computing component.

Full precision arithmetic (e.g. Computing Component 2, computing y = Ax): with perturbed data A + ∆A and input x + ∆x (input error),
y + ∆y_FULL = Ax + (A∆x + ∆Ax + O(∊_F)·(Ax))
‖∆y_FULL‖ includes err_una (unavoidable, driven by ∆A and ∆x) and err_rnd (controllable: determined by the arithmetic precision).

Key idea: when the unit roundoff ∊ is extremely small, err_una is dominant, so you can lower the precision (increase ∊) as long as ‖∆y_FULL‖ is not affected.
Reduced precision arithmetic (same component): with A + ∆A and x + ∆x (input error),
y + ∆y_Reduced = Ax + (A∆x + ∆Ax + O(∊_R)·(Ax))

When the unavoidable error is of the same size as the round-off error, reduced precision arithmetic can be applied to the computing component.
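A minimal NumPy sketch of this criterion (my illustration, not the talk's code; the matrix size and the 1e-4 input-noise level are assumptions): when the unavoidable input error dominates, dropping the matrix-vector product from double to single precision leaves the total output error essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y_true = A @ x                           # reference output in float64

dx = 1e-4 * rng.standard_normal(n)       # input error Delta-x (assumed noise level)
x_noisy = x + dx

# Full precision (float64, eps ~ 1e-16): output error is err_una = ||A @ dx||.
err_full = np.linalg.norm(A @ x_noisy - y_true)

# Reduced precision (float32, eps ~ 6e-8): err_rnd stays far below err_una,
# so the total output error barely moves.
y_red = (A.astype(np.float32) @ x_noisy.astype(np.float32)).astype(np.float64)
err_red = np.linalg.norm(y_red - y_true)

print(f"full precision:    {err_full:.3e}")
print(f"reduced precision: {err_red:.3e}")   # nearly identical
```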
The question is: how can we use such properties? Let us look at a case study with a kernel machine learning algorithm.
Rounding-error sensitivity under reduced precision arithmetic (unit roundoff ∊_R):
Case 1: Linear solver: err_rnd ∝ κ(A)·∊_R
Case 2: Matrix-vector product: err_rnd ∝ κ(A)·∊_R
Case 3: Dot product: err_rnd ∝ (|x1|ᵀ|x2| / |x1ᵀx2|)·∊_R
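A rough numerical check of Case 1 (my sketch, not from the talk; building A with a prescribed condition number via its SVD is an illustrative choice): the relative error of a float32 solve grows with κ(A)·∊_R.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
for target_kappa in (1e2, 1e5):
    # Build a matrix with a prescribed condition number from random
    # orthogonal factors and log-spaced singular values.
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(target_kappa), n)
    A = U @ np.diag(s) @ V.T
    x = rng.standard_normal(n)
    b = A @ x

    # Solve in reduced precision (float32, eps_R ~ 6e-8).
    x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
    rel_err = np.linalg.norm(x32 - x) / np.linalg.norm(x)
    bound = target_kappa * np.finfo(np.float32).eps
    print(f"kappa={target_kappa:.0e}  rel_err={rel_err:.1e}  kappa*eps_R={bound:.1e}")
```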
Applications: non-linear regressions (large-scale Machine Learning).
Linear Regression: seek w = (w0, w1, w2, w3) to estimate y given an input x = (1, x1, x2, x3) with est(y) = xᵀw = w0·1 + w1·x1 + w2·x2 + w3·x3.
Non-linear Regression (Kernel Method): perform linear regression in a higher-dimensional Hilbert space:
est(y) = φ(x)ᵀw = φ(x)ᵀ[φ(x̃1) … φ(x̃n)]α = kᵀα
[Figure: the input x is mapped from the input space into a higher-dimensional Hilbert (feature) space; the kernel vector k = [φ(x̃1) … φ(x̃n)]ᵀφ(x) yields est(y) = kᵀα.]
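A minimal sketch of this kernel method (my illustration; the Gaussian kernel, kernel width b, ridge term, and synthetic data are assumptions, not the talk's setup): fit the coefficients α against a dictionary of training points, then predict with est(y) = kᵀα.

```python
import numpy as np

def gaussian_kernel(X1, X2, b):
    """k(x, x') = exp(-||x - x'||^2 / (2 b^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * b * b))

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))                  # training inputs
y = np.sin(X[:, 0]) + 0.01 * rng.standard_normal(200)  # noisy targets

b = 0.5
K = gaussian_kernel(X, X, b)                           # kernel (Gram) matrix
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(X)), y)  # ridge-regularised fit

x_new = np.array([[1.2]])
k = gaussian_kernel(x_new, X, b)                       # kernel vector for x_new
est_y = k @ alpha                                      # est(y) = k^T alpha
print(est_y, np.sin(1.2))                              # estimate vs. true value
```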
[Figure: KRLS (with ALD) data flow per training sample (x_t, y_t): compute the kernel vector k̃_{t-1}(x_t) from x_t and kernel width b; ALD test d_t > ν?; if yes, grow the dictionary and update K̃⁻¹_t; update α_t and P_t; predict est(y_t) = k̃_{t-1}(x_t)ᵀα_{t-1} and form the error e_t.
Precision assignment: the kernel-vector computation has a small condition number and its error e is tolerant to reduced precision arithmetic, so it runs in Single Precision; the remaining components (ALD test, K̃⁻¹, α, and P updates, prediction) face large condition numbers and stay in Double Precision.]

The condition numbers of K̃⁻¹ and P depend on the ALD threshold ν and the kernel width b => cross validation decides ν and b.
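A compressed sketch of this mixed-precision split, following Engel et al.'s KRLS-ALD recursions (variable names, the Gaussian kernel with k(x, x) = 1, and the defaults for b and ν are my assumptions, not the talk's code): only the kernel-vector evaluation runs in single precision; the ill-conditioned K̃⁻¹, α, and P updates stay in double precision.

```python
import numpy as np

def krls_ald(stream, b=1.0, nu=1e-3):
    """Online KRLS with an ALD test; kernel vector in float32, updates in float64."""
    def kvec32(D, x):                          # Gaussian kernel vector, single precision
        d2 = ((D.astype(np.float32) - np.asarray(x, np.float32)) ** 2).sum(-1)
        return np.exp(-d2 / np.float32(2 * b * b))

    x0, y0 = next(stream)
    D = np.atleast_2d(np.asarray(x0, np.float64))   # dictionary
    K_inv = np.array([[1.0]])                       # K~^-1 (k(x, x) = 1), float64
    P = np.array([[1.0]])
    alpha = np.array([float(y0)])
    for x, y in stream:
        k = kvec32(D, x).astype(np.float64)    # single-precision kernel vector (the TT)
        a = K_inv @ k                          # double precision from here on
        d = 1.0 - k @ a                        # ALD value d_t
        e = y - k @ alpha                      # prediction error e_t
        if d > nu:                             # novel input: grow the dictionary
            D = np.vstack([D, np.asarray(x, np.float64)])
            K_inv = np.block([[d * K_inv + np.outer(a, a), -a[:, None]],
                              [-a[None, :], np.ones((1, 1))]]) / d
            P = np.block([[P, np.zeros((len(P), 1))],
                          [np.zeros((1, len(P))), np.ones((1, 1))]])
            alpha = np.append(alpha - a * (e / d), e / d)
        else:                                  # RLS-style update on the fixed dictionary
            q = P @ a / (1.0 + a @ P @ a)
            P = P - np.outer(q, a @ P)
            alpha = alpha + K_inv @ q * e
    return D, alpha
```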
Non-Linear Regressions: y = sin(x1)/x1 + x2/10.0 + cos(x3)
A smaller ALD threshold ν generally gives a larger dictionary size, a larger condition number of K, and better prediction. We observe prediction accuracy, training time, and energy consumption.
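For instance, data for this benchmark target could be generated and streamed into the krls_ald sketch above (input range, sample count, and parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-5.0, 5.0, size=(5000, 3))            # assumed input range
# y = sin(x1)/x1 + x2/10 + cos(x3); np.sinc(t/pi) = sin(t)/t, safe at t = 0
y = np.sinc(X[:, 0] / np.pi) + X[:, 1] / 10.0 + np.cos(X[:, 2])

D, alpha = krls_ald(iter(zip(X, y)), b=1.0, nu=1e-3)  # from the sketch above
print("dictionary size:", len(D))
```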
[Charts: prediction accuracy, execution time reduction, and energy consumption reduction for full precision vs. mixed precision KRLS.]
NOTICE:
No accuracy loss: prediction accuracies are IDENTICAL (up to 6 digits) between full precision KRLS and mixed precision KRLS.
Speedups and energy savings: mixed precision KRLS achieves 1.5X speedups and energy savings over full precision KRLS for training.
Transprecision Computing: minimising execution time without accuracy loss or HW resource increment.
Mixed precision KRLS, built with transprecision techniques, brought 1.5X speedups and energy savings without accuracy loss compared to full precision KRLS.
Transprecision computing can be a crucial paradigm for many-core systems to achieve both speedups and energy savings.
Entrans (Energy Efficient Transprecision Techniques for Linear Solver): transprecision techniques for MultiCore/GPUs.
OPRECOMP (Open Transprecision Computing): transprecision/approximate computing techniques for MultiCore/GPUs/FPGAs.
Regularized Transprecision Computing: no accuracy loss; no disruptive HW techniques allowed.
Transprecision Computing: minor accuracy loss; disruptive HW techniques allowed.