Adaptive Mixed Precision Kernel Recursive Least Squares
JunKyu Lee, Hans Vandierendonck, Dimitrios S. Nikolopoulos


  1. Adaptive Mixed Precision Kernel Recursive Least Squares. JunKyu Lee, Hans Vandierendonck, Dimitrios S. Nikolopoulos. EnTrans.

  2. Key Messages of this Talk (Two-Fold) • Introduction to transprecision computing: what? why? how? • Case study of transprecision computing: Adaptive Mixed Precision Kernel Recursive Least Squares.

  3. What and Why Transprecision Computing? A transprecision technique is any precision-related technique that minimises execution time or energy consumption without accuracy loss (or with only minor accuracy loss) and without additional hardware resources. Transprecision computing is computing that utilises such techniques. Contrast with parallel computing: m times more cores give an m-fold speedup at m-fold power, i.e. 1x energy, so further techniques are needed for energy savings. Transprecision computing, without adding cores, gives an n-fold speedup at 1x power, i.e. 1/n of the energy: genuine energy savings.

  4. Transprecision Computing on GPUs/FPGAs. NVIDIA Pascal GPU (P100) with NVLink: half precision 21.2 TFLOPS, single precision 10.6 TFLOPS, double precision 5.3 TFLOPS. NVIDIA Volta GPU (V100) with NVLink: half precision (Tensor Cores) 125 TFLOPS, single precision 15.7 TFLOPS, double precision 7.8 TFLOPS. On FPGAs/ASICs, ALU size grows with precision: adder size increases linearly and multiplier size quadratically with the precision. Lower precision therefore means smaller ALUs, so more ALUs and more transistors fit in a fixed area, wires and pipelines are shorter, and clock rates are higher.

  5. Transprecision (OPRECOMP) vs Mixed Precision. Transprecision, in the OPRECOMP sense, covers any precision-exploiting technique that minimises runtime, including fast data transfer from/to memory: minor accuracy loss, disruptive hardware techniques, dynamic precision arithmetic, and operation skipping. Mixed precision is the narrower, static case: fixed variable precision and algorithm-level precision utilisation.

  6. Transprecision Computing. Lower precision means more energy savings; but does lower precision mean lower accuracy? Transprecision computing delivers the energy savings without accuracy loss. How?

  7. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. [Diagram: the input x passes through computing components 1 to 4 to produce the output y; the error e_1 of component 1 propagates into components 2 and 3 (as e_21 and e_31) and on to the final output error e_f1.]

  8. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. For a component computing y = Ax in full precision arithmetic, with input error Δx and data error ΔA: y + Δy_FULL = Ax + (A Δx + ΔA x + ε_F(Ax)), where ε_F denotes the rounding error of full precision arithmetic.

  9. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. In y + Δy_FULL = Ax + (A Δx + ΔA x + ε_F(Ax)), the total error ||Δy_FULL|| comprises err_una (unavoidable, from the input and data errors) and err_rnd (controllable, determined by the precision of the arithmetic). Key idea: with an extremely small unit roundoff ε, err_una is dominant, so the precision can be reduced (ε increased) as long as ||Δy_FULL|| is not affected.

  10. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. With reduced precision arithmetic the component computes y + Δy_Reduced. When the unavoidable error is at least as large as the rounding-off error, err_una ≳ err_rnd, reduced precision arithmetic can be applied to the computing component.

  11. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. In terms of the error components, the criterion is ||A Δx + ΔA x|| ≳ ||ε_R(Ax)||, i.e. the unavoidable error dominates the controllable rounding-off error of the reduced precision arithmetic; then reduced arithmetic can be applied to the computing component.
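This criterion can be checked numerically. A minimal sketch (my own illustration, not from the slides): a matrix-vector product whose input carries a perturbation far above float32 roundoff loses essentially nothing when computed in single precision.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
x = rng.standard_normal(100)
dx = 1e-4 * rng.standard_normal(100)  # input error, far above float32 roundoff

y_ref = A @ x                                   # double precision reference
err_una = np.linalg.norm(A @ (x + dx) - y_ref)  # unavoidable error from dx
y_f32 = A.astype(np.float32) @ (x + dx).astype(np.float32)
err_total = np.linalg.norm(y_f32.astype(np.float64) - y_ref)

# The float32 result is no less accurate overall: rounding adds almost
# nothing on top of the unavoidable input error.
print(f"err_una = {err_una:.3e}, err_total(float32) = {err_total:.3e}")
```

Since err_una dominates, the total error of the float32 product is essentially the unavoidable error alone, which is exactly the condition under which the slide allows demoting the component.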

  12. Transprecision Techniques (TTs). TT 1: explore the error propagation for each computing component. Rounding-error behaviour of common components (ε_R is the unit roundoff of the reduced precision): Case 1, linear solver: err_rnd ∝ κ(A) ε_R. Case 2, matrix-vector product: err_rnd ∝ κ(A) ε_R. Case 3, dot product: err_rnd ∝ (|x_1|^T |x_2|) / |x_1^T x_2| ε_R. How can we use such properties? Let us look at a case study with a kernel machine learning algorithm.
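The linear-solver case can be illustrated directly. A sketch under my own assumptions (synthetic symmetric matrices with prescribed singular values; not from the slides): the relative error of a float32 solve grows with the condition number κ(A).

```python
import numpy as np

def f32_solve_error(A, b):
    """Relative error of solving Ax = b in float32 instead of float64."""
    x64 = np.linalg.solve(A, b)
    x32 = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
    return np.linalg.norm(x32.astype(np.float64) - x64) / np.linalg.norm(x64)

rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
b = rng.standard_normal(n)

errs = {}
for kappa in (1e1, 1e5):
    # Prescribe the condition number through the singular values.
    A = Q @ np.diag(np.geomspace(1.0, 1.0 / kappa, n)) @ Q.T
    errs[kappa] = f32_solve_error(A, b)
    print(f"kappa(A) = {kappa:.0e} -> float32 solve rel. error = {errs[kappa]:.1e}")
```

The well-conditioned system loses almost nothing in single precision, while the ill-conditioned one amplifies the reduced roundoff by roughly κ(A), matching the err_rnd ∝ κ(A) ε_R property.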

  13. Kernel Recursive Least Squares (KRLS). Applications: non-linear regression. Properties: a simple RLS mechanism; no local minima (it always converges towards a global minimum); fast convergence and good prediction accuracy; online adaptive learning (weights are learned one sample at a time, which suits large-scale machine learning).

  14. Kernel Recursive Least Squares (KRLS). Linear regression: seek w = (w_0, w_1, w_2, w_3) to estimate y from an input x = (1, x_1, x_2, x_3) via est(y) = x^T w = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3. Non-linear regression (kernel method): perform linear regression in a higher-dimensional Hilbert (feature) space: est(y) = φ(x)^T w = φ(x)^T [φ(x̃_1) … φ(x̃_n)] α = k̃^T α, where x̃_1, …, x̃_n are the dictionary samples mapped from input space to feature space and k̃ is the kernel vector of the input x against the dictionary.
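The prediction step est(y) = k̃^T α with a Gaussian kernel can be sketched as follows; the function names and the toy dictionary are my own, not from the slides.

```python
import numpy as np

def gaussian_kernel_vector(x, dictionary, b):
    """k_tilde[i] = exp(-||x - x_i||^2 / (2 b^2)) for each dictionary sample x_i."""
    d2 = np.sum((dictionary - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * b ** 2))

def krls_predict(x, dictionary, alpha, b=2.5):
    """est(y) = k_tilde^T alpha."""
    return gaussian_kernel_vector(x, dictionary, b) @ alpha

# Toy dictionary of three 2-D samples and arbitrary weights alpha.
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha = np.array([0.5, -0.2, 0.3])
pred = krls_predict(np.array([0.5, 0.5]), dictionary, alpha)
print(pred)
```

Prediction is just a dot product against the dictionary, so its cost (and its precision requirement) is governed by the dictionary size n, which the ALD test keeps small.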

  15. Computing Components for KRLS. [Flow diagram: for each training sample (x_t, y_t), with kernel width b and ALD threshold ν, KRLS (1) computes the kernel vector k̃_{t-1}(x_t), (2) computes the error e_t, and (3) runs the ALD test d_t > ν. If yes, it updates K̃^{-1}, which has a large condition number and whose error is tolerant to reduced precision arithmetic; in either case it updates P_t (small condition number) and α_t.]

  16. Transprecision Computing for KRLS. [Flow diagram: the same components annotated with precision assignments: the kernel vector k̃_{t-1}(x_t), the error e_t, the ALD test d_t > ν, the P_t update, and the α_t update all run in double precision; the K̃^{-1} update, taken on the yes branch of the ALD test, runs in single precision.]
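The mixed precision pattern on this slide, demoting one error-tolerant update to float32 while the rest of the algorithm stays in float64, can be sketched generically. The rank-one inverse update formula below is the standard RLS/Sherman-Morrison form, used here only for illustration; it is an assumption, not necessarily the exact KRLS update from the talk.

```python
import numpy as np

def rank1_update_f32(P, k):
    """P <- P - (P k)(P k)^T / (1 + k^T P k), computed in float32."""
    P32, k32 = P.astype(np.float32), k.astype(np.float32)
    Pk = P32 @ k32
    P32 = P32 - np.outer(Pk, Pk) / (1.0 + k32 @ Pk)
    return P32.astype(np.float64)  # promote back for the double precision parts

rng = np.random.default_rng(2)
P = np.eye(4)
for _ in range(10):
    P = rank1_update_f32(P, rng.standard_normal(4))

print(np.linalg.norm(P - P.T))  # the update preserves symmetry
```

The cast-down/compute/cast-up structure keeps the precision decision local to one component, which is exactly what lets the rest of the pipeline stay at full precision.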

  17. Condition Numbers of Matrices in KRLS. The condition numbers of K̃^{-1} and P depend on the ALD threshold ν and the kernel width b; cross-validation decides ν and b.

  18. Case Study: Transprecision Computing for KRLS. Non-linear regression target: y = sin(x_1)/x_1 + x_2/10.0 + cos(x_3), with x_1, x_2, x_3 drawn uniformly at random from [-10, 10]. Gaussian kernel width b = 2.5; ALD threshold ν = 0.01. Platform: Intel(R) Xeon(R) CPU E5-2650 at 2 GHz, single core; energy estimated with the ALEA energy profiling tool. A smaller ALD threshold ν generally yields a larger dictionary, a larger condition number of K̃, and better prediction. Observed metrics: prediction accuracy, training time, and energy consumption.
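The case-study dataset can be reproduced from the slide's description; the dataset size and the random seed below are my own choices, not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(42)                 # seed is my choice
X = rng.uniform(-10.0, 10.0, size=(1000, 3))    # x1, x2, x3 ~ Uniform[-10, 10]
y = np.sin(X[:, 0]) / X[:, 0] + X[:, 1] / 10.0 + np.cos(X[:, 2])

print(X.shape, y.shape)
```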

  19. Case Study: Transprecision Computing for KRLS Prediction Accuracy

  20. Case Study: Transprecision Computing for KRLS Execution Time Reduction

  21. Case Study: Transprecision Computing for KRLS Energy Consumption Reduction

  22. Case Study: Transprecision Computing for KRLS. NOTICE: No accuracy loss: prediction accuracies are identical (to six digits) between full precision KRLS and mixed precision KRLS. Speedups and energy savings: mixed precision KRLS achieves a 1.5x speedup and 1.5x energy savings over full precision KRLS for training.

  23. Conclusions. Transprecision computing minimises execution time without accuracy loss or additional hardware resources. Mixed precision KRLS built with transprecision techniques achieved a 1.5x speedup and 1.5x energy savings over full precision KRLS without accuracy loss. Transprecision computing can be a crucial paradigm for many-core systems, achieving both speedups and energy savings.

  24. Transprecision Computing Projects at QUB. EnTrans: Energy Efficient Transprecision Techniques for Linear Solver (Jun. 2018 to May 2020). Regularised transprecision computing: no accuracy loss, no disruptive hardware techniques allowed. Aim: to seek energy savings for all types of linear solvers by utilising transprecision techniques on multicore/GPUs. An H2020 Marie Sklodowska-Curie Action Individual Fellowship. OPRECOMP: Open Transprecision Computing (Jan. 2017 to Dec. 2020). Minor accuracy loss and disruptive hardware techniques allowed. Aim: to seek energy savings for machine learning, scientific, and IoT workloads by utilising transprecision and approximate computing techniques on multicore/GPUs/FPGAs. An H2020 project.

  25. Thank you very much. Any questions?
