"The Next Challenge: Energy Efficient Approach in ML Architecture", Professor Uri Weiser, Viterbi Faculty of Electrical Engineering - PowerPoint PPT Presentation



SLIDE 1

"The Next Challenge: Energy Efficient Approach in ML Architecture"

Uri Weiser UPC October 10th 2018

The presentation is based on work by: Gil Shomron, Daniel Raskin, Loren Jammal, Avi Baum, Yoav Etsion

Professor Uri Weiser Viterbi Faculty of Electrical Engineering The Technion Israel

July 1st 2019

Contributors to the research: Leeor Peled, Daniel Raskin, Gil Shomron, Leonid Yavits, Moran Shkolnik, Avi Baum

SLIDE 2

To Yale

Five years have passed since Yale@75.


OK but why do you have to drag us with you?

Interesting how you keep staying in the center…

SLIDE 3

Beauty comes shining through, not only when blooming

SLIDE 4

Agenda:

  • Technology environment
    • Process scaling is slowing down
    • Big Data
    • Funnel
    • Killer apps ➔ ML
  • Efficient ML basics
    • Energy:
      • Amdahl and Multi-Amdahl (dividing our limited resources effectively)
      • SMT – is this a biggy?
      • Pipeline – why?
    • Map applications to HW – the Data Flow concept
    • Prediction – no validation is necessary
  • Conclusions

SLIDE 5

Technology environment

Performance History

We (the architects) did an "OK-" job

[Figure: relative performance vs. feature size [um]: total impact ~2,000X, of which the process contributed ~100X and the uArch ~20X]*

*ACM Queue, April 6, 2012, Processors, Volume 10, Issue 4: "CPU DB: Recording Microprocessor History", Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz, Stanford University

SLIDE 6

Input: unstructured data

Big Data ➔ usage of DATA ➔ Extract, Transform, Load ➔ Read Once ➔ Non-Temporal Memory Access

Funnel: β = BW_out / BW_in
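The funnel ratio can be made concrete with a toy calculation (the bandwidth numbers below are illustrative assumptions, not from the talk):

```python
# Funnel: huge input bandwidth, small output bandwidth.
# beta = BW_out / BW_in; beta << 1 indicates a strong funnel.
def funnel_beta(bw_in: float, bw_out: float) -> float:
    """Ratio of output to input bandwidth."""
    return bw_out / bw_in

# Illustrative: a stage ingests 100 GB/s of raw data, emits 0.5 GB/s of results
beta = funnel_beta(100.0, 0.5)
print(f"beta = {beta:.3f}")  # beta = 0.005
```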

SLIDE 7

Killer applications*

*Applications you cannot effectively execute on current HW (Dr. Andy Grove)

  • ML is one!?
  • Funnel (in most of the cases)
    • Input: huge amount of data
    • Output: small amount
  • Many simple operations
SLIDE 8

Energy in "Data Flow" architecture

Now Read Once counts!

[Figure: instruction energy breakdown vs. "data flow" energy breakdown. Per instruction: I-Cache access 25pJ, control 45pJ, Reg. File access 6pJ, OP 0.5pJ, plus the data access; in a data-flow design, only the OP and the data access remain.]
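The figure's point is easy to check arithmetically: the per-instruction overheads dwarf the arithmetic itself. A quick sketch, mapping the slide's numbers to components by their order (that mapping is my assumption):

```python
# Per-instruction energy components (pJ), mapped from the slide by order:
I_CACHE_PJ = 25.0   # I-Cache access
CONTROL_PJ = 45.0   # control
REGFILE_PJ = 6.0    # register file access
OP_PJ      = 0.5    # the arithmetic operation itself

overhead_pj = I_CACHE_PJ + CONTROL_PJ + REGFILE_PJ   # paid on every instruction
total_pj = overhead_pj + OP_PJ
useful_fraction = OP_PJ / total_pj
print(f"overhead = {overhead_pj} pJ; the op is {useful_fraction:.1%} of the total")
# A data-flow machine that reads instructions once amortizes most of this overhead.
```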

SLIDE 9
Efficient ML I: Accelerator

  • Energy ➔ Performance
  • Map applications to HW ➔ graph mapping; data flow
    • Efficient mapping
    • Co-design of HW structure and a smart compiler in a specific application environment
  • Almost no flow control
  • Statistical results – no need to validate execution

SLIDE 10
Efficient ML II: Balanced design and energy reduction

  • Energy ➔ Performance
  • System vs. accelerators: it is Amdahl again!
  • Energy reduction
    • Reduction in computing (MAC ops)
      • Pruning
      • Prediction
    • Reduction in data access and movement
      • Pipeline
  • Efficient usage of the hardware resources
    • Multi-Amdahl (divide your limited resources effectively)
    • SMT

SLIDE 11

Efficient ML II: Reduction in Computing

[Figure: throughput vs. energy efficiency of ML accelerators (ISSCC, Feb 17th 2019 press announcements), with iso-efficiency lines at 0.1pJ/OP and 0.01pJ/OP; TOPS/W drops due to inefficiency (e.g. data movement, repeated DRAM accesses). Energy efficiency is the inverse of energy/OP.]
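The iso-efficiency lines follow from a unit conversion: TOPS/W is simply the reciprocal of pJ/OP. A one-line sanity check:

```python
# 1 W sustained at e pJ/OP gives (1 J/s) / (e * 1e-12 J/OP) = (1/e) TOPS.
def tops_per_watt(pj_per_op: float) -> float:
    """Convert energy per operation (pJ/OP) to efficiency (TOPS/W)."""
    return 1.0 / pj_per_op

print(tops_per_watt(0.1))   # 0.1 pJ/OP  -> 10 TOPS/W
print(tops_per_watt(0.01))  # 0.01 pJ/OP -> 100 TOPS/W
```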

SLIDE 12
Efficient ML II: Reduction in Computing (1)

  • Reduction in computing ➔ reduce the # of operations via:
    • Pruning – well-known techniques
    • Value (data) prediction
      • ML workloads are statistical ➔ no need to validate execution

G. Shomron, U. Weiser, "Spatial Correlation and Value Prediction in Convolutional Neural Networks", IEEE Computer Architecture Letters (CAL), January 2019
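To illustrate why spatial correlation enables value prediction, here is a toy sketch under my own simplifications (not the mechanism of the cited paper): neighboring activations in a CNN feature map tend to be similar, so some outputs can be copied from computed neighbors instead of paying for their MACs, and the network's statistical nature absorbs the small error with no validation or rollback step.

```python
import numpy as np

# Toy "feature map" with spatial correlation: each value is close to its neighbor.
rng = np.random.default_rng(0)
base = rng.random((8, 8))
fmap = (base + np.roll(base, 1, axis=1)) / 2.0   # smoothed -> correlated columns

# Predict odd columns from their computed even-column neighbors; skip those MACs.
pred = fmap.copy()
pred[:, 1::2] = fmap[:, 0::2]

skipped = pred[:, 1::2].size / fmap.size
mean_err = float(np.abs(pred - fmap).mean())
print(f"MACs skipped: {skipped:.0%}, mean prediction error: {mean_err:.3f}")
```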

SLIDE 13

Efficient ML II: Reduction in Data Accesses (2)

  • Reduction in data access and movement
  • Pipeline execution ➔ data stays on die

[Figure: without pipelining, a single MAC unit round-trips every result through memory (DRAM); with pipelining, MAC stages for Layer 1 … Layer n pass activations through on-die memory (SRAM), the input of each layer being the output of the previous one]
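A back-of-the-envelope model of the slide's point (the energy-per-byte numbers are my own ballpark assumptions, not from the talk): pipelining layers on die replaces DRAM round trips for inter-layer activations with SRAM accesses.

```python
# Assumed access energies (ballpark, for illustration only)
DRAM_PJ_PER_BYTE = 160.0   # off-die DRAM
SRAM_PJ_PER_BYTE = 1.25    # small on-die SRAM

def inter_layer_energy_pj(n_layers: int, act_bytes: int, on_die: bool) -> float:
    """Energy to hand activations between consecutive layers (one write + one read)."""
    per_byte = SRAM_PJ_PER_BYTE if on_die else DRAM_PJ_PER_BYTE
    return (n_layers - 1) * act_bytes * 2 * per_byte

no_pipe = inter_layer_energy_pj(4, 1024, on_die=False)   # every hand-off via DRAM
piped = inter_layer_energy_pj(4, 1024, on_die=True)      # data stays on die
print(f"on-die pipelining cuts inter-layer data energy by {no_pipe / piped:.0f}x")
```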

SLIDE 14
Efficient ML II: Efficient usage of HW

  • Multi-Amdahl* (divide your limited resources effectively), e.g. efficient resource division (e.g. SRAM)
  • SMT**
    • Resource needs are known ahead of time…

Optimization using Lagrange multipliers: minimize the target Σi ti·Fi(ai) under the constraint Σi ai = A, where Fi' is the derivative of the accelerator function; at the optimum, ti·Fi'(ai) = tj·Fj'(aj).

*T. Zidenberg, I. Keslassy, U. Weiser, "Optimal Resource Allocation with MultiAmdahl", IEEE Micro, August 2013
**G. Shomron, T. Horowitz, U. Weiser, "SMT-SA: Simultaneous Multithreading in Systolic Arrays", IEEE Computer Architecture Letters (CAL), July 2019; also Technion EE, Advanced Microarchitecture course exam (winter 2019)
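A minimal numeric sketch of the Lagrange condition (my own toy instance, with Fi(a) = 1/a, i.e. more resource means proportionally faster): at the optimal split, ti·Fi'(ai) is equal across units.

```python
import math

# Two accelerators share a fixed resource budget A (e.g. SRAM area).
# Stage i contributes t_i * F_i(a_i) to runtime; take F_i(a) = 1/a (toy model).
# Minimizing sum_i t_i/a_i subject to a1 + a2 = A via Lagrange multipliers
# yields a_i proportional to sqrt(t_i), satisfying t_i*F_i'(a_i) = t_j*F_j'(a_j).
A = 10.0
t = [0.64, 0.36]                       # time weights of the two stages
root = [math.sqrt(ti) for ti in t]
a = [A * r / sum(root) for r in root]  # closed-form optimum for F(a) = 1/a

# Verify the Lagrange condition with F'(a) = -1/a**2
marginals = [ti * (-1.0 / ai**2) for ti, ai in zip(t, a)]
print(a, marginals)
```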

SLIDE 15

Conclusions

  • Opportunities:
    • Map application to HW
    • Reduce energy per operation?
    • Reduce the # of operations
    • Reduce data movement and memory access
    • Efficient usage of HW
  • We're gonna have fun
    • Open field, lots of ideas, many researchers
    • Opportunities
    • New passionate energy in the community
    • Back to the "big impact" era…
SLIDE 16

Thank You
