"The Next Challenge: Energy Efficient Approach in ML Architecture" Professor Uri Weiser


  1. "The Next Challenge: Energy Efficient Approach in ML Architecture"
     Professor Uri Weiser, Viterbi Faculty of Electrical Engineering, The Technion, Israel
     UPC, October 10th 2018, July 1st 2019
     Contributors to the research: Leeor Peled, Daniel Raskin, Gil Shomron, Leonid Yavits, Moran Shkolnik, Avi Baum
     The presentation is based on work by: Gil Shomron, Daniel Raskin, Loren Jammal, Avi Baum, Yoav Etsion

  2. To Yale: 5 years have passed since Yale@75. OK, but why do you have to drag us with you? Interesting how you keep staying in the center …

  3. Beauty comes shining through not only when blooming

  4. Agenda:
     • Technology environment
     • Process is slowing down
     • Big Data
     • Funnel
     • Killer apps ➔ ML
     • Efficient ML BASICS
     • Energy:
     • Amdahl and MA (divide effectively our limited resources)
     • SMT – is this a biggy?
     • Pipeline – why?
     • Map applications to HW – Data Flow concept
     • Prediction – no validation is necessary
     • Conclusions

  5. Technology environment: Performance History
     [Figure: relative performance vs. process feature size [um]; uArch impact 20X, process impact 100X, total impact 2,000X]
     We (the architects) did an “OK-” job
     *ACM Queue, April 6, 2012, Processors, Volume 10, issue 4: "CPU DB: Recording Microprocessor History", Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz, Stanford University

  6. Big Data ➔ usage of DATA
     • Input: unstructured data
     • Funnel: beta = BW out / BW in
     ➔ Extract, Transform, Load ➔ Read Once ➔ Non-Temporal Memory Access

  7. Killer applications*
     • ML is one!?
     • Funnel (in most of the cases)
     • Input: huge amount of data; output: small amount
     • Many simple operations
     *Applications you cannot effectively execute on current HW (Dr. Andy Grove)

  8. Energy in “Data Flow” architecture
     [Figure: instruction energy breakdown; labels as extracted: I-Cache Access, Control, Reg. File Access, OP.; values as extracted: 25pJ, 6pJ, 45pJ, 0.5pJ; Data access]
     “Data flow” energy breakdown: now Read Once counts!

  9. Efficient ML I: Accelerator
     • Energy ➔ Performance
     • Map applications to HW ➔ Graph mapping; data flow
     • Efficient mapping
     • Co-design HW structure and smart compiler in specific application environment
     • Almost no flow control
     • Statistical results – no need to validate execution

  10. Efficient ML II: Balanced design and energy reduction
     • Energy ➔ Performance
     • System vs. Accelerators: It is Amdahl again!
     • Energy reduction
       • Reduction in Computing (MACs op.)
         • Pruning
         • Prediction
       • Reduction in data access and movement
         • Pipeline
       • Efficient usage of the Hardware resources
         • Multi-Amdahl (divide effectively your limited resources)
         • SMT
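The "It is Amdahl again!" point can be illustrated with a minimal sketch of Amdahl's law. The function name and the example numbers below are illustrative assumptions, not values from the slides.

```python
def amdahl_speedup(accelerated_fraction: float, accelerator_speedup: float) -> float:
    """Overall system speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time the accelerator covers (0..1).
    accelerator_speedup:  how much faster that share runs on the accelerator (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be within [0, 1]")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction          # part left on the host system
    return 1.0 / (remaining + accelerated_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the time is accelerable.
    for s in (10.0, 100.0, 1e6):
        print(f"accelerator {s:>9.0f}x -> system {amdahl_speedup(0.9, s):.2f}x")
```

However fast the accelerator becomes, the system speedup in this example stays below 10x, because the 10% left on the rest of the system dominates.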

  11. Efficient ML II: Reduction in Computing
     [Figure: energy efficiency vs. throughput; marked levels 0.01pJ/OP, 0.1pJ/OP and 1 TOPS/W; drop due to inefficiency (e.g. data movement, DRAM repeated accesses …)]
     Energy efficiency α energy/OP
     ISSCC Feb 17th 2019 preen announcement
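The two units on this slide are directly related: 1 TOPS/W is 10^12 operations per joule, which is 1 pJ per operation. The sketch below converts between them and shows how extra energy per operation lowers the effective figure; the overhead value is a made-up illustration, not data from the slide.

```python
def tops_per_watt(pj_per_op: float) -> float:
    """Convert energy per operation (pJ/OP) into efficiency (TOPS/W).

    1 TOPS/W = 1e12 ops per joule, so one op costs 1e-12 J = 1 pJ.
    """
    if pj_per_op <= 0.0:
        raise ValueError("pj_per_op must be positive")
    return 1.0 / pj_per_op


def effective_tops_per_watt(compute_pj_per_op: float, overhead_pj_per_op: float) -> float:
    """Efficiency once non-compute energy per op (assumed, e.g. data movement) is added."""
    return tops_per_watt(compute_pj_per_op + overhead_pj_per_op)


if __name__ == "__main__":
    print(tops_per_watt(0.1))    # compute-only efficiency at 0.1 pJ/OP
    print(tops_per_watt(0.01))   # compute-only efficiency at 0.01 pJ/OP
    # Hypothetical: 0.1 pJ/OP of compute plus 0.9 pJ/OP of data-movement overhead.
    print(effective_tops_per_watt(0.1, 0.9))
```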

  12. Efficient ML II: Reduction in Computing (1)
     • Reduction in Computing ➔ reduce # of operations via
       • Pruning
         • Well known techniques
       • Value Data (Prediction)
         • ML are statistical ➔ no need to validate execution
     G. Shomron, U. Weiser, “Spatial Correlation and Value Prediction in Convolutional Neural Networks”, IEEE Computer Architecture Letters (CAL) Journal, January 2019

  13. Efficient ML II: Reduction in Data Accesses (2)
     • Reduction in Data access and movements
     • Pipeline execution ➔ Data stays on die
     [Figure: Memory (DRAM), Memory (SRAM), and MAC stages for Layer 1, Layer 2, Layer 3, … Layer n; labels IN and Out = IN]
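To make "data stays on die" concrete, here is a rough sketch comparing off-chip traffic for layer-by-layer execution against pipelined execution. It is a simplified model under stated assumptions: weights are ignored, every intermediate activation is written to DRAM once and read back once in the layer-by-layer case, and the pipelined case has enough on-die SRAM to hold all intermediates. The function name and byte counts are hypothetical.

```python
from typing import Sequence


def dram_traffic_bytes(input_bytes: int,
                       layer_output_bytes: Sequence[int],
                       pipelined: bool) -> int:
    """Estimate DRAM traffic for running a chain of layers on one input.

    layer_output_bytes[i] is the activation size produced by layer i.
    """
    if not layer_output_bytes:
        raise ValueError("at least one layer is required")
    final_output = layer_output_bytes[-1]
    if pipelined:
        # Only the network input is read and the final output is written.
        return input_bytes + final_output
    # Layer by layer: each intermediate is written out, then read back in.
    intermediates = sum(layer_output_bytes[:-1])
    return input_bytes + 2 * intermediates + final_output


if __name__ == "__main__":
    sizes = [400, 200, 10]  # hypothetical activation sizes per layer, in bytes
    print("layer-by-layer:", dram_traffic_bytes(100, sizes, pipelined=False))
    print("pipelined:     ", dram_traffic_bytes(100, sizes, pipelined=True))
```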

  14. Efficient ML II: Efficient usage of HW
     • Multi-Amdahl* (divide effectively your limited resources)
       [Figure: timeline t_1, t_2, t_3, …, t_n with accelerator functions F_1(a_1), F_2(a_2), …, F_n(a_n)]
       Optimization using Lagrange multipliers: target under a constraint A
       F' = derivative of the accelerator function
       t_i F_i'(a_i) = t_j F_j'(a_j)
       e.g. efficient resource division (e.g. SRAM)*
     • SMT**
       • Resource needs are known ahead of time …
     *T. Zidenberg, Isaac Keslassy, U. Weiser, “Optimal Resource Allocation with MultiAmdahl”, IEEE MICRO Journal, August 2013
     **Technion EE, Advanced Microarchitecture course's exam (winter 2019)
     ***G. Shomron, T. Horowitz, U. Weiser, “SMT-SA: Simultaneous Multithreading in Systolic Arrays”, IEEE Computer Architecture Letters (CAL) Journal, July 2019
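The optimality condition t_i F_i'(a_i) = t_j F_j'(a_j) can be checked numerically with a small sketch. It assumes the objective is the total time, the sum of t_i F_i(a_i), minimized under the constraint that the a_i sum to A, and it assumes a power-law accelerator function F_i(a) = a^(-p), chosen here only because it gives a closed form; the slide does not specify F. Under those assumptions the Lagrange condition yields a_i proportional to t_i^(1/(1+p)).

```python
from typing import List, Sequence


def allocate(times: Sequence[float], total_area: float, p: float = 0.5) -> List[float]:
    """Split total_area among accelerators, assuming F_i(a) = a**(-p).

    Setting t_i * F_i'(a_i) equal for all i gives a_i proportional to t_i**(1/(1+p)).
    """
    weights = [t ** (1.0 / (1.0 + p)) for t in times]
    scale = total_area / sum(weights)
    return [w * scale for w in weights]


def marginal(t: float, a: float, p: float = 0.5) -> float:
    """t * F'(a) for the assumed F(a) = a**(-p)."""
    return t * (-p) * a ** (-p - 1.0)


def total_time(times: Sequence[float], areas: Sequence[float], p: float = 0.5) -> float:
    """Sum of t_i * F_i(a_i) for the assumed F."""
    return sum(t * a ** (-p) for t, a in zip(times, areas))


if __name__ == "__main__":
    times = [1.0, 8.0]   # hypothetical segment times on the reference hardware
    area = 9.0           # hypothetical resource budget A
    best = allocate(times, area)
    even = [area / len(times)] * len(times)
    print("allocation:", best)
    print("marginals: ", [marginal(t, a) for t, a in zip(times, best)])
    print("time (optimal vs. even split):", total_time(times, best), total_time(times, even))
```

In this example the longer segment receives more of the budget, and the marginal terms come out equal, which is the condition stated on the slide.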

  15. Conclusions
     • Opportunities: Map application to HW
       • Reduce energy per operation?
       • Reduce # of operations
       • Reduce data movement and memory access
       • Efficient usage of HW
     • We’re gonna have fun
       • Open field, lots of ideas, many researchers
       • New passionate energy in the community
       • Back to the “big impact” era …

  16. Thank You
