Memory driven architecture: flipping the inequality computing vs. memory


  1. Uri Weiser, Professor of Engineering, Technion. Memory driven architecture: flipping the inequality computing vs. memory. The talk covers research done by: Prof. Y. Etsion, Dr. Z. Guz, Prof. I. Keidar, Prof. A. Kolodny, S. Kvatinsky, Prof. I. Keslassy, T. Zidenberg, Prof. A. Mendelson, Y. Nacson, Prof. E. Friedman, Prof. U. Weiser

  2. This conference's message: "The large energy consumption associated with the ever-increasing internet use and the lack of efficient renewable energy sources to support it." Scent of solutions? Energy problems in data-com systems; energy problems in computers, from systems to the chip level; advanced solar energy harvesting.

  3. The Trend Our Customers Expect. From:

  4. The Trend Our Customers Expect. From:

  5. Outline: the trends; the implications; the opportunities; heterogeneous systems – some thoughts; memristor → Memory Intensive Architecture (MIA); energy: optimal resource allocation in a heterogeneous system; how to start to think about Memory Intensive Architecture

  6. The Trends 6

  7. Process Technology: Minimum Feature Size. Chart: feature size (microns, log scale from 10 down to 0.01) vs. year ('68–'14), tracking nodes from 180 nm through 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, and 14 nm. Source: Intel, SIA Technology Roadmap (SIA: Semiconductor Industry Association)

  8. Putting It All Together!

  9. The Trend: Where are we going? The power wall

  10. Microarchitecture and VLSI: Microarchitecture has been influenced by concepts that have been around for a long time, and we have hit a power wall. Solutions: top down – improve performance/power or throughput/power → heterogeneous architecture; bottom up – new devices? Memory-resistive devices?

  11. Hetero vs. Memory Intensive. Heterogeneous architecture: for a while there has been no major breakthrough in CPU technology, but the main reason is the POWER wall and energy/task – accelerators to the rescue. Memory Intensive Architecture: either a huge amount of memory cells close to logic, or logic cells close to lots of memory. Does it imply symmetric processing?

  12. Heterogeneous Systems Flying machines - are they all the same? 12

  13. Heterogeneous Computing: Application-Specific Accelerators. Chart: performance/power vs. application range for accelerators. Continue the performance trend using heterogeneous computing to bypass current technological hurdles.

  14. Heterogeneous Computing. Chart: performance/power of an accelerator vs. a general-purpose core.

  15. Heterogeneous Systems' Environment: an environment with limited resources; we need to optimize the system's targets within resource constraints. Resources may be power, energy, area, space, $. The system's targets may be performance, power, energy, area, space, $.

  16. Heterogeneous Computing: heterogeneous system design under a resource constraint – how to divide resources (e.g. area, power, energy) to achieve maximum system output (e.g. performance, throughput). Example: an application is split into sections with execution times t_1, t_2, t_3, …, t_n, where t_i = execution time of an application's section (run on a reference computing system); the total budget is the sum of the per-section allocations, B = b_1 + b_2 + … + b_n. Accelerator target (an example): minimize execution time under an area constraint.

  17. MultiAmdahl: each section t_1, t_2, t_3, …, t_n is mapped to an accelerator whose speedup depends on its area, F_1(a_1), F_2(a_2), …, F_n(a_n). The total execution time is T = t_1·F_1(a_1) + t_2·F_2(a_2) + … + t_n·F_n(a_n), and the total area is A = a_1 + a_2 + a_3 + … + a_n. Target: minimize T under a constraint on A.

  18. MultiAmdahl: optimization using Lagrange multipliers – minimize the execution time (T) under an area (A) constraint. At the optimum, t_j·F'_j(a_j) = t_i·F'_i(a_i) for all i, j, where F' is the derivative of the accelerator function, a_i is the area of the i-th accelerator, and t_i is the execution time on the reference computer.
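The Lagrange condition on slide 18 can be made concrete with a minimal sketch, assuming the illustrative accelerator function F_i(a) = 1/a (speedup linear in area) – an assumption for this example, not the talk's actual model. With that F, the condition t_i·F'_i(a_i) = const gives the closed-form allocation a_i ∝ √t_i:

```python
import math

def multiamdahl_allocate(t, A):
    """Area split minimizing T = sum(t_i / a_i) subject to sum(a_i) = A,
    under the illustrative assumption F_i(a) = 1/a. The Lagrange condition
    t_i * F'_i(a_i) = -t_i / a_i**2 = const yields a_i proportional to sqrt(t_i)."""
    s = sum(math.sqrt(ti) for ti in t)
    return [A * math.sqrt(ti) / s for ti in t]

t = [4.0, 1.0, 1.0]   # reference execution times of the application's sections
A = 6.0               # total area budget
a = multiamdahl_allocate(t, A)

# The marginal benefit t_i / a_i**2 should be equal across all accelerators
marginals = [ti / ai**2 for ti, ai in zip(t, a)]
```

The section with the largest reference time (t_1 = 4) gets the most area, but only in proportion to √t_1 – the diminishing-returns shape of F is what spreads the budget across accelerators.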

  19. MultiAmdahl Framework: applying known techniques* to new environments; can be used during a system's definition and/or dynamically to tune the system. (* Gossen's second law (1854), marginal utility, marginal rate of substitution (finance))

  20. Example: CPU vs. Accelerators – future GP CPU size vs. transistor budget growth. Test case: 4 accelerators and a GP (big) CPU; applications: an evenly distributed benchmark mix with 10% sequential code. Heterogeneous insight: in an increased-transistor-budget environment, the importance of the general-purpose (big) CPU will grow.

  21. Example: CPU vs. Accelerators – GP CPU size vs. power budget. Test case: 4 accelerators and a GP (big) CPU; applications: an evenly distributed benchmark mix with 10% sequential code. Heterogeneous insight: in a decreased-power-budget environment, the importance of accelerators will grow.

  22. Environment Changes: Is it time for a change in implementation? Throughput became an essential microprocessor target; the data footprint became bigger; multi-core systems are everywhere → more performance = more memory usage → memory pressure is increasing; significant CPU die power (>30%) is consumed by IO (access to out-of-die memory).

  23. Bottom up approach: New device - Memristor? 23

  24. What is a Memristor? A 2-terminal resistive nonvolatile device whose resistivity depends on the past electrical current, switching between R_OFF and R_ON (current–voltage characteristic shown on the slide). The device is constructed of 2 metal layers with oxide in between (e.g. TiO2) and can be implemented in multi-(physical)-layer memory. Jul 30, 2013: Panasonic starts the world's first mass production of ReRAM-mounted microcomputers. ReRAM (Resistive Random Access Memory) is a type of non-volatile memory which records "0" and "1" digital information by generating large resistance changes with a pulsed voltage applied to a thin-film metal oxide. The simple structure of the metal oxide sandwiched by electrodes makes the manufacturing process easier and provides excellent low-power-consumption and high-speed rewriting characteristics.
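The "resistivity depends on past current" behavior on slide 24 can be illustrated with the linear ion-drift memristor model (the HP Labs formulation); every constant below is illustrative, not real device data:

```python
# Sketch of the linear ion-drift memristor model: the effective resistance
# is a mix of R_ON and R_OFF weighted by x, the normalized doped-region
# width, and x moves with the charge that has flowed through the device.
# Non-volatility: x (hence the resistance) persists when the current stops.
R_ON, R_OFF = 100.0, 16e3   # illustrative resistances (ohms)
D = 10e-9                   # illustrative device thickness (m)
MU = 1e-14                  # illustrative dopant mobility (m^2 / (s*V))

def simulate(current, dt, steps, x=0.1):
    for _ in range(steps):
        x += MU * R_ON / D**2 * current * dt  # dx/dt proportional to current
        x = min(max(x, 0.0), 1.0)             # clamp at the device boundaries
    return R_ON * x + R_OFF * (1.0 - x)       # effective resistance

r_initial = R_ON * 0.1 + R_OFF * 0.9                      # before the write
r_after_write = simulate(current=1e-4, dt=1e-3, steps=500)  # positive current lowers R
```

Driving current one way lowers the resistance toward R_ON; reversing the current raises it back toward R_OFF, which is the "0"/"1" write mechanism the ReRAM description above refers to.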

  25. Memristor: a theoretical idea by Chua in 1971, implemented today by Hewlett-Packard, SK Hynix, and HRL Labs; memory products by Panasonic. Shown: an array of 17 oxygen-depleted titanium dioxide 50 nm memristors (HP Labs).

  26. Memristor Microarchitecture "Vision": layers of memory cells above logic. Does this new structure open the possibility for a new microarchitecture?

  27. Memristors to the Rescue? A huge amount of memory cells very close to logic; non-volatile, so no power is needed to keep them alive; ~transistor size; fast; no leakage.

  28. Sea of Memory Cells Impact – conventional vs. out of the box: enhance multithreading architecture (graphics-like); increase on-die prediction structures; instruction queues; back to LUT (look-up-table) implementations; new caches (e.g. NAHALAL, MC vs. MT, cache-specific content); non-register architecture (memory-to-memory operations)?; continuous-flow multithreading (improved SoE MT); instruction reuse (memoization)?; computation at the memory level* (* Ref: Dr. Avidan Akerib, General Manager, NeoMagic)
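One item on slide 28, instruction reuse (memoization), trades abundant on-die memory for recomputation. A minimal Python sketch of the idea – the operation and operand values are placeholders, and `lru_cache` stands in for a memristor-backed result table:

```python
from functools import lru_cache

computations = {"count": 0}  # how often the "expensive" unit actually runs

@lru_cache(maxsize=None)     # result table standing in for on-die memory
def expensive_op(x):
    computations["count"] += 1
    return x * x + 1         # placeholder for a long-latency operation

# Repeated operands hit the table instead of re-executing the operation
results = [expensive_op(v) for v in (3, 7, 3, 7, 3)]
```

Five operations execute, but only two distinct computations are performed; the other three are served from the table, which is the energy trade the slide is pointing at.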

  29. Throughput and Bandwidth*: Memory Intensive Architectures. Chart: throughput vs. bandwidth at the chip boundary – throughput engines (TP1–TP4) are bandwidth demons, and bandwidth to out-of-chip devices → energy waste. The trend: traversing on a constant-throughput line → increase on-die memory (e.g. cache, new ideas). (*Influenced by the ISCA 1995 paper "Performance Evaluation of the PowerPC 620 Microarchitecture"; graph: frequency vs. performance/frequency)

  30. Switch-on-Event Multithreading. Example – processor pipeline: threads A, B, and C share the pipeline stages (fetch through execute to write back); on a cache miss the pipeline switches to another thread.

  31. Continuous Flow MT (CFMT). Example – processor's pipeline. SoE deficiencies: instructions beyond the "event instruction" are flushed → waste of energy and performance degradation. Can we use the memristor to reduce the thread-switch penalty (bubbles)? Yes: do not flush; store the thread-pipe-state in memristors (a Multistate Pipeline Register).

  32. Continuous Flow MT (CFMT). Example – processor's pipeline: each pipeline stage (fetch through execute to write back) gets read/write access to a Multistate Pipeline Register (MPR) that holds per-thread pipeline-register state (thread A, thread B).

  33. Continuous Flow MT (CFMT). Example – processor's pipeline: on a cache miss, the pipeline state of threads A, B, and C is held in the MPRs (MPR = Multistate Pipeline Register) along the stages from fetch through execute to write back, so the pipe can switch threads without flushing.

  34. CFMT Initial Simulation (preliminary): ARM-like microarchitecture (v7), lbm from SPEC CPU2006. Chart: IPC (performance) vs. number of threads for CFMT (mem and MCE), CFMT (mem only), and SoE (no CFMT). CFMT for multiple-cycle events? Not sure yet…
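The IPC gap between SoE and CFMT in that simulation can be captured with a toy steady-state model, assuming enough threads to hide the miss latency itself so that only the switch penalty remains; the pipe depth, penalties, and instruction counts below are all illustrative, not measured:

```python
def ipc(instrs_per_miss, switch_penalty):
    # Single-issue pipe: each thread runs instrs_per_miss instructions,
    # then a cache miss forces a thread switch costing switch_penalty cycles.
    return instrs_per_miss / (instrs_per_miss + switch_penalty)

PIPE_DEPTH = 8   # illustrative: SoE flushes the whole pipe on every switch
soe_ipc  = ipc(instrs_per_miss=20, switch_penalty=PIPE_DEPTH)  # flush bubbles
cfmt_ipc = ipc(instrs_per_miss=20, switch_penalty=1)           # MPR save/restore
```

In this toy setting SoE sustains 20/28 ≈ 0.71 IPC while CFMT sustains 20/21 ≈ 0.95: not flushing, and instead parking the pipe state in the MPRs, converts the per-switch flush bubbles directly into useful cycles.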

  35. Memory Intensive Architecture – Looking Forward: Large on-die memory may save energy and change the way we architect our computational machines: a reduction in data transfer; an opportunity for dramatic improvement in performance/power or throughput/power; performance improvement (at the same power) => energy reduction; a reduction of static/leakage power; energy saving in reactive systems (zero memory energy when there is no operation). NEW!

  36. Summary: saving energy via an optimal heterogeneous system; the introduction of huge on-die memory should alter the way we design computational machines for low energy consumption.

  37. Thank You 37
