Preparing for a Post Moores Law World Todd Austin University of - PowerPoint PPT Presentation

Preparing for a Post Moore’s Law World Todd Austin University of Michigan

Perspectives on Scaling • C-FAR: Center for Future Architectures Research • Focused on scaling in 2020-2030 silicon • Performance, power and cost • 27 faculty at 14 universities, 92 students • Why is C-FAR’s mission important? • The promise… tomorrow’s applications need powerful systems • Why is C-FAR’s mission challenging? • The threats… slowing innovation and degrading silicon • All of the work presented in this talk is that of C-FAR faculty. • Figure labels: Computer Vision, Machine Learning, Big Data Analytics; Many Idle Cores, End of Dennard Scaling, Silicon Defects 2

Moore’s Law Performance Gap • Today, the gap is cresting 10x • Figure labels: lack of perceived value, dark silicon, diminished ILP 3

Is Density Still Scaling? • Figure: technology node (nm, log scale) against street dates for Intel’s lead generation products; nodes shown: 180, 130, 90, 65, 45, 32, 22, 14, 10, 7 • 14nm slips by 2 quarters • 10nm slips by 5-6 quarters • 7nm by end 2020? • Courtesy David Brooks @ Harvard 4

What Does This All Mean to Architects? Today, value = scalability (performance, power, cost). But, the technology scaling component has left us. 5

Remedy #1: Chip Multiprocessors 6

CMP Performance Scaling for the Highly Parallel PARSEC Benchmarks • From “Dark Silicon and the End of Multicore Scaling,” by Esmaeilzadeh et al. 7

What Does the Press Think? 8

We Investigate: Who’s to Blame? ? Programmers 9

Largest NA Bitcoin Miner • GPGPU-based system • Fills 2000 sq.ft. warehouse • Computes 1 petahash/s • Reportedly generates $8M in Bitcoins per month • Unfortunately soon to be obsolete as Bitcoin difficulty continues to scale 10

We Investigate: Who’s to Blame? Educators ? Programmers 11

CS Education is Booming • CS enrollment on a fast-rising trajectory for a decade (figure: UM EECS enrollment, CS and EE) • Parallel programming at UM: EECS 381, Object-Oriented and Advanced Programming • EECS 482, Operating Systems • EECS 570, Parallel Computer Architecture • EECS 587, Parallel Computing • EECS 591, Distributed Systems • EECS 598, Ubiquitous Parallelism • I have been teaching and developing CS in Ethiopia • Nearly 600 students in the CE CS program • 2nd most popular major in the university 12

We Investigate: Who’s to Blame? Educators The Transistor ? Programmers 13

The Dark Silicon Dilemma Courtesy Michael Taylor @ UCSD 14

We Investigate: Who’s to Blame? Educators The Transistor ? Programmers Architects 17

The Tyranny of Amdahl’s Law • Figure annotation: where we need to be today (10x) • Figure symbols: P, S, N 18
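The slide's figure did not survive extraction, so the following is only a minimal sketch of the standard Amdahl's Law relation it appears to plot. It assumes the slide's symbols mean P = parallelizable fraction of the work, S = overall speedup, and N = number of cores; the function names are hypothetical and not from the talk.

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Overall speedup S on n cores when a fraction p of the work is parallel.

    Standard Amdahl's Law: S = 1 / ((1 - p) + p / n).
    """
    return 1.0 / ((1.0 - p) + p / n)


def required_parallel_fraction(s: float, n: float) -> float:
    """Parallel fraction p needed to reach speedup s on n cores (n > 1).

    Obtained by solving S = 1 / ((1 - p) + p / n) for p.
    """
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


if __name__ == "__main__":
    target = 10.0  # the 10x gap named on the slide
    for cores in (16, 64, 256):
        p = required_parallel_fraction(target, cores)
        print(f"{cores:4d} cores: need p = {p:.4f} for {target:.0f}x")
```

Under these assumptions, a 10x speedup needs at least 90% of the work to be parallel no matter how many cores are added, because the serial fraction caps the speedup at 1 / (1 - P).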

We Investigate: Who’s to Blame? Educators The Transistor ? Programmers Architects What is the solution? 19

A Story about Jason and His Two Advisors 20

EVA: Embedded Vision Architecture • Heterogeneous Multicore • Application-specific Functional Units • Customized Memory System • EVA Functional Units: Monopoly Compare, Dot Product Unit, Vector Max, Decision Tree Compare • Initial EVA design: 90x greater efficiency for computer vision algorithms 21

Where We Need to Focus • Parallelism • Customization • Heterogeneous parallel systems overcome dark silicon and the tyranny of Amdahl’s Law. 22

Why These Ideas Will Likely Fail, Unless We Make a Change… • The Good: Hetero-parallel systems can close the Moore’s Law gap • The Bad: Dennard scaling has stopped, Moore’s Law is slowing, leaving a growing gap • The Ugly: Hetero-parallel designs needed to close the gap will be too expensive to afford • We must make design much cheaper! 23

What I Want You to Remember • Successfully bridging the Moore’s Law performance gap is less about “How” to do it and more about “How Much” it costs! • My claim: if we can effect a 100x reduction in the cost to bring a design to market, innovation will flourish and scaling challenges will be overcome. 24

Design Costs Are Skyrocketing • Figure: cost to market ($ million, axis 0-140) by silicon technology node (0.5u, 0.35u, 0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm, 28nm, 20nm), split into Mask Costs, S/W Development and Testing, and H/W Design and Verification • Annotated values: $500K, $88M, $120M • Source: International Business Strategies 25

Outcome: “Nanodiversity” is Dwindling • Figure: total ASIC starts per year (axis 0-12000), 1995-2014 • Source: Gartner Group 26

Inexpensive “Design” Promotes Innovation and Adaptation • Don’t Believe Me? Ask Mother Nature! • r/K selection theory is a biological mechanism that organisms use to better adapt to their environment • In unstable environments, r-selection predominates as the ability to reproduce quickly is crucial • In stable environments, K-selection predominates as the ability to compete successfully for limited resources is crucial 27

The Remedy: Scale Innovation • Ultimate goal: accelerate system architecture innovation and make it sufficiently inexpensive that anyone can do it anywhere • Approach #1: Expect more from architectural innovation • Approach #2: Reduce the cost to design custom hardware • Approach #3: Embrace open-source concepts • Approach #4: Widen the applicability of custom hardware • Approach #5: Reduce the cost of manufacturing custom H/W 28

1) Expect more from architectural innovation • “Give me 15% speedup and I’ll accept your paper” • “I need 1% speedup for 1% area” • “Your idea needs to deliver 2x or more, or someone else should fund it” 29

HELIX-UP Unleashed Parallelization • David Brooks @ Harvard • Traditional parallelizing compilers must honor possible dependencies • HELIX-UP manufactures parallelism by profiling which dependencies do not exist and which are not needed • Based on user-supplied output distortion function • Big step for parallelization • 2x speedup over parallelizing compilers, 6x over serial, < 7% distortion • Figure: loop iterations spread across Threads 0-3 with data passed between threads; Nehalem, 6 cores, 2 threads per core 30

Association Rule Mining with the Automata Processor • Kevin Skadron @ UVA • Micron’s Automata processor • Implements FSMs at memory • Massively parallel with accelerators • Mapped association rule mining (ARM) rules to memory-based FSMs • ARM algorithms identify relationships between data elements • Implementations are often memory bottlenecked • Big data sets had big speedups • 90x+ over single CPU performance • 2-9x+ speedups over CMPs and GPUs • Joint effort with UVA and Micron 31
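To make "ARM algorithms identify relationships between data elements" concrete, here is a minimal CPU-side sketch of the two standard association rule mining measures, support and confidence. It is not the Automata Processor mapping described on the slide; the function names and the toy transaction data are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def frequent_itemsets(transactions, size, min_support):
    """All itemsets of the given size whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for candidate in combinations(items, size):
        s = support(transactions, candidate)  # one full scan per candidate
        if s >= min_support:
            result[frozenset(candidate)] = s
    return result


# Hypothetical toy data set: each transaction is a set of items.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
    frozenset({"bread", "milk", "beer"}),
]

if __name__ == "__main__":
    print(frequent_itemsets(TRANSACTIONS, 2, 0.5))
    print(confidence(TRANSACTIONS, {"bread"}, {"milk"}))
```

Every candidate itemset triggers a scan over all transactions, which is one reason software implementations tend to be limited by memory traffic rather than arithmetic.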

2) Reduce the cost to design custom hardware • David Brooks @ Harvard • Figure labels: Unmodified C-Code; Accelerator Design Parameters (e.g., # FU, mem. BW); Accelerator Specific Datapath; Private L1/Scratchpad; Shared Memory/Interconnect Models • Better tools and infrastructure • Scalable accelerator synthesis and compilation, generate code and H/W for highly reusable accelerators • Composable design space exploration, enables efficient exploration of highly complex design spaces • Well put-together benchmark suites to drive development efforts 32

CortexSuite: A Synthetic Brain Benchmark Suite • Michael Taylor @ UCSD • Benchmarks shown: Disparity Map, Image Segmentation, Robot Localization, Texture Synthesis, Feature Tracking, Support Vector Machines, SIFT, Image Stitch 33

3) Embrace Open-Source Concepts • Thought experiment: let’s design the next great smartphone Red = non-free IP, Green = free IP 34

3) Embrace Open-Source Concepts • As a community, we need to consider: How much of our basic technology should be free? • Red = non-free IP, Green = free IP 35

Open-Source H/W is Growing 36

4) Widen the Applicability of Customized H/W • Krste Asanovic @ UC-Berkeley • Figure layers: Applications (Machine Learning, Multimedia Analysis, Computer Vision, …); Computational Patterns (Dense, Sparse, Graph); Specializers with custom implementations and autotuning; Code (labels: ESP, Glue, Dense, Sparse, Graph); hardware (ESP Core; ILP, Dense, Sparse, and Graph Engines) • ESP: Ensembles of Specialized Processors • Ensembles are algorithm-specific processors optimized for code “patterns” • Approach uses composable customization to deliver speed and efficiency that is widely applicable to general purpose programs • Grand challenges remain: what are the components and how are they connected? 37

5) Reduce the cost of manufacturing customized H/W • Martha Kim @ Columbia • Another thought experiment: what if building a house were like fabricating a chip? • Brick-and-mortar silicon explores assembly-time customization, i.e., MCMs + 3D + FPGA interconnect • Brick-and-mortar silicon design flow: 1) Assemble brick layer 2) Connect with mortar layer 3) Package assembly 4) Deploy software (figure label: H/W brick) • Diversity via brick ecosystem & interconnect flexibility • Brick design costs amortized across all designs • Robust interconnect and custom bricks rival ASIC speeds 38
