the future is not what it used to be
The Future is not what it used to be... Erik Hagersten - PowerPoint PPT Presentation


  1. The Future is not what it used to be... Erik Hagersten

  2. Then... ENIAC, 1946 ("5 kHz")
     - 18,000 vacuum tubes
     - Programmed with patch cables
     - "5 kHz"
     AVDARK 2012 | Dept of Information Technology | www.it.uu.se | Erik Hagersten | user.it.uu.se/~eh

  3. Then (in Sweden)
     - BARK (~1950)
       - 8,000 relays
       - 80 km cables
     - BESK (~1953)
       - 2,400 vacuum tubes
       - "20 kHz" (world record)

  4. "Recently": APZ 212, 1983. Ericsson's Supercomputer ("5 MHz")

  5. APZ 212 marketing brochure quotes:
     - "Very compact"
       - 6 times the performance
       - 1/6th the size
       - 1/5 the power consumption
     - "A breakthrough in computer science"
     - "Why more CPU power?"
     - "All the power needed for future development"
     - "…800,000 BHCA, should that ever be needed"
     - "SPC computer science at its most elegance"
     - "Using 64 kbit memory chips"
     - "1500W power consumption"

  6. 65 years of "improvements"
     - Speed
     - Size
     - Price
     - Price/performance
     - Reliability
     - Predictability
     - Energy
     - Safety
     - Usability…

  7. "Moore's Law"
     Pop: Double performance every 18-24th month
     [Figure: performance (log scale, 1 to 1000) against year, with single-core and multicore curves; 2006 is marked on the year axis.]

  8. Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

  9. Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

  10. Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

  11. Exponential development: doubling/halving times (according to Kurzweil)
      - Dynamic RAM memory (bits per dollar): 1.5 years
      - Average transistor price: 1.6 years
      - Microprocessor cost per transistor cycle: 1.1 years
      - Total bits shipped: 1.1 years
      - Processor performance in MIPS: 1.8 years
      - Transistors in Intel microprocessors: 2.0 years
      [Figure: log-scale axis (1 to 1000) against time.]
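
A doubling time is easier to compare once it is converted to a yearly factor. The sketch below does that for the six figures listed on slide 11; the function name and the dictionary are illustrative and not part of the presentation. For the price entries, the same factor reads as a yearly decrease rather than an increase.

```python
# Doubling/halving times in years, copied from slide 11 (attributed there to Kurzweil).
DOUBLING_TIMES_YEARS = {
    "Dynamic RAM memory (bits per dollar)": 1.5,
    "Average transistor price": 1.6,
    "Microprocessor cost per transistor cycle": 1.1,
    "Total bits shipped": 1.1,
    "Processor performance in MIPS": 1.8,
    "Transistors in Intel microprocessors": 2.0,
}


def annual_factor(doubling_time_years: float) -> float:
    """Factor of change per year for a quantity that doubles (or halves)
    every `doubling_time_years` years."""
    return 2.0 ** (1.0 / doubling_time_years)


for name, years in DOUBLING_TIMES_YEARS.items():
    print(f"{name}: x{annual_factor(years):.2f} per year")
```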

  12. Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

  13. Linear scale, 1940 to 2017 (2x performance every 18th month)
      [Figure: "Doubling every 18th month since 1940"; performance on a linear axis from 0 to 4E+15 against year, with ticks from 1940 to 2010.]
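
On a linear axis almost all of an exponential curve hugs zero. A minimal sketch, assuming steady doubling every 1.5 years from 1940 to 2017 as stated on slide 13, shows how small each earlier year is relative to the final value; the sample years are arbitrary choices for illustration.

```python
DOUBLING_TIME_YEARS = 1.5          # "2x performance every 18th month" (slide 13)
START_YEAR, END_YEAR = 1940, 2017  # range given on slide 13


def relative_performance(year: float) -> float:
    """Performance relative to START_YEAR under steady doubling."""
    return 2.0 ** ((year - START_YEAR) / DOUBLING_TIME_YEARS)


final = relative_performance(END_YEAR)
print(f"Total growth {START_YEAR}-{END_YEAR}: {final:.2e}x")

# Share of the final value reached by a few sample years (illustrative picks).
for year in (1950, 1980, 2000, 2010, 2014, 2017):
    share = relative_performance(year) / final
    print(f"{year}: {share:.2e} of the {END_YEAR} value")
```

Three years before the end of the range (two doublings), the curve has only reached a quarter of its final height, which is why the linear plot looks flat for decades and then shoots up.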

  14. Exponential development
      - Example: Doubling every 2nd year. How long does it take for 1000x improvement?
      - Example: Doubling every 18th month. How long does it take for 1000x improvement?
      [Figure: the same growth curve shown on a log scale (1 to 1000) against time and on a linear scale.]
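
The two questions on slide 14 can be worked out directly: a 1000x improvement needs log2(1000), a little under 10, doublings. The sketch below is one way to compute it; the function name is illustrative. It gives roughly 20 years for doubling every second year and roughly 15 years for doubling every 18th month.

```python
import math


def years_to_improve(factor: float, doubling_time_years: float) -> float:
    """Years needed to improve by `factor` when performance doubles
    every `doubling_time_years` years."""
    return math.log2(factor) * doubling_time_years


for doubling_time in (2.0, 1.5):
    years = years_to_improve(1000, doubling_time)
    print(f"Doubling every {doubling_time} years: 1000x takes {years:.1f} years")
```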

  15. Looking Forward
      Three rules of common wisdom:
      - Do not bet against exponential trends
      - Do not bet against exponential trends
      - Do not bet against exponential trends
      But, is it possible to continue "Moore's Law"?
      - Are there show-stoppers?
      - Can we utilize an exponential growth of #cores?

  16. Not everything scales as fast!
      Example: 470.LBM, "Lattice Boltzmann Method" to simulate incompressible fluids in 3D
      [Figure: throughput (0 to 3.5) against number of cores used (1 to 4); the value 1.0 is labeled.]
      Throughput (as defined by SPEC): amount of work performed per time unit when several instances of the application are executed simultaneously.
      Our TP study: compare TP improvement when you go from 1 core to 4 cores.

  17. Nerd Curve: 470.LBM
      [Figure: cache miss rate against cache size. Legend: miss rate (excluding HW prefetch effects); utilization, i.e., fraction of cache data used (scale to the right); possible miss rate if utilization problem was fixed. Marked miss rates: 5.0% and 3.5%. Marked operating points: running one thread, running four threads.]
      - Less amount of work per memory byte moved @ four threads

  18. Remember: It is getting worse!
      Computation vs Bandwidth: #Cores ~ #Transistors
      [Figure: #T * T_freq / #P * P_freq (0 to 6) against year, 2007 to 2015. Diagram: four CPUs on a chip connected to DRAM through #Pins.]
      Source: International Technology Roadmap for Semiconductors (ITRS)
      From Karlsson and Hagersten. Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution. IPDPS March 2007. [graph updated with more recent data]
      - HPCwire Feb 2011 [cites Linley Gwennap and Justin Rattner]: Without Silicon Photonics, Moore's Law Won't Matter
      - HPCwire Feb 2011: Growing Data Deluge Prompts Processor Redesign

  19. Case study: Limited by bandwidth

  20. Nerd Curve (again)
      [Figure: cache miss rate against cache size, running four threads. Legend: miss rate (excluding HW prefetch effects); utilization, i.e., fraction of cache data used (scale to the right); possible miss rate if utilization problem was fixed. Marked miss rates: 5.0% (orig application) and 2.5% (optimized application).]
      - Twice the amount of work per memory byte moved
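
The link between the two miss rates on slide 20 and "twice the amount of work per memory byte moved" can be sketched with a very simple traffic model. The assumptions here are not from the slides: miss rate is treated as misses per memory access, every miss moves one full cache line, the line size is a hypothetical 64 bytes, and prefetch and write-back traffic are ignored.

```python
CACHE_LINE_BYTES = 64  # assumption: a common line size, not stated on the slides


def accesses_per_byte_moved(miss_rate: float,
                            line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Memory accesses served per byte fetched from DRAM, assuming each
    miss moves exactly one cache line (simplified model)."""
    return 1.0 / (miss_rate * line_bytes)


original = accesses_per_byte_moved(0.050)   # 5.0 %, orig application (slide 20)
optimized = accesses_per_byte_moved(0.025)  # 2.5 %, optimized application (slide 20)

print(f"Work per byte moved improves {optimized / original:.1f}x")
```

In this model the line size cancels out of the ratio, so halving the miss rate doubles the work done per byte moved regardless of the assumed line size.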

  21. Better Memory Usage!
      Example: 470.LBM, modified to promote better cache utilization
      [Figure: throughput (0 to 3.5) against # cores used (1 to 4); one curve is labeled "Original code".]

  22. Example 2: A Scalable Parallel Application
      [Figure: performance (0 to 4) against # cores (1 to 4). App: Cigar.]
      Looks like a perfect scalable application! Are we done?

  23. Example 2: The Same Application Optimized
      [Figure: performance (0 to 30) against # cores (1 to 4) for the original and the optimized version; annotated "7.3x". App: Cigar.]
      Looks like a perfect scalable application! Are we done?
      - Duplicate one data structure

  24. Implementation Trends

  25. Predicting the future is hard
      Predicting: "Chip Multiprocessor" aka Multicores [from PARA Bergen 2000]
      Chip Multiprocessor (CMP):
      - Simple fast CPU
      - External Mem
      - -- many open questions
      [Diagram: Mem, Mem I/F, I/F, L2$, four $1 caches, four CPUs, t threads.]

  26. Multi-CMPs [from PARA Bergen 2000]
      Explicit parallelism: # chips x # threads/chip
      - Global shared memory
      - Global/local comm cost >10
      - Gotta' explore small caches
      - Gotta' explore locality!
      - OS scalability?
      - Application scalability?
      [Diagram: c chips, each with its own Mem, connected through an interconnect.]

  27. Why Multicores Now?
      -- How is "Moore's Law" doing? --
      [Figure: perf (log scale) against time, with single-core and multi-core curves; ~2007 is marked.]
      1. Not enough ILP/MLP to get payoff from using more transistors
      2. Signal propagation delay » transistor delay
      3. Power consumption: P_dyn ~ C • f • V^2
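
The power relation on slide 27, P_dyn ~ C • f • V^2, hints at why several slower cores can be attractive. The sketch below is only an illustration under assumptions that are not in the slides: supply voltage is scaled down in step with frequency, the 0.8 scaling factor is arbitrary, static power is ignored, and the workload parallelizes perfectly.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float,
                           cap_scale: float = 1.0) -> float:
    """Dynamic power relative to a baseline core, from P_dyn ~ C * f * V^2."""
    return cap_scale * freq_scale * voltage_scale ** 2


# Baseline: one core at full frequency and voltage.
one_fast_core = relative_dynamic_power(1.0, 1.0)

# Hypothetical alternative: two cores, each at 80 % frequency and 80 % voltage.
two_slow_cores = 2 * relative_dynamic_power(0.8, 0.8)

print(f"One fast core:  power {one_fast_core:.3f}, peak throughput 1.0")
print(f"Two slow cores: power {two_slow_cores:.3f}, peak throughput {2 * 0.8:.1f}")
```

Under these assumptions the two slower cores draw about the same dynamic power as the single fast core while offering up to 1.6 times the peak throughput, provided the software supplies enough parallelism.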

  28. Darling, I shrunk the computer
      - Sequential execution (≈ one program): Mainframes, Super Minis, Microprocessor
      - Paradigm shift
      - Chip Multiprocessor (CMP): A multiprocessor on a chip!
      - Need TLP to make one chip run fast

  29. HPC in the Rear Mirror...
      [Figure: timeline of HPC generations from 1980 through 1990, 2000 and 2010 to an open "????": Vector, Nifty Parallel, Killer Micro SMPs, Beowulf x86 Linux Clusters, MC Clusters, MC + Accelerators. Each generation carries annotations marked * or †. The * annotations include: promise of performance, forced by technology, COTS cost convergence, UNIX, scalability. The † annotations include: COTS perf, high cost, hard to use, not general, and "????" for the two most recent generations. Further labels: commercial computing, management, bad scaling, naive view, no standards, expensive.]
