

SLIDE 1

The Future is not what it used to be...

Erik Hagersten

SLIDE 2

Dept of Information Technology | www.it.uu.se | Erik Hagersten | user.it.uu.se/~eh

AVDARK 2012

Then... ENIAC 1946 (”5 kHz”)

ENIAC

18 000 vacuum tubes, programmed with patch cords, ”5 kHz”

SLIDE 3


Then (in Sweden)

 BARK (~1950)

 8 000 relays
 80 km cables

 BESK (~1953)

 2 400 vacuum tubes
 ”20 kHz” (world record)

SLIDE 4


“Recently”: APZ 212, 1983

Ericsson’s Supercomputer (“5 MHz”)

SLIDE 5


APZ 212

Marketing brochure quotes:

 ”Very compact”

 6 times the performance
 1/6th the size
 1/5 the power consumption

 ”A breakthrough in computer science”
 ”Why more CPU power?”
 ”All the power needed for future development”
 ”…800,000 BHCA, should that ever be needed”
 ”SPC computer science at its most elegance”
 ”Using 64 kbit memory chips”
 ”1500 W power consumption”

SLIDE 6


65 years of “improvements”

 Speed
 Size
 Price
 Price/performance
 Reliability
 Predictability
 Energy
 Safety
 Usability…

SLIDE 7


”Moore’s Law”

Popular version: performance doubles every 18-24 months

[Figure: performance (log scale, 1-1000) vs. year; the single-core curve flattens around 2006 and multicore takes over]

SLIDE 8


Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

SLIDE 9


Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

SLIDE 10


Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

SLIDE 11


Exponential growth: doubling/halving times

(according to Kurzweil)

Dynamic RAM Memory (bits per dollar): 1.5 years
Average Transistor Price: 1.6 years
Microprocessor Cost per Transistor Cycle: 1.1 years
Total Bits Shipped: 1.1 years
Processor Performance in MIPS: 1.8 years
Transistors in Intel Microprocessors: 2.0 years

[Figure: exponential growth on a log scale, 1 to 1000 over time]

SLIDE 12


Ray Kurzweil pictures: www.KurzweilAI.net/pps/WorldHealthCongress/

SLIDE 13


Linear scale, 1940-2017 (2× performance every 18 months)

[Figure: performance vs. year on a linear scale; the curve is indistinguishable from zero until the final years]

Doubling every 18 months since 1940

SLIDE 14


Exponential growth

Example: doubling every 2 years. How long does it take to reach a 1000× improvement? Example: doubling every 18 months. How long does it take to reach a 1000× improvement?

[Figure: the same exponential growth plotted on a log scale (a straight line) and on a linear scale]
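The arithmetic behind these questions is simple: a 1000× improvement needs log2(1000) ≈ 9.97 doublings. A quick sketch (Python; `years_to_improve` is a made-up helper name):

```python
import math

def years_to_improve(factor, doubling_years):
    """Years needed to reach a given improvement factor
    when performance doubles every `doubling_years` years."""
    return math.log2(factor) * doubling_years

print(years_to_improve(1000, 2.0))   # ~19.9 years at a 2-year doubling time
print(years_to_improve(1000, 1.5))   # ~14.9 years at an 18-month doubling time
```

So the apparently small difference between "every 2 years" and "every 18 months" shaves roughly five years off the wait for 1000×.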

SLIDE 15


Looking Forward

Three rules of common wisdom:

 Do not bet against exponential trends
 Do not bet against exponential trends
 Do not bet against exponential trends

But is it possible to continue ”Moore’s Law”?

  • Are there show-stoppers?
  • Can we utilize an exponential growth of #cores?

SLIDE 16


[Figure: throughput (normalized to 1.0 on one core) vs. number of cores used, 1-4]

Not everything scales as fast!

Example: 470.LBM, the "Lattice Boltzmann Method", simulating incompressible fluids in 3D. Throughput (as defined by SPEC): the amount of work performed per time unit when several instances of the application are executed simultaneously. Our TP study: compare the throughput improvement when going from 1 core to 4 cores.
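The point of the study can be captured with a toy model (Python; the per-core rate and the bandwidth cap are made-up numbers, not SPEC measurements): throughput grows with the number of cores only until a shared resource, here memory bandwidth, saturates.

```python
def throughput(n_cores, per_core_work=1.0, mem_bw_limit=2.0):
    """Toy throughput model: each extra core adds per_core_work
    of throughput until the shared memory bandwidth saturates.
    All numbers are illustrative."""
    return min(n_cores * per_core_work, mem_bw_limit)

for n in range(1, 5):
    print(n, throughput(n))   # 1.0, 2.0, 2.0, 2.0: flat after 2 cores
```

A memory-bound code like 470.LBM stops scaling once the bandwidth limit is reached, however many cores are added.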


SLIDE 17


Nerd Curve: 470.LBM

[Figure: cache miss rate vs. cache size. Curves: miss rate (excluding HW prefetch effects); utilization, i.e., the fraction of cache data actually used (scale to the right); the possible miss rate if the utilization problem were fixed; running one thread (3.5% miss rate); running four threads (5.0% miss rate)]

 Less work per memory byte moved with four threads

SLIDE 18


[Diagram: four CPUs sharing one DRAM]

Remember: it is getting worse!

From Karlsson and Hagersten, "Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution", IPDPS, March 2007. [Graph updated with more recent data.]

Computation vs Bandwidth

[Figure: (#transistors × transistor frequency) / (#pins × pin frequency) vs. year, 2007-2015; the ratio keeps growing]

Source: International Technology Roadmap for Semiconductors (ITRS)

#Cores ~ #Transistors

HPCwire, Feb 2011 [cites Linley Gwennap and Justin Rattner]: "Without Silicon Photonics, Moore's Law Won't Matter". HPCwire, Feb 2011: "Growing Data Deluge Prompts Processor Redesign".

#Pins

SLIDE 19


Case study: Limited by bandwidth

SLIDE 20


Nerd Curve (again)

[Figure: cache miss rate vs. cache size, running four threads. Curves: miss rate (excluding HW prefetch effects); utilization, i.e., the fraction of cache data actually used (scale to the right); the possible miss rate if the utilization problem were fixed; original application (5.0% miss rate); optimized application (2.5% miss rate)]

 Twice the amount of work per memory byte moved
SLIDE 21


[Figure: throughput vs. number of cores used, 1-4; the optimized code keeps scaling while the original saturates]

 Better Memory Usage!

Example: 470.LBM, modified to promote better cache utilization, vs. the original code


SLIDE 22


Example 2: A Scalable Parallel Application

App: Cigar

[Figure: performance vs. #cores, 1-4]

Looks like a perfectly scalable application! Are we done?

SLIDE 23


Example 2: The Same Application Optimized

App: Cigar

[Figure: performance vs. #cores, 1-4, original vs. optimized; the optimized version is 7.3× faster]

 Duplicate one data structure

SLIDE 24

Implementation Trends

SLIDE 25


Predicting the future is hard

Predicting: “Chip Multiprocessor” aka Multicores

[from PARA Bergen 2000]

[Diagram: Chip Multiprocessor (CMP): several simple, fast CPUs, each with an L1 cache ($1), sharing an L2 cache, a memory interface, and an external interface; t threads]

  • Many open questions

SLIDE 26


Multi-CMPs

[from PARA Bergen 2000]

[Diagram: c CMP chips, each with local memory, connected by an interconnect. Explicit parallelism: #chips × #threads/chip]

  • Global shared memory
  • Global/local comm cost >10
  • Gotta’ explore small caches
  • Gotta’ explore locality!
  • OS scalability ?
  • Application scalability ?
SLIDE 27


Why Multicores Now?

-- How is ”Moore’s Law” doing? --

1. Not enough ILP/MLP to get payoff from using more transistors
2. Signal propagation delay » transistor delay
3. Power consumption: Pdyn ~ C · f · V²

[Figure: performance (log scale) vs. time; the single-core trend gives way to multicore around 2007]
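A rough illustration of why the power relation forced the multicore turn: if voltage must scale with frequency, dynamic power grows roughly as f³, so two cores at 80% frequency deliver more aggregate work than one full-speed core at about the same power. A toy calculation (Python; the linear voltage-frequency scaling is an assumption, not process data):

```python
def dynamic_power(c, f, v):
    """Dynamic power: P_dyn ~ C * f * V^2."""
    return c * f * v * v

# One core at full frequency and voltage:
baseline = dynamic_power(c=1.0, f=1.0, v=1.0)

# One core at 80% frequency (and, by assumption, 80% voltage)...
one_slow = dynamic_power(1.0, 0.8, 0.8)
# ...so two such cores give 1.6x the throughput (if the work is parallel):
two_slow = 2 * one_slow

print(baseline, two_slow)  # 1.0 vs ~1.02: similar power, 1.6x the work
```

This is the arithmetic behind item 3: frequency scaling hit a power wall, and the remaining transistors went into more cores instead.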

SLIDE 28


Darling, I shrunk the computer

[Diagram: Mainframes, then Super Minis, then the Microprocessor, then the Chip Multiprocessor (CMP): a multiprocessor on a chip!]

Sequential execution (≈one program)

Need TLP to make one chip run fast: a paradigm shift

SLIDE 29


HPC in the Rear-View Mirror...

1980 1990 2000 2010 ????

Nifty Parallel Vector

† Not general † Expensive † Hard to use † No standards

Killer Micro SMPs Beowulf x86 Linux Clusters MC Clusters MC + Accelerators

* Forced by technology † High cost, bad scaling * Promise of performance † COTS perf management * Scalability (naive view) * UNIX, commercial computing * COTS cost convergence † ???? † ????

SLIDE 30


Parallelism can be used to hide memory latency

 Intel ”Hyper-Threading”
 T1 Niagara, MIC, … (4 threads per core)
 GPUs
 ...
 Is this a good idea?
 It cannot hide the need for bandwidth!
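A back-of-the-envelope model of that last point (Python; the miss rate, line size, and clock are illustrative assumptions): extra threads keep the core busy during misses, so latency is hidden, but every miss still moves a full cache line, so the bandwidth demand grows linearly with the thread count.

```python
def demanded_bandwidth(threads, miss_rate_per_cycle, line_bytes, freq_hz):
    """Bytes/s the threads ask of the memory system (toy model).
    Each miss fetches one cache line regardless of how well the
    core's stall cycles are overlapped."""
    return threads * miss_rate_per_cycle * line_bytes * freq_hz

bw_1 = demanded_bandwidth(1, 0.01, 64, 2e9)   # one thread
bw_4 = demanded_bandwidth(4, 0.01, 64, 2e9)   # four threads

print(bw_1 / 1e9, "GB/s")  # 1.28 GB/s
print(bw_4 / 1e9, "GB/s")  # 5.12 GB/s: 4x the bandwidth demand
```

Multithreading trades latency tolerance for bandwidth pressure; if the memory system cannot supply the extra bytes, the added threads just queue up.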

SLIDE 31


Parallelism is a Hard Currency

[Figure: speedup vs. parallelism]

Remember Amdahl’s Law?
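As a refresher, Amdahl’s Law caps the speedup of a program whose serial fraction is s at 1/s, no matter how many processors are thrown at it. A minimal sketch in Python (illustrative numbers):

```python
def amdahl_speedup(p, serial_fraction):
    """Amdahl's Law: speedup on p processors when a fraction
    of the work is inherently serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# Even 1000 cores cannot beat 1/s: with 5% serial work,
# the speedup is capped at 20x.
print(amdahl_speedup(1000, 0.05))   # ~19.6
print(amdahl_speedup(10**9, 0.05))  # approaches the 20x cap
```

This is why parallelism is a hard currency: each extra core buys less and less speedup unless the serial fraction is also driven down.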

SLIDE 32


Do you have 1000 threads to spare?

SIMD rears its ugly head again

 512 ”cores” (C)
 16 C per Stream Processor (SP)
 SP is SIMD-ish (sort of)
 Full DP-FP IEEE support
 64 kB L1 cache / SP
 768 kB global shared cache (less than the sum of the L1s)
 Atomic instructions
 ECC correction for DRAM
 Debugging support
 Synch within an SP is efficient
 Giant chip / high power
 ...

This is SIMD-ish (aka Vector)

SLIDE 33


NVIDIA Fermi (Coh. Mem.):

  • Special language...
  • Topology matter...
  • User-managed memory

I/O bus. Common research papers: ”How to get 100X speedup”. Debunking of those results is starting to appear [ISCA 2010, IBM Journal 201

SIMD and CPUs?

Reminds me of:

† Hard to use † No standards * Scalability

SLIDE 34

[Pic from Michael Wulf, PGI]

Intel’s Knights Ferry [MIC] (topology like Sandy Bridge); vector instructions. Other efforts:

  • AMD Fusion (x86 + GPU)
  • ARM + NVIDIA collaboration (project Denver)

SIMD and CPUs?

Coh. Mem.

SLIDE 35


Trends for 2016

 No major revolution of the Multicore magnitude
 Challenge: Will the number of cores double every 2 years?
 Moving towards MIMD+SIMD “fusion”
 Architecture complexity grows
 Bumpy memory/communication costs
 Heterogeneous architectures (e.g., ARM big.LITTLE)
 Memory bandwidth the bottleneck
 Energy is a first-class citizen
 Users are getting less computer-savvy (and ideally should not have to be)

SLIDE 36


Implications

 One size will not “fit all”
 SIMD parallelism will be more prominent, but the jury is still out on how this will be done
 More heterogeneous architectures (size, mem, ISA)
 More parallelism needed, but ... memory/power will become the bottleneck anyhow
 Diversity: different applications will need different “heterogeneous configurations”
 Even harder to use resources efficiently

SLIDE 37


HiPEAC Roadmap -- High Performance Embedded Architecture and Compilers