7.-9. September 2011, Hamburg: Ernst M. Mutke, Technical Director

  1. Energy-Efficient Data-Intensive Supercomputing. The World’s First Hybrid-Core Computer. EnA-HPC Conference, 7.-9. September 2011, Hamburg. Ernst M. Mutke, Technical Director, HMK Supercomputing GmbH

  2. Agenda • A new era of supercomputing • The next computing frontier – Data-intensive Supercomputing • Convey Architecture Overview • Energy Savings Examples EnA-HPC - 7.-9. September 2011 – Hamburg Slide 2 Convey Proprietary

  3. A new era of supercomputing • HPC is changing/growing – From compute-intensive to data-intensive • A new class of problems – Extreme data volumes – Complex processing – Highly dynamic • Better energy efficiency and peta-scale computing. “Data intensive computing demands a fundamentally different set of principles than mainstream computing.” (National Science Foundation, Directorate for Computer and Information Science and Engineering) (Image: Lloyd et al./Royal Society)

  4. Lessons from history: the growth of numerically-intensive computing. Numerically-intensive computing: driven by the need to save money, increase product quality, and reduce time-to-market. [Chart*: HPC revenue, 1980 to 2000, with the labels “Attached Array Processors”, “Integrated Vector”, “Custom/Coprocessor”, “Commercialization”, and “Commoditization (‘Killer Micros’)”.] *“The Marketplace of High Performance Computing,” July 1999, Erich Strohmaier, Jack J. Dongarra, Hans W. Meuer, and Horst D. Simon

  5. Numerically-intensive computing: modeling real-world events • Used to save money, increase product quality, and reduce time-to-market – Computer simulation of real-world events – Requires FLOP/s – New ISA (vector) developed • Required restructuring of programs – New language extensions for vectorization – “Smart” compilers find opportunities to generate vector code • Ultimately, supercomputers were “replaced” by commodity processors – Led to vector (SIMD) instructions in the x86 architecture (e.g. SSE) – Supercomputers today are largely huge clusters of x86 processors with commodity “vector” instructions

  6. Today: It’s a data-driven world • Science – Databases from astronomy, weather, climate, genomics, bioinformatics, natural languages, seismic modeling, … • Humanities – Scanned books, historic documents, … • Commerce – Corporate sales, stock market transactions, census, airline traffic, … • Entertainment – Internet images, Hollywood movies, MP3 files, … • Medicine – MRI & CT scans, patient records, … Adapted from cs.cmu.edu/~bryant

  7. Why so much data? • We can produce it – Automation, Internet, Sensors, Instruments • We can keep it – Western Digital Caviar Blue 1TB: $59.95 • We can use it – Cybersecurity – Medical Informatics – Data Enrichment – Social Networks – Symbolic Networks. “… But data-intensive applications are quickly emerging as a significant new class of HPC workloads. For this class of applications, a new kind of supercomputer, and a different way to assess them, will be required.” (HPCwire, Nov 2010) Adapted from cs.cmu.edu/~bryant

  8. Data-Intensive Supercomputing

  9. The next computing frontier: Data-Intensive Computing • Wal-Mart CRM – 267 million items/day, sold at 6,000 stores – 4 PB data warehouse – Mine data to manage the supply chain, understand market trends, and formulate pricing strategies • Massive Social Networks – Detecting implicit communities and influential persons for targeted advertising

  10. Data-intensive Computing: driven by the need to capture, manage, analyze, and understand data. [Chart: HPC revenue, 2010 to 2020, with the phase labels Customization, Commercialization, and Commoditization, and a “You are here” marker.]

  11. Data-intensive Computing • Growing from the need to reduce computation time • Reduce costs for energy, cooling, infrastructure, space, etc. • Make better business decisions, reduce time-to-market • Requires restructuring of programs & algorithms – New language extensions for MMT – “Smart” compilers find opportunities to generate parallel code • Ultimately will be “replaced” by commodity processors/systems – Early data-intensive technology will be woven into mainstream processors

  12. Architectural Characteristics • Reconfigurable compute elements – Customizable data types – Application-specific logic – New [graph] ISA • Supercomputer-inspired memory subsystem – Latency-tolerant – Large (TBs), highly parallel memory – Reconfigurable architecture – Efficient random (cache-less) access to memory • Maintain the x86 development ecosystem. Image source: Giot et al., “A Protein Interaction Map of Drosophila melanogaster”, Science 302, 1722-1736, 2003.

  13. Parallels [Chart: HPC revenue, 1980 to 2020, with the numerically-intensive computing curve marked “You were here” and the data-intensive computing curve marked “You are here”.] Commoditization: techniques and technology are adopted by “mainstream” processor/system manufacturers.

  14. Convey Architecture Overview

  15. Design philosophies/requirements • Heterogeneous computing is inevitable – And the simplest to program will win – Moore’s Law still holds, i.e. transistor counts keep growing • Competitive/science pressures demand a different approach – Must make better use of transistors – Support for large, randomly-accessible memory – Order-of-magnitude increases in performance/watt – Fewer OS instances, less cabling and floor space, lower cooling requirements and power consumption • Convey’s balanced approach combines FPGA-based computing with a supercomputer-style memory subsystem

  16. HPC architectures need balanced implementations • Processing power – Application-specific instruction sets – Multiple techniques for parallelism (SIMD, etc.) • Memory size & bandwidth – Highly parallel – Atomic operations

  17. CPU versus FPGA Comparison • A processor executes instructions; an FPGA uses programmable logic. • “C” code of a 4-input logical operation (F is a 16-entry truth table applied independently to each of the 32 bit positions of A, B, C, and D):

    #include <stdint.h>

    uint32_t Log4(uint32_t F, uint32_t A, uint32_t B, uint32_t C, uint32_t D)
    {
        uint32_t R = 0;
        for (int i = 0; i < 32; i += 1) {
            /* Extract bit i of each input. */
            uint32_t a = (A >> i) & 1;
            uint32_t b = (B >> i) & 1;
            uint32_t c = (C >> i) & 1;
            uint32_t d = (D >> i) & 1;
            /* Combine the four bits into a truth-table index 0..15. */
            uint32_t e = (a << 3) | (b << 2) | (c << 1) | d;
            /* Look up that entry of F and store it as result bit i. */
            R |= ((F >> e) & 1) << i;
        }
        return R;
    }

Assembly instructions for the Log4 routine (excerpt):

    00401006  xor  edx,edx
    00401008  mov  ecx,esi
    0040100A  shr  edx,cl
    0040100C  and  edx,1
    0040100F  lea  edi,[edx+edx]

• FPGA logic for the 4-input logical operation: four logic resources per bit of result, so 32 result bits => 128 logic resources • On the processor, a loop of 23 instructions is executed 32 times to solve the “C” routine => 736 instructions • The FPGA logic would take 2 ns • 736 instructions at 3 GHz would take about 245 ns (assuming one instruction per clock cycle: 736 / (3×10^9 /s) ≈ 245 ns) • An FPGA would consume 5.6×10^-15 joules per operation; a processor core would consume 6.1×10^-9 joules per operation

  18. Hybrid-core Computing [Chart: application performance/power efficiency (low to high) versus ease of deployment (difficult to easy).] • Multicore solutions – Don’t always scale well – Parallel programming is hard • Heterogeneous solutions – Can be much more efficient – Still hard to program • Convey Hybrid-Core Systems – High performance of application-specific hardware – Programmability and deployment ease of an x86 server

  19. HC-1 Hardware [Block diagram: Intel chipset with PCI I/O and memory (8 GB/s); FPGA personalities (four FPGAs) with scatter/gather memory (80 GB/s); cache-coherent, shared virtual memory.]

  20. Convey hybrid-core architecture: “commodity” Intel server + Convey FPGA-based coprocessor

  21. Supercomputer-inspired memory subsystem • Optimized for 64-bit accesses; 80 GB/sec peak • Automatically maintains coherency without impacting AE (application engine) performance
