JADE Heterogeneous Multiprocessor Design & Simulation - PowerPoint PPT Presentation


  1. JADE Heterogeneous Multiprocessor Design & Simulation Environment Jiang Xu

  2. Acknowledgement
     - Intel Labs: Bin Li, Ravi Iyer, Ramesh Illikkal
     - HP Labs: Qiong Cai
     - Current PhD students: Rafael Kioji Vivas Maeda, Peng Yang, Zhe Wang, Haoran Li, Zhehui Wang, Zhongyuan Tian, Zhifei Wang, Duong Huu Kinh Luan, Xuanqi Chen
     - Past members: Xiaowen Wu, Weichen Liu, Xuan Wang, Yaoyao Ye
     2016-06-09 Jiang Xu (HKUST)

  3. PERFECT Computing Systems
     - Design targets: Performance, Energy efficiency, Reliability, Functionality, Extensibility, Cost, Testability
     - More cores and memory on a chip and in a system
     - Heterogeneous

  4. Huge Design Space to Explore
     - Application
       - IoT/IoE, mobile, data center, HPC, mainframe …
       - Wireless communication, multimedia processing, machine learning, database …
     - Processor
       - CPU, GPU, FPGA, DSP, ASIP, ASIC …
       - Homogeneous vs. heterogeneous multiprocessor
       - FinFET, FD-SOI, GAA, CNT FET …
     - Memory and storage
       - Hierarchy
       - Cache coherence
       - DRAM, SRAM, flash, STT-RAM …
     - Interconnect
       - Ad-hoc, bus, NoC, hybrid …
       - Regular vs. irregular topology
       - Protocol: routing, flow control, congestion control …
       - Switch/router architecture
       - Electrical, optical, RF …
     - Support
       - Power delivery and management
       - Clock distribution and management
       - Thermal, aging, noise …
     - Peripherals
       - Network interface, user interface, management …
     [Figure: example systems built around a mesh, a ring, and processor/peripheral buses with arbiters and bridges, connecting CPU, RISC, GPU, DSP, FPGA, and MPEG blocks, SRAM, memory controllers, USB, LCD controller, power manager, and GPIO]

  5. Simulation-based Architecture Exploration
     - Benchmark applications with sample input data sets
     - System software
     - Cycle-accurate "full-system" architecture simulator
     - Speed-up techniques
       - Simplify interconnect, memory, processor, OS, etc.
       - Sampling application executions
       - Sampling inputs
       - Break causality to better parallelize simulations
       - Hybrids of the above techniques
     [Figure: simulation flow. Benchmark applications (programs, sample inputs) pass through compilation to produce instructions, which run with the system software (operating system, device drivers) on the architecture simulator modeling the architecture under evaluation]
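The "sampling application executions" technique on slide 5 can be illustrated with a minimal sketch. It assumes the execution is split into equal-length intervals and that only every `period`-th interval is simulated in detail. The function names and the toy cost model are hypothetical and are not taken from the slides or from any particular simulator.

```python
from statistics import mean


def estimate_total_cycles(num_intervals, simulate_interval, period):
    """Systematic sampling: simulate every `period`-th interval and extrapolate.

    `simulate_interval(i)` stands in for an expensive detailed simulation of
    interval i and returns its cycle count (hypothetical interface).
    Returns (estimated total cycles, number of intervals actually simulated).
    """
    if num_intervals < 1 or period < 1:
        raise ValueError("num_intervals and period must be positive")
    sampled = [simulate_interval(i) for i in range(0, num_intervals, period)]
    return mean(sampled) * num_intervals, len(sampled)


def toy_interval_cycles(i):
    """Toy stand-in for a detailed simulator: four repeating program phases."""
    return 1000 + 200 * (i % 4)


# Simulate 20 of 100 intervals in detail and extrapolate to the whole run.
estimate, simulated = estimate_total_cycles(100, toy_interval_cycles, period=5)
exact = sum(toy_interval_cycles(i) for i in range(100))
```

In this toy case the estimate equals the exact total because the sampled intervals cover all four phases evenly. Real workloads are not that regular, so a sampled estimate generally carries an error that depends on how well the samples represent the program phases.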

  6. The Good, the Bad and the Ugly
     - Good for detailed/late-stage design
       - Tweaking, testing, debugging …
     - Bad for early design space exploration
       - Too slow to provide essential system statistics such as average and worst-case performance, energy efficiency, cost …
     - Ugly for heterogeneous systems
       - Compilation for heterogeneous ISAs, hardware accelerator, FPGA …
       - OS support of new large-scale heterogeneous systems without drivers
     [Figure: the same simulation flow, from benchmark applications (programs, sample inputs) through compilation and system software (instructions, operating system, device drivers) to the architecture simulator and the architecture under evaluation]

  7. Joint Application/Architecture Design Exploration
     - Application models for heterogeneous multiprocessor system explorations
       - COSMIC
     - Heterogeneous multiprocessor system design and simulation platform
       - JADE
     [Figure: exploration flow. Applications (algorithms, programs, sample inputs) enter COSMIC (algorithm analysis; application partition; computation, communication, and memory analysis and profiling), which produces application TCG models (statistical application models, recorded application models). Mapping, routing, scheduling algorithms turn these into a task mapping & scheduling, a traffic routing plan, and a memory space mapping, which JADE applies to the architecture under evaluation]
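To make the flow on slide 7 concrete, here is a minimal sketch of an application graph being mapped and scheduled onto heterogeneous cores. It assumes that "TCG" denotes a task communication graph whose nodes carry per-core-type execution times and whose edges carry data volumes. The greedy earliest-finish-time heuristic, the class names, and all numbers are illustrative assumptions. They are not the algorithms or data formats used by COSMIC or JADE.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    cost: dict  # hypothetical: core type -> execution time on that core type


@dataclass
class TCG:
    tasks: dict = field(default_factory=dict)  # task name -> Task
    edges: dict = field(default_factory=dict)  # (src, dst) -> data volume


def greedy_map(tcg, order, cores, comm_time_per_unit):
    """Place each task on the core where it would finish earliest.

    order: task names in topological order.
    cores: core name -> core type.
    Returns (placement, finish), both keyed by task name.
    """
    core_free = {core: 0.0 for core in cores}
    placement, finish = {}, {}
    for name in order:
        task = tcg.tasks[name]
        best = None
        for core, core_type in cores.items():
            if core_type not in task.cost:
                continue  # this core type cannot run the task
            # Inputs are ready once every predecessor has finished and its
            # data has crossed the interconnect (free if on the same core).
            ready = 0.0
            for (src, dst), volume in tcg.edges.items():
                if dst == name:
                    same_core = placement[src] == core
                    delay = 0.0 if same_core else volume * comm_time_per_unit
                    ready = max(ready, finish[src] + delay)
            end = max(ready, core_free[core]) + task.cost[core_type]
            if best is None or end < best[0]:
                best = (end, core)
        if best is None:
            raise ValueError(f"no core can run task {name!r}")
        finish[name], placement[name] = best
        core_free[best[1]] = best[0]
    return placement, finish


# Hypothetical three-task application on one CPU and one DSP.
app = TCG()
app.tasks["fft"] = Task("fft", {"cpu": 8, "dsp": 3})
app.tasks["decode"] = Task("decode", {"cpu": 4, "dsp": 6})
app.tasks["render"] = Task("render", {"cpu": 5})
app.edges[("fft", "render")] = 2
app.edges[("decode", "render")] = 1

placement, finish = greedy_map(
    app,
    order=["fft", "decode", "render"],
    cores={"cpu0": "cpu", "dsp0": "dsp"},
    comm_time_per_unit=1.0,
)
```

Here `fft` lands on the DSP, while `decode` and `render` share the CPU. `render` waits for the `fft` output to cross the interconnect before it can start.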

  8. JADE Heterogeneous Multiprocessor Simulation Environment
     - JADE (Joint Application/Architecture Design Exploration)
       - Heterogeneous system designs
       - Early design space exploration
       - Systematic system evaluation
     - Highlights
       - Statistical, recorded and synthetic application models
       - Network-on-chip and off-chip networks
       - Optical and electrical interconnects
       - Memory subsystem
       - Built-in power analysis
     [Figure: JADE block diagram. Inputs: hardware architecture (processor architecture, memory hierarchy, coherence protocol); network architecture (optical, electrical); architecture template and energy library (processor library, optical and electrical network library, memory and cache coherence library); COSMIC benchmark (recorded, statistical, and synthetic application models); MRS (mapping, routing, scheduling) algorithms (task mapping and scheduling, communication traffic routing plan, memory space mapping). Core: JADE (memory, network, processor, peripherals). Output: memory access trace, system behavior, performance analysis, energy analysis]

  9. COSMIC Heterogeneous Multiprocessor Benchmark

     | Application | Description |
     |---|---|
     | Machine Learning - FMP | Financial market prediction using machine learning |
     | Machine Learning - ALIP | Machine learning based image indexing |
     | Molecular Dynamics | Simulating molecular dynamics when molecules hit surfaces of solid atoms |
     | Ray Tracing | 3D scenes rendering |
     | Ultrasound | Medical diagnostics using 2D/3D ultrasound imaging |
     | Fast Fourier Transform | Fast Fourier Transform with complex number inputs |
     | LDPC Encoder | Low-density parity-check code encoder |
     | TURBO Decoder | Turbo code decoder |
     | Reed-Solomon | Reed-Solomon code encoder and decoder |

     - Collaborating with application experts
     - More applications are under development

  10. Exploration Cases
     - I²CON inter/intra-chip optical network
     - SUOR optical NoC
     - Electrical mesh-based NoC
     - Memory hierarchy
       - Private L1 caches
       - Shared L2 cache, 16 banks
       - 16 memory controllers
     - Processor core
       - ARM-v7a
       - 7nm, 1GHz, 0.6V
     [Figure: chip layout with clusters of cores, each attached to an optical network interface (ONI), linked by electrical wires and optical waveguides, with a controller and memory controllers along the edges]
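The parameters listed on slide 10 can be written down as a small configuration record, sketched below. The numeric values come from the slide. The class name, field names, and interconnect labels are hypothetical and do not reflect JADE's actual configuration format. The core count is left out because the slide text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExplorationConfig:
    """One exploration case; field names are illustrative, values from slide 10."""
    interconnect: str
    l1_cache: str = "private"
    l2_banks: int = 16            # shared L2 cache, 16 banks
    memory_controllers: int = 16
    core_isa: str = "ARMv7-A"
    technology_nm: int = 7
    clock_ghz: float = 1.0
    supply_v: float = 0.6

    @property
    def cycle_time_ns(self):
        """Clock period in nanoseconds, derived from the core frequency."""
        return 1.0 / self.clock_ghz

    def cycles_to_ns(self, cycles):
        """Convert a simulated cycle count to wall-clock nanoseconds."""
        return cycles * self.cycle_time_ns


# The three interconnect options named on the slide; the labels are assumed.
INTERCONNECTS = ("I2CON", "SUOR", "electrical-mesh")
CASES = [ExplorationConfig(interconnect=name) for name in INTERCONNECTS]
```

Keeping every parameter except the interconnect fixed across `CASES` reflects how the slide presents the cases: the three networks are compared on the same memory hierarchy and processor core.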

  11. Performance and Scalability

  12. Energy Efficiency and Scalability
