 
JADE Heterogeneous Multiprocessor Design & Simulation Environment
Jiang Xu
Acknowledgement
• Intel Labs
  • Bin Li, Ravi Iyer, Ramesh Illikkal
• HP Labs
  • Qiong Cai
• Current PhD students
  • Rafael Kioji Vivas Maeda, Peng Yang, Zhe Wang, Haoran Li, Zhehui Wang, Zhongyuan Tian, Zhifei Wang, Duong Huu Kinh Luan, Xuanqi Chen
• Past members
  • Xiaowen Wu, Weichen Liu, Xuan Wang, Yaoyao Ye
2016-06-09 Jiang Xu (HKUST)
PERFECT Computing Systems
• Design targets
  • Performance
  • Energy efficiency
  • Reliability
  • Functionality
  • Extensibility
  • Cost
  • Testability
• More cores and memory on a chip and in a system
• Heterogeneous
Huge Design Space to Explore
• Application
  • IoT/IoE, mobile, data center, HPC, mainframe …
  • Wireless communication, multimedia processing, machine learning, database …
• Processor
  • CPU, GPU, FPGA, DSP, ASIP, ASIC …
  • Homogeneous vs. heterogeneous multiprocessor
  • FinFET, FD-SOI, GAA, CNT FET …
• Memory and storage
  • Hierarchy
  • Cache coherence
  • DRAM, SRAM, flash, STT-RAM …
• Interconnect
  • Ad-hoc, bus, NoC, hybrid …
  • Regular vs. irregular topology
  • Protocol: routing, flow control, congestion control …
  • Switch/router architecture
  • Electrical, optical, RF …
• Support
  • Power delivery and management
  • Clock distribution and management
  • Thermal, aging, noise …
• Peripherals
  • Network interface, user interface, management …
[Figure: example system architectures built from cores, CPUs, GPU, DSPs, FPGA, MPEG and RISC blocks, SRAM and memory controllers, a mesh, a ring, processor and peripheral buses, arbiters, bridges, and peripherals (USB, LCD controller, GPIO, power manager)]
Simulation-based Architecture Exploration
• Benchmark applications with sample input data sets
• System software
• Cycle-accurate "full-system" architecture simulator
• Speed-up techniques
  • Simplifying the interconnect, memory, processor, OS, etc.
  • Sampling application executions
  • Sampling inputs
  • Breaking causality to better parallelize simulations
  • Hybrids of the above techniques
[Figure: simulation flow with blocks Benchmark Applications (programs, sample inputs), Compilation, Instructions, System software (operating system, device drivers), and Architecture simulator (architecture under evaluation)]
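The "sampling application executions" speed-up can be illustrated with a minimal Python sketch. It assumes the execution is split into intervals and that a detailed simulator would report a cycles-per-instruction (CPI) value per interval; the interval data, the `sampled_estimate` helper, and the sample size are all hypothetical and are not taken from any particular simulator.

```python
import random
import statistics

def sampled_estimate(interval_cpi, sample_size, seed=0):
    """Estimate mean CPI from a random sample of execution intervals.

    Only the sampled intervals would need detailed simulation; the rest
    could be skipped or fast-forwarded.
    """
    rng = random.Random(seed)
    sample = rng.sample(interval_cpi, sample_size)
    mean = statistics.mean(sample)
    # Standard error of the sample mean (no finite-population correction).
    stderr = statistics.stdev(sample) / sample_size ** 0.5
    return mean, stderr

# Hypothetical per-interval CPI values standing in for detailed simulation results.
data_rng = random.Random(42)
interval_cpi = [1.0 + 0.5 * data_rng.random() for _ in range(10_000)]

full_mean = statistics.mean(interval_cpi)            # cost: all 10,000 intervals
est_mean, est_err = sampled_estimate(interval_cpi, sample_size=200)  # cost: 200 intervals

print(f"full mean CPI    = {full_mean:.4f}")
print(f"sampled mean CPI = {est_mean:.4f} (std. error {est_err:.4f})")
```

The trade-off is visible in the standard error: fewer sampled intervals mean a faster run but a wider uncertainty around the estimated average.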
The Good, the Bad and the Ugly
• Good for detailed/late-stage design
  • Tweaking, testing, debugging …
• Bad for early design space exploration
  • Too slow to provide essential system statistics such as average and worst-case performance, energy efficiency, cost …
• Ugly for heterogeneous systems
  • Compilation for heterogeneous ISAs, hardware accelerators, FPGAs …
  • OS support of new large-scale heterogeneous systems without drivers
[Figure: simulation flow with blocks Benchmark Applications (programs, sample inputs), Compilation, Instructions, System software (operating system, device drivers), and Architecture simulator (architecture under evaluation)]
Joint Application/Architecture Design Exploration
• Application models for heterogeneous multiprocessor system explorations
  • COSMIC
• Heterogeneous multiprocessor system design and simulation platform
  • JADE
[Figure: exploration flow with blocks Applications (algorithms, programs, sample inputs); COSMIC (algorithm partition; application analysis; computation, communication, and memory analysis and profiling); Application models (TCG models, statistical application models, recorded application models); Mapping, routing, scheduling algorithms (traffic routing plan, memory space mapping, task mapping & scheduling); JADE (architecture under evaluation)]
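To make the task mapping step more concrete, here is a minimal Python sketch, assuming "TCG" stands for a task communication graph whose edges carry communication volumes. The graph, the 2x2 mesh, the hop-count cost, and the exhaustive search are illustrative assumptions only; the slides do not specify which mapping, routing, and scheduling algorithms are used.

```python
from itertools import permutations

# Hypothetical task communication graph: (source, destination) -> volume.
tcg = {("t0", "t1"): 8, ("t1", "t2"): 4, ("t0", "t3"): 1, ("t2", "t3"): 6}
tasks = ["t0", "t1", "t2", "t3"]

# Hypothetical 2x2 mesh: each core is identified by its (row, column).
cores = [(0, 0), (0, 1), (1, 0), (1, 1)]

def hops(a, b):
    """Manhattan distance between two mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def comm_cost(mapping):
    """Total communication cost: volume times hop count, summed over edges."""
    return sum(vol * hops(mapping[s], mapping[d]) for (s, d), vol in tcg.items())

def best_mapping():
    """Exhaustively try every one-to-one task-to-core assignment."""
    best_cost, best_map = None, None
    for perm in permutations(cores):
        mapping = dict(zip(tasks, perm))
        cost = comm_cost(mapping)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, mapping
    return best_cost, best_map

naive = dict(zip(tasks, cores))   # tasks placed in listing order
naive_cost = comm_cost(naive)
opt_cost, opt_map = best_mapping()
print("naive cost:", naive_cost)
print("best cost: ", opt_cost, opt_map)
```

Exhaustive search is only feasible for toy sizes like this one; for realistic task counts a heuristic would be needed, which is where dedicated mapping algorithms come in.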
JADE Heterogeneous Multiprocessor Simulation Environment
• JADE (Joint Application/Architecture Design Exploration)
  • Heterogeneous system designs
  • Early design space exploration
  • Systematic system evaluation
• Highlights
  • Statistical, recorded and synthetic application models
  • Network-on-chip and off-chip networks
  • Optical and electrical interconnects
  • Memory subsystem
  • Built-in power analysis
[Figure: JADE block diagram with blocks Hardware Architecture (processor architecture, memory hierarchy, coherence protocol); Network Architecture (optical, electrical); Architecture Template and Energy Library (processor library, optical and electrical network library, memory and cache coherence library); COSMIC Benchmark (recorded, statistical, and synthetic application models); JADE (memory, network, processor, peripherals); MRS (mapping, routing, scheduling) algorithms (task mapping and scheduling, communication traffic routing plan, memory space mapping); Output (memory access trace, system behavior, performance analysis, energy analysis)]
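One common way to implement event-based power analysis is to multiply simulated event counts by per-event energies from a library. The Python sketch below illustrates that general idea only; the event names, the `total_energy_pj` helper, and every energy value are placeholders and do not come from JADE's energy library.

```python
# Placeholder per-event energies in picojoules (NOT values from JADE).
ENERGY_PJ = {
    "core_op": 1.0,
    "l1_access": 2.0,
    "l2_access": 10.0,
    "noc_hop": 5.0,
    "dram_access": 100.0,
}

def total_energy_pj(event_counts, energy_pj=ENERGY_PJ):
    """Sum count * energy-per-event over all recorded event types."""
    unknown = set(event_counts) - set(energy_pj)
    if unknown:
        raise KeyError(f"no energy entry for events: {sorted(unknown)}")
    return sum(count * energy_pj[event] for event, count in event_counts.items())

# Hypothetical event counts, as a simulator might record them.
counts = {
    "core_op": 1000,
    "l1_access": 300,
    "l2_access": 40,
    "noc_hop": 80,
    "dram_access": 5,
}

energy = total_energy_pj(counts)
print(f"total energy = {energy} pJ")
```

Keeping the energy table separate from the accounting function mirrors the idea of a swappable library: a different technology or interconnect would change only the table.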
COSMIC Heterogeneous Multiprocessor Benchmark

Application              Description
Machine Learning - FMP   Financial market prediction using machine learning
Machine Learning - ALIP  Machine-learning-based image indexing
Molecular Dynamics       Simulating molecular dynamics when molecules hit surfaces of solid atoms
Ray Tracing              3D scene rendering
Ultrasound               Medical diagnostics using 2D/3D ultrasound imaging
Fast Fourier Transform   Fast Fourier Transform with complex number inputs
LDPC Encoder             Low-density parity-check code encoder
TURBO Decoder            Turbo code decoder
Reed-Solomon             Reed-Solomon code encoder and decoder

• Collaborating with application experts
• More applications are under development
Exploration Cases
• I2CON inter/intra-chip optical network
• SUOR optical NoC
• Electrical mesh-based NoC
• Memory hierarchy
  • Private L1 caches
  • Shared L2 cache (16 banks)
  • 16 memory controllers
• Processor core
  • ARM-v7a
  • 7 nm, 1 GHz, 0.6 V
[Figure: chip layout with clusters of cores, each attached to an Optical Network Interface (ONI), connected by electrical wires and optical waveguides, with memory controllers around the edge]
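For the electrical mesh-based NoC case, a short Python sketch of dimension-ordered (XY) routing shows how a path and its hop count can be derived. XY routing and the 4x4 mesh size are assumptions made for illustration; the slide does not state the routing algorithm or the mesh dimensions.

```python
def xy_route(src, dst):
    """Return the list of (x, y) nodes visited by XY routing from src to dst.

    The packet first travels along the X dimension until the column matches,
    then along the Y dimension until it reaches the destination.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

MESH_SIZE = 4  # hypothetical 4x4 mesh (16 nodes)

path = xy_route((0, 0), (2, 1))
print("path:", path)
print("hops:", len(path) - 1)

# Longest XY route in the assumed mesh: corner to opposite corner.
worst = xy_route((0, 0), (MESH_SIZE - 1, MESH_SIZE - 1))
print("worst-case hops:", len(worst) - 1)
```

Because every packet resolves X before Y, the hop count always equals the Manhattan distance between source and destination.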
Performance and Scalability
Energy Efficiency and Scalability