Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm (PowerPoint Presentation)

SLIDE 1

Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm

Tutu Ajayi2, Khalid Al-Hawaj1, Aporva Amarnath2, Steve Dai1, Scott Davidson4, Paul Gao4, Gai Liu1, Anuj Rao4, Austin Rovinski2, Ningxiao Sun4, Christopher Torng1, Luis Vega4, Bandhav Veluri4, Shaolin Xie4, Chun Zhao4, Ritchie Zhao1, Christopher Batten1, Ronald G. Dreslinski2, Rajesh K. Gupta3, Michael B. Taylor4, Zhiru Zhang1

1 Cornell University, 2 University of Michigan, 3 University of California, San Diego, 4 Bespoke Silicon Group (U. Washington / UC San Diego)

MICRO-50 October 14, 2017

SLIDE 2

Computer Architecture Research Prototyping

Prototyping is important to complement the results of simulation-based research. There are many benefits to prototyping:

  • Validating assumptions
  • Validating design methodologies
  • Measuring real system-level performance and energy efficiency
  • Creating platforms for software research
  • Building credibility with industry
  • Building intuition for physical design
  • Pedagogical benefits
  • Building real things is fun!

Celerity :: Introduction

SLIDE 3

The Continuing Need for Building Prototypes

The rise of the dark silicon era [1], in which an increasing fraction of silicon must remain unpowered, is driving a trend towards accelerator-centric architectures. Specialization research requires:

  • New simulation-based evaluation methodologies based on accelerators [2]
  • New prototyping methodologies for rapidly building accelerator-centric prototypes

Unfortunately, building research prototypes can be tremendously challenging.

[Figure: The Four Horsemen of the Coming Dark Silicon Apocalypse [1]: “Shrink”, “Dim”, “Specialize”, “Magic”]

[1] M. Taylor, “Is Dark Silicon Useful? Harnessing the Four Horsemen of the Coming Dark Silicon Apocalypse,” Design Automation Conference, 2012.
[2] Y. Shao, et al., “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures,” ISCA, 2014.

SLIDE 4

Prototyping with the RISC-V Software/Hardware Ecosystem

Software Toolchain

  • A complete, off-the-shelf software stack (e.g., binutils, GCC, newlib/glibc, Linux kernel & distros) for both embedded and general-purpose targets

Architecture

  • RISC-V ISA specification designed to be both modular and extensible, with a small base ISA and optional extensions

Microarchitecture

  • On-chip network specifications and implementations (NASTI, TileLink)
  • RISC-V processor implementations for both in-order (Berkeley Rocket) and out-of-order (Berkeley BOOM) cores

Physical Design

  • Previous spins of chips for reference

Testing

  • Standard core verification test suites + turn-key FPGA gateware

[Figure: hardware/software abstraction stack, from Application and Algorithm down through Programming Language, Compilers, Operating System, ISA, Microarchitecture, Register-Transfer Level, Gate-Level, Circuits, Technology, and Devices]

SLIDE 5

The Celerity System-on-Chip

Celerity is an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems. It was funded by the DARPA CRAFT program (“Circuit Realization At Faster Timescales”), whose goal was to develop new methodologies to design chips more quickly.

We leveraged the RISC-V software/hardware ecosystem as we built Celerity, and we believe it was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months.

[Diagram: the three tiers (General-Purpose, Massively Parallel, Specialization): five Rocket cores with NASTI/RoCC interfaces and I/D caches, RISC-V Vanilla-5 manycore tiles with I Mem, D Mem, crossbar, and NoC router, all connected to the BaseJump FSB and Motherboard]

SLIDE 6

Celerity: Chip Overview

  • TSMC 16nm FFC
  • 25 mm2 die area (5mm x 5mm)
  • ~385 million transistors
  • 511 RISC-V cores
  • 5 Linux-capable RV64G Berkeley Rocket cores
  • 496-core RV32IM mesh tiled array “manycore”
  • 10-core RV32IM mesh tiled array (low voltage)
  • Binarized Neural Network Specialized Accelerator
  • On-chip synthesizable PLLs and DC/DC LDO
  • Developed in-house
  • 3 clock domains
  • 400 MHz – DDR I/O
  • 625 MHz – Rocket cores + specialized accelerator
  • 1.05 GHz – Manycore array
  • 672-pin flip-chip BGA package
  • 9 months from PDK access to tape-out

http://www.opencelerity.org

SLIDE 7

Agenda

  • Introduction
  • For each Tier:
  • What did we build?
  • How did we build it?
  • RISC-V Ecosystem Successes
  • RISC-V Ecosystem Challenges
  • Conclusion

SLIDE 8

Celerity: General-Purpose Tier

Celerity :: General-Purpose Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V

SLIDE 9

General-Purpose Tier Overview

  • 5 Berkeley Rocket Cores (RV64G)
  • Workload
  • General-purpose compute
  • Operating system (e.g., Linux & TCP/IP stack)
  • Interrupt and exception handling
  • Program dispatch and control flow
  • Interface
  • Interface to off-chip I/O and other peripherals
  • 4 cores connect to the manycore array
  • 1 core interfaces with the BNN
  • Memory
  • Each core executes independently within its own address space
  • Memory management for all tiers

SLIDE 10

Berkeley Rocket Cores

  • 5 Berkeley Rocket Cores

(https://github.com/freechipsproject/rocket-chip)

  • Generated from Chisel
  • RV64G ISA
  • 5-stage, in-order, scalar processor
  • Double-precision floating point
  • I-Cache: 16KB 4-way assoc.
  • D-Cache: 16KB 4-way assoc.
  • Physical Implementation
  • 625 MHz (Critical path in FSB)
  • 0.19 mm2 per core

http://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/

SLIDE 11

Design Iterations

  • 1. Loopback: baseline design to validate the FSB and Northbridge (loopback FIFO)
  • 2. Alpaca: implemented the NASTI bridge and connected the Rocket core
  • 3. Bison: implemented an accelerator connected through a blackboxed RoCC interface
  • 4. Coyote: modularized the RoCC interface to the accelerator

[Diagram: each iteration connects to the BaseJump Motherboard over the BaseJump FSB, growing from a loopback FIFO to Rocket cores with I/D caches and a RoCC accelerator]
SLIDE 12


Off-Chip Interface and Northbridge

  • Open-source BaseJump IP Library: http://bjump.org
  • Front Side Bus (FSB): BaseJump Communication Link, a high-speed (DDR) source-synchronous communication interface
  • Packaging: modified BaseJump BGA package and I/O ring
  • Validation: BaseJump Super Trouble PCB (daughter card) and BaseJump Motherboard (ZedBoard)

[Diagram: the five Rocket cores connect through the BaseJump FSB and FPGA bridge to a Northbridge providing a DRAM controller, Ethernet, SSD, L2 cache, JTAG, and clocks]
SLIDE 13

RISC-V Successes

  • Berkeley Rocket Cores
  • Very quickly generated validated designs
  • Vibrant ecosystem to provide feedback and support
  • Test and Validation infrastructure
  • Software and Toolchain support
  • Flexible memory system and peripheral I/O support
  • Easy integration with BaseJump IP Library
  • Balances extensibility and software support

SLIDE 14

RISC-V Lessons Learned

  • Component stability, compatibility, and versioning
  • Chisel adoption
  • RTL simulation issues
  • Deciphering Chisel-generated RTL
  • Register initialization and X-pessimism
SLIDE 15


Celerity: Massively Parallel Tier


Developed by Taylor’s Bespoke Silicon Group @ UW

Celerity :: Massively Parallel Tier :: What is it ? • How did we build it ? • Successes with RISC-V • Challenges with RISC-V

http://bjump.org/manycore

SLIDE 16

The tiled architecture

The Vanilla core: simple but efficient, running C code without any toolchain modification

  • ISA: RV32IM
  • Pipeline: 5-stage, fully forwarded, in-order, single issue
  • Scratchpad memory: 4KB for I Mem, 4KB for D Mem
  • Second tape-out of this tiled architecture (10-core)

[Diagram: 496 RISC-V cores in a tiled mesh; each tile contains a RISC-V core, NoC router, memory crossbar, IMEM, and DMEM]
SLIDE 17

Mesh Network

  • Link protocol: forward/reverse paths, parameterizable address/data bits
  • Credit-based: each packet is acknowledged with a response
  • Flow control: the endpoint controls the number of outstanding packets
  • Router: simple XY-dimension routing, buffered
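The credit-based flow control above can be sketched in software. This is a hypothetical analogue for illustration only (the `CreditLink` name and its methods are ours, not the Celerity RTL): a sender holds one credit per buffer slot at the receiving endpoint, may only inject a packet while credits remain, and regains a credit when the endpoint acknowledges a packet on the reverse path.

```cpp
#include <cassert>
#include <queue>

// Hypothetical software analogue of a credit-based link: credits track free
// buffer slots at the receiving endpoint, so the sender can never overflow it.
struct CreditLink {
    int credits;                // free buffer slots at the endpoint
    std::queue<int> in_flight;  // packets awaiting acknowledgement

    explicit CreditLink(int buffer_slots) : credits(buffer_slots) {}

    bool try_send(int pkt) {
        if (credits == 0) return false;  // endpoint buffer full: sender stalls
        --credits;
        in_flight.push(pkt);
        return true;
    }

    void on_ack() {  // reverse-path response returns one credit
        assert(!in_flight.empty());
        in_flight.pop();
        ++credits;
    }
};
```

Because back-pressure is expressed purely through the credit count, the sender needs no global knowledge of the network to avoid overrunning the endpoint's buffers.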

[Diagram: buffered router tile and link protocol, showing forward packet/forward response and reverse packet/reverse response paths]
SLIDE 18

Manycore Links to the General-Purpose and Specialization Tiers

Cross-clock-domain interface:

  • To the General-Purpose Tier: converts RoCC to the link protocol; supports configuring DMA, writing to and resetting the manycore, etc.
  • To the Specialization Tier: aggregates link interfaces to increase bandwidth and throughput

[Diagram: asynchronous FIFOs and link_to_rocc adapters cross between the General-Purpose, Massively Parallel, and Specialization Tier clock domains, connecting the Rocket cores' RoCC ports (with DMA and L1 D-cache access) to the manycore's routers]
SLIDE 19

Programming Model

Producer-consumer programming model: extended instructions for efficient inter-tile synchronization

  • Load Reserved (lr.w): load a value and set the reservation address
  • Load-on-broken-reservation (lr.lbr): stall if the reserved address has not yet been written by other cores
  • Consumer: wait on <address, value>
  • Benefits: no polling, no interrupts, fast response; a stalled pipeline can save power
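The wait-on-<address, value> scheme above can be pictured with a software analogue (illustrative only; the `Mailbox` name and its methods are ours, not the Celerity ISA): the consumer blocks until the watched word is written, instead of busy-polling. On Celerity the stall happens in the hardware pipeline via lr.w / lr.lbr; here a mutex and condition variable play that role.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Software analogue of waiting on a reserved DMEM word: consume() sleeps
// until produce() writes a nonzero value, mirroring a stalled pipeline that
// wakes when the reservation is broken by a remote store.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    int word = 0;  // the "reserved" word; 0 means "not yet produced"

    int consume() {  // stall until the reservation is broken
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return word != 0; });
        return word;
    }

    void produce(int value) {  // remote store, then wake the consumer
        {
            std::lock_guard<std::mutex> lk(m);
            word = value;
        }
        cv.notify_one();
    }
};
```

In a two-core setting the consumer would call `consume()` before the producer's `produce()`; the predicate form of `cv.wait` makes either ordering safe, just as the hardware reservation does.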

[Diagram: producer-consumer pipeline with input, split, join, feedback, and output stages; Core A performs a remote store over the NoC into a reserved address in Core B's DMEM, waking Core B's stalled pipeline]
SLIDE 20

Thread Density Comparison

[1] J. Balkind, et al. “OpenPiton : An Open Source Manycore Research Framework,” in the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2016. [2] R. Balasubramanian, et al. "Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU," in ACM Transactions on Architecture and Code Optimization (TACO). 12.2 (2015): 21.

Configuration / Normalized Area (32nm) / Area Ratio:

  • Celerity Tile @ 16nm (D-MEM = 4KB, I-MEM = 4KB): 0.024 * (32/16)^2 = 0.096 mm2 (1x)
  • OpenPiton Tile @ 32nm (L1 D-Cache = 8KB, L1 I-Cache = 16KB, L1.5/L2 Cache = 72KB): 1.17 mm2 [1] (12x)
  • Raw Tile @ 180nm (L1 D-Cache = 32KB, L1 I-SRAM = 96KB): 16.0 * (32/180)^2 = 0.506 mm2 (5.25x)
  • MIAOW GPU Compute Unit Lane @ 32nm (VRF = 256KB, SRF = 2KB): 15.0 / 16 = 0.938 mm2 [2] (9.75x)
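The "Normalized Area (32nm)" figures scale each tile's reported area to a common 32nm node by the square of the feature-size ratio. A sketch of that arithmetic (the function name is ours, and ideal quadratic area scaling is the same assumption the comparison itself makes):

```cpp
#include <cassert>
#include <cmath>

// Scale an area reported at one process node to a target node, assuming
// ideal quadratic (feature-size squared) area scaling.
double normalize_area_mm2(double area_mm2, double node_nm, double target_nm = 32.0) {
    double s = target_nm / node_nm;
    return area_mm2 * s * s;
}
```

For example, the 0.024 mm2 Celerity tile at 16nm normalizes to 0.024 * (32/16)^2 = 0.096 mm2, matching the first row above.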

  • Timing: 1.05 GHz @ 16 nm
  • Area: 0.024 mm2 @ 16 nm
  • Si utilization ratio: 90%

[Chart: normalized physical threads (ALU ops) per area]
SLIDE 21

How did we build the massively parallel tier?

[Diagram: how the massively parallel tier was built. Inputs: the BaseJump STL library (data flow, NoC, arithmetic, ...), the RISC-V toolchain (assembly test suite, modified runtime, C compiler), and in-house design and testing. One tile (RISC-V Vanilla-5 core, I Mem, D Mem, RF, crossbar, NoC router) is hardened as a hard macro and arrayed in the floorplan via a hierarchical flow]
SLIDE 22

RISC-V Ecosystem Successes

  • Modular ISA
  • Flexible for both complex cores (e.g., Rocket) and simple cores (e.g., Vanilla)
  • Extensible RoCC interface
  • 4 customizable instructions: we used one
  • Comprehensive assembly test suite (434 test cases)
  • Off-the-shelf toolchain
SLIDE 23

Building up the RISC-V Ecosystem

With Celerity, we provide an efficient RV32IM implementation in SystemVerilog, and we consolidated information about RoCC that was scattered across the internet:

  • Efficient open-source core
  • Based on SystemVerilog
  • Silicon-proven
  • Public RoCC document v2 [bjump.org/rocc_doc]
  • Exported RoCC interface at the top level
SLIDE 24

Celerity: Specialization Tier


Celerity :: Specialization Tier :: What is it ? • How did we build it ? • Successes with RISC-V • Challenges with RISC-V

SLIDE 25

Case Study: Mapping Flexible Image Recognition to a Tiered Accelerator Fabric

Three steps to map applications to the tiered accelerator fabric:

Step 1. Implement the algorithm using the general-purpose tier
Step 2. Accelerate the algorithm using either the massively parallel tier OR the specialization tier
Step 3. Improve performance by cooperatively using both the specialization AND the massively parallel tiers

[Figure: CNN pipeline (convolution, pooling, convolution, pooling, fully-connected) classifying an image as bird (0.02), boat (0.94), cat (0.04), dog (0.01), with stages mapped across the General-Purpose, Massively Parallel, and Specialization Tiers]
SLIDE 26

Step 1: Algorithm to Application – Binarized Neural Networks

  • Training usually uses floating point, while inference usually uses lower-precision weights and activations (often 8-bit or lower) to reduce implementation complexity
  • Rastegari et al. [3] and Courbariaux et al. [4] have recently shown that single-bit precision weights and activations can achieve an accuracy of 89.8% on CIFAR-10
  • The performance target requires ultra-low latency (batch size of one) and high throughput (60 classifications/second)

[3] M. Rastegari, et al., “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” European Conference on Computer Vision, 2016.
[4] M. Courbariaux, et al., “Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” arXiv preprint arXiv:1602.02830, 2016.
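To see why single-bit precision maps so well to hardware: with weights and activations constrained to +1/-1 and packed one per bit, a dot product collapses to XNOR plus a population count. A minimal sketch, not the Celerity RTL; the bit encoding (1 maps to +1, 0 maps to -1) and function name are assumptions for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Binary dot product over n_bits packed lanes: XNOR finds positions where
// activation and weight agree (+1 contribution), everything else is -1.
int binary_dot(uint64_t activations, uint64_t weights, int n_bits) {
    uint64_t mask = (n_bits == 64) ? ~0ULL : ((1ULL << n_bits) - 1);
    uint64_t agree = ~(activations ^ weights) & mask;  // XNOR: matching lanes
    int matches = 0;
    for (uint64_t b = agree; b != 0; b &= b - 1) ++matches;  // popcount
    return 2 * matches - n_bits;  // +1 per match, -1 per mismatch
}
```

One 64-bit XNOR plus a popcount thus replaces 64 multiply-accumulates, which is what makes a batch-size-one, 60-classifications/second target plausible in a small accelerator.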

SLIDE 27

Step 1: Algorithm to Application – Characterizing BNN Execution

  • Using just the general-purpose tier would be 200x slower than the performance target (60 classifications/sec)
  • Binarized convolutional layers consume over 97% of the dynamic instruction count
  • Perfect acceleration of just the binarized convolutional layers is still 5x slower than the performance target
  • Perfect acceleration of all layers using the massively parallel tier could meet the performance target, but with significant energy consumption
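These numbers follow an Amdahl's-law argument: if the binarized convolutions are roughly 97% of the dynamic instructions and are accelerated perfectly, the remaining ~3% caps the whole-application speedup near 33x, well short of the 200x gap to the target. A small sketch of that bound (the function name is ours, and the exact fractions are assumed for illustration):

```cpp
#include <cassert>

// Amdahl's-law upper bound: even with infinite speedup on the accelerated
// fraction f of the work, overall speedup cannot exceed 1 / (1 - f).
double amdahl_speedup_bound(double accelerated_fraction) {
    return 1.0 / (1.0 - accelerated_fraction);
}
```

With f = 0.97 the bound is about 33x, so a 200x-slower baseline remains several times too slow, consistent with the "still 5x slower" observation above (the slide's "over 97%" leaves the exact figure approximate).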

SLIDE 28

Step 2: Application to Accelerator – BNN Specialized Accelerator

1. The accelerator is configured to process a layer through RoCC command messages
2. The memory unit starts streaming the weights into the accelerator and unpacking the binarized weights into the appropriate buffers
3. The binary convolution compute unit processes input activations and weights to produce output activations
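Step 2's unpacking can be sketched as follows. This is a hypothetical illustration, not the accelerator's RTL: the bit encoding (1 maps to +1, 0 maps to -1) and the function name are assumptions, and the real buffer layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Unpack one streamed 64-bit word of binarized weights into +1/-1 values
// ready for the convolution buffers (least-significant bit first).
std::vector<int> unpack_weights(uint64_t packed, int n_bits) {
    std::vector<int> w(n_bits);
    for (int i = 0; i < n_bits; ++i)
        w[i] = ((packed >> i) & 1) ? +1 : -1;
    return w;
}
```

Packing 64 weights per word is what lets the memory unit keep the compute unit fed from a narrow stream.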

SLIDE 29

[Diagram: off-chip I/O feeding the five Rocket cores (AXI, RoCC, I-Cache, D-Cache)]

Step 2: Application to Accelerator – General-Purpose Tier for Weight Storage

  • The BNN specialized accelerator can use one of the Rocket cores' caches to load every layer's weights
SLIDE 30

Step 3: Assisting Accelerators – Massively Parallel Tier for Weight Storage


  • The BNN specialized accelerator can use one of the Rocket cores' caches to load every layer's weights
  • Each core in the massively parallel tier executes a remote-load-store program to orchestrate sending weights to the specialization tier via a hardware FIFO
SLIDE 31

Performance Benefits of Cooperatively Using the Massively Parallel and the Specialization Tiers

General-Purpose Tier: software implementation assuming ideal performance, estimated with an optimistic one instruction per cycle
Specialization Tier: full-system RTL simulation of the BNN specialized accelerator running at a frequency of 625 MHz
Specialization + Massively Parallel Tiers: full-system RTL simulation of the BNN specialized accelerator with the weights streamed from the manycore

  • General-Purpose Tier: 4,024 ms runtime per image; 0.2–0.5 W; 1x perf/power
  • Specialization Tier: 20 ms runtime per image; 0.2–0.5 W; ~200x improvement in perf/power
  • Specialization + Massively Parallel Tiers: 3.3 ms runtime per image; 0.5–2.0 W; ~400x improvement in perf/power
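The perf/power column follows from the runtime and power figures: performance is the reciprocal of runtime per image, so the gain is the runtime improvement divided by the power increase. A sketch of that arithmetic (the function name is ours, and the quoted power ranges make the final figures approximate):

```cpp
#include <cassert>

// Gain in performance-per-watt relative to a baseline: runtime speedup
// divided by the factor by which power grew.
double perf_per_power_gain(double base_ms, double new_ms,
                           double base_watts, double new_watts) {
    return (base_ms / new_ms) / (new_watts / base_watts);
}
```

At comparable power, 4024 ms down to 20 ms is ~200x; streaming weights from the manycore cuts runtime to 3.3 ms (~1200x) at up to ~4x the power, landing in the ~300-400x perf/power range quoted above.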

SLIDE 32

Design Methodology

[Flow: SystemC + constraints → Stratus HLS → RTL → PyMTL wrappers & adapters → final RTL]

void bnn::dma_req() {
  while ( 1 ) {
    DmaMsg msg = dma_req.get();
    for ( int i = 0; i < msg.len; i++ ) {
      HLS_PIPELINE_LOOP( HARD_STALL, 1 );
      int    req_type = 0;
      word_t data     = 0;
      addr_t addr     = msg.base + i*8;
      if ( msg.type == DMA_TYPE_WRITE ) {
        data     = msg.data;
        req_type = MemReqMsg::WRITE;
      } else {
        req_type = MemReqMsg::READ;
      }
      memreq.put( MemReqMsg( req_type, addr, data ) );
    }
    dma_resp.put( DMA_REQ_DONE );
  }
}

SLIDE 33

Design Methodology

[Flow: SystemC + constraints → Stratus HLS → RTL → PyMTL wrappers & adapters (including RoCC interfaces) → final RTL, hardened into a hard macro by the ASIC flow using constraints files]
SLIDE 34

RISC-V Ecosystem Successes and Challenges

Successes

  • The RoCC command and memory interfaces were both significant successes. We connected the accelerator with no changes to the RV64G core, just as we did for the manycore array in the massively parallel tier.


Challenges

  • Small challenge in the RoCC accelerator interface at the specific commit we chose to use
  • The memory management unit in RV64G used only physical addresses
  • We did a small workaround to give us virtual addresses as well
  • This challenge has already been fixed upstream
SLIDE 35

The Celerity System-on-Chip

Celerity is an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems. Celerity's goal was to develop new methodologies to design chips more quickly. We believe the RISC-V software/hardware ecosystem was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months.


We thank the many contributors to the open-source RISC-V software and hardware ecosystem with special thanks to U.C. Berkeley for forming the RISC-V ecosystem

Celerity :: Conclusion

Acknowledgements: DARPA, under the CRAFT program. Special thanks to Dr. Linton Salmon for program support and coordination.

http://www.opencelerity.org