
Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm



  1. Experiences Using the RISC-V Ecosystem to Design an Accelerator-Centric SoC in TSMC 16nm
     Tutu Ajayi², Khalid Al-Hawaj¹, Aporva Amarnath², Steve Dai¹, Scott Davidson⁴, Paul Gao⁴, Gai Liu¹, Anuj Rao⁴, Austin Rovinski², Ningxiao Sun⁴, Christopher Torng¹, Luis Vega⁴, Bandhav Veluri⁴, Shaolin Xie⁴, Chun Zhao⁴, Ritchie Zhao¹, Christopher Batten¹, Ronald G. Dreslinski², Rajesh K. Gupta³, Michael B. Taylor⁴, Zhiru Zhang¹
     ¹ Cornell University
     ² University of Michigan
     ³ University of California, San Diego
     ⁴ Bespoke Silicon Group (U. Washington / UC San Diego)
     MICRO-50, October 14, 2017

  2. Computer Architecture Research Prototyping
     Prototyping is important to complement the results of simulation-based research.
     Many benefits to prototyping:
     • Validating assumptions
     • Validating design methodologies
     • Measuring real system-level performance and energy efficiency
     • Creating platforms for software research
     • Building credibility with industry
     • Building intuition for physical design
     • Pedagogical benefits
     • Building real things is fun!
     Celerity :: Introduction

  3. The Continuing Need for Building Prototypes
     The rise of the dark silicon era [1], in which an increasing fraction of silicon must remain unpowered, is motivating an increasing trend towards accelerator-centric architectures.
     Specialization research requires:
     • New simulation-based evaluation methodologies based on accelerators [2]
     • New prototyping methodologies for rapidly building accelerator-centric prototypes
     Unfortunately, building research prototypes can be tremendously challenging.
     [Figure: The Four Horsemen of the Coming Dark Silicon Apocalypse: “Shrink”, “Dim”, “Specialize”, “Magic”]
     [1] M. Taylor. “Is Dark Silicon Useful? Harnessing the Four Horsemen of the Coming Dark Silicon Apocalypse,” Design Automation Conference, 2012.
     [2] Y. Shao, et al. “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures,” ISCA, 2014.

  4. Prototyping with the RISC-V Software/Hardware Ecosystem
     Software Toolchain
     • A complete, off-the-shelf software stack (e.g., binutils, GCC, newlib/glibc, Linux kernel & distros) for both embedded and general-purpose
     Architecture
     • RISC-V ISA specification designed to be both modular and extensible, with a small base ISA and optional extensions
     Microarchitecture
     • On-chip network specifications and implementations (NASTI, TileLink)
     • RISC-V processor implementations for both in-order (Berkeley Rocket) and out-of-order (Berkeley BOOM) cores
     Physical Design
     • Previous spins of chips for reference
     Testing
     • Standard core verification test suites + turn-key FPGA gateware
     [Figure: computing stack, top to bottom: Application, Algorithm, Programming Language, Operating System, Compilers, Instruction Set Architecture, Microarchitecture, Register-Transfer Level, Gate-Level, Circuits, Devices, Technology]
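The "small base ISA and optional extensions" structure shows up directly in the ISA strings used later in the deck (RV64G, RV32IM). The following is a minimal sketch, not taken from the slides, that splits such a string into its register width and extension letters. The function name `parse_isa` is hypothetical, and the sketch handles only single-letter extensions, with `G` treated as shorthand for the IMAFD bundle.

```python
import re

# "G" is shorthand for the general-purpose bundle I, M, A, F, D.
G_EXPANSION = "IMAFD"


def parse_isa(isa: str) -> tuple[int, list[str]]:
    """Split an ISA string such as 'RV64G' into (XLEN, extension letters).

    Hypothetical helper for illustration; only single-letter extensions
    are handled.
    """
    match = re.fullmatch(r"RV(32|64|128)([A-Z]+)", isa.upper())
    if match is None:
        raise ValueError(f"not a recognised RISC-V ISA string: {isa!r}")
    xlen = int(match.group(1))
    letters = match.group(2).replace("G", G_EXPANSION)
    # Drop duplicate letters while preserving their order.
    extensions = list(dict.fromkeys(letters))
    return xlen, extensions


if __name__ == "__main__":
    for name in ("RV64G", "RV32IM"):
        print(name, parse_isa(name))
```

Under these assumptions, the Rocket cores (RV64G) carry the integer, multiply/divide, atomic, and single- and double-precision floating-point extensions, while the manycore tiles (RV32IM) carry only integer and multiply/divide.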

  5. The Celerity System-on-Chip
     Celerity, an accelerator-centric SoC with a tiered accelerator fabric that targets highly performant and energy-efficient embedded systems.
     Funded by the DARPA CRAFT program, “Circuit Realization At Faster Timescales”. The goal was to develop new methodologies to design chips more quickly.
     We leveraged the RISC-V software/hardware ecosystem as we built Celerity, and we believe it was instrumental in enabling a team of 20 graduate students to tape out a complex SoC in only 9 months.
     [Figure: Celerity block diagram. BaseJump FSB and Motherboard; five RISC-V Rocket cores, each with NASTI, RoCC, D-Cache, and I-Cache; a manycore tile with a RISC-V Vanilla-5 core, NoC router, I Mem, D Mem, and XBAR; three tiers labeled General-Purpose Tier, Massively Parallel Tier, and Specialization Tier]

  6. Celerity: Chip Overview
     http://www.opencelerity.org
     • TSMC 16nm FFC
     • 25 mm² die area (5 mm × 5 mm)
     • ~385 million transistors
     • 511 RISC-V cores
       • 5 Linux-capable RV64G Berkeley Rocket cores
       • 496-core RV32IM mesh tiled array “manycore”
       • 10-core RV32IM mesh tiled array (low voltage)
     • Binarized Neural Network Specialized Accelerator
     • On-chip synthesizable PLLs and DC/DC LDO
       • Developed in-house
     • 3 clock domains
       • 400 MHz – DDR I/O
       • 625 MHz – Rocket core + Specialized accelerator
       • 1.05 GHz – Manycore array
     • 672-pin flip chip BGA package
     • 9 months from PDK access to tape-out
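As a quick consistency check on the figures above, the sketch below sums the three groups of cores and converts each clock frequency into a clock period. The dictionary keys are illustrative labels rather than names from the slides; the numbers are the ones quoted on the slide.

```python
# Core counts as listed on the chip overview slide (keys are illustrative).
cores = {
    "rocket_rv64g": 5,
    "manycore_rv32im": 496,
    "low_voltage_rv32im": 10,
}
total_cores = sum(cores.values())

# Clock domains in MHz, converted to periods in nanoseconds (1000 / MHz).
clock_mhz = {
    "ddr_io": 400.0,
    "rocket_and_accelerator": 625.0,
    "manycore": 1050.0,
}
period_ns = {name: 1000.0 / mhz for name, mhz in clock_mhz.items()}

# Die area from the quoted 5 mm x 5 mm dimensions.
die_area_mm2 = 5 * 5

if __name__ == "__main__":
    print("total cores:", total_cores)
    print("die area (mm^2):", die_area_mm2)
    for name, ns in period_ns.items():
        print(f"{name}: {ns:.3f} ns")
```

The three groups add up to the 511 cores stated on the slide, and the 625 MHz Rocket domain corresponds to a 1.6 ns cycle.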

  7. Agenda
     • Introduction
     • For each Tier:
       • What did we build?
       • How did we build it?
       • RISC-V Ecosystem Successes
       • RISC-V Ecosystem Challenges
     • Conclusion
     [Figure: Celerity block diagram with the General-Purpose, Massively Parallel, and Specialization Tiers]

  8. Celerity: General-Purpose Tier
     [Figure: Celerity block diagram with the General-Purpose Tier highlighted, alongside the Massively Parallel and Specialization Tiers]
     Celerity :: General-Purpose Tier :: What is it? • How did we build it? • Successes with RISC-V • Challenges with RISC-V

  9. General-Purpose Tier Overview
     • 5 Berkeley Rocket Cores (RV64G)
     • Workload
       • General-purpose compute
       • Operating system (e.g. Linux & TCP/IP stack)
       • Interrupt and exception handling
       • Program dispatch and control flow
     • Interface
       • Interface to off-chip I/O and other peripherals
       • 4 cores connect to the manycore array
       • 1 core interfaces with the BNN
     • Memory
       • Each core executes independently within its own address space
       • Memory management for all tiers
     [Figure: five RISC-V Rocket cores (each with NASTI, RoCC, D-Cache, I-Cache) connected to the BNN, the manycore, the BaseJump FSB, and the BaseJump Motherboard]

  10. Berkeley Rocket Cores
     • 5 Berkeley Rocket Cores (https://github.com/freechipsproject/rocket-chip)
       • Generated from Chisel
       • RV64G ISA
       • 5-stage, in-order, scalar processor
       • Double-precision floating point
       • I-Cache: 16KB 4-way assoc.
       • D-Cache: 16KB 4-way assoc.
     • Physical Implementation
       • 625 MHz (critical path in FSB)
       • 0.19 mm² per core
     [Figure: Rocket core pipeline diagram, from http://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/]
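To make the "16KB 4-way assoc." figures concrete, the sketch below derives the number of sets and the address bit split for a set-associative cache. The slide does not state the cache line size, so the 64-byte line used here is an assumption, and `cache_geometry` is a hypothetical helper rather than anything from the Rocket code base.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    All three parameters must make the set count a power of two.
    """
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder != 0 or sets <= 0:
        raise ValueError("size must be a positive multiple of ways * line_bytes")
    for value in (line_bytes, sets):
        if value & (value - 1) != 0:
            raise ValueError("line size and set count must be powers of two")
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    return sets, offset_bits, index_bits


if __name__ == "__main__":
    # 16 KB, 4-way, with an ASSUMED 64-byte line (not stated on the slide).
    print(cache_geometry(16 * 1024, 4, 64))
```

With the assumed 64-byte line, each 16 KB 4-way cache would have 64 sets, using 6 offset bits and 6 index bits of the address. A different line size changes these numbers accordingly.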

  11. Design Iterations
     1. Loopback: Baseline design to validate FSB and Northbridge
     2. Alpaca: Implemented NASTI bridge and connected Rocket core
     3. Bison: Implemented accelerator connected through blackboxed RoCC
     4. Coyote: Modularized RoCC interface to accelerator
     [Figure: block diagram for each iteration, showing the BaseJump Motherboard and BaseJump FSB connected to, in turn, a loopback FIFO; a RISC-V Rocket core (NASTI, D-Cache, I-Cache); a Rocket core with a RoCC accelerator; and multiple Rocket cores with RoCC accelerators]

  12. Off-Chip Interface and Northbridge
     • Open-source BaseJump IP Library
       • http://bjump.org
     • Front-side bus
       • BaseJump Communication Link
       • High-speed (DDR) source-synchronous communication interface
     • Packaging
       • Modified BaseJump BGA package and I/O ring
     • Validation
       • BaseJump Super Trouble PCB (daughter card)
       • BaseJump Motherboard (ZedBoard)
     [Figure: the Celerity SoC (five RISC-V Rocket cores with NASTI, RoCC, D-Cache, and I-Cache; BaseJump FSB & FPGA Bridge) connected to the BaseJump Motherboard (FPGA Bridge, L2 $, DRAM Controller, Ethernet, SSD, Clocks, JTAG)]

  13. RISC-V Successes
     • Berkeley Rocket Cores
     • Very quickly generated validated designs
     • Vibrant ecosystem to provide feedback and support
     • Test and validation infrastructure
     • Software and toolchain support
     • Flexible memory system and peripheral I/O support
     • Easy integration with BaseJump IP Library
     • Balances extensibility and software support

  14. RISC-V Lessons Learned
     • Component stability, compatibility and versioning
     • Chisel adoption
     • RTL simulation issues
     • Deciphering Chisel-generated RTL
     • Register initialization and X-pessimism
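The slide names X-pessimism without describing the specific problem the team hit, so the following is only a generic illustration of the term. It models three-valued logic (0, 1, unknown X) in Python and builds a 2:1 mux from AND/OR/NOT gates. When an uninitialized register drives the select with X and both data inputs are 1, the gate-by-gate evaluation reports X even though real hardware would output 1 for either select value. All function names are hypothetical.

```python
X = "x"  # unknown value, as produced by an uninitialized register


def not3(a):
    """Three-valued NOT."""
    return X if a == X else 1 - a


def and3(a, b):
    """Three-valued AND: a known 0 dominates, otherwise X propagates."""
    if a == 0 or b == 0:
        return 0
    if a == X or b == X:
        return X
    return 1


def or3(a, b):
    """Three-valued OR: a known 1 dominates, otherwise X propagates."""
    if a == 1 or b == 1:
        return 1
    if a == X or b == X:
        return X
    return 0


def gate_mux(sel, a, b):
    """Mux evaluated gate by gate: (sel AND a) OR (NOT sel AND b)."""
    return or3(and3(sel, a), and3(not3(sel), b))


def real_mux_outcomes(sel, a, b):
    """Set of outputs real hardware could produce when sel settles to 0 or 1."""
    candidates = (0, 1) if sel == X else (sel,)
    return {a if s == 1 else b for s in candidates}


if __name__ == "__main__":
    print("simulated:", gate_mux(X, 1, 1))          # simulator view
    print("hardware :", real_mux_outcomes(X, 1, 1))  # possible real outputs
```

In this sketch the simulated result is X while the only possible hardware result is 1, which is the kind of spurious unknown that explicit register initialization is meant to avoid.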
