Configurable TLB Hierarchy for the Rocket Chip Generator

  1. Enabling Virtual Memory Research on RISC-V with a Configurable TLB Hierarchy for the Rocket Chip Generator
     Nikos Ch. Papadopoulos, Vasileios Karakostas, Konstantinos Nikas, Nectarios Koziris, Dionisios N. Pnevmatikatos
     ncpapad@cslab.ece.ntua.gr
     National Technical University of Athens, School of Electrical and Computer Engineering, Computing Systems Laboratory

  2. Motivation
     ● Explore RISC-V ISA and Rocket Chip Generator
     ● Vanilla L1 TLB is fully-associative
       ○ May impact the critical path
       ○ #entries vs resource usage tradeoff
     ● Vanilla L2 TLB is direct-mapped
       ○ May impact the miss rate
     ● We want to lift these restrictions and enable:
       ○ Configurable L1 and L2 TLBs
       ○ From direct-mapped up to fully-associative structures
     CARRV 2020 | May 29, 2020 | Virtual Workshop

  3. Outline
     ● Background
       ○ Rocket Chip Generator
       ○ RISC-V Virtual Memory support
     ● Configurable TLB Hierarchy features
     ● Methodology
       ○ Hardware & Software Development Flow
     ● Performance and Area Results
     ● Related & Future work
     ● Conclusions

  4. Rocket Chip Generator
     ● SoC generator that produces synthesizable RTL
       ○ Written in Chisel
       ○ Rocket core or BOOM (Berkeley Out-of-Order Machine)
       ○ Parameterized tiles, caches, accelerators, etc.
     ● Library of processor parts and utilities
       ○ Replacement policies
       ○ Branch predictors
       ○ ...and many more

  5. RV64-Sv39 Paging Scheme
     ● 39-bit (512GB) virtual address space
     ● 3-level page table
     ● Supports 4KB base pages
       ○ But also 2MB, 1GB superpages
     ● 27-bit VPN → 44-bit PPN
       ○ 12-bit page offset for 4KB pages
     ● SATP register
       ○ Stores the root of the page table
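
The Sv39 layout above can be sketched in a few lines of Python. This is only an illustrative software model, not code from the presentation: the function names `split_sv39` and `page_size` are hypothetical, and the sketch assumes the standard Sv39 split of the 27-bit VPN into three 9-bit fields. It also ignores the requirement that bits 63 to 39 of a valid address replicate bit 38.

```python
PAGE_OFFSET_BITS = 12    # 4KB base pages
VPN_BITS_PER_LEVEL = 9   # 3 levels x 9 bits = 27-bit VPN (assumed standard Sv39 split)
LEVELS = 3


def split_sv39(vaddr):
    """Split a virtual address into (vpn2, vpn1, vpn0, offset).

    Only the low 39 bits are used; upper bits are ignored in this sketch.
    """
    mask = (1 << VPN_BITS_PER_LEVEL) - 1
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = vaddr >> PAGE_OFFSET_BITS
    vpn0 = vpn & mask                                # index into the last-level table
    vpn1 = (vpn >> VPN_BITS_PER_LEVEL) & mask        # index into the middle table
    vpn2 = (vpn >> (2 * VPN_BITS_PER_LEVEL)) & mask  # index into the root table
    return vpn2, vpn1, vpn0, offset


def page_size(level):
    """Bytes mapped by a leaf entry at a level (0 = 4KB, 1 = 2MB, 2 = 1GB)."""
    return 1 << (PAGE_OFFSET_BITS + VPN_BITS_PER_LEVEL * level)
```

A leaf found one level above the last maps `page_size(1)` bytes (a 2MB superpage), and a leaf in the root table maps `page_size(2)` bytes (a 1GB superpage), which is why a TLB entry must record the level at which the translation was found.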

  6. Existing MMU in Rocket Chip Generator
     ● Fully-associative L1 TLB
       ○ Separate Data/Instr L1 TLB
       ○ Vector of Registers
       ○ Fast & small (32-128 entries)
     ● Direct-mapped L2 TLB
       ○ SyncReadMem
       ○ Slower but larger (128-1024 entries)
     ● Fully-associative PTW Cache
       ○ Vector of Registers
       ○ Keeps non-leaf nodes

  7. Configurable TLB hierarchy in Rocket
     ● Kept the same overall structure
       ○ Lookups, refill, replacement policies, flushing
     ● Added about 70 LoC for the L1 TLB
     ● Added about 50 LoC for the L2 TLB
     ● Implemented in two different versions of the Rocket Chip Generator (RCG)
       ○ April 2018 version
         ■ Supports Xilinx ZCU102
       ○ January 2020 version

  8. Hardware Development Flow
     ● Implementation
       ○ Chisel & FIRRTL checks
       ○ Syntax errors, unconnected wires, etc.
     ● Testing
       ○ Verilator: cycle-accurate simulator
       ○ Chisel debug statements
       ○ Assembly tests
     ● Evaluation
       ○ Generate bitstream for the Xilinx ZCU102
       ○ Run tests and benchmarks using Buildroot

  9. Software Flow
     ● Freedom-U-SDK by SiFive
       ○ SW for the Freedom Unleashed
     ● Buildroot
       ○ Minimal embedded distribution
       ○ Easy to add custom packages
     ● Linux kernel 4.15
       ○ Cross-compilation for RISC-V
     ● Berkeley Boot Loader (BBL)
       ○ Sets up performance counters (cycles, TLB misses)
       ○ Boots Linux

  10. L1 | L2 TLB Contributions

                            Vanilla L1 | L2 TLB              Configurable L1 | L2 TLB
      Organization          Fully-assoc | Direct-mapped      Any associativity
      Parameterization      #Entries                         #Sets, #Ways (pow2)
      Replacement policies  PseudoLRU/Random | No policy     PseudoLRU/Random set-associative alternatives
      Other features        Sectored L1 TLB entries          Sectored L1 TLB entries are supported too
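
To make the #Sets/#Ways parameterization concrete, here is a minimal software model in Python. It is a sketch under stated assumptions, not the Chisel code from the authors' repository: the class name `SetAssocTLB` and its methods are hypothetical, the set index is assumed to be the low bits of the VPN, and only random replacement is modeled. Setting `n_ways=1` gives a direct-mapped structure, and `n_sets=1` gives a fully-associative one.

```python
import random


class SetAssocTLB:
    """Toy model of a TLB with n_sets x n_ways entries (both powers of two)."""

    def __init__(self, n_sets, n_ways, seed=0):
        for n in (n_sets, n_ways):
            if n < 1 or n & (n - 1):
                raise ValueError("n_sets and n_ways must be powers of two")
        self.n_sets = n_sets
        self.n_ways = n_ways
        # Each set holds up to n_ways (vpn, ppn) pairs. The full VPN is kept
        # as the tag for simplicity; hardware would store only the upper bits.
        self.sets = [[] for _ in range(n_sets)]
        self.rng = random.Random(seed)
        self.hits = 0
        self.misses = 0

    def _index(self, vpn):
        # Assumption: the set is selected by the low bits of the VPN.
        return vpn & (self.n_sets - 1)

    def lookup(self, vpn):
        """Return the cached PPN, or None on a miss."""
        for tag, ppn in self.sets[self._index(vpn)]:
            if tag == vpn:
                self.hits += 1
                return ppn
        self.misses += 1
        return None

    def refill(self, vpn, ppn):
        """Insert a translation, evicting a random way if the set is full."""
        ways = self.sets[self._index(vpn)]
        ways[:] = [entry for entry in ways if entry[0] != vpn]  # drop stale copy
        if len(ways) == self.n_ways:
            ways.pop(self.rng.randrange(self.n_ways))
        ways.append((vpn, ppn))
```

With 4 sets and 1 way, VPNs 0 and 4 map to the same set and evict each other; with 1 set and 4 ways, both fit at once. That conflict behavior is what the associativity parameter is meant to let researchers explore.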

  11. Evaluation Metrics
     ● FPGA Resource Usage
       ○ Lookup tables (LUTs), flip-flops (FFs), block RAMs (BRAMs)
     ● Performance Metrics
       ○ SPEC2006 benchmarks (with test input set)
         ■ Misses per kilo-instruction (MPKI)
         ■ Instructions per cycle (IPC)
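
Both performance metrics are simple ratios of hardware counter values. The Python sketch below shows the arithmetic; the function names are hypothetical and the counter values in the usage note are made-up numbers for illustration, not results from the presentation.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: misses per 1000 retired instructions."""
    return 1000.0 * misses / instructions


def ipc(instructions, cycles):
    """Instructions per cycle: retired instructions divided by elapsed cycles."""
    return instructions / cycles
```

For example, a hypothetical run with 1,000,000 retired instructions, 2,500 TLB misses, and 2,000,000 cycles gives `mpki(2500, 1_000_000) == 2.5` and `ipc(1_000_000, 2_000_000) == 0.5`.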

  12. Evaluation Scenarios
     ● Configurations resembling well-known architectures
       ○ Conf III → ARM Cortex A57
       ○ Conf IV → Intel Skylake
       ○ Conf V → Intel Skylake (swapped I/D TLB sizes)

  13. FPGA resource usage evaluation

  14. L1 TLB Performance Evaluation (MPKI)
     ● Results for L1 Data and Instruction TLBs
     ● Most TLB misses come from data accesses
     ● Several benchmarks show similar behavior across configurations
     ● But a larger L1 DTLB may improve performance
     ● mcf stresses the TLB hierarchy the most

  18. L2 TLB Performance Evaluation (MPKI)
     ● L2 TLB misses are rare for most benchmarks
     ● Larger L2 TLB reach may reduce page walks
       ○ Configurations IV and V
     ● mcf improves significantly as the L2 TLB size increases
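
TLB reach is the amount of memory a TLB can map at once, that is, the number of entries times the page size. A small Python sketch of that arithmetic follows; the function name `tlb_reach` is hypothetical, and the entry counts are taken from the 128-1024 entry range quoted earlier for the vanilla L2 TLB, not from the evaluated configurations.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_bytes=4 * KIB):
    """Bytes of address space covered when every entry maps one page."""
    return entries * page_bytes
```

With 4KB pages, 128 entries cover 512 KiB and 1024 entries cover 4 MiB, so a working set larger than that will keep triggering page walks unless superpages are used.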

  22. System Performance Evaluation (IPC)

  25. … Further Evaluation
     ● The Xilinx ZCU102 board reserves only 512MB of RAM for the programmable logic (PL), which limited the benchmarks we could run
       ○ Older Rocket Chip commit
     ● Correctness evaluation of the more recent Rocket Chip version
     ● We plan on moving to FireSim
       ○ Evaluation with SPEC2017 and other benchmarks
       ○ Multicore benchmarking
     ● BOOM performance evaluation

  26. Related & Future Work
     ● Research/develop new MMU features
       ○ Direct Segments [ISCA'13]
       ○ Coalesced/Clustered TLBs [MICRO'12, HPCA'14]
       ○ Redundant Memory Mappings [ISCA'15]
       ○ Hybrid TLB Coalescing [ISCA'17]
     ● Reduce resource usage in FPGA simulation
       ○ TLBs are CAMs → FPGA-hostile structure

  27. Conclusions
     ● Enabled further configurability in the Rocket Chip Generator
     ● Our design can generate any L1/L2 TLB organization and size
     ● Evaluated resource usage & application performance
     ● Feel free to review our work on GitHub!
       ○ https://github.com/ncppd/rocket-chip
     Thank you!
