
SonicBOOM: The Third Generation Berkeley Out-of-Order Machine



  1. SonicBOOM: The Third Generation Berkeley Out-of-Order Machine
     Jerry Zhao, Ben Korpan, Abe Gonzalez, Krste Asanovic
     UC Berkeley
     jzh@berkeley.edu

  2. Goal of the BOOM project
     General-purpose performance is important across the entire computing ecosystem.
     Figure labels (core configurations): 2x 7-wide OOO “Vortex”; 72x 8-wide OOO “Skylake”; 4x 10-wide OOO “Sunny Lake”; 2x 3-wide OOO “Tempest”; 2x 9-wide OOO “Typhoon”; 4x 3-wide OOO “Tempest”
     BOOM Goals:
     • Build a high-performance open-source RISC-V out-of-order core
     • Support research in various aspects of high-performance SoC design (microarch, security, accelerators, etc.)

  3. BOOMv1 and BOOMv2 pipeline diagrams
     • BOOMv1: 7-cycle branch-mispredict penalty. Diagram labels: fetch, fetch, dec, iss, rrd, exec, wb; queues; BTB, GShare; tlb, D$, D$
     • BOOMv2: 10-cycle branch-mispredict penalty, 4-cycle load-use. Diagram labels: fetch, fetch, fetch, dec, dis, iss, rrd, exec, wb; queues; BTB, GShare; tlb, D$, D$

  4. Open-source Performance Gap
     Bar chart of CoreMark/MHz by core (pipeline depth, architecture):
     • Ivy Bridge: 12+ stage, 4-w OOO, 8.5 CoreMark/MHz
     • XuanTie 910: 12-stage, 3-w OOO, 7.1 CoreMark/MHz
     • SiFive U74: 8-stage, 2-w in-order, 5.1 CoreMark/MHz
     • WD SWERV: 9-stage, 2-w in-order, 4.9 CoreMark/MHz
     • BOOMv1: 8-stage, 4-w OOO, 4.9 CoreMark/MHz
     • BOOMv2: 10-stage, 4-w OOO, 3.2 CoreMark/MHz
     • Rocket: 5-stage, 1-w in-order, 2.3 CoreMark/MHz

  5. BOOMv1, BOOMv2, and BOOMv3 pipeline diagrams
     • BOOMv1: 7-cycle branch-mispredict penalty. Diagram labels: fetch, fetch, dec, iss, rrd, exec, wb; queues; BTB, GShare; tlb, D$, D$
     • BOOMv2: 10-cycle branch-mispredict penalty, 4-cycle load-use. Diagram labels: fetch, fetch, fetch, dec, dis, iss, rrd, exec, wb; queues; BTB, GShare; tlb, D$, D$
     • BOOMv3 (SonicBOOM): 12-cycle branch-mispredict penalty, 4-cycle load-use. Diagram labels: fetch, fetch, fetch, fetch, dec, dis, issue, rrd, exec, wb, br; three queue/issue/rrd pipelines; SFB Recoder; uBTB, BTB, RAS, TAGE; tlb, D$, D$; Custom RoCC Accelerator

  6. SonicBOOM
     Frontend:
     • New TAGE-L branch predictor
     • New decoders for RISC-V compressed
     Execute:
     • Short-forwards-branch recoding
     • Superscalar branch resolution
     • Improved address-generation pipeline
     • Custom RoCC accelerators
     Memory:
     • Superscalar address generation
     • Superscalar load-store unit
     • Optimized load/store scheduling
     • L1 next-line-prefetcher with line-fill-buffers

  7. State-of-the-art Branch Prediction
     Challenges:
     • Superscalar fetch/predict
     • Speculative updates
     • Repair after misspeculation
     • Predictor pipelining
     SonicBOOM Instruction Fetch:
     • Variable-width (RVC) decode
     • L0/L1 BTBs
     • Pipelined TAGE + Loop predictor
     • Repaired return-address-stack
     Diagram labels: ICache, Instruction Decode Buffer, Control/Redirect Logic, Branch Predictor Pipeline, Generated Metadata, Global + Local Histories, Update + Repair
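
The "speculative updates" and "repair after misspeculation" challenges can be illustrated with a toy global-history register. This is a minimal Python sketch, not BOOM's implementation: the class name `GlobalHistory`, the 8-bit history length, and the snapshot-per-branch scheme are all assumptions made for illustration.

```python
class GlobalHistory:
    """Toy global branch-history register with snapshot-based repair.

    Hypothetical model for illustration; not taken from the BOOM sources.
    """

    def __init__(self, bits=8):
        self.mask = (1 << bits) - 1
        self.value = 0

    def speculate(self, predicted_taken):
        """Shift in a predicted outcome; return a snapshot for later repair."""
        snapshot = self.value
        self.value = ((self.value << 1) | int(predicted_taken)) & self.mask
        return snapshot

    def repair(self, snapshot, actual_taken):
        """Restore the pre-branch history, then shift in the real outcome."""
        self.value = ((snapshot << 1) | int(actual_taken)) & self.mask


ghist = GlobalHistory()
ghist.speculate(True)             # older branch, predicted taken (assumed correct)
snap_a = ghist.speculate(True)    # branch A predicted taken
snap_b = ghist.speculate(False)   # younger branch B predicted not-taken
ghist.repair(snap_a, False)       # A resolves not-taken: B's update is discarded
print(f"{ghist.value:08b}")       # history now holds: older branch, then A's real outcome
```

The point of the snapshot is that a misprediction invalidates every younger speculative update at once, so the history is rebuilt from the state captured at the mispredicted branch rather than patched bit by bit.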

  8. Improving Branch Performance
     Dynamic Predication:
     • Recode short-forwards-branches into “predicated” micro-ops
     • "POWER8"-style
     • 5.1 CM/MHz -> 6.2 CM/MHz
     Superscalar Branch Resolution:
     • BOOMv2: 1 branch/jump unit
     • BOOMv3: Every ALU is a branch unit
     • Correct prediction is cheap, misprediction is expensive
     • Single JMP unit to handle AUIPC/JAL instructions
     • +1 branch latency to find oldest mispredicted branch
     Diagram labels: fetch, fetch, fetch, fetch; SFB Recoder; uBTB, BTB, RAS, TAGE; two queue/issue/rrd/exec/wb pipelines, each with br
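
The idea of recoding a short forwards branch into predicated micro-ops can be sketched on a toy instruction list. This is a simplified illustration under stated assumptions, not SonicBOOM's recoder: the tuple encoding, the op names `set_pred_eq` and `pred`, and the `MAX_SHADOW` limit are hypothetical.

```python
MAX_SHADOW = 2  # assumed limit on how many instructions a "short" branch may skip


def recode_sfb(program):
    """Replace short forward branches with a flag-setting op plus predicated ops."""
    out = list(program)
    for i, insn in enumerate(program):
        if insn[0] != "beq":
            continue
        target = insn[3]
        shadow = range(i + 1, target)
        is_short_forward = i < target <= min(i + 1 + MAX_SHADOW, len(program))
        if is_short_forward and all(program[j][0] != "beq" for j in shadow):
            out[i] = ("set_pred_eq", insn[1], insn[2])
            for j in shadow:
                out[j] = ("pred",) + program[j]
    return out


def run(program, regs):
    """Execute the toy program; return final registers and the taken-branch count."""
    regs = dict(regs)
    pc, pred_skip, taken_branches = 0, False, 0
    while pc < len(program):
        insn = program[pc]
        if insn[0] == "beq":
            if regs[insn[1]] == regs[insn[2]]:
                taken_branches += 1
                pc = insn[3]
                continue
        elif insn[0] == "set_pred_eq":
            pred_skip = regs[insn[1]] == regs[insn[2]]
        elif insn[0] == "pred":
            if not pred_skip:  # predicated-off ops leave the destination unchanged
                _, _, rd, rs, imm = insn
                regs[rd] = regs[rs] + imm
        elif insn[0] == "addi":
            _, rd, rs, imm = insn
            regs[rd] = regs[rs] + imm
        pc += 1
    return regs, taken_branches


program = [
    ("beq", "a0", "a1", 2),       # skip the next instruction when a0 == a1
    ("addi", "a2", "a2", 1),
    ("addi", "a3", "a2", 10),
]
recoded = recode_sfb(program)
equal = {"a0": 1, "a1": 1, "a2": 0, "a3": 0}
differ = {"a0": 1, "a1": 2, "a2": 0, "a3": 0}
for regs in (equal, differ):
    assert run(program, regs)[0] == run(recoded, regs)[0]
```

In this toy model the recoded program produces the same register state but never redirects the program counter, which is why a hard-to-predict short branch no longer costs a pipeline flush.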

  9. Advanced Load/Store Unit
     Superscalar memory access:
     • Addr-gen/translate/execute 2 loads per cycle
     • Banked DCache data arrays
     Improved L1 Data Cache:
     • Fully non-blocking (refill in parallel with writeback)
     • Line-fill-buffers with next-line-prefetcher
     • Improved memory scheduler
     Diagram labels: Memory Issue Queue, Register-read, DataGen, AddrGen, TLB, Load Queue, Store Queue, DCache Bank0, DCache Bank1, MSHRs, Probe + writeback, Next-line prefetch, Line Fill Buffers
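
A next-line prefetcher backed by fill buffers can be sketched with a toy miss counter. This is a simplified model under stated assumptions, not the SonicBOOM design: the 64-byte line size, the unbounded cache and fill buffers, and the policy of prefetching only on a demand miss are all hypothetical choices.

```python
LINE_BYTES = 64  # assumed cache-line size


def count_misses(addresses, prefetch_next_line):
    """Count demand misses for a toy, unbounded L1 with optional next-line prefetch."""
    cached_lines = set()   # lines resident in the cache
    fill_buffer = set()    # prefetched lines held in (unbounded) fill buffers
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cached_lines:
            continue
        if line in fill_buffer:
            # Prefetch hit: move the line from the fill buffer into the cache.
            fill_buffer.discard(line)
            cached_lines.add(line)
            continue
        misses += 1
        cached_lines.add(line)
        if prefetch_next_line:
            fill_buffer.add(line + 1)
    return misses


# Sequential scan over eight cache lines, one access every 8 bytes.
scan = range(0, 8 * LINE_BYTES, 8)
print(count_misses(scan, prefetch_next_line=False))
print(count_misses(scan, prefetch_next_line=True))
```

With this simple policy a sequential scan misses on every other line instead of every line; a prefetcher that also triggers on prefetch hits would cover more of the stream.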

  10. FPGA-accelerated Co-simulation
     Dromajo: simulator developed by Esperanto, checks correctness of RISC-V trace
     Fromajo: couple Dromajo to FireSim FPGA simulation of core
     • Committed instruction stream pulled from core
     • Committed instructions checked against Dromajo at 1 MHz
     • Cycle-exact, reproducible divergences
     • Works with other RISC-V cores (Ex: Ariane)
     Diagram labels: Test Application, Linux Kernel, Image; Dromajo Cosimulator (1 MHz), RISC-V simulation model; FireSim Simulation (100 MHz), RISC-V core
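
The checking step, comparing the core's committed instruction stream against a reference model, can be sketched as a first-divergence search. This is a simplified offline comparison, not the Dromajo or FireSim API: the function name `first_divergence` and the `(pc, instruction, writeback value)` trace format are assumptions for illustration.

```python
from itertools import zip_longest


def first_divergence(core_trace, reference_trace):
    """Return (index, core_entry, reference_entry) at the first mismatch, else None."""
    pairs = zip_longest(core_trace, reference_trace)
    for index, (core_entry, reference_entry) in enumerate(pairs):
        if core_entry != reference_entry:
            return index, core_entry, reference_entry
    return None


# Hypothetical commit records: (pc, instruction bits, writeback value).
reference_trace = [
    (0x80000000, 0x00100093, 1),   # addi x1, x0, 1
    (0x80000004, 0x00208113, 3),   # addi x2, x1, 2
    (0x80000008, 0x00310193, 6),   # addi x3, x2, 3
]
core_trace = list(reference_trace)
core_trace[2] = (0x80000008, 0x00310193, 7)  # injected wrong writeback value

divergence = first_divergence(core_trace, reference_trace)
if divergence is not None:
    index, core_entry, reference_entry = divergence
    print(f"divergence at commit {index}: core={core_entry} reference={reference_entry}")
```

Reporting the first mismatching commit, rather than a final pass/fail, is what makes a divergence actionable: the failing instruction and its architectural state are known immediately.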

  11. Finding a RISC-V Linux Bug
     Background:
     • PTWs are unordered w.r.t. loads/stores
     • SFENCE.VMA orders page-table updates with accesses
     Found Linux hang with SonicBOOM:
     • Kernel load launches a PTW to recently written PTE
     • No SFENCE between PTE write and PTW
     • Only materializes on a deeply speculating core
     • Patch in-progress
     Diagram labels: PTW, Insn reads+writes, Store-buffer, Memory
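
The ordering hazard can be illustrated with a toy model in which the page-table walker reads memory without seeing stores still held in a store buffer. This is a loose sketch under strong assumptions, not a description of SonicBOOM or of the Linux patch: the class `ToyCore`, the addresses and values, and the modeling of SFENCE.VMA as a store-buffer drain are all hypothetical simplifications.

```python
class ToyCore:
    """Toy core with a store buffer that the page-table walker cannot observe."""

    def __init__(self):
        self.memory = {}
        self.store_buffer = []  # pending (address, value) stores, oldest first

    def store(self, addr, value):
        """Buffer a store; it is not yet visible in memory."""
        self.store_buffer.append((addr, value))

    def sfence_vma(self):
        """Modeled here as draining all buffered stores to memory (a simplification)."""
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def page_table_walk(self, pte_addr):
        """The walker reads memory directly; unmapped entries read as 0 (invalid)."""
        return self.memory.get(pte_addr, 0)


PTE_ADDR = 0x1000   # hypothetical page-table-entry address
NEW_PTE = 0xABC     # hypothetical PTE contents

core = ToyCore()
core.store(PTE_ADDR, NEW_PTE)
stale = core.page_table_walk(PTE_ADDR)   # walk before any fence: sees the old entry
core.sfence_vma()
fresh = core.page_table_walk(PTE_ADDR)   # walk after the fence: sees the new entry
print(hex(stale), hex(fresh))
```

In this model the walk that runs before the fence observes the old entry, which mirrors the slide's point that a missing SFENCE between the PTE write and the PTW only causes trouble when the hardware actually reorders them.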

  12. CoreMark IPC
     Bar chart of CoreMark/MHz by core (pipeline depth, architecture):
     • Ivy Bridge: 12+ stage, 4-w OOO, 8.5 CoreMark/MHz
     • XuanTie 910: 12-stage, 3-w OOO, 7.1 CoreMark/MHz
     • BOOMv3: 12-stage, 4-w OOO, 6.2 CoreMark/MHz
     • SiFive U74: 8-stage, 2-w in-order, 5.1 CoreMark/MHz
     • WD SWERV: 9-stage, 2-w in-order, 4.9 CoreMark/MHz
     • BOOMv1: 8-stage, 4-w OOO, 4.9 CoreMark/MHz
     • BOOMv2: 10-stage, 4-w OOO, 3.2 CoreMark/MHz
     • Rocket: 5-stage, 1-w in-order, 2.3 CoreMark/MHz
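
To put the chart in relative terms, the ratios between cores can be computed directly from the CoreMark/MHz values on the slide. The helper name `relative` is just a convenience for this sketch; the numbers are the slide's.

```python
# CoreMark/MHz values as listed on the slide.
coremark_per_mhz = {
    "Ivy Bridge": 8.5,
    "XuanTie 910": 7.1,
    "BOOMv3": 6.2,
    "SiFive U74": 5.1,
    "WD SWERV": 4.9,
    "BOOMv1": 4.9,
    "BOOMv2": 3.2,
    "Rocket": 2.3,
}


def relative(core, baseline):
    """Ratio of CoreMark/MHz between two cores from the table above."""
    return coremark_per_mhz[core] / coremark_per_mhz[baseline]


for baseline in ("BOOMv2", "BOOMv1", "Ivy Bridge"):
    print(f"BOOMv3 vs {baseline}: {relative('BOOMv3', baseline):.2f}x")
```

By these figures BOOMv3 roughly doubles BOOMv2's CoreMark/MHz while still trailing Ivy Bridge.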

  13. SPEC17 Comparison
     • Evaluate SPEC17 intspeed, single-core performance
     • Target comparable branch-prediction accuracy and IPC
     Systems compared (values below are listed in the order Intel Xeon / AWS Graviton / SonicBOOM):
     • Microarchitecture: Skylake Server / Cortex A72 / BOOMv3
     • Branch Predictor: Undisclosed / Undisclosed / TAGE-L
     • L1 Cache Sizes (I/D): 64/64 KB / 48/32 KB / 32/32 KB
     • L2 Cache Size: 1 MB / 2 MB / 512 KB
     • L3 Cache Size: 24 MB / 0 MB / 4 MB
     • Compiler: gcc / gcc / gcc
     • OS: Ubuntu 18.04 Server / Ubuntu 18.04 / Buildroot Linux
     • Platform: AWS EC2 bare-metal / AWS EC2 bare-metal / FireSim simulation

  14. SPEC17 Branch Prediction Accuracy
     Equivalent to A72

  15. SPEC17 IPC

  16. Next steps
     Physical Implementation:
     • > 1 GHz possible according to preliminary results
     • Critical path in issue-units (issue-select/compaction)
     • Current SRAMs limit us to 1.4 GHz
     Improving performance:
     • Larger prefetchers between L2/LLC to hide L2 miss penalty
     • Instruction prefetcher
     • V-Extension support
