2GRVI Phalanx: A Kilocore RISC-V RV64I Processor Cluster Array with HBM2 in a Xilinx VU37P FPGA
Work-in-Progress Report
Software-First FPGA Accelerator
Jan Gray
2019/11/17 2GRVI Phalanx: Kilocore RV64I + HBM Accelerator Framework 2
GRVI PE: ~320 LUTs @ 400 MHz
[Figure: cluster block diagram. Labels: 8 PEs; four IMEMs, 4-8 KB each; four 2:1 stages; 4:4 XBAR; CMEM = 128 KB cluster data; accelerator(s); bus widths 32 and 64.]
Cluster: ~3500 LUTs
[Figure: Hoplite NoC. A 4×4 array of clusters (C) at router coordinates x,y from 0,0 to 3,3; router ports YI, XI, X, Y.]
256b @ 400 MHz = 100 Gb/s links
[Figure: cluster attached to the NoC. Labels: 8 PEs; four IMEMs; four 2:1 stages; 4:4 XBAR; CMEM = 128 KB cluster data; accelerator(s); NoC/accelerator interface; Hoplite router; bus widths 64, 256, 310.]
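The link figure above follows from width times clock rate. A minimal sketch of that arithmetic, assuming one 256-bit transfer per link per cycle (the function name is hypothetical, not part of the framework); it shows that the slide's 100 Gb/s is a rounding of 102.4 Gb/s:

```c
#include <assert.h>
#include <stdint.h>

/* Raw link bandwidth in bits per second: width (bits) x clock (Hz).
   Assumption: one full-width transfer completes every clock cycle. */
uint64_t link_bits_per_second(uint32_t width_bits, uint64_t clock_hz) {
    assert(width_bits > 0 && clock_hz > 0);
    return (uint64_t)width_bits * clock_hz;
}
```

Calling `link_bits_per_second(256, 400000000)` gives 102,400,000,000 bits per second.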
                         32b GRVI PE             64b 2GRVI PE
Year                     2015 Q4                 2019 Q2
ISA                      RV32I + mul/lr/sc       RV64I + lr/sc
Area                     320 6-LUTs              400 6-LUTs (sans shared <<)
Fmax / congested         400 / 300 MHz           500+ / TBD MHz
Pipeline stages          2 / 3                   2 / 3 / 4 (superpipelined)
Out-of-order retire                              yes
Cluster, load interval   5 cycles                1 / cycle
Cluster, load-to-use     5 cycles                3-6 cycles
Cluster, Σ RAM BW        4.8 GB/s (300 MHz)      12.8 GB/s (400 MHz)
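The Σ RAM BW row can be read as bytes moved per clock cycle. A minimal sketch of that conversion, assuming 1 GB = 10^9 bytes (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes transferred per clock cycle implied by an aggregate bandwidth.
   Assumption: bandwidth is given in bytes/s with 1 GB = 10^9 bytes. */
uint64_t bytes_per_cycle(uint64_t bytes_per_second, uint64_t clock_hz) {
    assert(clock_hz > 0);
    return bytes_per_second / clock_hz;
}
```

With the table's figures this gives 16 B/cycle for the GRVI cluster (4.8 GB/s at 300 MHz) and 32 B/cycle for the 2GRVI cluster (12.8 GB/s at 400 MHz). The doubling matches the move from a 32-bit to a 64-bit datapath, assuming the number of cluster RAM ports is unchanged, which the slide does not state.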
[Figure: FPGA floorplan. An array of clusters (C) with NoC-AXI RDMA bridges (A) and AXI-HBM2 bridges (H), two 4 GB HBM2 DRAM stacks, and PCIe DMA; one label reads 300.]
Legend:
C: Cluster { 8 GRVI / 6 2GRVI, 4-8 KB IRAM, 128 KB CRAM, Hoplite router }
A: NoC-AXI RDMA bridge { 2 256b AXI R/W req queues, 2 resp queues }
HH: Two AXI-switch-MC-HBM2 bridges, each 256b R/W at up to 450 MHz
NoC: Unidirectional Hoplite NoC X-ring rows and Y-ring columns
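Because the X rings and Y rings are unidirectional, a message can only travel one way around each ring. A minimal sketch of the resulting hop count between two routers, assuming an ideal path with no deflections or congestion (the function names and the width/height parameters are hypothetical):

```c
#include <assert.h>

/* Hops from src to dst along one unidirectional ring of `size` routers. */
unsigned ring_hops(unsigned src, unsigned dst, unsigned size) {
    assert(size > 0 && src < size && dst < size);
    return (dst + size - src) % size;
}

/* Hops from (sx, sy) to (dx, dy) on a w x h array of X-ring rows and
   Y-ring columns. Assumption: no deflections, so the count is the sum
   of the two ring distances. */
unsigned torus_hops(unsigned sx, unsigned sy, unsigned dx, unsigned dy,
                    unsigned w, unsigned h) {
    return ring_hops(sx, dx, w) + ring_hops(sy, dy, h);
}
```

On a 4×4 array, going from (1,1) to (0,0) takes 6 hops, since each ring must be traversed the long way around, while (3,0) to (0,1) takes only 2.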
32×n B burst-read request → n × 32 B read responses
The bridge issues the request to its AXI-HBM channel(s), then sends the read response messages to the destination address.
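The burst-read flow above can be sketched as a loop that turns one request for n 32-byte beats into n response messages. This is an illustrative model only: the `read_response` struct, its fields, and `split_burst_read` are hypothetical and do not reflect the bridge's actual message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RESP_BYTES = 32 };  /* payload bytes per read response, per the slide */

/* Hypothetical response descriptor: where the 32 B came from in HBM and
   where the bridge should deliver it. */
typedef struct {
    uint64_t src_addr;
    uint64_t dst_addr;
} read_response;

/* Expand one burst-read request of n x 32 B into n response descriptors.
   Returns the number of responses written to `out`. */
size_t split_burst_read(uint64_t src, uint64_t dst, size_t n,
                        read_response *out, size_t cap) {
    assert(out != NULL && n <= cap);
    for (size_t i = 0; i < n; ++i) {
        out[i].src_addr = src + (uint64_t)i * RESP_BYTES;
        out[i].dst_addr = dst + (uint64_t)i * RESP_BYTES;
    }
    return n;
}
```

A request for n = 4 therefore covers 128 contiguous bytes and produces four responses whose source and destination addresses each advance by 32.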
kernel void vector_add(
    global int* g_a, global int* g_b, global int* g_sum, const unsigned n)
{
    local align int a[N], b[N], sum[N];

    // Each work item owns n consecutive elements.
    int iloc = get_local_id(0) * n;   // offset into the cluster-local arrays
    int iglb = (get_group_id(0) * get_local_size(0) + get_local_id(0)) * n;   // offset into the global arrays
    int size = n * sizeof(int);

    copy(a + iloc, g_a + iglb, size); // from HBM
    copy(b + iloc, g_b + iglb, size);
    barrier(CLK_LOCAL_MEM_FENCE);

    // Add this work item's slice; index by iloc so slices do not overlap.
    for (int i = 0; i < n; ++i)
        sum[iloc + i] = a[iloc + i] + b[iloc + i];
    barrier(CLK_LOCAL_MEM_FENCE);

    copy(g_sum + iglb, sum + iloc, size); // to HBM
}
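The kernel's index arithmetic can be checked on a host in plain C. A minimal sketch, with the work-item IDs passed as ordinary parameters instead of coming from `get_local_id` and `get_group_id` (the helper names are hypothetical):

```c
#include <assert.h>

/* Offset of a work item's slice within the cluster-local arrays
   (iloc in the kernel). */
unsigned local_offset(unsigned local_id, unsigned n) {
    return local_id * n;
}

/* Offset of the same slice within the global arrays (iglb in the kernel). */
unsigned global_offset(unsigned group_id, unsigned local_size,
                       unsigned local_id, unsigned n) {
    assert(local_id < local_size);
    return (group_id * local_size + local_id) * n;
}
```

For example, with 8 work items per group and n = 16, work item 3 of group 2 uses local elements 48 to 63 and global elements 304 to 319, so slices of adjacent work items never overlap.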
http://fpga.org/2014/12/31/the-past-and-future-of-fpga-soft-processors/
https://www.xilinx.com/support/documentation/ip_documentation/hbm/v1_0/pg276-axi-hbm.pdf