High-Performance Reconfigurable Computing Group
Department of Electrical and Computer Engineering University of Toronto
Making FPGAs Programmable as Computers and Doing It At Scale
Paul Chow
What's the real goal? Build large-scale ...
November 14, 2016 H2RC 2016
[Figure, built up over four steps: kernel migration across a common API. Blocks shown: CPU (x86) with Application and Common API; Embedded CPU with Application and Common API; Custom Processing Element; Common API Hardware; Interconnect; Drivers.]
– Easier development: SW Prototyping → Migration
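10^3 - 10^10

\[ U_b = \sum_i k_i (r_i - r_{0i})^2 \]

\[ U = \frac{1}{2} \sum_n \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij} + n\tau} \]

\[ V(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \]

\[ U_a = \sum_i k_i (\theta_i - \theta_{0i})^2 \]

\[ U_t = \sum_i \begin{cases} k_i \left[ 1 + \cos(n_i \phi_i - \gamma_i) \right], & n_i \neq 0 \\ k_i (\phi_i - \gamma_i)^2, & n_i = 0 \end{cases} \]

O(n^2) O(n)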
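As an illustration of the energy terms on this slide, the following is a minimal Python sketch of the bond term and the pairwise potential V(r). The function names and the reduced units are assumptions made for this example, not code from the talk. The all-pairs loop shows where an O(n^2) cost comes from, while the bond sum visits each bond once.

```python
import math


def bond_energy(bonds):
    """U_b = sum_i k_i * (r_i - r0_i)^2 over (k, r, r0) tuples."""
    return sum(k * (r - r0) ** 2 for k, r, r0 in bonds)


def lennard_jones(r, epsilon, sigma):
    """V(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_energy(positions, epsilon, sigma):
    """Sum V(r) over all unique pairs: the O(n^2) double loop."""
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            total += lennard_jones(r, epsilon, sigma)
    return total
```

For example, `lennard_jones(1.0, 1.0, 1.0)` evaluates to zero because the two terms cancel at r = sigma.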
Also a system simulation
HLS can do this
Network of Five V2Pro PCI Cards (2006)
Network of BEE2 Multi-FPGA Boards (2007)
FPGA portability and design abstraction facilitated ongoing migration.
Stack of 5 large Virtex-5 FPGAs + 1 FPGA for FSB PHY interface
Quad socket Xeon Server
[Figure: four CPUs shown; each CPU_i runs Process_i, which handles the Bonded, Nonbonded, and PME work on Data_i.]
[Figure: the application as a set of communicating engines. Blocks shown: Input, Scheduler, Output, Visualizer, Atom Manager (x4), Bond Engine (x2), Long range Electrostatics Engine (x3), Short range Nonbond Engine (x6). Code label on the figure: MPI::Send(&msg, size, dest …);]
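The engine diagram can be prototyped in software as ranks that exchange messages. Below is a minimal Python sketch that uses threads and queues as a stand-in for MPI. The `send`/`recv` helpers, the rank numbers, and the bond record layout are hypothetical and are not the API used in the talk.

```python
import queue
import threading

# One inbox per engine rank; send() only needs a destination rank.
NUM_RANKS = 2
inboxes = [queue.Queue() for _ in range(NUM_RANKS)]


def send(msg, dest):
    inboxes[dest].put(msg)


def recv(rank):
    return inboxes[rank].get()   # blocks until a message arrives


ATOM_MANAGER, BOND_ENGINE = 0, 1


def bond_engine():
    """Receive (k, r, r0) bond records, reply with the summed bond energy."""
    bonds = recv(BOND_ENGINE)
    send(sum(k * (r - r0) ** 2 for k, r, r0 in bonds), ATOM_MANAGER)


worker = threading.Thread(target=bond_engine)
worker.start()
send([(2.0, 1.5, 1.0), (1.0, 3.0, 1.0)], BOND_ENGINE)   # atom manager side
energy = recv(ATOM_MANAGER)
worker.join()
```

Because the engine only sees `send` and `recv`, the same message pattern could be kept if the engine body were later replaced by a hardware block.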
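[Figure: quad-socket system. Three FSB-attached FPGA stacks hold four NBE FPGAs each; two of those stacks also carry a PME FPGA with memory (MEM). A fourth FSB connection is shown separately. Socket labels: Socket 1, Socket 2, Socket 3, and one socket whose number was lost in extraction.]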
Initial Breakdown of CPU Time (chart categories): Short range Nonbonded, Long range Electrostatic, Bonds
12 short range nonbond FPGAs
2-3 pipelines/NBE FPGA; Each runs 15-30x CPU
NBE 360-1080x
2 PME FPGAs with fast memory and fibre optic interconnects
PME 420x
Bonds on quad-core Xeon server
Bonds 1x
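The NBE range follows from multiplying the three figures on the slide. A small Python check, assuming every pipeline contributes independently (the helper name is made up for this example):

```python
def aggregate_speedup(num_fpgas, pipelines_per_fpga, speedup_per_pipeline):
    """Aggregate speedup if every pipeline contributes independently."""
    return num_fpgas * pipelines_per_fpga * speedup_per_pipeline


nbe_low = aggregate_speedup(12, 2, 15)    # fewest pipelines, slowest pipeline
nbe_high = aggregate_speedup(12, 3, 30)   # most pipelines, fastest pipeline
print(f"NBE: {nbe_low}-{nbe_high}x")
```

This reproduces the 360-1080x range quoted for the 12 NBE FPGAs.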
[Figure: Quad Xeon with system memory (Sys Mem) blocks. Bandwidth labels: 8.5 GB/s @ 1066 MHz and 72.5 GB/s.]
Timestep = 108 ms (327 506 atoms)
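To put the timestep figure in perspective, here is a short Python calculation. It assumes that 108 ms is the wall-clock time for one timestep, and the 1 fs of simulated time per step is purely an assumption for illustration, not a number from the talk.

```python
TIMESTEP_SECONDS = 0.108      # from the slide: 108 ms per timestep
NUM_ATOMS = 327_506           # from the slide

steps_per_day = 86_400 / TIMESTEP_SECONDS
ns_per_atom_step = TIMESTEP_SECONDS / NUM_ATOMS * 1e9

# Assumption (not in the talk): each timestep advances the model by 1 fs.
ASSUMED_FS_PER_STEP = 1.0
simulated_ns_per_day = steps_per_day * ASSUMED_FS_PER_STEP * 1e-6
```

Under these assumptions the system completes about 800 000 timesteps per day, at roughly 330 ns of wall-clock time per atom per step.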
– 140 with hardware bond engines
– change engine from SW to HW, no architectural change
                               FPGA/CPU               Supercomputer              Scaling Factor
Space                          5U                     17.5*2U                    1/7
Cooling                        N/A                    Share of 735-ton chiller
Capital Cost                   $15000*                $120000                    1/8
Annual Electricity Cost        $241 (Assuming 500W)   $6758                      1/30
Performance (Core Equivalent)  140 Cores              1*140 Cores                140x
*Current system is a prototype. Cost is based on projections for next-generation system.
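The scaling factors in the comparison can be checked from the raw entries. A small Python sketch using only the numbers in the table; the implied electricity price is derived here under the assumption of a constant 500 W draw all year and is not stated in the talk.

```python
from fractions import Fraction

# Values copied from the comparison table; 17.5 * 2U is 35 rack units.
space = Fraction(5, 35)
capital = Fraction(15_000, 120_000)
electricity = Fraction(241, 6_758)

# Energy price implied by $241 per year at a constant 500 W draw.
kwh_per_year = 0.5 * 24 * 365
implied_price = 241 / kwh_per_year
```

The space and capital ratios come out at exactly 1/7 and 1/8. The electricity ratio is about 1/28, which the slide rounds to 1/30, and the implied price is about 5.5 cents per kWh.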
[Figure labels: Address Space; Processes/threads]
[Figure: two nodes, each with the same stack (top to bottom): PGAS Application; PGAS Library or Language Runtime; One-sided Communication Library; Network.]
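The partitioned global address space idea behind this stack can be sketched as a global index that maps to an owning node and a local offset, with one-sided `put` and `get` operations. This is a toy Python model with hypothetical names (`GlobalArray`, `locate`), not a real PGAS library.

```python
class GlobalArray:
    """Toy PGAS array: a global index maps to (owner node, local offset)."""

    def __init__(self, num_nodes, elems_per_node):
        self.elems_per_node = elems_per_node
        # Each inner list stands in for one node's local memory partition.
        self.partitions = [[0] * elems_per_node for _ in range(num_nodes)]

    def locate(self, index):
        return divmod(index, self.elems_per_node)

    def put(self, index, value):
        """One-sided write: the owner runs no matching receive."""
        node, offset = self.locate(index)
        self.partitions[node][offset] = value

    def get(self, index):
        """One-sided read from whichever node owns the index."""
        node, offset = self.locate(index)
        return self.partitions[node][offset]


shared = GlobalArray(num_nodes=4, elems_per_node=8)
shared.put(19, 42)             # lands on node 2, offset 3
owner = shared.locate(19)
value = shared.get(19)
```

The caller addresses one flat index space, while the data stays partitioned across nodes.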
[Figure: GASNet layering (top to bottom): PGAS Language Application; PGAS Language Runtime; GASNet Extended API; GASNet Core API; Network Hardware.]
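GASNet's Core API is built around active messages, and its Extended API offers put/get style operations on top. The Python sketch below illustrates that layering only. All names are hypothetical, it is not the GASNet C API, and the `get` path returns its value directly instead of using a separate reply message.

```python
class Node:
    """Toy node with local memory and a table of active-message handlers."""

    def __init__(self, size):
        self.memory = [0] * size
        self.handlers = {}

    def register(self, handler_id, fn):
        self.handlers[handler_id] = fn


def am_request(nodes, dest, handler_id, *args):
    """Core-style primitive: run a registered handler on the destination."""
    return nodes[dest].handlers[handler_id](nodes[dest], *args)


def put_handler(node, offset, value):
    node.memory[offset] = value


def get_handler(node, offset):
    return node.memory[offset]


PUT, GET = 0, 1
nodes = [Node(4) for _ in range(2)]
for n in nodes:
    n.register(PUT, put_handler)
    n.register(GET, get_handler)


def put(dest, offset, value):
    """Extended-style put expressed as one active-message request."""
    am_request(nodes, dest, PUT, offset, value)


def get(src, offset):
    """Extended-style get; simplified to return the handler's result."""
    return am_request(nodes, src, GET, offset)


put(1, 2, 7)
fetched = get(1, 2)
```

Only `am_request` touches the destination node, so the put/get layer stays unchanged if the transport underneath is swapped.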
[Figure, repeated and built up over four steps: kernel migration across a common API. Blocks shown: CPU (x86) with Application and Common API; Embedded CPU with Application and Common API; Custom Processing Element; Common API Hardware; Interconnect; Drivers.]
[Figure: THe_GASNet layering: PGAS Language Application; PGAS Language Runtime; THe_GASNet Extended API; THe_GASNet Core API; Network Hardware. Hardware blocks: Accelerator Core, GASCore, PAMS.]
[Figure: THe_GASNet layering: PGAS Language Application; PGAS Language Runtime; THe_GASNet Extended API; THe_GASNet Core API; Network Hardware. Hardware blocks: Accelerator Core, GASCore, Extended PAMS.]
Source: R. Willenberg
Source: R. Willenberg
the clusters
i5
the i5
[Figure labels: Compile; Static generation; Dynamic generation; High-Level Synthesis]