Cornell University Computer Systems Laboratory
PyMTL: A Unified Framework for Vertically Integrated Computer Architecture Research
Derek Lockhart, Gary Zibrat, and Christopher Batten
Outline
The Computer Architecture Research Methodology Gap
PyMTL
The Performance-Productivity Gap
SimJIT
Trends in Computing Systems
Energy & Power Constrained: Credible Energy and Power Analysis
Extensive Specialization: Productive Design Space Exploration
Cross-Layer Optimization: Effective Strategies for Vertically Integrated Design
Managing Increasing Design Complexity
Computer Architecture Research Abstractions
Abstraction layers between Applications and a Sea of Transistors: Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI
Academic Research: A Few Researchers
Industry Development: Hundreds of Engineers
Computer Architecture Research Methodologies
Abstraction layers: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors
Modeling levels: Functional Level, Cycle Level, Register Transfer Level
Computer Architecture Research Methodologies
Modeling Towards Layout
Functional Level: Algorithm and ISA Development
Cycle Level: Design Space Exploration
Register Transfer Level: Area/Energy/Timing Validation and Prototype Development
Greater Simulation Speed (toward Functional Level), Greater Model Detail (toward Register Transfer Level)
Computer Architecture Research Frameworks
Functional Level (Algorithm and ISA Development): MATLAB/Python Algorithm or C++ Instruction Set Simulator
Cycle Level (Design Space Exploration): C++ Computer Architecture Simulation Framework (Object-Oriented)
Register Transfer Level (Area/Energy/Timing Validation and Prototype Development): Verilog or VHDL Design with EDA Toolflow (Concurrent-Structural)
Different languages, patterns, and tools!
The Computer Architecture Research Methodology Gap
Great Ideas From Prior Work
Consistent interfaces across abstractions (Liberty, Cascade, SystemC)
Unified design environment for FL, CL, RTL (SystemC)
Productive RTL design space exploration (Chisel, Genesis2, BlueSpec, MyHDL)
Productive RTL validation and cosimulation (Cascade)
Component and test bench reuse (Liberty, BlueSpec)
What is PyMTL?
Model DSEL
API
Simulation Tool
Translation Tool
Testing Framework
What Does PyMTL Enable?
FL Model + Test Harness, CL Model + Test Harness, RTL Model + Test Harness
Verilog RTL Model + Test Harness
FL Model, CL Model, RTL Model
Verilog RTL Model
gem5, PyMTL, C++ Model, Verilog Model (see Srinath et al. in MICRO-47, Session 6B)
The PyMTL Framework
Specification: Model, Config, Test & Sim Harness
Tools: Elaborator, Model Instance, Simulation Tool, Translation Tool, User Tool
Output: Traces & VCD, Verilog (for the EDA Toolflow), User Tool Output
User tools: Visualization, Static Analysis, Dynamic Checking, FPGA Simulation, High Level Synthesis
The PyMTL DSEL
f(x): [3, 1, 2, 0] -> [0, 1, 2, 3]

  def sorter_network( input ):
    return sorted( input )

  class SorterNetworkFL( Model ):
    def __init__( s, nbits, nports ):

      # nports input and output ports, each nbits wide
      s.in_ = InPort [nports](nbits)
      s.out = OutPort[nports](nbits)

      # functional-level behavior, evaluated once per simulated cycle
      @s.tick_fl
      def logic():
        for i, v in enumerate( sorted( s.in_ ) ):
          s.out[i].next = v

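To make the tick semantics above concrete without a PyMTL installation, here is a minimal plain-Python sketch. `Port`, `SorterFLSketch`, and `run_fl` are hypothetical stand-ins invented for illustration and are not PyMTL classes. The sketch only mimics the idea that a write to `.next` becomes visible after the clock edge.

```python
class Port:
    """Hypothetical stand-in for a port: holds a current and a pending value."""

    def __init__(self):
        self.value = 0
        self.next = 0

    def flop(self):
        # On the clock edge, the pending value becomes the visible value.
        self.value = self.next


class SorterFLSketch:
    """Functional-level sorter: sorts all inputs within one simulated cycle."""

    def __init__(self, nports):
        self.in_ = [Port() for _ in range(nports)]
        self.out = [Port() for _ in range(nports)]

    def tick(self):
        # Same shape as the slide's logic(): write sorted inputs to out[i].next.
        values = sorted(p.value for p in self.in_)
        for i, v in enumerate(values):
            self.out[i].next = v

    def cycle(self):
        # One simulated cycle: evaluate the tick logic, then update the outputs.
        self.tick()
        for p in self.out:
            p.flop()


def run_fl(inputs):
    """Drive the inputs, advance one cycle, and read back the outputs."""
    model = SorterFLSketch(len(inputs))
    for port, v in zip(model.in_, inputs):
        port.value = v
    model.cycle()
    return [p.value for p in model.out]
```

For example, `run_fl([3, 1, 2, 0])` yields the sorted list after a single simulated cycle, matching the input/output pair shown on the slide.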
The PyMTL DSEL
f(x): [3, 1, 2, 0] -> [0, 1, 2, 3]

  class SorterNetworkCL( Model ):
    def __init__( s, nbits, nports, delay=3 ):

      s.in_ = InPort [nports](nbits)
      s.out = OutPort[nports](nbits)
      # models the sorter's latency as a delay of `delay` cycles
      s.pipe = Pipeline( delay )

      # cycle-level behavior: sort immediately, release the result after the delay
      @s.tick_cl
      def logic():
        s.pipe.xtick()
        s.pipe.push( sorted( s.in_ ) )

        if s.pipe.ready():
          for i, v in enumerate( s.pipe.pop() ):
            s.out[i].next = v

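The cycle-level model above relies on a `Pipeline` helper whose implementation the slides do not show. The following plain-Python sketch is one way such a delay pipeline could behave. `PipelineSketch` and `simulate_cl` are hypothetical names, and the exact semantics (a result becomes ready `delay` ticks after it is pushed) are an assumption rather than PyMTL's definition.

```python
from collections import deque


class PipelineSketch:
    """Hypothetical delay pipeline: an entry becomes ready `delay` ticks after push."""

    def __init__(self, delay):
        self.delay = delay
        self.slots = deque()  # each entry is [remaining_cycles, data]

    def xtick(self):
        # Advance time by one cycle for every entry still in flight.
        for slot in self.slots:
            if slot[0] > 0:
                slot[0] -= 1

    def push(self, data):
        self.slots.append([self.delay, data])

    def ready(self):
        return bool(self.slots) and self.slots[0][0] == 0

    def pop(self):
        return self.slots.popleft()[1]


def simulate_cl(inputs_per_cycle, delay=3):
    """Run one tick per list element; None means no new input that cycle."""
    pipe = PipelineSketch(delay)
    outputs = []
    for values in inputs_per_cycle:
        pipe.xtick()
        if values is not None:
            pipe.push(sorted(values))
        outputs.append(pipe.pop() if pipe.ready() else None)
    return outputs
```

Under these assumptions, an input presented in cycle 0 with `delay=3` appears at the output in cycle 3. The sorting itself is still done functionally, and only the timing is modeled.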
The PyMTL DSEL
f(x): [3, 1, 2, 0] -> [0, 1, 2, 3]

  class SorterNetworkRTL( Model ):
    def __init__( s, nbits ):

      s.in_ = InPort [4](nbits)
      s.out = OutPort[4](nbits)

      # five min/max units form the 4-input sorting network
      s.m = m = MinMaxRTL[5](nbits)

      # first stage: m[0] compares inputs 0 and 1, m[1] compares inputs 2 and 3
      s.connect( s.in_[0], m[0].in_[0] )
      s.connect( s.in_[1], m[0].in_[1] )
      s.connect( s.in_[2], m[1].in_[0] )
      s.connect( s.in_[3], m[1].in_[1] )

      . . .

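The slide elides most of the wiring between the five min/max units. As a rough illustration of how five compare-and-swap units can sort four values, here is a plain-Python sketch of a standard 4-input sorting network. The specific wiring below is an assumption (a common textbook arrangement), not necessarily the connections used in the talk's model, and `min_max`, `sorter_network_4`, and `check_all_permutations` are hypothetical names.

```python
from itertools import permutations


def min_max(a, b):
    """One compare-and-swap unit: returns (smaller, larger)."""
    return (b, a) if a > b else (a, b)


def sorter_network_4(x):
    """Sort four values using exactly five min/max units in three stages."""
    # Stage 1: compare inputs (0, 1) and (2, 3).
    a0, a1 = min_max(x[0], x[1])  # unit 0
    a2, a3 = min_max(x[2], x[3])  # unit 1
    # Stage 2: the smaller of the two minima is the overall minimum,
    # the larger of the two maxima is the overall maximum.
    b0, b2 = min_max(a0, a2)      # unit 2
    b1, b3 = min_max(a1, a3)      # unit 3
    # Stage 3: order the two middle values.
    c1, c2 = min_max(b1, b2)      # unit 4
    return [b0, c1, c2, b3]


def check_all_permutations():
    """Exhaustively check the network on every ordering of four distinct values."""
    return all(
        sorter_network_4(list(p)) == [0, 1, 2, 3]
        for p in permutations(range(4))
    )
```

Because the network is data-independent (the same five comparisons always happen), it maps directly onto fixed hardware, which is why a structural RTL description can express it with `connect` calls alone.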
The PyMTL DSEL

  class MinMaxRTL( Model ):
    def __init__( s, nbits ):
      s.in_ = InPort [2](nbits)
      s.out = OutPort[2](nbits)

      # combinational logic: out[0] gets the smaller input, out[1] the larger
      @s.combinational
      def logic():
        swap = s.in_[0] > s.in_[1]
        s.out[0].value = s.in_[1] if swap else s.in_[0]
        s.out[1].value = s.in_[0] if swap else s.in_[1]

  class RegRTL( Model ):
    def __init__( s, nbits ):
      s.in_ = InPort (nbits)
      s.out = OutPort(nbits)

      # sequential logic: the input is registered on each clock tick
      @s.tick_rtl
      def logic():
        s.out.next = s.in_

The PyMTL DSEL
Testing of SorterFL, SorterCL, and SorterRTL can be greatly simplified by using latency-insensitive interfaces.
Productivity helpers:
See the paper for more examples!
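The point about latency-insensitive interfaces can be illustrated with a small plain-Python sketch: if the test harness only looks at outputs that are marked valid, and ignores the cycle in which they arrive, the same test can check models with different latencies. `make_sorter` and `run_test` are hypothetical helpers written for this note. A value of `None` stands in for "not valid this cycle", and back-pressure (a ready signal) is left out for brevity.

```python
from collections import deque


def make_sorter(latency):
    """Hypothetical model factory: results appear `latency` cycles after the input."""
    in_flight = deque()  # each entry is [remaining_cycles, sorted_data]

    def step(msg):
        # Age everything in flight, then accept the new message, if any.
        for entry in in_flight:
            entry[0] -= 1
        if msg is not None:
            in_flight.append([latency, sorted(msg)])
        # Emit at most one result per cycle once its delay has elapsed.
        if in_flight and in_flight[0][0] <= 0:
            return in_flight.popleft()[1]
        return None

    return step


def run_test(step, messages, max_cycles=100):
    """Send messages and collect valid outputs, ignoring when each one arrives."""
    pending = deque(messages)
    received = []
    for _ in range(max_cycles):
        msg = pending.popleft() if pending else None
        out = step(msg)
        if out is not None:
            received.append(out)
        if len(received) == len(messages):
            break
    return received
```

With this harness, `run_test(make_sorter(0), msgs)` and `run_test(make_sorter(5), msgs)` return the same list of results, so a single set of expected outputs can serve models at different levels of timing detail.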
Why Python?
Benefits: (dynamic typing, reflection, metaprogramming)
Drawbacks:
Performance-Productivity Gap
Experiment:
[Figure: RTL Network and CL Network panels; x-axis: Simulated Cycles; series: CPython, Verilator, C++]
Performance degradation due to compilation
Short simulations: large compilation overhead
Long simulations: compilation overhead amortized
Gap annotations on the figure: 300x, 1200x
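The short-versus-long simulation behavior follows from simple arithmetic: a compiled simulator pays a fixed compilation cost up front and then runs faster per cycle. The sketch below works through that trade-off. The function name `speedup` and all numeric rates in the usage note are hypothetical values chosen for illustration, not measurements from the talk.

```python
def speedup(cycles, interp_rate, compiled_rate, compile_seconds):
    """Speedup of a compiled simulator over an interpreted one.

    cycles          -- number of simulated cycles
    interp_rate     -- interpreted simulation speed, in cycles per second
    compiled_rate   -- compiled simulation speed, in cycles per second
    compile_seconds -- one-time compilation overhead, in seconds
    """
    interp_time = cycles / interp_rate
    compiled_time = compile_seconds + cycles / compiled_rate
    return interp_time / compiled_time
```

With assumed rates of 1e3 and 1e6 cycles per second and a 10-second compile, `speedup(1_000, 1e3, 1e6, 10.0)` is below 1 (compilation dominates, so the compiled simulator is slower overall), while `speedup(10_000_000, 1e3, 1e6, 10.0)` is in the hundreds, approaching the ratio of the two rates as the overhead is amortized.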
Performance-Productivity Gap
Python is growing in popularity in many domains of scientific and high-performance computing. How do they close this gap?
(NumPy, CVXOPT, NLPy, pythonOCC, GEM5)
(Numba, Parakeet)
(PyPy, Pyston)
(SEJITS)
Performance-Productivity Gap
[Figure: RTL Network and CL Network panels; x-axis: Simulated Cycles; series: CPython, PyPy, Verilator, C++]
Gap annotations with CPython: 300x, 1200x
Gap annotations with PyPy: 30x, 240x
PyMTL SimJIT Architecture
SimJIT-RTL Tool
Stages: Translation, Verilator, LLVM/GCC, Wrapper Gen
Artifacts: PyMTL RTL Model Instance, Verilog Source, RTL C++ Source, C Interface Source, C Shared Library, PyMTL CFFI Model Instance
Translation Cache
Fairly robust, ready for use in research!
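The slide names a Translation Cache but does not say how it is keyed. One plausible approach, sketched below in plain Python, is to hash the generated source text and rebuild only when the hash has not been seen before. `TranslationCacheSketch`, `get_or_build`, and the `build_fn` callback are hypothetical names, and content hashing is an assumption about the design rather than a description of SimJIT.

```python
import hashlib


class TranslationCacheSketch:
    """Hypothetical cache: reuse a build result when the source text is unchanged."""

    def __init__(self):
        self._store = {}
        self.misses = 0  # number of times build_fn actually ran

    def get_or_build(self, source_text, build_fn):
        # Key the cache on a digest of the source, so identical source
        # text always maps to the same cached artifact.
        key = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = build_fn(source_text)
        return self._store[key]
```

In a flow like the one on the slide, `build_fn` would stand for the expensive steps (running Verilator and the C++ compiler), so repeated simulations of an unchanged model would skip the compilation overhead discussed earlier.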
PyMTL SimJIT Architecture
SimJIT-CL Tool
Stages: Translation, LLVM/GCC, Wrapper Gen
Artifacts: PyMTL CL Model Instance, CL C++ Source, C Interface Source, C Shared Library, PyMTL CFFI Model Instance
Just a prototype!
PyMTL SimJIT Performance
[Figure: RTL Network and CL Network panels; x-axis: Simulated Cycles; series: CPython, PyPy, SimJIT-CL, SimJIT-RTL, SimJIT-CL+PyPy, SimJIT-RTL+PyPy, Verilator, C++]
Gap annotations with PyPy: 30x, 240x
Gap annotations with SimJIT: 10x, 20x
Gap annotations with SimJIT+PyPy: 4.5x, 6x
PyMTL SimJIT Performance
Opportunities to further reduce the performance gap:
Contributions
PyMTL is a productive Python framework for FL, CL, and RTL modeling, enabling:
SimJIT considerably closes the performance-productivity gap between Python and C++ simulations.
Conclusion
PyMTL is a productive, open-source Python framework for FL/CL/RTL modeling and hardware design.
https://github.com/cornell-brg/pymtl
Thank you to our sponsors for their support: NSF, DARPA, and donations from Intel Corporation and Synopsys, Inc.