PyMTL: A Unified Framework for Vertically Integrated Computer Architecture Research



slide-1
SLIDE 1

PyMTL: A Unified Framework for Vertically Integrated Computer Architecture Research

Derek Lockhart, Gary Zibrat, and Christopher Batten

Cornell University
Computer Systems Laboratory

slide-2
SLIDE 2

Outline

  • The Computer Architecture Research Methodology Gap
  • The Performance-Productivity Gap
  • PyMTL
  • SimJIT

slide-3
SLIDE 3

Trends in Computing Systems

  • Energy & Power Constrained → Credible Energy and Power Analysis

slide-4
SLIDE 4

Trends in Computing Systems

  • Energy & Power Constrained → Credible Energy and Power Analysis
  • Extensive Specialization → Productive Design Space Exploration of Specialized Units
slide-5
SLIDE 5

Trends in Computing Systems

  • Energy & Power Constrained → Credible Energy and Power Analysis
  • Extensive Specialization → Productive Design Space Exploration of Specialized Units
  • Cross-Layer Optimization → Effective Strategies for Vertically Integrated Design
slide-6
SLIDE 6

Managing Increasing Design Complexity

  • Abstractions
  • Methodologies
  • Patterns, Languages, Tools

slide-7
SLIDE 7

Computer Architecture Research Abstractions

  • Abstractions
  • Methodologies
  • Patterns, Languages, Tools

slide-8
SLIDE 8

Computer Architecture Research Abstractions

  • Applications
  • Algorithms
  • Compilers
  • Instruction Set Architecture
  • Microarchitecture
  • VLSI
  • Sea of Transistors

slide-9
SLIDE 9

Computer Architecture Research Abstractions

  • Applications
  • Algorithms
  • Compilers
  • Instruction Set Architecture
  • Microarchitecture
  • VLSI
  • Sea of Transistors

Academic Research: A Few Researchers
Industry Development: Hundreds of Engineers

slide-10
SLIDE 10

Computer Architecture Research Abstractions

  • Applications
  • Algorithms
  • Compilers
  • Instruction Set Architecture
  • Microarchitecture
  • VLSI
  • Sea of Transistors

Academic Research: A Few Researchers
Industry Development: Hundreds of Engineers

slide-11
SLIDE 11

Computer Architecture Research Methodologies

  • Abstractions
  • Methodologies
  • Patterns, Languages, Tools

slide-12
SLIDE 12

Computer Architecture Research Methodologies

Cycle Level

  • Behavior
  • Timing

Abstraction stack: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors

slide-13
SLIDE 13

Computer Architecture Research Methodologies

Cycle Level

  • Behavior
  • Timing

Abstraction stack: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors

slide-14
SLIDE 14

Computer Architecture Research Methodologies

Functional Level

  • Behavior

Cycle Level

  • Behavior
  • Timing

Abstraction stack: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors

slide-15
SLIDE 15

Computer Architecture Research Methodologies

Functional Level

  • Behavior

Cycle Level

  • Behavior
  • Timing

Register Transfer Level

  • Behavior
  • Timing
  • Physical Resources

Abstraction stack: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors

slide-16
SLIDE 16

Computer Architecture Research Methodologies

Functional Level

  • Behavior

Cycle Level

  • Behavior
  • Timing

Register Transfer Level

  • Behavior
  • Timing
  • Physical Resources

Abstraction stack: Applications, Algorithms, Compilers, Instruction Set Architecture, Microarchitecture, VLSI, Sea of Transistors

slide-17
SLIDE 17

Computer Architecture Research Methodologies

Functional Level

  • Behavior

Cycle Level

  • Behavior
  • Timing

Register Transfer Level

  • Behavior
  • Timing
  • Physical Resources

Modeling Towards Layout

slide-18
SLIDE 18

Computer Architecture Research Methodologies

  • Functional Level: Algorithm and ISA Development
  • Cycle Level: Design Space Exploration
  • Register Transfer Level: Area/Energy/Timing Validation and Prototype Development

Modeling Towards Layout

Trade-off: Greater Simulation Speed (toward the Functional Level) versus Greater Model Detail (toward the Register Transfer Level)

slide-19
SLIDE 19

Computer Architecture Research Frameworks

  • Abstractions
  • Methodologies
  • Patterns, Languages, Tools

slide-20
SLIDE 20

Computer Architecture Research Frameworks

  • Functional Level (Algorithm and ISA Development): MATLAB/Python Algorithm or C++ Instruction Set Simulator
  • Cycle Level (Design Space Exploration): C++ Computer Architecture Simulation Framework (Object-Oriented)
  • Register Transfer Level (Area/Energy/Timing Validation and Prototype Development): Verilog or VHDL Design with EDA Toolflow (Concurrent-Structural)

slide-21
SLIDE 21

Computer Architecture Research Frameworks

  • Functional Level: Algorithm and ISA Development
  • Cycle Level: Design Space Exploration
  • Register Transfer Level: Area/Energy/Timing Validation and Prototype Development

Different languages, patterns, and tools!

The Computer Architecture Research Methodology Gap

slide-22
SLIDE 22

Great Ideas From Prior Work

  • Concurrent-Structural Modeling (Liberty, Cascade, SystemC): Consistent interfaces across abstractions

slide-23
SLIDE 23

Great Ideas From Prior Work

  • Concurrent-Structural Modeling (Liberty, Cascade, SystemC): Consistent interfaces across abstractions
  • Unified Modeling Languages (SystemC): Unified design environment for FL, CL, RTL

slide-24
SLIDE 24

Great Ideas From Prior Work

  • Concurrent-Structural Modeling (Liberty, Cascade, SystemC): Consistent interfaces across abstractions
  • Unified Modeling Languages (SystemC): Unified design environment for FL, CL, RTL
  • Hardware Generation Languages (Chisel, Genesis2, BlueSpec, MyHDL): Productive RTL design space exploration

slide-25
SLIDE 25

Great Ideas From Prior Work

  • Concurrent-Structural Modeling (Liberty, Cascade, SystemC): Consistent interfaces across abstractions
  • Unified Modeling Languages (SystemC): Unified design environment for FL, CL, RTL
  • Hardware Generation Languages (Chisel, Genesis2, BlueSpec, MyHDL): Productive RTL design space exploration
  • HDL-Integrated Simulation Frameworks (Cascade): Productive RTL validation and cosimulation

slide-26
SLIDE 26

Great Ideas From Prior Work

  • Concurrent-Structural Modeling (Liberty, Cascade, SystemC): Consistent interfaces across abstractions
  • Unified Modeling Languages (SystemC): Unified design environment for FL, CL, RTL
  • Hardware Generation Languages (Chisel, Genesis2, BlueSpec, MyHDL): Productive RTL design space exploration
  • HDL-Integrated Simulation Frameworks (Cascade): Productive RTL validation and cosimulation
  • Latency-Insensitive Interfaces (Liberty, BlueSpec): Component and test bench reuse

slide-27
SLIDE 27

Outline

  • The Computer Architecture Research Methodology Gap
  • The Performance-Productivity Gap
  • PyMTL
  • SimJIT

slide-28
SLIDE 28

What is PyMTL?

  • A Python DSEL for concurrent-structural hardware modeling

[Diagram: Model DSEL]

slide-29
SLIDE 29

What is PyMTL?

  • A Python DSEL for concurrent-structural hardware modeling
  • A Python API for analyzing models described in the PyMTL DSEL

[Diagram: Model DSEL, API]

slide-30
SLIDE 30

What is PyMTL?

  • A Python DSEL for concurrent-structural hardware modeling
  • A Python API for analyzing models described in the PyMTL DSEL
  • A Python tool for simulating PyMTL FL, CL, and RTL models

[Diagram: Model DSEL, API, Simulation Tool]

slide-31
SLIDE 31

What is PyMTL?

  • A Python DSEL for concurrent-structural hardware modeling
  • A Python API for analyzing models described in the PyMTL DSEL
  • A Python tool for simulating PyMTL FL, CL, and RTL models
  • A Python tool for translating PyMTL RTL models into Verilog

[Diagram: Model DSEL, API, Simulation Tool, Translation Tool]

slide-32
SLIDE 32

What is PyMTL?

  • A Python DSEL for concurrent-structural hardware modeling
  • A Python API for analyzing models described in the PyMTL DSEL
  • A Python tool for simulating PyMTL FL, CL, and RTL models
  • A Python tool for translating PyMTL RTL models into Verilog
  • A Python testing framework for model validation

[Diagram: Model DSEL, API, Simulation Tool, Translation Tool, Testing Framework]

slide-33
SLIDE 33

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation

[Diagram: FL Model paired with a Test Harness]

slide-34
SLIDE 34

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation

[Diagram: FL Model and CL Model, each paired with a Test Harness]

slide-35
SLIDE 35

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation

[Diagram: FL Model, CL Model, and RTL Model, each paired with a Test Harness]

slide-36
SLIDE 36

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog

[Diagram: FL Model, CL Model, and RTL Model, each paired with a Test Harness]

slide-37
SLIDE 37

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog

[Diagram: FL Model, CL Model, and RTL Model, each paired with a Test Harness; the Verilog RTL Model is also paired with a Test Harness]

slide-38
SLIDE 38

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog
  • Multi-level co-simulation of FL, CL, and RTL models

[Diagram: FL Model, CL Model, RTL Model]

slide-39
SLIDE 39

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog
  • Multi-level co-simulation of FL, CL, and RTL models
  • Construction of highly-parameterized RTL chip generators

[Diagram: Verilog RTL Model]

slide-40
SLIDE 40

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog
  • Multi-level co-simulation of FL, CL, and RTL models
  • Construction of highly-parameterized RTL chip generators
  • Embedding within C++ frameworks & integration of C++/Verilog models

[Diagram labels: gem5, PyMTL, C++ Model, Verilog Model]

slide-41
SLIDE 41

What Does PyMTL Enable?

  • Incremental refinement from algorithm to accelerator implementation
  • Automated testing and integration of PyMTL-generated Verilog
  • Multi-level co-simulation of FL, CL, and RTL models
  • Construction of highly-parameterized RTL chip generators
  • Embedding within C++ frameworks & integration of C++/Verilog models

(see Srinath et al. in MICRO-47, Session 6B)

[Diagram labels: gem5, PyMTL, C++ Model, Verilog Model]

slide-42
SLIDE 42

The PyMTL Framework

[Diagram: three columns (Specification, Tools, Output); Specification contains Model]

slide-43
SLIDE 43

The PyMTL Framework

  • Specification: Model, Config
  • Tools: Elaborator, Model Instance
  • Output: (nothing shown yet)

slide-44
SLIDE 44

The PyMTL Framework

  • Specification: Model, Config, Test & Sim Harness
  • Tools: Elaborator, Model Instance, Simulation Tool
  • Output: Traces & VCD

slide-45
SLIDE 45

The PyMTL Framework

  • Specification: Model, Config, Test & Sim Harness
  • Tools: Elaborator, Model Instance, Simulation Tool, Translation Tool
  • Output: Traces & VCD, Verilog (for an EDA Toolflow)

slide-46
SLIDE 46

The PyMTL Framework

  • Specification: Model, Config, Test & Sim Harness
  • Tools: Elaborator, Model Instance, Simulation Tool, Translation Tool, User Tool
  • Output: Traces & VCD, Verilog (for an EDA Toolflow), User Tool Output

slide-47
SLIDE 47

The PyMTL Framework

  • Specification: Model, Config, Test & Sim Harness
  • Tools: Elaborator, Model Instance, Simulation Tool, Translation Tool, User Tool
  • Output: Traces & VCD, Verilog (for an EDA Toolflow), User Tool Output

Also shown: Visualization, Static Analysis, Dynamic Checking, FPGA Simulation, High Level Synthesis

slide-48
SLIDE 48

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-49
SLIDE 49

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkFL( Model ):

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-50
SLIDE 50

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkFL( Model ):
    def __init__( s, nbits, nports ):
        dtype = Bits( nbits )
        s.in_ = InPort [nports]( dtype )
        s.out = OutPort[nports]( dtype )

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-51
SLIDE 51

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkFL( Model ):
    def __init__( s, nbits, nports ):

        s.in_ = InPort [nports](nbits)
        s.out = OutPort[nports](nbits)

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-52
SLIDE 52

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkFL( Model ):
    def __init__( s, nbits, nports ):

        s.in_ = InPort [nports](nbits)
        s.out = OutPort[nports](nbits)

        @s.tick_fl
        def logic():

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-53
SLIDE 53

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkFL( Model ):
    def __init__( s, nbits, nports ):

        s.in_ = InPort [nports](nbits)
        s.out = OutPort[nports](nbits)

        # Functional level: sort the inputs on every tick.
        @s.tick_fl
        def logic():
            for i, v in enumerate( sorted( s.in_ ) ):
                s.out[i].next = v

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-54
SLIDE 54

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkCL( Model ):
    def __init__( s, nbits, nports, delay=3 ):

        s.in_ = InPort [nports](nbits)
        s.out = OutPort[nports](nbits)

        @s.tick_cl
        def logic():

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-55
SLIDE 55

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkCL( Model ):
    def __init__( s, nbits, nports, delay=3 ):

        s.in_  = InPort [nports](nbits)
        s.out  = OutPort[nports](nbits)
        s.pipe = Pipeline( delay )

        # Cycle level: sorted results leave the pipeline after `delay` ticks.
        @s.tick_cl
        def logic():
            s.pipe.xtick()
            s.pipe.push( sorted( s.in_ ) )

            if s.pipe.ready():
                for i, v in enumerate( s.pipe.pop() ):
                    s.out[i].next = v

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]
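The cycle-level timing behavior can be sketched in plain Python without PyMTL. This is only an illustration: `DelayPipeline` and `simulate` are hypothetical names, and the real `Pipeline` helper may differ in its exact semantics. The sketch assumes that an item pushed in cycle N becomes ready in cycle N + delay.

```python
from collections import deque


class DelayPipeline:
    """Hypothetical stand-in for the slide's Pipeline helper (assumed semantics)."""

    def __init__(self, delay):
        self.delay = delay
        self.slots = deque()  # each entry is [remaining_ticks, item]

    def xtick(self):
        # Advance time: every in-flight item moves one tick closer to the exit.
        for slot in self.slots:
            slot[0] -= 1

    def push(self, item):
        self.slots.append([self.delay, item])

    def ready(self):
        return bool(self.slots) and self.slots[0][0] <= 0

    def pop(self):
        return self.slots.popleft()[1]


def simulate(inputs, delay=3):
    """Return (cycle, sorted_values) pairs in the order they leave the pipeline."""
    pipe, results = DelayPipeline(delay), []
    for cycle in range(len(inputs) + delay):
        pipe.xtick()
        if cycle < len(inputs):
            pipe.push(sorted(inputs[cycle]))
        if pipe.ready():
            results.append((cycle, pipe.pop()))
    return results


print(simulate([[3, 1, 2, 0], [7, 5, 6, 4]], delay=3))
```

With `delay=3`, the value pushed in cycle 0 is emitted in cycle 3, which mirrors how the CL model adds timing to the same `sorted` behavior used by the FL model.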

slide-56
SLIDE 56

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkRTL( Model ):
    def __init__( s, nbits ):

        s.in_ = InPort [4](nbits)
        s.out = OutPort[4](nbits)

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-57
SLIDE 57

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkRTL( Model ):
    def __init__( s, nbits ):

        s.in_ = InPort [4](nbits)
        s.out = OutPort[4](nbits)

        s.m = m = MinMaxRTL[5](nbits)

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]

slide-58
SLIDE 58

The PyMTL DSEL

def sorter_network( input ):
    return sorted( input )

class SorterNetworkRTL( Model ):
    def __init__( s, nbits ):

        s.in_ = InPort [4](nbits)
        s.out = OutPort[4](nbits)

        # Five min/max units are composed structurally.
        s.m = m = MinMaxRTL[5](nbits)

        s.connect( s.in_[0], m[0].in_[0] )
        s.connect( s.in_[1], m[0].in_[1] )
        s.connect( s.in_[2], m[1].in_[0] )
        s.connect( s.in_[3], m[2].in_[1] )

        . . .

[3, 1, 2, 0] → f(x) → [0, 1, 2, 3]
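How five min/max units can sort four values is easiest to see in a plain-Python sketch. The slide elides most of the connections, so the wiring below is an assumption: it is one common five-comparator network for four inputs, not necessarily the one used in the talk. The names `min_max` and `sorter_network_4` are hypothetical.

```python
from itertools import permutations


def min_max(a, b):
    """Compare-exchange unit: returns (smaller, larger)."""
    return (b, a) if a > b else (a, b)


def sorter_network_4(values):
    """Sort four values with five compare-exchange units (assumed wiring)."""
    in0, in1, in2, in3 = values
    # Stage 1: compare the pairs (0, 1) and (2, 3).
    lo01, hi01 = min_max(in0, in1)
    lo23, hi23 = min_max(in2, in3)
    # Stage 2: the two minima give the overall minimum,
    # the two maxima give the overall maximum.
    out0, mid_a = min_max(lo01, lo23)
    mid_b, out3 = min_max(hi01, hi23)
    # Stage 3: order the two remaining middle values.
    out1, out2 = min_max(mid_a, mid_b)
    return [out0, out1, out2, out3]


# Exhaustive check against the functional-level reference, sorted().
assert all(sorter_network_4(p) == sorted(p) for p in permutations(range(4)))
print(sorter_network_4([3, 1, 2, 0]))
```

Checking the structural version against `sorted` for every permutation is the same idea as reusing the FL model as a reference when validating an RTL implementation.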

slide-59
SLIDE 59

The PyMTL DSEL

class MinMaxRTL( Model ):
    def __init__( s, nbits ):
        s.in_ = InPort [2](nbits)
        s.out = OutPort[2](nbits)

        # Combinational logic: out[0] is the minimum, out[1] the maximum.
        @s.combinational
        def logic():
            swap = s.in_[0] > s.in_[1]
            s.out[0].value = s.in_[1] if swap else s.in_[0]
            s.out[1].value = s.in_[0] if swap else s.in_[1]

slide-60
SLIDE 60

The PyMTL DSEL

class MinMaxRTL( Model ):
    def __init__( s, nbits ):
        s.in_ = InPort [2](nbits)
        s.out = OutPort[2](nbits)

        # Combinational logic: out[0] is the minimum, out[1] the maximum.
        @s.combinational
        def logic():
            swap = s.in_[0] > s.in_[1]
            s.out[0].value = s.in_[1] if swap else s.in_[0]
            s.out[1].value = s.in_[0] if swap else s.in_[1]

class RegRTL( Model ):
    def __init__( s, nbits ):
        s.in_ = InPort (nbits)
        s.out = OutPort(nbits)

        # Sequential logic: the output takes the input value on the next tick.
        @s.tick_rtl
        def logic():
            s.out.next = s.in_
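The difference between writing `.value` in combinational logic and `.next` in sequential logic can be illustrated with a small plain-Python model. This is a sketch under assumptions: the `Signal` class and its `flop` method are hypothetical and are not PyMTL's implementation; they only mimic the idea that `.next` becomes visible after a clock edge.

```python
class Signal:
    """Hypothetical two-phase signal: `value` is visible now, `next` after a clock edge."""

    def __init__(self, value=0):
        self.value = value
        self.next = value

    def flop(self):
        # Clock edge: the pending value becomes the visible value.
        self.value = self.next


def min_max(in0, in1, out0, out1):
    """Combinational logic: outputs change in the same cycle as the inputs."""
    swap = in0.value > in1.value
    out0.value = in1.value if swap else in0.value
    out1.value = in0.value if swap else in1.value


def reg_tick(in_, out):
    """Sequential logic: the output only changes at the next clock edge."""
    out.next = in_.value


a, b, lo, hi, q = Signal(9), Signal(4), Signal(), Signal(), Signal()
min_max(a, b, lo, hi)   # lo and hi update immediately
reg_tick(hi, q)         # q is scheduled to update, but has not changed yet
before_edge = (lo.value, hi.value, q.value)
q.flop()                # clock edge
after_edge = (lo.value, hi.value, q.value)
print(before_edge, after_edge)
```

Before the clock edge the register output still holds its old value, while the min/max outputs already reflect the inputs.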

slide-61
SLIDE 61

The PyMTL DSEL

Testing of SorterFL, SorterCL, and SorterRTL can be greatly simplified by using latency-insensitive interfaces.

slide-62
SLIDE 62

The PyMTL DSEL

Testing of SorterFL, SorterCL, and SorterRTL can be greatly simplified by using latency-insensitive interfaces.

Productivity helpers:

  • MemoryProxies
  • QueueAdapters
  • PortBundles
  • BitStructs
  • TestSource
  • TestSink

slide-63
SLIDE 63

The PyMTL DSEL

Testing of SorterFL, SorterCL, and SorterRTL can be greatly simplified by using latency-insensitive interfaces.

Productivity helpers:

  • MemoryProxies
  • QueueAdapters
  • PortBundles
  • BitStructs
  • TestSource
  • TestSink

See the paper for more examples!
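Why latency-insensitive interfaces simplify testing can be shown with a plain-Python sketch. The slides do not show the interface itself, so everything here is an assumption: the models expose only a simplified valid flag (no ready/backpressure signal), and `make_model` and `run_test` are hypothetical names. The latencies are arbitrary illustrative values.

```python
def make_model(latency):
    """Hypothetical sorter with a fixed latency; yields (valid, data) once per cycle."""

    def run(messages, max_cycles=100):
        pending = list(messages)
        in_flight = []  # (cycle_when_ready, result) in arrival order
        for cycle in range(max_cycles):
            if pending:
                in_flight.append((cycle + latency, sorted(pending.pop(0))))
            if in_flight and in_flight[0][0] <= cycle:
                yield True, in_flight.pop(0)[1]
            else:
                yield False, None

    return run


def run_test(model, messages, expected):
    """Test sink: inspects only valid outputs, so cycle timing does not matter."""
    received = [data for valid, data in model(messages) if valid]
    return received == expected


msgs = [[3, 1, 2, 0], [9, 8, 7, 6]]
want = [[0, 1, 2, 3], [6, 7, 8, 9]]
results = {latency: run_test(make_model(latency), msgs, want) for latency in (0, 1, 3)}
print(results)
```

Because the sink ignores cycles in which the output is not valid, the same messages and expected results can check models with different latencies, which is the reuse benefit the slide refers to.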

slide-64
SLIDE 64

Why Python?

Benefits:

  • Modern language features enable rapid prototyping (dynamic-typing, reflection, metaprogramming)
  • Lightweight, pseudocode-like syntax
  • Built-in support for integrating C/C++ code
  • Large, active developer and support community

slide-65
SLIDE 65

Why Python?

Benefits:

  • Modern language features enable rapid prototyping (dynamic-typing, reflection, metaprogramming)
  • Lightweight, pseudocode-like syntax
  • Built-in support for integrating C/C++ code
  • Large, active developer and support community

Drawbacks:

  • Performance

slide-66
SLIDE 66

Outline

  • The Computer Architecture Research Methodology Gap
  • The Performance-Productivity Gap
  • PyMTL
  • SimJIT

slide-67
SLIDE 67

Performance-Productivity Gap

Experiment:

  • Simple 8x8 Mesh Network Model
  • Cycle-Precise CL Model:
      • PyMTL Model Simulated with the CPython Interpreter
      • Hand-Written C++ Model and Simulator

slide-68
SLIDE 68

Performance-Productivity Gap

Experiment:

  • Simple 8x8 Mesh Network Model
  • Cycle-Precise CL Model:
      • PyMTL Model Simulated with the CPython Interpreter
      • Hand-Written C++ Model and Simulator
  • Bit-Accurate RTL Model:
      • PyMTL Model Simulated with the CPython Interpreter
      • Hand-Written Verilog RTL Simulated with Verilator

slide-69
SLIDE 69

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython]

slide-70
SLIDE 70

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

slide-71
SLIDE 71

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

Performance degradation due to Compilation

slide-72
SLIDE 72

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

Short Simulations: Large Compilation Overhead

slide-73
SLIDE 73

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

Long Simulations: Compilation Overhead Amortized
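The amortization effect can be made concrete with a small cost model: total time is a one-time compilation cost plus a per-cycle simulation cost. The sketch below uses made-up numbers (5 seconds of compilation, 1 ms per interpreted cycle, 10 µs per compiled cycle) purely for illustration; none of these values come from the talk, and the function names are hypothetical.

```python
def total_seconds(cycles, compile_seconds, seconds_per_cycle):
    """Total wall-clock time: one-time compilation plus per-cycle simulation cost."""
    return compile_seconds + cycles * seconds_per_cycle


def speedup_over_interpreter(cycles, compile_seconds=5.0,
                             interp_per_cycle=1e-3, compiled_per_cycle=1e-5):
    """Speedup of a compiled simulator over an interpreted one (illustrative numbers)."""
    interpreted = total_seconds(cycles, 0.0, interp_per_cycle)
    compiled = total_seconds(cycles, compile_seconds, compiled_per_cycle)
    return interpreted / compiled


for cycles in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{cycles:>10,} cycles: {speedup_over_interpreter(cycles):6.1f}x")
```

Under these assumed costs the compiled simulator is slower than the interpreter for very short runs and approaches its per-cycle advantage only for long runs, which is the shape of the curves described on the slide.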

slide-74
SLIDE 74

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

Annotated gaps: 300x and 1200x

slide-75
SLIDE 75

Performance-Productivity Gap

Python is growing in popularity in many domains of scientific and high-performance computing. How do they close this gap?

slide-76
SLIDE 76

Performance-Productivity Gap

Python is growing in popularity in many domains of scientific and high-performance computing. How do they close this gap?

  • Python-Wrapped C/C++ Libraries (NumPy, CVXOPT, NLPy, pythonOCC, gem5)
  • Numerical Just-In-Time Compilers (Numba, Parakeet)
  • Just-In-Time Compiled Interpreters (PyPy, Pyston)
  • Selective Embedded Just-In-Time Specialization (SEJITS)

slide-77
SLIDE 77

Performance-Productivity Gap

Python is growing in popularity in many domains of scientific and high-performance computing. How do they close this gap?

  • Python-Wrapped C/C++ Libraries (NumPy, CVXOPT, NLPy, pythonOCC, gem5)
  • Numerical Just-In-Time Compilers (Numba, Parakeet)
  • Just-In-Time Compiled Interpreters (PyPy, Pyston)
  • Selective Embedded Just-In-Time Specialization (SEJITS)

slide-78
SLIDE 78

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, Verilator, C++]

Annotated gaps: 300x and 1200x

slide-79
SLIDE 79

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, PyPy, Verilator, C++]

Annotated gaps: 30x and 240x

slide-80
SLIDE 80

Outline

  • The Computer Architecture Research Methodology Gap
  • The Performance-Productivity Gap
  • PyMTL
  • SimJIT

slide-81
SLIDE 81

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source

slide-82
SLIDE 82

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source

slide-83
SLIDE 83

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source and C Interface Source

slide-84
SLIDE 84

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source and C Interface Source → LLVM/GCC → C Shared Library

slide-85
SLIDE 85

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source and C Interface Source → LLVM/GCC → C Shared Library → Wrapper Gen → PyMTL CFFI Model Instance

slide-86
SLIDE 86

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source and C Interface Source → LLVM/GCC → C Shared Library → Wrapper Gen → PyMTL CFFI Model Instance

Translation Cache

slide-87
SLIDE 87

PyMTL SimJIT Architecture

SimJIT-RTL Tool: PyMTL RTL Model Instance → Translation → Verilog Source → Verilator → RTL C++ Source and C Interface Source → LLVM/GCC → C Shared Library → Wrapper Gen → PyMTL CFFI Model Instance

Translation Cache

Fairly robust, ready for use in research!
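The role of a translation cache can be sketched as follows: if the generated source has not changed, the expensive compile step is skipped and the earlier artifact is reused. This is a minimal illustration under assumptions, not SimJIT's actual mechanism: the slides do not say how the cache is keyed, so the content hash is a choice made for this example, and `TranslationCache` and `fake_compile` are hypothetical names.

```python
import hashlib


class TranslationCache:
    """Hypothetical cache: reuse a compiled artifact when the source is unchanged."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # stands in for the expensive toolchain step
        self.artifacts = {}           # source hash -> compiled artifact
        self.compile_count = 0

    def get(self, verilog_source):
        key = hashlib.sha256(verilog_source.encode("utf-8")).hexdigest()
        if key not in self.artifacts:
            self.artifacts[key] = self.compile_fn(verilog_source)
            self.compile_count += 1
        return self.artifacts[key]


def fake_compile(source):
    """Placeholder for the real toolchain; returns a label instead of a library."""
    return f"lib_{len(source)}.so"


cache = TranslationCache(fake_compile)
src = "module Reg(input clk, input [7:0] d, output reg [7:0] q); endmodule"
first = cache.get(src)
second = cache.get(src)        # cache hit: no recompilation
third = cache.get(src + " ")   # source changed: compiled again
print(cache.compile_count)
```

Caching of this kind targets the compilation overhead that dominates short simulations, since repeated runs of an unchanged model no longer pay for translation and compilation.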

slide-88
SLIDE 88

PyMTL SimJIT Architecture

SimJIT-CL Tool: PyMTL CL Model Instance → Translation → CL C++ Source and C Interface Source → LLVM/GCC → C Shared Library → Wrapper Gen → PyMTL CFFI Model Instance

Just a prototype!

slide-89
SLIDE 89

Performance-Productivity Gap

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, PyPy, Verilator, C++]

Annotated gaps: 30x and 240x

slide-90
SLIDE 90

PyMTL SimJIT Performance

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, PyPy, SimJIT-CL, SimJIT-RTL, Verilator, C++]

Annotated gaps: 10x and 20x

slide-91
SLIDE 91

PyMTL SimJIT Performance

[Plot: two panels, RTL Network and CL Network; x-axis: Simulated Cycles (1K to 10M); y-axis ticks from 1x to 1000x; series: CPython, PyPy, SimJIT-CL, SimJIT-RTL, SimJIT-CL+PyPy, SimJIT-RTL+PyPy, Verilator, C++]

Annotated gaps: 4.5x and 6x

slide-92
SLIDE 92

PyMTL SimJIT Performance

Opportunities to further reduce the performance gap:

  • Reduce overhead of Python-to-C++ interfaces
  • Optimized (non-Python) event queue
  • Better code generation
  • Better event queue scheduling
  • Removal of unnecessary double-buffering
  • Parallel simulation

slide-93
SLIDE 93

Contributions

PyMTL is a productive Python framework for FL, CL, and RTL modeling, enabling:

  • Vertically Integrated Computer Architecture Research
  • Accelerator Design Space Exploration
  • Construction of Flexible RTL Chip Generators

SimJIT considerably closes the performance-productivity gap between Python and C++ simulations.

  • 72x Speedup over CPython for SimJIT-CL (within 4.5x of C++)
  • 200x Speedup over CPython for SimJIT-RTL (within 6x of Verilator)
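As a quick consistency check on these numbers, the speedup over CPython multiplied by the remaining gap to the native baseline gives the implied original gap between CPython and that baseline. The helper name `implied_gap` is hypothetical, and the calculation assumes both figures were measured on the same run length.

```python
def implied_gap(speedup_over_cpython, remaining_gap):
    """If SimJIT is S times faster than CPython yet still G times slower than the
    native baseline, CPython is roughly S * G times slower than that baseline."""
    return speedup_over_cpython * remaining_gap


cl_gap = implied_gap(72, 4.5)   # SimJIT-CL versus hand-written C++
rtl_gap = implied_gap(200, 6)   # SimJIT-RTL versus Verilator
print(cl_gap, rtl_gap)
```

The results, roughly 324x and 1200x, are in line with the 300x and 1200x annotations on the earlier CPython performance plots.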

slide-94
SLIDE 94

Conclusion

PyMTL is a productive, open-source Python framework for FL/CL/RTL modeling and hardware design.

https://github.com/cornell-brg/pymtl

Thank you to our sponsors for their support: NSF, DARPA, and donations from Intel Corporation and Synopsys, Inc.