SLIDE 1 Andrés Amaya García
Reinventing a parallel machine from the past
aa1399@bristol.ac.uk
SLIDE 2 About me…
▸ Andrés Amaya García
▸ Graduated from the University of Bristol in 2015
▸ MEng Computer Science
SLIDE 3
“Once upon a time, nine (rainy) months ago…”
SLIDE 4
The Transputer
SLIDE 5 Transistor Transputer System
SLIDE 6
Transputer applications
SLIDE 7 Transputer applications
Amstrad TV set-top box contains ST20
SLIDE 8 Transputer applications
Amstrad TV set-top box contains ST20
HETE-2 contains T805 Transputers
SLIDE 9
Project objectives
SLIDE 10 Project objectives
▸ Open-source.
SLIDE 11 Project objectives
▸ Open-source.
▸ Supports Transputer instruction set.
SLIDE 12 Project objectives
▸ Open-source.
▸ Supports Transputer instruction set.
▸ Different micro-architecture.
SLIDE 13 The Internet of Things (IoT) is all about connectivity!
SLIDE 14 Project objectives
▸ Open-source.
▸ Supports Transputer instruction set.
▸ Different micro-architecture.
SLIDE 15 Project objectives
▸ Open-source.
▸ Supports Transputer instruction set.
▸ Different micro-architecture.
▸ New external communication mechanism and I/O interface.
SLIDE 16
How does the Transputer work?
SLIDE 17 Occam
▸ Occam is a high-level programming language developed at Inmos hand-in-hand with the Transputer.
▸ Explicit concurrency and interprocess communication.
SLIDE 18 Occam
▸ Occam is a high-level programming language developed at Inmos hand-in-hand with the Transputer.
▸ Explicit concurrency and interprocess communication.
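To make "explicit concurrency and interprocess communication" concrete, here is a minimal sketch in Python rather than Occam. It assumes threads as stand-ins for Occam processes and a hypothetical `Channel` class whose `send` and `receive` play the roles of Occam's `!` and `?`. The names and the end-of-stream marker are conventions of this sketch only.

```python
import queue
import threading


class Channel:
    """Unbuffered one-to-one channel: send() returns only after receive() took the value."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)

    def send(self, value):
        # Roughly "chan ! value": hand the value over and wait for the receiver.
        self._slot.put(value)
        self._slot.join()

    def receive(self):
        # Roughly "chan ? variable": take the value and release the sender.
        value = self._slot.get()
        self._slot.task_done()
        return value


def producer(out):
    for i in range(3):
        out.send(i * i)
    out.send(None)  # end-of-stream marker (a convention of this sketch)


def consumer(inp, results):
    while (item := inp.receive()) is not None:
        results.append(item)


def run_par():
    """Run producer and consumer side by side, loosely like an Occam PAR block."""
    chan = Channel()
    results = []
    procs = [
        threading.Thread(target=producer, args=(chan,)),
        threading.Thread(target=consumer, args=(chan, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results
```

Calling `run_par()` collects the squares sent by the producer. The two threads share no variables and interact only through the channel, which is the discipline Occam enforces.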
SLIDE 19
Microcomputer Division Confidential    Author: Roger Shepherd
WHILE active
  VAL INT interruptable IS GotoSNPBit \/ (IOBit \/ (MoveBit \/ (TimeInsBit \/ TimeDelBit))) :
  SEQ
    -- completed indicates if current instruction has terminated
    completed := (StatusReg /\ interruptable) = 0
    validProcess := Wptr <> NotProcess.p
    PRI ALT
      (StatusReg /\ GotoSNPBit) <> 0 & SKIP
        StartNextProcess ()
      (Priority = 0) AND (NOT (TNextReg[0] AFTER ClockReg[0])) AND completed & SKIP
        HandleTimerRequest (0)
      ALT hc = 0 FOR LinkChans
        (Priority = 0) AND completed & FromChan[hc][0] ? token
          HandleChannelRequest (token, hc)
      (Priority = 1) AND (NOT (TNextReg[0] AFTER ClockReg[0])) & SKIP
        HandleTimerRequest (0)
      ALT hc = 0 FOR LinkChans
        (Priority = 1) & FromChan[hc][0] ? token
          HandleChannelRequest (token, hc)
      (Priority = 1) AND (NOT (TNextReg[1] AFTER ClockReg[1])) AND completed & SKIP
        HandleTimerRequest (1)
      ALT hc = 0 FOR LinkChans
        (Priority = 1) AND completed & FromChan[hc][1] ? token
          HandleChannelRequest (token, hc)
      validProcess & SKIP
        IF
          (StatusReg /\ TimeDelBit) <> 0
            DeleteMiddleStep (Breg, Creg)
          (StatusReg /\ TimeInsBit) <> 0
            InsertMiddleStep (Areg, Breg, Creg)
          (StatusReg /\ MoveBit) <> 0
            BlockMoveMiddleStep (Creg, Breg, Areg)
          TRUE
            SEQ
              BuildNextInstruction (IptrReg, Oreg, code)
              IF
                code <> f.opr
                  Primary (code)
                code = f.opr
                  Secondary (Oreg)
              Oreg := 0
Prioritised scheduling
The execution of a low priority process can be interrupted when a high priority process becomes runnable as defined above. In particular certain instructions are interruptable: move message, input message, output message.
Restricted Document, September 27, 1988
The Devil is in the detail
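The excerpt above relies on PRI ALT, where the first alternative in textual order whose guard holds is the one taken. The sketch below illustrates only that selection rule, in Python. The `state` keys and the four simplified guards are assumptions of this sketch and flatten the per-priority cases of the original listing.

```python
def pri_alt(alternatives):
    """Run the action of the first alternative whose guard is true.

    Textual order is priority order. Occam would wait if no guard were
    ready; this sketch simply returns None in that case.
    """
    for guard, action in alternatives:
        if guard():
            return action()
    return None


def step(state):
    """One pass of a much-simplified scheduler loop (hypothetical state keys)."""
    return pri_alt([
        (lambda: state["goto_snp"],
         lambda: "StartNextProcess"),
        (lambda: state["timer_due"] and state["completed"],
         lambda: "HandleTimerRequest"),
        (lambda: state["channel_ready"] and state["completed"],
         lambda: "HandleChannelRequest"),
        (lambda: state["valid_process"],
         lambda: "execute instruction"),
    ])


idle = {"goto_snp": False, "timer_due": False, "channel_ready": False,
        "completed": True, "valid_process": True}
timer = dict(idle, timer_due=True)
mid_instruction = dict(timer, completed=False)
```

With these states, `step(timer)` services the timer before running the next instruction, while `step(mid_instruction)` keeps executing because the current instruction has not yet completed.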
SLIDE 20
The Devil is in the detail
SLIDE 21
[Diagram. Scheduling registers: Front, Back. Running process registers: A, B, C, Workspace pointer, Instruction pointer, Operand. Workspaces and instructions of Process X (queued process), Process Y (queued process) and Process Z (running process).]
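The picture on this slide can be read as a linked list of workspaces with a front and a back pointer. The following Python sketch shows that reading under stated assumptions: `Workspace`, `Scheduler` and their fields are hypothetical names, and a single queue is modelled although the Transputer keeps one per priority level.

```python
NOT_PROCESS = None  # stands in for the Transputer's NotProcess.p value


class Workspace:
    """A process workspace; while queued it holds the saved Iptr and a link."""

    def __init__(self, name):
        self.name = name
        self.saved_iptr = None
        self.link = NOT_PROCESS


class Scheduler:
    def __init__(self):
        self.front = NOT_PROCESS  # first queued process
        self.back = NOT_PROCESS   # last queued process
        self.wptr = NOT_PROCESS   # workspace of the running process
        self.iptr = None          # instruction pointer of the running process

    def enqueue(self, ws, iptr):
        """Append a process to the back of the queue."""
        ws.saved_iptr = iptr
        ws.link = NOT_PROCESS
        if self.front is NOT_PROCESS:
            self.front = ws
        else:
            self.back.link = ws
        self.back = ws

    def start_next_process(self):
        """Make the process at the front of the queue the running process."""
        ws = self.front
        if ws is NOT_PROCESS:
            self.wptr, self.iptr = NOT_PROCESS, None
            return
        self.front = ws.link
        if self.front is NOT_PROCESS:
            self.back = NOT_PROCESS
        self.wptr, self.iptr = ws, ws.saved_iptr

    def deschedule(self):
        """Move the running process to the back, then run the next one."""
        self.enqueue(self.wptr, self.iptr)
        self.start_next_process()

    def queued(self):
        """Names of the queued processes, front to back."""
        names, ws = [], self.front
        while ws is not NOT_PROCESS:
            names.append(ws.name)
            ws = ws.link
        return names


def demo():
    s = Scheduler()
    x, y, z = Workspace("X"), Workspace("Y"), Workspace("Z")
    s.enqueue(x, 0x100)
    s.enqueue(y, 0x200)
    s.wptr, s.iptr = z, 0x300  # Z is the running process, as on the slide
    before = s.queued()
    s.deschedule()
    return before, s.wptr.name, s.iptr, s.queued()
```

In `demo()`, X and Y start out queued while Z runs. Once Z is descheduled it joins the back of the queue and X becomes the running process.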
SLIDE 22 Interprocess communication
▸ Special Transputer instructions implement Occam primitives efficiently.
▸ Communication is performed either through channels in memory or over physical links.
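A channel in memory can be pictured as a single word that is either empty or holds the identity of the process that arrived first. The Python sketch below follows that idea under simplifying assumptions: `Process`, `MemoryChannel` and `channel_op` are hypothetical names, and each message is a single value rather than a block of bytes.

```python
NOT_PROCESS = None  # value of an empty channel word in this sketch


class Process:
    def __init__(self, name, buffer=None):
        self.name = name
        self.buffer = buffer   # value to send, or slot for the value received
        self.waiting = False


class MemoryChannel:
    def __init__(self):
        self.word = NOT_PROCESS  # empty: nobody is waiting on the channel


def channel_op(chan, proc, run_queue, is_output):
    """Perform an input or output on chan.

    Returns True if the transfer completed, False if proc had to wait.
    """
    if chan.word is NOT_PROCESS:
        # First to arrive: leave our identity in the channel word and wait.
        chan.word = proc
        proc.waiting = True
        return False
    # Second to arrive: the partner is already waiting, so copy the data.
    partner = chan.word
    if is_output:
        partner.buffer = proc.buffer
    else:
        proc.buffer = partner.buffer
    chan.word = NOT_PROCESS       # channel is empty again
    partner.waiting = False
    run_queue.append(partner)     # the waiting partner becomes runnable
    return True


def demo():
    chan, run_queue = MemoryChannel(), []
    sender, receiver = Process("sender", buffer=42), Process("receiver")
    first = channel_op(chan, sender, run_queue, is_output=True)
    second = channel_op(chan, receiver, run_queue, is_output=False)
    return first, second, receiver.buffer, [p.name for p in run_queue]
```

Whichever side arrives first waits, and the second arrival completes the transfer and makes its partner runnable again. No third party takes part in the exchange.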
SLIDE 23
But I want to hear about the OpenTransputer!
SLIDE 24 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 25 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 26 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 27 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 28 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 29 CPU
Control Unit; Fetch Unit; Datapath (Register file, AU, LU, Memory addressing, etc.); Memory
SLIDE 30 CPU
Control Unit; Fetch Unit; Datapath (Register file, AU, LU, Memory addressing, etc.); Memory
SLIDE 31 CPU
Control Unit; Fetch Unit; Datapath (Register file, AU, LU, Memory addressing, etc.); Memory
SLIDE 32 CPU
Control Unit; Fetch Unit; Datapath (Register file, AU, LU, Memory addressing, etc.); Memory
SLIDE 33 Control unit
Inmos Transputer
▸ Microcoded control unit.
▸ Control signals stored in Read-Only Memory (ROM).
▸ Area savings and potentially less complex.
OpenTransputer
▸ Designed with a microcode strategy.
▸ Control signals generated by hardwired logic.
▸ Potentially faster than ROM.
SLIDE 34 Control unit
Microinstructions (human-readable)
SLIDE 35 Control unit
Microinstructions (human-readable) → run tools → Verilog HDL (not so human-readable)
SLIDE 36 Control unit
Microinstructions (human-readable) → run tools → Verilog HDL (not so human-readable) → integrate → Control Unit
SLIDE 37 OpenTransputer microinstructions
CJ0  ROMp0(0)  CmpconstfromA  Condall(CJ1,CJ2);
CJ1  AfromB  BfromC  OfromClear  Gotoplus1;
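The two microinstructions above appear to belong to the Transputer's conditional jump (cj). One reading is that CJ0 compares A against the constant 0 and branches, and CJ1 is the path that pops the stack and clears the operand register. The Python sketch below models the architectural behaviour of cj only. The `Regs` class is a hypothetical container, the taken path (CJ2 is not shown on the slide) follows the published instruction semantics, and the mapping to CJ0/CJ1 is an interpretation rather than something the slide states.

```python
from dataclasses import dataclass


@dataclass
class Regs:
    a: int      # top of the evaluation stack
    b: int
    c: int
    o: int      # operand register
    iptr: int   # assumed to already point at the next instruction


def conditional_jump(r: Regs) -> Regs:
    """Architectural effect of cj on the register state."""
    if r.a == 0:
        # Jump taken: the stack is left unchanged and the operand is added
        # to the address of the next instruction.
        r.iptr += r.o
    else:
        # Jump not taken: pop the stack (A from B, B from C), as CJ1 does.
        r.a, r.b = r.b, r.c
    r.o = 0  # the operand register is cleared once the instruction is done
    return r


taken = conditional_jump(Regs(a=0, b=5, c=7, o=3, iptr=100))
not_taken = conditional_jump(Regs(a=1, b=5, c=7, o=3, iptr=100))
```

Here `taken` ends with `iptr` advanced by the operand and the stack intact, while `not_taken` keeps `iptr` and has B and C shifted up into A and B.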
SLIDE 38 CPU
Control Unit; Fetch Unit; Datapath (Register file, AU, LU, Memory addressing, etc.); Memory
SLIDE 39
Inmos Transputer datapath
SLIDE 40
OpenTransputer datapath
SLIDE 41 OpenTransputer datapath
‘Wider’ datapath
SLIDE 42 OpenTransputer datapath
More parallelism
SLIDE 43 OpenTransputer datapath
Shadow registers
SLIDE 44 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 45 External communication
Inmos Transputer: [network diagram with nine Transputer nodes]
OpenTransputer: [network diagram with four OpenTransputer nodes]
SLIDE 46 Autonomous link controllers
▸ There are 16 input ports and a single output port.
▸ Extended the original Transputer controllers to support virtual channels allowing an arbitrary number of processes to be queued to perform
SLIDE 47 Message routing
[Diagram labels: Route towards edge (RTE); Route towards core (RTC); OpenTransputer (sender); OpenTransputer (receiver); Turning point. Nodes are marked Layer: 3, Layer: 2, Layer: 1, Layer: 0 and Layer: 0, each with RTE: 10 and RTC: 00.]
[Initial address, field values with their bit-position labels as extracted: 1 (1, 31); 0000 (4); 10 (5, 15); 00 (16, 26); 0011 (27, 30)]
SLIDE 48 Message routing
[Routing diagram and initial address as on slide 47; annotated field: Input port]
SLIDE 49 Message routing
[Routing diagram and initial address as on slide 47; annotated fields: RTE and RTC]
SLIDE 50 Message routing
[Routing diagram and initial address as on slide 47; annotated field: Depth]
SLIDE 51 Message routing
[Routing diagram and initial address as on slide 47]
SLIDE 52 Processor components
OpenTransputer CPU 16 input link controllers Output link controllers 15 I/O pin handlers Memory
SLIDE 53 OpenTransputer I/O interface
▸ I/O pins expose the same interface as communication links.
▸ Introduced an instruction (confio) that can be used to configure the I/O pins.
SLIDE 54
So… what about performance?
SLIDE 55 Synthesis
Synthesised design for both the ZedBoard XC7Z020-CLG484 FPGA and a 180 nm manufacturing process.
SLIDE 56 Comparing synthesis results
                           Inmos Transputer    OpenTransputer
Area                       64 mm²              3.69 mm²
Manufacturing technology   1000 nm             180 nm
SLIDE 57 Comparing synthesis results
                           Inmos Transputer    OpenTransputer
Area                       64 mm²              3.69 mm²
Manufacturing technology   1000 nm             180 nm
5.6
SLIDE 58 Comparing synthesis results
                           Inmos Transputer    OpenTransputer
Area                       64 mm²              3.69 mm²
Manufacturing technology   1000 nm             180 nm
5.6 4.22
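The area and technology figures can be related with a little arithmetic. The Python sketch below computes the feature-size ratio, the raw area ratio, and the area the Inmos design would occupy at 180 nm under the idealised assumption that area scales with the square of the feature size. That scaling rule is an assumption of this sketch and not a claim from the slides. The feature-size ratio of about 5.56 may be where the 5.6 annotation comes from. No attempt is made to interpret the 4.22 figure.

```python
# Figures as given on the slide.
inmos = {"area_mm2": 64.0, "feature_nm": 1000.0}
opentransputer = {"area_mm2": 3.69, "feature_nm": 180.0}

# Linear shrink between the two manufacturing technologies.
feature_ratio = inmos["feature_nm"] / opentransputer["feature_nm"]

# Raw ratio of the two reported areas, ignoring the technology difference.
area_ratio = inmos["area_mm2"] / opentransputer["area_mm2"]

# Idealised assumption: area scales with the square of the feature size.
inmos_area_at_180nm = inmos["area_mm2"] / feature_ratio ** 2

print(f"feature-size ratio:      {feature_ratio:.2f}")
print(f"raw area ratio:          {area_ratio:.2f}")
print(f"Inmos area at 180 nm:    {inmos_area_at_180nm:.2f} mm^2 (idealised)")
```

Under that idealised scaling the original design would shrink to roughly 2 mm² at 180 nm. Real layouts do not scale this cleanly, so the figure is only a rough reference point for the 3.69 mm² result.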
SLIDE 59 Comparing cycle counts
Instruction   Inmos Transputer   OpenTransputer
ldl           2                  1
startp        12                 3-5
endp          13                 3
move          8 + 2w*            6 + 5w*
* w is the number of words to copy.
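The two `move` formulas depend on the number of words copied, so it helps to evaluate them for a few sizes. The Python sketch below does only that. The function names are hypothetical, and the formulas are taken from the table as read from the slide (8 + 2w for the Inmos Transputer, 6 + 5w for the OpenTransputer).

```python
def inmos_move_cycles(w: int) -> int:
    """Cycle count for move on the Inmos Transputer, per the table: 8 + 2w."""
    return 8 + 2 * w


def opentransputer_move_cycles(w: int) -> int:
    """Cycle count for move on the OpenTransputer, per the table: 6 + 5w."""
    return 6 + 5 * w


def compare(word_counts):
    """Return (w, inmos_cycles, opentransputer_cycles) for each word count."""
    return [(w, inmos_move_cycles(w), opentransputer_move_cycles(w))
            for w in word_counts]


for w, inmos_cycles, ot_cycles in compare([0, 1, 4, 16]):
    print(f"w={w:2d}  Inmos={inmos_cycles:3d}  OpenTransputer={ot_cycles:3d}")
```

Taken at face value, the OpenTransputer formula has the lower fixed cost but the higher cost per word, so the Inmos count is lower for any copy of one word or more.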
SLIDE 60 Conclusion
✓ OpenTransputer based on the Transputer architecture.
✓ Different micro-architecture to take advantage of modern manufacturing technologies.
✓ New external communication mechanism.
✓ New I/O interface.
Andrés Amaya García aa1399@my.bristol.ac.uk
SLIDE 61 Conclusion
✓ OpenTransputer based on the Transputer architecture.
✓ Different micro-architecture to take advantage of modern manufacturing technologies.
✓ New external communication mechanism.
✓ New I/O interface.
Thank you!
Andrés Amaya García aa1399@my.bristol.ac.uk