

slide-1
SLIDE 1

Andrés Amaya García

Reinventing a parallel machine from the past

aa1399@bristol.ac.uk

slide-2
SLIDE 2

About me…

▸ Andrés Amaya García
▸ Graduated from the University of Bristol in 2015
▸ MEng Computer Science

slide-3
SLIDE 3

Once upon a time, nine (rainy) months ago…

slide-4
SLIDE 4

The Transputer

slide-5
SLIDE 5

Transistor Transputer System

slide-6
SLIDE 6

Transputer applications

▸ Amstrad TV set-top box contains ST20
▸ HETE-2 contains T805 Transputers

slide-9
SLIDE 9

Project objectives

▸ Open-source.
▸ Supports Transputer instruction set.
▸ Different micro-architecture.

slide-13
SLIDE 13

The Internet of Things (IoT) is all about connectivity!

slide-14
SLIDE 14

Project objectives

▸ Open-source.
▸ Supports Transputer instruction set.
▸ Different micro-architecture.
▸ New external communication mechanism and I/O interface.

slide-16
SLIDE 16

How does the Transputer work?

slide-17
SLIDE 17

Occam

▸ Occam is a high-level programming language developed at Inmos hand-in-hand with the Transputer.
▸ Explicit concurrency and interprocess communication.

slide-19
SLIDE 19

[Scanned page from an Inmos document. Microcomputer Division Confidential, Restricted Document, September 27, 1988. Author: Roger Shepherd.]

WHILE active
  VAL INT interruptable IS GotoSNPBit \/ (IOBit \/ (MoveBit \/ (TimeInsBit \/ TimeDelBit))) :
  SEQ
    -- completed indicates if current instruction has terminated
    completed := (StatusReg /\ interruptable) = 0
    validProcess := Wptr <> NotProcess.p
    PRI ALT
      (StatusReg /\ GotoSNPBit) <> 0 & SKIP
        StartNextProcess ()
      (Priority = 0) AND (NOT (TNextReg[0] AFTER ClockReg[0])) AND completed & SKIP
        HandleTimerRequest (0)
      ALT hc = 0 FOR LinkChans
        (Priority = 0) AND completed & FromChan[hc][0] ? token
          HandleChannelRequest (token, hc)
      (Priority = 1) AND (NOT (TNextReg[0] AFTER ClockReg[0])) & SKIP
        HandleTimerRequest (0)
      ALT hc = 0 FOR LinkChans
        (Priority = 1) & FromChan[hc][0] ? token
          HandleChannelRequest (token, hc)
      (Priority = 1) AND (NOT (TNextReg[1] AFTER ClockReg[1])) AND completed & SKIP
        HandleTimerRequest (1)
      ALT hc = 0 FOR LinkChans
        (Priority = 1) AND completed & FromChan[hc][1] ? token
          HandleChannelRequest (token, hc)
      validProcess & SKIP
        IF
          (StatusReg /\ TimeDelBit) <> 0
            DeleteMiddleStep (Breg, Creg)
          (StatusReg /\ TimeInsBit) <> 0
            InsertMiddleStep (Areg, Breg, Creg)
          (StatusReg /\ MoveBit) <> 0
            BlockMoveMiddleStep (Creg, Breg, Areg)
          TRUE
            SEQ
              BuildNextInstruction (IptrReg, Oreg, code)
              IF
                code <> f.opr
                  Primary (code)
                code = f.opr
                  Secondary (Oreg)
              Oreg := 0

Prioritised scheduling

The execution of a low priority process can be interrupted when a high priority process becomes runnable as defined above. In particular certain instructions are interruptable:

  move message
  input message
  output message

The Devil is in the detail

slide-21
SLIDE 21

[Diagram: Transputer registers and process workspaces.
Scheduling registers: Front, Back.
Running process registers: A, B, C, Workspace pointer, Instruction pointer, Operand.
Workspaces and instructions: Process X (queued process), Process Y (queued process), Process Z (running process).]
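A minimal Python sketch of the state named in the diagram: the three-register evaluation stack (A, B, C) and the Front/Back scheduling queue. The class and method names are illustrative assumptions, not taken from the OpenTransputer sources, and the queue is modelled with a Python deque rather than a linked list threaded through process workspaces.

```python
from collections import deque


class EvalStack:
    """Three-register evaluation stack: A is the top, then B, then C."""

    def __init__(self):
        self.a = self.b = self.c = 0

    def push(self, value):
        # Loading a value pushes the stack: C <- B, B <- A, A <- value.
        self.c, self.b, self.a = self.b, self.a, value

    def pop(self):
        # Popping returns A and shifts up: A <- B, B <- C.
        # C is left unchanged in this sketch.
        value = self.a
        self.a, self.b = self.b, self.c
        return value


class RunQueue:
    """Queue of runnable processes: taken from the front, added at the back."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, workspace_pointer):
        self._queue.append(workspace_pointer)

    def start_next(self):
        # Returns None when nothing is runnable (a stand-in for NotProcess.p).
        return self._queue.popleft() if self._queue else None


stack = EvalStack()
for v in (1, 2, 3):
    stack.push(v)

queue = RunQueue()
queue.enqueue("X")
queue.enqueue("Y")
```

The pop behaviour mirrors the `AfromB BfromC` micro-operations that appear later in the deck; everything else is a simplification.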

slide-22
SLIDE 22

Interprocess communication

▸ Special Transputer instructions implement Occam primitives efficiently.
▸ Communication is performed either through channels in memory or over physical links.
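A simplified Python model of synchronised communication over a channel in memory, assuming the channel is a single word that is either empty or records whichever process arrived first. The function names, the dictionary-based process records and the list used as a run queue are hypothetical; this is a sketch of the idea, not the instruction-level behaviour.

```python
NOT_PROCESS = None  # stand-in for the 'NotProcess.p' value in the Inmos listing


class Channel:
    """One-word channel: empty, or holding the process that arrived first."""

    def __init__(self):
        self.word = NOT_PROCESS


def output(channel, process, value, run_queue):
    """Sender side. Returns True if the sender may continue, False if it waits."""
    if channel.word is NOT_PROCESS:
        # First to arrive: leave the value with our record and wait.
        process["pending"] = value
        channel.word = process
        return False
    # The receiver is already waiting: hand over the value and reschedule it.
    waiter = channel.word
    channel.word = NOT_PROCESS
    waiter["received"] = value
    run_queue.append(waiter["name"])
    return True


def input_(channel, process, run_queue):
    """Receiver side. Returns True if the receiver may continue, False if it waits."""
    if channel.word is NOT_PROCESS:
        channel.word = process
        return False
    # The sender is already waiting: take its value and reschedule it.
    waiter = channel.word
    channel.word = NOT_PROCESS
    process["received"] = waiter["pending"]
    run_queue.append(waiter["name"])
    return True


run_queue = []
chan = Channel()
sender = {"name": "sender"}
receiver = {"name": "receiver"}

first = output(chan, sender, 42, run_queue)   # sender arrives first and waits
second = input_(chan, receiver, run_queue)    # receiver completes the transfer
```

Whichever side arrives second does the work, so neither process proceeds until both have reached the communication.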

slide-23
SLIDE 23

But I want to hear about the OpenTransputer!

slide-24
SLIDE 24

Processor components

[Block diagram of the OpenTransputer: CPU, 16 input link controllers, output link controllers, 15 I/O pin handlers, memory.]

slide-29
SLIDE 29

CPU

[Block diagram of the CPU: Control Unit, Fetch Unit, Datapath (register file, AU, LU, memory addressing, etc.), and Memory.]

slide-33
SLIDE 33

Control unit

Inmos Transputer
▸ Microcoded control unit.
▸ Control signals stored in Read-Only Memory (ROM).
▸ Area savings and potentially less complex.

OpenTransputer
▸ Designed with a microcode strategy.
▸ Control signals generated by hardwired logic.
▸ Potentially faster than ROM.

slide-34
SLIDE 34

Control unit

Microinstructions (human-readable) → Run tools → Verilog HDL (not so human-readable) → Integrate → Control Unit

slide-37
SLIDE 37

OpenTransputer microinstructions

CJ0  ROMp0(0)  CmpconstfromA  Condall(CJ1,CJ2);
CJ1  AfromB    BfromC         OfromClear        Gotoplus1;
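The slides do not show the tools that turn microinstructions into Verilog, so the following is only a sketch of a plausible first step: parsing the human-readable form into a label plus a list of micro-operations. The grammar (a label, whitespace-separated operations with optional parenthesised arguments, a terminating semicolon) is inferred from the two lines shown, and `parse_microinstruction` is a hypothetical name.

```python
import re

MICROCODE = """
CJ0 ROMp0(0) CmpconstfromA Condall(CJ1,CJ2);
CJ1 AfromB BfromC OfromClear Gotoplus1;
"""

# A micro-operation is a name with an optional parenthesised argument list.
OP_PATTERN = re.compile(r"(\w+)(?:\(([^)]*)\))?")


def parse_microinstruction(line):
    """Split 'LABEL op op(args) ...;' into (label, [(name, [args]), ...])."""
    label, _, rest = line.strip().rstrip(";").partition(" ")
    ops = []
    for name, args in OP_PATTERN.findall(rest):
        arg_list = [a.strip() for a in args.split(",")] if args else []
        ops.append((name, arg_list))
    return label, ops


# Map each label to its list of micro-operations.
program = dict(
    parse_microinstruction(line)
    for line in MICROCODE.splitlines()
    if line.strip()
)
```

A real generator would then map each operation name to the control signals it asserts; that mapping is not given in the deck and is left out here.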

slide-38
SLIDE 38

CPU

[Block diagram of the CPU: Control Unit, Fetch Unit, Datapath (register file, AU, LU, memory addressing, etc.), and Memory.]

slide-39
SLIDE 39

Inmos Transputer datapath

slide-40
SLIDE 40

OpenTransputer datapath

▸ ‘Wider’ datapath
▸ More parallelism
▸ Shadow registers

slide-44
SLIDE 44

Processor components

[Block diagram of the OpenTransputer: CPU, 16 input link controllers, output link controllers, 15 I/O pin handlers, memory.]

slide-45
SLIDE 45

External communication

[Diagrams: an Inmos Transputer network of nine Transputers, and an OpenTransputer network of four OpenTransputers.]

slide-46
SLIDE 46

Autonomous link controllers

▸ There are 16 input ports and a single output port.
▸ Extended the original Transputer controllers to support virtual channels allowing an arbitrary number of processes to be queued to perform output operations.

slide-47
SLIDE 47

Message routing

[Diagram: a route from the sending OpenTransputer to the receiving OpenTransputer through a turning point, with segments labelled route towards edge (RTE) and route towards core (RTC). Nodes along the route are annotated Layer 3, 2, 1, 0 and 0; each is annotated RTE: 10, RTC: 00.]

Initial address

Bits     Value   Field
31       1
27-30    0011    Depth
16-26    00      RTC
5-15     10      RTE
1-4      0000    Input port
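To make the address layout concrete, here is a hedged Python sketch that packs and unpacks a 32-bit address using the bit ranges shown on the slide. The pairing of the labels (input port, RTE, RTC, depth) with particular bit ranges is an assumption read off the figure, the name `flag` for bit 31 is invented because that bit is unlabelled, and the routing decision itself is not modelled.

```python
# Field layout as read from the slide: (name, lowest bit, highest bit).
# The pairing of labels with bit ranges is an assumption.
FIELDS = [
    ("input_port", 1, 4),
    ("rte", 5, 15),
    ("rtc", 16, 26),
    ("depth", 27, 30),
    ("flag", 31, 31),  # unlabelled on the slide; shown with value 1
]


def pack(**values):
    """Build a 32-bit address from field values (missing fields are 0)."""
    address = 0
    for name, low, high in FIELDS:
        width = high - low + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        address |= value << low
    return address


def unpack(address):
    """Split a 32-bit address back into its fields."""
    return {
        name: (address >> low) & ((1 << (high - low + 1)) - 1)
        for name, low, high in FIELDS
    }


# The initial address from the slide: 1, depth 0011, RTC 00, RTE 10, port 0000.
initial = pack(flag=1, depth=0b0011, rtc=0b00, rte=0b10, input_port=0b0000)
fields = unpack(initial)
```

Under this reading the depth field of the initial address is 3, which matches the Layer 3 annotation in the diagram.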

slide-52
SLIDE 52

Processor components

[Block diagram of the OpenTransputer: CPU, 16 input link controllers, output link controllers, 15 I/O pin handlers, memory.]

slide-53
SLIDE 53

OpenTransputer I/O interface

▸ I/O pins expose the same interface as communication links.
▸ Introduced an instruction (confio) that can be used to configure the I/O pins.

slide-54
SLIDE 54

So… what about performance?

slide-55
SLIDE 55

Synthesis

The design was synthesised for both the ZedBoard XC7Z020-CLG484 FPGA and a 180 nm manufacturing process.

slide-56
SLIDE 56

Comparing synthesis results

                           Inmos Transputer   OpenTransputer
Area                       64 mm²             3.69 mm²
Manufacturing technology   1000 nm            180 nm

5.6 4.22

slide-59
SLIDE 59

Comparing cycle counts

Instruction   Inmos Transputer   OpenTransputer
ldl           2                  1
startp        12                 3-5
endp          13                 3
move          8 + 2w*            6 + 5w*

* w is the number of words to copy.
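The two `move` formulas can be compared directly. The short Python sketch below evaluates both for a few block sizes; the function names are illustrative, and the formulas are taken as they appear in the cycle-count table.

```python
def inmos_move_cycles(words):
    """Cycle count for 'move' on the Inmos Transputer: 8 + 2w."""
    return 8 + 2 * words


def opentransputer_move_cycles(words):
    """Cycle count for 'move' on the OpenTransputer: 6 + 5w."""
    return 6 + 5 * words


# Tabulate both counts for a few values of w (words to copy).
table = [
    (w, inmos_move_cycles(w), opentransputer_move_cycles(w))
    for w in (0, 1, 4, 16)
]
for row in table:
    print(*row)
```

With these formulas the OpenTransputer count is lower only for w = 0; from one word upwards the Inmos figure is smaller, whereas ldl, startp and endp all take fewer cycles on the OpenTransputer.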

slide-60
SLIDE 60

ü OpenTransputer based on the Transputer architecture.

  • ü Different micro-architecture to take advantage of

modern manufacturing technologies

  • ü New external communication mechanism.
  • ü New I/O interface.

Conclusion Conclusion

Andrés Amaya García aa1399@my.bristol.ac.uk

slide-61
SLIDE 61

ü OpenTransputer based on the Transputer architecture.

  • ü Different micro-architecture to take advantage of

modern manufacturing technologies

  • ü New external communication mechanism.
  • ü New I/O interface.

Conclusion Conclusion Thank you! Thank you!

Andrés Amaya García aa1399@my.bristol.ac.uk