
Hardware/Software Instruction Set Configurability for System-on-Chip Processors



  1. Hardware/Software Instruction Set Configurability for System-on-Chip Processors
     38th DAC, Las Vegas, June 18-22, 2001
     Albert Wang, Chris Rowen, Dror Maydan, Earl Killia

  2. Landscape of reconfigurable computing
     [Figure: design options plotted by optimality/integration (e.g. mW, $) against flexibility/modularity (e.g. time-to-market). Points: ASIC, instruction-set configurable processor, FPGA + processor, FPGA, general processor. Two gaps are marked ∆ ~10x.]

  3. Computing using temporal connection
     [Figure: processor solution. Labels: Registers, Memory (Program), Control, Datapath. A Correct/Efficient scorecard gives the processor one X.]

  4. Computing using spatial connection
     [Figure: processor solution beside ASIC solution. Labels: Registers, Memory (Program), Control, Storage, FSM, Datapath. A Correct/Efficient scorecard gives the ASIC one X.]

  5. Configurable Processors: best of both
     [Figure: a processor with application-specific instructions, placed between processor solutions and ASIC solutions. Labels: Registers, Memory (Program), Control, Storage, FSM, Datapath. The Correct/Efficient scorecard lists Processor and ASIC with no X marks.]

  6. Outline
     • Configurable processor solution
     • Xtensa™ processor architecture
     • Instruction extension automation
     • Software development tools
     • An example
     • Results
     • Summary

  7. Conventional Architecture
     • More registers
     • More FU's
     • Deeper pipeline
     • Bypass/forward
     [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source, Control, four functional units labeled FU0, Result.]

  8. Conventional Architecture - cont.
     • More FU's
     [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, FU0 to FU3, Result routing.]

  9. Conventional Architecture – cont.
     • More FU's
     • More registers
     [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, FU0 to FU3, Result routing.]

  10. Conventional Architecture – cont.
      • More registers
      • More FU's
      • Deeper pipeline
      [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, FU0 to FU3, Result routing.]

  11. Conventional Architecture – cont.
      • More registers
      • More FU's
      • Deeper pipeline
      • Bypass/forward
      [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, FU0 to FU3, Result routing.]

  12. Conventional Architecture – cont.
      • Problems with a fixed processor:
        • Wastes silicon: there is no universal set of extensions, not even one per application class
        • Not fast enough compared with a hardware implementation
        • Wastes power
      • The Tensilica solution:
        • Small core processor
        • Easy and efficient application-specific instruction extensions

  13. Xtensa Architecture – Base
      • Good performance
        • Comparable to any embedded 32-bit RISC
      • Good code density
        • Much better than 32-bit RISC
        • Uses 16b/24b instructions
      • Small
        • 0.7 mm² in 0.18 µm
      • Low power
        • 0.37 mW/MHz
      • Easy extension
        • With the Tensilica Instruction Extension (TIE) language, at the ISA level
      • Efficient extension
        • TIE compiler generates an efficient pipelined implementation
        • TIE compiler extends all software development tools
      [Figure: datapath block diagram. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, four functional units labeled FU0, Result routing.]

  14. TIE language - opcode
      • Opcode
      [Figure: datapath block diagram with the decoder highlighted. Labels: Decoder, S0, S1, RF0, RF1, RF2, Source routing, Control, four functional units labeled FU0, Result routing.]
      opcode MAC op2=5 CUST0
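The opcode statement on the slide above claims one slot of the instruction encoding for MAC. A minimal Python sketch of what that could mean at the bit level follows. Only `op2=5` and the name `CUST0` come from the slide. The field positions (op0 in bits 3:0, t in 7:4, s in 11:8, r in 15:12, op1 in 19:16, op2 in 23:20) and the values op0=0 and op1=6 for CUST0 are assumptions about the 24-bit RRR instruction format, and `encode_mac`, `decode` and `is_mac` are hypothetical helpers.

```python
# Assumed field layout of a 24-bit RRR-format instruction word (not from the slide).
OP0_QRST = 0x0   # assumption: CUST0 sits under op0 = 0
OP1_CUST0 = 0x6  # assumption: CUST0 is op1 = 6
OP2_MAC = 0x5    # from the slide: opcode MAC op2=5 CUST0


def encode_mac(s: int, t: int, r: int = 0) -> int:
    """Pack a MAC instruction; s and t select the ars and art source registers."""
    for field in (s, t, r):
        if not 0 <= field <= 15:
            raise ValueError("register fields are 4 bits wide")
    return (
        (OP2_MAC << 20) | (OP1_CUST0 << 16) | (r << 12)
        | (s << 8) | (t << 4) | OP0_QRST
    )


def decode(word: int) -> dict:
    """Split a 24-bit word back into its six 4-bit fields."""
    return {
        "op2": (word >> 20) & 0xF,
        "op1": (word >> 16) & 0xF,
        "r": (word >> 12) & 0xF,
        "s": (word >> 8) & 0xF,
        "t": (word >> 4) & 0xF,
        "op0": word & 0xF,
    }


def is_mac(word: int) -> bool:
    """Decoder predicate: true when the word selects the MAC opcode."""
    f = decode(word)
    return (f["op0"], f["op1"], f["op2"]) == (OP0_QRST, OP1_CUST0, OP2_MAC)
```

Under these assumptions, `encode_mac(10, 11)` packs the equivalent of `MAC a10, a11`, and `is_mac` plays the role of the decode signal that the generated decoder would produce.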

  15. TIE Language – regfile / state
      • Opcode
      • Register file / state
      [Figure: datapath block diagram. Labels: Decoder, S0 … as needed, RF0, Source routing, Control, four functional units labeled FU0, Result routing.]
      state ACC 40

  16. TIE Language – semantics
      • Opcode
      • Register file / state
      • Semantics
      [Figure: datapath block diagram. Labels: Decoder, S0 … as needed, RF0, Source routing, Control, FU0 … as needed, MAC, Result routing.]
      semantic sem1 {MAC} { assign ACC = ACC + ars[15:0] * art[15:0]; }

  17. TIE Language – iclass
      • Opcode
      • Register file / state
      • Semantics
      • Instruction class
      [Figure: datapath block diagram. Labels: Decoder, S0 … as needed, RF0, Source routing, Control, FU0 … as needed, MAC, Result routing.]
      iclass c1 {MAC} { in ars, in art } { inout ACC }

  18. TIE Language - schedule
      • Opcode
      • Register file / state
      • Semantics
      • Instruction class
      • Schedule
      [Figure: datapath block diagram. Labels: Decoder, S0 … as needed, RF0, Source routing, Control, FU0 … as needed, MAC, Result routing.]
      schedule s1 {MAC} { use ars 1; use art 1; use ACC 2; def ACC 2; }
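The use/def stage numbers in the schedule above determine how closely dependent instructions can follow each other. A rough Python sketch of that reasoning follows. It assumes that a value defined in stage d becomes available to a use from stage d + 1 onward and that the pipeline issues at most one instruction per cycle. Neither assumption is stated on the slide, and `min_issue_distance` and `stall_cycles` are hypothetical helper names.

```python
# use/def stages copied from the slide's schedule for MAC.
MAC_SCHEDULE = {
    "use": {"ars": 1, "art": 1, "ACC": 2},
    "def": {"ACC": 2},
}


def min_issue_distance(def_stage: int, use_stage: int) -> int:
    """Cycles by which a consumer must trail its producer.

    Assumption: a value defined in stage d is usable from stage d + 1,
    and the pipeline issues at most one instruction per cycle.
    """
    return max(1, def_stage - use_stage + 1)


def stall_cycles(def_stage: int, use_stage: int) -> int:
    """Bubbles inserted between back-to-back dependent instructions."""
    return min_issue_distance(def_stage, use_stage) - 1


# MAC followed by MAC, dependent through ACC: defined in stage 2, used in stage 2.
mac_to_mac = stall_cycles(MAC_SCHEDULE["def"]["ACC"], MAC_SCHEDULE["use"]["ACC"])
```

Under these assumptions, consecutive MAC instructions accumulate into ACC without stalling, whereas a hypothetical instruction that read ACC in stage 1 would need one bubble after a MAC.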

  19. A Complete Example – parallel MAC
      opcode PMAC op2=0 CUST0
      state ACC1 40
      state ACC2 40
      iclass rr {PMAC} {in ars, in art} {inout ACC1, inout ACC2}
      semantic pmac_sem {PMAC} {
        assign ACC1 = ACC1 + ars[15:0] * art[15:0];
        assign ACC2 = ACC2 + ars[31:16] * art[31:16];
      }
      schedule pmac_schd {PMAC} {
        use ars 1; use art 1;
        use ACC1 2; use ACC2 2;
        def ACC1 2; def ACC2 2;
      }
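To make the PMAC semantics above concrete, here is a small behavioral model in Python. It is a sketch under two assumptions that the slide does not state: the 16-bit halves are multiplied as unsigned values, and each 40-bit accumulator wraps on overflow. The names `pmac` and `dot_products` are hypothetical.

```python
ACC_BITS = 40                      # from the slide: state ACC1 40 / state ACC2 40
ACC_MASK = (1 << ACC_BITS) - 1


def pmac(acc1: int, acc2: int, ars: int, art: int) -> tuple:
    """One PMAC step: two 16x16 multiplies, each added to its own accumulator.

    Assumptions: operands are unsigned and the accumulators wrap at 40 bits.
    """
    lo = (ars & 0xFFFF) * (art & 0xFFFF)                   # ars[15:0] * art[15:0]
    hi = ((ars >> 16) & 0xFFFF) * ((art >> 16) & 0xFFFF)   # ars[31:16] * art[31:16]
    return (acc1 + lo) & ACC_MASK, (acc2 + hi) & ACC_MASK


def dot_products(xs, ys) -> tuple:
    """Accumulate PMAC over two equal-length lists of packed 32-bit words."""
    acc1 = acc2 = 0
    for x, y in zip(xs, ys):
        acc1, acc2 = pmac(acc1, acc2, x, y)
    return acc1, acc2
```

For instance, `pmac(0, 0, 0x00030002, 0x00050004)` multiplies the low halves (2 and 4) into the first accumulator and the high halves (3 and 5) into the second, which is the two-multiplies-per-instruction behavior the example is built to show.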

  20. Productivity Gain – language + compiler
      [Figure: design flow. Inputs: select processor options; describe new instructions. Using the Xtensa processor generator, create, in minutes: a tailored, synthesizable HDL uP core (I/O, ALU, Timer, Pipe, Cache, Register File, MMU) and a customized compiler, assembler, linker, debugger and simulator.]

  21. Productivity Gain – Software Tools
      [Figure: design flow. Inputs: select processor options; describe new instructions. Using the Xtensa processor generator, create: a tailored, synthesizable HDL uP core (I/O, ALU, Timer, Pipe, Cache, Register File, MMU) and a customized compiler, assembler, linker, debugger and simulator.]

  22. Software Support – Assembler
      • Assembler
            loop    a2, .L1
            l16si   a10, a3, 0
            l16si   a11, a3, 2
            addi.n  a3, a3, 2
            PMAC    a10, a11
        .L1:
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with Decoder, Control, RF0, the ACC1 and ACC2 states, and FU0 containing two multipliers feeding two adders.]

  23. Software Support – custom data type
      • Assembler
      • Custom data type
        C code:
            sat_int x, y, z;
            z = sat_add(x, y);
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with Decoder, Control, RF0, the ACC1 and ACC2 states, and FU0 containing two multipliers feeding two adders.]
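The slide above uses `sat_add` on a custom `sat_int` type without defining either. A minimal Python sketch of what a saturating add typically does follows. The 32-bit signed width is an assumption made only for illustration, since the slide does not give the width of `sat_int`.

```python
SAT_BITS = 32                          # assumption: the slide gives no width for sat_int
SAT_MAX = (1 << (SAT_BITS - 1)) - 1    # largest representable value
SAT_MIN = -(1 << (SAT_BITS - 1))       # smallest representable value


def sat_add(x: int, y: int) -> int:
    """Add two values and clamp the result to the representable range.

    Unlike ordinary fixed-width addition, the result sticks at
    SAT_MAX or SAT_MIN instead of wrapping around.
    """
    return max(SAT_MIN, min(SAT_MAX, x + y))
```

With this definition, `sat_add(SAT_MAX, 1)` stays at `SAT_MAX`. That clamping is the behavior a dedicated instruction provides in one operation, where plain C would need explicit comparisons around the add.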

  24. Software Support – register allocation
      • Assembler
      • Custom data type
      • Register allocation
        Spilling around a call:
            sat_add   s3, s1, s2
            sat_store s3, a1, 0
            call8     foo
            sat_load  s3, a1, 0
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: datapath with Decoder, Control, RF0, the ACC1 and ACC2 states, and FU0 containing two multipliers feeding two adders.]

  25. Software Support – code scheduling
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
        C code:
            t = sat_mult(x, y);
            z = sat_add(z, t);
            t2 = sat_mult(x2, y2);
        Scheduled code:
            sat_mult s3, s1, s2
            sat_mult s6, s5, s4
            sat_add  s7, s7, s3
      • RTOS
      • Simulator/debugger
      [Figure: datapath with Decoder, Control, RF0, the ACC1 and ACC2 states, and FU0 containing two multipliers feeding two adders.]
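The reordering on the slide above, where the second sat_mult moves ahead of the dependent sat_add, is what a latency-aware list scheduler produces. The Python sketch below illustrates the idea and is not a description of the actual compiler. The latencies (2 cycles for sat_mult, 1 for sat_add) are assumed values, and `Op` and `list_schedule` are hypothetical names.

```python
from typing import NamedTuple


class Op(NamedTuple):
    text: str      # assembly text, as on the slide
    dst: str       # register written
    srcs: tuple    # registers read
    latency: int   # cycles until dst is usable (assumed values)


def list_schedule(ops):
    """Greedy single-issue list scheduler honoring read-after-write dependences.

    Simplification: each register is written by at most one op, so
    write-after-read and write-after-write hazards are not tracked.
    Returns a list of (issue cycle, op text) pairs.
    """
    writer = {}  # register -> index of the op that writes it
    deps = []    # deps[i] = indices of earlier ops whose results op i reads
    for i, op in enumerate(ops):
        deps.append([writer[r] for r in op.srcs if r in writer])
        writer[op.dst] = i

    finish = {}  # op index -> first cycle its result is usable
    order = []
    cycle = 0
    while len(finish) < len(ops):
        for i, op in enumerate(ops):
            if i in finish:
                continue
            if all(d in finish and finish[d] <= cycle for d in deps[i]):
                finish[i] = cycle + op.latency
                order.append((cycle, op.text))
                break  # single issue: at most one op per cycle
        cycle += 1
    return order


# The three operations in source order, before scheduling.
PROGRAM = [
    Op("sat_mult s3, s1, s2", "s3", ("s1", "s2"), 2),
    Op("sat_add s7, s7, s3", "s7", ("s7", "s3"), 1),
    Op("sat_mult s6, s5, s4", "s6", ("s5", "s4"), 2),
]
```

Calling `list_schedule(PROGRAM)` issues the two multiplies first and the add last. The independent multiply fills the cycle in which the add would otherwise wait for s3.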

  26. Software Support - RTOS
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
      [Figure: context switch between Task0 and Task1. Each task's registers s0, s1, …, s15 are moved to and from memory with sat_store and sat_load. Datapath labels: Decoder, Control, RF0, ACC1, ACC2, FU0 with two multipliers feeding two adders.]
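The context switch pictured above requires the RTOS to save and restore the custom registers along with the base registers. A toy Python model of that bookkeeping follows. The `Task` and `Core` classes are hypothetical, and the model assumes a full save and restore of s0 to s15 on every switch, which the slide does not specify.

```python
NUM_SAT_REGS = 16  # s0 ... s15, as drawn on the slide


class Task:
    def __init__(self, name: str):
        self.name = name
        # Per-task memory area that holds the custom registers while switched out.
        self.save_area = [0] * NUM_SAT_REGS


class Core:
    def __init__(self):
        self.s = [0] * NUM_SAT_REGS  # the custom register file
        self.current = None          # task currently running, if any

    def context_switch(self, next_task: Task) -> None:
        if self.current is not None:
            # Models a sequence of sat_store instructions for the outgoing task.
            self.current.save_area = list(self.s)
        # Models a sequence of sat_load instructions for the incoming task.
        self.s = list(next_task.save_area)
        self.current = next_task
```

Each task sees its own values in s0 to s15 across switches, which is why the generated RTOS support has to know about every register file and state that TIE adds.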

  27. Software Support – simulator/debugger
      • Assembler
      • Custom data type
      • Register allocation
      • Code scheduling
      • RTOS
      • Simulator/debugger
            gdb> break …
            gdb> cont
            gdb> step
            gdb> display …
      [Figure: datapath with Decoder, Control, RF0, the ACC1 and ACC2 states, and FU0 containing two multipliers feeding two adders; question marks annotate several blocks.]

  28. Outline
      • Configurable processors
      • Architecture
      • Instruction extension
      • Software support
      • An example
      • Results
      • Summary
