Vector IRAM: ISA and Micro-architecture, Christoforos E. Kozyrakis


  1. Vector IRAM: ISA and Micro-architecture Christoforos E. Kozyrakis Computer Science Division University of California, Berkeley kozyraki@cs.berkeley.edu http://iram.cs.berkeley.edu/

  2. Outline • Project motivation, goals and approach • Vector IRAM ISA • VIRAM-1 micro-architecture • Project status (C.E. Kozyrakis, IEEE Computer Elements Workshop, June 22, 1998)

  3. Project Motivation • Processor-memory gap is growing exponentially • Applications shifting from engineering/desktop to multimedia – importance of performance of media functions – importance of real-time predictable performance • Embedded/portable systems gain popularity – importance of energy consumption – importance of system size • Focus on processors for portable, multimedia systems

  4. The Vector IRAM Approach • Vector processing – multimedia ready – predictable, high performance – simple – energy savings – high code density – well understood programming model • Embedded DRAM – high memory bandwidth – low memory latency – energy savings – system size benefits • Serial I/O – Gbit/sec I/O bandwidth – low pin count – low power

  5. Outline • Project motivation and goals • Vector IRAM ISA – Overview of VIRAM ISA extensions – Fixed-point and DSP support – Conditional and speculative execution – Memory model • VIRAM-1 micro-architecture • Project status

  6. Vector Execution Model • Scalar (1 operation): add r3, r1, r2 computes r3 = r1 + r2 • Vector (N operations): add.vv v3, v1, v2 computes v3 = v1 + v2 element by element, up to the vector length
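
The contrast on this slide can be sketched in a few lines of Python. This is only an illustrative model, not VIRAM code: the function names `add_scalar` and `add_vv` and the explicit `vl` argument are assumptions made for the example, standing in for the `add` and `add.vv` instructions and the vector length register.

```python
def add_scalar(r1, r2):
    """Model of `add r3, r1, r2`: one instruction, one operation."""
    return r1 + r2


def add_vv(v1, v2, vl):
    """Model of `add.vv v3, v1, v2`: one instruction, vl element operations.

    `vl` plays the role of the vector length (hypothetical parameter name).
    """
    if vl > len(v1) or vl > len(v2):
        raise ValueError("vector length exceeds register contents")
    return [v1[i] + v2[i] for i in range(vl)]


r3 = add_scalar(2, 3)
v3 = add_vv([1, 2, 3, 4], [10, 20, 30, 40], vl=4)
```

A shorter vector length simply processes fewer elements with the same single instruction, for example `add_vv([1, 2, 3, 4], [10, 20, 30, 40], vl=2)`.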

  7. Vector Architectural State • Virtual processors: VP 0, VP 1, ..., VP $vlr-1 ($vlr in total) • General purpose registers (32): vr 0 to vr 31, $vpw wide • Flag registers (32): vf 0 to vf 31, 1b wide • Control registers: vcr 0 to vcr 31, 32b wide

  8. Overview of V-IRAM ISA Extensions • Scalar: MIPS-V scalar instruction set • Vector ALU: alu op; forms .v, .vv, .vs, .sv; data types s.int, u.int, s.fp, d.fp; widths 8, 16, 32, 64 • Vector memory: load, store; data types s.int, u.int; widths 8, 16, 32, 64; addressing unit stride, constant stride, indexed • All ALU / memory operations under mask • Vector registers: 32 x VL x 64b data + 32 x VL x 1b flag; 32 x 4VL x 32b data + 32 x 2VL x 1b flag; 32 x 8VL x 16b data + 32 x 8VL x 1b flag • Plus: flag, convert, fixed-point, and transfer operations

  9. Fixed-point and DSP support • GOAL: Competitive DSP performance • Many DSP features already provided – narrow data widths [provided] – high speed MACs [instruction chaining] – multiple LD/ST per cycle [multiple memory units] – auto increment / decrement [strided memory access] – zero overhead loops [vector instructions] – fixed ↔ floating convert [provided] – bit reverse addressing [use better FFT algorithm]

  10. Fixed-point Multiply-Add Model [Figure: multiply-add datapath. Inputs x and y (n/2 bits each) feed the multiplier (*); the n-bit product passes through Round (with input a) and is added (+) to z (n bits); the sum passes through F to produce w (n bits).] • Round = truncate, round nearest even, round nearest up, jam • F = signed saturate, unsigned saturate, shift by one
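
A rough Python model of this multiply-add datapath may help make the roles of Round and F concrete. It rests on several assumptions that the slide does not confirm: that `a` is a right-shift amount applied to the product, that only the "truncate" and "round nearest up" modes are modelled, and that F is signed saturation to `n` bits. The function names are hypothetical.

```python
def round_shift(p, a, mode="nearest_up"):
    """Shift p right by a bits (assumed meaning of input `a`) with rounding."""
    if a == 0:
        return p
    if mode == "truncate":
        return p >> a                      # drop the low bits
    if mode == "nearest_up":
        return (p + (1 << (a - 1))) >> a   # add half an LSB, then drop low bits
    raise ValueError(f"unsupported rounding mode: {mode}")


def saturate_signed(v, n):
    """Clamp v to the range of an n-bit two's complement integer."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, v))


def fixed_madd(x, y, z, a, n=16, mode="nearest_up"):
    """w = F(z + Round(x * y)), with F assumed to be signed saturation."""
    product = x * y                        # n/2-bit inputs give an n-bit product
    return saturate_signed(z + round_shift(product, a, mode), n)
```

For instance, `fixed_madd(127, 127, 32767, 0)` would overflow a 16-bit accumulator and is clamped to the largest representable value instead of wrapping around.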

  11. Fixed-point instructions • Vector half-width integer multiply • Vector fixed-point shift and add • Vector saturate • Vector saturating left arithmetic shift

  12. Conditional (Predicated) Execution • Almost every vector instruction is executed subject to one of two vector masks • 15 GP flag registers provided to buffer masks or operate on them • 6 flag logical and 13 flag processing instructions (like population count, iota etc.) • 15 flag registers used for sticky exception bits for arithmetic/FP operations and speculative operations
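
Masked execution can be illustrated with a small Python model. This is a sketch under assumptions: the names `add_vv_masked`, `flag_and`, and `popcount` are made up for the example, and the behaviour that masked-off elements keep their old destination value is an assumption rather than something the slide states.

```python
def add_vv_masked(v1, v2, dest_old, mask):
    """Element-wise add under a mask.

    Elements whose mask bit is False keep the old destination value
    (assumed behaviour for this sketch).
    """
    return [a + b if m else old
            for a, b, old, m in zip(v1, v2, dest_old, mask)]


def flag_and(f1, f2):
    """A flag logical operation: combine two masks element by element."""
    return [a and b for a, b in zip(f1, f2)]


def popcount(mask):
    """A flag processing operation: count the set mask bits."""
    return sum(1 for m in mask if m)


data = [4, -1, 7, -3]
positive = [x > 0 for x in data]       # a compare produces a mask
even_slot = [True, False, True, False]
mask = flag_and(positive, even_slot)
result = add_vv_masked(data, [10, 10, 10, 10], [0, 0, 0, 0], mask)
```

This is how an `if` inside a loop body can be vectorized: the condition becomes a mask, and the arithmetic runs over the whole vector under that mask.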

  13. Speculative Execution • Vectorizing loops with conditional exit conditions – Need to speculate past loop exit – Need to temporarily suppress exceptions • Speculation controlled by software • Solution: – A duplicate set of arithmetic exception flag registers – A flag register reserved for load faults – Speculative loads and speculative arithmetic instructions write these duplicate exception bits

  14. Speculative Execution (cont.) • Perform loads and enough arithmetic to determine loop exit condition – Stores cannot be speculated! • Generate mask to exclude iterations after loop exit (flag processor instruction) • VCOMMIT instruction (under mask): – ORs speculative flags into real flags – Raises memory exceptions
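
The sequence described on these two slides can be sketched as a vectorized search loop in Python. Everything here is an illustrative model under assumptions: `spec_load`, `mask_through_first`, `commit`, and `find_first` are hypothetical names, memory is a plain list, and a load fault is modelled as an index outside that list. Only the commit step stands in for the VCOMMIT instruction named on the slide.

```python
def spec_load(memory, base, vl):
    """Speculative vector load: a bad address sets a fault flag instead of raising."""
    values, faults = [], []
    for i in range(vl):
        addr = base + i
        ok = 0 <= addr < len(memory)
        values.append(memory[addr] if ok else 0)
        faults.append(not ok)
    return values, faults


def mask_through_first(cond):
    """Mask covering iterations up to and including the first loop exit."""
    mask, done = [], False
    for c in cond:
        mask.append(not done)
        if c:
            done = True
    return mask


def commit(faults, mask):
    """Model of commit under mask: only faults on real iterations are raised."""
    if any(f and m for f, m in zip(faults, mask)):
        raise MemoryError("load fault on a committed element")


def find_first(memory, key, vl=4):
    """Vectorized `while memory[i] != key: i += 1`, vl elements at a time."""
    base = 0
    while True:
        values, faults = spec_load(memory, base, vl)
        hit = [v == key and not f for v, f in zip(values, faults)]
        mask = mask_through_first(hit)   # exclude iterations after the exit
        commit(faults, mask)             # raise only non-speculative faults
        if any(hit):
            return base + hit.index(True)
        base += vl
```

In `find_first([5, 7, 9, 3, 8, 2], 8)`, the second chunk reads two elements past the end of the data, but those faults fall after the loop exit and are masked out at commit. If the key is absent, the fault lands on a real iteration and is raised, just as the scalar loop would fault.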

  15. Memory Model • Relaxed consistency to simplify hardware: no guarantee about ordering of memory operations, even within the same VP • Register interlocks provided on a per-element basis • Vector memory barrier used for ordering between scalar unit and vector unit and between VPs • Indexed memory operations do not specify ordering; separate ordered indexed store instruction

  16. Outline • Project motivation and goals • Vector IRAM ISA • VIRAM-1 micro-architecture – Overview of VIRAM-1 micro-architecture – Vector pipelines – Memory system architecture • Project status

  17. VIRAM-1 Block Diagram

  18. VIRAM-1 Features • Scalar unit – 64-bit MIPS core with FP unit – 8KB I+D caches, write-through – cache invalidation interface • Vector unit – maximum vector length 32 – 64, 32, 16 bit data-types – 2 vector arithmetic units – 2 vector flag processing units – 4 pipelines per functional unit – 2 vector load/store units – 64 entry vector TLB, multi-ported

  19. Vector Pipelines • Multiple pipelines can either increase performance • or reduce energy, by allowing a lower clock frequency and power supply voltage

  20. VIRAM-1 Memory System • 16 to 32MB DRAM • 16 independently addressed banks • 8 2Mbit DRAM macros per bank with 256-bit synchronous interface • Memory crossbar – interconnects scalar, vector unit and I/O to memory – 8 addresses per cycle – 12.8GB/sec maximum data bandwidth per direction – implemented using low-swing techniques
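
The capacity and bandwidth figures on this slide can be cross-checked with simple arithmetic. The capacity calculation uses only numbers from the slide. The bandwidth calculation is one possible way to arrive at 12.8GB/sec and relies on two assumptions that are labelled in the code: that each of the 8 addresses per cycle moves one 64-bit word, and that the crossbar runs at the 200MHz vector clock listed in the VIRAM-1 goals.

```python
# Capacity: figures taken directly from the slide.
BANKS = 16
MACROS_PER_BANK = 8
MACRO_MBIT = 2

total_mbit = BANKS * MACROS_PER_BANK * MACRO_MBIT
total_mbyte = total_mbit // 8          # matches the upper end of "16 to 32MB"

# Bandwidth: 8 addresses per cycle is from the slide; the rest is assumed.
ADDRESSES_PER_CYCLE = 8
WORD_BYTES = 8                         # assumption: one 64-bit word per address
CLOCK_HZ = 200_000_000                 # assumption: crossbar at the 200MHz vector clock

bytes_per_second = ADDRESSES_PER_CYCLE * WORD_BYTES * CLOCK_HZ
gbytes_per_second = bytes_per_second / 1e9
```

Under these assumptions the crossbar moves 64 bytes per cycle in each direction, which reproduces the quoted peak figure.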

  21. VIRAM-1 Floorplan

  22. VIRAM-1 Goals • Technology: 0.20 micron, 5 metal layers, embedded DRAM-logic process • Memory: 16-32 MB • Die size: 250-300 mm² • Vector pipelines: 4 64-bit (or 8 32-bit or 16 16-bit) • Clock frequency: 200MHz scalar, 200MHz vector, 100MHz DRAM • Serial I/O: 4 lines @ 1 Gbit/s • Power: 2 W @ 1.5 volt logic • Performance: 1.6 GFLOPS (64-bit) to 6.4 GOPS (16-bit) • First microprocessor above 0.25B transistors?

  23. Scaling Down VIRAM-1 • Scaled-down version automatically generated from the original • 8 MB in 4 banks • Vector unit with single pipeline per functional unit => same control • die: 80 mm² • transistors: 70M • power: 0.5 Watts • performance: 0.4 GFLOPS (64-bit), 1.6 GOPS (16-bit)
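
The peak performance numbers for both the full and the scaled-down design are consistent with a simple product of unit counts and clock rate. The sketch below is a back-of-the-envelope check, not a statement of how the authors derived the figures. It assumes one operation per pipeline per cycle, that each 64-bit pipeline splits into four 16-bit lanes, that both vector arithmetic units are busy every cycle, and that the scaled-down version keeps the 200MHz vector clock. The function name `peak_ops_per_second` is hypothetical.

```python
VECTOR_CLOCK_HZ = 200_000_000   # from the goals slide
ARITHMETIC_UNITS = 2            # from the features slide


def peak_ops_per_second(pipelines_per_unit, lanes_per_pipeline,
                        units=ARITHMETIC_UNITS, clock_hz=VECTOR_CLOCK_HZ):
    """Peak rate, assuming one operation per lane per cycle."""
    return units * pipelines_per_unit * lanes_per_pipeline * clock_hz


# Full VIRAM-1: 4 pipelines per functional unit.
full_64 = peak_ops_per_second(pipelines_per_unit=4, lanes_per_pipeline=1)
full_16 = peak_ops_per_second(pipelines_per_unit=4, lanes_per_pipeline=4)

# Scaled-down version: 1 pipeline per functional unit.
small_64 = peak_ops_per_second(pipelines_per_unit=1, lanes_per_pipeline=1)
small_16 = peak_ops_per_second(pipelines_per_unit=1, lanes_per_pipeline=4)
```

Cutting the pipelines per unit from 4 to 1 divides every peak figure by four, which is the ratio between the two slides.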

  24. Project Status • ISA extensions frozen • Micro-architecture still under development but design has started • Developing simulation infrastructure • Designed 2 test-chips for circuit evaluation – serial I/O @ 1Gbit/s – embedded DRAM and on-chip crossbar • Expected VIRAM-1 tape-out: early 2000

  25. Acknowledgments • Thanks for advice/support: DARPA, California MICRO, ARM, Hitachi, IBM, Intel, LG Semicon, Microsoft, Mitsubishi, Neomagic, Samsung, SGI/Cray, Sun Microsystems • The IRAM/ISTORE cast: D. Patterson, K. Asanovic, A. Brown, J. Gebis, B. Gribstad, R. Fromm, J. Golbus, K. Keeton, C. Kozyrakis, J. Kubiatowicz, D. Martin, S. Perissakis, R. Thomas, N. Treuhaft and K. Yelick
