Vector IRAM: ISA and Micro-architecture
Christoforos E. Kozyrakis
Computer Science Division
University of California, Berkeley
kozyraki@cs.berkeley.edu
http://iram.cs.berkeley.edu/

Outline
• Project motivation, goals and approach
• Vector IRAM ISA
• VIRAM-1 micro-architecture
• Project status
C.E. Kozyrakis, IEEE Computer Elements Workshop, June 22, 1998

Project Motivation
• Processor-memory gap is growing exponentially
• Applications shifting from engineering/desktop to multimedia
  – importance of performance of media functions
  – importance of real-time predictable performance
• Embedded/portable systems gain popularity
  – importance of energy consumption
  – importance of system size
• Focus on processors for portable, multimedia systems

The Vector IRAM Approach
Vector processing
• multimedia ready
• predictable, high performance
• simple
• energy savings
• high code density
• well understood programming model
Embedded DRAM
• high memory bandwidth
• low memory latency
• energy savings
• system size benefits
Serial I/O
• Gbit/sec I/O bandwidth
• low pin count
• low power

Outline
• Project motivation and goals
• Vector IRAM ISA
  – Overview of VIRAM ISA extensions
  – Fixed-point and DSP support
  – Conditional and speculative execution
  – Memory model
• VIRAM-1 micro-architecture
• Project status

Vector Execution Model
• SCALAR (1 operation): add r3, r1, r2 computes r3 = r1 + r2
• VECTOR (N operations): add.vv v3, v1, v2 computes v3 = v1 + v2 element by element, over the vector length
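The difference between the two instructions can be sketched in a few lines of Python. This is only an illustrative model: the function name add_vv and the list representation of vector registers are assumptions made for the sketch, not part of the VIRAM ISA definition.

```python
def add_vv(v1, v2, vl):
    """Model of add.vv v3, v1, v2: one instruction, vl element-wise additions."""
    # Only the first vl elements (the current vector length) take part.
    return [v1[i] + v2[i] for i in range(vl)]


# Scalar counterpart, add r3, r1, r2: a single addition.
r1, r2 = 1, 2
r3 = r1 + r2

# Vector form: the same operation applied across the vector length.
v3 = add_vv([1, 2, 3, 4], [10, 20, 30, 40], vl=4)
print(r3, v3)
```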

Vector Architectural State
• Virtual processors: VP 0, VP 1, ..., VP $vlr-1 ($vlr of them)
• General purpose registers: vr 0 to vr 31 (32), each element $vpw wide
• Flag registers: vf 0 to vf 31 (32), 1b per element
• Control registers: vcr 0 to vcr 31, 32b

Overview of V-IRAM ISA Extensions
• Scalar: MIPS-V scalar instruction set
• Vector ALU: alu op, in .v / .vv / .vs / .sv forms, on 8 / 16 / 32 / 64 bit data, as s.int / u.int / s.fp / d.fp
• Vector memory: load / store, with two width fields (each 8 / 16 / 32 / 64), as s.int / u.int, using unit stride / constant stride / indexed addressing
• All ALU / memory operations under mask
• Vector registers:
  – 32 x VL x 64b data + 32 x VL x 1b flag
  – 32 x 4VL x 32b data + 32 x 2VL x 1b flag
  – 32 x 8VL x 16b data + 32 x 8VL x 1b flag
• Plus: flag, convert, fixed-point, and transfer operations

Fixed-point and DSP support
• GOAL: Competitive DSP performance
• Many DSP features already provided
  – narrow data widths [provided]
  – high speed MACs [instruction chaining]
  – multiple LD/ST per cycle [multiple memory units]
  – auto increment / decrement [strided memory access]
  – zero overhead loops [vector instructions]
  – fixed ↔ floating convert [provided]
  – bit reverse addressing [use better FFT algorithm]

Fixed-point Multiply-Add Model
[Figure: fixed-point multiply-add datapath. Labels: inputs x and y (n/2 bits each), multiplier (*), shift amount a, Round stage, adder (+) with n-bit input w, output stage F, n-bit result z.]
Options listed for Round and F: truncate, round nearest even, round nearest up, jam, signed saturate, unsigned saturate, shift by one
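One possible reading of this datapath is sketched below in Python. The stage order (multiply, shift right by a with rounding, add w, then saturate), the function names, and the exact definitions of the rounding modes are assumptions based on the figure labels and on common fixed-point practice; the slide itself does not spell them out.

```python
def round_shift(value, shift, mode):
    """Arithmetic right shift by `shift` bits with a selectable rounding mode."""
    if shift == 0:
        return value
    q = value >> shift                   # floor of value / 2**shift
    rem = value & ((1 << shift) - 1)     # discarded bits, always non-negative
    half = 1 << (shift - 1)
    if mode == "truncate":
        return q
    if mode == "nearest_up":             # ties go upward
        return q + (1 if rem >= half else 0)
    if mode == "nearest_even":           # ties go to the even result
        if rem > half or (rem == half and q & 1):
            return q + 1
        return q
    if mode == "jam":                    # OR any discarded bits into the LSB
        return q | (1 if rem else 0)
    raise ValueError(mode)


def saturate(value, n, signed=True):
    """Clamp value to the range of an n-bit signed or unsigned integer."""
    lo, hi = (-(1 << (n - 1)), (1 << (n - 1)) - 1) if signed else (0, (1 << n) - 1)
    return max(lo, min(hi, value))


def fxp_madd(x, y, w, a, n, mode="nearest_even", signed=True):
    """z = saturate(w + round((x * y) >> a)), all for n-bit results."""
    return saturate(w + round_shift(x * y, a, mode), n, signed)


z = fxp_madd(200, 200, 30000, a=2, n=16)   # 30000 + 10000 exceeds 16-bit signed
print(z)
```

With n = 16 the sum 40000 does not fit in a signed result, so the sketch clamps it to 32767 rather than wrapping around.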

Fixed-point instructions
• Vector half-width integer multiply
• Vector fixed-point shift and add
• Vector saturate
• Vector saturating left arithmetic shift

Conditional (Predicated) Execution
• Almost every vector instruction is executed subject to one of two vector masks
• 15 GP flag registers provided to buffer masks or operate on them
• 6 flag logical and 13 flag processing instructions (like population count, iota, etc.)
• 15 flag registers used for sticky exception bits for arithmetic/FP operations and speculative operations
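As a rough illustration of execution under a mask, the Python sketch below vectorizes the loop body "if a[i] > 0 then c[i] = a[i] + b[i]". The helper names are hypothetical, and the choice to leave masked-off destination elements unchanged is an assumption of this sketch rather than something the slide states.

```python
def compare_gt(v, scalar):
    """Vector compare: produces one flag bit per element."""
    return [x > scalar for x in v]


def add_vv_masked(v1, v2, dest, mask):
    """Masked add: elements whose mask bit is clear keep their old dest value."""
    return [a + b if m else d for a, b, d, m in zip(v1, v2, dest, mask)]


def flag_popcount(flags):
    """Flag processing example: count the set bits of a flag register."""
    return sum(1 for f in flags if f)


a = [4, -1, 7, 0]
b = [10, 10, 10, 10]
c = [0, 0, 0, 0]

mask = compare_gt(a, 0)             # flag register holding the condition
c = add_vv_masked(a, b, c, mask)    # only elements 0 and 2 are updated
active = flag_popcount(mask)
print(mask, c, active)
```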

Speculative Execution
• Vectorizing loops with conditional exit conditions
  – Need to speculate past loop exit
  – Need to temporarily suppress exceptions
• Speculation controlled by software
• Solution:
  – A duplicate set of arithmetic exception flag registers
  – A flag register reserved for load faults
  – Speculative loads and speculative arithmetic instructions write these duplicate exception bits

Speculative Execution (cont.)
• Perform loads and enough arithmetic to determine loop exit condition
  – Stores cannot be speculated!
• Generate mask to exclude iterations after loop exit (flag processor instruction)
• VCOMMIT instruction (under mask):
  – ORs speculative flags into real flags
  – Raises memory exceptions
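The sequence above can be modelled in Python as follows. This is a behavioural sketch only: the function names, the list-based memory, the filler value 0 returned for faulting elements, and the decision to keep the exiting iteration inside the mask are all assumptions made for illustration.

```python
class MemoryFault(Exception):
    """Raised at commit time when a committed iteration had a load fault."""


def spec_load(memory, base, vl):
    """Speculative vector load: record faults in a flag vector, do not raise."""
    data, fault = [], []
    for addr in range(base, base + vl):
        ok = 0 <= addr < len(memory)
        data.append(memory[addr] if ok else 0)
        fault.append(not ok)
    return data, fault


def mask_through_exit(exit_cond):
    """Flag-processing step: keep iterations up to and including the first exit."""
    mask, exited = [], False
    for hit in exit_cond:
        mask.append(not exited)
        exited = exited or hit
    return mask


def vcommit(spec_arith, spec_fault, real_arith, mask):
    """Under mask: raise on recorded load faults, OR speculative flags into real ones."""
    if any(m and f for m, f in zip(mask, spec_fault)):
        raise MemoryFault("load fault in a committed iteration")
    return [r or (m and s) for r, s, m in zip(real_arith, spec_arith, mask)]


# Scan for a zero terminator, 8 elements at a time, in a 3-word memory.
memory = [5, 3, 0]
data, fault = spec_load(memory, base=0, vl=8)      # elements 3..7 fault silently
mask = mask_through_exit([d == 0 for d in data])   # exit found at element 2
real_flags = vcommit([False] * 8, fault, [False] * 8, mask)
print(mask)
```

In this run the loads past the end of memory fault only speculatively. Because the mask excludes those iterations, the commit raises nothing, which is the behaviour the slide describes for iterations after the loop exit.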

Memory Model
• Relaxed consistency to simplify hardware: no guarantee about ordering of memory operations, even within the same VP
• Register interlocks provided on a per-element basis
• Vector memory barrier used for ordering between scalar unit and vector unit and between VPs
• Indexed memory operations do not specify ordering; separate ordered indexed store instruction

Outline
• Project motivation and goals
• Vector IRAM ISA
• VIRAM-1 micro-architecture
  – Overview of VIRAM-1 micro-architecture
  – Vector pipelines
  – Memory system architecture
• Project status

VIRAM-1 Block Diagram

VIRAM-1 Features
• Scalar unit
  – 64-bit MIPS core with FP unit
  – 8KB I+D caches, write-through
  – cache invalidation interface
• Vector unit
  – maximum vector length 32
  – 64, 32, 16 bit data-types
  – 2 vector arithmetic units
  – 2 vector flag processing units
  – 4 pipelines per functional unit
  – 2 vector load/store units
  – 64 entry vector TLB, multi-ported

Vector Pipelines
• Multiple pipelines can increase performance
OR
• Multiple pipelines can decrease energy, by lowering the clock frequency and the power supply voltage

VIRAM-1 Memory System
• 16 to 32MB DRAM
• 16 independently addressed banks
• 8 x 2Mbit DRAM macros per bank with 256-bit synchronous interface
• Memory crossbar
  – interconnects scalar, vector unit and I/O to memory
  – 8 addresses per cycle
  – 12.8GB/sec maximum data bandwidth per direction
  – implemented using low-swing techniques
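The capacity figure follows directly from the bank organization, and the crossbar bandwidth is consistent with one plausible reading of the numbers, shown below. The 64-bit word per address and the 200MHz crossbar clock (the vector clock from the VIRAM-1 Goals slide) are assumptions of this sketch; the slide does not state how the 12.8GB/sec figure is derived.

```python
# Capacity: figures taken from the slide.
BANKS = 16
MACROS_PER_BANK = 8
MACRO_MBIT = 2

total_mbit = BANKS * MACROS_PER_BANK * MACRO_MBIT   # 256 Mbit
total_mbyte = total_mbit // 8                        # 32 MB, the upper end of "16 to 32MB"

# Bandwidth: one possible derivation (assumed, not stated on the slide).
ADDRESSES_PER_CYCLE = 8        # from the slide
WORD_BYTES = 8                 # assumption: one 64-bit word per address
CLOCK_HZ = 200_000_000         # assumption: crossbar runs at the 200MHz vector clock

bandwidth_gb_per_s = ADDRESSES_PER_CYCLE * WORD_BYTES * CLOCK_HZ / 1e9

print(total_mbyte, "MB,", bandwidth_gb_per_s, "GB/s per direction")
```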

VIRAM-1 Floorplan

VIRAM-1 Goals
Technology: 0.20 micron, 5 metal layers, embedded DRAM-logic process
Memory: 16-32 MB
Die size: 250-300 mm²
Vector pipelines: 4 64-bit (or 8 32-bit or 16 16-bit)
Clock frequency: 200MHz scalar, 200MHz vector, 100MHz DRAM
Serial I/O: 4 lines @ 1 Gbit/s
Power: 2 W @ 1.5 volt logic
Performance: 1.6 GFLOPS (64-bit) to 6.4 GOPS (16-bit)
First microprocessor above 0.25B transistors?

Scaling Down VIRAM-1
• Scaled-down version automatically generated from the original
• 8 MB in 4 banks
• Vector unit with single pipeline per functional unit => same control
• die: 80 mm²
• transistors: 70M
• power: 0.5 Watts
• performance: 0.4 GFLOPS (64-bit), 1.6 GOPS (16-bit)
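The peak performance figures for both the full and the scaled-down design can be reproduced with a simple count of operations per cycle. The sketch below assumes one operation per pipeline per cycle, 2 vector arithmetic units (from the VIRAM-1 Features slide), subword parallelism of 64 / width per 64-bit pipeline, and an unchanged 200MHz vector clock for the scaled-down version; the function name is hypothetical and the slides do not give this formula explicitly.

```python
def peak_gops(arith_units, pipelines_per_unit, clock_mhz, width_bits=64):
    """Peak operations per second, in units of 10**9, for a given element width."""
    subwords = 64 // width_bits          # elements processed per 64-bit pipeline
    return arith_units * pipelines_per_unit * subwords * clock_mhz / 1000


# Full VIRAM-1: 2 arithmetic units x 4 pipelines at 200MHz.
full_64 = peak_gops(2, 4, 200, width_bits=64)
full_16 = peak_gops(2, 4, 200, width_bits=16)

# Scaled-down version: a single pipeline per functional unit.
small_64 = peak_gops(2, 1, 200, width_bits=64)
small_16 = peak_gops(2, 1, 200, width_bits=16)

print(full_64, full_16, small_64, small_16)
```

Under these assumptions the results match the 1.6 GFLOPS / 6.4 GOPS goals for VIRAM-1 and the 0.4 GFLOPS / 1.6 GOPS figures quoted for the scaled-down version.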

Project Status
• ISA extensions frozen
• Micro-architecture still under development but design has started
• Developing simulation infrastructure
• Designed 2 test-chips for circuit evaluation
  – serial I/O @ 1Gbit/s
  – embedded DRAM and on-chip crossbar
• Expected VIRAM-1 tape-out: early 2000

Acknowledgments
• Thanks for advice/support: DARPA, California MICRO, ARM, Hitachi, IBM, Intel, LG Semicon, Microsoft, Mitsubishi, Neomagic, Samsung, SGI/Cray, Sun Microsystems
• The IRAM/ISTORE cast: D. Patterson, K. Asanovic, A. Brown, J. Gebis, B. Gribstad, R. Fromm, J. Golbus, K. Keeton, C. Kozyrakis, J. Kubiatowicz, D. Martin, S. Perissakis, R. Thomas, N. Treuhaft and K. Yelick
