Nov. 12, 1997 Bob Brodersen (http://infopad.eecs.berkeley.edu) 1 - PowerPoint PPT Presentation

CS 152 Computer Architecture and Engineering
Introduction to Architectures for Digital Signal Processing
Nov. 12, 1997
Bob Brodersen (http://infopad.eecs.berkeley.edu)

Processor Applications
• General Purpose - high performance
  – Pentiums, Alphas, SPARC
  – Used for general purpose software
  – Heavyweight OS - UNIX, NT
  – Workstations, PCs
• Embedded processors and processor cores
  – ARM, 486SX, Hitachi SH7000, NEC V800
  – Single program
  – Lightweight, often realtime OS
  – DSP support
  – Cellular phones, consumer electronics (e.g. CD players)
• Microcontrollers
  – Extremely cost sensitive
  – Small word size - 8 bit common
  – Highest volume processors by far
  – Automobiles, toasters, thermostats, ...
[Figure annotations: arrows labeled "Increasing Cost" and "Increasing volume" run alongside the list.]

The Processor Design Space
[Figure: performance versus cost, with three regions]
• Microprocessors - performance is everything & software rules
• Embedded processors - application specific architectures for performance
• Microcontrollers - cost is everything

World’s Cellular Subscribers
[Figure: subscribers in millions (0 to 700) by year, 1993 to 2001, split into analog and digital. Source: Ericsson Radio Systems, Inc.]
• Will provide a ubiquitous infrastructure for wireless data as well as voice

Multimedia I/O Architecture
[Figure: block diagram. Labels: Embedded Processor, Radio Modem, Sched, ECC, Pact Interface, Low Power Bus, FB, Fifo, Fifo, Video Decomp, Pen, SRAM, Graphics, Audio, Video, Data Flow.]

Embedded applications
E.g. Multimedia terminal electronics
[Figure: terminal with Graphics Out, Uplink Radio, Video I/O, Downlink Radio, Voice I/O and Pen In; chip blocks labeled µP, Video Unit (custom), Coms, Memory, DSP.]
• Future chips will be a mix of processors, memory and dedicated hardware for specific algorithms and I/O

Requirements of the Embedded Processors
• Optimized for a single program - code often in on-chip ROM or off-chip EPROM
• Minimum code size (one of the initial motivations for Java)
• Performance obtained by optimizing the datapath
• Low cost
  – Lowest possible area
  – Technology behind the leading edge
  – High level of integration of peripherals (reduces system cost)
• Fast time to market
  – Compatible architectures (e.g. ARM) allow reusable code
  – Customizable core
• Low power if the application requires portability

Area of processor cores = Cost
[Figure: processor core areas, annotated "Nintendo processor" and "Cellular phones".]

Another figure of merit
Computation per unit area
[Figure: same cores, annotated "Nintendo processor", "???" and "Cellular phones".]

National Semiconductor - Embedded Processor Family
• Simple architecture
• 3 stage pipeline - fetch - decode - execute
• Minimum power and size
  – Short pipeline avoids branch prediction and bypass
  – Versions range from 8-64 bit - choose minimum that meets requirements

Code size
• If the majority of the chip is the program stored in ROM, then code size is a critical issue
• The Piranha has three instruction sizes - the basic 2 bytes, and 2 bytes plus a 16 or 32 bit immediate

Example application (single chip system)
[Figure: single chip system.]

The DSP Module (DSPM)
• Vector instructions directly supported
• Pipelined datapath supports single cycle: Multiply, Add, Shift, Load/Store and Pointer adjustment
• Operates in parallel with the processor core
• Saturation, overflow and rounding for ALU operations
• Automatic support for cyclic buffers (modulo arithmetic)
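Two of the features listed above, saturation and cyclic buffers, can be illustrated in software. The following is only a sketch of the ideas, not the DSPM's actual behavior: the 16 bit word size and the names `saturate16` and `CyclicBuffer` are assumptions made for the example.

```python
def saturate16(value):
    """Clamp an integer to the signed 16 bit range instead of wrapping around.

    The 16 bit width is an assumption for illustration only.
    """
    return max(-32768, min(32767, value))


class CyclicBuffer:
    """Fixed-length buffer whose write pointer wraps using modulo arithmetic."""

    def __init__(self, length):
        self.data = [0] * length
        self.head = 0  # index of the next write

    def push(self, sample):
        self.data[self.head] = sample
        # Modulo pointer update: a DSP address unit does this in hardware.
        self.head = (self.head + 1) % len(self.data)

    def newest_first(self):
        """Return the stored samples ordered from most recent to oldest."""
        n = len(self.data)
        return [self.data[(self.head - 1 - k) % n] for k in range(n)]


buf = CyclicBuffer(4)
for s in [1, 2, 3, 4, 5, 6]:
    buf.push(s)          # the fifth and sixth samples overwrite the oldest two
recent = buf.newest_first()
clipped = saturate16(30000 + 10000)  # would overflow a 16 bit word
```

In software the modulo and the clamp each cost extra instructions per sample, which is why hardware support for them matters in an inner loop.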

The National DSP Module Architecture
[Figure: datapath with three simultaneous addresses (X, Y, Z) and zero overhead repeat.]
• Single cycle MAC support is typical for DSP acceleration

The 486 “Embedded Processor”
[Figure: 486 block diagram.]
Look familiar???

The “Embedded” Features of the 486 GX
• Said to be designed “for embedded battery-operated and hand-held applications” (???)
• Fully static design (clock can stop and all state is kept)
• “Auto Clock Freeze” stops circuits which are not being used in a given instruction (gated clocks)
• Stop Clock (60 µW), Stop Grant - clock runs but no program execution (40-85 mW)
• Split power supply - 2.0-3.3 V core, 3.3 V I/O

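Power = C V^2 f_clock, i.e. $P = C V^2 f_{clock}$
[Table: power figures, in slide order; the row and column labels (including the clock rates) were lost in extraction.]
130 mW, 350 mW, 190 mW, 430 mW, 290 mW, 540 mW, 490 mW, 730 mW, 17 mW, 20 mW, 23 mW, 30 mW
Note the clock rates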
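The relation P = C V^2 f_clock can be tried numerically to see why a lower core voltage matters so much. This is a minimal sketch: the effective capacitance `C_EFF` and the 33 MHz clock are hypothetical values chosen only for illustration, and the 2.0 V and 3.3 V points are taken from the core supply range quoted for the 486 GX.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic switching power P = C * V^2 * f, in watts."""
    return c_farads * v_volts ** 2 * f_hertz


C_EFF = 1e-9      # hypothetical effective switched capacitance (farads)
F_CLOCK = 33e6    # hypothetical clock rate (hertz)

p_high = dynamic_power(C_EFF, 3.3, F_CLOCK)
p_low = dynamic_power(C_EFF, 2.0, F_CLOCK)

# Power scales with the square of the voltage, so the ratio does not
# depend on the assumed capacitance or clock rate.
voltage_ratio = p_low / p_high

# Power scales linearly with clock rate.
p_half_clock = dynamic_power(C_EFF, 3.3, F_CLOCK / 2)
```

With these assumptions, dropping the core from 3.3 V to 2.0 V cuts dynamic power to roughly 37% at the same clock rate, while halving the clock only halves it.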

Characterizing programs for their energy consumption

Function                  Power
Process Subframe          330 µW
ComputeLag                107 µW
IFilterCodebook            63 µW
QuantizeGains              46 µW
CodebookSearch             44 µW
UpdateFilterState           8 µW
OrthogonalizeCodebook       6 µW
ComputeWeightedInput       22 µW
ThetaToCodeword             8 µW

    ComputeLag(...)
    {
        R = dotprod(res, res);
        for (lag = 0..127)
        {
            lp = getLT(lt);
            G = dotprod(lp, lp);
        }
    }

• Top four functions account for 90% of the power
• 65% of power dissipation is in dot-vector products
(data obtained from profiling of C++ code, weighted with estimated instruction energy costs)

An architecture optimized for multiply-accumulate
[Figure: two AddressGen units feeding two Memories and two MAC units, with a Control Processor; further label "L G C".]
Energy/Flexibility Tradeoffs
• ARM 6 core (5V, 20 MHz): .02 MIPS/mW
• ZSP DSP Superscalar (3V, 200 MHz): .3 MOPS/mW
• Reconfigurable Dot-Vector Processor (1.5V, 30 MHz): 5.9 MIPS/mW
* MOPS = millions of operations/sec = millions of MACs/sec

DSP Application - equalization
• The audio data streams from the source (computer) through the digital analysis and synthesis
• Hard realtime requirement - the processing must be done at the sample rate

Common DSP algorithms and applications
• Applications
  – Instrumentation and measurement
  – Communications
  – Audio and video processing
  – Graphics, image enhancement, 3-D rendering
  – Navigation, radar, GPS
  – Control - robotics, machine vision, guidance
• Algorithms
  – Frequency domain filtering - FIR and IIR
  – Frequency-time transformations - FFT
  – Correlation

Sampled data processing
[Figure: RC circuit with resistor R, capacitor C, input V_in(t) and output V_out(t), shown with an input waveform and its filtered version.]
This RC low pass filter takes a time waveform (signal) V_in(t) and turns it into a filtered version V_out(t).
This analog circuit is really just a solution of the differential equation, calculated using the physics of electric fields and currents:
$$RC \frac{dV_{out}}{dt} + V_{out}(t) = V_{in}(t)$$
To implement this digitally we need to convert this expression to discrete time. First we need to convert from a continuous time representation of the signal to discrete time sequences:
V_out(t) => Y_1 Y_2 Y_3 … Y_n and V_in(t) => X_1 X_2 X_3 … X_n

Discrete time representation
The sampled version of V_in(t) is a sequence of numbers 6, 8, 4, 12, …. This then provides the input to the digital signal processing algorithm.
[Figure: input sequence X_1 X_2 X_3 … enters the digital signal processor, output sequence Y_1 Y_2 Y_3 … leaves it; ∆t = t_sample = 1/f_sample.]
Now what is the processing that goes on to implement the filtering? Using a discrete approximation to the derivative we obtain the discrete time equivalent of the continuous time differential equation:
$$RC \left( \frac{Y_n - Y_{n-1}}{\Delta t} \right) + Y_{n-1} = X_{n-1}$$

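A computational structure
This can be rewritten as:
$$Y_n = \left( 1 - \frac{\Delta t}{RC} \right) Y_{n-1} + \left( \frac{\Delta t}{RC} \right) X_{n-1} = \alpha Y_{n-1} + \beta X_{n-1}$$
with α = 1 - ∆t/RC and β = ∆t/RC. Since the new sample is only a function of past samples, it can be computed using the following procedure:
[Figure: the input sample is multiplied by β, the delayed output Y_{n-1} is multiplied by α, and the two products are summed (Σ) to give Y_n, which feeds the delay.]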
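The update rule on this slide, Y_n = α Y_{n-1} + β X_{n-1} with α = 1 - ∆t/RC and β = ∆t/RC, can be run directly in software. The sketch below assumes a zero initial output and uses hypothetical values RC = 1.0 and ∆t = 0.1 for a unit step input; the function name `rc_lowpass` is made up for the example.

```python
def rc_lowpass(x, rc, dt):
    """Discrete-time RC low pass: y[n] = alpha*y[n-1] + beta*x[n-1].

    Assumes the output starts at zero (y[0] = 0).
    """
    beta = dt / rc
    alpha = 1.0 - beta
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        # One multiply for each term and one add: the work per sample period.
        y[n] = alpha * y[n - 1] + beta * x[n - 1]
    return y


# Hypothetical test signal: a unit step sampled 50 times.
step = [1.0] * 50
out = rc_lowpass(step, rc=1.0, dt=0.1)
```

For a unit step the output climbs toward 1 as 1 - α^n, the discrete counterpart of the familiar exponential charging curve of the RC circuit.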

Direct mapping architecture
[Figure: the same structure: input multiplied by β, delayed output Y_{n-1} multiplied by α, summed (Σ) to give Y_n.]
• These calculations need to be finished within every sample period, since Y_n depends on Y_{n-1} and new data is continuously arriving => hard real time requirement
• In each sample period there are two multiplies and one accumulate
• We could map this structure directly into hardware: the delay becomes a pipeline register, and we would need two multipliers and an adder. This is the most direct approach, with almost no control, but also no flexibility

Filter structures
[Figure: filter structures.]

Mapping of the filter onto a DSP execution unit
[Figure: the filter block diagram (Σ, multipliers α and β, delays D) with labels 1 to 6 mapping its operations onto the execution unit.]
• The critical hardware unit in a DSP is the multiplier - much of the architecture is organized around allowing use of the multiplier on every cycle
• This means providing two operands on every cycle, through multiple data and address busses, multiple address units and local accumulator feedback

IIR and FIR filters
• Infinite Impulse Response (IIR) filter - has a feedback loop and the response to an impulse goes on forever
[Figure: the feedback structure with Σ, β, α and delay D producing Y_n from Y_{n-1}.]
• The impulse response completely characterizes the filter response, so a more direct (purely digital) approach is the finite impulse response filter or FIR
[Figure: an impulse input (… 0 0 0 1 0 0 0 …) produces the output samples h_1, h_2, h_3, h_4, h_5.]
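The statement that the impulse response characterizes an FIR filter can be checked with a few lines of code: feeding a single 1 followed by zeros through the filter returns the coefficients themselves, after which the output is exactly zero. This is a sketch only; the five coefficient values are hypothetical and `fir_filter` is a name chosen for the example.

```python
def fir_filter(x, h):
    """FIR filter: y[n] = sum over k of h[k] * x[n-k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y


h = [0.1, 0.2, 0.4, 0.2, 0.1]   # hypothetical taps h_1 .. h_5
impulse = [1.0] + [0.0] * 7     # a single 1 followed by zeros
response = fir_filter(impulse, h)
```

Each output sample costs one multiply-accumulate per tap, which is why a 128 stage filter leans so heavily on single cycle MAC hardware.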

FIR filter frequency response
[Figure: frequency responses of a 15 stage and a 128 stage FIR filter.]
• FIR filters are a very general structure and form the basis of much more sophisticated processing, e.g. the adaptive filters which make 56 kbit/s modems possible

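Transformations result in different critical paths for direct map architectures
[Figure, top: X passes through a chain of four delays (D); the taps are multiplied by h_1 … h_5 (the MAC computations) and summed by a chain of four adders (Σ) to give Y.]
Critical path = 4 adders + multiply
[Figure, bottom: X is multiplied by h_1 … h_5 in parallel; the products are summed by adders (Σ) separated by delays (D) to give Y.]
Critical path = 1 adder + multiply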
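The two structures on this slide compute the same output; only the placement of the delays, and therefore the hardware critical path, differs. The sketch below models both in software to show the equivalence. It assumes zero initial state and at least two taps, and the coefficient and input values are hypothetical.

```python
def fir_direct(x, h):
    """Direct form: delay line on the input, then a chain of adders."""
    delay = [0.0] * len(h)            # delay[k] holds x[n-k]
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0.0
        for coeff, tap in zip(h, delay):
            acc += coeff * tap        # in hardware: adders in series
        y.append(acc)
    return y


def fir_transposed(x, h):
    """Transposed form: a register (delay) sits between every pair of adders."""
    state = [0.0] * (len(h) - 1)      # registers between the adders
    y = []
    for sample in x:
        products = [coeff * sample for coeff in h]
        out = products[0] + state[0]  # only one add after the multiply
        # Each register takes its product plus the next register's old value.
        new_state = [products[k + 1] + state[k + 1]
                     for k in range(len(state) - 1)]
        new_state.append(products[-1])
        state = new_state
        y.append(out)
    return y


h = [1.0, 2.0, 3.0, 2.0, 1.0]                     # hypothetical taps
x = [1.0, 0.0, 2.0, -1.0, 3.0, 0.0, 0.0, 0.0]     # hypothetical input
y_direct = fir_direct(x, h)
y_transposed = fir_transposed(x, h)
```

In the transposed form every output of an adder goes straight into a register, so the longest combinational path is one multiply plus one add regardless of the number of taps.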
