ACCESS IC LAB
Graduate Institute of Electronics Engineering, NTU
DSP Introduction
Instructor: Prof. An-Yeu Wu
2004/September
1 : SIGN, INDICATION 2a : an act, event, or watchword that has been agreed on as the occasion of concerted action b : something that incites to action
– Continuous time or discrete time
– Continuous valued or discrete valued
– 1-D signals or 2-D signals (different dimensions)
– Real valued or complex valued
– Scalar or vector
– Deterministic or random
– Continuous time & continuous valued: Analog signal
– Discrete time & continuous valued: Sampled signal
– Continuous time & discrete valued: Quantized signal
– Discrete time & discrete valued: Digital signal
2 b (1) : to subject to or handle through an established usually routine set of procedures (2) : to subject to examination or analysis
– Communication: modulation, demodulation
– Signal enhancement: filtering, equalization…
– Spectral analysis: transform…
– Image processing: reconstruction, watermarking...
– Data compression: transform, quantization…
– Security: encryption, decryption
Higher complexity: nonlinear, time-variant systems
Temperature, Pressure, Gravity…
Digital signal processing is the processing of real-world signals (represented as discrete, quantized samples, or naturally digital) using mathematical techniques or algorithmic manipulation to perform transformations or extract information.
Signals in a DSP system are sequences of quantized samples (discrete in both time and value).
– Physical signals are picked up by transducers (e.g., microphones) and then become electrical signals (e.g., voltage).
– Electrical signals are converted to digital signals by the sampling and quantizing performed in analog-to-digital converters (ADCs).
– Digital signals may be recorded, or converted back into analog signals (e.g., voltage) through digital-to-analog converters (DACs).
– Transducers (e.g., speakers) convert electrical signals back into physical signals.
Sampling interval: T
Quantize: $Q(y) = \varepsilon \cdot \left\lfloor \dfrac{y}{\varepsilon} \right\rfloor$
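The two steps of analog-to-digital conversion, sampling at interval T and quantizing with a floor quantizer of step ε, can be sketched as follows. This is a minimal illustration only: the 1 kHz sine, the 8 kHz sampling rate, and the step of 0.25 are assumed values, and the function names are hypothetical.

```python
import math

def sample(signal, T, count):
    """Sample a continuous-time function at t = 0, T, 2T, ..."""
    return [signal(n * T) for n in range(count)]

def quantize(y, eps):
    """Uniform floor quantizer: Q(y) = eps * floor(y / eps)."""
    return eps * math.floor(y / eps)

# Assumed example: 1 kHz sine, sampled at 8 kHz, quantized with step 0.25
T = 1.0 / 8000.0
analog = lambda t: math.sin(2 * math.pi * 1000.0 * t)
sampled = sample(analog, T, 8)                   # discrete time, continuous value
digital = [quantize(y, 0.25) for y in sampled]   # discrete time, discrete value
```

With a floor quantizer, each quantized value lies at most one step ε below the sample it came from.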
– Perfect reproduction without error, and perfect duplication of processing results.
– Accuracy of a digital signal representation can be controlled by changing the word length of the signal.
– Digital signals can be stored and recovered, transmitted and received, processed and manipulated, all virtually without error.
– Complicated or sophisticated DSP techniques can be easily applied to the target signal.
– Faster system design and verification in every development cycle.
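To make the link between word length and accuracy concrete, the sketch below measures the signal-to-quantization-noise ratio (SQNR) of a full-scale sine for a given number of bits. The test signal, its frequency, and the function name are assumptions chosen for illustration. The commonly quoted rule of thumb is roughly 6 dB of SQNR per additional bit.

```python
import math

def sqnr_db(bits, n_samples=10000):
    """Measured SQNR (in dB) of a full-scale sine quantized to `bits` bits."""
    step = 2.0 / (2 ** bits)          # step size when 2^bits levels span [-1, 1)
    sig_pow = 0.0
    err_pow = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * 0.01234 * n)   # assumed test signal
        q = round(x / step) * step                # round-to-nearest quantizer
        sig_pow += x * x
        err_pow += (q - x) ** 2
    return 10 * math.log10(sig_pow / err_pow)
```

Comparing `sqnr_db(8)` with `sqnr_db(12)` shows how four extra bits of word length buy roughly 24 dB of accuracy.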
– Communications
– Audio and video processing
– Graphics, image enhancement, rendering
– Navigation, radar, GPS
– Control: robotics, machine vision, guidance
– Frequency domain filtering: FIR, IIR
– Frequency-time transformations: FFT, DCT
– Correlation
Spatial domain Frequency domain Quantize--Dequantize
General purpose, high performance
– Pentiums, Alphas, SPARC
– Used for general purpose software
– Heavyweight OS: UNIX, NT
– Workstations, PCs
Embedded processors and processor cores
– ARM, 486SX, Hitachi SH7000, NEC V800
– Single program
– Lightweight OS: eCos, uLinux, …
– Need DSP processor support in signal-processing-oriented applications
– Cellular phones, consumer electronics (e.g., CD players)
Microcontrollers
– Extremely cost sensitive
– Single program; an OS is usually unnecessary
– Small word size: 8 bit common
– Automobiles, toasters, thermostats, ...
ADI Blackfin processor, TI TMS320CX processor…
Pentium CPU, ARM
FFT processor, Equalizer
ADC, DAC
– Field-Programmable Gate Arrays (FPGAs) can be reconfigured within a system.
– Fast prototyping and development.
– Offer greater raw performance per specific operation because the resulting logic circuit is dedicated to that operation.
– FPGAs are significantly more expensive and typically have much higher power dissipation than DSPs with similar functionality.
– When FPGAs are the chosen performance technology in a design, DSPs are typically used in conjunction with them to provide greater flexibility, better price/performance ratios, and lower system power.
– In contrast to ASICs, which are optimized for specific functions, general-purpose microprocessors (GPPs) are best suited for performing a broad array of tasks.
– High-performance GPPs, such as the CPU in a desktop PC, are usually too expensive for many DSP applications.
– Low-cost GPPs are ruled out of many DSP applications by their comparatively poor real-time performance and high power consumption.
– In many systems, such as cell phones, GPPs now play the role of system controller rather than algorithm-computation unit.
Power for one tap computation
Programmable DSPs come in two flavors: fixed point and floating point.
Floating-point DSP:
– Expensive, longer instruction cycle
– Large signal dynamic range
– Adopted in very precision-sensitive cases
– Applications: communication infrastructure, medical imaging systems, military weapons
Fixed-point DSP:
– Cheaper, shorter instruction cycle
– Less signal dynamic range: constrained by word length
– Possibility of overflow
– Applications: multimedia
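The word-length constraint and the overflow risk of fixed-point arithmetic can be illustrated with a 16-bit fractional format. The sketch below assumes the common Q15 convention (1 sign bit, 15 fractional bits), which the slides do not name; all function names are hypothetical.

```python
def to_q15(x):
    """Convert a real number to a 16-bit Q15 integer, clamping to the valid range."""
    v = int(round(x * 32768))
    return max(-32768, min(32767, v))

def from_q15(v):
    """Convert a Q15 integer back to a real number."""
    return v / 32768.0

def q15_mul(a, b):
    """Multiply two Q15 values; the product is Q30, so shift right by 15."""
    return (a * b) >> 15

def wrap16(v):
    """Model a 16-bit register that wraps around on overflow."""
    return ((v + 32768) & 0xFFFF) - 32768

# 0.75 + 0.75 does not fit in [-1, 1): a wrapping register yields -0.5
overflowed = from_q15(wrap16(to_q15(0.75) + to_q15(0.75)))
```

The representable range is fixed by the word length, so a sum that leaves [-1, 1) silently changes sign unless the hardware saturates or the programmer scales the data.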
– M most recent samples in the delay line: x(i)
– New sample moves data down the delay line
– A “tap” is a multiply-add
– Each tap (M+1 taps total) nominally requires:
   – Two data fetches
   – Multiply
   – Accumulate
   – Memory write-back to update the delay line
– Goal: 1 FIR tap / DSP instruction cycle
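The per-tap work listed above (two fetches, a multiply, an accumulate, and a delay-line update) can be sketched in software as follows. This is an illustrative direct-form FIR, not code from the course; the function names are hypothetical.

```python
def fir_step(delay_line, coeffs, new_sample):
    """Process one input sample through a direct-form FIR filter.

    delay_line[i] holds x(n - i) and is updated in place.
    """
    # The new sample moves the older data down the delay line
    delay_line.insert(0, new_sample)
    delay_line.pop()
    acc = 0.0
    for c, x in zip(coeffs, delay_line):   # two data fetches per tap
        acc += c * x                       # one tap: multiply and accumulate
    return acc

def fir_filter(coeffs, samples):
    """Filter a whole sequence, starting from an all-zero delay line."""
    delay_line = [0.0] * len(coeffs)
    return [fir_step(delay_line, coeffs, s) for s in samples]
```

Feeding a unit impulse through `fir_filter` returns the coefficients themselves, which is a quick sanity check for any FIR implementation.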
$$y(n) = \sum_{i=0}^{N-1} c(i)\, x(n-i)$$
On a von Neumann machine, the instructions are executed sequentially, row by row.
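loop: lw  x0, 0(r0)    ; load first operand
      lw  y0, 0(r1)    ; load second operand
      mul a, x0, y0    ; a = x0 * y0
      add y0, a, b     ; y0 = a + b
      sw  y0, (r2)     ; store y0
      inc r0           ; advance the three pointers
      inc r1
      inc r2
      dec ctr          ; decrement the loop counter
      tst ctr
      jnz loop         ; repeat until ctr = 0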
To complete one tap per instruction cycle, four memory accesses execute in parallel:
1. Read MAC instruction
2. Read data value x from memory
3. Read coefficient c from memory
4. Write data value to next location in the delay line
– Separate program and data memory spaces
– Usually refers to separate program and data buses
Datapath (figure): Mem, T-Register, Multiplier, P-Register, ALU, Accumulator
LT  X4    ; Load T with x(n-4)
MPY H4    ; P = H4*X4
LTD X3    ; Load T with x(n-3); x(n-4) = x(n-3); Acc = Acc + P
MPY H3    ; P = H3*X3
LTD X2
MPY H2
...
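To see how the LT/MPY/LTD sequence computes a FIR output while shifting the delay line, the register behaviour can be modelled in a few lines. This is a simplified sketch following the comments on the instructions above. The class and function names are hypothetical, and the closing `apac` step (adding the last product to the accumulator) is an assumption added so the model produces the complete sum.

```python
class RegisterModel:
    """Simplified model of the T register, P register, and accumulator."""
    def __init__(self, x, h):
        self.x = list(x)   # x[k] holds x(n-k), the delay line in data memory
        self.h = list(h)   # h[k] is coefficient Hk
        self.t = 0
        self.p = 0
        self.acc = 0

    def lt(self, k):       # LT Xk: load T with x(n-k)
        self.t = self.x[k]

    def mpy(self, k):      # MPY Hk: P = T * Hk
        self.p = self.t * self.h[k]

    def ltd(self, k):      # LTD Xk: load T, move x(n-k) down the line, Acc += P
        self.t = self.x[k]
        self.x[k + 1] = self.x[k]
        self.acc += self.p

    def apac(self):        # assumed final step: add the last product
        self.acc += self.p

def fir_tap_sequence(x, h):
    """Run LT/MPY, then LTD/MPY pairs from the oldest tap to the newest."""
    m = RegisterModel(x, h)
    last = len(h) - 1
    m.lt(last)
    m.mpy(last)
    for k in range(last - 1, -1, -1):
        m.ltd(k)
        m.mpy(k)
    m.apac()
    return m.acc, m.x
```

Each LTD does three things in one instruction (load, data move, accumulate), which is how this style of code approaches one tap per two instructions without a separate delay-line update loop.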
The program bus can be used to load coefficients for the MAC.
MACD = Multiply by Program MEM and Accumulate with delay
The Harvard architecture has many modified versions:
– Basic: separate program and data spaces
– Mod. 1: program space contains read-only data
– Mod. 2: use multi-port memory for the data space
– Mod. 3: add a program cache to enhance throughput of a shared program/data memory block
– Mod. 4: use 2 separate data memories for simultaneous instruction/operand fetch
– Mod. 5: use 4 separate data memories and add an I/O-specific memory
The programmer can ignore the Harvard architecture until it becomes necessary to optimize the code. When optimizing with multiple memories, the programmer must carefully arrange data in memory to take advantage of them.
[Figure: number formats. Real numbers and fractions versus integers, each shown with the position of its radix point.]
Set to the most positive value ($2^{N-1}-1$) or the most negative value ($-2^{N-1}$) when overflow is detected.
Arithmetic shift right (shift down), with sign extension
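Saturation and the sign-extending right shift can be sketched as below. The function names are hypothetical, and Python's unbounded integers stand in for the wider intermediate result that the hardware would produce before clamping.

```python
def saturate(value, n_bits):
    """Clamp to the N-bit two's-complement range [-2^(N-1), 2^(N-1) - 1]."""
    most_positive = (1 << (n_bits - 1)) - 1
    most_negative = -(1 << (n_bits - 1))
    return max(most_negative, min(most_positive, value))

def saturating_add(a, b, n_bits=16):
    """Add two values and saturate instead of wrapping on overflow."""
    return saturate(a + b, n_bits)

def arithmetic_shift_right(value, shift):
    """Shift down with sign extension; Python's >> on ints behaves this way."""
    return value >> shift
```

For example, `saturating_add(30000, 10000)` sticks at 32767 rather than wrapping to a negative number, which is usually the less damaging failure mode for audio or control signals.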
Motorola DSP: 24b x 24b => 48b product, 56b Accumulator
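The extra accumulator width acts as headroom (guard bits) for long sums of products. A short sketch, using the 48-bit product and 56-bit accumulator figures quoted above and hypothetical helper names, shows how many worst-case products can be accumulated before overflow becomes possible.

```python
def guard_bits(acc_bits, product_bits):
    """Extra accumulator bits above the product width."""
    return acc_bits - product_bits

def max_safe_accumulations(acc_bits, product_bits):
    """Number of full-range products that can always be summed without overflow."""
    return 2 ** guard_bits(acc_bits, product_bits)

def fits(value, bits):
    """True if value is representable in `bits`-bit two's complement."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

# 56 - 48 = 8 guard bits, so 256 worst-case 48-bit values still fit
worst_positive = 256 * ((1 << 47) - 1)
worst_negative = 256 * (-(1 << 47))
```

This is why an FIR filter of up to 256 taps can run on such a datapath without intermediate scaling or overflow checks.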
[Figure: two datapaths. One with Multiplier, ALU, and Accumulator with guard bits (G); one with Multiplier, Shift, ALU, and Accumulator.]
– Truncation: chop the result => biases results (downward, toward negative infinity, in two's complement)
– Round to nearest: < 1/2 round down, ≥ 1/2 round up (more positive) => smaller bias
– Convergent: < 1/2 round down, > 1/2 round up (more positive), = 1/2 round to make the LSB a zero (+1 if 1, +0 if 0) => no bias
– IEEE 754 calls this round to nearest even
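The three rounding rules can be compared on integers by discarding the low `shift` bits of a two's-complement value. This is an illustrative sketch with hypothetical function names; it assumes `shift` is at least 1.

```python
def truncate(value, shift):
    """Chop the low bits; in two's complement this rounds toward -infinity."""
    return value >> shift

def round_nearest(value, shift):
    """Add half an LSB, then chop; exact ties go up (more positive)."""
    return (value + (1 << (shift - 1))) >> shift

def round_convergent(value, shift):
    """Round to nearest; exact ties go to the result whose LSB is zero."""
    half = 1 << (shift - 1)
    floor = value >> shift
    remainder = value - (floor << shift)   # discarded bits, in [0, 2^shift)
    if remainder > half or (remainder == half and (floor & 1)):
        floor += 1
    return floor
```

With `shift=1`, the value 5 represents 2.5: truncation gives 2, round to nearest gives 3, and convergent rounding gives 2 (the even neighbour). The value 7 (3.5) rounds to 4 under both rounding rules.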
– The pin count of a DSP processor increases dramatically if multiple memory banks are implemented off chip, which requires a more expensive package.
– Physical multi-port memory is also much more expensive.
– Adopt 1~2 additional external buses for off-chip memory space.
– Multiple internal memory banks with a software-controlled paging system and an external page pool.
It can simultaneously access two memory spaces.
– Instruction repeat buffer: do 1 instruction 256 times
– Often uses maskable interrupts (masked during the repeat), thereby increasing interrupt response time
– Even with a cache, the programmer may be allowed to “lock in” instructions into the cache
– Option to turn the cache into fast program memory or data memory
[Figure: memory hierarchies. One with Registers, I/D Cache, TLB, and Physical memory; one with Registers, I Cache, Internal memories, DMA Controller, and External memories.]
TLB: Translation Lookaside Buffer
DMA: Direct Memory Access
– Complex addressing modes are a better choice
– Avoid using the datapath to calculate addresses
– Post-increment: lw r1,0(r2)+ => r1 <- M[r2]; r2 <- r2+1
– Option to update the pointer before addressing, by a positive or negative amount
I/O buffer for data on delay-lines
Use modulo/circular addressing mode for circular buffer
– Convolution
– Correlation
– FIR filters
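Modulo (circular) addressing lets the delay line stay in place while only a pointer moves. The sketch below imitates this in software with a modulo operation on the index; on a DSP the address generator does the wrap in hardware. The class and function names are hypothetical.

```python
class CircularDelayLine:
    """Delay line stored in a fixed buffer with a wrapping write pointer."""
    def __init__(self, length):
        self.buf = [0.0] * length
        self.head = 0                      # index of the newest sample

    def push(self, sample):
        # Modulo update: the pointer wraps; no data is moved
        self.head = (self.head - 1) % len(self.buf)
        self.buf[self.head] = sample

    def tap(self, i):
        """Return x(n - i), the sample pushed i steps ago."""
        return self.buf[(self.head + i) % len(self.buf)]

def fir_output(line, coeffs):
    """Sum of c(i) * x(n - i) over all taps."""
    return sum(c * line.tap(i) for i, c in enumerate(coeffs))
```

Compared with physically shifting every sample, each new input costs one write and one pointer update, regardless of the filter length.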
0 (000) => 0 (000)
1 (001) => 4 (100)
2 (010) => 2 (010)
3 (011) => 6 (110)
4 (100) => 1 (001)
5 (101) => 5 (101)
6 (110) => 3 (011)
7 (111) => 7 (111)
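The mapping above is bit-reversed addressing, used to reorder FFT inputs or outputs. A software equivalent, with a hypothetical function name, reverses the bits of each index:

```python
def bit_reverse(index, n_bits):
    """Reverse the lowest n_bits bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)   # move the lowest bit across
        index >>= 1
    return result

# Reproduce the 3-bit table for an 8-point FFT
table = [bit_reverse(i, 3) for i in range(8)]
```

A DSP address generator produces this sequence directly, so the reordering costs no extra instructions.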
– Loop an instruction or a sequence of instructions a number of times given by an iterator
– No branch instruction is executed for looping
– In many DSP processors, iterator = 0 usually means looping the maximum number of times (infinite looping)
– Interlocking
– Time-stationary coding
– Data-stationary coding
Simultaneously update a0 (accumulate), p (multiply), y (operand through pointer dereference), and x (operand through pointer dereference).
– Timing of a program is clear.
– Very fast interrupts: programmers have explicit control over the pipeline, so there is no need to flush the pipe before invoking the interrupt.
Instructions specify all of the operations performed on a set of
– These instructions specify what happens to data, rather than what happens at a particular time in the hardware.
– Operations specified by neighboring instructions proceed in parallel.
– Data-stationary coding is no less efficient than time-stationary coding.
– Fast interrupts are more difficult in data-stationary coding than in time-stationary coding.
– e.g., AT&T DSP32
r5++ = a1 = a0+ *r7 * *r10 ++r17
– Parallel write to memory and location specified by r5
– Accumulate with value in memory
– Dereference and multiply
– Update pointer for operands
Some problems conspire to make efficient branching difficult to achieve.
– If the program address space is large, the destination address may not fit in an instruction word. More fetches from instruction memory may be required. Alternatives are paging and PC-relative addressing.
– In conditional branching, the fetch of the next instruction cannot
Solutions
– Use delayed branches: fetch more instructions that are independent of the branch and execute them before the branch occurs; or separate data arithmetic instructions from the test by several cycles.
– Design low-overhead looping instructions for tight inner loops, rather than using branch instructions for loops.
– Design conditional instructions to substitute for conditional branches.
– Specialized execution unit and instruction set
– Difficult to program in assembly
– Unfriendly compiler targets
– One instruction per instruction cycle, such as multiply-accumulate and store
– Packed into a single large instruction word / packet
– Instructions may be positional or include routing information within each sub-instruction word
– Each instruction packet may be dispatched to several execution units.
It’s a static superscalar DSP can execute simultaneously from
Combine VLIW with SIMD (single instruction multiple data)
The programmer has the option of directing both computation blocks to operate on the same data (broadcast distribution) or different data (merged distribution). Each computation block can execute four 16-bit or eight 8-bit SIMD computations in parallel.
– More regular execution units
– More instructions executed in parallel than in a traditional DSP with time-stationary coding instructions
– The program sequence tells independent instructions from dependent ones.
– Dispatch is specified at compile time rather than in silicon
– More execution units can be added to the processor core, allowing more sub-instructions to be packed into one VLIW instruction
– The programmer or code-generation tool must keep track of instruction scheduling
– Deep pipelines and long latencies can be confusing, and may make it hard to reach peak performance.
Higher memory bandwidth is required
The “MIPS/MFLOPS” of DSPs is the speed of multiply-accumulate (MAC).
– DSPs are judged by whether they can keep the multipliers busy 100% of the time.
The "SPEC" of DSPs is 4 algorithms:
– Infinite Impulse Response (IIR) filters
– Finite Impulse Response (FIR) filters
– FFT
– Convolution
Algorithm is everything for a DSP processor; software compatibility is not a concern.
– Programmers often write in assembly language to minimize ROM requirements and optimize performance.
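Because MAC rate is the headline figure, a quick budget check is often the first sizing step: an FIR filter needs one MAC per tap per output sample. The sketch below uses hypothetical function names, and the 64-tap filter, 48 kHz sample rate, and 100 million MAC/s processor are assumed numbers for illustration only.

```python
def required_macs_per_second(num_taps, sample_rate_hz):
    """MAC operations per second for a direct-form FIR filter."""
    return num_taps * sample_rate_hz

def multiplier_utilization(required_macs, peak_macs_per_second):
    """Fraction of the peak MAC rate consumed by the workload."""
    return required_macs / peak_macs_per_second

# Assumed example: 64-tap FIR at 48 kHz on a 100 million MAC/s processor
needed = required_macs_per_second(64, 48000)
load = multiplier_utilization(needed, 100e6)
```

This estimate assumes the multiplier is fed on every cycle; any stall for memory access or loop overhead raises the real load above this figure.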