Slides taken from: TMS320C54x DSP Design Workshop Module 1
Introduction and Overview
1 - 2
C5 block diagram
1 - 3
DSP: Sum-of-Products
$$ y = \sum_{n=1}^{100} a_n x_n $$
Data flow: x, a -> MPY -> ADD -> y
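The sum-of-products above can be sketched as a loop that performs one multiply (MPY) and one add (ADD) per term. This is only an illustration in Python, not 'C54x code; the function name `sum_of_products` and the sample values are assumptions made for the example.

```python
def sum_of_products(a, x):
    """Return y = sum of a[n] * x[n], one multiply and one add per term."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    acc = 0
    for a_n, x_n in zip(a, x):
        product = a_n * x_n  # MPY stage
        acc += product       # ADD stage
    return acc


# Hypothetical coefficients and samples, chosen only for illustration.
coeffs = [1, 2, 3]
samples = [4, 5, 6]
y = sum_of_products(coeffs, samples)
```

A MAC unit fuses the two stages so that each term costs one instruction cycle.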
1 - 4
MAC Unit Details
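[Figure: MAC unit block diagram with MPY and ADD stages; multiplier inputs selected from T, D, A, P and C with s/u control, the FRCT bit, and accumulators acc A and acc B.]
Example: MAC *AR2+, *AR3+, A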
D = Data Bus
C = Coefficient Bus
P = Program Bus
A = A accumulator
B = B accumulator
T = Temporary register
s/u = signed/unsigned
FRCT = Fractional mode bit
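A rough model of one multiply-accumulate step, including the effect of the FRCT bit, might look like the sketch below. It assumes signed 16-bit operands only (the s/u selection is not modeled) and uses unbounded Python integers, so accumulator width and saturation are ignored. The helper names `to_signed16` and `mac` are hypothetical.

```python
def to_signed16(word):
    """Interpret the low 16 bits of word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mac(acc, t_operand, d_operand, frct=False):
    """Return acc + t_operand * d_operand for signed 16-bit operands.

    With frct=True the product is shifted left by one bit, which removes
    the duplicate sign bit that a Q15 x Q15 multiply produces.
    """
    product = to_signed16(t_operand) * to_signed16(d_operand)
    if frct:
        product <<= 1
    return acc + product


# 0.5 * 0.5 in Q15 (0x4000 represents 0.5); with FRCT the result is Q31.
q31_result = mac(0, 0x4000, 0x4000, frct=True)
```

With `frct=True` the result 0x20000000 reads as 0.25 in Q31; without it the same product is 0x10000000.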
1 - 5
'C54x Buses
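[Figure: 'C54x bus structure. The P, D, C and E buses link internal memory and, through MUXes, external memory to the ALU, SHIFT, MAC, T register and accumulators A and B.]
Example: MAC *AR2+, *AR3+, A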
1 - 6
Pipeline - Concept
F  Fetch    Get instruction from memory
D  Decode   Schedule activity
R  Read     Get operand from memory
X  Execute  Perform operation
1 - 7
Fetch and Read - Memory Interaction
◆ Broken into two phases:
- 1. Calculate address
- 2. Collect data or instruction
◆ Allows more time for memory interface.
1 - 8
'C54x Pipeline - Enhanced
P  Prefetch  Calculate address of instruction
F  Fetch     Collect instruction
D  Decode    Interpret instruction
A  Access    Calculate address of operand
R  Read      Collect operand
X  Execute   Perform operation
1 - 9
Memory Write
◆ When storing results back to memory
◆ Two phases
  ⚫ Address set up
  ⚫ Data written
◆ Overlaid onto R + X phases
◆ Best balance of:
  ⚫ Processor loading
  ⚫ Speed
  ⚫ Cost
1 - 10
'C5: kernel of a FIR routine
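As a rough illustration of what a FIR kernel computes (this is a Python sketch, not the slide's 'C5 code), each output sample is a sum of products between the coefficients and a delay line of recent inputs. The names `fir_kernel` and `fir_filter`, and the shift-register delay line, are assumptions made for the example; a DSP would typically use circular addressing rather than copying the buffer.

```python
def fir_kernel(coeffs, delay_line):
    """Inner loop of the filter: one multiply-accumulate per tap."""
    acc = 0
    for a_n, x_n in zip(coeffs, delay_line):
        acc += a_n * x_n
    return acc


def fir_filter(coeffs, samples):
    """Run the kernel once per input sample, newest sample first in the delay line."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for sample in samples:
        # Shift the delay line by one and insert the newest sample at the front.
        delay_line = [sample] + delay_line[:-1]
        outputs.append(fir_kernel(coeffs, delay_line))
    return outputs


# Feeding a unit impulse returns the coefficients themselves.
impulse_response = fir_filter([1, 2, 3], [1, 0, 0, 0])
```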
1 - 11
'C54x Pipeline Events
P  Drive address of instruction          PA
F  Collect instruction                   PD
D  Interpret instruction, plan job       ctlr
A  Set up pointers, calc data address    DA/CA
R  Collect operand                       DD/CD
X  Execute operation                     *,+
Write: calculate write address (EA), send result (ED)
1 - 12
'C54x Pipeline Hardware
P  PC, PA
F  Program Mem, PD
D  Controller
A  ARs, ARAUs, DA + CA
R  Data Mem, DD + CD
X  CALU (MAC, ALU)
Write: AR, ARAU, EA; ED, Data Mem
CALU = Combined Arithmetic Logic Unit (MAC +ALU)
1 - 13
'C54x Components and Bus Usage
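[Figure: 'C54x components and bus usage. PC, CNTL, ARs, ALU, SHIFT, MAC, T register and accumulators A and B connect over the P, C, D and E buses to internal memory and, through MUXes, to external memory.]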
1 - 14
Pipeline Performance
TIME
P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6
FULLY LOADED 'PIPE'
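The overlap in a stall-free six-stage pipeline can be sketched with a small generator that lists which stage of which instruction is active in each cycle. It assumes one instruction issued per cycle and no conflicts; the function name `pipeline_columns` is hypothetical.

```python
STAGES = ["P", "F", "D", "A", "R", "X"]


def pipeline_columns(n_instructions, stages=STAGES):
    """Return, for each cycle, the stage labels active in that cycle (no stalls)."""
    n_cycles = n_instructions + len(stages) - 1
    columns = []
    for cycle in range(n_cycles):
        active = []
        for instr in range(n_instructions):
            stage_index = cycle - instr  # instruction i enters the pipe at cycle i
            if 0 <= stage_index < len(stages):
                active.append(f"{stages[stage_index]}{instr + 1}")
        columns.append(active)
    return columns


for cycle, active in enumerate(pipeline_columns(6), start=1):
    print(cycle, " ".join(active))
```

Under these assumptions, six instructions finish in 6 + 6 - 1 = 11 cycles, and the sixth cycle is the first in which every stage is occupied.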
1 - 15
Pipeline Conflicts - External Memory
[Figure: pipeline timing diagram for instructions 1 to 6 with program (P) and data (D) both in memory external to the '54x; stall slots ("-") appear alongside R1, R2, R3 and F4.]
1 - 16
Pipeline Flow: Internal and External Memories
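Configurations shown: P 54x D and D 54x P (program and data in separate memories, one internal and one external)
P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6
NO CONFLICT
Note on slide: "I'm not sure this works?"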
1 - 17
Pipeline: Internal Memory Only
[Figure: 'C54x ALU and MAC connected over the P, D and C buses to internal memory blocks: ROM and RAM (4K and 8K blocks shown) and DARAM (2K blocks shown).]
DARAM: Two accesses per block per cycle
ROM & RAM: One access per block per cycle
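The per-block access limits above can be sketched as a simple conflict check: count how many accesses land on each block in one cycle and compare against the limit for that block type. The block names, the `has_conflict` function and the dictionary layout are assumptions made for this example.

```python
# Accesses allowed per block per cycle, as stated on the slide.
ACCESSES_PER_CYCLE = {"DARAM": 2, "RAM": 1, "ROM": 1}


def has_conflict(accesses, block_types):
    """Return True if any block is accessed more often than its type allows.

    accesses: list of block names touched in a single cycle.
    block_types: mapping from block name to "DARAM", "RAM" or "ROM".
    """
    counts = {}
    for block in accesses:
        counts[block] = counts.get(block, 0) + 1
    return any(
        n > ACCESSES_PER_CYCLE[block_types[block]] for block, n in counts.items()
    )


# Hypothetical memory map with one DARAM block and one ROM block.
blocks = {"daram0": "DARAM", "rom0": "ROM"}
two_reads_same_daram = has_conflict(["daram0", "daram0"], blocks)
two_reads_same_rom = has_conflict(["rom0", "rom0"], blocks)
```

Here two accesses to the same DARAM block fit in one cycle, while two accesses to the same ROM block do not.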
1 - 18
'C54x Memory Mix
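Columns: DARAM, SARAM, ROM, DROM (each row lists the C54x device number, then its non-empty entries in column order)
1: 5, 28, 8
2: 10, 2
3: 10, 2
4: 4, 24, 8
5: 6, 48, 16
6: 6, 48, 16
9: 8, 24, 16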
1 - 19
'C54x Peripheral Mix
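Columns: SER, TDM, BSP, HPI (each row lists the C54x device number, then its non-empty entries in column order)
1: 2
2: 1, 1, 1
3: 1, 1
4: 2
5: 1, 1, 1
6: 1, 1
9: 1, 2, 1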
1 - 20
'C54x Review - System
◆ Four buses allow 1 fetch, 2 reads, and 1 write each cycle.
◆ Built from and for cDSP:
  ⚫ Fast growing family
  ⚫ Easy to modify for custom use
◆ Attributes
  ⚫ Static design
    ➢ Low power
    ➢ Any clock below maximum
  ⚫ Low $/MIP
  ⚫ Fast/dense instructions
  ⚫ Small size for functionality
  ⚫ LC version for 3V operation, VC for 2.5/3.3V operation