Slides taken from: TMS320C54x DSP Design Workshop, Module 1: Introduction and Overview
[Figure: 'C5 block diagram]
DSP: Sum-of-Products. $y = \sum_{n=1}^{100} a_n x_n$. Each term $a_n x_n$ is formed by a multiply (MPY) and accumulated into $y$ by an add (ADD).
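The sum-of-products can be sketched in C as a multiply-then-accumulate loop. This is a minimal illustration, not TI code: the function name `sum_of_products`, the 16-bit input type, and the 64-bit accumulator (standing in for the 'C54x 40-bit accumulator) are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: y = sum over n of a[n] * x[n].
 * One multiply (MPY) and one add (ADD) per term, as on the slide.
 * int64_t is used as the nearest portable stand-in for a 40-bit accumulator. */
int64_t sum_of_products(const int16_t *a, const int16_t *x, size_t n_terms)
{
    int64_t y = 0;
    for (size_t n = 0; n < n_terms; ++n) {
        int32_t product = (int32_t)a[n] * (int32_t)x[n]; /* MPY */
        y += product;                                     /* ADD */
    }
    return y;
}
```

With `n_terms = 100` this computes exactly the 100-term sum in the equation; C arrays are 0-based, so `a[0]` plays the role of $a_1$.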
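MAC Unit Details: the multiplier (MPY) draws its inputs from the D, C, and P buses, the T register, and accumulator A. Each multiplier input can be treated as signed or unsigned (s/u), and the FRCT bit selects fractional mode. The adder (ADD) combines the product with accumulator A, accumulator B, or 0, and the result is written to acc A or acc B. Example instruction: MAC *AR2+, *AR3+, A
Legend: D = Data Bus, C = Coefficient Bus, P = Program Bus, A = A accumulator, B = B accumulator, T = Temporary register, s/u = signed/unsigned, FRCT = Fractional mode bit.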
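The MAC datapath can be modeled roughly in C. The following is a sketch under stated assumptions. `mac40`, `wrap40`, and `mac_step` are hypothetical names. The 40-bit accumulator is emulated in an `int64_t` that simply wraps at 40 bits, so overflow saturation is not modeled. Fractional mode (FRCT = 1) is modeled as doubling the product, so two Q15 operands give a Q31 result. AR2 and AR3 are modeled as C pointers that post-increment, mimicking `*AR2+` and `*AR3+`.

```c
#include <stdint.h>

/* Keep the low 40 bits of v and sign-extend from bit 39
 * (hypothetical model of a 40-bit accumulator, no saturation). */
static int64_t wrap40(int64_t v)
{
    const uint64_t mask = (UINT64_C(1) << 40) - 1;
    uint64_t u = (uint64_t)v & mask;               /* low 40 bits */
    if (u & (UINT64_C(1) << 39))                   /* bit 39 = sign bit */
        return (int64_t)u - ((int64_t)1 << 40);
    return (int64_t)u;
}

/* acc + x*y; when frct != 0 the product is shifted left by one
 * (Q15 x Q15 -> Q31), as the FRCT bit does. */
int64_t mac40(int64_t acc, int16_t x, int16_t y, int frct)
{
    int64_t product = (int64_t)x * (int64_t)y;     /* MPY */
    if (frct)
        product *= 2;                              /* fractional-mode shift */
    return wrap40(acc + product);                  /* ADD into accumulator */
}

/* Hypothetical model of "MAC *AR2+, *AR3+, A": read both operands,
 * accumulate into A, then post-increment both address pointers. */
void mac_step(int64_t *acc_a, const int16_t **ar2, const int16_t **ar3, int frct)
{
    *acc_a = mac40(*acc_a, **ar2, **ar3, frct);
    ++*ar2;
    ++*ar3;
}
```

For example, with FRCT = 1, `mac40(0, 16384, 16384, 1)` multiplies 0.5 by 0.5 in Q15 and yields `0x20000000`, which is 0.25 in Q31.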
'C54x Buses [Figure: the P, C, D, and E buses connect internal and external memory, through multiplexers (MUX), to the MAC, ALU, and shifter; example instruction MAC *AR2+, *AR3+, A]
Pipeline - Concept: F (Fetch): get instruction from memory; D (Decode): schedule activity; R (Read): get operand from memory; X (Execute): perform operation.
Fetch and Read - Memory Interaction: each is broken into two phases, (1) calculate the address and (2) collect the data or instruction. This allows more time for the memory interface.
'C54x Pipeline - Enhanced: P (Prefetch): calculate address of instruction; F (Fetch): collect instruction; D (Decode): interpret instruction; A (Access): calculate address of operand; R (Read): collect operand; X (Execute): perform operation.
Memory Write: used when storing results back to memory. It has two phases, address setup and data write, which are overlaid onto the R and X phases. This gives the best balance of processor loading, speed, and cost.
[Figure: 'C5 kernel of an FIR routine]
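The slide's kernel itself is not reproduced here. As a stand-in, a generic FIR step in C might look like the sketch below. `fir_state`, `fir_step`, and the 4-tap length `FIR_TAPS` are hypothetical. The modulo index imitates a circular delay-line buffer in plain C; it is not a model of any specific 'C54x addressing mode.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 4                 /* hypothetical filter length */

typedef struct {
    int16_t coeff[FIR_TAPS];       /* a_0 .. a_{N-1} */
    int16_t delay[FIR_TAPS];       /* circular buffer of past inputs */
    size_t  head;                  /* index of the newest sample */
} fir_state;

/* Insert one new input sample and return y[k] = sum_n coeff[n] * x[k-n]. */
int64_t fir_step(fir_state *f, int16_t sample)
{
    f->head = (f->head + 1) % FIR_TAPS;            /* advance circular pointer */
    f->delay[f->head] = sample;                    /* overwrite oldest sample */

    int64_t acc = 0;
    size_t idx = f->head;
    for (size_t n = 0; n < FIR_TAPS; ++n) {
        acc += (int32_t)f->coeff[n] * f->delay[idx];   /* one MAC per tap */
        idx = (idx == 0) ? FIR_TAPS - 1 : idx - 1;     /* step back, wrapping */
    }
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients in order (the impulse response), which is a quick way to check the indexing.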
'C54x Pipeline Events: P: drive address of instruction (PA); F: collect instruction (PD); D: interpret instruction, plan job (controller); A: set up pointers, calculate data address (DA/CA); R: collect operand (DD/CD), calculate write address (EA); X: execute operation (*, +), send result (ED).
'C54x Pipeline Hardware: P: PC, PA; F: program memory, PD; D: controller; A: ARs, DA + CA, ARAUs; R: AR, ARAU, EA, data memory, DD + CD; X: CALU (MAC, ALU), ED, data memory. CALU = Combined Arithmetic Logic Unit (MAC + ALU).
'C54x Components and Bus Usage [Figure: PC, ARs, and controller (CNTL) alongside the P, C, D, and E buses, internal and external memory with multiplexers (MUX), and the MAC, ALU, and shifter]
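Pipeline Performance: instruction 1 occupies P1, F1, D1, A1, R1, X1 in six consecutive cycles; instruction 2 follows one cycle behind (P2 through X2), and so on up to instruction 6. In the sixth cycle all six stages are busy (FULLY LOADED 'PIPE'), and from then on one instruction completes X per cycle.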
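The fully loaded pipe can be reproduced with a small C sketch that assumes one instruction enters P per cycle and no stalls. `stage_at` and `print_pipeline` are hypothetical helpers, not part of any TI tool.

```c
#include <stdio.h>

static const char *const STAGES[] = { "P", "F", "D", "A", "R", "X" };
enum { NUM_STAGES = 6 };

/* Stage held by instruction `instr` (1-based) during clock `cycle` (1-based),
 * assuming instruction i enters P in cycle i and nothing stalls.
 * Returns NULL when the instruction is not in the pipe. */
const char *stage_at(int instr, int cycle)
{
    int s = cycle - instr;
    return (s >= 0 && s < NUM_STAGES) ? STAGES[s] : NULL;
}

/* Print one row per instruction, one column per cycle. */
void print_pipeline(int n_instr, int n_cycles)
{
    for (int i = 1; i <= n_instr; ++i) {
        for (int c = 1; c <= n_cycles; ++c) {
            const char *st = stage_at(i, c);
            if (st)
                printf(" %s%d", st, i);
            else
                printf("   ");
        }
        putchar('\n');
    }
}
```

Calling `print_pipeline(6, 11)` prints the staircase from P1 to X6. In cycle 6 every one of the six instructions is in a different stage.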
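Pipeline Conflicts - External Memory: when both program (P) and data (D) reside in external memory, instruction fetches and operand accesses compete for the external memory interface. In the timing diagram, instructions 1 to 3 flow normally, but instruction 4 stalls between P4 and F4, and instructions 5 and 6 are delayed (stall cycles shown as "--").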
Pipeline Flow: Internal and External Memories [Figure: two 'C54x configurations with program (P) and data (D) in separate memories, marked "? or"]. I'm not sure this works. In the timing diagram, instructions 1 to 6 flow through P, F, D, A, R, X with no stalls: NO CONFLICT.
Pipeline: Internal Memory Only [Figure: 'C54x on-chip ROM, DARAM, and RAM blocks (sizes shown include 2K, 4K, and 8K) connected to the P, D, and C buses and to the MAC and ALU]. ROM and RAM: one access per block per cycle. DARAM: two accesses per block per cycle.
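The per-block access rule can be checked with a sketch like the following. The single-access "RAM" blocks are labeled `MEM_SARAM` here. The names `mem_kind` and `cycle_fits` are hypothetical. The function only reports whether one cycle's accesses over-subscribe a block; it does not model the resulting stall.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { MEM_ROM, MEM_SARAM, MEM_DARAM } mem_kind;   /* hypothetical */

#define MAX_BLOCKS 16

/* ROM and single-access RAM: 1 access per block per cycle; DARAM: 2. */
static int accesses_per_cycle(mem_kind k)
{
    return k == MEM_DARAM ? 2 : 1;
}

/* blocks[i] is the kind of memory block i; access[j] is the block index
 * touched by the j-th access in this cycle. Returns true if no block is
 * accessed more often than it allows. */
bool cycle_fits(const mem_kind *blocks, size_t n_blocks,
                const size_t *access, size_t n_access)
{
    if (n_blocks > MAX_BLOCKS)
        return false;
    int used[MAX_BLOCKS] = {0};
    for (size_t j = 0; j < n_access; ++j) {
        size_t b = access[j];
        if (b >= n_blocks)
            return false;                                 /* no such block */
        if (++used[b] > accesses_per_cycle(blocks[b]))
            return false;                                 /* block conflict */
    }
    return true;
}
```

Two reads from the same DARAM block fit in one cycle, but two reads from the same single-access block do not. This is why operand placement across blocks matters.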
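'C54x Memory Mix (sizes in K words; omitted columns are empty on the slide):
'C541: DARAM 5, ROM 28, DROM 8
'C542: DARAM 10, ROM 2
'C543: DARAM 10, ROM 2
'C544: DARAM 4, ROM 24, DROM 8
'C545: DARAM 6, ROM 48, DROM 16
'C546: DARAM 6, ROM 48, DROM 16
'C549: DARAM 8, SARAM 24, ROM 16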
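'C54x Peripheral Mix (number of each; SER = standard serial port, TDM = time-division multiplexed serial port, BSP = buffered serial port, HPI = host port interface):
'C541: SER 2
'C542: TDM 1, BSP 1, HPI 1
'C543: TDM 1, BSP 1
'C544: SER 2
'C545: SER 1, BSP 1, HPI 1
'C546: SER 1, BSP 1
'C549: TDM 1, BSP 2, HPI 1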
'C54x Review - System
Four buses allow 1 fetch, 2 reads, and 1 write each cycle.
Built from and for cDSP: fast-growing family; easy to modify for custom use.
Attributes: static design (low power; any clock below maximum); low $/MIPS; fast, dense instructions; small size for functionality; LC version for 3 V operation, VC version for 2.5/3.3 V operation.
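The four-bus budget can be expressed as a simple check. The fetch is mapped to the P bus, the reads to the C and D buses, and the write to the E bus, following the bus names used earlier. `bus_demand`, `fits_one_cycle`, and `min_cycles` are hypothetical. `min_cycles` is only a lower bound that ignores pipeline fill and memory-block conflicts.

```c
#include <stdbool.h>

/* Per-cycle budget: 1 fetch (P bus), 2 reads (C and D buses), 1 write (E bus). */
enum { FETCHES_PER_CYCLE = 1, READS_PER_CYCLE = 2, WRITES_PER_CYCLE = 1 };

typedef struct {
    int fetches;   /* program-memory fetches */
    int reads;     /* data-memory operand reads */
    int writes;    /* data-memory writes */
} bus_demand;      /* hypothetical type */

/* True if one cycle's demand fits within the four buses. */
bool fits_one_cycle(bus_demand d)
{
    return d.fetches <= FETCHES_PER_CYCLE &&
           d.reads   <= READS_PER_CYCLE &&
           d.writes  <= WRITES_PER_CYCLE;
}

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Lower bound on cycles for a total demand: the busiest bus type sets it. */
int min_cycles(bus_demand d)
{
    int f = ceil_div(d.fetches, FETCHES_PER_CYCLE);
    int r = ceil_div(d.reads, READS_PER_CYCLE);
    int w = ceil_div(d.writes, WRITES_PER_CYCLE);
    int m = f > r ? f : r;
    return m > w ? m : w;
}
```

For example, `MAC *AR2+, *AR3+, A` needs one fetch and two operand reads. So `fits_one_cycle((bus_demand){1, 2, 0})` returns true, while a third read in the same cycle would not fit.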
Recommendations
More recommendations