Blackfin Processor Architecture
SLIDE 1

ACCESS IC LAB

Graduate Institute of Electronics Engineering, NTU

Blackfin Processor Architecture

Instructor: Prof. Andy Wu

SLIDE 2

Introduction
Blackfin Processor
Blackfin Processor Product Highlights

SLIDE 3

Introduction
Blackfin Processor
Blackfin Processor Product Highlights

SLIDE 4

The Berkeley RISC design incorporated a Reduced Instruction Set Computer (RISC) architecture with the following key features:

A fixed (32-bit) instruction size with few formats

CISC processors typically had variable-length instruction sets with many formats

A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory

CISC processors typically allowed values in memory to be used as operands in data-processing instructions

A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently

CISC register sets were getting larger, but none was this large, and most had different registers for different purposes

SLIDE 5

Hard-wired instruction decode logic

CISC processors used large microcode ROMs to decode their instructions

Pipelined execution

CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)

Single-cycle execution

CISC processors typically took many clock cycles to complete a single instruction

SLIDE 6

Von Neumann architecture: a single memory space for program and data, with a shared global bus

SLIDE 7

Harvard architecture: separate program and data memory spaces, usually with separate program and data buses

SLIDE 8

The program bus can also be used to load coefficients for the MAC

SLIDE 9

Introduction
Blackfin Processor
Blackfin Processor Product Highlights

SLIDE 10

Made by Analog Devices.

A new breed of embedded media processor designed specifically for today's embedded audio, video and communication applications.

Combines a 32-bit RISC-like instruction set with dual 16-bit multiply-accumulate (MAC) signal processing functionality.

Performs equally well in signal processing and control processing applications, in many cases removing the need for separate heterogeneous processors.

SLIDE 11

SLIDE 12

Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs
Support for 8/16/32-bit integer and 16/32-bit fractional data types
Concurrent fetch of one instruction and two unique data elements
Two loop counters that allow nested zero-overhead looping
A modified Harvard architecture in combination with a hierarchical memory structure
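To make "two data elements per MAC" concrete, here is a minimal C sketch of a Q15 (1.15 fractional) dot product, the inner loop of an FIR filter. Each iteration consumes one coefficient and one sample and performs one multiply-accumulate, which is the access pattern that concurrent dual data fetch and zero-overhead looping are meant to serve. The function name `q15_dot` is hypothetical, and a 64-bit C integer stands in for the 40-bit hardware accumulator (no saturation is modelled).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: Q15 dot product (FIR inner loop).
 * Per tap: fetch one coefficient and one sample, then one MAC. */
int64_t q15_dot(const int16_t *coeff, const int16_t *x, size_t n)
{
    int64_t acc = 0;  /* stands in for a 40-bit accumulator; no saturation */
    for (size_t k = 0; k < n; k++) {
        /* 16 x 16 -> 32-bit product; multiplying by 2 mirrors the
         * fractional-mode left shift, giving a 1.31-format term */
        acc += (int64_t)coeff[k] * x[k] * 2;
    }
    return acc;
}
```

For example, two taps of 0.5 (0x4000) applied to two samples of 0.5 accumulate to 0x40000000, which is 0.5 in 1.31 format.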

SLIDE 13

SLIDE 14

Arbitrary bit and bit-field manipulation, insertion and extraction
Two data address generator (DAG) units with circular and bit-reversed addressing

The data address generator contains two 32-bit address ALUs and an address register file. The address register file consists of six 32-bit general-purpose pointer registers and four sets of 32-bit circular-buffer addressing registers.
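The circular and bit-reversed modes can be modelled in a few lines of C. This is a minimal sketch, not Analog Devices' definition: `circ_update` assumes the usual circular-buffer rule (index I advances by modify M and wraps inside [B, B + L), with |M| < L), and `brev_update` assumes bit-reversed addressing is a reverse-carry addition across the full 32-bit index. Both function names are hypothetical, and addresses are treated as plain integers.

```c
#include <stdint.h>

/* Hypothetical circular post-modify: I advances by M and wraps inside
 * [B, B + L). Assumes |M| < L. */
uint32_t circ_update(uint32_t i, int32_t m, uint32_t b, uint32_t l)
{
    int64_t next = (int64_t)i + m;         /* widen so the sum cannot overflow */
    if (next >= (int64_t)b + l)
        next -= l;                         /* ran past the end: wrap to start */
    else if (next < (int64_t)b)
        next += l;                         /* ran before the start: wrap to end */
    return (uint32_t)next;
}

/* Reverse the bit order of a 32-bit word. */
static uint32_t rev32(uint32_t x)
{
    uint32_t r = 0;
    for (int k = 0; k < 32; k++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Hypothetical bit-reversed post-modify: reverse-carry addition, i.e. the
 * carry propagates from the MSB toward the LSB. */
uint32_t brev_update(uint32_t i, uint32_t m)
{
    return rev32(rev32(i) + rev32(m));
}
```

Working in element units with M = 4 (half of an 8-entry buffer), repeated calls to `brev_update` starting from 0 visit 0, 4, 2, 6, 1, 5, 3, 7, the bit-reversed order used to reorder radix-2 FFT data.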

SLIDE 15

Unified 4 GB memory space
Mixed 16/32-bit instruction encoding for best code density
Memory protection to support OS operation

SLIDE 16

Three modes of operation

User mode

User mode has restricted access to a subset of system resources, thus providing a protected software environment. User mode is considered the domain of application programs.

Supervisor mode and Emulation mode

Supervisor mode and Emulation mode have unrestricted access to the core resources. Supervisor mode and Emulation mode are usually reserved for the kernel code of an operating system.

SLIDE 17

Blackfin Architecture Support (Single Cycle)

The following operations can be processed in parallel in one clock cycle:

Execution of a single instruction operating on both MACs or both ALUs
Execution of 2 x 32-bit data moves (2 reads, or 1 read and 1 write)
Execution of two pointer updates
Execution of hardware loop updates

SLIDE 18

Blackfin Processor Compute Unit

SLIDE 19

BF533 Memory Access

Under the right conditions, 4 memory accesses can happen at the same time: a 64-bit instruction fetch, 2 x 32-bit data loads, and a 32-bit data store. PLUS up to 2 ALU (32-bit) and 2 MAC (16-bit) operations at the same time, PLUS background DMA activity.

SLIDE 20

Compute Unit Architecture

SLIDE 21

Register File

Data Register Syntax

R0, R1, etc. refer to 32-bit registers. R0.L refers to the low 16 bits of the 32-bit R0 register; R0.H refers to the high 16 bits of R0.

Accumulator Syntax

A0.L => low 16 bits; A0.H => next 16 bits; A0.W => least significant 32-bit word; A0.X => most significant 8-bit extension

SHARC: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.

Blackfin: 8 x 32-bit OR 16 x 16-bit data registers, plus 2 x 40-bit accumulators
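The register-half and accumulator-field names map onto plain bit fields. The sketch below assumes the 40-bit accumulator is held in the low 40 bits of a `uint64_t`; all helper names are hypothetical.

```c
#include <stdint.h>

/* Data register halves: Rn.L = bits 15..0, Rn.H = bits 31..16. */
uint16_t reg_l(uint32_t r) { return (uint16_t)(r & 0xFFFFu); }
uint16_t reg_h(uint32_t r) { return (uint16_t)(r >> 16); }

/* 40-bit accumulator, assumed stored in the low 40 bits of a uint64_t. */
uint16_t acc_l(uint64_t a) { return (uint16_t)(a & 0xFFFFu); }         /* An.L: bits 15..0  */
uint16_t acc_h(uint64_t a) { return (uint16_t)((a >> 16) & 0xFFFFu); } /* An.H: bits 31..16 */
uint32_t acc_w(uint64_t a) { return (uint32_t)(a & 0xFFFFFFFFu); }     /* An.W: bits 31..0  */
uint8_t  acc_x(uint64_t a) { return (uint8_t)((a >> 32) & 0xFFu); }    /* An.X: bits 39..32 */
```

For an accumulator holding 0x7F12345678, the fields are X = 0x7F, W = 0x12345678, H = 0x1234 and L = 0x5678.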

SLIDE 22

SLIDE 23

68K:
MOVE.L R2, R0
ADD.L R1, R0

MOVE.W R2, R0
ADD.W R1, R0

MOVE.L R2, R0
ASR.L #16, R0
MOVE.L R1, R3
ASR.L #16, R3
ADD.W R3, R0
ASL.L #16, R0
MOVE.W R2, R0
ADD.W R1, R0

SHARC:
R0 = R1 + R2;
Closest: R0 = R1 + R2, R4 = R1 - R2;

Blackfin:
R0 = R1 + R2;
R0.L = R1.L + R2.H;
R0 = R1 +|- R2; means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H
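A C model of the Blackfin `R0 = R1 +|- R2;` line shows why it replaces the long 68K sequence: the two 16-bit lanes are computed independently, so a borrow out of the low half never reaches the high half. This is a sketch assuming wraparound (non-saturating) 16-bit arithmetic; `dual_add_sub` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of  R0 = R1 +|- R2;
 * high halves are added, low halves subtracted, each as an independent
 * 16-bit operation with wraparound (saturation not modelled). */
uint32_t dual_add_sub(uint32_t r1, uint32_t r2)
{
    uint16_t hi = (uint16_t)((r1 >> 16) + (r2 >> 16));        /* R0.H = R1.H + R2.H */
    uint16_t lo = (uint16_t)((r1 & 0xFFFFu) - (r2 & 0xFFFFu)); /* R0.L = R1.L - R2.L */
    return ((uint32_t)hi << 16) | lo;
}
```

For instance, `dual_add_sub(0x00050003, 0x00020001)` returns 0x00070002 (high: 5 + 2, low: 3 - 1), and `dual_add_sub(0, 1)` returns 0x0000FFFF with the high half untouched.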

SLIDE 24

SLIDE 25

The A and B registers must stay on the same side of the | in both halves of the instruction. For dual and quad 16-bit operations, the (CO) option causes the results to cross over between the high and low halves of the destination registers.
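A minimal sketch of the crossover option, assuming it simply swaps the two 16-bit results between the high and low halves of the destination. It is shown for a dual add (`R0 = R1 +|+ R2 (CO)`); the function and its `crossover` flag are hypothetical, and wraparound arithmetic is assumed.

```c
#include <stdint.h>

/* Hypothetical model of  R0 = R1 +|+ R2;  with an optional (CO) crossover.
 * Without CO: R0.H = R1.H + R2.H and R0.L = R1.L + R2.L.
 * With CO (as assumed here): the two results trade places. */
uint32_t dual_add_co(uint32_t r1, uint32_t r2, int crossover)
{
    uint16_t hi = (uint16_t)((r1 >> 16) + (r2 >> 16));
    uint16_t lo = (uint16_t)((r1 & 0xFFFFu) + (r2 & 0xFFFFu));
    if (crossover)
        return ((uint32_t)lo << 16) | hi;   /* low-half result goes to R0.H */
    return ((uint32_t)hi << 16) | lo;
}
```

With R1 = 0x00010002 and R2 = 0x00100020 the plain form gives 0x00110022 and the crossed form gives 0x00220011; the source halves still pair up the same way in both cases.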

SLIDE 26

SLIDE 27

Multiplies are signed fractional by default
The signed fractional multiply result is automatically left-shifted 1 bit
Signed fractional multiply != signed integer multiply
Rounding is available on fractional multiplies, and as a special option on integer multiplies
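The difference between signed fractional and signed integer multiplies is easiest to see numerically. In 1.15 format, 0x4000 represents 0.5; a fractional multiply shifts the 2.30 product left by one bit so the result lands in 1.31 format, whereas an integer multiply keeps the raw product. This is a sketch with hypothetical function names; saturating the single overflow case (-1.0 x -1.0, i.e. 0x8000 x 0x8000) to 0x7FFFFFFF is an assumption about the hardware, made explicit in the code.

```c
#include <stdint.h>

/* Hypothetical sketch: signed fractional (1.15 x 1.15) multiply.
 * The raw product is 2.30; shifting left one bit gives 1.31.
 * Assumption: the lone overflow case 0x8000 x 0x8000 (-1.0 x -1.0)
 * saturates to 0x7FFFFFFF. */
int32_t frac_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    if (p == 0x40000000)        /* only reached by -32768 * -32768 */
        return INT32_MAX;
    return p * 2;               /* the automatic 1-bit left shift */
}

/* Signed integer multiply: the same product, no shift. */
int32_t int_mul(int16_t a, int16_t b)
{
    return (int32_t)a * b;
}
```

`frac_mul(0x4000, 0x4000)` gives 0x20000000 (0.25 in 1.31), while `int_mul(0x4000, 0x4000)` gives 0x10000000 (16384 x 16384).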
SLIDE 28

Two cases

Rounding adds 0x8000 to the 32-bit multiplier result or accumulator value before a 16-bit value is extracted to the destination register
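The rounding step can be sketched in C. Adding 0x8000 (half of one LSB of the 16-bit result) before taking bits 31..16 rounds to nearest instead of truncating. The function names are hypothetical, and clamping the one overflow case (a value near 0x7FFFFFFF rounding past 0x7FFF) is an assumption added so the result stays in range.

```c
#include <stdint.h>

/* Hypothetical sketch: extract a 16-bit fractional value from bits 31..16
 * of a 32-bit result, with and without rounding. Assumes >> on a negative
 * signed value is an arithmetic shift (true on common compilers). */
int16_t extract_trunc(int32_t v)
{
    return (int16_t)(v >> 16);                 /* just take the high half */
}

int16_t extract_round(int32_t v)
{
    int64_t t = ((int64_t)v + 0x8000) >> 16;   /* add half an LSB, then take the high half */
    if (t > INT16_MAX)
        t = INT16_MAX;                         /* clamp the single overflow case */
    return (int16_t)t;
}
```

For 0x00018000 (1.5 LSBs of the 16-bit result), truncation gives 1 and rounding gives 2; for 0x00017FFF both give 1.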

SLIDE 29

When a 16-bit fractional value is extracted from an accumulator, the high 16 bits are taken. Where the value goes in the destination register depends on which accumulator it is extracted from.

SLIDE 30

When a 16-bit integer value is extracted from an accumulator, the low 16 bits are taken. Where the 16-bit value goes in the destination register depends on which accumulator it is extracted from.
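The two extraction rules (fractional takes the high 16 bits, integer takes the low 16 bits) can be combined into one sketch. It assumes, following the Blackfin pairing of Rn.L with A0 and Rn.H with A1, that extracting from A0 writes the low half of the destination and extracting from A1 writes the high half, leaving the other half unchanged. Rounding and saturation are omitted; `extract_to_reg` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model: accumulator value held in a uint64_t (low 40 bits used).
 * integer_mode == 0: fractional extraction, takes bits 31..16.
 * integer_mode != 0: integer extraction, takes bits 15..0.
 * acc_index selects the destination half: A0 -> Rn.L, A1 -> Rn.H (assumed). */
uint32_t extract_to_reg(uint32_t dest, uint64_t acc, int acc_index, int integer_mode)
{
    uint16_t v = integer_mode ? (uint16_t)(acc & 0xFFFFu)
                              : (uint16_t)((acc >> 16) & 0xFFFFu);
    if (acc_index == 0)
        return (dest & 0xFFFF0000u) | v;                  /* A0 -> low half, high half kept */
    return (dest & 0x0000FFFFu) | ((uint32_t)v << 16);    /* A1 -> high half, low half kept */
}
```

With an accumulator value of 0x12345678, a fractional extraction yields 0x1234 and an integer extraction yields 0x5678; the accumulator index only decides which half of the destination receives it.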

SLIDE 31

SLIDE 32

SLIDE 33

In general there are 16-bit and 32-bit versions of the arithmetic instructions. Most of the 32-bit instructions can be executed in parallel with 2 x 16-bit memory/index operations. The exceptions are DIVS, DIVQ and MULTIPLY with 32-bit operands. || means parallel. Examples:

A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2++] || [I3++] = R3;
R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
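The second example contains a quad 16-bit operation. The point worth modelling is that both results are computed from the values R2 and R4 held before the instruction, so it behaves like a pair of radix-2 butterflies rather than two sequential statements. A sketch assuming wraparound arithmetic, with hypothetical names:

```c
#include <stdint.h>

/* Hypothetical model of  R2 = R2 +|+ R4, R4 = R2 -|- R4;
 * both outputs are computed from the ORIGINAL inputs a (R2) and b (R4).
 * Wraparound 16-bit lanes; saturation not modelled. */
typedef struct {
    uint32_t sum;   /* new R2: lane-wise a + b */
    uint32_t diff;  /* new R4: lane-wise a - b */
} quad_result;

static uint32_t lanes(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

quad_result quad_add_sub(uint32_t a, uint32_t b)
{
    quad_result r;
    /* (uint16_t)(a + b) keeps only the low lane, so no carry leaks between lanes */
    r.sum  = lanes((uint16_t)((a >> 16) + (b >> 16)), (uint16_t)(a + b));
    r.diff = lanes((uint16_t)((a >> 16) - (b >> 16)), (uint16_t)(a - b));
    return r;
}
```

`quad_add_sub(0x00050003, 0x00020001)` gives sum 0x00070004 and diff 0x00030002. Writing the same thing as two sequential C assignments to the same variables would compute the difference from the already-updated R2, which is exactly what the parallel instruction avoids.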

SLIDE 34

Blackfin Processor Memory Architecture

SLIDE 35

A single, unified 4 GB address space using 32-bit addresses. The L1 memory system is the primary, highest-performance memory available to the core and is faster than the L2 memory system. The L2 memory system is off-chip and has longer access latencies.

SLIDE 36

Blackfin Blackfin Processor Peripherals Processor Peripherals

SLIDE 37

Parallel Peripheral Interface (PPI)
Serial Ports (SPORTs)
Serial Peripheral Interface (SPI)
General-purpose timers
Universal Asynchronous Receiver Transmitter (UART)
Real-Time Clock (RTC)
Watchdog timer
General-purpose I/O (programmable flags)

SLIDE 38

Introduction
Blackfin Processor
Blackfin Processor Product Highlights

SLIDE 39

ADSP-BF535 EZ-KIT Lite

Key features

Attributes

ADSP-BF535 Blackfin Processor
4M x 32-bit SDRAM
272K x 16-bit FLASH memory
AD1885 48 kHz AC'97 SoundMax codec
Power management capability
JTAG ICE 14-pin header
Evaluation suite of VisualDSP++
Three 90-pin connectors for analyzing and interfacing with the processor's peripheral interfaces
CE certified

System Requirements

Pentium 166 MHz or higher
Minimum of 32 MB of RAM
Windows 98, Windows 2000, or Windows XP
One USB port

SLIDE 40

Analog Devices CROSSCORE Tools

CROSSCORE, the Analog Devices development tools product line, provides easier and more robust methods for engineers to develop and optimize systems, shortening product development cycles for faster time-to-market.

VisualDSP++ software development and debugging environment: an integrated software development and debugging environment allowing fast and easy development, debugging, and deployment

EZ-KIT Lite evaluation systems: provide an easy way to investigate the power of ADI's family of embedded processors and DSPs and to develop applications

Emulators: emulators are available for PCI and USB host platforms

SLIDE 41

ADSP-BF535 Blackfin Processor

Key features

High-performance 16-bit dual-MAC processor core, up to 350 MHz
Flexible, software-controlled Dynamic Power Management
Optimized RISC instruction set for high code density and C/C++ programming
Enhanced media instructions to process audio, image, and video for multimedia applications
Integrated system peripherals including USB device, PCI, serial ports, UARTs, SPIs, 32-bit timers, and more

Blackfin processors utilize

Single processor core
Single instruction set
Single programming model
Single set of development tools

SLIDE 42

ADSP-BF535 Blackfin Processor

Target applications

Automotive
Broadband access
Central office/network switch
Digital imaging and printing
Global positioning systems
Industrial signal processing
Instrumentation/telemetry
Internet appliances
Modem solutions
Personal branch exchanges (PBX)
POS terminals
Telecommunications
Video conferencing
VoIP phone solutions

SLIDE 43

ADSP-BF535 Blackfin Processor

Blackfin Processor System Environment

SLIDE 44

ADSP-BF535 Blackfin Processor

Blackfin Processor Memory Hierarchy

L1 instruction and data memories can be dynamically configured as SRAM, cache, or a combination of both. L2 serves larger storage needs for instructions and data.

SLIDE 45

ADSP-BF535 Blackfin Processor

Portable Low Power Architecture

Dynamic power management

SLIDE 46

ADSP-BF535 Blackfin Processor

ADSP-BF535 Block Diagram

SLIDE 47

ADSP-BF561 Blackfin Symmetric Multi-Processor

ADSP-BF561 Symmetric Multi-Processor Block Diagram

SLIDE 48

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features

Blackfin Symmetric Multi-Processor

Dual high-performance Blackfin processors, up to 756 MHz

Capable of over 3000 MMACs
Independent processor cores for image processing and system control functions

RISC-like register and instruction model for ease of programming and C/C++ compiler-friendly support
Enhanced media instructions to process audio, image, and video data for multimedia applications
Software-controlled Dynamic Power Management with on-chip voltage regulation minimizes power consumption
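Assuming each of the two cores completes two 16-bit multiply-accumulates per cycle (one per MAC) at the 756 MHz maximum clock, the "over 3000 MMACs" figure works out as:

```latex
2~\text{cores} \times 2~\frac{\text{MAC}}{\text{cycle}} \times 756 \times 10^{6}~\frac{\text{cycles}}{\text{s}} = 3024 \times 10^{6}~\frac{\text{MAC}}{\text{s}} = 3024~\text{MMACS}
```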

SLIDE 49

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features

Highest level of integration
328 Kbytes of total on-chip memory
Dual Parallel Peripheral Interfaces supporting ITU-R 656 video data formats
External memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, FLASH, or ROM memory
High-bandwidth, two-dimensional internal DMA controllers
UART with support for IrDA
Integrated on-chip voltage regulator
256-ball Pb-free Mini-BGA and 297-ball Sparse PBGA package options

SLIDE 50

ADSP-BF561 Blackfin Symmetric Multi-Processor

Key features

Target Applications
Digital still cameras
Digital video cameras
Hybrid digital video/still cameras
Video security/surveillance systems
Portable multimedia players

SLIDE 51

ADSP-BF531/BF532/BF533 Blackfin Processor Series

Key features

Blackfin processors offer features attractive to a broad application base
Performance up to 756 MHz/1512 MMACs enables multichannel audio plus VGA/D1 video processing in multimedia applications
Enhanced Dynamic Power Management with on-chip voltage regulation allows operation down to 0.8 V, extending battery life in portable applications
Application-tuned peripherals provide glueless connectivity to general-purpose converters in data acquisition applications
Multiple low-cost, pin- and code-compatible derivatives enable software differentiation in cost-sensitive consumer applications
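The 1512 MMAC figure is consistent with the core's two 16-bit MACs each completing one multiply-accumulate per cycle at the 756 MHz maximum clock:

```latex
2~\frac{\text{MAC}}{\text{cycle}} \times 756 \times 10^{6}~\frac{\text{cycles}}{\text{s}} = 1512 \times 10^{6}~\frac{\text{MAC}}{\text{s}} = 1512~\text{MMACS}
```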

SLIDE 52

ADSP-BF531/BF532/BF533 Blackfin Processor Series

Key features

High level of integration
Up to 148 Kbytes of on-chip SRAM
Parallel Peripheral Interface supporting ITU-R 656 video data formats
Two dual-channel, full-duplex synchronous serial ports supporting eight stereo I2S channels
12 DMA channels supporting one- and two-dimensional data transfers
Memory controller providing glueless connection to multiple banks of external SDRAM, SRAM, flash, or ROM
Three timers supporting PWM and pulse-width/event count modes
UART with support for IrDA
SPI-compatible port
Real-time clock
Watchdog timer
PLL capable of 1x to 63x frequency multiplication
160-ball mini-BGA, 169-ball Pb-free PBGA and 176-lead LQFP packages
Commercial and industrial temperature ranges

SLIDE 53

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features

Two 16-bit multipliers
Two 40-bit accumulators
Two 40-bit arithmetic logic units (ALUs)
Four 8-bit video ALUs
One 40-bit shifter
Compute register file: contains eight 32-bit registers, which can be operated as 16 independent 16-bit registers
MAC: can perform a 16-by-16-bit multiply per cycle, with accumulation to a 40-bit result. Signed and unsigned formats, rounding, and saturation are supported.
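The 8 extra bits of the 40-bit accumulator act as guard bits. Below is a sketch, assuming each step adds a fractional 16 x 16 product (shifted left one bit) into an accumulator limited to [-2^39, 2^39 - 1] and saturated at those bounds. The names `mac40`, `ACC40_MAX` and `ACC40_MIN` are hypothetical, and the special -1.0 x -1.0 product case is not treated separately.

```c
#include <stdint.h>

/* Hypothetical sketch of a 40-bit saturating MAC.
 * Range of a signed 40-bit accumulator: [-2^39, 2^39 - 1]. */
#define ACC40_MAX ((int64_t)549755813887)   /*  2^39 - 1 */
#define ACC40_MIN (-(int64_t)549755813888)  /* -2^39     */

int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    /* fractional 16 x 16 product, shifted left one bit (1.31 format);
     * the -1.0 x -1.0 special case is not handled separately here */
    int64_t sum = acc + (int64_t)a * b * 2;
    if (sum > ACC40_MAX)
        return ACC40_MAX;                   /* saturate high */
    if (sum < ACC40_MIN)
        return ACC40_MIN;                   /* saturate low */
    return sum;
}
```

Since a worst-case product here is 2^31, roughly 2^8 = 256 such products fit before saturation kicks in: 255 accumulate exactly, and the 256th clamps to the maximum.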

SLIDE 54

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features

Program sequencer: controls the instruction execution flow, including instruction alignment and decoding. For program flow control, the sequencer supports PC-relative and indirect conditional jumps (with static branch prediction) and subroutine calls. Hardware is provided to support zero-overhead looping. The architecture is fully interlocked, meaning there are no visible pipeline effects when executing instructions with data dependencies.

Address arithmetic unit: provides two addresses for simultaneous dual fetches from memory. Contains a multiported register file consisting of four sets of 32-bit Index, Modify, Length, and Base registers (for circular buffering) and eight additional 32-bit pointer registers (for C-style indexed stack manipulation).

SLIDE 55

ADSP-BF531/BF532/BF533 Blackfin Processor Series Core Architecture

Key features

Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure. Level 1 (L1) memories typically operate at the full processor speed with little or no latency.

At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information.

Three modes of operation: User mode has restricted access to a subset of system resources, thus providing a protected software environment. Supervisor and Emulation modes have unrestricted access to the system core resources.

SLIDE 56

[1] Analog Devices Web Site, http://www.analog.com/
[2] Blackfin Processor, http://www.analog.com/processors/processors/blackfin/
[3] ADSP-BF533 Blackfin Processor Hardware Reference, Rev 1.0, December 2003, Analog Devices. Section 2
[4] Blackfin Processor Instruction Set Reference, Rev 3, June 2004, Analog Devices. Sections 8-10, 14 and 15

I suggest that students who want to become familiar with the Blackfin processor read references 3 and 4 thoroughly.