Embedded Systems Computer Architecture

Jakob Engblom, PhD
Uppsala University & Virtutech Inc.
jakob.engblom@it.uu.se
jakob@virtutech.com

14 Nov 2003 Embedded Computer Architecture 2

Embedded Systems

14 Nov 2003 Embedded Computer Architecture 3

Embedded Systems

"It is a snake!" "No, a wall!" "No, a pillar!" "No, it is a tree trunk!" "You're all wrong, it is a fan!"

Now what is this elephant thing?

14 Nov 2003 Embedded Computer Architecture 4

Embedded Systems

"A computer that doesn't look like a computer"

  • Interacts with world
  • Primitive or no user interface
  • Part of other products

14 Nov 2003 Embedded Computer Architecture 5

Embedded Systems

  • Single purpose products
      • Not general purpose like desktop PCs
      • Do one thing very efficiently
  • Software very important:
      • Gives character to product
      • Used to differentiate inside a "platform"
      • Can be changed late
      • Processor cheaper than special HW
      • Today, dominates dev cost

14 Nov 2003 Embedded Computer Architecture 6

Processor Market

  • Embedded = most processors!
      • 200 million PC and server
      • 8000 million embedded

Chart: "Desktop" 2%, "Embedded" 98%
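The 2% / 98% split in the chart follows from the two unit counts on this slide. A minimal sketch of the arithmetic, assuming the counts are in the same unit (millions of processors); the variable and function names are illustrative only:

```python
# Unit volumes quoted on the slide, in millions of processors.
PC_AND_SERVER_MUNITS = 200
EMBEDDED_MUNITS = 8000


def share_percent(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total


total = PC_AND_SERVER_MUNITS + EMBEDDED_MUNITS
embedded_share = share_percent(EMBEDDED_MUNITS, total)
desktop_share = share_percent(PC_AND_SERVER_MUNITS, total)

# Rounded to whole percent this gives the 98% / 2% shown in the chart.
print(f"embedded: {embedded_share:.1f}%  desktop: {desktop_share:.1f}%")
```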

14 Nov 2003 Embedded Computer Architecture 7

Processor Market

  • Processors:
      • 50% of all semiconductor revenue
      • Explains why everyone wants to do processors
  • 32-bit dominant
      • 30% of total semiconductors
  • PC processors:
      • 50% of CPU revenue
      • 15% of total semiconductors
      • AMD and Intel share it

Chart: share of Units and of Money by processor class (32-bit, 16-bit, 8-bit, 4-bit, DSP), axis 0% to 100%

14 Nov 2003 Embedded Computer Architecture 8

Real-Time System

  • Timing as important as result
  • Hard real-time:
      • Hard deadlines
      • Dead if missed deadline
      • Worst-case
  • Soft real-time:
      • Fuzzier deadlines
      • Can miss some deadlines
      • Average-case
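The difference between the two classes can be sketched as two acceptance checks over a set of execution times. This is only an illustration: the sample times, the 10 ms deadline, and the tolerated miss ratio are hypothetical, and a real hard real-time argument needs a proven worst-case bound rather than the maximum of a few measurements.

```python
def meets_hard_deadline(exec_times_ms, deadline_ms):
    """Hard real-time view: the worst observed case must meet the deadline."""
    return max(exec_times_ms) <= deadline_ms


def meets_soft_deadline(exec_times_ms, deadline_ms, max_miss_ratio):
    """Soft real-time view: a bounded fraction of missed deadlines is tolerated."""
    misses = sum(1 for t in exec_times_ms if t > deadline_ms)
    return misses / len(exec_times_ms) <= max_miss_ratio


# Hypothetical execution times in milliseconds; one sample overruns.
samples = [4.0, 5.0, 4.5, 12.0, 5.5]

hard_ok = meets_hard_deadline(samples, deadline_ms=10.0)
soft_ok = meets_soft_deadline(samples, deadline_ms=10.0, max_miss_ratio=0.25)
print(hard_ok, soft_ok)
```

The same task set fails the hard check because of a single overrun, yet passes the soft check because only one sample in five misses.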


14 Nov 2003 Embedded Computer Architecture 9

Real-Time Systems

  • Embedded and Real-Time
      • Synonymous?
      • Most embedded systems are real-time
      • Most real-time systems are embedded

Diagram: two overlapping sets, "embedded" and "real-time"

14 Nov 2003 Embedded Computer Architecture 10

Simple Embedded Systems

  • 8-bit Hitachi H8/300, 32 kB ROM, 32 kB RAM
      • Standard microcontroller chip
      • Byte-code machine, sensor drivers, …
  • 8-bit Intel 8051, standard microcontroller
      • Behavior, talk, IR communications

14 Nov 2003 Embedded Computer Architecture 11

Fun App: Smart Beer Glass

  • 8-bit, 8-pin PIC processor
  • Capacitive sensor for fluid level
  • Inductive coil for RF ID activation & power
  • CPU and reading coil in the table. Reports the level of fluid in the glass, alerts servers when close to empty
  • Contactless transmission of power and readings

14 Nov 2003 Embedded Computer Architecture 12

No Upgrades Possible

  • Once a product ships… it often cannot be serviced
  • No download ability
      • No writable persistent storage
      • No disks
      • No loader
  • Software is write-once
  • (There are exceptions)

14 Nov 2003 Embedded Computer Architecture 13

Consumer Electronics

  • Heterogeneous multiprocessor
      • 8-bit Atmel AVR for UI, games, …
      • 16-bit fixed-point TI C54 DSP for GSM coding, radio interface, …
      • 32-bit ARM7 in Bluetooth module
      • + maybe ARM7 in IrDA interface
      • All in custom chips
  • Software is large:
      • 16 MB of code in control part
      • Plus signal processing code

14 Nov 2003 Embedded Computer Architecture 14

Automotive

  • Multiple networks
      • CAN for body electronics: 30+ nodes
      • CAN for engine control: few nodes
      • LIN for instruments
  • Many processors
      • Up to 100
  • Large diversity in processor types:
      • 8-bit CPUs (PIC, HC08) for door locks, lights, etc.
      • 16-bit CPUs (C167, HC11, HC12) for most functions
      • 32-bit CPUs (PPC, V850) for engine control, airbags
  • Total amount of code: 40-50 MB

14 Nov 2003 Embedded Computer Architecture 15

Automotive

  • Form follows function
      • Processing where the action is
      • Architecture given by application
      • Sensors and actuators distributed
  • Heterogeneous systems
      • Many different makes of CPUs
      • Standardized at the network/bus

14 Nov 2003 Embedded Computer Architecture 16

Timing Aspects

  • Interrupt latency
      • Important criterion for embedded
      • A few clock cycles at most
      • Measure of RTOS performance
  • Real-Time = predictability
      • In-order pipelines
      • SRAM instead of caches
      • Lockable caches
      • Several small CPUs instead of one big

14 Nov 2003 Embedded Computer Architecture 17

Military Shipboard

  • Standard multiprocessor UltraSparc servers for radar, target tracking, combat control, …
  • Many CPUs in missiles, gun controls, engines, …

14 Nov 2003 Embedded Computer Architecture 18

Mobile Phone Base Station

  • Handle signals
      • Data streams to and from phones
  • Massively parallel system
      • Thousands of DSP tasks
      • Perfect parallel scalability
  • Custom or standard DSPs
      • Up to 8 DSPs on a single chip

14 Nov 2003 Embedded Computer Architecture 19

Trends

  • Hardware to software
      • Increase flexibility, lower cost
      • Software on fast processor can equal HW
  • Software to hardware
      • Better power consumption & performance
      • Design custom hardware for application
  • Hardware-software codesign
      • Delay division HW/SW to late in project
      • Obtain "optimal" HW/SW division

14 Nov 2003 Embedded Computer Architecture 20

System-on-a-chip

  • Integration extreme
      • Thanks to modern semiconductors
  • Entire product on a chip
      • One or more processors, accelerators, …

Diagram: on-chip bus connecting DSP, LCD driver, CPU, Bluetooth, GSM Radio, Code memory, Data mem

14 Nov 2003 Embedded Computer Architecture 21

Embedded Processing

14 Nov 2003 Embedded Computer Architecture 22

Microcontrollers

  • Classic embedded hardware
  • Standard parts
  • Quite broad application domains
  • Sold in large series
  • Defined by hardware vendors
  • As cheap as a single dollar
  • Single processor + devices
  • Huge number of variants
  • Usually intended for control plane

14 Nov 2003 Embedded Computer Architecture 23

Microcontroller

  • A single chip:
      • CPU Core
      • Integrated memory
      • Integrated peripherals
      • Integrated services
  • Goal:
      • System on one chip
      • No external HW
      • Fit application "perfectly"

Diagram: CPU Core, RAM (small), ROM (big), UART, A/D, Timer, LCD D, connected to the Outside World

14 Nov 2003 Embedded Computer Architecture 24

Microcontroller

  • CPU Bitness: 4 to 64 bits
      • Most common: 8 bit (4G units)
      • 32-bit growing fastest
      • 32/64-bit outnumbers desktop
  • Frequency: DC to 2 GHz
  • Memory on-chip: 0.5 kB to 5 MB
  • Power: mW (and up)
  • 1/30 to 10 instructions per cycle

14 Nov 2003 Embedded Computer Architecture 25

Example: PIC 12CE674

  • Memory arch: Harvard
  • Program memory: 2048 x 14 (OTP/Flash)
  • EEPROM: 16 bytes
  • RAM: 128 bytes
  • ADC channels: 4 (8 bits)
  • I/O ports: 6
  • Timers: One 8-bit, One WDT
  • Clock: On-chip crystal, 10 MHz
  • Package: 8 pins (Pentium 4: 700 pins)
  • Cost: <$1.00 (Pentium 4: >$200.00)

14 Nov 2003 Embedded Computer Architecture 26

Example: AT91M42800A

  • ARM7TDMI 32-bit core
  • Static design: 0 to 33 MHz
  • Memory
      • 8 kB SRAM on chip
      • External memory interface, 8/16 bit interface
  • Devices
      • 6 timers
      • 2 serial ports
      • JTAG debug interface
  • About 0.5 W power
  • About 18 USD
  • 144 Pin package
  • One of 13 AT91 variants

14 Nov 2003 Embedded Computer Architecture 27

Devices on the Chip

  • Interface with the world
      • Digital I/O
      • Analog/Digital conversion
      • Digital/Analog conversion
  • Communications
      • CAN networks
      • Ethernet networks
      • Radio
      • Serial ports (UART, USART)
      • USB, FireWire, ...

14 Nov 2003 Embedded Computer Architecture 28

Devices on the Chip

  • Timers
      • Trigger interrupts
      • Watchdogs
  • Graphics
      • LCD drivers
      • 2D/3D graphics acceleration
  • Buses
      • On-chip: between devices: AMBA, …
      • Off-chip: PCI, HyperTransport, RapidIO, …

14 Nov 2003 Embedded Computer Architecture 29

ASIPs / ASSPs

  • Application-specific integrated/standard processor
      • Targeting a particular niche market
      • More targeted than microcontroller
      • Domain-specific accelerators
  • Usually more upscale
      • 32-bit processors
      • Multiprocessors
      • Expensive peripherals
      • External memory assumed
      • Higher performance, includes data-plane

14 Nov 2003 Embedded Computer Architecture 30

Example: PowerQUICC III

  • Motorola
  • Target market
      • Communications
  • Processing
      • PowerPC e500
      • 666-1000 MHz
      • 256 kB L2 cache
  • Networking
      • CPM module, RISC-based microcode
  • About 160 USD

Capabilities:

  • 256 Multichannel HDLC (from MCC2)
  • 2 Utopia II ATM (from FCC)
  • 2 Ethernet 10/100/1000
  • 3 Ethernet, 10/100 (from FCC)
  • 4 Ethernet, 10 (from SCC)

Features:

  • 2 Ethernet 10/100/1000 controller
  • 1 RapidIO controller
  • 1 PCI-X/PCI controller
  • 1 DDR Memory controller
  • 1 I2C controller
  • 1 Serial Peripheral Interface (SPI)
  • 2 Serial Management Controller (SMC)
  • 2 Multi-Channel Controller (MCC2)
  • 3 Fast Communications Controller (FCC)
  • 4 Serial Communications Controller (SCC)

14 Nov 2003 Embedded Computer Architecture 31

Example: C167CS

  • Infineon
  • Target Market
      • Automotive control
  • Processing
      • 16-bit C16x core
      • 4-stage simple pipeline
      • 40 MHz operation
  • 16 MB memory space, including ROM, RAM, devices
  • 144 pin package
  • Tolerates -40 C to +125 C
  • About 25 USD

Memory:

  • 32 kB ROM
  • 3 kB Fast General Internal RAM (IRAM)
  • 8 kB Extension Internal RAM (XRAM)

Devices:

  • 2 CAN 2.0b controllers
  • 5 General-Purpose Timers (GPT)
  • 1 Watch-Dog Timer (WDT)
  • 1 Pulse-Width Modulator (PWM)
  • 24+8 Analog-Digital Converter Channels
  • 1 USART
  • 1 Synchronous Serial Comms (SSC)
  • 2x16 Capture/Compare Channels

External Ports:

  • 2 CAN interfaces
  • 8 8-bit ports from devices
  • 1 16-bit ports from devices

14 Nov 2003 Embedded Computer Architecture 32

Example: Cisco Toaster3

  • 8 clusters of 2 processors each
  • Each TMC is a VLIW machine with 74-bit instructions, 2k instructions in local memory
  • Total capacity: about 5 GOps, at around 160 MHz
  • Two 32-bit ALUs and three control/data movement units per TMC

Image from Microprocessor Report, Oct 2002

14 Nov 2003 Embedded Computer Architecture 33

Example: Cisco Toaster3

  • Massive multiprocessing
      • 16 cores on a chip
      • 4 chips in serial
  • Routing:
      • 10 Gbps
      • @ 20 Mpackets/s
      • 1000 ops per packet passing through
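The Toaster3 figures quoted in the deck are mutually consistent, which a short calculation shows. This is a sketch under one assumption that the slides do not state: that the "about 5 GOps" figure counts only the two ALUs of each processor, one operation per ALU per clock. All constant names are illustrative.

```python
# Figures quoted in the deck for the Cisco Toaster3.
CLUSTERS_PER_CHIP = 8
PROCESSORS_PER_CLUSTER = 2
ALUS_PER_PROCESSOR = 2      # assumption: only the ALUs count as "ops"
CLOCK_HZ = 160e6
CHIPS_IN_SERIES = 4
LINE_RATE_BPS = 10e9
PACKET_RATE_PPS = 20e6

cores_per_chip = CLUSTERS_PER_CHIP * PROCESSORS_PER_CLUSTER
ops_per_chip = cores_per_chip * ALUS_PER_PROCESSOR * CLOCK_HZ
ops_per_packet = ops_per_chip * CHIPS_IN_SERIES / PACKET_RATE_PPS
avg_packet_bits = LINE_RATE_BPS / PACKET_RATE_PPS

print(f"cores per chip:      {cores_per_chip}")
print(f"ops/s per chip:      {ops_per_chip:.3g}")
print(f"ops per packet:      {ops_per_packet:.0f}")
print(f"average packet size: {avg_packet_bits:.0f} bits")
```

Under that assumption the per-chip rate comes out at 5.12 GOps and the budget at 1024 operations per packet, matching the "about 5 GOps" and "1000 ops per packet" on the slides.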

14 Nov 2003 Embedded Computer Architecture 34

FPGA

  • Field Programmable Gate Array
  • Reconfigurable hardware: "soft logic"
      • "Program" is circuit layout
      • Can be changed after initial load
      • Kilos to Megs of "gates" available
  • Competitor to ASICs
      • More expensive per unit, but no start-up cost for manufacturing
      • Less flexible, slightly slower
      • Perfect for low-volume products
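The cost argument on this slide (higher unit price, no start-up cost) implies a break-even volume below which the FPGA is the cheaper choice. A minimal sketch of that calculation follows; the start-up cost and both unit prices are hypothetical placeholders, not figures from the deck.

```python
import math


def break_even_units(asic_startup_cost, asic_unit_cost, fpga_unit_cost):
    """Smallest volume at which total ASIC cost is no higher than total FPGA cost."""
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("FPGA must cost more per unit for a break-even to exist")
    return math.ceil(asic_startup_cost / (fpga_unit_cost - asic_unit_cost))


# Hypothetical numbers, chosen only to illustrate the shape of the trade-off.
units = break_even_units(asic_startup_cost=1_000_000,
                         asic_unit_cost=10,
                         fpga_unit_cost=60)
print(f"ASIC pays off from {units} units")
```

Below the break-even volume the start-up cost dominates, which is why the slide calls FPGAs a good fit for low-volume products.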

14 Nov 2003 Embedded Computer Architecture 35

FPGA Architecture

  • Computation cells
      • Programmable function
      • Adder, Logic funcs, ...
      • Memory, Registers, ...
  • Input/Output cells
  • Interconnect
      • Reconfigurable
      • Programmable

14 Nov 2003 Embedded Computer Architecture 36

FPGA Architecture

  • Computation cells
      • Look-Up Table
      • Arbitrary 4-input, 1-output function
  • Coarse-grained
      • Lots of functionality
      • Several LUTs
      • Plus flip-flops etc.
  • Fine-grained
      • Little functionality

Diagram: Config RAM, LUT
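A 4-input look-up table can realize any 4-input, 1-output function because its configuration is simply the 16-entry truth table. The sketch below models that idea in software; the bit ordering (input a as the most significant index bit) and the function names are choices made for this example, not a description of any vendor's cell.

```python
def make_lut4(func):
    """Build the 16-bit configuration word of a 4-input LUT from a Boolean function."""
    config = 0
    for index in range(16):
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4_eval(config, a, b, c, d):
    """Evaluate the LUT: the four inputs select one bit of the configuration word."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> index) & 1


# "Programming" the same cell with two different functions.
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)

print(hex(and4), hex(xor4))
print(lut4_eval(xor4, 1, 0, 0, 0), lut4_eval(and4, 1, 1, 1, 0))
```

Changing the function changes only the configuration word, which is the sense in which the "program" of an FPGA is its circuit layout.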


14 Nov 2003 Embedded Computer Architecture 37

FPGA with CPU Cores

  • CPU on-board FPGA
      • HW accelerate critical tasks in FPGA fabric
      • Data pumps in FPGA
      • Control in CPU
  • Cool new possibilities
      • Reconfigure FPGA online
      • Adapt to workloads

14 Nov 2003 Embedded Computer Architecture 38

Soft CPUs in FPGAs

  • Processor in the FPGA fabric
      • "Soft" processor
      • Special design considerations
  • Examples
      • Altera Nios
      • Xilinx Microblaze
  • Research projects
      • Västerås ARM clone
      • Leon processor also prototyped

14 Nov 2003 Embedded Computer Architecture 39

Examples

  • Altera Apex 20kC: "Volume"
      • 30k to 1.5M gates
  • Xilinx Virtex II: "High-end"
      • 1-4 PPC405 cores (optional)
      • 10M gates
      • Price at about $1000
  • Altera Stratix: "Advanced"
      • 10 Mbit RAM
      • 28 DSP elements
      • 100000 LE
      • 1300 user I/O pins
      • Optimized for Nios
  • ATMEL FPSLIC: "Low-end"
      • AVR 8-bit CPU
      • 50k gates

14 Nov 2003 Embedded Computer Architecture 40

Case Study: ARM 1026EJ-S

14 Nov 2003 Embedded Computer Architecture 41

Overview

14 Nov 2003 Embedded Computer Architecture 42

The Basics: ARM1026EJ-S

  • Not a stand-alone processor
      • For integration in your own chips
  • Processor package:
      • CPU core
      • Caches, configurable in size
      • Tightly-coupled memories, configurable in size
      • Bus interface
      • MMU (supports WinCE, Symbian, etc.)

14 Nov 2003 Embedded Computer Architecture 43

Business Model

  • Sold as an IP Core
      • IP = "Intellectual Property"
      • Not a physical chip, just a design
      • "Source code component"
      • Similar in scope to classic processor
  • For integration in ASICs
      • ASIC = Application-specific integrated circuit

14 Nov 2003 Embedded Computer Architecture 44

ASICs

  • Fully custom chips
      • Custom for your application
      • As small or large as necessary
  • Characteristics
      • Expensive to develop
      • 10s of engineers, often 100s
      • Large series necessary to pay off
      • At least 100 000 units necessary on average
      • Mostly for large companies
  • To streamline: build from IP blocks

14 Nov 2003 Embedded Computer Architecture 45

IP Blocks

  • IP
      • Hardware components
      • Integrated on chip by customer
  • Examples:
      • CPU Cores
      • Memory
      • Buses
      • Network interfaces
      • Accelerator circuits

Diagram: on-chip bus connecting DSP, LCD driver, CPU, Bluetooth, GSM Radio, Code memory, Data mem

14 Nov 2003 Embedded Computer Architecture 46

CPU Cores

  • The biggest "IP" business
      • "Fabless" chip companies
  • Biggest players:
      • ARM (best-selling 32-bit architecture)
      • MIPS (and its licensees)
  • Crowded field
      • New companies appear monthly
      • Niched components can find a market

14 Nov 2003 Embedded Computer Architecture 47

Component Styles

  • Hard IP:
      • Tied to a particular fab process
      • Like IBM 0.13u Cu, TSMC 0.18, etc.
      • Black box to customer
  • Synthesizable IP:
      • Source code for compilation by customer
      • Offers configuration options like cache sizes, TCMs
      • MIPS 24k, ARM 9S, 1026S, 1136S
  • Soft IP:
      • Get full source code for the component
      • Purpose is to customize heavily
      • ARC ARCtangent 5, Tensilica Xtensa V

14 Nov 2003 Embedded Computer Architecture 48

Synthesizable Vs Hard IP

  • Synthesizable
      + Use any process
      + Use any fab
      + Customize details
      + Customize chips
      + Add instructions
      - Slower memories
      - Higher power
      - Lower performance
  • Hard IP
      + Optimized layout
      + Small area
      + Low power
      + Best performance
      - No flexibility

For best results, cores need to be redesigned to be synthesizable

14 Nov 2003 Embedded Computer Architecture 49

1026EJ-S Core

  • 6-stage pipeline:
      • Max clock, best case: 475 MHz
      • Depends on process, synthesis used
      • Optimized for synthesis of core
      • Integer-only
  • Power:
      • Depends on process & configuration
      • Quoted numbers: 0.5 mW/MHz
      • With 16 kB + 16 kB L1 caches
      • 130 nm process at TSMC
      • (Pentium 4: >35 mW/MHz)
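To put the quoted power figures in perspective, the per-MHz numbers can be turned into an absolute estimate. This sketch assumes, purely for illustration, that the 0.5 mW/MHz figure still applies at the best-case 475 MHz clock; the deck notes that both depend on process and configuration, so the two numbers may not describe the same implementation.

```python
# Figures quoted on the slide.
ARM_MW_PER_MHZ = 0.5
ARM_MAX_CLOCK_MHZ = 475
PENTIUM4_MW_PER_MHZ = 35    # the slide gives this as a lower bound (">35")

# Assumption: the per-MHz figure holds at the best-case clock.
arm_power_mw = ARM_MW_PER_MHZ * ARM_MAX_CLOCK_MHZ
efficiency_ratio = PENTIUM4_MW_PER_MHZ / ARM_MW_PER_MHZ

print(f"ARM1026EJ-S at {ARM_MAX_CLOCK_MHZ} MHz: about {arm_power_mw} mW")
print(f"Pentium 4 uses at least {efficiency_ratio:.0f}x more power per MHz")
```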

14 Nov 2003 Embedded Computer Architecture 50

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • Static branch prediction (75% accurate): uses less power than dynamic
  • Return stack (single entry). Simple but effective

14 Nov 2003 Embedded Computer Architecture 51

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • ARM/Thumb/Java decode
  • Access to coprocessors

14 Nov 2003 Embedded Computer Architecture 52

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • Register read, initialize memory accesses
  • Evaluate immediates

14 Nov 2003 Embedded Computer Architecture 53

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • Execution pipeline for most integer instructions
  • Handle saturated arithmetic

14 Nov 2003 Embedded Computer Architecture 54

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • Execution pipeline for multiply-accumulate instructions

14 Nov 2003 Embedded Computer Architecture 55

ARM1026EJ-S Pipeline

Pipeline stages: Fetch, Issue, Decode, Shift/ALU, Sat, Write, MAC1, MAC2, LS1, LS2, LS write

  • Decoupled pipeline for loads and stores
  • 2 stage memory access to support slow synthesized memory

14 Nov 2003 Embedded Computer Architecture 56

Rounding Out

  • Configurable caches
      • Typically 16kB/16kB
  • Optional TCMs
  • Memory interface
      • 2 x 64 bit AMBA AHB links
  • Optional vector FP coprocessor
  • Optional vector interrupt controller

14 Nov 2003 Embedded Computer Architecture 57

ARM1026EJ-S System

Diagram: ARM1026EJ-S Core with I$, D$, I-TCM, and D-TCM; VFP10 FP coprocessor; VIC10 interrupt coprocessor; ETM10RV trace/debug with debug port connection; BIU with a 64-bit AMBA/AHB data bus for D and a 64-bit AMBA/AHB data bus for I; RAM and FLASH

14 Nov 2003 Embedded Computer Architecture 58

TCM

  • Tightly-Coupled Memories
      • Alternative to caches
      • As fast as caches
  • Programmer-controlled
      • No automatic management
      • Cheaper to implement
      • More predictable in behavior
  • Programming:
      • In memory map
      • Tagged like caches

14 Nov 2003 Embedded Computer Architecture 59

Instruction Sets for ARM

  • Base: ARM v5
      • 32-bit integer-only instruction set
  • T: Thumb instruction set
      • 16-bit, for smaller code size
  • J: Jazelle extensions
      • Java support in hardware
      • Implements 140 out of 228 JVM byte codes
  • E: DSP extensions
      • Done in regular registers
      • Saturation, some more MACs
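Saturation, as mentioned for the E extensions, means that a result which overflows is clamped to the largest or smallest representable value instead of wrapping around. The sketch below contrasts the two behaviors for signed 32-bit addition in software; it illustrates the concept only and does not model any particular ARM instruction or its status flags.

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def saturating_add32(a, b):
    """Add two signed 32-bit values, clamping to the 32-bit range on overflow."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def wrapping_add32(a, b):
    """Ordinary two's-complement 32-bit addition, shown for comparison."""
    result = (a + b) & 0xFFFFFFFF
    return result - 2 ** 32 if result >= 2 ** 31 else result


print(saturating_add32(INT32_MAX, 1))   # stays at the maximum
print(wrapping_add32(INT32_MAX, 1))     # wraps to the minimum
```

For signal processing the clamped result is usually the less harmful error: a clipped sample is a small distortion, while a wrapped sample flips sign.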

14 Nov 2003 Embedded Computer Architecture 60

The ARM Instruction Set

  • Continuous evolution
      • Add features required by market
      • RISC? Not anymore, if ever
  • Now at v6, in the ARM11 family
      • v5, v5E in ARM9 and ARM10
      • v4 in old ARM7
  • Backwards compatibility!

14 Nov 2003 Embedded Computer Architecture 61

T: Thumb

  • Compressed instruction set
      • 16-bit encoding of (parts of) 32-bit instruction set
  • Limitations in ARM/Thumb:
      • Only access to 8 registers (16 in ARM mode)
      • No system operations
  • Effect:
      • More but smaller instructions
      • 30% more, at half size
      • Usually some performance loss
      • (Perform better on narrow buses)

14 Nov 2003 Embedded Computer Architecture 62

T: Thumb

  • Thumb shrinks the code:

Code size per benchmark (first row: absolute size; second row: size relative to ARM)

             Thumb     ARM     386    8088   68020   SPARC
eqntott      10608   16768   17640   19106   20542   22256
              0.63    1.00    1.05    1.14    1.23    1.33
xlisp        26388   40768   28097   29401   46746   44648
              0.65    1.00    0.69    0.72    1.15    1.10
espresso     72596  109923  125686  137194  131854  142752
              0.66    1.00    1.14    1.25    1.20    1.30

Source: Microprocessor Report, March 1995
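As a rough cross-check of the figures above, the following sketch recomputes each architecture's code size relative to ARM from the table, and compares the Thumb ratios with the earlier rule of thumb (30% more instructions, each at half the size). The dictionary layout and the names `SIZES`, `relative_to_arm`, and `thumb_estimate` are illustrative choices, not taken from the slides.

```python
# Code sizes as listed in the table (Microprocessor Report, March 1995).
SIZES = {
    "eqntott":  {"Thumb": 10608, "ARM": 16768, "386": 17640,
                 "8088": 19106, "68020": 20542, "SPARC": 22256},
    "xlisp":    {"Thumb": 26388, "ARM": 40768, "386": 28097,
                 "8088": 29401, "68020": 46746, "SPARC": 44648},
    "espresso": {"Thumb": 72596, "ARM": 109923, "386": 125686,
                 "8088": 137194, "68020": 131854, "SPARC": 142752},
}


def relative_to_arm(sizes):
    """Return each ISA's code size divided by the ARM size, to 2 decimals."""
    arm = sizes["ARM"]
    return {isa: round(size / arm, 2) for isa, size in sizes.items()}


# Rule of thumb from the previous slide: 30% more instructions,
# each half the size of a 32-bit ARM instruction.
thumb_estimate = 1.30 * 0.5

if __name__ == "__main__":
    for benchmark, sizes in SIZES.items():
        ratios = relative_to_arm(sizes)
        print(benchmark, ratios)
    print("rule-of-thumb Thumb/ARM ratio:", thumb_estimate)
```

The estimate of 0.65 falls within the measured Thumb range of 0.63 to 0.66, so the two slides are consistent with each other.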

14 Nov 2003 Embedded Computer Architecture 63

T2: Doing a Better Thumb

  • ARM Thumb: fixed 16-bit size
      • Saves 28% space compared to 32-bit ARM
      • Runs 20% slower than 32-bit ARM
  • ARM Thumb 2: mixed 16/32
      • Brand new, arrives with ARM1156
      • Saves 26% space compared to 32-bit ARM
      • Runs 2% slower than 32-bit ARM
      • (Introduces some new instructions)
  • Conclusion: mixed length good!

Source: Microprocessor Report, June 2003

14 Nov 2003 Embedded Computer Architecture 64

Why T?

  • Pushed by mobile phones
      • More memory = more expensive
      • More memory = bigger package
      • More memory = higher power
      • More features in same memory!
  • Performance is not critical


14 Nov 2003 Embedded Computer Architecture 65

T: Competitors

  • Compressed instruction sets
      • MIPS16e, shrunk MIPS32 ISA
      • ARC
      • Tensilica
  • All-small instruction sets
      • SH family
  • Compressed code
      • IBM PowerPC 405 GX
      • Decompress when loaded into cache

14 Nov 2003 Embedded Computer Architecture 66

J: Jazelle

  • Hardware Java acceleration
      • Pushed by mobile phones
  • Why?
      • To fix Java performance problems
  • SW JVM problems:
      • Minimal clock frequency = low interpreter performance
      • JIT requires more memory

14 Nov 2003 Embedded Computer Architecture 67

E: DSP Extensions

  • A few new instructions
  • Saturated arithmetic
      • Add, Sub
  • Signed multiply, MAC
      • 2 16-bit values in one register
      • 16x16
      • 32x16
  • Count leading zeroes
  • Load/store pairs of registers
  • Fairly typical “DSP” additions
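The operations listed above can be sketched in a few lines. This is a minimal behavioral model of what such instructions compute. It is not a model of the exact ARM v5E instruction semantics (flags, encodings, and overflow behavior are not covered), and the function names are hypothetical. Saturating the accumulate step in `mac16x16` is an assumption made here for illustration.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def sat_add32(a, b):
    """Saturated add: clamp the sum to the signed 32-bit range instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def clz32(x):
    """Count leading zero bits of x, viewed as an unsigned 32-bit value."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def to_signed16(h):
    """Interpret a 16-bit pattern as a signed value."""
    return h - 0x10000 if h & 0x8000 else h


def half(reg, top):
    """Pick the top or bottom 16-bit half of a 32-bit register value."""
    return to_signed16((reg >> 16) & 0xFFFF if top else reg & 0xFFFF)


def mac16x16(acc, a, b, a_top=False, b_top=False):
    """16x16 multiply-accumulate on halves of two registers.

    Each register holds two 16-bit values; the flags pick which half to use.
    The accumulate is saturated here (an assumption of this sketch).
    """
    return sat_add32(acc, half(a, a_top) * half(b, b_top))
```

For example, `mac16x16(10, 0x0003FFFE, 0x00050004)` multiplies the bottom halves (-2 and 4) and adds 10, giving 2. With `a_top=True, b_top=True` it multiplies 3 by 5 instead and gives 25.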

14 Nov 2003 Embedded Computer Architecture 68

Why E?

  • Enhance DSP performance
      • Of stand-alone ARM core
      • Avoid multiprocessor solution
      • Hard disk controllers, for example


14 Nov 2003 Embedded Computer Architecture 69

E: Competition

  • DSP-in-processor
      • “MAC=DSP”
      • Almost all embedded processors have it
      • No revolution in performance
  • DSP/processor hybrids
      • Infineon TriCore
      • Microchip dsPIC
      • Hard to get it right, not a big success so far
  • SIMD extensions
      • More extensive additions than v5E
      • Requires new functional units
      • Major performance gain possible

14 Nov 2003 Embedded Computer Architecture 70

SIMD Extensions

  • Heavy-weight addition
      • New functional units, registers
      • Small vector computers
  • Examples:
      • ARM SIMD extensions (in v6)
      • Motorola AltiVec
      • MIPS
      • x86 MMX-SSE-SSE2-3DNow!
      • SPARC VIS
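To illustrate the "small vector computer" idea, the sketch below adds four 8-bit lanes packed into one 32-bit word in a single operation, with no carry leaking between lanes. The lane width, word size, wrap-around behavior, and the name `packed_add8` are assumptions chosen for illustration. They are not tied to any of the extensions listed above, each of which defines its own registers and lane formats.

```python
def packed_add8(x, y, lanes=4):
    """Add two packed words lane by lane (8 bits per lane, wrapping per lane)."""
    result = 0
    for lane in range(lanes):
        shift = 8 * lane
        a = (x >> shift) & 0xFF          # extract lane from first operand
        b = (y >> shift) & 0xFF          # extract lane from second operand
        result |= ((a + b) & 0xFF) << shift  # drop the carry so it stays in-lane
    return result
```

For example, `packed_add8(0x01020304, 0x10203040)` yields `0x11223344`. A plain 32-bit add would give the same result here, but it would differ as soon as one lane overflows: `packed_add8(0x000000FF, 0x00000001)` yields `0`, not `0x100`.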

14 Nov 2003 Embedded Computer Architecture 71

SIMD Extensions

  • Target
      • Motorola
      • PPC 7455 (G4+)
      • 1 GHz
  • EEMBC
      • Telemark suite
      • Networking suite
  • OOTB:
      • Out-of-the-box
  • OPT:
      • Manually tuned to use AltiVec
  • Overall/Average:
      • 3-4 times speed up can be expected

[Chart: OOTB vs. OPT performance for Autocorr, Convolution, Bit alloc, FFT, Viterbi, OSPF, Route, Packet 512; axis 1 to 10, one value labeled 35,1]

14 Nov 2003 Embedded Computer Architecture 72

ARM vs DSP

  • Despite “E” and “SIMD”...
  • Standard solution:
      • Dual-core setup
      • ARM core
      • DSP core
  • Control vs data


14 Nov 2003 Embedded Computer Architecture 73

Control vs Data

  • Control plane:
      • Standard processor tasks
      • Decision-making
      • “Integer applications”
      • UI of a phone, packet routing, …
  • Data plane:
      • Move or process data
      • Performance is key
      • Signal processing, multimedia, …
      • Floating/fixed point

14 Nov 2003 Embedded Computer Architecture 74

ARM-DSP: TI OMAP 5910

  • Texas Instruments
  • Target market
      • Data-intense real-time
      • Audio, biometrics, etc.
  • Processing
      • Dual-core chip
      • ARM925T 150 MHz
      • TI C55 DSP 150 MHz
  • Power 230 mW
  • Price 32 USD

[Block diagram]

  • C55x DSP core: 24k I$, 64k data SRAM, 96k instr SRAM
  • ARM925 CPU core: 16k I$, 8k D$, MMU
  • 192k shared SRAM, Mem Ctrl 75 MHz, LCD Ctrl
  • Device groups: ARM shared devices, ARM private devices, system devices, DSP shared devices, DSP private devices
  • Peripherals:
      • USB 1.1
      • LCD controller
      • MMC/SDcard intf
      • camera interface
      • keyboard interface
      • RTC
      • I2C
      • 8 serial ports
      • 3 UARTs
      • 14 GPIO pins

14 Nov 2003 Embedded Computer Architecture 75

ARM Family: ARM Cores

[Figure: performance vs. time for the ARM cores]

  • ARM7 (1994): 3-stage pipe, unified cache, low power
  • ARM9 (1998): 5-stage pipe, I/D caches
  • ARM9E (2000): 5-stage pipe, I/D caches, Java, DSP
  • ARM10 (2000): 6-stage pipe, static BP, 64-bit BIU, FP
  • ARM11 (2002): 8-stage pipe, dynamic BP, OOO-completion, 550 MHz

14 Nov 2003 Embedded Computer Architecture 76

ARM Family: Intel Chips

[Figure: performance vs. time for ARM7, ARM9, ARM9E, ARM10, ARM11, StrongARM, XScale]

  • StrongARM (1995): 5-stage pipe, legendary performer
      • Intel got this from Digital in 1998. A single variant, big in PDAs.
  • XScale (2001): 7-10-stage pipe, dynamic BP, 800 MHz
      • Intel makes chips based on the XScale; does not license the core to 3rd parties


14 Nov 2003 Embedded Computer Architecture 77

Configurable Instruction Sets

14 Nov 2003 Embedded Computer Architecture 78

Instruction Sets: Configure

  • Configurable instruction sets
      • Adapt to needs of application
      • User can specialize the processor
      • Less waste on generality
      • Fast evolution of instruction sets
  • Traditionally:
      • Chip manufacturers determine instruction sets aimed at some niche
      • Slow evolution of instruction sets

14 Nov 2003 Embedded Computer Architecture 79

Instruction Sets: Configure

  • Subsetting
      • There is a limited and predefined set of instructions available
      • Easy to compile for: restrict code gen
      • Remove instructions to simplify core
  • Addition
      • Freedom to invent instructions
      • Tool chain: assembly, C compilers
      • Genuine development of ISAs

14 Nov 2003 Embedded Computer Architecture 80

Configurable Instruction Sets

  • Tight integration:
      • Add to regular pipeline
      • Additional functional units
      • Adding fine-grained instructions
  • Loose integration:
      • Coprocessor interface
      • Slower communication
      • Offloading of macro-scale tasks
      • Method to invoke accelerator circuits


14 Nov 2003 Embedded Computer Architecture 81

Configurability Trend

  • Pioneers
      • Tensilica Xtensa
      • ARC ARCtangent
      • Configurability as key selling point
  • Added to general architectures
      • MIPS: “CorExtend”
      • PowerPC: “BookE ASU”
      • Usually less tight integration

14 Nov 2003 Embedded Computer Architecture 82

Benefit of Configurability

  • Target
      • Xtensa III
      • 200 MHz
  • EEMBC
      • Telemark suite
      • Networking suite
  • OOTB:
      • Out-of-the-box
      • 25k gate core
  • OPT:
      • Tuned code
      • 25k base core gates
      • 18k extra instr gates
      • 100k DSP coproc
      • 37k config gates
  • Speedups

Benchmark           OOTB    OPT
Telemark overall       1     37
Autocorr               1      9
Convolution            1   1249
Bit alloc              1     34
FFT                    1     24
Viterbi GSM            1     14

14 Nov 2003 Embedded Computer Architecture 83

Configuration Tools

instruction set choices Gate and memory size counters