Cyber-Physical Systems Embedded Architecture ICEN 553/453 Fall 2018 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Cyber-Physical Systems Embedded Architecture

ICEN 553/453– Fall 2018

  • Prof. Dola Saha
slide-2
SLIDE 2

2

Introduction to Microcontrollers

slide-3
SLIDE 3

3

Introduction to Microcontrollers

Ø A microcontroller (MCU) is a small computer on a single integrated circuit consisting of a relatively simple central processing unit (CPU) combined with peripheral devices such as memories, I/O devices, and timers.

§ By some accounts, more than half of all CPUs sold worldwide are microcontrollers.
§ Such a claim is hard to substantiate because the difference between microcontrollers and general-purpose processors is indistinct.

slide-4
SLIDE 4

4

Microcontrollers

Ø An Embedded Computer System on a Chip
§ A CPU
§ Memory (Volatile and Non-Volatile)
§ Timers
§ I/O Devices
Ø Typically intended for limited energy usage
§ Low power when operating plus sleep modes
Ø Where might you use a microcontroller?

slide-5
SLIDE 5

5

What is Control?

Ø Sequencing operations
§ Turning switches on and off
Ø Adjusting continuously (or at least finely) variable quantities to influence a process

slide-6
SLIDE 6

6

Microcontroller vs Microprocessor

Ø A microcontroller is a small computer on a single integrated circuit containing a processor core, memory, and programmable input/output peripherals.

Ø A microprocessor incorporates the functions of a computer’s central processing unit (CPU) on a single integrated circuit.

slide-7
SLIDE 7

7

Microcontroller vs Microprocessor

slide-8
SLIDE 8

8

Types of Processors

Ø In general-purpose computing, the variety of instruction set architectures today is limited, with the Intel x86 architecture overwhelmingly dominating all.

Ø There is no such dominance in embedded computing. On the contrary, the variety of processors can be daunting to a system designer.

Ø Do you want the same microprocessor for your watch, autonomous vehicle, and industrial sensor?

slide-9
SLIDE 9

9

How to choose micro-processors/controllers?

Ø Things that matter
§ Peripherals
§ Concurrency & Timing
§ Clock Rates
§ Memory sizes (SRAM & flash)
§ Package sizes

slide-10
SLIDE 10

10

Types of Microcontrollers

slide-11
SLIDE 11

11

DSP Processors

Ø Processors designed specifically to support numerically intensive signal processing applications are called DSP processors, or DSPs (digital signal processors).

Ø Signal Processing Applications: interactive games; radar, sonar, and LIDAR (light detection and ranging) imaging systems; video analytics (the extraction of information from video, for example for surveillance); driver-assist systems for cars; medical electronics; and scientific instrumentation.

slide-12
SLIDE 12

12

A Common Signal Processing Algorithm

Ø Finite impulse response (FIR) filtering
Ø $N$ is the length of the filter
Ø $a_i$ are the tap values
Ø $x(n)$ is the input

FIR Filter Formula: $y(n) = \sum_{i=0}^{N-1} a_i \, x(n - i)$

slide-13
SLIDE 13

13

FIR Filter Implementation

Ø $z^{-1}$ is a unit delay
Ø Suppose $N = 4$ and $a_0 = a_1 = a_2 = a_3 = 1/4$.
Ø Then for all $n \in \mathbb{N}$,

$y(n) = (x(n) + x(n-1) + x(n-2) + x(n-3))/4$.

Ø Multiply-Accumulate

Figure: Tapped delay line implementation of the FIR filter
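The four-tap averaging filter can be sketched in a few lines of Python. This is only an illustration of the formula, not code from the course: the function name `fir_filter`, the test input, and the convention that x(n) = 0 for n < 0 are assumptions made here.

```python
def fir_filter(taps, samples):
    """Return y(n) = sum over i of taps[i] * x(n - i), taking x(n) = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0  # accumulator for the multiply-accumulate steps
        for i, a_i in enumerate(taps):
            if n - i >= 0:
                acc += a_i * samples[n - i]  # one multiply-accumulate
        output.append(acc)
    return output


taps = [0.25, 0.25, 0.25, 0.25]  # N = 4, a0 = a1 = a2 = a3 = 1/4
step = [4.0] * 6                 # hypothetical test input: a constant signal
print(fir_filter(taps, step))    # prints [1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
```

The output ramps up over the first four samples because the delay line starts out filled with zeros, then settles at the input value once all four taps see real data.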

slide-14
SLIDE 14

14

Multiply-Accumulate Instructions

Ø Digital Signal Processors provide a fast and efficient multiply-accumulate (MAC) instruction
§ Typically including a relatively large accumulator
Ø They also typically use a Harvard memory access architecture
Ø They may include auto-increment addressing modes
Ø They may support circular buffer addressing
§ Efficient implementation of delay lines
Ø They may support zero-overhead loops

slide-15
SLIDE 15

15

Comparison

Figure: Frequency Response Comparison. Amplitude versus frequency (axis ticks from 0.1 to 1000) for the digital and analog filters.

slide-16
SLIDE 16

16

Digital Filter Critique

Ø The filter pole is at about ¼ of the sampling rate
§ We have only 4 samples of the impulse response
§ This makes the FIR filter simple: only 4 taps
§ This also degrades the filter performance
Ø We may be able to improve the filter performance somewhat by using a different design technique
§ The filter coefficients would differ
Ø A higher sampling rate with respect to the filter corner frequency could also help

slide-17
SLIDE 17

17

FIR Filter Delay Implementation

Ø Circular Buffer
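A circular buffer keeps the last N samples in a fixed array and moves an index instead of shifting data. The sketch below is one possible way to build the FIR delay line that way; the class name `CircularFir`, its `step` method, and the zero-initialized buffer are assumptions made for illustration.

```python
class CircularFir:
    """FIR filter whose delay line is a fixed-size circular buffer."""

    def __init__(self, taps):
        self.taps = list(taps)
        self.buf = [0.0] * len(self.taps)  # delay line, initially all zeros
        self.head = 0                      # index of the most recent sample

    def step(self, x):
        """Push one input sample and return the matching output sample."""
        n = len(self.buf)
        self.head = (self.head + 1) % n    # advance, wrapping at the end
        self.buf[self.head] = x            # overwrite the oldest sample
        acc = 0.0
        idx = self.head
        for a_i in self.taps:              # walk backwards through time
            acc += a_i * self.buf[idx]
            idx = (idx - 1) % n            # wrap around below index 0
        return acc


fir = CircularFir([0.25, 0.25, 0.25, 0.25])
print([fir.step(4.0) for _ in range(6)])   # prints [1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
```

No sample is ever copied from one slot to another; only `head` moves. DSPs with circular addressing modes perform the wrap-around in hardware.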

slide-18
SLIDE 18

18

Programmable Logic Controller (PLC)

Ø A microcontroller system for industrial automation
§ Continuous operation
§ Hostile environments
§ Originated as replacements for control circuits using electrical relays to control machinery
Ø PLCs are frequently programmed using ladder logic
§ This notation was developed to specify logic constructed with relays and switches

slide-19
SLIDE 19

19

Ladder Logic & Relays

Ø A relay is a switch whose contact is controlled by a coil.

Ø When a voltage is applied to the coil, the contact closes, enabling current to flow through the relay.

Ø By interconnecting contacts and coils, relays can be used to build digital controllers that follow specified patterns.

Ø Vertical Rails & Horizontal Rungs

Ø Contact: two vertical bars

Ø Coil: circle

slide-20
SLIDE 20

20

Example

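Figure: ladder logic diagram with a power rail, a ground rail, and two rungs (Rung 0 and Rung 1) containing the Start, Run, and Stop contacts and the Run and Motor coils.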

slide-21
SLIDE 21

21

Example: explained

Ø Start/Run is a normally open contact
Ø Stop is normally closed, indicated by the slash
§ It becomes open when the operator pushes the switch.
Ø When Start is pushed, electricity flows
§ Both Start and Run contacts close so that Motor runs
§ When Start is released, Motor continues to run
§ When Stop is pressed, current is interrupted, both Run contacts become open, and Motor stops
Ø Contacts wired in parallel perform a logical OR function, and contacts wired in series perform a logical AND.
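The latching behaviour described above can be written as Boolean logic. This sketch assumes the wiring the explanation implies: on one rung, Start in parallel with a Run contact, in series with Stop, driving the Run coil; on the other rung, a second Run contact driving the Motor coil. The function `scan` and its arguments are hypothetical names.

```python
def scan(start_pressed, stop_pressed, run):
    """Evaluate both rungs once and return the new (run, motor) state."""
    stop_contact_closed = not stop_pressed  # Stop is a normally closed contact
    # Assumed rung: Start in parallel with Run (OR), in series with Stop (AND).
    run = (start_pressed or run) and stop_contact_closed
    # Assumed rung: the second Run contact energizes the Motor coil.
    motor = run
    return run, motor


run = False
for start, stop in [(True, False), (False, False), (False, True), (False, False)]:
    run, motor = scan(start, stop, run)
    print(start, stop, motor)
# Motor is True after Start is pressed, stays True once Start is released,
# turns False when Stop is pressed, and stays False afterwards.
```

The Run contact in parallel with Start is what makes the circuit latch: once the Run coil is energized it keeps itself energized until Stop breaks the series path.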

slide-22
SLIDE 22

22

GPUs

Ø A graphics processing unit (GPU) is a specialized processor designed especially to perform the calculations required in graphics rendering.

Ø Used mostly for gaming in the early days
Ø Common programming language: CUDA

slide-23
SLIDE 23

23

Parallelism vs Concurrency

Ø Embedded computing applications typically do more than one thing “at a time.”

Ø Tasks are said to be “concurrent” if they conceptually execute simultaneously

Ø Tasks are said to be “parallel” if they physically execute simultaneously

§ Typically multiple servers at the same time

slide-24
SLIDE 24

24

Imperative Language

Ø Non-concurrent programs specify a sequence of instructions to execute.

Ø Imperative Language: expresses a computation as a sequence of operations
§ Example: C, Java

Ø How to write concurrent programs in an imperative language?
§ Thread Library
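As a rough illustration of the thread-library approach, the sketch below uses Python's standard `threading` module (the slide names C and Java; Python is used here only to keep the example short). Each thread runs an ordinary imperative function, and a lock protects the shared counter. The names `run_workers` and `worker` are made up for this example.

```python
import threading


def run_workers(num_threads, iterations):
    """Start num_threads threads that each increment a shared counter."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            with lock:               # one thread at a time updates the counter
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()                    # the threads now run concurrently
    for t in threads:
        t.join()                     # wait for every thread to finish
    return state["count"]


print(run_workers(4, 1000))          # prints 4000
```

Without the lock, the read-modify-write on the counter could interleave between threads and lose updates, which is the kind of hazard a concurrent program in an imperative language must handle explicitly.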

slide-25
SLIDE 25

25

Program Dependency – Sequential Consistency

Ø No dependency between lines 3 and 4

Ø Line 4 is dependent on Line 3

slide-26
SLIDE 26

26

Thread Mapping on Processor

Ø OS Dependent Scheduler
§ Static Mapping
§ Basic Lowest Load (fill in Round Robin fashion)
§ Extended Lowest Load

slide-27
SLIDE 27

27

Performance Improvement

Ø Various current architectures seek to improve performance by finding and exploiting potentials for parallel execution
§ This frequently improves processing throughput
§ It does not always improve processing latency
§ It frequently makes processing time less predictable

Ø Many embedded applications rely on results being produced at predictable regular rates
§ Embedded results must be available at the right time

slide-28
SLIDE 28

28

Parallelism

Ø Temporal Parallelism – Pipelining
Ø Spatial Parallelism –
§ Superscalar
§ VLIW
§ Multicore

slide-29
SLIDE 29

29

RISC and CISC Architectures

Ø CISC – Complex Instruction Set Computer
§ Multi-clock complex instructions
Ø RISC – Reduced Instruction Set Computer
§ Simple instructions that can be executed within one cycle

slide-30
SLIDE 30

30

5 Cycles of RISC Instruction Set

Ø Instruction fetch cycle (IF)

§ Fetch the instruction from the memory location pointed to by the PC, then increment the PC

Ø Instruction decode/register fetch cycle (ID)

§ Decode the instruction

Ø Execution/effective address cycle (EX)

§ ALU operates on the operands

Ø Memory access (MEM)

§ Load/Store instructions

Ø Write-back cycle (WB)

§ Register-Register ALU instruction
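When the five cycles are overlapped in a pipeline, a new instruction can enter IF every cycle. The sketch below prints which stage each instruction occupies per cycle for an ideal pipeline with no hazards or stalls. The function name `pipeline_schedule` and the instruction labels A to E are assumptions for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(instructions):
    """Map each cycle (starting at 1) to {instruction: stage} with no stalls."""
    total_cycles = len(instructions) + len(STAGES) - 1
    schedule = {}
    for cycle in range(1, total_cycles + 1):
        active = {}
        for k, name in enumerate(instructions):
            stage_index = cycle - 1 - k  # instruction k enters IF in cycle k + 1
            if 0 <= stage_index < len(STAGES):
                active[name] = STAGES[stage_index]
        schedule[cycle] = active
    return schedule


for cycle, active in pipeline_schedule(["A", "B", "C", "D", "E"]).items():
    print(cycle, active)
# Cycle 5 shows all five stages busy: A in WB, B in MEM, C in EX, D in ID, E in IF.
```

Five instructions finish in 9 cycles instead of 25. Each instruction still takes five cycles from fetch to write-back, so pipelining raises throughput without reducing the latency of a single instruction.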

slide-31
SLIDE 31

31

Pipelining in RISC

Figure: five-stage RISC pipeline (fetch, decode, execute, memory, writeback) built from the PC, instruction memory, register bank, ALU, data memory, and multiplexers, annotated with a control hazard (conditional branch) and data hazards (computed branch; memory read or ALU result).

slide-32
SLIDE 32

32

Simple RISC Pipeline

slide-33
SLIDE 33

33

Pipelining Hazard

Ø Data Hazard
Ø Control Hazard
Ø Out-of-order Execution
Ø Speculative Execution

slide-34
SLIDE 34

34

Out-of-order Execution

Figure: Reservation Table. Hardware resources (instruction memory, register bank read 1, register bank read 2, ALU, data memory, register bank write) versus cycles 1 to 9 for instructions A to E.

Figure: Reservation Table with Interlocks. The same hardware resources versus cycles 1 to 12 for instructions A to E, with the interlock cycles marked.

slide-35
SLIDE 35

35

CISC

Ø DSPs are typically CISC machines
Ø Instructions support
§ FIR filtering
§ FFTs
§ Viterbi decoding

slide-36
SLIDE 36

36

FIR Filter Implementation

Ø $z^{-1}$ is a unit delay
Ø Suppose $N = 4$ and $a_0 = a_1 = a_2 = a_3 = 1/4$.
Ø Then for all $n \in \mathbb{N}$,

$y(n) = (x(n) + x(n-1) + x(n-2) + x(n-3))/4$.

Ø Multiply-Accumulate

Figure: Tapped delay line implementation of the FIR filter

slide-37
SLIDE 37

37

CISC Instruction

Ø Texas Instruments TMS320C54x family of DSP processors
Ø Code
§ RPT numberOfTaps - 1
§ MAC *AR2+, *AR3+, A
Ø RPT: zero-overhead loops
Ø MAC: multiply-accumulate
§ a := a + x ∗ y
§ AR2 and AR3 are registers
§ A is the accumulator
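The two-instruction loop can be emulated in software to show what the hardware does on each repetition. This is a sketch under stated assumptions, not a model of the real processor: it assumes that `RPT numberOfTaps - 1` causes the following MAC to execute numberOfTaps times, and that `*AR2+` and `*AR3+` read through an address register and then increment it. The function name `rpt_mac` and the plain Python lists standing in for data memory are hypothetical.

```python
def rpt_mac(samples, coefficients, number_of_taps):
    """Emulate RPT numberOfTaps - 1 followed by MAC *AR2+, *AR3+, A."""
    accumulator = 0  # stands in for accumulator A
    ar2 = 0          # stands in for address register AR2
    ar3 = 0          # stands in for address register AR3
    for _ in range(number_of_taps):  # assumed: the MAC runs numberOfTaps times
        accumulator += samples[ar2] * coefficients[ar3]  # a := a + x * y
        ar2 += 1     # post-increment, as in *AR2+
        ar3 += 1     # post-increment, as in *AR3+
    return accumulator


print(rpt_mac([1, 2, 3, 4], [1, 1, 1, 1], 4))  # prints 10
```

In software the loop counter, the two index updates, and the multiply-add are separate steps. The point of the DSP instructions is that the hardware folds all of them into a single repeated instruction.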

slide-38
SLIDE 38

38

VLIW Instruction Set

Ø Used for DSP, other Embedded Applications

Ø Multiple independent instructions per cycle, packed into single large "instruction word" or "packet"

slide-39
SLIDE 39

39

Multicore Architecture

Ø Combination of several processors in a single chip
Ø Real-time and safety-critical tasks can have dedicated processors
Ø Heterogeneous multicore
§ CPU and GPUs together

slide-40
SLIDE 40

40

FPGAs

Ø Field Programmable Gate Arrays
§ Set of logic gates and RAM blocks
§ Reconfigurable / Programmable
§ Precise timing
Ø System on Chip design

Zynq

slide-41
SLIDE 41

41

Fixed and Floating Point Numbers

Ø Programs may use float or double
Ø Many embedded processors do not have floating-point arithmetic hardware
Ø Conversion is required, which makes it slow
Ø An imaginary binary point is assumed for computation
§ A binary point separates bits
§ A decimal point separates digits
Ø The format x.y indicates
§ x bits to the left and y bits to the right of the binary point

slide-42
SLIDE 42

42

Fixed Point Numbers

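Ø $01101.101_2$

Ø $= 1 \times 2^{3} + 1 \times 2^{2} + 1 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-3}$

Ø $= 13.625$

Figure: fixed-point layout with an m-bit integer part $A_h$ to the left of the radix point and an n-bit fraction part $A_l$ to the right. The value is $A_h + A_l \times 2^{-n}$.

Example: $10101.101_2 = A_h + A_l \times 2^{-3} = 21 + 5 \times 2^{-3} = 21.625$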

slide-43
SLIDE 43

43

Unsigned Fixed Point Representation

Ø Example: Convert $a = 3.141593$ to unsigned fixed-point UQ4.12 format.
Ø Calculate $a \times 2^{12} = 12867.964928$
Ø Round the result to an integer: $\mathrm{round}(12867.964928) = 12868$
Ø Convert the integer to binary: 12868 = 11_0010_0100_0100₂
Ø Organize into UQ4.12: 0011.0010_0100_0100₂
Ø Final result in hex: 0x3244
Ø Error: $\frac{12868}{2^{12}} - a = 8.5625 \times 10^{-6}$
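The conversion steps above can be checked with a short Python sketch. The helper name `to_unsigned_fixed` and its parameters are assumptions for illustration. Python's `round` sends exact .5 ties to the nearest even integer, which does not matter for this value.

```python
def to_unsigned_fixed(value, frac_bits, total_bits):
    """Scale by 2**frac_bits, round, and check the result fits in total_bits."""
    scaled = round(value * (1 << frac_bits))
    if not 0 <= scaled < (1 << total_bits):
        raise OverflowError("value does not fit in the chosen format")
    return scaled


raw = to_unsigned_fixed(3.141593, frac_bits=12, total_bits=16)  # UQ4.12
print(raw)                           # prints 12868
print(hex(raw))                      # prints 0x3244
print(raw / (1 << 12) - 3.141593)    # quantization error, about 8.56e-06
```

The error is positive here because the scaled value was rounded up, so the stored number is slightly larger than the original.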

slide-44
SLIDE 44

44

Signed Fixed Point Representation

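Figure: signed fixed-point layout with a sign bit s, m integer bits, the radix point, and n fraction bits.

$A = -1 \times b_{N-1} \times 2^{N-1} + \sum_{i=0}^{N-2} b_i \times 2^{i}$, where $N = m + n + 1$

$a = A / 2^{n}$

Here $b_i$ are the stored bits, $A$ is the stored two's complement integer, and $a$ is the value it represents.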

slide-45
SLIDE 45

45

Signed Fixed Point Representation

Ø Example: Convert $a = -3.141593$ to signed fixed-point Q3.12 format.
Ø Calculate $a \times 2^{12} = -12867.964928$
Ø Round the result to an integer: $\mathrm{round}(-12867.964928) = -12868$
Ø Convert the absolute value of the integer to binary: 12868 = 11_0010_0100_0100₂
(Note that the negative integer is represented in two’s complement.)
Ø Extend the result to 16 bits: 0011_0010_0100_0100₂
Ø Find the two’s complement: 1100_1101_1011_1100₂
Ø Final result in hex: 0xCDBC
Ø Error: $-\frac{12868}{2^{12}} - a = -8.5625 \times 10^{-6}$
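The signed conversion can be sketched the same way. Masking a negative Python integer with `2**16 - 1` yields its 16-bit two's complement bit pattern, which matches the invert-and-add-one procedure above. The helper names `to_signed_fixed` and `from_signed_fixed` are assumptions for illustration.

```python
def to_signed_fixed(value, frac_bits, total_bits):
    """Return the two's complement bit pattern of value in the given format."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    if not lowest <= scaled <= highest:
        raise OverflowError("value does not fit in the chosen format")
    return scaled & ((1 << total_bits) - 1)  # keep total_bits bits


def from_signed_fixed(raw, frac_bits, total_bits):
    """Interpret a two's complement bit pattern as a real value."""
    if raw & (1 << (total_bits - 1)):  # sign bit set: the value is negative
        raw -= 1 << total_bits
    return raw / (1 << frac_bits)


raw = to_signed_fixed(-3.141593, frac_bits=12, total_bits=16)  # Q3.12
print(hex(raw))                        # prints 0xcdbc
print(from_signed_fixed(raw, 12, 16))  # prints -3.1416015625
```

Decoding the pattern gives -3.1416015625, which is slightly more negative than the original value, so the error (stored value minus original) is negative.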
slide-46
SLIDE 46

46

Addition and Subtraction

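Assume UQ16.16, where each real value $a_i$ is stored as the integer $A_i$:

$A_1 = a_1 \times 2^{16}$, $A_2 = a_2 \times 2^{16}$, $A_3 = a_3 \times 2^{16}$

$a_1 = A_1 \times 2^{-16}$, $a_2 = A_2 \times 2^{-16}$, $a_3 = A_3 \times 2^{-16}$

Addition:

$a_3 = a_1 + a_2 = A_1 \times 2^{-16} + A_2 \times 2^{-16} = (A_1 + A_2) \times 2^{-16}$

$A_3 \times 2^{-16} = (A_1 + A_2) \times 2^{-16}$

$A_3 = A_1 + A_2$

Subtraction:

$a_3 = a_1 - a_2$

$A_3 = A_1 - A_2$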

slide-47
SLIDE 47

47

Multiplication

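With UQ16.16 values $a_i = A_i \times 2^{-16}$:

$a_3 = a_1 \times a_2 = (A_1 \times 2^{-16}) \times (A_2 \times 2^{-16}) = (A_1 \times A_2) \times 2^{-32}$

$a_3 = A_3 \times 2^{-16}$

$A_3 = (A_1 \times A_2) \times 2^{-16}$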

slide-48
SLIDE 48

48

Law of Conservation of Bits

Ø When multiplying two x-bit numbers with formats n.m and p.q, the result has format (n + p).(m + q)

Ø Processors might support full-precision multiplications
Ø The result must finally be converted back to x bits to fit in a data register
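The rules for UQ16.16 arithmetic can be sketched in Python, which has arbitrary-precision integers, so the full-precision product is available before it is cut back to 32 bits. The function names and the 32-bit word mask are assumptions for illustration. The multiply follows the conservation-of-bits rule: two 16.16 operands give a 32.32 product, which is shifted right by 16 to return to 16 fraction bits.

```python
FRAC_BITS = 16
WORD_MASK = 0xFFFFFFFF   # 32-bit data register (UQ16.16)


def to_fixed(value):
    """Real value to UQ16.16 integer."""
    return round(value * (1 << FRAC_BITS))


def to_real(raw):
    """UQ16.16 integer back to a real value."""
    return raw / (1 << FRAC_BITS)


def fixed_add(a, b):
    return (a + b) & WORD_MASK            # integers add directly


def fixed_sub(a, b):
    return (a - b) & WORD_MASK            # integers subtract directly


def fixed_mul(a, b):
    wide = a * b                          # full-precision 32.32 product
    return (wide >> FRAC_BITS) & WORD_MASK  # drop 16 fraction bits, keep 32 bits


x, y = to_fixed(1.5), to_fixed(2.25)
print(to_real(fixed_add(x, y)))   # prints 3.75
print(to_real(fixed_mul(x, y)))   # prints 3.375
```

The mask models the higher-order bits being discarded when the result is written to a 32-bit register, so a product of 65536.0 or more would silently wrap around.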

slide-49
SLIDE 49

49

Fixed Point Multiplication

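With UQ16.16 values $a_i = A_i \times 2^{-16}$:

$a_3 = a_1 \times a_2 = (A_1 \times 2^{-16}) \times (A_2 \times 2^{-16}) = (A_1 \times A_2) \times 2^{-32}$

$a_3 = A_3 \times 2^{-16}$

$A_3 = (A_1 \times A_2) \times 2^{-16}$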

slide-50
SLIDE 50

50

Overflow Example

Ø Multiply 0.5 × 0.5
Ø Fixed-point representation of 0.5 = $2^{30}$
Ø Result of multiplication = $2^{60}$
Ø Discarding the higher bits results in an error
Ø Remedy: shift right before multiplying
Ø Result = 0.01 in binary, interpreted as 0.25
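The numbers in this example are consistent with a 32-bit word in 1.31 format (one bit left of the binary point, 31 to the right), where 0.5 is stored as 2^30. That format is an assumption here, since the slide does not name it, as is the choice to shift each operand right by 16 bits. Under those assumptions the overflow and the remedy look like this:

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1


def naive_multiply(a, b):
    """Multiply and keep only the low 32 bits, discarding the higher-order bits."""
    return (a * b) & MASK


def preshifted_multiply(a, b, shift=16):
    """Shift each operand right before multiplying so the product fits in 32 bits."""
    return (a >> shift) * (b >> shift)


half = 1 << 30                       # 0.5 in the assumed 1.31 format
print(naive_multiply(half, half))    # prints 0: every set bit of 2**60 was discarded
product = preshifted_multiply(half, half)   # 2**14 * 2**14 = 2**28
print(product / (1 << 30))           # prints 0.25 when read in 2.30 format
```

Shifting first costs precision, because each operand loses its 16 low-order bits, but it keeps the product inside the word. After the shift each operand is in 1.15 format, so the product is in 2.30 format, where 2^28 reads as binary 0.01.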

slide-51
SLIDE 51

51

Programmers need to guard

Ø Overflow – since higher-order bits are discarded
Ø Truncation – if bits are chosen before the operation
Ø Rounding – rounds to nearest full precision after the operation
slide-52
SLIDE 52

52

History of ARM Processor

slide-53
SLIDE 53

53

ARM Cortex Processors

ARM Cortex-A family:

§ Applications processors
§ Support OS and high-performance applications
§ Such as smartphones and smart TVs

ARM Cortex-R family:

§ Real-time processors with high performance and high reliability
§ Support real-time processing and mission-critical control

ARM Cortex-M family:

§ Microcontroller
§ Cost-sensitive, support SoC

slide-54
SLIDE 54

54

Raspberry Pi

Ø The Raspberry Pi 3 Model B+ is the latest product in the Raspberry Pi range.

§ Broadcom BCM2837B0, Cortex-A53 (ARMv8) 64-bit SoC @ 1.4GHz
§ 1GB LPDDR2 SDRAM
§ 2.4GHz and 5GHz IEEE 802.11b/g/n/ac wireless LAN, Bluetooth 4.2, BLE
§ Gigabit Ethernet over USB 2.0 (maximum throughput 300 Mbps)
§ Extended 40-pin GPIO header
§ Full-size HDMI

slide-55
SLIDE 55

55

Raspberry Pi

Ø The Raspberry Pi 3 Model B+ is the latest product in the Raspberry Pi range.

§ CSI camera port for connecting a Raspberry Pi camera
§ DSI display port for connecting a Raspberry Pi touchscreen display
§ 4-pole stereo output and composite video port
§ Micro SD port for loading your operating system and storing data
§ 5V/2.5A DC power input
§ Power-over-Ethernet (PoE) support (requires separate PoE HAT)

slide-56
SLIDE 56

56

ARM Peripherals

slide-57
SLIDE 57

57

GPIO Pins

Ø https://pinout.xyz