Lars Bauer, Jörg Henkel - Lecture time: Wed., 15:45 - 17:15



SLIDE 1

Institut für Technische Informatik Chair for Embedded Systems - Prof. Dr. J. Henkel

Lars Bauer, Jörg Henkel

Lecture in the summer semester (SS) 2014

SLIDE 2
  • L. Bauer, KIT, 2014

Lecture time: Wed., 15:45 - 17:15

  • Bldg. 50.34, lecture hall (HS) -102

Homepage: http://ces.itec.kit.edu/teaching/

  • The slides from previous years are also available there

Slides login:

  • Login: “student”, Password: “CES-Student”

Contact: lars.bauer@kit.edu

  • Haid-und-Neu-Str. 7, Bldg. 07.21, Rm. 316.2 (2nd floor!)
SLIDE 3

[Figure: campus map with the labels Mensa, Info-Bau, and TFI]

SLIDE 4

Simply let me know / interrupt me

SLIDE 5

CS Diploma:

  • Vertiefungsfach (specialization subject) 8: Entwurf eingebetteter Systeme und Rechnerarchitekturen

CS Master:

  • Module: Rekonfigurierbare und Adaptive Systeme [IN4INRAS] (3 ECTS)
  • Module: Eingebettete Systeme: Weiterführende Themen [IN4INESWTN] (10 ECTS)
  • Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)

Other study courses (e.g. EE): ask individually

SLIDE 6

Lectures

  • RAS
  • Low Power Design
  • Embedded Systems for Multimedia and Image Processing

Labs

  • Entwurf eingebetteter Systeme
  • Entwurf von eingebetteten applikationsspezifischen Prozessoren
  • Low Power Design and Embedded Systems

Seminars

  • Rekonfigurierbare Eingebettete Systeme
  • Dependability in Embedded Systems
  • Distributed Decision Making
  • Stereo Video Processing
  • Multicore for Multimedia Processors
  • Sensor Networks

More Info: ces.itec.kit.edu/teaching

SLIDE 7

Note: Info on homepage is typically not up-to-date

  • If you are interested in a particular topic: better ask individually

There are nearly always SA/DA/BA/MA theses (student research project, diploma, bachelor's, and master's theses) or Hiwi (student assistant) jobs available in the scope of reconfigurable systems

Main projects:

  • i-Core: invasive Core
  • OTERA: Online Test Strategies for Reliable Reconfigurable Architectures
  • Compilers for reconfigurable architectures

Topics:

  • Algorithms for Runtime System, Operating System, …
  • Toolchain, Compiler, Synthesis, …
  • Architecture, Hardware Prototype, Simulation Environment, …
SLIDE 8

Rechnerstrukturen

  • Prerequisites

Eingebettete Systeme

  • ES1: Optimierung und Synthese Eingebetteter Systeme
  • ES2: Entwurf und Architekturen für Eingebettete Systeme
  • The core topics (e.g. details about FPGA architectures) will be recapitulated in the scope of this lecture
  • Thus, the contents of ES1 and ES2 are beneficial but not required in full detail

SLIDE 9

  • “Fine- and Coarse-Grain Reconfigurable Computing”, S. Vassiliadis and D. Soudris, Springer, 2007.
  • “Runtime adaptive extensible embedded processors – a survey”, H. P. Huynh and T. Mitra, SAMOS, pp. 215-225, 2009.
  • “Reconfigurable computing: architectures and design methods”, T. J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193-207, 2005.
  • “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective”, F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847-862, 2002.

SLIDE 10

1. Introduction and Motivation: The Demand for Adaptivity

SLIDE 11

Typical approach:

  • Static analysis of system requirements (e.g. computational hot spots)
  • Build optimized system

Today’s requirements:

  • Increasing complexity
  • More functionality

Problem:

  • Statically chosen design point has to match all requirements
  • Typically inefficient for individual components (e.g. tasks or hot spots)
SLIDE 12

Hot spot: a rather small part of the application that accounts for a rather large part of the execution time

  • Also called ‘computational kernel’
  • Typically: an inner loop
  • 80/20 rule (90/10 rule etc.)

[Figure: bar chart contrasting code size (split 80/20) with execution time (split 20/80)]
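The 80/20 rule can be checked against a profile with a few lines of code. The following is a minimal sketch, not a profiling tool: the function names, code sizes, and execution times in PROFILE are made-up values for illustration, and the greedy selection by execution time is just one simple way to pick hot spot candidates.

```python
def find_hot_spots(profile, time_share_percent=80):
    """Pick functions in order of decreasing execution time until they
    cover at least `time_share_percent` of the total execution time.

    `profile` maps a function name to (code size, execution time).
    Returns (hot spot names, covered code share, covered time share).
    """
    total_size = sum(size for size, _ in profile.values())
    total_time = sum(time for _, time in profile.values())
    hot, covered_size, covered_time = [], 0, 0
    by_time = sorted(profile.items(), key=lambda kv: kv[1][1], reverse=True)
    for name, (size, time) in by_time:
        # Integer comparison avoids floating point rounding at the threshold.
        if covered_time * 100 >= time_share_percent * total_time:
            break
        hot.append(name)
        covered_size += size
        covered_time += time
    return hot, covered_size / total_size, covered_time / total_time


# Hypothetical profile: name -> (code size in lines, execution time in ms)
PROFILE = {
    "motion_estimation": (150, 500),
    "transform_quant": (50, 300),
    "entropy_coding": (200, 100),
    "file_io": (300, 60),
    "user_interface": (300, 40),
}

hot, size_share, time_share = find_hot_spots(PROFILE)
print(hot, size_share, time_share)
```

With these assumed numbers, two functions that make up 20% of the code cover 80% of the execution time, which is the situation the rule describes.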

SLIDE 13

[Figure: trade-off between flexibility and efficiency; src: Henkel, ESII]

  • Axes: flexibility, 1/time-to-market, … versus efficiency (MIPS/$, MHz/mW, MIPS/area, …)
  • ASIC (“hardware solution”): non-programmable, highly specialized
  • GPP (“software solution”): general purpose processor

ASIP: application specific instruction set processor

  • Instruction set extension
  • Parameterization
  • Inclusion/exclusion of functional blocks
SLIDE 14

Main tasks:

  • Video en-/decoding
  • Audio en-/decoding
  • Data (de-)multiplexing
  • Control protocol

[Figure: block diagram of the example multimedia application; src: cityrockz.com]

  • Blocks: video input, audio input, IR, video encoder and video decoder (H.263 / H.264), audio encoder and audio decoder (G.723), multiplexer and de-multiplexer (H.223), H.245 control, modem, PSTN interface, video output, audio output
  • External connections: remote control, mic, phone, CVBS, CVHS, digital video input, line, phone, speakers, display screen

SLIDE 15

[Figure: bar chart of the processing time [%] (axis ticks 2 to 12) of the individual processing functions, e.g. I_ME, S_ME, TQ_PL, CABAC, CAVLC, and MAC]

SLIDE 16

src: Tensilica, Inc.: “Xtensa LC Product Brief”

Design accelerators for the hot spots

Connect them as execution units, register files, and interfaces

SLIDE 17

src: Tensilica, Inc.: “Xtensa LC Product Brief”

Provides noticeably improved performance after targeting the major hot spots

However, performance is still not sufficient to meet real-time requirements

  • More hot spots need to be accelerated

[Figure labels: I_ME, MC_L, TQ_PL]

SLIDE 18

src: Tensilica, Inc.: “Xtensa LC Product Brief”

Scalability problem when rather many hot spots exist

  • Note: still not all relevant hot spots are covered

[Figure labels: MC_L, CABAC, S_ME, CAVLC, FM, MAC, H245_C, Dec_MB, V34 mod, I_ME, TQ_PL]

SLIDE 19

ASIPs perform well when

  1. rather few hot spots need to be accelerated and
  2. those hot spots are well known in advance

ASIPs are less efficient when targeting rather many hot spots

  • All accelerators are provided statically (i.e. they require area and consume power) even though typically just a few of them are needed at a certain time

ASIPs are less efficient when targeting unknown hot spots

  • Even for a given application it is not necessarily clear which parts of it are ‘hot’ during execution, as this may depend on input data (as demonstrated in the following)
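The area argument for many hot spots can be made concrete with a small calculation. The sketch below is illustrative only: the accelerator names reuse hot spot labels from this lecture, while the area values and the grouping into execution phases are made-up assumptions, and reconfiguration overhead is ignored.

```python
# Hypothetical accelerators: name -> area in arbitrary units
ACCELERATOR_AREA = {
    "I_ME": 40, "MC_L": 25, "TQ_PL": 20,
    "CABAC": 35, "CAVLC": 15, "FM": 10,
}

# Hypothetical execution phases: accelerators needed at the same time
PHASES = [{"I_ME", "MC_L"}, {"TQ_PL", "CAVLC"}, {"CABAC", "FM"}]


def static_area(areas):
    """Area of an ASIP that provides every accelerator all the time."""
    return sum(areas.values())


def peak_area(areas, phases):
    """Largest area needed by the accelerators of any single phase."""
    return max(sum(areas[name] for name in phase) for phase in phases)


print(static_area(ACCELERATOR_AREA))        # 145
print(peak_area(ACCELERATOR_AREA, PHASES))  # 65
```

Under these assumptions the statically provided accelerators occupy 145 units, although no single phase uses more than 65 units at a time.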

SLIDE 20

[Figure: flow chart of the encoding engine with several “Loop Over MB” stages]

  • ME: SA(T)D
  • RD: MB-type decision (I or P), mode decision (for I or P)
  • MB encoding loop: branch “If MB_Type = P_MB” (then/else) over the blocks MC, IPRED, DCT / Q, DCT / HT / Q, IDCT / IQ, IDCT / IHT / IQ, CAVLC
  • In-loop de-blocking filter

Iterates on MacroBlocks (MBs, i.e. 16x16 pixels)

2 different MB-types: different computational paths with different computational requirements

  • I-MB (spatial prediction)
  • P-MB (temporal prediction)
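How the MB type steers the computational path can be sketched as follows. This is a simplified illustration, not an encoder: the assignment of kernels to the two branches in KERNELS is an assumption derived from the block names in the flow chart, and the MB-type mixes of the two example frames are made up.

```python
from collections import Counter

# Assumed kernel sequence per MB type (illustrative, not from a standard)
KERNELS = {
    "P": ["MC", "DCT/Q", "IDCT/IQ", "CAVLC"],           # temporal prediction
    "I": ["IPRED", "DCT/HT/Q", "IDCT/IHT/IQ", "CAVLC"],  # spatial prediction
}


def kernel_invocations(mb_types):
    """Count how often each kernel runs for a frame given its MB types."""
    counts = Counter()
    for mb_type in mb_types:
        counts.update(KERNELS[mb_type])
    return counts


# Two hypothetical frames with 100 MBs each
mostly_p_frame = ["P"] * 90 + ["I"] * 10
mostly_i_frame = ["P"] * 30 + ["I"] * 70

print(kernel_invocations(mostly_p_frame))
print(kernel_invocations(mostly_i_frame))
```

In the first frame MC runs 90 times and IPRED 10 times; in the second frame the ratio flips to 30 and 70. Which kernel is ‘hot’ therefore depends on the input data, not only on the application.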
SLIDE 21

[Figure labels: I-MB, P-MB]

Note: 16x16 MBs can be partitioned into sub-MBs, e.g. 16x8, 8x8, down to 4x4

SLIDE 22

[Figure: share of intra MBs in a frame [%] (0% to 100%) over the frame number (1 to 301); scenes: Rafting, Rugby, Football; annotations: scene with very high motion, scene with medium-to-slow motion, scene with high-to-medium motion]
SLIDE 23

Even for a well-known application it is not always clear which parts will be ‘hot’ (e.g. in terms of computational complexity) and thus benefit from accelerators

  • This depends on changing input data and control flow

Even more complex: multi-tasking scenarios

  • Not clear which applications will execute at the same time
  • Not clear which applications will execute at all (the user can download new applications)
  • This significantly increases the number of potential hot spots
  • It is hardly possible to address this with an ASIP

Systems that fulfill the demand for adaptivity may lead to

  • Better performance (absolute criterion)
  • Higher efficiency (relative criterion, e.g. performance per area etc.)
  • Lower cost (no redesign if specifications change, no overdesign to cover all scenarios)

SLIDE 24

[Figure: trade-off between flexibility and efficiency, now including Reconfigurable and Adaptive Systems]

  • Axes: flexibility, 1/time-to-market, … versus efficiency (MIPS/$, MHz/mW, MIPS/area, …)
  • ASIC (“hardware solution”): non-programmable, highly specialized
  • GPP (“software solution”): general purpose processor
  • ASIP: application specific instruction set processor
  • Reconfigurable and Adaptive Systems
SLIDE 25

Providing accelerators for hot spots on demand

Efficient dependability/reliability and fault tolerance

  • Rather than providing static redundancy or hardened devices, use online monitoring (BIST: Built-in Self-Test) to detect faults and use reconfiguration and adaptation to react accordingly

Reducing the design/development costs

  • Hardware bug fixes, hardware updates
  • Avoids hardware redesign

Shorter Time-to-market

  • The time between idea and product

Improved efficiency

  • E.g. energy reduction due to better resource utilization

So-called ‘Self-x’ properties (explained in the following)
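The first item on this slide, providing accelerators for hot spots on demand, can be sketched as a small model. This is a minimal sketch under stated assumptions: the class name ReconfigurableFabric, the number of slots, and the request sequence are hypothetical, and least-recently-used (LRU) replacement is chosen only for brevity; the slides do not prescribe a replacement policy.

```python
from collections import OrderedDict


class ReconfigurableFabric:
    """Holds at most `slots` accelerators and loads missing ones on demand."""

    def __init__(self, slots):
        self.slots = slots
        self.loaded = OrderedDict()  # accelerator name -> None, in LRU order
        self.reconfigurations = 0

    def request(self, accelerator):
        """Make sure `accelerator` is loaded, evicting the LRU one if full."""
        if accelerator in self.loaded:
            self.loaded.move_to_end(accelerator)  # mark as most recently used
            return
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)       # evict least recently used
        self.loaded[accelerator] = None
        self.reconfigurations += 1


fabric = ReconfigurableFabric(slots=2)
for name in ["I_ME", "MC_L", "I_ME", "TQ_PL", "MC_L"]:
    fabric.request(name)

print(list(fabric.loaded))      # ['TQ_PL', 'MC_L']
print(fabric.reconfigurations)  # 4
```

Two slots serve three different accelerators here, at the price of four reconfigurations. A real system has to weigh that reconfiguration cost against the area saved.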

SLIDE 26

The ability to determine and establish feasible/good setups

  • Composed out of predetermined elements
  • Or created from scratch (online synthesis)
  • Or implicitly created (emergent behavior)

src: Stargate; yehppael.com

SLIDE 27

The ability to modify/improve the system setup towards maximizing a certain cost function (e.g. performance, energy saving, or efficiency)

The cost function is not necessarily fixed, but it may vary, depending on external requirements, goals etc.

src: M. C. Escher
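Choosing a setup under a varying cost function can be sketched in a few lines. The setups, their performance and power values, and the three cost functions below are hypothetical examples chosen for illustration; a real system would measure or estimate these values at run time.

```python
# Hypothetical system setups: name -> (performance, power), arbitrary units
SETUPS = {
    "software_only": (1.0, 1.0),
    "one_accelerator": (3.0, 2.0),
    "three_accelerators": (5.0, 5.0),
}


def best_setup(setups, cost_function):
    """Return the setup name that maximizes the given cost function."""
    return max(setups, key=lambda name: cost_function(*setups[name]))


def performance(perf, power):
    return perf


def efficiency(perf, power):
    return perf / power


def energy_saving(perf, power):
    return -power  # lower power means higher energy saving


for goal in (performance, efficiency, energy_saving):
    print(goal.__name__, "->", best_setup(SETUPS, goal))
```

With these numbers each goal selects a different setup, so a change of the external goal alone is enough to trigger a change of the system setup.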

SLIDE 28

The ability to resist, tolerate, or correct certain faults

It is not necessarily required to explicitly detect them

It is not necessarily required to operate with the same performance, efficiency etc. as before the fault

  • Graceful degradation

src: T-1000; movie-infos.net
src: T-800; spill.com
src: T-1000; geekologie.com

SLIDE 29

Techniques for (self-)reconfiguration

  • How to use/develop/reconfigure accelerators
  • Optimizations (compile time/run time)

Different flavors of reconfigurable processors

  • Basic systems
  • Highly efficient/adaptive systems
  • Online synthesis

New technologies for reconfigurable devices and innovative products

Improving system reliability by reconfiguration

src: Mars Rover, newscientist.com
src: CERN, nytimes.com