Reconfigurable and Adaptive Systems (RAS), Lars Bauer, Jörg Henkel


Institut für Technische Informatik Chair for Embedded Systems - Prof. Dr. J. Henkel

Lars Bauer, Jörg Henkel

Lecture in the summer semester (SS) 2012

Reconfigurable and Adaptive Systems (RAS)

  • L. Bauer, CES, KIT, 2012

Organisation

Lecture time: Mon., 14.00 - 15.30, Bld. 50.34, HS -101

Homepage: http://ces.itec.kit.edu/ (section “Teaching”)

Slides login: Login: “student”, Passwd: “CES-Student”

Contact: lars.bauer@kit.edu, Haid-und-Neu-Str. 7, Bld. 07.21, Rm. 316.2 (2nd floor!)


CES @ Technologiefabrik (TFI)

[Figure: map with the labels Mensa, Info-Bau, and TFI]


RAS Examination

CS Diploma:

  • Vertiefungsfach 8 (specialization subject): Entwurf eingebetteter Systeme und Rechnerarchitekturen (Design of Embedded Systems and Computer Architectures)

CS Master:

  • Module: Rekonfigurierbare und Adaptive Systeme (Reconfigurable and Adaptive Systems) [IN4INRAS] (3 ECTS)
  • Module: Eingebettete Systeme: Weiterführende Themen (Embedded Systems: Advanced Topics) [IN4INESWT] (8 ECTS)
  • Module: Advanced Computer Architecture [IN4INACA] (10 ECTS)

Other study courses (e.g. EE): ask individually


Teaching @ CES, SS 2012

Lectures

  • RAS
  • Low Power Design

Labs

  • Entwurf eingebetteter Systeme (Design of Embedded Systems)
  • Entwurf von eingebetteten applikationsspezifischen Prozessoren (Design of Embedded Application-Specific Processors)
  • Software-Entwicklung (Software Development)

Seminars

  • Distributed Decision Making
  • Organic Computing
  • Stereo Video Processing
  • Multicore for Multimedia Processors
  • Embedded Multimedia
  • Wireless Sensor Networks
  • Processor Modeling at Transaction-Level
  • Low Power Design for Embedded Systems
  • Rekonfigurierbare Eingebettete Systeme (Reconfigurable Embedded Systems)
  • Design Tools for Embedded Processors
  • Dependability in Embedded Systems
  • Dependable Embedded Software

More info: ces.itec.kit.edu/teaching


Theses @ CES

Note: Info on the homepage is typically not up-to-date

  • If you are interested in a particular topic: better ask individually

There are nearly always SA/DA/BA/MA theses (student research, Diploma, Bachelor, and Master theses) or Hiwi (student assistant) jobs available in the scope of reconfigurable systems

Main projects:

  • i-Core (invasive Core)
  • OTERA (Online Test Strategies for Reliable Reconfigurable Architectures)

Topics:

  • Hardware Prototype
  • Simulation Environment / Algorithms for Runtime System

Examples: fault emulation, adaptive redundancy schemes, online monitoring, bitstream manipulation, multicore integration, compiler tools, …


Beneficial Previous Knowledge

Rechnerstrukturen (Computer Structures)

  • Prerequisites

Eingebettete Systeme (Embedded Systems)

  • ES1: Optimierung und Synthese Eingebetteter Systeme (Optimization and Synthesis of Embedded Systems)
  • ES2: Entwurf und Architekturen für Eingebettete Systeme (Design and Architectures for Embedded Systems)
  • The core topics (e.g. details about FPGA architectures) will be recapitulated in the scope of this lecture
  • Thus, the contents of ES1 and ES2 are beneficial but not required in full detail


General Literature

  • “Fine- and Coarse-Grain Reconfigurable Computing”, S. Vassiliadis and D. Soudris, Springer 2007.
  • “Runtime adaptive extensible embedded processors – a survey”, H. P. Huynh and T. Mitra, SAMOS, pp. 215–225, 2009.
  • “Reconfigurable computing: architectures and design methods”, T.J. Todman et al., IEE Proceedings Computers & Digital Techniques, vol. 152, no. 2, pp. 193-207, 2005.
  • “Reconfigurable Instruction Set Processors from a Hardware/Software Perspective”, F. Barat et al., IEEE Transactions on Software Engineering, vol. 28, no. 9, pp. 847-862, 2002.

1. Introduction and Motivation: The Demand for Adaptivity


Designing Embedded Systems

Typical approach:

  • Static analysis of system requirements (e.g. computational hot spots)
  • Build optimized system

Today’s requirements:

  • Increasing complexity
  • More functionality

Problem:

  • Statically chosen design point has to match all requirements
  • Typically inefficient for individual components (e.g. tasks or hot spots)


Definition ‘Computational Hot Spot’

A rather small part of the application that corresponds to a rather large part of the execution time

  • Also called ‘Computational Kernel’
  • Typically: inner loop
  • 80/20 rule (90/10 rule etc.)

[Figure: code size versus execution time; 20% of the code accounts for 80% of the execution time, the remaining 80% of the code for 20%]
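The definition above can be made concrete with a small sketch that reads hot spots off a profile. It assumes that per-function execution times are already available; the function names and timing numbers are made up for illustration and are not taken from the lecture.

```python
def find_hot_spots(profile, time_share=0.8):
    """Return the fewest functions (largest first) that together cover
    at least `time_share` of the total execution time."""
    total = sum(profile.values())
    covered = 0.0
    hot = []
    # Walk through the functions from most to least time-consuming.
    for name, time in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= time_share * total:
            break
        hot.append(name)
        covered += time
    return hot


# Hypothetical profile: execution time per function in ms (illustrative only).
profile = {
    "motion_estimation": 50.0,
    "transform_quant": 32.0,
    "entropy_coding": 8.0,
    "parse_headers": 5.0,
    "init": 3.0,
    "logging": 2.0,
}

hot = find_hot_spots(profile)
print(hot)  # two of six functions cover more than 80% of the time
```

In this made-up profile, two of six functions already cover 82% of the execution time, which is the kind of skew the 80/20 rule describes.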


Typical Implementation Alternatives

[Figure: design space with the axes “Flexibility, 1/time-to-market, …” and “Efficiency: Mips/$, MHz/mW, Mips/area, …”, spanning from the “hardware solution” to the “software solution”; src: Henkel, ESII]

ASIC:

  • Non-programmable, highly specialized

GPP: General purpose processor

ASIP: Application specific instruction set processor

  • Instruction set extension
  • Parameterization
  • Inclusion/exclusion of functional blocks


Example Application: H.324 Video Conferencing

  • Video En-/Decoding
  • Audio En-/Decoding
  • Data (De-)Multiplexing
  • Control protocol

[Figure: H.324 block diagram; src: cityrockz.com. Blocks: video input, audio input, IR, video encoder H.263 / H.264, audio encoder G.723, multiplexer H.223, de-multiplexer H.223, video decoder H.263 / H.264, audio decoder G.723, H.245 control, modem, PSTN interface, video output, audio output. External connections: remote control, mic, phone, CVBS, CVHS, digital video input, line, phone, speakers, display screen]


Hotspots in H.324 Video Conferencing

[Figure: bar chart of the processing time [%] (axis ticks 2 to 12) per processing function: I_ME, S_ME, PMV, TQ_PL, TQ_IL, TQ_C, LF, MC_L, MC_C, IP_L16, MD_I4, CABAC, CAVLC, Dec_MB, get_pos, IDQ_PL, CABAC_d, CAVLC_d, FM, Q, UP, Enc, Qt, Pred_0, Reconst, ED, BC, TC, BA, CS, NF, LPF, HPF, EE, DRF, Dt, FGA, H245_C, H223_M, H223_DM, V34Mod, USB, MAC]


ASIP Implementation

Design accelerators for the hot spots

Connect them as Execution Units, Register Files, and Interfaces

src: Tensilica, Inc.: “Xtensa LC Product Brief”


ASIP Implementation (cont’d)

Provides noticeably improved performance after targeting the major hot spots

However, performance is still not sufficient to meet the real-time requirements

  • More hot spots need to be accelerated

[Figure: processor with accelerators for I_ME, MC_L, and TQ_PL; src: Tensilica, Inc.: “Xtensa LC Product Brief”]
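Why accelerating only the major hot spots may not be enough can be estimated with Amdahl's law. The slides do not name this law; it is used here as a standard back-of-the-envelope model. The time fractions and the accelerator speedup below are assumed values for illustration, not measurements from the lecture.

```python
def overall_speedup(accelerated_parts):
    """Amdahl's law for several accelerated parts.

    Each entry is (fraction of the original execution time, local speedup).
    """
    accelerated_fraction = sum(fraction for fraction, _ in accelerated_parts)
    if accelerated_fraction > 1.0:
        raise ValueError("fractions must not sum to more than 1")
    remaining_time = 1.0 - accelerated_fraction
    accelerated_time = sum(fraction / speedup for fraction, speedup in accelerated_parts)
    return 1.0 / (remaining_time + accelerated_time)


def speedup_limit(accelerated_parts):
    """Upper bound: the accelerated parts take no time at all."""
    accelerated_fraction = sum(fraction for fraction, _ in accelerated_parts)
    if accelerated_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


# Assumption: three hot spots with 12%, 10%, and 8% of the execution time,
# each running 10x faster on its accelerator.
parts = [(0.12, 10.0), (0.10, 10.0), (0.08, 10.0)]
speedup = overall_speedup(parts)   # about 1.37
limit = speedup_limit(parts)       # about 1.43, even with infinitely fast accelerators
print(round(speedup, 2), round(limit, 2))
```

Under these assumptions the remaining 70% of the execution time dominates, so a noticeably higher speedup requires covering more hot spots.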


ASIP Implementation (cont’d)

Scalability problem when rather many hot spots exist

  • Note: still not all relevant hot spots are covered

[Figure: processor with accelerators for I_ME, S_ME, MC_L, TQ_PL, CABAC, CAVLC, FM, MAC, H245_C, Dec_MB, and V34Mod; src: Tensilica, Inc.: “Xtensa LC Product Brief”]


ASIP Implementation (cont’d)

ASIPs perform well when

  1. rather few hot spots need to be accelerated and
  2. those hot spots are well known in advance

ASIPs are less efficient when targeting rather many hot spots

  • All accelerators are provided statically (i.e. they require area and consume power) even though typically just a few of them are needed at a certain time

ASIPs are less efficient when targeting unknown hot spots

  • Performance degenerates to the performance of a GPP
  • Note that even for a given application it is not necessarily clear which parts of it are ‘hot’ when executing, as this may depend on input data (as demonstrated in the following)


Example Application: H.264 Video Encoder

MB encoding loop

  • Iterates on MacroBlocks (MBs, i.e. 16x16 pixels)

2 different MB-types: different computational paths with different computational requirements

  • I-MB (spatial prediction)
  • P-MB (temporal prediction)

[Figure: flow of the H.264 encoder with three loops over MB. First loop: ME: SA(T)D, then RD with MB-type decision (I or P) and mode decision (for I or P). Second loop (encoding engine): if MB_Type = P_MB then MC, else IPRED; transform blocks DCT / Q, DCT / HT / Q, IDCT / IQ, IDCT / IHT / IQ; CAVLC. Third loop: in-loop de-blocking filter]
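The two computational paths of the MB encoding loop can be sketched as plain control flow. This is a simplified model, not the encoder from the lecture: the kernels are only named, not implemented, and the assignment of the transform variants to the I and P paths is an assumption.

```python
from collections import Counter


def encode_frame(mb_types):
    """Return how often each kernel runs for the given sequence of MB types."""
    kernel_calls = Counter()
    for mb_type in mb_types:  # loop over MBs
        if mb_type == "P":
            # Temporal prediction path (assumed transform variant).
            steps = ["MC", "DCT/Q", "IDCT/IQ"]
        elif mb_type == "I":
            # Spatial prediction path (assumed transform variant).
            steps = ["IPRED", "DCT/HT/Q", "IDCT/IHT/IQ"]
        else:
            raise ValueError(f"unknown MB type: {mb_type!r}")
        steps.append("CAVLC")  # entropy coding runs on both paths
        kernel_calls.update(steps)
    return kernel_calls


# Hypothetical frames: one dominated by P-MBs, one dominated by I-MBs.
slow_motion = encode_frame(["P"] * 90 + ["I"] * 10)
high_motion = encode_frame(["P"] * 20 + ["I"] * 80)
print(slow_motion["MC"], slow_motion["IPRED"])  # 90 10
print(high_motion["MC"], high_motion["IPRED"])  # 20 80
```

The same code executes very different kernel mixes depending on the MB types, so which kernel is ‘hot’ depends on the input video.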


Example: Football Video

[Figure: frame of a football video with its macroblocks marked as I-MB or P-MB]

Note: 16x16 MBs can be partitioned into sub-MBs down to 4x4


Example: Distribution of I-MBs in Medium-to-Very-High Motion Scenes

[Figure: percentage of intra MBs in a frame (0% to 100%) over the frame number (1 to 301). Sequences: Rafting, Rugby, Football. Annotations: scene with very high motion, scene with high-to-medium motion, scene with medium-to-slow motion]


Example: Changing Energy Consumption at Frame- and MB Level

[Figure: per-MB energy consumption for Frame #99 and Frame #100, shown in classes from 0.0-0.1 µWs up to 0.4-0.5 µWs; energy consumption [µWs] over the frame number for the sequences Carphone_QCIF, Clair_QCIF, and SusieTable_QCIF]


Conclusion: Demand for Adaptivity

Even for a well-known application it is not always clear which parts will be ‘hot’ (e.g. according to computational complexity) and thus benefit from accelerators

  • This depends on changing input data and control flow

Even more complex: multi-tasking scenarios

  • Not clear which applications will execute at the same time
  • Not clear which applications will execute at all (user can download new applications)
  • This significantly increases the number of hot spots; it is hardly possible to address this with an ASIP

Systems that fulfill the demand for adaptivity may lead to

  • Better performance (absolute criteria)
  • Lower cost (no redesign if specifications change, no overdesign to cover all scenarios)
  • Higher efficiency (relative criteria, e.g. performance per area etc.)


Potentials of RAS

[Figure: the design space from “Typical Implementation Alternatives” with the axes “Flexibility, 1/time-to-market, …” and “Efficiency: Mips/$, MHz/mW, Mips/area, …”, showing ASIC (non-programmable, highly specialized), ASIP (application specific instruction set processor), and GPP (general purpose processor) between the “hardware solution” and the “software solution”, now with Reconfigurable and Adaptive Systems added]


Potentials of RAS (cont’d)

Providing accelerators for hot spots on demand

Efficient dependability, reliability, and fault tolerance

  • Rather than providing redundancy and hardened devices, providing online monitoring (BIST: Built-in Self-Test) to detect faults and using reconfiguration and adaptation to react accordingly

Reducing the design/development costs

  • Hardware bug fixes, hardware updates
  • Avoids hardware redesign

Shorter time-to-market

  • The time between idea and product

Improved efficiency

  • E.g. energy reduction due to better resource utilization

So-called ‘Self-x’ properties (explained in the following)
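Providing accelerators on demand can be illustrated with a toy runtime system. This is a minimal sketch under strong assumptions: a fixed number of reconfigurable slots, one accelerator per slot, and a profile per execution phase. The names `select_accelerators` and `run_phases`, the kernel names, and the numbers are hypothetical; real systems also have to weigh reconfiguration time and accelerator size.

```python
def select_accelerators(hot_spot_time, num_slots):
    """Pick the kernels with the largest execution time for the available slots."""
    ranked = sorted(hot_spot_time, key=hot_spot_time.get, reverse=True)
    return set(ranked[:num_slots])


def run_phases(phases, num_slots):
    """Reconfigure at the start of each phase; count newly loaded accelerators."""
    loaded = set()
    reconfigurations = 0
    history = []
    for profile in phases:
        wanted = select_accelerators(profile, num_slots)
        reconfigurations += len(wanted - loaded)  # only new accelerators cost a reconfiguration
        loaded = wanted
        history.append(sorted(loaded))
    return history, reconfigurations


# Hypothetical per-phase profiles (execution time share in percent).
phases = [
    {"ME": 50, "MC": 20, "IPRED": 5, "CAVLC": 10},   # phase dominated by P-MBs
    {"ME": 10, "MC": 5, "IPRED": 40, "CAVLC": 15},   # phase dominated by I-MBs
]
history, reconfigurations = run_phases(phases, num_slots=2)
print(history)           # [['MC', 'ME'], ['CAVLC', 'IPRED']]
print(reconfigurations)  # 4
```

With two slots the sketch serves four different hot spots over time, whereas a static design would have to provide all four accelerators at once.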


Self-organisation/Self-configuration

The ability to determine and establish feasible/good setups

  • Composed out of predetermined elements
  • Created from scratch (online-synthesis)
  • Implicitly created (emergent behavior)

src: Stargate; yehppael.com


Self-adaptation/Self-optimization

The ability to modify/improve the system setup towards optimizing a certain cost function (e.g. performance, energy requirements or efficiency)

The cost function is not necessarily fixed, but it may vary, depending on external requirements, goals etc.

src: M. C. Escher
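A varying cost function can be sketched as a goal that is passed to the setup selection at run time. The setups and their performance and power numbers below are invented for illustration; the sketch only shows that the preferred setup changes when the goal changes.

```python
def best_setup(setups, goal):
    """Return the name of the setup that scores highest for the given goal."""
    return max(setups, key=lambda name: goal(setups[name]))


def performance_goal(setup):
    return setup["performance"]


def efficiency_goal(setup):
    return setup["performance"] / setup["power"]


# Hypothetical setups with relative performance and power (illustrative only).
setups = {
    "software_only": {"performance": 1.0, "power": 1.0},
    "one_accelerator": {"performance": 2.5, "power": 1.6},
    "three_accelerators": {"performance": 4.0, "power": 3.2},
}

fastest = best_setup(setups, performance_goal)
most_efficient = best_setup(setups, efficiency_goal)
print(fastest)         # three_accelerators
print(most_efficient)  # one_accelerator
```

A self-optimizing system would repeat this selection whenever the goal or the measured numbers change.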


Self-healing

The ability to resist, tolerate, or correct certain faults

It is not necessarily required to explicitly detect them

It is not necessarily required to operate with the same performance, efficiency etc. as before the fault

  • Graceful degradation

src: T-1000, movie-infos.net; T-800, spill.com; T-1000, geekologie.com


Sneak Preview

Techniques for (Self-)Reconfiguration

  • How to use/develop/reconfigure accelerators
  • Optimizations (compile time/run time)

Different flavors of reconfigurable processors

  • Basic systems
  • Highly efficient/adaptive systems
  • Online synthesis

New technologies for reconfigurable devices

Improving system reliability by reconfiguration

Innovative products / state-of-the-art research

src: Mars Rover, newscientist.com; CERN, nytimes.com