Technische Universität München

Conquering MPSoC Design and Architecture Complexity with Bio-Inspired Self-Organization

Andreas Herkersdorf
Institute for Integrated Systems, TU München
http://www.lis.tum.de
MPSoC Forum 2014, Margeaux, France, July 10, 2014

The System Design Complexity Challenge

Energy, Mobility, Environment, Health, Communication, Security

Which topics matter to society? What will the future look like? Source: H.-J. Bullinger, Fraunhofer Gesellschaft

Key enabling technologies: ICT

  • Moore’s law is the technological enabler for billion-MOSFET designs, but …
  • … how do we deal with the design complexity of billion-MOSFET (MP)SoCs,
      • considering the implications of various forms of variability,
      • and get design and fabrication right the first time?
  • Once the MPSoC design and manufacturing challenge is solved, how do we …
      • program massively parallel processor components,
      • guarantee real-time, security, mixed-criticality and power constraints?
  • Progress in IC design and integration is increasingly constrained by direct or indirect complexity issues
  • What alternatives to established, best-practice engineering approaches do we have to tackle complexity?

MPSoC Resilience

[Zeppenfeld08]

[Figure: CPU1, CPU2, CPU3, MEM, Interrupt Control and Video Interface connected by a bus]

  • Video frames are distributed among 1 to 3 operational RISC cores
  • RISC cores may fail arbitrarily … (mimicked by deliberately switching cores off every few hundred ms)
  • … which is compensated by workload redistribution and f, VDD scaling (DVFS) of the remaining cores
  • The heuristic by which f and VDD are altered will be revealed later
MPSoC Resilience

[Figure: frame drop rate; frequency (GHz) / utilization (×100%)]

The more often cores are switched off and on, the lower the frame drop rate.

MPSoC Task Mapping / Workload Balancing

[Zeppenfeld11]

[Figure: Core1, Core2, Core3, MEM, UART and MAC connected by a bus; tasks T1 to T5]

  • IP packet processing is split into 5 tasks, which are executed sequentially and are initially all mapped to Core 1
  • An IP flow is received at the MAC, sent to T1 and received from T5 for retransmission
  • Cores may issue a task relocation when in a high-load condition, and perform DVFS
  • The same heuristic and method are applied as in the previous example of resilient video applications
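The relocation step can be illustrated with a toy sketch. It deliberately swaps in a fixed utilization threshold for the learned heuristic of the slides, which is not specified at this point; the function names, the per-task load of 0.3 and the threshold of 0.8 are all hypothetical values chosen for illustration.

```python
def core_load(mapping, task_load, cores):
    """Sum the load of all tasks currently mapped to each core."""
    totals = {core: 0.0 for core in cores}
    for task, core in mapping.items():
        totals[core] += task_load[task]
    return totals


def relocate_once(mapping, task_load, cores, threshold=0.8):
    """Move one task from the busiest to the idlest core.

    A relocation only happens if the busiest core exceeds the
    (hypothetical) utilization threshold. Returns True if a task moved.
    """
    totals = core_load(mapping, task_load, cores)
    busiest = max(cores, key=lambda c: totals[c])
    idlest = min(cores, key=lambda c: totals[c])
    if totals[busiest] <= threshold or busiest == idlest:
        return False
    tasks = [t for t in sorted(mapping) if mapping[t] == busiest]
    if len(tasks) < 2:
        return False  # never strip a core of its only task
    mapping[tasks[-1]] = idlest
    return True


# Five tasks, all initially mapped to Core1 (as on the slide).
cores = ["Core1", "Core2", "Core3"]
task_load = {t: 0.3 for t in ("T1", "T2", "T3", "T4", "T5")}
mapping = {t: "Core1" for t in task_load}

while relocate_once(mapping, task_load, cores):
    pass
print(mapping)
```

In this toy setting the loop stops once no core exceeds the threshold. The real system additionally adjusts frequency and voltage, which the sketch leaves out.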

MPSoC Task Mapping / Workload Balancing

[Figure: utilization and frequency (GHz) of CPU1, CPU2 and CPU3, plotted from 0 s to 2 ms]

With an increasing number of packets received and processed, core utilization saturates at a high level and frequency at a low level …

[Figure: average latency (μs), plotted from 0 s to 2 ms]

… and the variation of IP packet forwarding latency settles at a low level.


“Manycore” System in Nature …

  • A fish school does not seem to have a complexity or reliability problem
  • The entire system behaves orderly and defends itself reliably against predators
  • How is the orchestration of a “fish school” facilitated among its components?

Nature designs and optimizes differently …

    If (d < Rr) then
        Repulsion: avoid clashes
    Else if (d < Rp) then
        Orientation: parallel to neighbor
    Else if (d < Ra) then
        Attraction: approach companion
    Endif

Every system constituent (fish) follows a local rule set on how to behave under stress

  • Rule set is simple, easily manageable for each constituent
  • Every constituent follows same rule set
  • Global system behavior not necessarily reflected in local rule set

[Inada02, Patel04]
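The local rule set can be written as a small function. This is a minimal sketch: `d` is the distance to a neighbor and `r_r < r_p < r_a` are the repulsion, orientation and attraction radii (Rr, Rp, Ra above). The function name, the returned labels and the numeric radii in the example are assumptions for illustration.

```python
def local_rule(d, r_r, r_p, r_a):
    """Return the behavior a fish applies toward a neighbor at distance d."""
    if d < r_r:
        return "repulsion"    # avoid clashes
    elif d < r_p:
        return "orientation"  # swim parallel to the neighbor
    elif d < r_a:
        return "attraction"   # approach the companion
    return "none"             # neighbor is out of range


# Hypothetical radii, in arbitrary length units.
for distance in (0.5, 3.0, 8.0, 20.0):
    print(distance, local_rule(distance, r_r=1.0, r_p=5.0, r_a=10.0))
```

Every fish evaluates the same function on local information only; nothing in it describes the shape or motion of the school as a whole.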


Self-Organization / Emergence

Local behavior of the constituents of a self-organizing system may lead to observable, emergent global behavior which is not reflected in local behavior / rules

  • Population of interacting system constituents
  • System is hierarchically structured (multi-layer organization)
  • Emergent behavior is observable at levels above the constituent level (system level or system environment) as a result of hidden causal relationships across levels

[Fromm04]


Conway’s Game of Life

[Gardner70]

With N the number of live neighboring cells:

    If (cell alive AND (N = 2 OR N = 3)) then
        live unchanged to next generation
    Else if (cell alive AND (N < 2 OR N > 3)) then
        death by loneliness or overcrowding
    Else if (cell dead AND N = 3) then
        birth of new cell in next generation
    End

Rule variants: Conway (rule 23/3), hexagonal (rule 34/2)

Constituent pattern determines system-level behavior:

  • Rotary and translatory movements,
  • oscillation, persistence, …
  • … in combination and with varying parameters
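A compact sketch of one generation of the standard Conway rule on a square grid, in which a live cell survives with two or three live neighbors and a dead cell is born with exactly three. Representing the board as a set of live `(x, y)` coordinates is an implementation choice made here, not something taken from the slides.

```python
from collections import Counter


def step(alive):
    """Advance one generation; `alive` is a set of (x, y) live cells."""
    # Count, for every cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in alive
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth with exactly 3 neighbors, survival with 2 or 3.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in alive)
    }


# A "blinker": three cells in a row, which oscillates with period 2.
blinker = {(0, 1), (1, 1), (2, 1)}
print(sorted(step(blinker)))
print(step(step(blinker)) == blinker)
```

The oscillation of the blinker is a system-level behavior that appears nowhere in the per-cell rule itself.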


ASoC: Autonomic System on Chip

[Figure: FUNCTIONAL layer with functional elements (PU, CPU, Network I/F, CPU, RAM); AUTONOMIC layer with autonomic elements, each consisting of Monitor, Evaluator, Actuator and Communication I/F]

[Herkersdorf/Rosenstiel04]

Evolutionary, platform-centric approach with compatibility to SoC design method:

  • Functional layer containing conventional IPs
  • Autonomic layer extends IPs with self-x properties
  • Improved reliability
  • Performance / power optimization at runtime

Dedicate part of chip capacity to self-x abilities at the autonomic layer.

Future SoCs shall have the ability to learn to live with environmentally imposed variations or to work around defects autonomously.

Joint project with University of Tübingen and FZI Karlsruhe

ASoC: Self-Optimization through Runtime Learning

[Figure: CPU1, CPU2, CPU3, MEM, Interrupt Control and Video Interface connected by a bus]

  • Monitors
      • Error rate, (temperature)
      • Frequency
      • Utilization
      • Workload
  • Actuators
      • Frequency (and voltage)
      • Task migration
  • Evaluator
      • Learning classifier system adapted for efficient HW implementation
      • Reinforcement learning through fitness update of individual rules
  • Communicator
      • Sharing of global information

[Figure: Learning Classifier Table with columns Condition : Action and Fitness (example conditions 1XX, X1X, 1XX; actions 1001, 1010, 1100; fitness values 52, 3, 15), shown before and after a fitness update]

[Zeppenfeld11, Holland78, Wilson95, Butz06]
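The evaluator can be pictured as a small table of condition : action rules with a fitness value each. The sketch below is only an illustration under several assumptions: the pairing of conditions, actions and fitness values into rows is a guess, selection is fitness-proportional (roulette wheel), and the update is a simple moving average toward the reward rather than the rule used in [Zeppenfeld11]. All function and field names are hypothetical.

```python
import random


def matches(condition, state):
    """Ternary match: an 'X' in the condition is a don't-care bit."""
    return all(c in ("X", s) for c, s in zip(condition, state))


def select(table, state, rng):
    """Pick a matching rule with probability proportional to its fitness."""
    matching = [i for i, rule in enumerate(table) if matches(rule["cond"], state)]
    if not matching:
        return None
    weights = [table[i]["fitness"] for i in matching]
    return rng.choices(matching, weights=weights, k=1)[0]


def update(table, index, reward, rate=0.2):
    """Move the fitness of the applied rule toward the received reward."""
    rule = table[index]
    rule["fitness"] += rate * (reward - rule["fitness"])


# Row pairing below is an assumption; the values come from the figure.
table = [
    {"cond": "1XX", "action": "1001", "fitness": 52.0},
    {"cond": "X1X", "action": "1010", "fitness": 3.0},
    {"cond": "1XX", "action": "1100", "fitness": 15.0},
]

rng = random.Random(0)
chosen = select(table, "101", rng)   # monitor state "101" matches rules 0 and 2
print(table[chosen]["action"])
update(table, chosen, reward=100.0)  # reinforce the rule that was applied
```

Rules that keep earning high rewards accumulate fitness and are selected more often, which is how effective and ineffective rules separate over time.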

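  • Fitness-driven rule and operation parameter selection per core
  • Objective function to assess the system-wide impact of rule / action selection:

$$\delta_{Load} = \left| f_{cpu_n} \cdot util_{cpu_n} - f_{cpu_{avg}} \cdot util_{cpu_{avg}} \right|$$
$$\delta_{Util} = util_{target} - util_{cpu_n}$$
$$\delta_{Freq} = f_{cpu_n}$$
$$O_{CPU} = w_1 \cdot \delta_{Load} + w_2 \cdot \delta_{Util} + w_3 \cdot \delta_{Freq}$$
$$O_{Sys} = w_a \cdot O_{CPU1} + w_b \cdot O_{CPU2} + \dots$$

  • Reward function:

[Figure: reward R(O_T), bounded between +N and −N, plotted over the current objective value O_T relative to the previous value O_{T−1}]

[Zeppenfeld11]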
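As a worked illustration, the objective terms can be evaluated directly. The sketch assumes that the average term is the product of the average frequency and the average utilization, and it uses unit weights; neither the weights nor the example numbers come from the slides. The reward function is not reproduced, because its exact shape is only given graphically.

```python
def cpu_objective(f_n, util_n, f_avg, util_avg, util_target, w=(1.0, 1.0, 1.0)):
    """O_CPU = w1*dLoad + w2*dUtil + w3*dFreq for a single core n."""
    d_load = abs(f_n * util_n - f_avg * util_avg)  # deviation from average load
    d_util = util_target - util_n                  # distance to target utilization
    d_freq = f_n                                   # penalize high frequency
    return w[0] * d_load + w[1] * d_util + w[2] * d_freq


def system_objective(cpu_objectives, weights):
    """O_Sys as a weighted sum of the per-core objectives."""
    return sum(w * o for w, o in zip(weights, cpu_objectives))


# Hypothetical operating points: (frequency in GHz, utilization in 0..1).
cores = [(0.2, 0.9), (0.1, 0.5), (0.3, 0.7)]
f_avg = sum(f for f, _ in cores) / len(cores)
util_avg = sum(u for _, u in cores) / len(cores)

per_core = [cpu_objective(f, u, f_avg, util_avg, util_target=0.8) for f, u in cores]
print(per_core, system_objective(per_core, weights=[1.0, 1.0, 1.0]))
```

With this formulation a lower value is better: balanced load, utilization close to the target and low frequency all reduce the objective.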

[Figure: tasks T1 to T5 mapped onto CPU1, CPU2 and CPU3 behind the MAC; plots of average packet latency and of the frequency of CPUs 1 to 3]

Component behavior and system behavior:

  • Fitness separation between more and less effective rules / actions (replacing evolution in nature)
  • Exploration / adaptation within the identified rule set to possibly reach a global optimum

LCT / AE Implementation

[Figure: LCT datapath with signals fit, match, fit_sum, start, lfsr_i, act_sel and fit_weight]

Synthesis results for Xilinx Virtex 4 VLX100:

             Flip-Flops    LUTs   BRAMs   Mult.   Overhead
Leon3              1749    8936      28       1          –
Leon3 AE           2122   10213      29       2      14.3%
LCT                  66     116       1       1       1.4%
Act Task.            57     299                       3.5%
Act Freq.             7      19                       0.2%
Mon Util.            35      74                       0.8%
Mon Load             20      40                       0.5%
AE IF               173     399                       4.5%


Conclusions

  • The complexity of multi-objective hardware/software systems demands pushing classical design-time tasks into field operation
  • Complexity increases either through technology-enabled increases in function integration or through consideration of pressing non-functional aspects
      • Security, resilience, power dissipation, test & debug
  • Nature offers scalable, surprisingly simple self-organization paradigms to control complex systems
  • Suggestion to study and apply emergent behaviors on a broader scope for the optimization of non-functional aspects of MPSoC and distributed Embedded Systems / CPS

[Figure labels: Function, Reliability, Power, Real-Time, Security, Debugging]

  • Be inspired by nature, but also be aware that …

… copying nature one-to-one does not necessarily yield success!

[Image: „Kleiner Schlagflügelapparat“ (small flapping-wing apparatus), 1893. Source: Fotoarchiv Otto-Lilienthal-Museum, 1997]

„Der Vogelflug als Grundlage der Fliegekunst“ (Birdflight as the Basis of Aviation), O. Lilienthal, 1889

Thanks!

«All arts rely on mimicking nature.»

Seneca (1 – 65 AD), Roman philosopher, scientist, politician

I would particularly like to thank our ASoC partners from the University of Tübingen (W. Rosenstiel, A. Bernauer), the VirTherm-3D team from KIT Karlsruhe (J. Henkel, T. Ebi) and the team at TU Munich (A. Bouajila, H. Rauchfuss, W. Stechele, T. Wild, J. Zeppenfeld). Furthermore, sincere thanks go to the DFG and the BMBF for supporting our work as part of the SPP1183 (Organic Computing), SPP1500 (Dependable Embedded Systems) and the Clusterforschungsprogramm AIS “Autonome Integrierte Systeme” (Autonomous Integrated Systems).

References

[Butz06] Butz, M. V.: “Rule-based Evolutionary Online Learning Systems – A Principled Approach to LCS Analysis and Design”, Springer, ISBN 978-3-540-25379-2, 2006.

[Fromm04] Fromm, J.: “The Emergence of Complexity”, Kassel university press, ISBN 3-89958-069-9, 2004, URN: urn:nbn:de:0002-069.

[Herkersdorf04] Herkersdorf, A.; Rosenstiel, W.: “Towards a Framework and Design Methodology for Autonomous SoC”, GI Workshop on Organic Computing, Ulm, Germany, September 2004.

[Holland78] Holland, J.; Reitman, J.: “Cognitive systems based on adaptive algorithms”. In: Pattern-directed inference systems, Academic Press, New York, 1978.

[Inada02] Patel, J.: “Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes”, IEEE Trans. Dependable Secur. Comput. 1, 2 (Apr. 2004), 128-143.

[Patel04] Patel, J.: “Characterization of Soft Errors Caused by Single Event Upsets in CMOS Processes”, IEEE Trans. Dependable Secur. Comput. 1, 2 (Apr. 2004), 128-143.

[Wilson95] Wilson, S.: “Classifier fitness based on accuracy”, Evolutionary Computation, 3, 1995, pp. 149-175.

[Zeppenfeld08] Zeppenfeld, J.; et al.: “Learning Classifier Tables for Autonomic Systems on Chip”, Lecture Notes in Informatics vol. 134, pp. 771–778, Springer; GI Annual Meeting, Munich, September 12, 2008.

[Zeppenfeld11] Zeppenfeld, J.; Herkersdorf, A.: “Applying Autonomic Principles for Workload Management in Multi-Core Systems on Chip”, International Conference on Autonomic Computing, ICAC, Karlsruhe, Germany, June 14-18, 2011.