SLIDE 1

Architecture and Design Methodology for Autonomic Systems-on-Chip (ASoC)

A. Bernauer, A. Bouajila, J. Zeppenfeld, W. Stechele, O. Bringmann, A. Herkersdorf, W. Rosenstiel

Universität Tübingen, Technische Universität München, FZI - Forschungszentrum Informatik

SLIDE 2

Project Reminder

[Figure: design-flow diagram. Labels: Application Requirements & Characteristics; Architecture Characteristics; FE/AE Parameter Selection; FE/AE Model; Evaluation; Architecture Optimization; Performance, Reliability, Power. The SoC is drawn as a FUNCTIONAL layer of functional SoC elements (FE) and an AUTONOMIC layer of autonomic SoC elements (AE).]

October 7, 2010 ASoC - Architecture and Design Methodology for SoC

SLIDE 3

Phase 3 Work Packages

  • Demonstrator
  • Combining XCS and LCT
  • Estimating minimum population size
  • Distributed XCS
  • Run-time reliability optimization

SLIDE 4

Distributed XCS

  • Cooperating XCS instances: benefits for self-adaptation?
  • Simplified Cell processor, different cooling properties for each core
  • Varying ambient temperature and activity
  • Goal: maximum performance, but no timing errors
  • Configurations
      – Topology: unidirectional, bidirectional, complete graph
      – Emigration strategy: random, numerosity, prediction error
      – Deletion strategy: by fitness, prediction error
      – Don’t care probability: 0.1, 0.3, 0.6, 0.9
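
To get a feel for the size of this configuration space, the option lists above can be enumerated. The sketch below assumes that every combination is a candidate; the slides do not say which subset was actually simulated, and the dictionary keys are illustrative names rather than identifiers from the project.

```python
from itertools import product

# Option values copied from the slide; the keys are illustrative names.
OPTIONS = {
    "topology": ["unidirectional", "bidirectional", "complete graph"],
    "emigration": ["random", "numerosity", "prediction error"],
    "deletion": ["fitness", "prediction error"],
    "dont_care_probability": [0.1, 0.3, 0.6, 0.9],
}


def all_configurations(options=OPTIONS):
    """Return every combination of option values as a list of dicts."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in product(*options.values())]


configs = all_configurations()
print(len(configs))  # 3 * 3 * 2 * 4 = 72 combinations
```

Even this small grid yields 72 combinations per experiment.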

SLIDE 5

Isolated XCS

[Figure: isolated XCS results for P#=0.3 and P#=0.9; label SXU2]

SLIDE 6

Distributed XCS

[Figure: distributed XCS results for P#=0.3 and P#=0.9; label SXU2]

  • Topology: complete
  • Emigration: selecting by fitness
  • Deletion: selecting by prediction error

SLIDE 7

Phase 3 Work Packages

Combining XCS and LCT

SLIDE 8

Combining XCS and LCT

[Figure: two learning loops joined by rule-set translation. Design-time learning (XCS), software: design-time rule set, action, fitness update, rule set update. Run-time learning (LCT), hardware: initial rule set, run-time rule set, action, fitness update, rule set update. Rule-set translation crosses the SW/HW boundary.]

Goals:

  • retain capability to self-adapt to unforeseen events
  • low hardware overhead

[BICC10]

SLIDE 9

Configurations and benchmarks

  • Rule translation
      – all-XCS: translate all XCS rules to LCT rules
      – top-XCS: translate only top XCS rules
  • Action selection
      – roulette-wheel: randomly, reward-weighted
      – winner-takes-all: highest reward
  • Benchmarks
      – Multiplexer
      – Task allocation
      – SoC component parameterization
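
The two action selection strategies can be illustrated as follows. This is a minimal sketch and not the LCT hardware logic: it assumes each candidate action carries a single non-negative reward value, and the function names and sample rewards are hypothetical.

```python
import random


def winner_takes_all(rewards):
    """Return the action with the highest reward."""
    return max(rewards, key=rewards.get)


def roulette_wheel(rewards, rng=random):
    """Pick an action at random with probability proportional to its reward.

    Assumes all rewards are non-negative and at least one is positive.
    """
    actions = list(rewards)
    weights = [rewards[action] for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical rewards for three candidate actions.
rewards = {"1001": 11, "1010": 3, "1100": 15}
print(winner_takes_all(rewards))                  # always "1100"
print(roulette_wheel(rewards, random.Random(0)))  # any action, weighted by reward
```

Winner-takes-all is deterministic and exploits the best known rule, whereas roulette-wheel still tries lower-reward actions occasionally.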

[Figure: task allocation on 9 cores, numbered 0 to 8]

[BICC10]

SLIDE 10

Task allocation benchmark

[Figure: task allocation benchmark results. L: number of cores; i: number of tasks]

[BICC10]

SLIDE 11

Task allocation benchmark

[Figure: task allocation benchmark results for single core failure in LCT and double core failure in LCT. L: number of cores; i: number of tasks]

→ Self-adaptation at chip level with all-XCS rule translation and winner-takes-all action selection strategy [BICC10]

SLIDE 12

Phase 3 Work Packages

Estimating minimum population size

SLIDE 13

Estimating minimum population size N

  • State of the art: Calculate N for “regular” problems
      – “Regular” problem: constant number of relevant bits per problem instance
      – Example: Multiplexer problem 01 1101 → 0 (3 relevant bits)
  • Our extension: Calculate N for “complex” problems
      – “Complex” problem: variable number of relevant bits per problem instance
      – Example: 2-out-of-4 task allocation problem
          allocation possible: 1100 → 2 (2 relevant bits)
          allocation impossible: 1110 → 0 (3 relevant bits)
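
The notion of relevant bits in the multiplexer example can be made concrete with a short sketch. It assumes the 6-bit multiplexer with two address bits followed by four data bits, and it counts data bits from the right, because that ordering reproduces the slide's example (01 1101 → 0). Indexing from the left, which is also common, would give 1 for the same input. The function names are illustrative.

```python
def multiplexer(bits, address_width=2):
    """Return the data bit selected by the leading address bits.

    Assumption: data bits are indexed from the right (least significant first).
    """
    address = int(bits[:address_width], 2)
    data = bits[address_width:]
    return int(data[len(data) - 1 - address])


def relevant_positions(bits, address_width=2):
    """Positions that determine the output: the address bits plus the selected data bit."""
    address = int(bits[:address_width], 2)
    selected = len(bits) - 1 - address
    return list(range(address_width)) + [selected]


print(multiplexer("011101"))         # 0
print(relevant_positions("011101"))  # [0, 1, 4]
```

Every instance of this problem has exactly three relevant bits, which is what makes it "regular"; changing any other bit leaves the output unchanged.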

[IJCNN10]

SLIDE 14

Estimating minimum population size N

[IJCNN10]

Experimental results:

→ The estimated N is an upper bound; using a smaller N incurs a performance penalty (e.g., with only 0.75N, only 70% of the problem instances reach a correctness rate above 90%).

SLIDE 15

Trading classifiers for accuracy

  • Idea: subsume classifiers during rule translation, after learning
  • Higher correctness rate than when subsuming during learning (SASO [9])
  • Comparable population size to SASO [9]

[IJCNN10]
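
Subsumption during translation can be sketched as a single pass over the learned rule set. This is only an illustration of the general idea, not the procedure from [IJCNN10]: conditions are assumed to be ternary strings over 0, 1 and # (don't care), and the accuracy and experience checks that XCS normally applies to a subsuming classifier are left out.

```python
def is_more_general(general, specific):
    """True if `general` matches every input `specific` matches and has more '#' symbols."""
    if len(general) != len(specific):
        return False
    covers = all(g == "#" or g == s for g, s in zip(general, specific))
    return covers and general.count("#") > specific.count("#")


def subsume(rules):
    """Drop each rule that a more general rule with the same action already covers.

    `rules` is a list of (condition, action) tuples.
    """
    kept = []
    for condition, action in rules:
        subsumed = any(
            other_action == action and is_more_general(other_condition, condition)
            for other_condition, other_action in rules
        )
        if not subsumed:
            kept.append((condition, action))
    return kept


# Hypothetical rule set: ("10#", "a") is covered by ("1##", "a") and is dropped.
rules = [("1##", "a"), ("10#", "a"), ("10#", "b"), ("0#1", "a")]
print(subsume(rules))
```

Only rules with the same action are merged, so ("10#", "b") survives even though its condition is covered.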

SLIDE 16

Phase 3 Work Packages

Run-time reliability optimization

SLIDE 17

Run-time reliability optimization

[Figure: modular redundancy and series configurations]

SLIDE 18

Phase 3 Work Packages

Demonstrator

SLIDE 19

Demonstrator – Hardware

  • Monitors
      – Frequency (3 bit)
      – Utilization (3 bit)
      – Workload difference (2 bit)
  • Actuators
      – Frequency (4 bit)
      – Task migration (1 bit)
  • Evaluator
      – Learning classifier system adapted for efficient HW implementation
  • Communicator
      – Sharing of global information
      – Migration of tasks

[Figure: block diagram with MEM, UART, MAC, Core1, Core2, Core3 and Bus, plus the learning classifier table with fitness update]

Learning Classifier Table

Condition   Action   Fitness
1 X X       1001     11
X 1 X       1010     3
1 X X       1100     15
. . .       . . .    . . .
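
How a table like this is consulted can be sketched in software. The code below is an assumption-laden illustration, not the hardware design: it treats X as a don't-care symbol, uses the three abbreviated rows shown on the slide, and packs the monitor values into an 8-bit state in the order frequency, utilization, workload difference, which the slide does not specify. All function names are hypothetical.

```python
def encode_monitors(frequency, utilization, workload_difference):
    """Pack monitor readings into an 8-bit string (3 + 3 + 2 bits).

    The field order is an assumption; the slide only gives the bit widths.
    """
    if not (0 <= frequency < 8 and 0 <= utilization < 8 and 0 <= workload_difference < 4):
        raise ValueError("monitor value out of range")
    return f"{frequency:03b}{utilization:03b}{workload_difference:02b}"


def matches(condition, state):
    """True if every non-'X' symbol of the condition equals the corresponding state bit."""
    return len(condition) == len(state) and all(
        c in ("X", s) for c, s in zip(condition, state)
    )


def match_set(table, state):
    """Return all (condition, action, fitness) rows whose condition matches the state."""
    return [row for row in table if matches(row[0], state)]


# Rows as shown on the slide, with conditions abbreviated to three symbols.
TABLE = [("1XX", "1001", 11), ("X1X", "1010", 3), ("1XX", "1100", 15)]

print(match_set(TABLE, "101"))     # first and third row match
print(encode_monitors(5, 2, 3))    # "10101011"
```

From the match set, the evaluator then picks one action, for example the row with the highest fitness.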

SLIDE 20

Demonstrator – Tasks

  • Test tasks 1-N
      – Adjustable, synthetic workloads
      – After completion, pass data to next task
  • User interface
      – Allow for user interaction
  • Data generation
      – Replaces packet reception for standalone demonstration
      – Generates a new data packet every 100 µs
  • UART
      – Pass buffered output data to the UART

[Figure: tasks T1 to T5 on Core1, Core2 and Core3; block diagram with MEM, UART, MAC and Bus, plus the learning classifier table with fitness update]

SLIDE 21

Demonstration

SLIDE 22

Hardware Overheads

Component    Flip-Flops   LUTs    BRAMs   Mult.   Overhead
Leon3        1749         8936    28      1       –
Leon3 AE     2122         10213   29      2       14.3%
LCT          66           116     1       1       1.4%
Act Task.    57           299                     3.5%
Act Freq.    7            19                      0.2%
Mon Util.    35           74                      0.8%
Mon Load     20           40                      0.5%
AE IF        173          399                     4.5%

Synthesis results for Xilinx Virtex 4 VLX100

SLIDE 23

Phase 3 Progress

  • 3rd LCS-Workshop, Tübingen, July 1-2, 2010

SLIDE 24

Future work

  • Continue work on Autonomic Layer reliability
  • Complete dependability assessment
  • Reduce the simulation time caused by temperature estimation
  • Extend theoretical analysis to run-time system
  • Complete work on demonstrator

SLIDE 25

Summary

  • Distributed XCS solves problems that were previously difficult to solve
  • Combining Software XCS with Hardware LCT for Lightweight On-Chip Learning
  • Estimating minimum population size N
  • Trading classifiers for accuracy
  • Demonstrator runs 3 Leon3 cores, distributes tasks and adjusts frequency autonomously

SLIDE 26

Recent Publications

  • [ARCS10] J. Zeppenfeld, A. Herkersdorf. Autonomic Workload Management for Multi-Core Processor Systems, ARCS, Hannover, Germany, February 22-25, 2010.
  • [SORT10] J. Zeppenfeld, A. Bouajila, A. Herkersdorf, W. Stechele. Towards Scalability and Reliability of Autonomic Systems on Chip, 1st IEEE Workshop on Self-Organizing Real-Time Systems, Carmona, Spain, May 4, 2010.
  • [IJCNN10] B. Rakitsch, A. Bernauer, O. Bringmann, W. Rosenstiel. Pruning population size in XCS for complex problems, International Joint Conference on Neural Networks at the World Congress on Computational Intelligence (WCCI), Barcelona, Spain, July 18-23, 2010.
  • [BICC10] A. Bernauer, J. Zeppenfeld, O. Bringmann, A. Herkersdorf, W. Rosenstiel. Combining software and hardware LCS for lightweight on-chip learning, DIPES/BICC 2010, IFIP AICT 329, p. 279-290, Brisbane, Australia, September 20-23, 2010.