Towards a Reliability-aware Design Flow for Kahn Process Networks on NoC-based Multiprocessors




SLIDE 1

Towards a Reliability-aware Design Flow for Kahn Process Networks on NoC-based Multiprocessors

Onur Derin, Leandro Fiorin

ALaRI, Faculty of Informatics, University of Lugano, Switzerland {derino,fiorin}@alari.ch

Lübeck – Feb 25, 2014

SLIDE 2

Outline

  • Introduction
  • Fault tolerance techniques
      • Online task remapping
      • N-modular redundancy
  • Case study
  • Related Work
  • Conclusion

  • O. Derin, L. Fiorin, ALaRI

Feb 25, 2014— ARCS/VERFE 2/24

SLIDE 3

Introduction

As CMOS technology scales down, fault tolerance becomes more relevant. The probability of permanent faults increases with technology scaling due to:

  • process variability
  • age-related degradation
  • single-event effects (e.g., single-event latchup/burnout/gate rupture)
  • rupture of wires
  • dielectric breakdowns
  • corrosion

Failures are often hard to predict and to avoid with current design methodologies. Introducing fault tolerance capabilities increases the lifetime of the system.

SLIDE 4

Introduction

  • Power wall problem: higher power consumption prohibits running at higher frequencies to increase performance.
  • Task-level parallelism is a promising solution to increase performance, but it requires suitable programming models.
  • Advances in microelectronics enable the integration of billions of transistors on the same die.
      • higher number of heterogeneous processing and storage elements in next-generation embedded platforms
  • Bus-based or point-to-point communication does not scale, is power hungry and is not predictable.
  • Networks-on-Chip (NoCs) improve scalability, bandwidth and power efficiency.
  • Distributed memory solutions are more scalable than shared memory architectures:
      • Non-uniform Memory Access (NUMA)
      • No-remote Memory Access (NORMA)

SLIDE 5

Problem: Reliability-aware design flow

Figure: design flow with the blocks Application, Architecture, Mapping, Design Space Explorer, Estimators (Performance, Power) and Pareto-optimal solutions.

SLIDE 6

Problem: Reliability-aware design flow

Figure: design flow with the blocks Application, Architecture, Mapping, Design Space Explorer, Estimators (Performance, Power, Reliability) and Pareto-optimal solutions.

Reliability is introduced as a new objective into the design flow
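To illustrate what adding reliability as an objective means for the set of Pareto-optimal solutions, here is a minimal sketch of a Pareto filter over three objectives. The objective tuple (execution time, power, MTTF), the function names and the candidate values are all assumptions made for illustration; they are not taken from the slides or from the authors' tool.

```python
from typing import List, Tuple

# Hypothetical objective vector: (execution_time, power, mttf).
# Lower execution time and power are better; higher MTTF is better.
Point = Tuple[float, float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidate mappings, for illustration only.
candidates = [
    (10.0, 2.0, 1.0e6),
    (12.0, 2.0, 0.8e6),
    (11.0, 1.5, 0.5e6),
    (10.0, 2.0, 2.0e6),
]
front = pareto_front(candidates)
```

With only time and power as objectives, the first and last candidates would be indistinguishable; the MTTF objective is what separates them.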

SLIDE 7

Context of our work

  • NoC-based multi-processor platforms as the underlying hardware platform
  • NORMA as the memory model
  • Kahn Process Networks (KPN) as the model of computation
  • throughput-constrained systems
  • fault tolerance addressed at the software level
      • physical or micro-architectural solutions may be too costly for resource-constrained platforms
  • fault model restricted to permanent faults in the processing elements (interconnect and memory are assumed fault tolerant)
  • single fault assumption
  • two fault-tolerance schemes:
      • Fault-aware online task remapping (OTR)
      • N-modular redundancy at KPN level (NMR)

SLIDE 8

Kahn Process Networks

  • a set of concurrent processes (tasks) connected via FIFO channels with non-blocking write and blocking read
      • when running on actual platforms, channels are bounded and have blocking write semantics
  • well suited for streaming applications (e.g., image/video/audio processing)
  • several advantages:
      • suitable for message passing platforms as the communication is explicitly exposed
      • no need for a global scheduler to execute in a distributed fashion

Figure: A KPN example
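The channel semantics described on this slide can be sketched with threads and bounded queues. This is only an illustration under stated assumptions: the three processes (`source`, `scale`, `sink`), the end-of-stream `SENTINEL` and the channel capacity are hypothetical and do not come from the slides. Python's `queue.Queue` with a `maxsize` gives the bounded, blocking-write and blocking-read behaviour that the slide attributes to channels on actual platforms.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the KPN model


def source(out_ch: queue.Queue, n: int) -> None:
    for i in range(n):
        out_ch.put(i)  # blocks while the bounded channel is full
    out_ch.put(SENTINEL)


def scale(in_ch: queue.Queue, out_ch: queue.Queue, factor: int) -> None:
    while True:
        token = in_ch.get()  # blocking read
        if token is SENTINEL:
            out_ch.put(SENTINEL)
            return
        out_ch.put(token * factor)


def sink(in_ch: queue.Queue, results: list) -> None:
    while True:
        token = in_ch.get()
        if token is SENTINEL:
            return
        results.append(token)


def run_network(n: int = 5, capacity: int = 2) -> list:
    """Run source -> scale -> sink, one thread per process, no global scheduler."""
    c1: queue.Queue = queue.Queue(maxsize=capacity)
    c2: queue.Queue = queue.Queue(maxsize=capacity)
    results: list = []
    threads = [
        threading.Thread(target=source, args=(c1, n)),
        threading.Thread(target=scale, args=(c1, c2, 2)),
        threading.Thread(target=sink, args=(c2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_network())  # [0, 2, 4, 6, 8]
```

The result does not depend on thread scheduling or on the channel capacity, which is the determinism property that makes KPNs attractive for distributed execution.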

SLIDE 9

Fault-aware online task remapping (OTR)

Fault-aware online task remapping allows the system to survive in the presence of faulty processors. When a processor becomes faulty, the tasks are executed on a reduced number of fault-free processors with degraded performance.

Figure: initial mapping. Tile1: DCT; Tile2: Q; Tile3: SRC; Tile4: VLE

SLIDE 10

Fault-aware online task remapping (OTR)


Figure: mapping after the first fault. Tile1: faulty PE; Tile2: DCT, Q; Tile3: SRC; Tile4: VLE

SLIDE 11

Fault-aware online task remapping (OTR)


Figure: mapping after the second fault. Tile1: faulty PE; Tile2: faulty PE; Tile3: SRC; Tile4: DCT, Q, VLE

SLIDE 12

Fault tolerant tile for OTR

Figure: Fault tolerant tile for OTR support (Derin, 2013). Labels in the figure: Processing Element, Instruction Memory, Data Memory (Port A, Port B), Local Bus, Tag Decoder, Network Adapter (NA) with message-passing handler, DMA and Initiator/Target NI, NoC router, and the fault tolerance support (Self Testing Module, Task Migration Hardware). Tiles 1 to 5 are each attached to the NoC through an NA.

  • Self-testing module (STM) detects the fault with a self-test routine
  • Task migration hardware module (TMH) notifies the remapping manager (RM)
  • RM calculates the new mapping by a remapping heuristic
  • RM notifies the predecessor, successor and other tiles
  • RM gets the tasks' state (iterators and channel tokens) from the TMH
  • RM transfers the tasks' state to new tiles
  • Migrated tasks are resumed on the new tiles
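The remapping step performed by the RM can be sketched as follows. The slides do not say which remapping heuristic is used, so the greedy rule below (move each orphaned task to the healthy tile that currently hosts the fewest tasks, breaking ties by tile name) is purely an assumption, as are the function name `remap` and the data layout. State transfer and notification of the other tiles are not modelled.

```python
from typing import Dict, List, Set


def remap(mapping: Dict[str, List[str]], faulty: Set[str]) -> Dict[str, List[str]]:
    """Return a new tile -> tasks mapping with the tasks of faulty tiles moved.

    Hypothetical greedy heuristic, not the authors' one: each orphaned task
    goes to the healthy tile with the fewest tasks (ties broken by tile name).
    """
    healthy = [tile for tile in mapping if tile not in faulty]
    if not healthy:
        raise RuntimeError("no fault-free tile left")
    new_mapping = {tile: list(mapping[tile]) for tile in healthy}
    for tile in sorted(faulty):
        for task in mapping.get(tile, []):
            target = min(healthy, key=lambda t: (len(new_mapping[t]), t))
            new_mapping[target].append(task)
    return new_mapping


# Initial mapping of the four tasks used in the OTR example slides.
initial = {"Tile1": ["DCT"], "Tile2": ["Q"], "Tile3": ["SRC"], "Tile4": ["VLE"]}
after_first_fault = remap(initial, {"Tile1"})
```

A real heuristic would also weigh core types, execution times and communication demands, since the remapped application still has to meet its throughput constraint as well as possible.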

SLIDE 13

Fault-aware online task remapping (OTR)

Calculating MTTF for fault-aware online task remapping

  • The application will not fail as long as there is at least one healthy core of each core type required by the application.
  • Create a fault tree given the platform specification (MNC) and profiling information (MTC_cap).
  • Calculate MTTF using binary decision diagrams.

SLIDE 14

OTR: Creating the fault tree

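Figure: example for creating the fault tree.

  • Tasks and the core types that can execute them: t1: C1; t2: C1, C2; t3: C1, C2; t4: C3
  • Nodes and their core types: n1: C1; n2: C1; n3: C2; n4: C3
  • Fault tree: the top event "failure" is reached through the task events t1 (n1, n2), t2 (n1, n2, n3), t3 (n1, n2, n3) and t4 (n4), where each task event depends on the nodes whose core type can execute the task.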
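The failure condition behind this fault tree can be written as a small predicate. The sketch below assumes the reading that the application fails as soon as some task has no healthy node of a core type able to run it; the task and node tables are reconstructed from the flattened figure on this slide and should be treated as an assumption, as should the function name `otr_failed`.

```python
from typing import Dict, Set

# Core types able to run each task (reconstructed from the slide, an assumption).
TASK_TYPES: Dict[str, Set[str]] = {
    "t1": {"C1"},
    "t2": {"C1", "C2"},
    "t3": {"C1", "C2"},
    "t4": {"C3"},
}

# Core type of each processing node (reconstructed from the slide, an assumption).
NODE_TYPE: Dict[str, str] = {"n1": "C1", "n2": "C1", "n3": "C2", "n4": "C3"}


def otr_failed(failed: Set[str],
               task_types: Dict[str, Set[str]] = TASK_TYPES,
               node_type: Dict[str, str] = NODE_TYPE) -> bool:
    """True if some task can no longer be placed on any healthy node."""
    healthy_types = {ctype for node, ctype in node_type.items() if node not in failed}
    return any(not (types & healthy_types) for types in task_types.values())
```

For instance, losing n1 alone is survivable because n2 still offers a C1 core, whereas losing n4 is fatal because t4 can only run on C3.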

SLIDE 15

OTR: Calculating MTTF

The set of all the paths leading to 1s is called the set of satisfying paths, $Sat$.

  • A satisfying path assigns values to nodes as 1 ($n_i$, failure) and 0 ($\bar{n}_i$, non-failure).
  • $P_{n_i}(t)$ is the probability that processing node $n_i$ will be failed at time $t$.
  • $Q_{sys}(t)$ is the overall probability of failure:

$$Q_{sys}(t) = \sum_{s_i \in Sat} \Big( \prod_{n_j \in s_i} P_{n_j}(t) \prod_{\bar{n}_k \in s_i} \big(1 - P_{n_k}(t)\big) \Big)$$

$$MTTF_{sys} = \int_{0}^{\infty} R_{sys}(t)\,dt$$

where the reliability of the system is $R_{sys}(t) = 1 - Q_{sys}(t)$.
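As a numerical illustration of these two formulas, the sketch below computes Q_sys(t) by brute-force enumeration of all failed/non-failed assignments and integrates R_sys(t) with the trapezoidal rule. This replaces the binary decision diagrams named on the slides with plain enumeration, which gives the same sum but only scales to a handful of nodes. It also assumes exponentially distributed faults, so that P_n(t) = 1 - exp(-λ_n t); the function names, the `is_failure` callback and the rates in the example are hypothetical.

```python
import math
from itertools import product
from typing import Callable, Dict, FrozenSet


def q_sys(t: float, rates: Dict[str, float],
          is_failure: Callable[[FrozenSet[str]], bool]) -> float:
    """Probability of system failure at time t, summed over failing assignments."""
    nodes = sorted(rates)
    p = {n: 1.0 - math.exp(-rates[n] * t) for n in nodes}  # P_n(t), exponential faults
    total = 0.0
    for states in product((0, 1), repeat=len(nodes)):
        failed = frozenset(n for n, s in zip(nodes, states) if s == 1)
        if is_failure(failed):
            prob = 1.0
            for n, s in zip(nodes, states):
                prob *= p[n] if s == 1 else 1.0 - p[n]
            total += prob
    return total


def mttf(rates: Dict[str, float],
         is_failure: Callable[[FrozenSet[str]], bool],
         horizon: float, steps: int = 20000) -> float:
    """Trapezoidal approximation of the integral of R_sys(t) over [0, horizon]."""
    h = horizon / steps
    r = [1.0 - q_sys(i * h, rates, is_failure) for i in range(steps + 1)]
    return h * (sum(r) - 0.5 * (r[0] + r[-1]))


# Hypothetical example: two nodes, the system fails only when both have failed.
example = mttf({"n1": 1.0, "n2": 2.0},
               lambda failed: failed == {"n1", "n2"},
               horizon=40.0)
```

For this two-node example the closed form is 1/λ1 + 1/λ2 - 1/(λ1 + λ2), which gives a convenient check on the numerical result. The horizon has to be chosen large enough for R_sys(t) to be negligible beyond it.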

SLIDE 16

N-modular redundancy at KPN level (NMR)

Figure: TMR pattern applied to a KPN task. (a) Original network with task tj between ti and tk. (b) Task tj replaced by three replicas (tj^1, tj^2, tj^3) placed between a fork and a voter.

Figure: A mapping of an application with TMR pattern onto a 3x3 NoC (nodes n1 to n9 with core types RISC, DSP and NPC, hosting the fork, the voter and the replicated tasks).
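The fork and voter processes of the TMR pattern can be sketched in a few lines. The slides do not describe the voter's decision rule, so the strict-majority rule below is an assumption, as are the function names and the use of `None` to stand for a replica that no longer produces tokens.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def fork(token: Hashable, replicas: int = 3) -> List[Hashable]:
    """Duplicate an input token, one copy per replica."""
    return [token] * replicas


def vote(tokens: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value agreed on by a strict majority of all replicas, else None.

    None in the input stands for a replica that has failed (assumption).
    """
    present = [t for t in tokens if t is not None]
    if not present:
        return None
    value, count = Counter(present).most_common(1)[0]
    return value if count > len(tokens) // 2 else None
```

Under this rule, two surviving replicas that agree still yield a checked result, while a single surviving replica does not, which lines up with the distinction between checked and merely correct results made on the NMR slides.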

SLIDE 17

NMR - safe failure

Calculating MTTF

  • Safe failure is the failure of the system to provide checked results.
  • The application will fail if there is only one instance left of any task type.
  • Create a fault tree given the application specification and mapping information (g^R_t, MNT).
  • Calculate MTTF using binary decision diagrams.

SLIDE 18

NMR - safe failure: Creating the fault tree

Figure: A mapping of an application with TMR pattern onto a 3x3 NoC (nodes n1 to n9 with core types RISC, DSP and NPC, hosting the fork, the voter and the replicated tasks).

Figure: fault tree for safe failure derived from this mapping. The top event "failure" is reached through events for the fork, the voter and the task replicas; the leaves are processing nodes (e.g., n2, n4, n6).

SLIDE 19

NMR - unsafe failure

Calculating MTTF

  • Unsafe failure is the failure of the system to provide correct results.
  • The application will fail if there is no instance left of any task type.
  • Create a fault tree given the application specification and mapping information (g^R_t, MNT).
  • Calculate MTTF using binary decision diagrams.
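The two NMR failure definitions differ only in how many replicas of a task type must remain, which the sketch below makes explicit. It covers the task replicas only; failures of the nodes hosting the fork and the voter, which also appear in the fault trees, are left out. The replica-to-node table and the function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical placement: task type -> nodes hosting its replicas.
REPLICA_NODES: Dict[str, List[str]] = {
    "t1": ["n1", "n2", "n3"],
    "t2": ["n4", "n5", "n6"],
}


def surviving(failed: Set[str],
              replica_nodes: Dict[str, List[str]] = REPLICA_NODES) -> Dict[str, int]:
    """Number of replicas of each task type still running on healthy nodes."""
    return {task: sum(1 for n in nodes if n not in failed)
            for task, nodes in replica_nodes.items()}


def safe_failure(failed: Set[str]) -> bool:
    """Checked results are lost once some task type has at most one replica left."""
    return any(count <= 1 for count in surviving(failed).values())


def unsafe_failure(failed: Set[str]) -> bool:
    """Correct results are lost once some task type has no replica left."""
    return any(count == 0 for count in surviving(failed).values())
```

Every unsafe failure is also a safe failure under these definitions, so the safe-failure MTTF can never exceed the unsafe-failure MTTF for the same mapping.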

SLIDE 20

NMR - unsafe failure: Creating the fault tree

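Figure: A mapping of an application with TMR pattern onto a 3x3 NoC (nodes n1 to n9 with core types RISC, DSP and NPC, hosting the fork, the voter and the replicated tasks).

Figure: fault tree for unsafe failure derived from this mapping. The top event "failure" is reached through events for the fork, the voter and the task replicas; the leaves are processing nodes (e.g., n1, n2, n4, n5, n6, n8).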

SLIDE 21

Reliability-aware mapping flow

Figure: Reliability-aware mapping tool (GAMapper) based on genetic algorithms (constrained NSGAIIC). Blocks and labels in the figure: Profiling; GAMapper (reliability-aware mapping tool); Architecture Specification (MNP, MPL, MNC, C, c); Application Specification (MTE, d, TTC); Pareto solutions (XNT, MTE); Analytical model (MTTF, exe. time, comm. cost); Apply self-checking patterns; Fault rates λ.

SLIDE 22

Case study (MPEG2 decoder on a 3x3 NoC)

Figure: Original MPEG2 decoder. KPN with 12 tasks (t1 to t12) and 14 edges with the labels e1 (1), e2 (34.6), e3 (34.6), e4 (28.1), e5 (28.1), e6 (28.1), e7 (28.1), e8 (65), e9 (28.1), e10 (28.1), e11 (28.1), e12 (28.1), e13 (65), e14 (15.2).

Figure: NMR-ed MPEG2 decoder. Each task ti is replicated three times (ti_1, ti_2, ti_3); each original edge ej is routed through a voter (v_ej) and a fork (f_ev_ej_0), and the new edges (ev_ej_k, ef_ev_ej_0_k) carry the label of the original edge.

SLIDE 23

Case study (MPEG2 decoder on a 3x3 NoC)

Profiling data for a 15-second video @ 25 fps (taken from (Thiele, 2007))

Execution times (in seconds) of tasks on the available core types (T^CT_cap):

Core type  t1    t2    t3    t4    t5    t6    t7    t8    t9    t10   t11    t12
RISC       0.13  6.68  0.06  2.00  2.00  0.05  0.06  2.00  2.00  0.05  12.33  0.18
DSP        0.20  8.52  0.04  1.25  1.25  0.04  0.04  1.25  1.25  0.04  8.51   0.30

Demands (in MBps) of edges (d):

Edge    d1    d2    d3    d4    d5    d6    d7    d8    d9    d10   d11   d12   d13   d14
Demand  1.0   34.6  28.1  28.1  28.1  28.1  65.0  34.6  28.1  28.1  28.1  28.1  65.0  15.2

SLIDE 24

Preliminary results (MPEG2 decoder on a 3x3 NoC)

Figure: two plots comparing the Original (unsafe), NMR-ed (safe), NMR-ed (unsafe) and OTR solutions, with the points α, ω and γ marked. Axes: MTTF (10^6 time units) against computation time (s), and MTTF (10^6 time units) against communication cost (MBps).

  • We assume exponentially distributed faults with λ_RISC = 10^-6 and λ_DSP = 10^-5.
  • Judging by the plot axes, each point is given as (MTTF in time units, computation time in s, communication cost in MBps).
  • α = (0.5 × 10^6, 11.36, 214)
  • ω = (0.92 × 10^6, 14.09, 2947.4) → 84% better MTTF
  • γ = (2.08 × 10^6, 11.53, 214) → better MTTF at the cost of 1.5% performance overhead
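The percentages quoted for ω and γ follow from comparing each point with α. The short check below recomputes them; it assumes that the tuples are ordered as (MTTF, computation time, communication cost), which is inferred from the plot axes rather than stated on the slide, and the helper `percent_change` is introduced here for illustration.

```python
# Points from the slide; tuple order (MTTF, computation time, communication cost)
# is an assumption inferred from the plot axes.
alpha = (0.5e6, 11.36, 214.0)
omega = (0.92e6, 14.09, 2947.4)
gamma = (2.08e6, 11.53, 214.0)


def percent_change(new: float, old: float) -> float:
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old


mttf_gain_omega = percent_change(omega[0], alpha[0])      # MTTF gain of omega over alpha
mttf_gain_gamma = percent_change(gamma[0], alpha[0])      # MTTF gain of gamma over alpha
time_overhead_gamma = percent_change(gamma[1], alpha[1])  # extra computation time of gamma
time_overhead_omega = percent_change(omega[1], alpha[1])  # extra computation time of omega
```

The first value reproduces the 84% figure and the third rounds to the 1.5% overhead quoted for γ. The same comparison shows that ω pays for its MTTF gain with higher computation time and communication cost than α.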

SLIDE 25

Related work

Common fault tolerance techniques

  • Spatial redundancy (N-modular redundancy, standby sparing)
  • Temporal redundancy (multiple executions, checkpoint/rollback)

Reliability-aware approaches

  • (Huang et al., 2009): task scheduling to maximize lifetime for a single-mode real-time embedded system considering aging effects of MPSoCs.
  • (Huang and Xu, 2010): task scheduling to minimize energy consumption given a lifetime constraint.
  • (Huang et al., 2011): customer-aware online adjustments to initial schedules in order to improve energy efficiency and lifetime.
  • (Huang J. et al., 2011): task scheduling with spatial/temporal redundancy to improve reliability in MPSoCs.

SLIDE 26

Conclusion

  • We proposed methods to calculate the MTTF of a system that uses
      • the fault-aware online task remapping technique,
      • N-modular redundancy at KPN level.
  • We implemented a mapping tool that realizes a reliability-aware mapping flow.
  • A case study showed that the lifetime of the system can be prolonged with OTR and NMR with small performance overheads.

Future work

  • extend the failure definition to performance failures for OTR
  • extend the design flow to architectural exploration (NoC size, core types, number and placement of fork-voter cores)
  • investigate other KPN-level spatial redundancy transformations for fault tolerance
  • incorporate workload-related aging

SLIDE 27

Thank you!

Questions?
