FAULT TOLERANCE FOR MULTI-CORE AND MANY-CORE PROCESSORS



slide-1
SLIDE 1

FAULT TOLERANCE FOR MULTI-CORE AND MANY-CORE PROCESSORS

Vanessa VARGAS

PhD candidate in Nano Electronics and Nano Technologies, Université de Grenoble Alpes, France
Professor at Universidad de las Fuerzas Armadas ESPE, Department of Electrical and Electronics, Ecuador

slide-2
SLIDE 2

OUTLINE

  • Introduction
  • Motivation
  • Background
  • Work Done
  • Conclusions

slide-3
SLIDE 3

INTRODUCTION

slide-4
SLIDE 4

INTRODUCTION

slide-5
SLIDE 5

INTRODUCTION

[Figure: Start, Task 1, Task 2, ..., Task n, End]

slide-6
SLIDE 6

OUTLINE

  • Introduction
  • MOTIVATION
  • Background
  • Work Done
  • Conclusions

slide-7
SLIDE 7

MOTIVATION

Many-core SUPERCOMPUTERS, Top500 (June 2016)

  • 1st in Top500: Sunway TaihuLight (Sunway MPP, NRCPC), 93.01 petaflops. Sunway SW26010 260C at 1.45 GHz, 10,649,600 cores, 15.31 MW. National Supercomputing Center in Wuxi, China.
  • 2nd in Top500: Tianhe-2 (NUDT), 33.86 petaflops. Ivy Bridge, 12 cores per processor at 2.2 GHz, plus Intel Xeon Phi, 3,120,000 cores, 17.81 MW, TH Express-2. National University of Defense Technology, China.

slide-8
SLIDE 8

MOTIVATION

In HPC systems, the use of many-core processors is crucial to satisfy the growing demand for performance and reliability without a critical increase in power consumption.

slide-9
SLIDE 9

MOTIVATION

This exponential growth faces many challenges:

  • Power: limited power budget
  • Space: fit in available floor space
  • Cost: fixed financial budget
  • Memory technology: feed compute power- and cost-efficiently
  • Network technology: connect nodes power- and cost-efficiently
  • Software: scale to utilize the growing compute capacity
  • RELIABILITY: failure rates should not grow with machine size
  • And others …

slide-10
SLIDE 10

MOTIVATION

CONCERNING THE RELIABILITY

  • Evaluate fault tolerance techniques under radiation and fault injection campaigns.
  • Evaluate the impact of fault tolerance techniques on performance and energy consumption.

FIGURE 1. RADIATION EXPERIMENT

slide-11
SLIDE 11

OUTLINE

  • Introduction
  • Motivation
  • BACKGROUND
      • Multiprocessing modes
      • Fault Tolerance
  • Work Done
  • Conclusions

slide-12
SLIDE 12

MULTI-PROCESSING MODES

SMP (symmetric multiprocessing)

  • A single OS is responsible for achieving parallelism in the application.
  • It dynamically distributes the tasks among the cores, manages the organization of task completion, and controls the shared resources.

AMP (asymmetric multiprocessing)

  • The cores run independently of each other, with or without an OS.
  • They have their own private memory space, although there is a common infrastructure for inter-core communications.

FIGURE 2. SCHEMES OF AMP AND SMP PROCESSING MODES

slide-13
SLIDE 13

FAULT TOLERANCE

A system is considered fault tolerant when, facing a fault, it continues working correctly.

Fault tolerance can be obtained by redundancy:

  1. Spatial redundancy
  2. Temporal redundancy
  3. Both of them

slide-14
SLIDE 14

Spatial vs temporal redundancy

SPATIAL

It uses different physical components. It can separate identical data signals in space.

ADVANTAGE

  • It lacks an inherent maximum operating frequency.

DISADVANTAGES

  • It requires more area and components.
  • Penalty in performance.

TEMPORAL

It uses the same physical components. It can separate identical data signals in time.

ADVANTAGE

  • Fewer components.

DISADVANTAGES

  • Latency penalty.
  • It has a maximum operating frequency and is therefore not used in faster commercial processes.

Source: Radiation Effects and Soft Errors in Integrated Circuits and Electronic Devices

slide-15
SLIDE 15

FAULT TOLERANCE IN MULTICORE

Taking advantage of the multiplicity of cores, various redundancy techniques can be considered:

  1. Temporal redundancy
  2. Data value redundancy
  3. Information redundancy for error detection in multicore designs
  4. Redundancy in execution

Any technique can be evaluated by fault injection or by radiation test campaigns.

slide-16
SLIDE 16

Redundancy in execution

State machine replication is used:

  • Replicated copies of a process are created.
  • The copies follow the same sequence of execution and produce the same result if the inputs are the same.
  • Redundant processes must not diverge in the absence of failures.

Causes of divergence are:

  • Asynchronous signals
  • Nondeterministic functions (gettimeofday)
  • In multi-core: access to shared memory

The record/replay method ensures that access to shared memory is done in the same order.
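To make the record/replay idea concrete, here is a minimal sketch in Python. It is an illustration only, not the mechanism used in the work presented: the class `OrderedAccess`, the function `execute` and the toy update rule are hypothetical names, and each thread is assumed to enter the critical section exactly once. A first run records the order in which threads touch shared state; a replica then replays that order and therefore cannot diverge.

```python
import threading


class OrderedAccess:
    """Records, or replays, the order in which threads enter a critical section."""

    def __init__(self, replay_log=None):
        self.log = []                  # order observed in this run
        self.replay_log = replay_log   # order to enforce, or None to record
        self.cond = threading.Condition()

    def run(self, thread_id, critical_section):
        with self.cond:
            if self.replay_log is not None:
                # Replay: block until the recorded order says it is our turn.
                self.cond.wait_for(
                    lambda: self.replay_log[len(self.log)] == thread_id)
            critical_section()
            self.log.append(thread_id)
            self.cond.notify_all()


def execute(replay_log=None, n_threads=4):
    """Run n_threads workers that each update shared state exactly once."""
    shared = {"value": 1}
    access = OrderedAccess(replay_log)

    def worker(tid):
        def update():
            # Order-sensitive update: the result depends on the interleaving.
            shared["value"] = shared["value"] * 3 + tid
        access.run(tid, update)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"], access.log


primary_value, order = execute()               # record run
replica_value, _ = execute(replay_log=order)   # replay run
assert replica_value == primary_value
```

The update rule is deliberately order-sensitive, so two unsynchronized replicas could legitimately produce different values; replaying the recorded order removes that source of divergence.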

slide-17
SLIDE 17

Redundancy in execution

Unreliable system → State Machine Replication → Error Checking and Recovery → Reliable system

  • State Machine Replication: Record/Replay, Deterministic Multithreading
  • Error Checking and Recovery: Double Modular Redundancy with checkpoint/rollback, Triple Modular Redundancy with Fault Masking

slide-18
SLIDE 18

Redundancy in execution

Deterministic multithreading

  • Achieved by using locks, barriers and thread creation.
  • Problem: it slows down the application.

Double Modular Redundancy (DMR)

  • It allows error detection.

Triple Modular Redundancy (TMR)

  • It allows error detection and correction by a voter.
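The difference between DMR and TMR can be sketched in a few lines of Python. This is a generic illustration rather than the implementation evaluated in this work; `dmr_detect` and `tmr_vote` are hypothetical names, and replica outputs are assumed to be plain integers.

```python
def dmr_detect(a, b):
    """DMR: two replicas reveal an error but cannot tell which one is right."""
    return a != b


def tmr_vote(a, b, c):
    """TMR: bitwise majority of three replica outputs masks one faulty replica."""
    return (a & b) | (a & c) | (b & c)


golden = 0b10110010
faulty = golden ^ (1 << 3)          # one replica suffers a single bit flip

assert dmr_detect(golden, faulty)                 # detected, not corrected
assert tmr_vote(golden, faulty, golden) == golden  # corrected by the voter
assert tmr_vote(faulty, faulty, golden) == faulty  # same fault in two replicas wins the vote
```

The last assertion shows the limit of voting: when two replicas carry the same error, the majority is wrong, which is why DMR is usually paired with checkpoint/rollback and TMR with fault masking.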

slide-19
SLIDE 19

Redundancy in execution

Mixed Modelling

  • Deterministic Multithreading
  • DMR

FIGURE 3. EXAMPLE OF REDUNDANCY IN EXECUTION

Source: Hamid Mushtaq, Zaid Al-Ars, Koen Bertels, “Fault Tolerance on Multicore Processors using Deterministic Multithreading”

slide-20
SLIDE 20

OUTLINE

  • Introduction
  • Motivation
  • Background
  • WORK DONE
      • Freescale P2041RDB
          • TMR in AMP mode
          • Fault Injection in SMP
          • Radiation Tests in AMP and SMP mode
      • KALRAY MPPA-256 (Multi Purpose Processing Array)
          • Fault Injection in AMP mode
          • Radiation Tests in AMP mode
          • Fault Injection in mixed mode
          • Evaluating Fault Tolerance Technique
  • Conclusions

slide-21
SLIDE 21

FREESCALE P2041

  • Built on: Power Architecture technology
  • Manufactured in: 45nm SOI technology
  • Based on: four e500mc cores (32-bit superscalar processor)
  • Operation frequency: up to 1.5 GHz

FIGURE 4. QORIQ P2041 MEMORY ARCHITECTURE

slide-22
SLIDE 22

TMR in AMP mode

slide-23
SLIDE 23

TMR in AMP mode

FIGURE 5. FAULT INJECTION STRATEGY IN PROCESSOR REGISTER

slide-24
SLIDE 24

TMR in AMP mode

EXPERIMENT

  • The application was run 50,000 times.
  • One or two SEUs (single event upsets) were injected per execution.

RESULTS

  • 20% of injected faults have no detectable consequences (silent faults).
  • If one SEU is injected per execution, the error rate reaches 78% and the TMR corrects 99.99% of them.
  • On the other hand, if two SEUs are injected, the error rate reaches 93% while the error correction factor decreases to 85%.

FIGURE 6. FAULT-INJECTION CONSEQUENCES
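The shape of such a campaign can be sketched in Python. This is a toy model and does not reproduce the experiment or its figures: the register file, the `workload` function and the two outcome classes are assumptions made for illustration. An SEU is modelled as a single bit flip in a 32-bit register, and each run is classified by comparing its result with a fault-free (golden) run.

```python
import random

WIDTH = 32
GOLDEN_REGS = [7, 11, 13, 0]   # hypothetical register file; register 3 is never read


def flip_bit(value, bit):
    """Model an SEU: invert one bit of a 32-bit word."""
    return value ^ (1 << bit)


def workload(regs):
    """Toy computation that only uses the first three registers."""
    return (regs[0] + regs[1] * regs[2]) & 0xFFFFFFFF


def inject_and_classify(register, bit):
    """Flip one bit before the run and compare against the golden result."""
    regs = list(GOLDEN_REGS)
    regs[register] = flip_bit(regs[register], bit)
    return "silent" if workload(regs) == workload(GOLDEN_REGS) else "result_error"


def campaign(runs, seed=0):
    """Inject one random SEU per execution and tally the consequences."""
    rng = random.Random(seed)
    counts = {"silent": 0, "result_error": 0}
    for _ in range(runs):
        outcome = inject_and_classify(rng.randrange(len(GOLDEN_REGS)),
                                      rng.randrange(WIDTH))
        counts[outcome] += 1
    return counts


print(campaign(1000))
```

A fault landing in a register the program never reads is silent, which is the same effect that makes a share of injected faults undetectable in a real campaign. Exceptions and timeouts would need an execution model richer than this sketch.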

slide-25
SLIDE 25

TMR in AMP mode

FIGURE 7. FAULT-INJECTION CONSEQUENCES IN PROCESSOR REGISTERS

slide-26
SLIDE 26

FAULT INJECTION IN SMP

TABLE I. APPLICATIONS SUMMARY

slide-27
SLIDE 27

FAULT INJECTION IN SMP

Two test campaigns were performed on each selected application:

  a) Fault injection in processor registers.
  b) Fault injection in memory region.

TABLE II. FAULT-INJECTION CAMPAIGNS

slide-28
SLIDE 28

FAULT INJECTION IN SMP

FIGURE 8. PROPOSED SOFTWARE FAULT-INJECTION IN MEMORY REGION

slide-29
SLIDE 29

FAULT INJECTION IN APPLICATION RUNNING IN SMP

FIGURE 9. FAULT-INJECTION CONSEQUENCES IN PROCESSOR REGISTERS
Legend: Silent faults, Result errors, Exceptions, Timeouts

  • Register MM: 84.38%, 1.47%, 0.63%, 13.52%
  • Register TSP: 65.39%, 0.16%, 34.19%, 0.27%

FIGURE 10. FAULT-INJECTION CONSEQUENCES IN MEMORY REGION
Legend: Silent faults, Result errors, Exceptions, Timeouts

  • Memory MM: 59.82%, 2.60%, 23.32%, 14.25%
  • Memory TSP: 96.59%, 0.02%, 1.49%, 1.92%

These campaigns target only the private code memory:

  • The initial process stack memory,
  • The threads' stack memory, and
  • The process heap memory.

slide-30
SLIDE 30

RADIATION TESTS

FIGURE 11. CONSEQUENCES OF RADIATION TEST CAMPAIGNS

  • From the results, one can see that the reliability of an application depends on the software environment characteristics:
      • Operating system.
      • Multiprocessing mode used.
      • Characteristics of the application.
slide-31
SLIDE 31

RADIATION TESTS IN SMP MODE

FIGURE 12. ERROR CLASSIFICATION ACCORDING TO OS FAULT

The obtained results revealed that errors may occur in SMP mode, even if the OS is in idle mode.

slide-32
SLIDE 32

RADIATION TESTS

FIGURE 13. SEE CONSEQUENCES ACCORDING TO THE SCENARIO IMPLEMENTED. THE CONFIDENCE INTERVALS ARE SHOWN BY MEANS OF THE RED LINES.

slide-33
SLIDE 33

KALRAY MPPA-256

  • Manufactured in: TSMC CMOS 28HP technology.
  • Integrates: 256 Processing Engine (PE) and 32 Resource Management (RM) cores.
  • Based on: core VLIW 32-bit/64-bit architecture.
  • Operation frequency: 100 MHz to 600 MHz.
  • Power consumption: 15 W to 25 W.
  • Peak performance at 600 MHz: 634 GFLOPS and 316 GFLOPS for single and double precision respectively.
  • Clustered architecture: 16 compute clusters (CCs) and 2 I/O clusters per device.
  • Compute cluster: multi-banked local static memory (SMEM) of 2 MB shared by the 16 PE + 1 RM cores.
  • I/O cluster: 2 groups of quad-cores. Each 128 KB shared.

FIGURE 14. MPPA-256 MEMORY ARCHITECTURE

slide-34
SLIDE 34

Fault Tolerance Approach on MPPA

Implemented at application level, it uses the 2 I/O clusters to improve the reliability of the application.

1) Core 0 initializes intercluster communications.

2) Core 0 generates a pthread per core:

  • Core 1, 2: masters of a group of compute clusters.
  • Core 4, 5, 6: voters of the results (TMR arbiter).
  • Core 3: arbiter of the final results. It logs the results.
  • Core 7 (only on I/O 0): fault injector.

slide-35
SLIDE 35

3) Cores 1 and 2 spawn and control the clusters' computation.

[Figure: two I/O clusters (core0 to core7 each) and compute clusters C0 to C15 arranged in four groups, G0 to G3]

slide-36
SLIDE 36

Fault Tolerance Approach on MPPA

3) Cores 1 and 2 save the results in I/O memory and send the results to the other I/O cluster (core 0) via intercluster communication.

[Figure: each I/O cluster (core0 to core7) holds the results Res-G0, Res-G1, Res-G2 and Res-G3]

slide-37
SLIDE 37

Fault Tolerance Approach on MPPA

4) Core 4, core 5 and core 6 of each I/O cluster take the results, and each one votes independently of the others to obtain the correct result.

5) Core 3 votes based on the responses of core 4, core 5 and core 6 and sends the response to the other I/O cluster, including the number of voters that agree.

6) Core 3 of I/O 0 logs the correct results.
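The two-level voting in steps 4 to 6 can be sketched as follows. This Python fragment is an assumption-laden illustration, not the MPPA code: `vote` and `arbitrate` are hypothetical names, results are plain integers, and a faulty voter is simulated by overwriting its output.

```python
from collections import Counter


def vote(copies):
    """Voter: majority over redundant copies, or None without a strict majority."""
    value, count = Counter(copies).most_common(1)[0]
    return value if 2 * count > len(copies) else None


def arbitrate(voter_outputs):
    """Arbiter: vote over the voters' answers and report how many agree."""
    value, count = Counter(voter_outputs).most_common(1)[0]
    if 2 * count > len(voter_outputs):
        return value, count
    return None, count


copies = [1234, 1234, 1230]                   # one redundant result is corrupted
outputs = [vote(copies) for _ in range(3)]    # three voters work independently
outputs[1] = 9999                             # simulate a fault inside one voter

print(arbitrate(outputs))   # (1234, 2): correct result, two voters agree
```

Reporting the number of agreeing voters alongside the result lets the receiving side distinguish a unanimous answer from one that survived a fault in a voter.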

Core 7 of I/O 0 is the fault injector.

  • It randomly selects the instant, the core, the register and the bit.
  • It sends an interrupt to the cluster that controls the selected core.
  • Once in the cluster, the selected core is interrupted via an interprocessor interrupt.
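The injector's random selection and the two-stage delivery can be sketched in Python. Everything named here is hypothetical: `FaultTarget`, `pick_target` and `delivery_steps` are illustrative, and the run length, the number of registers per core and the register width are assumed values, not figures taken from the MPPA-256 documentation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultTarget:
    instant_us: int   # when to inject, relative to the start of the run
    cluster: int      # compute cluster that owns the selected core
    core: int         # core inside that cluster
    register: int     # register index
    bit: int          # bit position to flip


def pick_target(rng, run_length_us=1_000_000, clusters=16, cores=16,
                registers=64, width=32):
    """Draw the instant, core, register and bit uniformly at random."""
    return FaultTarget(
        instant_us=rng.randrange(run_length_us),
        cluster=rng.randrange(clusters),
        core=rng.randrange(cores),
        register=rng.randrange(registers),
        bit=rng.randrange(width),
    )


def delivery_steps(target):
    """Describe the path from the injector core to the targeted register."""
    return [
        f"wait {target.instant_us} us",
        f"interrupt compute cluster {target.cluster}",
        f"inter-processor interrupt to core {target.core}",
        f"flip bit {target.bit} of register {target.register}",
    ]


for step in delivery_steps(pick_target(random.Random(42))):
    print(step)
```

Seeding the generator makes a campaign reproducible, which helps when a particular injection needs to be replayed for analysis.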

slide-38
SLIDE 38

OUTLINE

  • Introduction
  • Motivation
  • Background
  • Work Done
  • CONCLUSIONS

slide-39
SLIDE 39

CONCLUSIONS

  • A comparison of both scenarios, SMP and AMP, shows that the dynamic response of the device depends not only on the application but also on the adopted multi-processing mode.
  • A work by De Witte et al. compares the performance of the SMP and AMP modes, both with operating systems, on a dual-core, and concludes that SMP outperforms the AMP mode.
  • Extending this finding to our work, it is possible to suggest the existence of a trade-off between reliability and performance according to the multi-processing mode selected.

slide-40
SLIDE 40

CONCLUSIONS

  • Designers can improve the dependability of systems by minimizing the consequences of these effects through error-correcting codes in memories, error-reporting architectures (machine-check-error registers), etc. Nevertheless, some chip areas remain unprotected.
  • The inherent redundancy capability of many-cores makes them ideal for implementing fault-tolerant techniques such as N-modular redundancy, which applies majority voting.
  • Fault tolerance in many-cores through redundancy must be evaluated in terms of reliability, power consumption and performance.