
Characterization of System on a Chip (SoC) Single Event Upset (SEU) Responses using SEU Data, Classical Reliability Models, and Space Environment Data

Melanie Berg¹, Kenneth LaBel², Michael Campola², Michael Xapsos²


SLIDE 1

To be presented by Melanie D. Berg at the Single Event Effects (SEE) Symposium and Military and Aerospace Programmable Logic Devices (MAPLD) Workshop, La Jolla, CA, May 22-25, 2017.


Melanie.D.Berg@NASA.gov

1. AS&D in support of NASA/GSFC
2. NASA/GSFC


SLIDE 2

Acronyms

  • Combinatorial logic (CL)
  • Commercial off the shelf (COTS)
  • Complementary metal-oxide semiconductor (CMOS)
  • Device under test (DUT)
  • Edge-triggered flip-flops (DFFs)
  • Error rate (λ)
  • Error rate per bit (λbit)
  • Error rate per system (λsystem)
  • Field programmable gate array (FPGA)
  • Global triple modular redundancy (GTMR)
  • Hardware description language (HDL)
  • Input/output (I/O)
  • Intellectual Property (IP)
  • Linear energy transfer (LET)
  • Mean fluence to failure (MFTF)
  • Mean time to failure (MTTF)
  • Number of used bits (#UsedBits)
  • Operational frequency (fs)
  • Personal Computer (PC)
  • Probability of configuration upsets (Pconfiguration)
  • Probability of functional logic upsets (PfunctionalLogic)
  • Probability of single event functional interrupt (PSEFI)
  • Probability of system failure (Psystem)
  • Processor (PC)
  • Radiation Effects and Analysis Group (REAG)
  • Reliability over time (R(t))
  • Reliability over fluence (R(Φ))
  • Single event effect (SEE)
  • Single event functional interrupt (SEFI)
  • Single event latch-up (SEL)
  • Single event transient (SET)
  • Single event upset (SEU)
  • Single event upset cross-section (σSEU)
  • Xilinx Virtex 5 field programmable gate array (V5)
  • Xilinx Virtex 5 field programmable gate array, radiation hardened (V5QV)

SLIDE 3

Problem Statement

  • Conventional methods of applying single event upset (SEU) data to complex systems implemented in field programmable gate array (FPGA) devices need improvement.
  • The problem boils down to the extrapolation and application of SEU data to characterize system performance in radiation environments.

SLIDE 4

Abstract

  • We are investigating the application of classical reliability performance metrics combined with standard SEU analysis data.
  • We expect to relate SEU behavior to system performance requirements:

– Example: The system is required to be 99.999% (5-nines) reliable within a given time window. Will the system's SEU response meet mission requirements?
– Our proposed methodology will provide better prediction of SEU responses in harsh radiation environments.

SLIDE 5

Background

FPGA SEU Susceptibility Measured in SEU Cross Section (σSEU)

[Figure: Design σSEU decomposed into Configuration σSEU, Functional-logic σSEU (sequential and combinatorial logic (CL) in the data path), and SEFI σSEU (global routes and hidden logic). Two callouts read "σSEUs are measured by bit."]

  • σSEUs (per category) are calculated from SEE test and analysis.
  • FPGAs vary, and so do their SEU responses.
  • Most believe the dominant σSEUs are per bit (configuration or functional logic). However, global routes are also significant.

For functional logic, should σSEUs be measured by bit?

SLIDE 6

Background

(Current goal: convert SEU cross-sections (σSEU, in cm²/particle) to error rates (λ) for complex systems.)

  • Perform SEU accelerated radiation testing across ions with different linear energy transfers (LETs) to calculate σSEUs per LET.
  • Bottom-up approach (transistor level):

– Given σSEU (per bit), use an error rate calculator (such as CREME96) to obtain an error rate per bit (λbit).
– Multiply λbit by the dominant number of used memory bits (#UsedBits) in the target design to attain a system error rate (λsystem).

  • Top-down approach (system level):

– Given σSEU (per system), use an error rate calculator (such as CREME96) to obtain an error rate per system (λsystem).

$\sigma_{SEU} = \#\text{errors}/\text{fluence}$ ; $\lambda_{system} = \#\text{errors}/\text{time}$

LET: Linear energy transfer
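The two quantities above and the bottom-up multiplication can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the inputs are made-up numbers, and `lambda_bit` stands in for the per-bit rate an error rate calculator such as CREME96 would return (the calculator step itself is not modeled).

```python
def cross_section(num_errors: int, fluence: float) -> float:
    """sigma_SEU = #errors / fluence (fluence in particles/cm^2, result in cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return num_errors / fluence


def bottom_up_system_rate(lambda_bit: float, used_bits: int) -> float:
    """Bottom-up estimate: lambda_system = lambda_bit * #UsedBits.

    Every used bit is counted as if an upset in it always causes a system
    error, regardless of whether the bit is active, disabled, or masked.
    """
    return lambda_bit * used_bits


# Hypothetical test result and per-bit rate (assumed units: errors per bit-day).
sigma = cross_section(num_errors=12, fluence=1.0e7)
lambda_system = bottom_up_system_rate(lambda_bit=1.0e-10, used_bits=50_000)
print(f"sigma_SEU = {sigma:.2e} cm^2, bottom-up lambda_system = {lambda_system:.2e} errors/day")
```

Note that `bottom_up_system_rate` grows linearly with #UsedBits no matter how the bits are used in the design.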

SLIDE 7

Technical Problems with Current Methods of Error Rate Calculation

  • For submission to CREME96, σSEU data (across LET) is fitted to a Weibull curve.

– The two main parameters for curve fitting are a shape factor and a slope factor.
– A large amount of error can be introduced during the curve-fitting process.
– Consequently, resultant error rates (for the same design) can vary by decades (orders of magnitude).

  • Because of the error rate calculation process, σSEU data is blended together, and it is nearly impossible to home in on the problem spots. This can become important for mitigation insertion.

[Figure: Top-down σSEU data versus LET; σSEU (cm²/design, log scale) versus LET (MeV·cm²/mg).]
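To see why the fit matters, the sketch below evaluates the four-parameter Weibull form commonly used for σSEU versus LET (onset LET `l0`, width `w`, shape exponent `s`, saturated cross-section `sigma_sat`). The exact parameterization CREME96 expects is not reproduced here, and both parameter sets are hypothetical: they agree closely at high LET but differ by more than two decades at LET = 2 MeV·cm²/mg.

```python
import math


def weibull_cross_section(let: float, sigma_sat: float, l0: float, w: float, s: float) -> float:
    """Four-parameter Weibull: sigma(L) = sigma_sat * (1 - exp(-((L - L0) / W) ** s)) for L > L0."""
    if let <= l0:
        return 0.0  # below the onset LET, the fit predicts no upsets
    return sigma_sat * (1.0 - math.exp(-(((let - l0) / w) ** s)))


# Two hypothetical fits sharing the same saturated cross-section (cm^2/design).
fit_a = {"sigma_sat": 1.0e-3, "l0": 0.5, "w": 20.0, "s": 1.5}
fit_b = {"sigma_sat": 1.0e-3, "l0": 1.0, "w": 30.0, "s": 3.0}

for let in (2.0, 10.0, 60.0):  # LET in MeV*cm^2/mg
    a = weibull_cross_section(let, **fit_a)
    b = weibull_cross_section(let, **fit_b)
    print(f"LET={let:5.1f}  fit A={a:.2e}  fit B={b:.2e}  A/B={a / b:.1f}")
```

Because an error rate calculator integrates σ against the environment spectrum, a disagreement concentrated at low LET can propagate into very different error rates for the same design.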

SLIDE 8

Technical Problems with Bottom-Up Analysis Method (1)

  • Multiplying λbit by the number of used bits in a design is not an effective method of system error rate prediction.

– This works well for memory structures, but complex systems do not operate like memories.
– If an SEU affects a bit that is inactive, disabled, or masked, a system malfunction might not occur.

  • Using the same multiplication factor across DFFs will produce extreme overestimates.
  • To date, there is no accurate method to predict DFF activity for complex systems.
  • Fault injection or simulation will not determine frequency of activity.

$\lambda_{system} < \lambda_{bit} \times \#UsedBits$

SLIDE 9

Technical Problems with Bottom-Up Analysis Method (2)

  • A variety of components are susceptible to SEUs (clocks, resets, combinatorial logic, flip-flops (DFFs), etc.).

– Various component susceptibilities are not accurately characterized at a per-bit level.
– Design topology makes a significant difference in susceptibility and is not characterized in error rate calculators (e.g., CREME96).

Error rates calculated at the transistor-bit level are estimated at too fine a granularity for proper extrapolation to complex systems.

SLIDE 10

Let’s Not Reinvent The Wheel… A Proven Solution Can Be Found in Classical Reliability Analysis

  • Classical reliability models have been used as a standard metric for complex system performance.
  • The analysis provides a more in-depth interpretation of system behavior over time by using system-level MTTF data for system performance metrics.

Theory is already developed, proven, and should be in our hands!

$R(t) = e^{-t/\mathrm{MTTF}}$ or $R(t) = e^{-\lambda t}$
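As a quick sketch of how the classical model turns a requirement into a number, the hypothetical helpers below evaluate R(t) and invert it to find the largest constant error rate that still meets a target such as the 5-nines example. The 24-hour window is an assumed value.

```python
import math


def reliability_time(t: float, mttf: float) -> float:
    """Classical exponential model: R(t) = exp(-t / MTTF) = exp(-lambda * t)."""
    return math.exp(-t / mttf)


def max_error_rate(r_required: float, window: float) -> float:
    """Largest constant lambda with R(window) >= r_required, i.e. -ln(R) / t."""
    return -math.log(r_required) / window


# 5-nines (99.999%) over an assumed 24-hour window.
lam_max = max_error_rate(0.99999, window=24.0)  # failures per hour
print(f"lambda_max = {lam_max:.3e} /hour, i.e. MTTF >= {1.0 / lam_max:.3e} hours")
```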

SLIDE 11

Failure Rate (λ(t)) Bathtub Curve (Weibull Failure-Rate Model)

[Figure: Failure rate (failures/time) versus time, divided into three regions: Infant Mortality (error rate decreases with time), Useful Life (random errors; constant error rate), and Wear-Out Life (error rate increases with time).]

We will focus on the "Useful Life" region of the bathtub curve for this analysis.
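The three regions of the bathtub curve correspond to the shape parameter β of a Weibull failure-rate function, h(t) = (β/η)(t/η)^(β−1): β < 1 gives a decreasing rate, β = 1 a constant rate (the exponential, useful-life case), and β > 1 an increasing rate. The scale η and the sample times below are hypothetical.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)


ETA = 1000.0  # hypothetical scale parameter, in arbitrary time units
REGIONS = ((0.5, "infant mortality"), (1.0, "useful life"), (3.0, "wear-out"))

for beta, region in REGIONS:
    rates = [weibull_hazard(t, beta, ETA) for t in (10.0, 100.0, 1000.0)]
    print(f"beta={beta:3.1f} {region:16s}", " ".join(f"{r:.2e}" for r in rates))
```

With β = 1 the rate is the constant 1/η, so MTTF = η and the model reduces to R(t) = e^(−t/MTTF).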

SLIDE 12

Mapping Classical Reliability Models from the Time Domain to the Fluence Domain

  • The exponential model that relates reliability to MTTF assumes that during the useful lifetime:

– Failures are random.
– The error rate is constant.
– MTTF = 1/λ.

  • For a given LET (across fluence):

– SEUs are random.
– σSEU is constant.
– MFTF = 1/σSEU.

  • Hence, mapping from the time domain to the fluence domain (per LET) is straightforward:

– t → Φ
– MTTF → MFTF
– λ → σSEU

Parallel between time and fluence:

$R(t) = e^{-t/\mathrm{MTTF}} = e^{-\lambda t}$ and $R(\Phi) = e^{-\Phi/\mathrm{MFTF}} = e^{-\sigma_{SEU}\Phi}$

$\sigma_{SEU} = \#\text{errors}/\text{fluence}$ ; $\lambda_{system} = \#\text{errors}/\text{time}$

A Weibull slope (shape parameter) of 1 reduces to the exponential model.
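A short derivation makes the mapping explicit. It assumes, as the slide does, that σSEU is constant at a fixed LET. The second line is an added step that further assumes a constant flux φ (particles/(cm²·s)) over the window, so that Φ = φt; it shows how the fluence-domain result connects back to a time-domain rate.

$$
\frac{dR}{d\Phi} = -\sigma_{SEU}\,R(\Phi),\quad R(0) = 1
\;\;\Longrightarrow\;\;
R(\Phi) = e^{-\sigma_{SEU}\Phi} = e^{-\Phi/\mathrm{MFTF}},\qquad \mathrm{MFTF} = \frac{1}{\sigma_{SEU}}
$$

$$
\Phi = \phi\,t \;\;\Longrightarrow\;\; R = e^{-\sigma_{SEU}\,\phi\,t} = e^{-\lambda t}\quad\text{with}\quad \lambda = \sigma_{SEU}\,\phi
$$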

SLIDE 13

Creating Reliability Curves from σSEUs

  • σSEU data is system level.
  • A histogram of environment data is created. Bins are determined by the LET values at each σSEU data point.
  • For each data point at a given LET, a combination of binned environment data and upper-bound σSEU data is used to determine system reliability performance.
  • A piecemeal approach is performed per data point to determine the weakest points of system performance.

[Figure: System-level σSEU (cm²/design, log scale) versus LET (MeV·cm²/mg).]

  • M. A. Xapsos, IEEE NSREC Short Course, Ponte Vedra Beach, FL, 2008.
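A minimal sketch of the piecemeal step, using the PowerPC binned GEO fluxes and MFTFs quoted on Slides 18 to 20 (the bin labels and helper name are illustrative): each LET bin gets its own R(Φ) = e^(−Φ/MFTF), and the bin with the lowest reliability marks the weakest point.

```python
import math

# PowerPC values quoted on the slides: (LET bin in MeV*cm^2/mg,
# binned GEO fluence per 10-minute window in particles/cm^2,
# MFTF used to upper-bound the bin in particles/cm^2).
BINS = [
    ("0.00-0.07", 3000.0, 1.0e8),
    ("0.07-0.14", 11.0, 5.0e6),
    ("0.14-1.8", 9.0, 6.0e4),
]


def reliability_fluence(fluence: float, mftf: float) -> float:
    """Fluence-domain reliability R(Phi) = exp(-Phi / MFTF)."""
    return math.exp(-fluence / mftf)


# Piecemeal: one reliability value per LET bin, then pick the weakest bin.
results = {label: reliability_fluence(flux, mftf) for label, flux, mftf in BINS}
weakest = min(results, key=results.get)

for label, r in results.items():
    print(f"LET bin {label:>9}: R = {r:.6f}")
print("weakest bin:", weakest)
```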

SLIDE 14

Example

  • Mission requirements:

– The FPGA shall contain an embedded microprocessor.
– A decision shall be made to select either a Xilinx V5QV (approximately $80,000 per device) or a Xilinx V5 with an embedded PowerPC (less than $2,000 per device).
– FPGA operation shall have a reliability of 3-nines (99.9%) within a 10-minute window at Geosynchronous Equatorial Orbit (GEO).

  • Proposed methodology:

– Create a histogram of particle flux versus LET for a 10-minute window of time for your target environment.
– Calculate MFTF per LET (obtain SEU data).
– Graph R(Φ) for a variety of LET values and their associated MFTFs: $R(\Phi) = e^{-\Phi/\mathrm{MFTF}}$.
– For selected ranges of LETs, use an upper bound of particle flux (number of particles/(cm²·10 minutes)) to determine whether the system will meet the mission's reliability requirements.
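Inverting R(Φ) = e^(−Φ/MFTF) gives the largest fluence a bin can absorb before reliability drops below a target, Φ_max = −ln(R_required)·MFTF, which turns the last methodology step into a simple comparison. The sketch below uses the PowerPC 0.14 to 1.8 MeV·cm²/mg numbers from Slide 20; the helper names are hypothetical.

```python
import math


def fluence_limit(r_required: float, mftf: float) -> float:
    """Largest fluence Phi with exp(-Phi / MFTF) >= r_required, i.e. -ln(R) * MFTF."""
    return -math.log(r_required) * mftf


def meets_requirement(window_fluence: float, mftf: float, r_required: float) -> bool:
    """True if an upper bound on the window's fluence stays within the limit."""
    return window_fluence <= fluence_limit(r_required, mftf)


# PowerPC, 0.14-1.8 MeV*cm^2/mg bin: MFTF = 6.0e4 particles/cm^2,
# about 9 particles/cm^2 per 10-minute window at GEO.
MFTF, WINDOW_FLUENCE = 6.0e4, 9.0
for target in (0.999, 0.9999):
    limit = fluence_limit(target, MFTF)
    ok = meets_requirement(WINDOW_FLUENCE, MFTF, target)
    print(f"R >= {target}: limit = {limit:.1f} particles/cm^2, met = {ok}")
```

The 99.99% limit of about 6 particles/cm² matches the threshold quoted on Slide 20, while the 3-nines requirement leaves roughly a factor of six in margin for that bin.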

SLIDE 15

Flux versus LET Histogram for a 10-Minute Window

[Figure: Geosynchronous Equatorial Orbit (GEO), 100-mil shielding. Flux (particles/(cm²·10 minutes), log scale) per LET bin (MeV·cm²/mg): 0 to 0.07, 0.07 to 0.14, 0.14 to 1.8, 1.8 to 3.6, 3.6 to 20, 20 to 40, and 40 and over.]
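One way to build such a histogram, assuming the environment is available as an integral LET spectrum F(≥L) (particles/(cm²·10 minutes) with LET of at least L), is to difference it at the bin edges. The spectrum function below is a made-up placeholder, not GEO data; only the bin edges come from the histogram.

```python
import math

# Bin edges in MeV*cm^2/mg, as labelled on the histogram (last bin is open-ended).
EDGES = [0.0, 0.07, 0.14, 1.8, 3.6, 20.0, 40.0, math.inf]


def integral_flux(let: float) -> float:
    """HYPOTHETICAL integral spectrum: particles/(cm^2 * 10 min) with LET >= let.

    A made-up, monotonically decreasing placeholder; replace with real
    environment data for the target orbit and shielding.
    """
    if math.isinf(let):
        return 0.0
    return 3000.0 / (1.0 + (let / 0.05) ** 2)


def bin_flux(edges, integral):
    """Flux per LET bin: F(>= L_lo) - F(>= L_hi)."""
    return [(lo, hi, integral(lo) - integral(hi)) for lo, hi in zip(edges[:-1], edges[1:])]


histogram = bin_flux(EDGES, integral_flux)
for lo, hi, flux in histogram:
    print(f"{lo:>5} to {hi:<5}: {flux:.3e}")
```

Differencing an integral spectrum guarantees the bins sum to the total flux above the lowest edge, so no particles are double-counted or dropped.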

SLIDE 16

MFTF versus LET for the Xilinx V5QV MicroBlaze Soft Processor Core and the Xilinx V5 Embedded PowerPC Core

[Figure: MFTF (particles/cm², log scale) versus LET (MeV·cm²/mg) for V5QV: MicroBlaze with cache enabled, and V5: PowerPC.]

$\mathrm{MFTF} = 1/\sigma_{SEU}$

Note: no system errors were observed for the V5QV at LET < 3.6 MeV·cm²/mg. However, configuration bit errors were observed (design dependent). We are focused on system performance.

SLIDE 17

Reliability across Fluence at LET = 0.07 MeV·cm²/mg and Below

  • V5QV: no system errors were observed below LET = 3.6 MeV·cm²/mg. Total fluence > 5.0×10⁸ particles/cm².
  • PowerPC:

– No system errors were observed at LET = 0.07 MeV·cm²/mg with a total fluence of 1.0×10⁸ particles/cm².
– Hence, at 0.07 MeV·cm²/mg, we conservatively assume MFTF = 1.0×10⁸ particles/cm².
– More tests would increase the MFTF for this bin.

[Figure: MFTF (particles/cm², log scale) versus LET (MeV·cm²/mg) for V5QV: MicroBlaze with cache enabled, and V5: PowerPC.]
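The MFTF bookkeeping on this slide can be sketched as follows (hypothetical helper; the nonzero-error example is made up). With errors observed, MFTF = fluence/#errors = 1/σSEU; with none, the total error-free fluence is used, which is conservative because further error-free testing could only raise it.

```python
def mftf_from_test(num_errors: int, total_fluence: float) -> float:
    """Estimate MFTF (particles/cm^2) from one LET's test results.

    With errors observed: MFTF = fluence / #errors = 1 / sigma_SEU.
    With none observed: fall back to the total error-free fluence, a
    conservative value, since more error-free testing could only raise it.
    """
    if total_fluence <= 0:
        raise ValueError("total_fluence must be positive")
    if num_errors == 0:
        return total_fluence
    return total_fluence / num_errors


# PowerPC at LET = 0.07 MeV*cm^2/mg: no system errors in 1.0e8 particles/cm^2.
print(mftf_from_test(0, 1.0e8))
# Hypothetical run with five system errors in 3.0e5 particles/cm^2.
print(mftf_from_test(5, 3.0e5))
```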

SLIDE 18

Reliability across Fluence up to LET = 0.07 MeV·cm²/mg (Lower-Bound Analysis)

[Figure: Reliability versus fluence (particles/cm²), 0 to 9000, for the PowerPC design.]

Binned GEO environment data shows approximately 3000 particles/(cm²·10 minutes) in the range of 0.0 to 0.07 MeV·cm²/mg. We are using the MFTF for 0.07 MeV·cm²/mg to upper-bound this bin. Reliability at 3000 particles/(cm²·10 minutes) is > 99.99% for the PowerPC design implementation. The number of "9s" could be increased with more tests.

PowerPC: MFTF = 1.0×10⁸ particles/cm², used because that was the maximum test fluence (no errors observed).

$R(\Phi) = e^{-\Phi/1.0\times10^{8}}$

SLIDE 19

Reliability across Fluence up to LET = 0.14 MeV·cm²/mg

[Figure: Reliability versus fluence (particles/cm²) for the PowerPC design.]

Binned GEO environment data shows approximately 11 particles/(cm²·10 minutes) in the range of 0.07 to 0.14 MeV·cm²/mg. We are using the MFTF for 0.1 MeV·cm²/mg to upper-bound this bin. Reliability at 11 particles/(cm²·10 minutes) is > 99.999% for the V5 PowerPC design implementation.

PowerPC: MFTF = 5.0×10⁶ particles/cm²

$R(\Phi) = e^{-\Phi/5.0\times10^{6}}$

SLIDE 20

Reliability across Fluence up to LET = 1.8 MeV·cm²/mg

[Figure: Reliability versus fluence (particles/cm²) for the PowerPC design.]

Binned GEO environment data shows approximately 9 particles/(cm²·10 minutes) in the range of 0.14 to 1.8 MeV·cm²/mg. We are using the MFTF for 1.8 MeV·cm²/mg to upper-bound this bin. Reliability at 9 particles/(cm²·10 minutes) is > 99.9% for the PowerPC design implementation. This is the most susceptible bin for the system. We fall below 99.99% at approximately 6 particles/cm²!

PowerPC: MFTF = 6.0×10⁴ particles/cm²

$R(\Phi) = e^{-\Phi/6.0\times10^{4}}$

SLIDE 21

Reliability across Fluence up to LET=3.6MeVcm2/mg

21

Binned GEO Environment data shows approximately 0.23 particles/(cm210-minutes), in the range of 1.8MeVcm2/mg to 3.6MeVcm2/mg. Within this LET range, reliability at 0.23 particles/(cm210-minutes) > 99.999% for both design implementations.

V5QV: MFTF= 3.0×106 PowerPC: MFTF = 1.2×103

R(Φ)=eΦ/1.2×103 R(Φ)=eΦ/3.0×106

9.99700E-01 9.99750E-01 9.99800E-01 9.99850E-01 9.99900E-01 9.99950E-01 1.00000E+00 1 2 3 4 5 6 7 8 9 10

Reliability

Fluence (particle/cm2)

SLIDE 22

Reliability across Fluence at LET = 40 MeV·cm²/mg

Binned GEO environment data shows approximately 0.07 particles/(cm²·10 minutes) in the range of 3.6 to 40.0 MeV·cm²/mg. Within this LET range, reliability at 0.07 particles/(cm²·10 minutes) is > 99.9% for both design implementations. We can refine the analysis by using smaller bins.

V5QV: MFTF = 7.0×10⁵ particles/cm²; PowerPC: MFTF = 2.8×10² particles/cm²

V5QV: $R(\Phi) = e^{-\Phi/7.0\times10^{5}}$ ; PowerPC: $R(\Phi) = e^{-\Phi/2.8\times10^{2}}$

[Figure: Reliability versus fluence (particles/cm²) for both design implementations.]

We fall below 99.99% at approximately 0.02 particles/cm²!

SLIDE 23

Example Conclusion

  • Using the proposed methodology, the commercial Xilinx V5 device will meet project requirements.
  • In this case, the project is able to save money by selecting the significantly cheaper FPGA device and to gain performance from the embedded PowerPC.

SLIDE 24

Conclusions

  • This study transforms proven classical reliability models into the SEU particle fluence domain. The intent is to better characterize SEU responses for complex systems.
  • The method for reliability-model application is as follows:

– SEU data are obtained as MFTF.
– Reliability curves (in the fluence domain) are calculated using MFTF and are analyzed with a piecemeal approach.
– Environment data are then used to determine particle flux exposure within required windows of mission operation.

  • The proposed method does not rely on data fitting and hence removes a significant source of error.
  • The proposed method provides information for highly SEU-susceptible scenarios, hence enabling a better choice of mitigation strategy.
  • This is preliminary work. There is more to come.

This methodology expresses SEU behavior and response in terms that missions understand via classical reliability metrics.

SLIDE 25

Acknowledgements

  • Some of this work has been sponsored by the NASA Electronic Parts and Packaging (NEPP) Program and the Defense Threat Reduction Agency (DTRA).
  • Thanks are given to the NASA Goddard Radiation Effects and Analysis Group (REAG) for their technical assistance and support. REAG is led by Kenneth LaBel and Jonathan Pellish.

Contact information: Melanie Berg, NASA Goddard REAG FPGA Principal Investigator, Melanie.D.Berg@NASA.GOV