Radiation Reliability Issues in Current and Future Supercomputers – PowerPoint PPT Presentation


SLIDE 1

Radiation Reliability Issues in Current and Future Supercomputers

September 26th 2017 – Grenoble, France

PAOLO RECH

SLIDE 2

Sponsors

SLIDE 3

HPC reliability importance

SLIDES 4–6

Available Accelerators

Modern parallel accelerators offer:

  • Low cost
  • Flexible platform
  • High efficiency (low per-thread consumption)
  • High computational power and frequency
  • Huge amount of resources
  • Reliability? (what is the error rate?)

[Photos: NVIDIA Kepler K40 and Intel Xeon Phi.]

SLIDE 7

Titan

Titan (Oak Ridge National Lab) has 18,688 GPUs, so the probability that some GPU gets corrupted is high: Titan's MTBF for detected uncorrectable errors is ~44 h*.

*(field and experimental data from HPCA'15)
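To put the ~44 h figure in perspective, here is a minimal sketch of the arithmetic, assuming identical and independent GPUs (so failure rates simply add):

    # With n independent devices, failure rates add: MTBF_system = MTBF_device / n
    n_gpus = 18_688
    mtbf_system_h = 44.0                      # Titan-wide MTBF (slide above)
    mtbf_per_gpu_h = mtbf_system_h * n_gpus   # ~822,000 h
    print(f"per-GPU MTBF = {mtbf_per_gpu_h:,.0f} h = {mtbf_per_gpu_h / 8766:.0f} years")

A single card alone looks perfectly reliable (~94 years between failures); only the scale of the machine makes the problem visible.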

SLIDE 8

HPC bad stories

Big Mac (Virginia Tech's Advanced Computing facility, 2003):

  • 1,100 Apple Power Mac G5 machines
  • Could not boot because of the failure rate
  • The Power Mac G5 did not have error-correcting code (ECC) memory
  • Big Mac was broken apart and sold online

Jaguar (#1 in the 2009 Top500 list):

  • 360 terabytes of main memory
  • 350 ECC errors per minute

ASCI Q (#2 in the 2002 Top500 list):

  • Built with AlphaServers
  • 7 teraflops
  • Could not run for more than 1 h without crashing
  • After adding metal side panels it could last 6 h before crashing
  • The address buses on the microprocessors were unprotected (causing the crashes)

SLIDE 9

Outline

The origins of the issue:
  § Radiation Effects Essentials
  § Error Criticality in HPC
Understand the issue:
  § Experimental Procedure
  § K40 vs Xeon Phi
Toward the solution of the issue:
  § ECC – ABFT – Duplication
  § Selective Hardening
What's the Plan?


SLIDES 11–12

Terrestrial Radiation Environment

Cosmic rays can be energetic enough to pass through the Van Allen belts. Galactic cosmic rays interact with the atmosphere, producing a shower of energetic particles: muons, pions, protons, gamma rays, and neutrons.

13 n/(cm²·h) @ sea level*

*JEDEC JESD89A Standard

SLIDES 13–14

Altitude and Radiation

Maximum ionization occurs at ~13 km above sea level.

[Plot: neutron flux vs altitude, with LANL marked at high altitude.]

SLIDE 15

Radiation Effects – Soft Errors

Soft errors: the device is not permanently damaged, but an ionizing particle may generate:

  • One or more bit-flips: a Single Event Upset (SEU) or a Multiple Bit Upset (MBU)
  • A transient voltage pulse in the logic: a Single Event Transient (SET)

[Diagram: an ionizing particle striking a flip-flop (SEU/MBU) and combinational logic feeding a FF (SET).]

SLIDE 16

Silent Data Corruption vs Crash

Soft errors in:
  • data cache
  • register files
  • logic gates (ALU)
  • scheduler
→ Silent Data Corruption (SDC)

Soft errors in:
  • instruction cache
  • scheduler / dispatcher
  • PCI-e bus controller
→ DUE (crash)

SLIDE 17

Radiation Effects on Parallel Accelerators

[Diagram: a CUDA GPU with DRAM, a blocks scheduler and dispatcher, an L2 cache, and many Streaming Multiprocessors (SMs); each SM holds an instruction cache, warp schedulers and dispatch units, a register file, cores, and shared memory / L1 cache. A single strike (X) can corrupt several cores and SMs at once.]

SLIDE 18

Output Correctness in HPC

A single fault can propagate to several parallel threads: multiple corrupted elements.

SLIDES 19–20

Output Correctness in HPC

A single fault can propagate to several parallel threads: multiple corrupted elements.

Not all SDCs are critical for HPC applications:

  • the error can lie within the intrinsic variance of floating-point computation
  • values in a given range are accepted as correct in physical simulations
  • imprecise computing is being applied to HPC

Goal: quantify and qualify SDCs in NVIDIA and Intel architectures.

SLIDE 21

Outline (shown again; next section: understanding the issue)

SLIDE 22

Radiation Test Facilities

[Photos: irradiation of chips and electronics.]

SLIDE 23

Experimental Setup

SLIDE 24

Radiation Tests are NOT for dummies

What can go (and actually went) wrong:

  • Ethernet cable failures
  • BIOS checksum errors
  • HDD failures
  • Linux GRUB failure
  • power plug failure (wow, this was risky)
  • board boot failure
  • GPU fell off the bus (this was funny)
  • mic is lost
  • etc… etc… etc…
  • Heather/Sean, can you add something to the list?

SLIDE 25

GPU Radiation Test Setup

[Photos: devices under test include microcontrollers, FPGAs, SoCs, flash memories, GPUs, and APUs.]

SLIDE 26

GPU Radiation Test Setup

The GPU power control circuitry is kept out of the beam.

[Photos: NVIDIA K40, Intel Xeon Phi, desktop PCs, AMD APU.]

SLIDES 27–28

Neutron Spectrum

@LANSCE: 1.8×10^6 n/(cm²·h); @NYC: 13 n/(cm²·h)

We test each architecture for 800 h, simulating 9.2×10^8 h of natural radiation (~91,000 years).

All the collected SDCs are publicly available:

https://github.com/UFRGS-CAROL/HPCA2017-log-data
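A hedged sketch of the standard accelerated-test bookkeeping behind these numbers (JESD89A-style; the error count below is a hypothetical placeholder, and with the nominal fluxes alone the equivalent exposure comes out near 1.1×10^8 h, so the 9.2×10^8 h figure presumably reflects the actual delivered fluence across the campaigns):

    # Acceleration factor and FIT extraction from a beam test (sketch).
    flux_beam = 1.8e6      # n/(cm^2 h) at LANSCE (slide)
    flux_nyc = 13.0        # n/(cm^2 h) reference terrestrial flux (JESD89A)
    hours_in_beam = 800.0

    acceleration = flux_beam / flux_nyc            # ~1.4e5
    natural_hours = hours_in_beam * acceleration   # equivalent natural exposure

    errors_observed = 120.0                        # hypothetical count
    fluence = flux_beam * hours_in_beam            # n/cm^2 delivered
    sigma = errors_observed / fluence              # device cross-section, cm^2
    fit = sigma * flux_nyc * 1e9                   # failures per 10^9 device-hours
    print(f"{natural_hours:.2e} natural hours, FIT = {fit:.1f}")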

SLIDE 29

Selected Algorithms

We select a set of benchmarks that:

  • stimulate different resources
  • are representative of HPC applications
  • minimize error masking (high AVF)

  • DGEMM: matrix multiplication
  • lavaMD: particle interactions
  • Hotspot: heat simulation
  • Needleman–Wunsch: biology
  • CLAMR: DOE workload
  • Quick-, Merge-, and Radix-Sort
  • Matrix Transpose: memory
  • Gaussian

SLIDE 30

Xeon Phi vs K40 SDC Rate

[Chart: SDC relative FIT [a.u.], 1–1000 log scale, for Hotspot, CLAMR (N/A on the Xeon Phi), lavaMD (2^15, 2^19, 2^23 inputs), and DGEMM (2^10, 2^11, 2^12 inputs), Xeon Phi vs K40.]

The Xeon Phi error rate seems lower than Kepler's, but:

  • the Xeon Phi is built in 3D tri-gate, Kepler in planar CMOS
  • the Xeon Phi and the K40 have different throughput

SLIDE 31

Parallelism Management Reliability

[Charts: relative FIT [a.u.] vs input size for lavaMD (2^15, 2^19, 2^23) and DGEMM (2^10, 2^11, 2^12), K40 vs Xeon Phi.]

~95% of processor resources are already used with the smallest input. Increasing the input size increases the number of threads:

  • the Xeon Phi error rate remains constant (<20% variation)
  • the K40 SDC error rate increases with input size

SLIDE 32

Parallelism Management Reliability

K40: FIT increases with input size. The hardware scheduler is prone to corruption, and the data of the 2,048 active threads is kept in the register file.

Xeon Phi: constant FIT rate, so the embedded OS is doing fine: only 4 threads/core are maintained on chip; the other threads' data stays in main memory (not exposed).

SLIDE 33

Parallelism Management Reliability

[Chart: DGEMM GFlops vs matrix size (2^9×2^9 up to 2^13×2^13): Xeon Phi GFlops are almost constant; K40 GFlops rapidly increase.]

K40 throughput increases with input size. The reliability vs performance trade-off should be considered.

SLIDES 34–36

Mean Workload Between Failures

Raising the number of parallel threads raises both the error rate and the throughput. Which architecture produces a higher amount of data before experiencing a failure? Is there a sweet spot?
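One way to make the sweet-spot question concrete is the MWBF bookkeeping sketched below (a hedged reading of the metric: data produced per unit time divided by errors per unit time; all numbers are hypothetical placeholders, not measurements from the talk):

    # MWBF: how much data is produced, on average, before one failure.
    def mwbf(bytes_per_exec: float, exec_time_h: float, errors_per_hour: float) -> float:
        data_rate = bytes_per_exec / exec_time_h   # bytes produced per hour
        return data_rate / errors_per_hour         # bytes produced per failure

    # Larger inputs raise throughput but (on the K40) also the error rate;
    # MWBF captures which effect wins.
    out_bytes = 8 * (2 ** 12) ** 2                 # one 2^12 x 2^12 double matrix
    print(mwbf(out_bytes, exec_time_h=0.001, errors_per_hour=1e-3))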

SLIDE 37

DGEMM MWBF

The Xeon Phi MWBF decreases significantly with input size. Even though it is more prone to corruption, Kepler produces more correct data (if its parallelism is exploited).

SLIDES 38–39

Quantify and Qualify SDCs

We characterize each SDC by three metrics:

  • number of incorrect elements
  • relative error: how different the error is from the expected value
  • spatial locality of the corrupted elements: line, square, or random

[Diagram: corrupted elements (x) arranged along a line, clustered in a square, or scattered at random.]
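The three metrics are easy to state precisely; a minimal sketch follows, assuming outputs are 2-D numpy arrays (the locality rules here are illustrative thresholds, not the talk's exact classifier):

    import numpy as np

    def sdc_metrics(expected: np.ndarray, observed: np.ndarray):
        wrong = expected != observed
        n_wrong = int(wrong.sum())            # 1) number of incorrect elements
        with np.errstate(divide="ignore", invalid="ignore"):
            rel = np.abs((observed - expected) / expected)
        max_rel = float(np.nanmax(np.where(wrong, rel, 0.0))) if n_wrong else 0.0
        rows, cols = np.nonzero(wrong)        # 3) spatial locality of corruption
        if n_wrong <= 1:
            locality = "single"
        elif len(set(rows)) == 1 or len(set(cols)) == 1:
            locality = "line"                 # all errors share a row or column
        elif (rows.ptp() + 1) * (cols.ptp() + 1) <= 4 * n_wrong:
            locality = "square"               # errors form a tight block
        else:
            locality = "random"
        return n_wrong, max_rel, locality     # 2) max_rel is the relative error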

SLIDES 40–46

Number of Incorrect Elements vs Relative Error

[Scatter plots: number of corrupted elements vs relative error, for DGEMM and lavaMD, K40 vs Xeon Phi. Farther right means a greater difference from the expected value; farther up means more corrupted elements. BAD: a high number of corrupted elements that are very different from the expected output.]

DGEMM: the K40 shows few corrupted elements, with values similar to the expected ones; the Xeon Phi shows a lot of corrupted elements, very different from the expected values.

lavaMD: both the K40 and the Xeon Phi have few corrupted elements, but the K40 corruptions are very different from the expected values.

Purely arithmetic operations are more reliable (and faster) on the K40 (GPUs have shorter and faster pipelines). The Xeon Phi is more reliable for finite-difference methods (lavaMD), which are based on transcendental functions (exp).

SLIDE 47

Outline (shown again; next section: toward the solution of the issue)

SLIDE 48

Experimental Results (ECC OFF)

[Chart: K20 FIT, 1–10000 log scale, crashes and SDCs, for MxM, MTrans, FFT, NW, lavaMD, and Hotspot. Data from Oliveira et al., IEEE Trans. on Computers, 2016.]

The ECC is a Single Error Correction, Double Error Detection (SECDED) code: a word with a single bit-flip is corrected (OK); a word with a double bit-flip is only detected (X).
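The correct/detect behavior is easiest to see on a tiny code; below is a hedged sketch using an extended Hamming(8,4) SECDED code (real GPU ECC protects much wider words, but the logic is the same):

    # SECDED with extended Hamming(8,4): correct 1 flip, detect 2 flips.
    def encode(d):                       # d: four data bits
        p1, p2, p3 = d[0]^d[1]^d[3], d[0]^d[2]^d[3], d[1]^d[2]^d[3]
        word = [p1, p2, d[0], p3, d[1], d[2], d[3]]
        return word + [sum(word) % 2]    # overall parity enables double detection

    def decode(w):
        syndrome = 0
        for pos in range(1, 8):          # XOR of 1-indexed set-bit positions
            if w[pos - 1]:
                syndrome ^= pos
        parity_ok = sum(w) % 2 == 0
        if syndrome == 0:
            return "OK" if parity_ok else "corrected (overall parity bit)"
        if not parity_ok:                # single error: syndrome gives its position
            w[syndrome - 1] ^= 1
            return "corrected"
        return "double error detected: uncorrectable (DUE)"

    w = encode([1, 0, 1, 1]); w[2] ^= 1
    print(decode(w))                     # one flip  -> corrected
    w = encode([1, 0, 1, 1]); w[2] ^= 1; w[5] ^= 1
    print(decode(w))                     # two flips -> detected only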

SLIDE 49

ECC ON – SDC

[Chart: K20 SDC FIT, 1–10000 log scale, ECC OFF vs ECC ON, for MxM, FFT, NW, lavaMD, and Hotspot.]

ECC reduces the SDC FIT by ~1 order of magnitude (there is almost no code dependence).

SLIDE 50

ECC ON – Crash

[Chart: K20 crash FIT, 1–10000 log scale, ECC OFF vs ECC ON, for MxM, FFT, NW, lavaMD, and Hotspot.]

ECC increases the crash FIT by about 50% (there is almost no code dependence): double bit errors cause a crash, and the scheduler is not protected.

SLIDE 51

ECC ON – SDC vs Crashes

[Chart: K20 FIT, 1–10000 log scale, SDC vs crash, for MxM, FFT, NW, lavaMD, and Hotspot.]

When the ECC is ON, crashes are more likely to occur than SDCs (this is GOOD for HPC centers!).

SLIDE 52

Algorithm-Based Fault Tolerance (ABFT)

ABFT is a technique designed specifically for an algorithm. It requires input coding, algorithm modification, and output decoding with error detection/correction.

[Diagram: for A × B = M, a column-checksum row is appended to A and a row-checksum column to B; the col-checks and row-checks on M flag corrupted elements (X). Freivalds '79; Huang and Abraham '84; Rech et al., TNS '13.]
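For matrix multiplication the scheme is compact; a hedged numpy sketch of the Huang-Abraham checksum idea follows (detection only; single-error correction uses the row/column intersection):

    import numpy as np

    def abft_matmul(A, B):
        Ac = np.vstack([A, A.sum(axis=0)])                  # column-checksum row
        Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row-checksum column
        M = Ac @ Br                       # checksums propagate through the GEMM
        C = M[:-1, :-1]                   # the actual product
        bad_rows = ~np.isclose(M[:-1, -1], C.sum(axis=1))   # row-checks
        bad_cols = ~np.isclose(M[-1, :-1], C.sum(axis=0))   # col-checks
        return C, bad_rows, bad_cols

    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    C, br, bc = abft_matmul(A, B)
    assert not br.any() and not bc.any()  # clean run: all checks pass
    # A single corrupted C[i, j] trips exactly bad_rows[i] and bad_cols[j]:
    # the intersection locates the element, and the checksum gap corrects it.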

SLIDE 53

FFT Hardening Idea*

[Diagram: for a 64-point FFT, the inputs x0 … xN−1 are folded into a weighted checksum with weights 1/(2 + w^(-k)); comparing it against the corresponding combination of the N outputs flags an error.]

*J. Y. Jou and Abraham '88
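A hedged, simplified relative of that check fits in a few lines using DFT linearity: the sum of all FFT outputs equals N·x[0], so one extra reduction flags a corrupted output (Jou and Abraham's weights refine the same idea for numerical robustness):

    import numpy as np

    def checked_fft(x, rtol=1e-9):
        X = np.fft.fft(x)
        # Linearity: sum_k X[k] = N * x[0]; a cheap post-condition on the output.
        if not np.isclose(X.sum(), len(x) * x[0], rtol=rtol):
            raise RuntimeError("FFT sum-check failed: output corrupted")
        return X

    x = np.random.rand(64)          # a 64-point FFT, as on the slide
    X = checked_fft(x)              # passes on a clean run
    X[10] += 1.0                    # simulate an SDC in one output element
    assert not np.isclose(X.sum(), 64 * x[0])   # the check now fails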

SLIDE 54

ECC vs ABFT

[Chart: FIT, 1–10000 log scale, SDC and crash, for MxM and FFT: unhardened vs ECC vs ABFT.]

ECC reduces the SDC FIT by ~10×; ABFT by ~56×! ECC increases crashes by 50%; ABFT by only 10%!

SLIDE 55

Duplication With Comparison

  • Spatial DWC: blocks i and i+N are duplicated
  • E-O spatial DWC: blocks i and i+1 are duplicated
  • Time DWC: a thread executes the operations twice

[Diagram: blocks a, b, c, d and their copies a', b', c', d' scheduled over time on SM0/SM1 for each of the three schemes.]
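Time DWC is the simplest to sketch; below is a hedged, function-level stand-in for the CUDA kernels (re-execute in place and compare, so no data copy is required, which is why its overhead is ~2× rather than 2.5×):

    # Time duplication-with-comparison at function granularity (sketch).
    def dwc_time(kernel, *args):
        first = kernel(*args)
        second = kernel(*args)       # the same thread executes the work twice
        if first != second:          # a mismatch exposes a silent data corruption
            raise RuntimeError("DWC mismatch: SDC detected")
        return first

    result = dwc_time(lambda a, b: a * b + 1, 6, 7)   # 43 on a clean run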

SLIDE 56

Hotspot – DWC Results*

[Chart: FIT, 1–1000 log scale, SDC and crash: unhardened, ECC, spatial DWC, E-O spatial DWC, time DWC.]

Spatial DWC detects all SDCs; spatial E-O detects 80% of SDCs; time DWC detects 90% of SDCs. Only time DWC reduces crashes (no additional block scheduling is required). DWC is promising: it is generic, easily implemented, and effective… BUT the execution-time overhead is 2.5× for spatial DWC and spatial E-O, and 2× for time DWC (data is not copied).

*details in Oliveira et al., IEEE Trans. Nucl. Sci., 2014

SLIDE 57

Duplicate only what REALLY matters

What's next? Selective hardening!

Analyze SDC criticality: are there "acceptable" SDCs?

Example from CLAMR (DOE workload) experimental results: one SDC causes a single-pixel error, while another causes a huge error.

SLIDES 58–61

Tolerable SDCs

[Chart: fraction of tolerable SDCs vs acceptable output difference, for the Xeon Phi, K40 with ECC, K40, and Titan X. At the left, the output must match the expected output exactly (0% tolerance); moving right increases the acceptable difference at the output.]

If we accept a 2.5% variance from the expected value, more than 60% of SDCs can be tolerated.
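The tolerance analysis itself is a one-liner per output; a minimal sketch, assuming expected/observed outputs as numpy arrays (sweeping the threshold is what produces curves like the ones plotted here):

    import numpy as np

    def tolerable(expected: np.ndarray, observed: np.ndarray, tol: float) -> bool:
        # An SDC is tolerable when every element stays within `tol` relative error.
        rel = np.abs(observed - expected) / np.maximum(np.abs(expected), 1e-30)
        return bool((rel <= tol).all())

    exp = np.ones(100)
    obs = exp.copy(); obs[3] = 1.02                    # one element 2% off
    print(tolerable(exp, obs, 0.0))                    # False: exact match required
    print(tolerable(exp, obs, 0.025))                  # True: within 2.5% variance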

SLIDES 62–63

Tolerable SDCs

[Chart: tolerable SDCs for Gaussian, DGEMM, lavaMD, and Hotspot on the K40, K40 with ECC, and Titan X.]

Hotspot: with a 0.1% tolerance, the error rate is reduced by 90%!

SLIDES 64–65

Duplicate only what REALLY matters

What's next? Selective hardening!

1. Analyze SDC criticality: are there "acceptable" SDCs?
2. Detect the SW/HW causes of critical SDCs, via code analysis and fault injection (NVIDIA SASSIFI and UFRGS CAROL-FI)
3. Harden selected portions of the code
4. Evaluate the enhanced reliability and performance

SLIDE 66

SASSI-FI and CAROL-FI

SASSI-FI: NVIDIA architectural-level fault injector.

SLIDE 67

SASSI-FI and CAROL-FI

SASSI-FI: NVIDIA architectural-level fault injector.

CAROL-FI: UFRGS high-level fault injector for the Xeon Phi and any x86-based processor. It modifies the content of currently allocated memory. Fault-injector requirements:

  – GDB with Python support
  – OS interruption signals
  – the source code compiled in debug mode
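The core injection step can be pictured with a short GDB-Python sketch (assumed, not CAROL-FI's actual source; it is meant to be sourced inside a gdb session attached to the paused target, and the symbol name `buffer` is a hypothetical variable of that program):

    # Minimal single-bit-flip injection via GDB's Python API (sketch).
    import random
    import gdb   # available only inside gdb's embedded Python interpreter

    inferior = gdb.selected_inferior()
    addr = int(gdb.parse_and_eval("&buffer"))         # address of a live variable
    size = int(gdb.parse_and_eval("sizeof(buffer)"))

    target = addr + random.randrange(size)            # pick a random byte
    byte = bytearray(inferior.read_memory(target, 1))
    byte[0] ^= 1 << random.randrange(8)               # flip one random bit
    inferior.write_memory(target, bytes(byte))
    gdb.execute("continue")                           # let the program run on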

SLIDE 68

CAROL-FI

The fault model can be adapted; here we only inject single bit-flips. Overhead: ~5×.

SLIDE 69

Radiation Data vs CAROL-FI

[Charts: outcome breakdowns from radiation tests vs fault injection.] Radiation and fault injection give very different information.

SLIDE 70

CAROL-FI Results

We have injected more than 67,000 faults.

SLIDE 71

Results – DGEMM

95% of adverse outcomes come from the matrices and from loop control variables, which differ in memory occupation and in their chance of causing an SDC or a DUE.

SLIDE 72

Results – CLAMR

Most adverse outcomes come from three mesh components: sort, the k-d tree, and other mesh operations. Faults in sort and in the k-d tree are equally harmful.

SLIDE 73

Results – Hotspot

Most harmful faults come from constants and control variables. A small portion of memory causes most of the harm: easy to protect.

SLIDE 74

Results – lavaMD

Most harmful faults come from the input arrays (charge and distance). A big portion of memory causes most of the harm: hard to protect.

SLIDE 75

Results – LUD

SDCs are generated by faults in the matrices; DUEs are generated by faults in control variables.

SLIDE 76

Results – NW

SDCs and DUEs are generated by faults in the matrices (with an equal chance). A big portion of memory causes most of the harm: hard to protect.

SLIDE 77

Results

CAROL-FI insights:

  – Selective hardening will be effective for DGEMM and Hotspot (a small portion of memory causes the harm)
  – Selective hardening may not be effective for lavaMD and NW (a big portion of memory causes the harm)
  – CLAMR: specific operations should be hardened (sort and the k-d tree)

SLIDES 78–81

What's The Plan?

Exascale = 55× Titan. Can we afford a 55× error rate? Probably not.

  • We can show how SDCs appear at the output, to ease detection
  • Understand SDC criticality: not all errors significantly affect the output; there are "acceptable" SDCs
  • Use fault injection to better understand error propagation (SASSIFI: NVIDIA architectural-level fault injector; CAROL-FI: UFRGS fault injector for the Xeon Phi and x86)
  • Propose selective-hardening solutions (duplicate only what matters, what REALLY matters)

SLIDE 82

Acknowledgments

Caio Lunardi, Caroline Aguiar, Daniel Oliveira, Fernando Santos, Laercio Pilla, Vinicius Frattin, Philippe Navaux, Luigi Carro, Chris Frost, Nathan DeBardeleben, Sean Blanchard, Heather Quinn, Thomas Fairbanks, Steve Wender, Timothy Tsai, Siva Hari, Steve Keckler, David Kaeli, the NUCAR group, Matteo Sonza Reorda, Luca Sterpone