SLIDE 1

Table of Contents Motivation Definitions Implementation Conclusion References

CUDA accelerated fault tree analysis with C-XSC

Gabor Rebner (1), Michael Beer (2)

(1) Department of Computer and Cognitive Sciences (INKO), University of Duisburg-Essen, Duisburg, Germany

(2) Institute for Risk & Uncertainty, University of Liverpool, Liverpool, UK

19.09.2012

SLIDE 2

Table of Contents

1 Motivation
2 Definitions
  • Verification
  • CUDA
  • Fault Tree Analysis
3 Implementation
  • C++ and CUDA
  • Evaluation
4 Conclusion
  • Conclusion
  • Future Work

SLIDE 3

Motivation

Implementation of verified fault tree analysis in C++ using high-performance GPU (Graphics Processing Unit) computing

Issues
Using GPU-accelerated high-performance features to
  1 Reduce the trade-off between computation accuracy and computation time
  2 Use directed rounding based on the IEEE 754-2008 standard on the GPU

SLIDE 4

Verification

Definition
We use verification in its narrow sense, referring to a mathematical proof of the correctness of a result obtained by a computer calculation.

Tools
  • Interval arithmetic provided by C-XSC
  • Floating-point arithmetic with directed rounding
      • Central Processing Unit (CPU)
      • Compute Unified Device Architecture (CUDA)

SLIDE 5

A short introduction to CUDA

Compute Unified Device Architecture (CUDA)
  • High-performance GPU architecture
      • Single Instruction, Multiple Data (SIMD) implementation
      • Up to 2^10 CUDA cores on the NVIDIA GTX 590
  • Restriction to NVIDIA graphics cards
  • Support of IEEE 754 floating-point operations
      • Double precision
      • Directed rounding to the next floating-point number (such as fl▽(x) and fl△(x) with x ∈ R)

SLIDE 6

Fault Tree Analysis

Fundamentals
The implementation is based on
  • The approach by Traczinsky et al. (2006)
  • Verified on modern computer systems
  • CUDA

SLIDE 7

SLIDE 8

Complexity

Computation step
Each computation of a logical gate (AND- or OR-gate) has a complexity of O(n^3):
  • Computation of each interval element (O(n × n))
  • Computation of the mass assignment for each interval
  • Total complexity: O(n^3)

Improvements
The algorithm can be improved to obtain an upper bound on the complexity slightly smaller than O(n^3).
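The O(n × n) step can be pictured as combining every interval of one input with every interval of the other. The following C++ sketch shows only that pairwise loop for an AND gate, using ordinary round-to-nearest arithmetic. The `FocalElement` type and the `combine_and` function are hypothetical names introduced for illustration; the slides do not show the authors' data structures, and the further factor of n from the mass assignment step is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of one interval with its mass assignment.
struct FocalElement {
    double lo;    // lower endpoint of the interval
    double hi;    // upper endpoint of the interval
    double mass;  // mass assigned to the interval
};

// Pairwise combination for an AND gate: every element of a meets every
// element of b, so two inputs of size n yield n * n result elements.
// Plain (unverified) arithmetic is used here to keep the loop readable.
inline std::vector<FocalElement> combine_and(const std::vector<FocalElement>& a,
                                             const std::vector<FocalElement>& b) {
    std::vector<FocalElement> out;
    out.reserve(a.size() * b.size());
    for (const FocalElement& x : a) {
        for (const FocalElement& y : b) {
            out.push_back({x.lo * y.lo, x.hi * y.hi, x.mass * y.mass});
        }
    }
    return out;
}
```

Because each result element depends only on one pair of inputs, the iterations are independent of one another, which is the kind of workload that maps naturally onto many GPU threads.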

SLIDE 9

Verification under CUDA

Goal
Compute correct results on computer systems using finite floating-point arithmetic

Approach
  • Directed rounding (GPU source code)
  • Interval arithmetic (C-XSC in CPU source code)

SLIDE 10

Interval Notation

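Real Intervals (IR)
$$\mathbf{x} = [\underline{x}, \overline{x}] \;\big|\; \underline{x} \le x \le \overline{x}, \quad \underline{x}, \overline{x} \text{ and } x \in \mathbb{R}$$

Machine Intervals (IF)
$$\mathbf{x} = [\underline{x}, \overline{x}] \;\big|\; \underline{x} \le x \le \overline{x}, \quad \underline{x}, \overline{x} \text{ and } x \in \mathbb{F} \setminus \{\text{Not a number}, \pm\infty\}$$

Description
  • $\mathbf{x}$ is an interval from the set IR or IF
  • $\underline{x}$ is the infimum/minimum of $\mathbf{x}$
  • $\overline{x}$ is the supremum/maximum of $\mathbf{x}$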
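As a minimal sketch of the machine-interval definition, the C++ fragment below stores the two endpoints as doubles and checks the stated conditions: both endpoints are finite (no NaN, no ±∞) and the infimum does not exceed the supremum. `MachineInterval`, `is_valid` and `contains` are hypothetical names for illustration only; C-XSC ships its own interval type, which this struct does not reproduce.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a machine interval [inf, sup]; not the C-XSC class.
struct MachineInterval {
    double inf;  // infimum (lower endpoint)
    double sup;  // supremum (upper endpoint)
};

// A machine interval is valid if both endpoints are finite floating-point
// numbers (excluding NaN and +/-infinity) and inf <= sup.
inline bool is_valid(const MachineInterval& x) {
    return std::isfinite(x.inf) && std::isfinite(x.sup) && x.inf <= x.sup;
}

// Membership test: inf <= v <= sup.
inline bool contains(const MachineInterval& x, double v) {
    assert(is_valid(x));
    return x.inf <= v && v <= x.sup;
}
```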

SLIDE 11

Verification under CUDA

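Goal
Compute correct results on computer systems using finite floating-point arithmetic

Problem
Let $x = \frac{1}{3}$ and $x \in \mathbb{R}$. Then $x + x = \frac{2}{3}$, and in floating-point arithmetic

$$\frac{2}{3} \in \big[\underbrace{\mathrm{fl}_{\bigtriangledown}(x + x)}_{\text{lower bound}},\ \underbrace{\mathrm{fl}_{\bigtriangleup}(x + x)}_{\text{upper bound}}\big]$$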
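The enclosure of 2/3 can be reproduced on the CPU with a small C++ sketch. It does not switch the hardware rounding mode; instead it assumes a conservative emulation in which each round-to-nearest result is moved one floating-point number outward with `std::nextafter`. This yields a valid but slightly wider enclosure than true directed rounding would. The helper names `step_down`, `step_up` and `enclose_two_thirds` are hypothetical. On the GPU, CUDA offers double-precision intrinsics with directed rounding such as `__dadd_rd` and `__dadd_ru`; the slides do not state which calls the authors used.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Conservative stand-in for rounding downward: the next double below v.
inline double step_down(double v) {
    return std::nextafter(v, -std::numeric_limits<double>::infinity());
}

// Conservative stand-in for rounding upward: the next double above v.
inline double step_up(double v) {
    return std::nextafter(v, std::numeric_limits<double>::infinity());
}

struct Enclosure {
    double lower;  // guaranteed <= exact result
    double upper;  // guaranteed >= exact result
};

// Encloses x + x for the real number x = 1/3, which has no exact
// binary floating-point representation.
inline Enclosure enclose_two_thirds() {
    const double x_lo = step_down(1.0 / 3.0);  // below the real 1/3
    const double x_hi = step_up(1.0 / 3.0);    // above the real 1/3
    return {step_down(x_lo + x_lo), step_up(x_hi + x_hi)};
}
```

The resulting interval is only a few units in the last place wide, yet it is guaranteed to contain the real value 2/3, which a single rounded double cannot promise.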

SLIDE 12

Verification under CUDA

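Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (OR-Gate)

$$lb = \mathrm{fl}_{\bigtriangledown}\big(\mathrm{fl}_{\bigtriangledown}(\underline{x} + \underline{y}) - \mathrm{fl}_{\bigtriangleup}(\underline{x} \cdot \underline{y})\big) \quad \text{with } \underline{x}, \underline{y} \in [0, 1],$$

$$m_{lb} = \mathrm{fl}_{\bigtriangleup}(m_x \cdot m_y) \quad \text{with } m_x, m_y \in [0, 1].$$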
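The rounding directions in the OR-gate bound can be traced in a short C++ sketch: the sum is rounded down, the subtracted product is rounded up, and the difference is rounded down again, so every step errs toward a smaller result. As an assumption, directed rounding is emulated conservatively by stepping one floating-point number outward with `std::nextafter`; this is not the authors' GPU code. The names `or_gate_lower` and `OrLowerResult` are hypothetical, and the clamping to [0, 1] is an addition of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Conservative stand-ins for fl_down and fl_up (one double outward).
inline double step_down(double v) {
    return std::nextafter(v, -std::numeric_limits<double>::infinity());
}
inline double step_up(double v) {
    return std::nextafter(v, std::numeric_limits<double>::infinity());
}

struct OrLowerResult {
    double lb;    // lower failure bound
    double mass;  // mass assignment, rounded upward as on the slide
};

// Lower failure bound of an OR gate from the infima of x and y.
inline OrLowerResult or_gate_lower(double x_inf, double y_inf, double mx, double my) {
    assert(0.0 <= x_inf && x_inf <= 1.0 && 0.0 <= y_inf && y_inf <= 1.0);
    assert(0.0 <= mx && mx <= 1.0 && 0.0 <= my && my <= 1.0);
    const double sum_down = step_down(x_inf + y_inf);     // <= exact sum
    const double prod_up = step_up(x_inf * y_inf);        // >= exact product
    const double lb = step_down(sum_down - prod_up);      // <= exact x + y - x*y
    const double mass = step_up(mx * my);                 // >= exact product
    // The exact values lie in [0, 1], so clamping keeps both bounds valid.
    return {std::max(0.0, lb), std::min(1.0, mass)};
}
```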

SLIDE 13

Verification under CUDA

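Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (AND-Gate)

$$lb = \mathrm{fl}_{\bigtriangledown}(\underline{x} \cdot \underline{y}) \quad \text{with } \underline{x}, \underline{y} \in [0, 1]$$

$$ub = \mathrm{fl}_{\bigtriangleup}(\overline{x} \cdot \overline{y}) \quad \text{with } \overline{x}, \overline{y} \in [0, 1]$$

$$m = \mathrm{fl}_{\bigtriangleup}(m_x \cdot m_y) \quad \text{with } m_x, m_y \in [0, 1].$$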
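For the AND gate, the three formulas translate almost directly into code: the product of the infima is rounded down, the product of the suprema is rounded up, and the mass product is rounded up. The C++ sketch below assumes the same conservative emulation of directed rounding via `std::nextafter` rather than a hardware rounding mode. `and_gate` and `AndGateResult` are hypothetical names, and the clamping to [0, 1] is an addition of this sketch rather than something shown on the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Conservative stand-ins for fl_down and fl_up (one double outward).
inline double step_down(double v) {
    return std::nextafter(v, -std::numeric_limits<double>::infinity());
}
inline double step_up(double v) {
    return std::nextafter(v, std::numeric_limits<double>::infinity());
}

struct AndGateResult {
    double lb;    // lower failure bound
    double ub;    // upper failure bound
    double mass;  // mass assignment of the combined element
};

// AND gate on the intervals [x_inf, x_sup] and [y_inf, y_sup] with masses mx, my.
inline AndGateResult and_gate(double x_inf, double x_sup,
                              double y_inf, double y_sup,
                              double mx, double my) {
    assert(0.0 <= x_inf && x_inf <= x_sup && x_sup <= 1.0);
    assert(0.0 <= y_inf && y_inf <= y_sup && y_sup <= 1.0);
    assert(0.0 <= mx && mx <= 1.0 && 0.0 <= my && my <= 1.0);
    const double lb = step_down(x_inf * y_inf);  // <= exact product of infima
    const double ub = step_up(x_sup * y_sup);    // >= exact product of suprema
    const double mass = step_up(mx * my);        // >= exact mass product
    // Exact values lie in [0, 1], so clamping keeps the bounds valid.
    return {std::max(0.0, lb), std::min(1.0, ub), std::min(1.0, mass)};
}
```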

SLIDE 14

Computation time

Wall-clock time [s] spent on computation

Configurations:
  • Benchmark 1 (B1): n = 200, f = 20, l = 100
  • Benchmark 2 (B2): n = 5000, f = 100, l = 60

      C++ (LB) (a)   C++ (UB)   DSI (b) (LB)   DSI (UB)
B1    7              7          1685           1712
B2    721            654        48070          46160

(a) C++ utilizing C-XSC and CUDA
(b) DSI 3.5.2 and INTLAB V6
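Read as ratios of DSI time to C++/CUDA time, the table entries correspond to the following approximate speedup factors. This is plain arithmetic on the reported wall-clock times, rounded to one decimal place, and assumes both tool chains ran the same benchmark configuration:

```latex
\begin{align*}
\text{B1 (LB)}:&\quad \frac{1685}{7} \approx 240.7 &
\text{B1 (UB)}:&\quad \frac{1712}{7} \approx 244.6 \\
\text{B2 (LB)}:&\quad \frac{48070}{721} \approx 66.7 &
\text{B2 (UB)}:&\quad \frac{46160}{654} \approx 70.6
\end{align*}
```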

SLIDE 15

Computation time

[Figure: wall-clock time [s] on a logarithmic axis from 10^1 to 10^5 for benchmark 1 (LB), benchmark 1 (UB), benchmark 2 (LB) and benchmark 2 (UB), comparing C++ & CUDA with MATLAB & INTLAB]

Figure: Wall-clock time [s] spent on computation (logarithmic)

SLIDE 16

Conclusion

Achievements
  • Reduction of the trade-off between accuracy and computation time
  • Verified computation on the GPU using CUDA

SLIDE 17

Future Work

Perspective
Using high-performance computing
  • In MATLAB, utilizing the MEX interface with CUDA and C-XSC
  • To compute Markov set chains (imprecise Markov chains)

SLIDE 18

References

[1] Auer, E.; Luther, W.; Rebner, G.; Limbourg, P.: A Verified MATLAB Toolbox for the Dempster-Shafer Theory. In: Proceedings of the Workshop on the Theory of Belief Functions, 2010. www.udue.de/DSIPaperone, http://www.udue.de/DSI

[2] Carreras, C.; Walker, I.: Interval Methods for Fault-Tree Analyses in Robotics. In: IEEE Transactions on Reliability 50 (2001), 3–11. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00935010

[3] IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. In: IEEE Std 754-2008 (2008), 29, pp. 1–58. http://dx.doi.org/10.1109/IEEESTD.2008.4610935

[4] Krämer, H.: C-XSC 2.0: A C++ Library for Extended Scientific Computing. In: Lecture Notes in Computer Science, vol. 2991/2004. Springer-Verlag, Heidelberg, 2004, pp. 15–35

[5] Krämer, W.; Zimmer, M.; Hofschuster, W.: Using C-XSC for High Performance Verified Computing. In: Jónasson, Kristján (ed.): Applied Parallel and Scientific Computing, vol. 7134. Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-28144-0, 168–178. http://dx.doi.org/10.1007/978-3-642-28145-7_17

[6] NVIDIA: Plattform für Parallel-Programmierung und parallele Berechnungen. Website http://www.nvidia.de/object/cuda_home_new_de.html

[7] Rebner, G.; Auer, E.; Luther, W.: A verified realization of a Dempster–Shafer based fault tree analysis. In: Computing 94 (2012), pp. 313–324. http://dx.doi.org/10.1007/s00607-011-0179-3. ISSN 0010-485X

SLIDE 19

Thank you
