CUDA accelerated fault tree analysis with C-XSC

Gabor Rebner (1), Michael Beer (2)

(1) Department of Computer and Cognitive Sciences (INKO), University of Duisburg-Essen, Duisburg, Germany
(2) Institute for Risk & Uncertainty, University of Liverpool, Liverpool, UK

19.09.2012

Table of Contents

1 Motivation
2 Definitions
  - Verification
  - CUDA
  - Fault Tree Analysis
3 Implementation
  - C++ and CUDA
  - Evaluation
4 Conclusion
  - Conclusion
  - Future Work

Motivation

Implementation of verified fault tree analysis in C++ using high-performance GPU (Graphics Processing Unit) computing

Issues
Using GPU-accelerated high-performance features to
1. Reduce the trade-off between computation accuracy and computation time
2. Use directed rounding based on the IEEE 754-2008 standard on the GPU

Verification

Definition
We use verification in its narrow sense: a mathematical proof of the correctness of a result obtained by a computer calculation.

Tools
- Interval arithmetic provided by C-XSC
- Floating point arithmetic with directed rounding
  - Central Processing Unit (CPU)
  - Compute Unified Device Architecture (CUDA)

A short introduction to CUDA

Compute Unified Device Architecture (CUDA)
- High Performance GPU architecture
- Single Instruction, Multiple Data (SIMD) implementation
- Up to $2^{10}$ CUDA cores on the NVIDIA GTX 590
- Restriction to NVIDIA graphics cards
- Support of IEEE 754 floating point operations
  - Double precision
  - Directed rounding to the next floating point number (such as $\mathrm{fl}_{\bigtriangledown}(x)$ and $\mathrm{fl}_{\bigtriangleup}(x)$ with $x \in \mathbb{R}$)

Fault Tree Analysis

Fundamentals
The implementation is based on
- The approach by Traczinsky et al. (2006)
- Verified on modern computer systems
- CUDA

Complexity

Computation step
Each computation of a logical gate (AND- or OR-gate) has a complexity of $O(n^3)$:
- Computation of each interval element ($O(n \times n)$)
- Computation of the mass assignment for each interval
- Total complexity: $O(n^3)$

Improvements
The algorithm can be improved to obtain an upper bound on the complexity slightly smaller than $O(n^3)$.
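
The $O(n \times n)$ step can be pictured as a double loop over the elements of the two gate inputs. The C++ sketch below is only an illustration under stated assumptions: the type `Focal` and the function `combine_and` are hypothetical names, an AND-gate is used as the example, and the arithmetic is plain round-to-nearest, so the result is not verified. The source of the remaining factor $n$ is not spelled out on the slide and is not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scale element: an interval [lo, hi] with its mass assignment.
// All names here are assumptions made for this sketch.
struct Focal {
    double lo;
    double hi;
    double mass;
};

// Combine every element of a with every element of b through an AND-gate.
// This shows the n x n loop structure only; it uses ordinary
// round-to-nearest arithmetic and is NOT a verified computation.
std::vector<Focal> combine_and(const std::vector<Focal>& a,
                               const std::vector<Focal>& b) {
    std::vector<Focal> out;
    out.reserve(a.size() * b.size());
    for (const Focal& x : a) {
        for (const Focal& y : b) {
            assert(x.lo <= x.hi && y.lo <= y.hi);
            out.push_back(Focal{x.lo * y.lo, x.hi * y.hi, x.mass * y.mass});
        }
    }
    return out;
}
```

Two inputs with 2 and 3 elements yield 6 combined elements, one per pair.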

Verification under CUDA

Goal
Compute correct results on computer systems using finite floating point arithmetic

Approach
- Directed rounding (GPU source code)
- Interval arithmetic (C-XSC in CPU source code)

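Interval Notation

Real Intervals ($\mathbb{IR}$)
$\mathbf{x} = [\underline{x}, \overline{x}] \;|\; \underline{x} \le x \le \overline{x}$, with $\underline{x}$, $\overline{x}$ and $x \in \mathbb{R}$

Machine Intervals ($\mathbb{IF}$)
$\mathbf{x} = [\underline{x}, \overline{x}] \;|\; \underline{x} \le x \le \overline{x}$, with $\underline{x}$, $\overline{x}$ and $x \in \mathbb{F} \setminus \{\text{Not a number}, \pm\infty\}$

Description
- $\mathbf{x}$ is an interval from the set $\mathbb{IR}$ or $\mathbb{IF}$
- $\underline{x}$ is the infimum/minimum of $\mathbf{x}$
- $\overline{x}$ is the supremum/maximum of $\mathbf{x}$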

Verification under CUDA

Goal
Compute correct results on computer systems using finite floating point arithmetic

Problem
Let $x = \frac{1}{3}$ and $x \in \mathbb{R}$.

$\underbrace{x + x \ne \tfrac{2}{3}}_{\text{in floating point arithmetic}}$

$\tfrac{2}{3} \in [\,\underbrace{\mathrm{fl}_{\bigtriangledown}(x + x)}_{\text{lower bound}},\ \underbrace{\mathrm{fl}_{\bigtriangleup}(x + x)}_{\text{upper bound}}\,]$
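
The enclosure of $\frac{2}{3}$ can be tried out on the CPU. The C++ sketch below is an assumption-laden illustration, not the authors' code: it switches the rounding mode with `std::fesetround` from `<cfenv>` instead of using GPU instructions, and `Interval`, `div_rounded`, `add_rounded` and `enclose_two_thirds` are hypothetical names. Since doubling a stored double is exact, the sketch first encloses $\frac{1}{3}$ with a downward and an upward division and then adds the bounds with matching rounding.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical interval type; lo and hi are names assumed for this sketch.
struct Interval {
    double lo;
    double hi;
};

// Divide a by b under the given rounding mode (FE_DOWNWARD or FE_UPWARD).
// The volatile variables keep the compiler from folding the operation at
// compile time or moving it across the fesetround calls.
double div_rounded(double a, double b, int mode) {
    assert(b != 0.0);
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double r = va / vb;
    std::fesetround(saved);  // restore the caller's rounding mode
    return r;
}

// Add a and b under the given rounding mode, same technique as above.
double add_rounded(double a, double b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double r = va + vb;
    std::fesetround(saved);
    return r;
}

// Enclose 1/3 first, then 1/3 + 1/3: lower bounds round down, upper bounds up.
Interval enclose_two_thirds() {
    const Interval third{div_rounded(1.0, 3.0, FE_DOWNWARD),
                         div_rounded(1.0, 3.0, FE_UPWARD)};
    return Interval{add_rounded(third.lo, third.lo, FE_DOWNWARD),
                    add_rounded(third.hi, third.hi, FE_UPWARD)};
}
```

In CUDA device code the same roles would be played by double-precision intrinsics with a fixed rounding direction, such as `__ddiv_rd`, `__ddiv_ru`, `__dadd_rd` and `__dadd_ru`, which need no global mode switch.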

Verification under CUDA

Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (OR-Gate)
$lb = \mathrm{fl}_{\bigtriangledown}\bigl(\mathrm{fl}_{\bigtriangledown}(\underline{x} + \underline{y}) - \mathrm{fl}_{\bigtriangleup}(\underline{x} \cdot \underline{y})\bigr)$ with $\underline{x}, \underline{y} \in [0, 1]$,
$m_{lb} = \mathrm{fl}_{\bigtriangleup}(m_x \cdot m_y)$ with $m_x, m_y \in [0, 1]$.
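
The OR-gate lower bound can be written down as a short C++ sketch. It is a CPU emulation under stated assumptions: rounding is switched with `std::fesetround`, the arguments are taken to be the infima of the two scale elements, and `Op`, `rounded`, `or_gate_lower` and `mass_upper` are hypothetical names. The sum is rounded down and the subtracted product is rounded up, so every rounding error pushes the result downwards.

```cpp
#include <cassert>
#include <cfenv>

// Operations supported by the helper below (names assumed for this sketch).
enum class Op { Add, Sub, Mul };

// Evaluate a op b under the given rounding mode (FE_DOWNWARD or FE_UPWARD).
// volatile keeps the compiler from folding the operation at compile time
// or moving it across the fesetround calls.
double rounded(Op op, double a, double b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double r = 0.0;
    switch (op) {
        case Op::Add: r = va + vb; break;
        case Op::Sub: r = va - vb; break;
        case Op::Mul: r = va * vb; break;
    }
    std::fesetround(saved);  // restore the caller's rounding mode
    return r;
}

// lb = fl_down( fl_down(x + y) - fl_up(x * y) ), x and y in [0, 1].
double or_gate_lower(double x_lo, double y_lo) {
    assert(0.0 <= x_lo && x_lo <= 1.0);
    assert(0.0 <= y_lo && y_lo <= 1.0);
    const double sum = rounded(Op::Add, x_lo, y_lo, FE_DOWNWARD);
    const double prod = rounded(Op::Mul, x_lo, y_lo, FE_UPWARD);
    return rounded(Op::Sub, sum, prod, FE_DOWNWARD);
}

// m_lb = fl_up(m_x * m_y), m_x and m_y in [0, 1].
double mass_upper(double m_x, double m_y) {
    assert(0.0 <= m_x && m_x <= 1.0);
    assert(0.0 <= m_y && m_y <= 1.0);
    return rounded(Op::Mul, m_x, m_y, FE_UPWARD);
}
```

For inputs that are exactly representable, such as 0.5 and 0.5, no rounding occurs and the bound equals 0.75.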

Verification under CUDA

Let $\mathbf{x}$ and $\mathbf{y}$ be two scale elements (intervals) and $m_x$ and $m_y$ the corresponding mass assignments.

Lower Failure Bound (AND-Gate)
$lb = \mathrm{fl}_{\bigtriangledown}(\underline{x} \cdot \underline{y})$ with $\underline{x}, \underline{y} \in [0, 1]$
$ub = \mathrm{fl}_{\bigtriangleup}(\overline{x} \cdot \overline{y})$ with $\overline{x}, \overline{y} \in [0, 1]$
$m = \mathrm{fl}_{\bigtriangleup}(m_x \cdot m_y)$ with $m_x, m_y \in [0, 1]$.
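
The three AND-gate formulas map directly onto three rounded multiplications. The C++ sketch below again emulates directed rounding on the CPU with `std::fesetround`; the type `Focal` and the functions `mul_rounded` and `and_gate` are hypothetical names chosen for this illustration, and the pairing of infima with the lower bound and suprema with the upper bound is an assumption based on the interval notation.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical scale element: interval [lo, hi] plus its mass assignment.
struct Focal {
    double lo;
    double hi;
    double mass;
};

// Multiply a and b under the given rounding mode (FE_DOWNWARD or FE_UPWARD).
// volatile keeps the compiler from folding the product at compile time
// or moving it across the fesetround calls.
double mul_rounded(double a, double b, int mode) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double va = a;
    volatile double vb = b;
    volatile double r = va * vb;
    std::fesetround(saved);  // restore the caller's rounding mode
    return r;
}

// AND-gate for one pair of scale elements:
//   lb = fl_down(x.lo * y.lo), ub = fl_up(x.hi * y.hi), m = fl_up(m_x * m_y).
Focal and_gate(const Focal& x, const Focal& y) {
    assert(0.0 <= x.lo && x.lo <= x.hi && x.hi <= 1.0);
    assert(0.0 <= y.lo && y.lo <= y.hi && y.hi <= 1.0);
    assert(0.0 <= x.mass && x.mass <= 1.0);
    assert(0.0 <= y.mass && y.mass <= 1.0);
    return Focal{mul_rounded(x.lo, y.lo, FE_DOWNWARD),
                 mul_rounded(x.hi, y.hi, FE_UPWARD),
                 mul_rounded(x.mass, y.mass, FE_UPWARD)};
}
```

On the GPU, the CUDA intrinsics `__dmul_rd` and `__dmul_ru` would provide the same two rounding directions per operation.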

Computation time

Wall-clock time [s] spent on computation

Configurations:
- Benchmark 1 (B1): n = 200, f = 20, l = 100
- Benchmark 2 (B2): n = 5000, f = 100, l = 60

      C++ (LB) [a]   DSI (LB) [b]   C++ (UB) [a]   DSI (UB) [b]
B1    7              7              1685           1712
B2    721            654            48070          46160

[a] C++ utilizing C-XSC and CUDA
[b] DSI 3.5.2 and INTLAB V6

Computation time

Figure: Wall-clock time [s] spent on computation (logarithmic scale, $10^1$ to $10^5$), comparing C++ & CUDA with MATLAB & INTLAB for benchmark 1 (LB), benchmark 1 (UB), benchmark 2 (LB) and benchmark 2 (UB).

Conclusion

Achievements
- Reduction of the trade-off between accuracy and computation time
- Verified computation on the GPU using CUDA

Future Work

Perspective
Using high performance computing
- In MATLAB, utilizing the MEX interface with CUDA and C-XSC
- To compute Markov set chains (imprecise Markov chains)

References

[1] Auer, E.; Luther, W.; Rebner, G.; Limbourg, P.: A Verified MATLAB Toolbox for the Dempster-Shafer Theory. In: Proceedings of the Workshop on the Theory of Belief Functions, 2010. www.udue.de/DSIPaperone, http://www.udue.de/DSI
[2] Carreras, C.; Walker, I.: Interval Methods for Fault-Tree Analyses in Robotics. In: IEEE Transactions on Reliability 50 (2001), pp. 3-11. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00935010
[3] IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic. In: IEEE Std 754-2008 (2008), 29, pp. 1-58. DOI 10.1109/IEEESTD.2008.4610935. http://dx.doi.org/10.1109/IEEESTD.2008.4610935
[4] Krämer, H.: C-XSC 2.0: A C++ Library for Extended Scientific Computing. In: Lecture Notes in Computer Science, vol. 2991/2004. Springer-Verlag, Heidelberg, 2004, pp. 15-35
[5] Krämer, W.; Zimmer, M.; Hofschuster, W.: Using C-XSC for High Performance Verified Computing. In: Jónasson, Kristján (ed.): Applied Parallel and Scientific Computing, vol. 7134. Springer Berlin / Heidelberg, 2012, pp. 168-178. ISBN 978-3-642-28144-0. DOI 10.1007/978-3-642-28145-7_17. http://dx.doi.org/10.1007/978-3-642-28145-7_17
[6] NVIDIA: Plattform für Parallel-Programmierung und parallele Berechnungen. Website. http://www.nvidia.de/object/cuda_home_new_de.html
[7] Rebner, G.; Auer, E.; Luther, W.: A verified realization of a Dempster-Shafer based fault tree analysis. In: Computing 94 (2012), pp. 313-324. DOI 10.1007/s00607-011-0179-3. ISSN 0010-485X. http://dx.doi.org/10.1007/s00607-011-0179-3

Thank you
