
High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma



  1. High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma
     Lingfei Wu (1), Kesheng Wu (2), Alex Sim (2), Michael Churchill (3), Jong Y. Choi (4), Andreas Stathopoulos (1), CS Chang (3), and Scott Klasky (4)
     (1) College of William and Mary; (2) Lawrence Berkeley National Laboratory; (3) Princeton Plasma Physics Laboratory; (4) Oak Ridge National Laboratory
     BDAC-SC14

  2. Outline
     • Introduction
     • Related work
     • Blob detection
     • Hybrid parallel
     • Evaluations
     • Conclusion

  3. What is an outlier?
     An outlier is a data object that deviates significantly from the rest of the objects, as if it were generated by a different mechanism. [1]
     • Outliers could be errors or noise to be eliminated
     • Outliers can lead to the discovery of important information in data
     Outlier detection is employed in a variety of applications:
     • fraud detection
     • time-series monitoring
     • medical care
     • public safety and security
     [1] Jiawei Han and Micheline Kamber, Data Mining, Southeast Asia Edition: Concepts and Techniques, Morgan Kaufmann, 2006.

  4. Our goal
     Outlier detection is an important task in many safety-critical environments.
     • Outliers must be detected in real time
     • Suitable feedback must be provided to alert the control system
     • The size of the data sets calls for fast and scalable outlier detection methods
     Our goal: apply outlier detection techniques to effectively tackle the fusion blob detection problem on extremely large parallel machines
     • Massive amounts of data are generated from fusion experiments / simulations
     • Near real-time understanding of data is needed to predict performance

  5. Blobs in fusion
     What is fusion, and why fusion?
     • Fusion is a viable energy source for the future
     • Fossil fuels will run out soon; solar and wind have limited potential
     • Advantages of fusion: inexhaustible, clean, and safe

  6. Blobs in fusion
     Blobs are intermittent bursts of particles near the edge of the confined plasma
     ⇒ Driven by turbulence
     Blobs are bad for fusion performance because they:
     • Transport heat and particles away from the confined plasma
     • May damage the main chamber wall
     • Lead to increased levels of neutrals and impurities, bypassing control mechanisms
     Blob detection is a very important task!

  7. Big data challenges in fusion energy
     Fusion experiments generate massive amounts of data:
     • Diagnostic measurements last from a few to several hundred seconds, generating large amounts of data, on the order of gigabytes to terabytes
     • Large-scale fusion simulations generate roughly a few tens of terabytes per second

  8. Big data challenges in fusion energy
     Difficulties in large-scale data analysis:
     • Existing data analysis is often single-threaded, slow, and suited only to post-run analysis
     • Fusion experiments demand real-time data analysis
     • For example, ICEE aims to apply blob detection to monitor the health of fusion experiments in KSTAR
     Real-time blob detection is a very challenging task!

  9. Three approaches for blob detection
     • Single threshold & conditional averaging
       ◦ The exact criterion varies
       ◦ Averaging may destroy important information
     • Image analysis techniques
       ◦ Very sensitive to the setting of parameters
       ◦ Hard to use a generic method for all images
     • Contouring method & thresholding
       ◦ Cannot be used for real-time blob detection
       ◦ May miss blobs at the edge
       ◦ Is still post-run analysis

  10. An efficient blob detection approach
     • Our approach: an outlier detection algorithm for efficiently finding blobs in fusion simulations / experiments
       ◦ Two-step outlier detection with various criteria after normalizing the local intensity
       ◦ Leverage a fast connected component labeling method to find blob components based on a refined triangular mesh
     • Contributions:
       ◦ A new method that, unlike the contouring method, does not miss blobs at the edge of the region of interest
       ◦ Targets the more challenging in-shot and between-shot analysis
       ◦ The first research work to achieve blob detection in a few milliseconds

  11. Outlier detection algorithm for finding blobs
     Sketch of the proposed outlier detection algorithm:
     [Figure: sketch of the proposed algorithm]

  12. Refine mesh in the region of interest
     [Figure: two poloidal-plane panels, "Magnetic Fields in Poloidal Plane" (original) and "Refined Region of Interests"; axes R (m) and Z (m)]
     • Produce four times as many triangles by creating new vertices at the midpoints of the three edges of each original triangle
     • Apply recursively until the desired resolution is reached
     • The amount of refinement depends on the specific data set and the required resolution
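The midpoint subdivision described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `refine_once` and `refine`, the 2-D `(R, Z)` vertex tuples, and the `levels` parameter are all assumptions made for the example.

```python
def refine_once(vertices, triangles):
    """Split every triangle into four by inserting a vertex at each edge midpoint.

    vertices  -- list of (r, z) coordinate tuples (hypothetical layout)
    triangles -- list of (a, b, c) vertex-index tuples
    """
    vertices = [tuple(v) for v in vertices]
    midpoint_index = {}  # undirected edge (i, j) -> index of its midpoint vertex

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            (r1, z1), (r2, z2) = vertices[i], vertices[j]
            vertices.append(((r1 + r2) / 2.0, (z1 + z2) / 2.0))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    refined = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one.
        refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, refined


def refine(vertices, triangles, levels):
    """Apply the subdivision recursively; `levels` is an assumed stopping rule."""
    for _ in range(levels):
        vertices, triangles = refine_once(vertices, triangles)
    return vertices, triangles
```

Sharing midpoints through the edge dictionary keeps neighbouring triangles connected through common vertices, which matters if a connectivity-based step runs on the refined mesh afterwards.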

  13. Two-step outlier detection to identify blobs
     Motivation for two-step outlier detection for finding blobs:
     [Figure: a contour plot in the region of interest]

  14. Two-step outlier detection to identify blobs
     Apply exploratory data analysis to analyze the underlying distribution of the local normalized density:
     [Figure: density distribution fitting using 50 bins; x-axis: normalized electron density (n_e/n_e0); y-axis: number of points in each bin; panels: (a) Extreme Value Distribution, (b) Log Normal Distribution]
     \[
     \begin{cases}
     N(r_i, z_i, t) - \mu > \alpha \sigma, & \forall (r_i, z_i) \in \Gamma, \\
     N(r_i, z_i, t) - \mu_2 > \beta \sigma_2, & \forall (r_i, z_i) \in \Gamma_2.
     \end{cases}
     \]
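A rough sketch of how the two criteria could be applied in sequence is given below. The slide does not define the symbols, so the reading here is an assumption: μ and σ are taken as the mean and standard deviation of the normalized density over the region Γ, Γ₂ is taken to be the set of points that pass the first test, and μ₂ and σ₂ are recomputed over Γ₂. The function name and the values of `alpha` and `beta` are hypothetical.

```python
from statistics import fmean, pstdev


def two_step_outliers(density, alpha, beta):
    """Return the indices of points that pass both thresholds.

    density -- normalized density values for one time frame over the region
               (assumed to be a flat list, one value per mesh point)
    """
    # Step 1 (assumed): statistics over the whole region Gamma.
    mu, sigma = fmean(density), pstdev(density)
    gamma2 = [i for i, n in enumerate(density) if n - mu > alpha * sigma]
    if not gamma2:
        return []

    # Step 2 (assumed): statistics recomputed over the step-1 survivors only.
    values2 = [density[i] for i in gamma2]
    mu2, sigma2 = fmean(values2), pstdev(values2)
    return [i for i in gamma2 if density[i] - mu2 > beta * sigma2]
```

Under this reading, the second pass separates the strongest peaks from points that are only moderately above the background, because the statistics of Γ₂ are no longer dominated by the bulk of the distribution.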

  15. A fast connected component labeling algorithm
     We apply an efficient connected component labeling algorithm on a refined triangular mesh to find blob components:
     • This is a two-pass approach; the first pass scans each triangle
     • Unnecessary memory accesses are avoided if any vertex in a triangle is found to be connected with others
     • After the label array is completely filled, the union-find tree is flattened
     • The second pass corrects the labels, after which all blob candidate components have been found
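The two-pass idea can be illustrated with a generic union-find labeling over a triangle mesh. This is a textbook-style sketch rather than the optimized algorithm from the talk: it does not reproduce the memory-access reduction, and the function name, the `is_outlier` flag list, and the rule that two flagged vertices are connected when they share a triangle are assumptions.

```python
def label_components(num_vertices, triangles, is_outlier):
    """Label connected groups of flagged vertices; unflagged vertices get -1."""
    parent = list(range(num_vertices))

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: scan each triangle and merge its flagged vertices.
    for tri in triangles:
        flagged = [v for v in tri if is_outlier[v]]
        for other in flagged[1:]:
            union(flagged[0], other)

    # Second pass: resolve every flagged vertex to its root and
    # assign consecutive component labels starting at 0.
    labels = [-1] * num_vertices
    root_to_label = {}
    for v in range(num_vertices):
        if is_outlier[v]:
            root = find(v)
            labels[v] = root_to_label.setdefault(root, len(root_to_label))
    return labels
```

For example, with triangles `[(0, 1, 2), (1, 2, 3), (3, 4, 5)]` and vertices 0, 1, 4, and 5 flagged, vertices 0 and 1 end up in one component and vertices 4 and 5 in another, because no triangle contains a flagged vertex from both groups.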

  16. Parallelization of blob detection approach
     A hybrid MPI/OpenMP parallelization on many-core processor architecture:
     • High-level: use MPI to allocate n processes to process each time frame
     • Low-level: use OpenMP to accelerate the computations with m threads

  17. Results: same time frame + four planes
     [Figure: blob detection results]

  18. Results: same plane + four time frames
     [Figure: blob detection results]
