

SLIDE 1

BDAC-SC14 1 / 17

High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma

Lingfei Wu¹, Kesheng Wu², Alex Sim², Michael Churchill³, Jong Y. Choi⁴, Andreas Stathopoulos¹, CS Chang³, and Scott Klasky⁴

¹ College of William and Mary
² Lawrence Berkeley National Laboratory
³ Princeton Plasma Physics Laboratory
⁴ Oak Ridge National Laboratory

SLIDE 2

Outline

  • Introduction
  • Related work
  • Blob detection
  • Hybrid parallel
  • Evaluations
  • Conclusion

SLIDE 3

What is an outlier?


An outlier is a data object that deviates significantly from the rest of the objects, as if it were generated by a different mechanism.¹

  • Outliers could be errors or noise to be eliminated
  • Outliers can lead to the discovery of important information in data

Outlier detection is employed in a variety of applications:

  • fraud detection
  • time-series monitoring
  • medical care
  • public safety and security

¹ Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Southeast Asia Edition, Morgan Kaufmann, 2006.

SLIDE 4

Our goal


Outlier detection is an important task in many safety-critical environments.

  • Outliers must be detected in real time
  • Suitable feedback must be provided to alert the control system
  • The size of the data sets calls for fast and scalable outlier detection methods

Our goal: apply outlier detection techniques to tackle the fusion blob detection problem effectively on extremely large parallel machines

  • Massive amounts of data are generated by fusion experiments and simulations
  • Near-real-time understanding of the data is needed to predict performance

SLIDE 5

Blobs in fusion


What is fusion, and why fusion?

  • Fusion is a viable energy source for the future
  • Fossil fuels will run out soon; solar and wind have limited potential
  • Advantages of fusion: inexhaustible, clean, and safe

SLIDE 6

Blobs in fusion


Blobs are intermittent bursts of particles near the edge of the confined plasma

⇒ Driven by turbulence

Blobs are bad for fusion performance because they:

  • Transport heat and particles away from the confined plasma
  • May damage the main chamber wall
  • Lead to increased levels of neutrals and impurities, bypassing control mechanisms

Blob detection is a very important task!

SLIDE 7

Big data challenges in fusion energy


Fusion experiments generate massive amounts of data:

  • Diagnostic measurements last from a few to several hundred seconds and generate large amounts of data, roughly gigabytes to terabytes
  • Large-scale fusion simulations generate roughly a few tens of terabytes per second

SLIDE 8

Big data challenges in fusion energy


Difficulties in large-scale data analysis:

  • Existing data analysis is often single-threaded, slow, and only for post-run analysis
  • Fusion experiments demand real-time data analysis
  • E.g., ICEE aims to apply blob detection to monitor the health of fusion experiments in KSTAR

Real-time blob detection is a very challenging task!

SLIDE 9

Three approaches for blob detection


Single threshold & conditional averaging

  • The exact criterion varies
  • Averaging may destroy important information

Image analysis techniques

  • Very sensitive to parameter settings
  • Hard to apply one generic method to all images

Contouring method & thresholding

  • Cannot perform real-time blob detection
  • May miss blobs at the edge
  • Is still a post-run analysis
SLIDE 10

An efficient blob detection approach


  • Our approach: an outlier detection algorithm for efficiently finding blobs in fusion simulations and experiments
  • Two-step outlier detection with various criteria after normalizing the local intensity
  • A fast connected component labeling method finds blob components on a refined triangular mesh
  • Contributions:
      • A new method that, unlike the contouring method, does not miss blobs at the edge of the region of interest
      • Targets the more challenging in-shot and between-shot analysis
      • The first research work to achieve blob detection in a few milliseconds

SLIDE 11

Outlier detection algorithm for finding blobs


Sketch the proposed outlier detection algorithm:

SLIDE 12

Refine mesh in the region of interest


[Figure: left panel, "Magnetic Fields in Poloidal Plane", showing the poloidal plane and the region of interest, with axes R (m) and Z (m); right panel, the refined versus original mesh in the region of interest, with axes R (m) and Z (m).]

  • Produce four times as many triangles by creating new vertices at the midpoints of the three edges of each original triangle
  • Apply recursively until the desired resolution is reached
  • The refinement depth depends on the data set and the required resolution
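
The midpoint refinement described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `refine_once` and `refine`, the vertex-list plus index-triple mesh layout, and the reuse of one midpoint per shared edge are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]        # (R, Z) coordinates of a vertex
Triangle = Tuple[int, int, int]    # indices into the vertex list


def refine_once(points: List[Point], triangles: List[Triangle]):
    """Split every triangle into four by inserting edge midpoints."""
    new_points = list(points)                         # do not mutate the input
    midpoint_index: Dict[Tuple[int, int], int] = {}   # edge -> midpoint vertex

    def midpoint(a: int, b: int) -> int:
        key = (min(a, b), max(a, b))   # an edge shared by two triangles gets one midpoint
        if key not in midpoint_index:
            (ra, za), (rb, zb) = new_points[a], new_points[b]
            new_points.append(((ra + rb) / 2.0, (za + zb) / 2.0))
            midpoint_index[key] = len(new_points) - 1
        return midpoint_index[key]

    new_triangles: List[Triangle] = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # Three corner triangles plus the central one.
        new_triangles += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return new_points, new_triangles


def refine(points: List[Point], triangles: List[Triangle], levels: int):
    """Apply the subdivision recursively, `levels` times."""
    for _ in range(levels):
        points, triangles = refine_once(points, triangles)
    return points, triangles
```

Each level multiplies the triangle count by four, so `levels` would be chosen from the resolution the data set requires. How field values are assigned to the new vertices is not covered by the slide and is left out here.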
SLIDE 13

Two-step outlier detection to identify blobs


Motivation for two-step outlier detection for finding blobs:

[Figure: a contour plot in the region of interest]

SLIDE 14

Two-step outlier detection to identify blobs


Apply exploratory data analysis to analyze the underlying distribution of the local normalized density:

[Figure: density distribution fitting using 50 bins, with normalized electron density (n_e/n_e0) on the horizontal axis and number of points in each bin (×10^4) on the vertical axis. Panel (a): extreme value distribution. Panel (b): log-normal distribution.]

$$N(r_i, z_i, t) - \mu > \alpha\,\sigma, \quad \forall (r_i, z_i) \in \Gamma,$$

$$N(r_i, z_i, t) - \mu_2 > \beta\,\sigma_2, \quad \forall (r_i, z_i) \in \Gamma_2.$$
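
The two threshold tests can be sketched in a few lines of Python. The slide does not define the symbols, so the sketch assumes that N is the normalized density at one time frame, that µ and σ are the mean and standard deviation of N over the region of interest Γ, that Γ₂ is the set of points passing the first test, and that µ₂ and σ₂ are computed over Γ₂. The function name `two_step_outliers` and the dictionary input are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, List


def two_step_outliers(density: Dict[Hashable, float],
                      alpha: float, beta: float) -> List[Hashable]:
    """Return the keys of points that pass both threshold tests.

    `density` maps a mesh point (for example an (r, z) pair or a vertex id)
    to its normalized density N at a single time frame.
    """
    values = list(density.values())
    mu, sigma = mean(values), pstdev(values)

    # Step 1: keep points well above the mean of the whole region (Gamma).
    gamma2 = {p: n for p, n in density.items() if n - mu > alpha * sigma}
    if not gamma2:
        return []

    # Step 2 (assumed): re-test the survivors against their own statistics.
    survivors = list(gamma2.values())
    mu2, sigma2 = mean(survivors), pstdev(survivors)
    return [p for p, n in gamma2.items() if n - mu2 > beta * sigma2]
```

For example, with twenty background points at 1.0 and three elevated points at 5.0, 6.0, and 12.0, calling the function with `alpha=1.0` and `beta=1.0` keeps all three elevated points after the first test and only the 12.0 point after the second.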

SLIDE 15

A fast connected component labeling algorithm


We apply an efficient connected component labeling algorithm on a refined triangular mesh to find blob components:

  • This is a two-pass approach; the first pass scans each triangle
  • Unnecessary memory accesses are avoided when any vertex in a triangle is found to be connected with others
  • Once the label array is filled, the union-find tree is flattened
  • The second pass corrects the labels, after which all blob candidate components are found
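
A two-pass labeling with a union-find structure can be sketched as below. This is an illustrative simplification under stated assumptions, not the authors' algorithm: the function name `label_components` is hypothetical, the mesh is assumed to be a list of vertex-index triples, `flagged` is assumed to hold the vertices that passed outlier detection, and the memory-access optimization mentioned above is not implemented.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triangle = Tuple[int, int, int]


def label_components(triangles: Iterable[Triangle],
                     flagged: Set[int]) -> Dict[int, int]:
    """Map each flagged vertex that appears in a triangle to a component id."""
    parent: List[int] = []       # union-find forest over provisional labels
    label: Dict[int, int] = {}   # vertex -> provisional label

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Pass 1: scan triangles; flagged vertices sharing a triangle are connected.
    for tri in triangles:
        hits = [v for v in tri if v in flagged]
        if not hits:
            continue
        known = [find(label[v]) for v in hits if v in label]
        if known:
            root = min(known)
            for r in known:
                parent[r] = root           # merge provisional labels
        else:
            root = len(parent)
            parent.append(root)            # start a new provisional label
        for v in hits:
            label.setdefault(v, root)

    # Flatten: resolve every provisional label to its root once.
    roots = [find(x) for x in range(len(parent))]

    # Pass 2: rewrite provisional labels as consecutive component ids.
    final_id: Dict[int, int] = {}
    return {v: final_id.setdefault(roots[lab], len(final_id))
            for v, lab in label.items()}
```

Vertices that end up with the same id form one blob candidate. Two separately labeled groups are merged as soon as a later triangle contains a vertex from each.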

SLIDE 16

Parallelization of blob detection approach


A hybrid MPI/OpenMP parallelization on many-core processor architectures:

  • High-level: use MPI to allocate n processes to process each time frame
  • Low-level: use OpenMP to accelerate the computations with m threads
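
The two levels of decomposition can be illustrated with plain index arithmetic. The sketch below is ordinary Python, not MPI or OpenMP code, and it rests on assumptions the slide does not state: that each process handles whole time frames assigned round-robin, and that each thread takes a contiguous chunk of one frame's mesh points. Both function names are hypothetical.

```python
from typing import List, Tuple


def frames_for_rank(rank: int, n_ranks: int, n_frames: int) -> List[int]:
    """High level: time-frame indices handled by one process (round-robin)."""
    return list(range(rank, n_frames, n_ranks))


def chunk_for_thread(thread: int, m_threads: int, n_items: int) -> Tuple[int, int]:
    """Low level: half-open [start, stop) range of one frame's points for one thread.

    Items are split into contiguous chunks whose sizes differ by at most one.
    """
    base, extra = divmod(n_items, m_threads)
    start = thread * base + min(thread, extra)
    stop = start + base + (1 if thread < extra else 0)
    return start, stop
```

With 4 processes and 10 frames, process 1 would handle frames 1, 5, and 9; with 3 threads and 10 points, the threads would cover the ranges [0, 4), [4, 7), and [7, 10).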

SLIDE 17

Results: same time frame + four planes

SLIDE 18

Results: same plane + four time frames

SLIDE 19

Results: real-time blob detection


[Figure: two log-log panels titled "Real Time Blob Detection". Left: time (seconds) versus number of processes, with curves for I/O time (MPI), I/O time (MPI/OpenMP), detection time (MPI), and detection time (MPI/OpenMP). Right: speedup over sequential versus number of processes, with curves for MPI speedup and MPI/OpenMP speedup.]

  • Blob detection completes in around 2 ms with MPI/OpenMP using 4096 cores, and in 3 ms with MPI using 1024 cores
  • MPI/OpenMP is two times faster than MPI
  • Linear speedup in blob detection time, and slightly more than linear speedup in I/O time
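
For reference, speedup over sequential and parallel efficiency can be computed as in the short sketch below. The function names are hypothetical, and the timings in the example are made up purely to show the arithmetic; they are not measurements from this work.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Ratio of sequential to parallel wall-clock time."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, n_procs: int) -> float:
    """Speedup per process; 1.0 corresponds to linear speedup."""
    return speedup(t_sequential, t_parallel) / n_procs


# Made-up timings in seconds, chosen only to illustrate the calculation.
example_sequential, example_parallel, example_procs = 8.0, 2.0, 4
example_efficiency = efficiency(example_sequential, example_parallel, example_procs)
```

An efficiency of 1.0 means the speedup equals the number of processes (linear speedup); a value above 1.0 indicates super-linear speedup.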

SLIDE 20

Conclusion and future work


We present, for the first time, a real-time blob detection method for finding blob-filaments in real fusion experiments or numerical simulations.

  • Key components:
      • Two-step outlier detection with various criteria
      • A fast connected component labeling method
      • Hybrid MPI/OpenMP parallelization
  • Future work:
      • Test the detection algorithm on experimental measurement data from operating fusion devices
      • Develop a blob tracking algorithm