Machine Learning in Physics, Romain Dupuis, CmPA, May 2, 2019



SLIDE 1

Machine Learning in Physics

Romain Dupuis

CmPA

May 2, 2019

Romain Dupuis (CmPA) Machine Learning in Physics May 2, 2019 1 / 55

SLIDE 2

Why a talk about Machine Learning at CmPA ?

Interest in branches of physics

  • High Energy Physics
  • Astronomy
  • Computational fluid dynamics

Kaggle competition

Great potential for plasma physics

  • Large amounts of data from spacecraft
  • Scientific discovery from data
  • Support for numerical simulations

A small group at CmPA

Two objectives: an introduction to ML and links to problems from physics

SLIDE 3

Content

1. Some basics for Machine Learning
2. Learning from the existing: supervised learning
3. Extracting knowledge from data: clustering and dimension reduction

SLIDE 4

Important notions in Machine Learning

SLIDE 5

Some notions of machine learning

Machine Learning Hype

Many applications

  • Natural Language
  • Computer vision
  • Fraud/anomaly detection
  • Deep learning success

Object detection, Redmon et al., 2016

Increase in data size and computational resources

  • Massive use of GPU
  • Distributed computing

General development of Machine Learning (ML) frameworks

  • Facebook, Google, etc.
  • Various APIs: Python, C, Java, etc.

SLIDE 6

Some notions of machine learning

Some vocabulary

Deep Learning

Cascade of multiple neural layers

Machine Learning

Learning without being explicitly programmed

Artificial Intelligence

Computer systems performing "intelligent" tasks

SLIDE 7

Important notions in Machine Learning

Different kinds of learning

Supervised learning

  • Regression
  • Classification

Credit S. Carrazza

SLIDE 8

Important notions in Machine Learning

Different kinds of learning

Unsupervised learning

  • Clustering
  • Dimension Reduction

Credit S. Carrazza

SLIDE 9

Important notions in Machine Learning

Different kinds of learning

Also reinforcement learning

Credit Deepmind.com

Semi-supervised learning

SLIDE 11

Important notions in Machine Learning

A lot of algorithms

Credit S. Carrazza

SLIDE 12

Supervised learning : inferring from data

SLIDE 13

Supervised learning

Notations

Learning/training sample: $\{(x_1, y_1), \dots, (x_N, y_N)\}$, with $x \in \mathbb{R}^d$ and $y \in \mathcal{Y}$

Regression: $\mathcal{Y} \subset \mathbb{R}$; classification: $\mathcal{Y} \subset \mathbb{Z}$

Vocabulary

  • Estimate : link between input x and output y
  • Predict : associating a value y to unobserved input x

SLIDE 14

Supervised learning

No free lunch theorem

Assumption: the target function is drawn from a uniform distribution over all possible functions

Theorem: averaged over all such targets, no algorithm performs better than random choice (Wolpert)

No universally best method

Scientific choice: adapting the method to the data

Credit towardsdatascience.com, Catherine Huang

SLIDE 15

Supervised learning

General learning process

Credit Y. Abu-Mostafa

SLIDE 16

Supervised learning

General learning process

Preparing the data

  • Cleaning, scaling, etc.
  • Partitioning the data : learning, validation, test

Learning : determining the parameters

  • Very common form to minimize: $J(w) = \frac{1}{N} \sum_{i=1}^{N} J_i(w)$

  • Gradient descent, simulated annealing, BFGS, etc.
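
As a minimal sketch of the learning step, the snippet below runs batch gradient descent on $J(w) = \frac{1}{N}\sum_i J_i(w)$. The quadratic per-sample loss $J_i(w) = \frac{1}{2}(y_i - w^T x_i)^2$, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iter=2000):
    """Minimize J(w) = 1/N * sum_i 0.5 * (y_i - w^T x_i)^2 by batch gradient descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iter):
        residual = X @ w - y                # prediction error on every sample
        grad = X.T @ residual / n_samples   # average of the per-sample gradients
        w -= lr * grad                      # step against the gradient
    return w

# Synthetic, noise-free data generated from known weights (illustrative only).
rng = np.random.default_rng(0)
X_demo = np.column_stack([np.ones(50), rng.uniform(-1.0, 1.0, size=50)])
w_true = np.array([0.5, 2.0])
y_demo = X_demo @ w_true

w_fit = gradient_descent(X_demo, y_demo)
print(w_fit)
```

With noise-free data the fitted weights should approach `w_true`; on real data the learning rate usually needs tuning.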

SLIDE 17

Supervised learning Regression

Linear regression

Very simple model: $\hat{y}(x; w) = w^T x$

Can add complexity with basis functions: $\hat{y}(x; w) = w^T \Phi(x)$
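
A short sketch of the basis-function idea, assuming a polynomial basis $\Phi(x) = [1, x, x^2, \dots]$ for scalar inputs. The helper names and the example weights are hypothetical.

```python
import numpy as np

def polynomial_features(x, degree):
    """Map scalar inputs x to Phi(x) = [1, x, x^2, ..., x^degree]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

def predict(x, w):
    """Evaluate y_hat(x; w) = w^T Phi(x) for each input in x."""
    Phi = polynomial_features(x, degree=len(w) - 1)
    return Phi @ w

# Hypothetical weights: y_hat(x) = 1 - 2 x + 0.5 x^2
w_example = np.array([1.0, -2.0, 0.5])
y_hat = predict([0.0, 1.0, 2.0], w_example)
print(y_hat)
```

The model stays linear in the weights `w`, so the same fitting machinery applies whatever basis is chosen.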

SLIDE 18

Supervised learning Regression

Linear regression : error of the model

How to quantify the error of the model ?

[Figure: samples and linear model in the (X, Y) plane. Bad fit (w0 = 2, w1 = −0.6).]

SLIDE 21

Supervised learning Regression

Linear regression : error of the model

How to quantify the error of the model ?

[Figure: samples, linear model, and error bars in the (X, Y) plane. OK fit (w0 = 0, w1 = 0.7).]

Cost function: sum of the quadratic errors on the training set

$J(w) = \frac{1}{2N} \sum_{i=1}^{N} \left[ y_i - \hat{y}(x_i; w) \right]^2$
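
The quadratic cost can be evaluated directly for the weight pairs quoted with the regression figures. The samples below are hypothetical (the data behind the plots cannot be recovered from the transcript); they are chosen to lie exactly on $y = -1 + 1.25x$ so that the "good" weights give zero cost.

```python
import numpy as np

def cost(w0, w1, x, y):
    """J(w) = 1/(2N) * sum_i [y_i - (w0 + w1 * x_i)]^2 for a straight-line model."""
    residual = y - (w0 + w1 * x)
    return np.sum(residual ** 2) / (2 * len(x))

# Hypothetical samples lying exactly on y = -1 + 1.25 x.
x_s = np.array([0.0, 1.0, 2.0, 3.0])
y_s = -1.0 + 1.25 * x_s

cost_bad = cost(2.0, -0.6, x_s, y_s)     # "Bad" weights
cost_ok = cost(0.0, 0.7, x_s, y_s)       # "OK" weights
cost_good = cost(-1.0, 1.25, x_s, y_s)   # "Good" weights
print(cost_bad, cost_ok, cost_good)
```

On these samples the cost decreases from the "bad" to the "OK" to the "good" weights, which is the ordering the figures suggest.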

SLIDE 22

Supervised learning Regression

Linear regression : error of the model

How to quantify the error of the model ?

[Figure: samples and linear model in the (X, Y) plane. Good fit (w0 = −1.0, w1 = 1.25).]

Cost function: sum of the quadratic errors on the training set

$w^* = \arg\min_w J(w)$
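
For a model that is linear in $w$, the minimizer $w^*$ of the quadratic cost is the least-squares solution. The sketch below uses `numpy.linalg.lstsq`; the noisy samples are hypothetical and the function name is illustrative.

```python
import numpy as np

def fit_least_squares(Phi, y):
    """Return w* = argmin_w ||y - Phi w||^2, which also minimizes J(w)."""
    w_star, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w_star

# Hypothetical noisy samples around the line y = -1 + 1.25 x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 20)
y = -1.0 + 1.25 * x + rng.normal(scale=0.1, size=x.size)

Phi = np.column_stack([np.ones_like(x), x])   # Phi(x) = [1, x]
w_star = fit_least_squares(Phi, y)

# At the minimum the gradient of J vanishes: Phi^T (Phi w* - y) / N = 0.
gradient = Phi.T @ (Phi @ w_star - y) / len(y)
print(w_star, gradient)
```

Checking that the gradient vanishes is a cheap sanity test that the returned weights are indeed a stationary point of $J$.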

SLIDE 23

Supervised learning Regression

Overfitting

Polynomial of degree $m$: $\hat{y}(x) = w^T \Phi(x) = w_0 + w_1 x + \dots + w_m x^m$

SLIDE 24

Supervised learning Regression

Overfitting

Polynomial of degree $m$: $\hat{y}(x) = w^T \Phi(x) = w_0 + w_1 x + \dots + w_m x^m$

[Figure: samples and degree-1 model in the (X, Y) plane. Cost function = 5.18e-01.]

SLIDE 25

Supervised learning Regression

Overfitting

Polynomial of degree $m$: $\hat{y}(x) = w^T \Phi(x) = w_0 + w_1 x + \dots + w_m x^m$

[Figure: samples and degree-4 model in the (X, Y) plane. Cost function = 1.11e-02.]

SLIDE 27

Supervised learning Regression

Overfitting

Polynomial of degree $m$: $\hat{y}(x) = w^T \Phi(x) = w_0 + w_1 x + \dots + w_m x^m$

[Figure: samples and degree-15 model in the (X, Y) plane. Cost function = 4.99e-03.]

Overfitting: the model becomes too specialized to the training samples
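
The effect can be reproduced with a small experiment: fit polynomials of degree 1, 4, and 15 and compare the cost on the training samples with the cost on fresh samples. The target function, noise level, and sample sizes below are assumptions, not the data behind the figures.

```python
import numpy as np
from numpy.polynomial import Polynomial

def half_mse(model, x, y):
    """Quadratic cost J = 1/(2N) * sum of squared errors."""
    return np.sum((y - model(x)) ** 2) / (2 * len(x))

def target(x):
    """Hypothetical smooth function generating the data."""
    return np.sin(2.0 * x) + 0.5 * x

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 3.0, size=20))
y_train = target(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.sort(rng.uniform(0.0, 3.0, size=200))
y_test = target(x_test) + rng.normal(scale=0.2, size=x_test.size)

train_cost, test_cost = {}, {}
for degree in (1, 4, 15):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    train_cost[degree] = half_mse(model, x_train, y_train)
    test_cost[degree] = half_mse(model, x_test, y_test)
    print(f"degree {degree:2d}: train {train_cost[degree]:.2e}, test {test_cost[degree]:.2e}")
```

The training cost can only go down as the degree increases, because each model contains the previous one. The cost on unseen samples typically stops improving and then degrades, which is the signature of overfitting.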

SLIDE 28

Supervised learning Regression

Model selection : regularization

Limiting the complexity: $J(w) = \frac{1}{2N} \sum_{i=1}^{N} \left[ y_i - \hat{y}(x_i; w) \right]^2 + \lambda \Omega(w)$

Lasso: $\Omega(w) = \|w\|_1$; Ridge regression: $\Omega(w) = \|w\|_2^2$

Two different constrained optimizations. Mehta et al.
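
For the Ridge penalty the regularized cost has a closed-form minimizer. The sketch below assumes the squared $\ell_2$ norm as penalty and polynomial features; the data and the two values of $\lambda$ are hypothetical. Lasso has no closed form and is not covered here.

```python
import numpy as np

def ridge_fit(Phi, y, lam):
    """Minimize 1/(2N) * ||y - Phi w||^2 + lam * ||w||_2^2 in closed form.

    Setting the gradient to zero gives (Phi^T Phi + 2 N lam I) w = Phi^T y.
    """
    n_samples, n_features = Phi.shape
    A = Phi.T @ Phi + 2.0 * n_samples * lam * np.eye(n_features)
    return np.linalg.solve(A, Phi.T @ y)

# Hypothetical data: degree-9 polynomial features fitted to 15 noisy samples.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 15)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=x.size)
Phi = np.vander(x, N=10, increasing=True)

w_weak = ridge_fit(Phi, y, lam=1e-6)    # almost unregularized
w_strong = ridge_fit(Phi, y, lam=1e-1)  # strongly regularized
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

Increasing $\lambda$ shrinks the weight vector, trading a higher training cost for a smoother model.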

SLIDE 29

Supervised learning Regression

Regularization

Each color corresponds to a different weight. Mehta et al.

SLIDE 30

Supervised learning Regression

Model complexity: bias-variance trade-off

Generalization error = systematic error (bias) + sensitivity of prediction (variance)
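
For the squared error this trade-off has a standard closed form. The block below assumes, which the slide does not state, that the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$; $\hat{y}_D$ denotes the model trained on a data set $D$.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{y}_D(x)\big)^2\right]
  = \underbrace{\big(f(x) - \mathbb{E}_D[\hat{y}_D(x)]\big)^2}_{\text{bias}^2 \text{ (systematic error)}}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{y}_D(x) - \mathbb{E}_D[\hat{y}_D(x)]\big)^2\right]}_{\text{variance (sensitivity of prediction)}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models tend to have a large bias term and a small variance term; flexible models tend to show the opposite.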

Credit M. Kagan

SLIDE 31

Supervised learning Regression

Ensemble learning

Various weak learners (left) and averaged model (right). Credit Bishop.
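
One simple way to build such an average is bagging: fit the same model on bootstrap resamples of the data and average the predictions. This is a sketch under assumed settings (cubic polynomials, 25 resamples, synthetic data), not a reconstruction of the figure.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
f_true = np.sin(2.0 * np.pi * x)                   # hypothetical target function
y = f_true + rng.normal(scale=0.3, size=x.size)    # noisy observations

def fit_predict(x_train, y_train, x_eval, degree=3):
    """Fit one learner (a polynomial) and evaluate it on x_eval."""
    coeffs = P.polyfit(x_train, y_train, degree)
    return P.polyval(x_eval, coeffs)

preds = []
for _ in range(25):
    idx = rng.integers(0, x.size, size=x.size)     # bootstrap resample with replacement
    preds.append(fit_predict(x[idx], y[idx], x))
preds = np.array(preds)

ensemble = preds.mean(axis=0)                      # averaged model
mse_members = ((preds - f_true) ** 2).mean(axis=1)
mse_ensemble = ((ensemble - f_true) ** 2).mean()
print(mse_members.mean(), mse_ensemble)
```

Because the squared error is convex, the error of the averaged model is never larger than the average error of its members.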

SLIDE 32

Supervised learning Regression

Generalization in practice

Split your data

  • Training : determining the parameters
  • Validation : check performance and tune hyperparameters
  • Test : final evaluation
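
The three-way split can be sketched with a random permutation of the sample indices. The 60/20/20 proportions and the function name are assumptions.

```python
import numpy as np

def train_val_test_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint index arrays for the training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)       # shuffle before splitting
    n_test = int(round(test_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = train_val_test_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices should be set aside once and used only for the final evaluation, so that hyperparameter tuning on the validation set does not leak into the reported performance.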

SLIDE 33

Supervised learning Regression

Generalization in practice

Credit M. Kagan

SLIDE 34

Supervised learning Regression

Other models

  • Gaussian Process
  • Support Vector Machine
  • Neural Networks
  • Random forest, boosting

Support Vector Machine

SLIDE 35

Supervised learning Regression

Applications : material science

Predicting properties (bulk, band gap, melting T) from fingerprints

Ramprasad et al., Computational Materials, 2017

SLIDE 36

Supervised learning Regression

Applications : Labquake prediction

Machine learning predicting laboratory earthquakes. Credit Rouet-Leduc.

SLIDE 37

Supervised learning Regression

Applications : Labquake prediction

Specific signals detected by machine learning. Credit Rouet-Leduc.

SLIDE 38

Supervised learning Regression

Applications : multi-fidelity approach

Many information sources : historical data, simplified model, complex model, experimental data, ...

Stochastic Burgers equation, Perdikaris et al.

SLIDE 39

Supervised learning Classification

Classification

Goal : separate two classes

  • Input : D = {x1, · · · , xN}, x ∈ R
  • Output : yi ∈ {0, 1}

First idea : Linear decision boundary

  • $h(x; w) = w^T x + w_0$
  • If $h(x_i; w) > 0$, $y_i = 1$, else $y_i = 0$
  • Least squares loss fails

Move to a probabilistic paradigm for soft classifiers

Credit M. Kagan

SLIDE 40

Supervised learning Classification

Logistic regression

Sigmoid function: $\sigma(s) = \frac{1}{1 + e^{-s}}$

Probability to get the class 1:

$P(y_i = 1 \mid x_i, w) = \sigma(w^T x_i) = \frac{1}{1 + e^{-w^T x_i}}$

$P(y_i = 0 \mid x_i, w) = 1 - P(y_i = 1 \mid x_i, w)$

Cost function: maximum likelihood estimation

Maximizing the probability of seeing the observed data. For binary labels:

$P(D; w) = \prod_{i=1}^{N} \left[\sigma(w^T x_i)\right]^{y_i} \left[1 - \sigma(w^T x_i)\right]^{1 - y_i}$
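
Maximizing the likelihood is equivalent to minimizing its negative logarithm, which is convenient for gradient descent. The sketch below assumes a bias column in `X`, a fixed learning rate, and a tiny hypothetical data set that is not linearly separable (so the weights stay finite).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def negative_log_likelihood(w, X, y):
    """-log P(D; w) for binary labels y in {0, 1}."""
    p = sigmoid(X @ w)
    eps = 1e-12                                   # guards against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Gradient descent on the mean negative log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# Hypothetical 1-D data with overlapping classes; first column is the bias term.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

w_fit = fit_logistic(X, y)
print(w_fit, sigmoid(X @ w_fit))
```

The fitted model returns a probability for class 1; thresholding it at 0.5 recovers a linear decision boundary.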

SLIDE 41

Supervised learning Classification

Beyond linearity

Mapping to another feature space (Kernel Trick)

Credit Y. Abu-Mostafa Credit Davidson et al.
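
A small sketch of the mapping idea: two concentric circles cannot be separated by a line in the original plane, but they can after the explicit map $\Phi(x_1, x_2) = (x_1^2, x_2^2)$. The data, the map, and the separating weights are illustrative choices. The kernel trick proper avoids building $\Phi(x)$ and works with inner products only; that step is not shown here.

```python
import numpy as np

def feature_map(X):
    """Explicit map Phi(x1, x2) = (x1^2, x2^2)."""
    return X ** 2

# Hypothetical data: class 0 on a circle of radius 0.5, class 1 on a circle of radius 1.5.
angles = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
unit_circle = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.5 * unit_circle, 1.5 * unit_circle])
y = np.concatenate([np.zeros(30, dtype=int), np.ones(30, dtype=int)])

# In the mapped space z = Phi(x), the classes are split by the line z1 + z2 = 1.
Z = feature_map(X)
w, w0 = np.array([1.0, 1.0]), -1.0
y_pred = (Z @ w + w0 > 0).astype(int)
print((y_pred == y).mean())
```

The decision rule is linear in `Z` but corresponds to a circle of radius 1 in the original coordinates.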

Neural Network

SLIDE 42

Supervised learning Classification

Applications

Identification of model failures in fluid simulations. Ling et al.

SLIDE 43

Supervised learning Classification

Application in physics

Classification of active solar regions

Various magnetic parameters and active regions. Classification with SVM. Bobra et al.

SLIDE 44

Supervised learning Classification

Application in physics

Classification of active solar regions

Distribution of active regions. Classification with random forest. Liu et al.

SLIDE 45

Unsupervised learning : exploring the data

SLIDE 46

Unsupervised learning

Data format

There is no label

We have N samples of dimension d: $D = \{x_1, \dots, x_N\}$, $x \in \mathbb{R}^d$

Partition into K groups: $D = D_1 \cup \dots \cup D_K$

SLIDE 47

Unsupervised learning Clustering

Clustering : K-means

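Idea: minimize the distortion $J$, the sum of squared distances between each sample and its cluster center

$J = \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \|x_i - \mu_k\|^2$, with $r_{ik} = \begin{cases} 1, & \text{if } k = \arg\min_{k' \in \{1, \dots, K\}} \|x_i - \mu_{k'}\|^2 \\ 0, & \text{otherwise} \end{cases}$

Can be seen as an inertia where $\mu_k$ is the center of mass

Two-step Lloyd's algorithm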

  • Assignment step (rik computation)
  • Update step (µk computation)
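
The two steps can be sketched as below. The initial centers are passed explicitly so that the run is reproducible; the toy data (two squares of four points) and the function names are hypothetical.

```python
import numpy as np

def assign(X, centers):
    """Assignment step: index of the nearest center for every sample (the r_ik)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, init_centers, n_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until the centers stop moving."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:                  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)   # update step: center of mass
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = assign(X, centers)
    distortion = ((X - centers[labels]) ** 2).sum()
    return centers, labels, distortion

# Toy data: two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)

centers, labels, distortion = kmeans(X, init_centers=X[[0, 1]])
_, _, distortion_stuck = kmeans(X, init_centers=X[[1, 2]])
print(centers, labels, distortion, distortion_stuck)
```

Starting from the first two samples, the iteration recovers the two groups. Starting from the second and third samples, it stops at a much worse partition, which illustrates that the result depends on the initialization.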

SLIDE 48

Unsupervised learning Clustering

K-Means

[Figure: K-means illustration. (a) Initialization. (b) First iteration. (c) Second iteration. (d) Third iteration.]

SLIDE 49

Unsupervised learning Clustering

K-Means

Advantages

  • Simple
  • Easy to interpret
  • Lloyd’s algorithm usually linear in complexity

Drawbacks

  • No guarantee of reaching the global optimum (the result depends on the random initialization)
  • Sensitive to outliers, noise, and data order
  • Euclidean distance: fixed cluster shape
  • The number of clusters K must be specified

SLIDE 50

Unsupervised learning Clustering

Various algorithms

Credit scikit-learn.org

SLIDE 51

Unsupervised learning Clustering

Application in reconnection

Identification of specific ion distributions.

SLIDE 52

Unsupervised learning Clustering

Application in turbulence

Clustering of various turbulent structures. Bermejo-Moreno et al.

SLIDE 53

Unsupervised learning Dimension reduction

Principal Component Analysis

Karhunen-Loève transform, Proper Orthogonal Decomposition, etc.

Finding directions that explain most variation of data ($\mathbb{R}^p \subset \mathbb{R}^d$)

Idea: orthogonal transformation, first component has highest variance

First component: $w_1 = \arg\max_{\|w\| = 1} \sum_i (x_i \cdot w)^2$

PCA transformation. Source : techannouncer.com/
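
A minimal PCA sketch using the singular value decomposition of the centered data; centering is assumed, since the variance formula for the first component only holds for zero-mean samples. The demo data are hypothetical: points spread mainly along the direction (1, 1) with a small perturbation across it.

```python
import numpy as np

def pca(X, n_components):
    """Return principal directions, explained variances, and projected data."""
    X_centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                 # rows are unit-norm directions w_1, w_2, ...
    explained_variance = singular_values[:n_components] ** 2 / (len(X) - 1)
    scores = X_centered @ components.T             # coordinates in the reduced space
    return components, explained_variance, scores

t = np.linspace(-1.0, 1.0, 21)
along = np.array([1.0, 1.0]) / np.sqrt(2.0)
across = np.array([1.0, -1.0]) / np.sqrt(2.0)
X_demo = 3.0 * t[:, None] * along + 0.1 * np.cos(4.0 * t)[:, None] * across

components, explained_variance, scores = pca(X_demo, n_components=1)
print(components[0], explained_variance)
```

The first component is defined up to a sign, so it may come out as either (1, 1)/√2 or its opposite.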

SLIDE 54

Unsupervised learning Dimension reduction

Dimensionality reduction

Low-dimensional visualization of cancer cells: from 10 features to 2 principal components. Credit Kaggle

SLIDE 55

Unsupervised learning Dimension reduction

Data compression

Image reconstruction with PCA. Nobach et al.

SLIDE 56

Unsupervised learning Dimension reduction

Reduced-order model

Governing equations are projected onto the PCA basis

Finite element model of an F-16 aircraft. Lieu et al.

Aero-elastic evolution of the lift. Lieu et al.

SLIDE 57

Coupling physics with data

SLIDE 58

Add physical constraints

Toward Theory-guided Data Science

Removing physically inconsistent solutions, Karpatne et al.

SLIDE 59

Add physical constraints

Data-driven models

ML-enhanced simulations: computation of flux, specific components of governing equations, etc.

Infer parameters of a model: can add uncertainties

  • How to quantify the information content in the data?
  • Consistency between data and models?
  • The right balance between data and models?

Ideal goal for credible and useful models (Duraisamy et al.):

1. Advances in ML
2. Combines the data-driven with the information content
3. Adds physical and mathematical constraints
4. Acknowledges the assumptions
5. Rigorously quantifies uncertainties

SLIDE 60

Conclusion

SLIDE 61

Conclusion

Conclusion

Need for cross-disciplinary collaboration

Common complaint: black boxes with no physical understanding

Impressive results from Deep Learning algorithms

Since 2016-2017: explosion of DL applications in physics

SLIDE 62

Conclusion

Some Materials

Books

Pattern Recognition and Machine Learning (Bishop)

The Elements of Statistical Learning (Hastie et al.)

Online

Coursera (Andrew Ng's MOOC), CS229 Stanford, scikit-learn

More

Deep Learning (Ian Goodfellow et al.)

SLIDE 63

Conclusion

References

Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation.

Mehta, P. et al. (2019). A high-bias, low-variance introduction to machine learning for physicists. Physics Reports.

Ramprasad, R. et al. (2017). Machine learning in materials informatics: recent applications and prospects. npj Computational Materials, 3(1), 54.

Rouet-Leduc, B. (2017). Machine Learning for Materials Science (Doctoral thesis).

Perdikaris, P. et al. (2015). Multi-fidelity modelling via recursive co-kriging and Gaussian-Markov random fields. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

Davidson, P. et al. (2018). Probabilistic defect analysis of fiber reinforced composites using kriging and support vector machine based surrogates. Composite Structures.

Karpatne, A. et al. (2017). Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on Knowledge and Data Engineering.

Ling, J. and Templeton, J. (2015). Evaluation of machine learning algorithms for prediction of regions of high Reynolds averaged Navier Stokes uncertainty. Physics of Fluids.

Bermejo-Moreno, I. et al. (2008). On the non-local geometry of turbulence. Journal of Fluid Mechanics, 603, 101-135.

Bobra, M. G., & Couvidat, S. (2015). Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm. The Astrophysical Journal.

Liu, C. et al. (2017). Predicting solar flares using SDO/HMI vector magnetic data products and the random forest algorithm. The Astrophysical Journal.

Lieu, T. et al. (2006). Reduced-order fluid/structure modeling of a complete aircraft configuration. Computer Methods in Applied Mechanics and Engineering.

Nobach et al. (2007). Review of Some Fundamentals of Data Processing. Springer Handbook of Experimental Fluid Mechanics.

Duraisamy, K. et al. (2019). Turbulence modeling in the age of data. Annual Review of Fluid Mechanics.

SLIDE 64

Conclusion

Generalization in practice : cross validation
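
K-fold cross validation can be sketched as a generator of index pairs: the data are shuffled once, cut into folds, and each fold serves once as the validation set. The number of folds, the seed, and the function name are assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_folds)
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(10, n_folds=5))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validation:", val_idx)
```

The model is trained once per fold and the validation scores are averaged, which gives a less noisy estimate of the generalization error than a single split when data are scarce.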
