Pre-seminar: What is ML? Geneva University, Jan 2020



slide-1
SLIDE 1

Geneva University, Jan 2020

Pre-seminar

What is ML?

slide-2
SLIDE 2
  • What is Machine learning?
  • What is Dimensionality reduction?
  • What is Clustering Analysis?
  • What am I doing with ML?
slide-3
SLIDE 3

What is Machine learning?

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

Classification vs. regression
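
A minimal sketch of the distinction, not taken from the slides: classification predicts a discrete label, while regression predicts a continuous value. The toy data, the labels "faint"/"bright", and the function names `nearest_neighbour_label` and `fit_line` are illustrative assumptions.

```python
# Toy contrast between classification (discrete output) and
# regression (continuous output). Data and names are illustrative.

def nearest_neighbour_label(x, train_x, train_y):
    """Classification: return the label of the closest training point (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [0.0, 1.0, 2.0, 3.0]
labels = ["faint", "faint", "bright", "bright"]  # discrete targets
values = [1.0, 3.0, 5.0, 7.0]                    # continuous targets, y = 2x + 1

predicted_label = nearest_neighbour_label(2.6, xs, labels)  # a category
slope, intercept = fit_line(xs, values)
predicted_value = slope * 2.6 + intercept                   # a number
```

Both models see the same inputs; only the type of target, and therefore the type of prediction, differs.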

slide-8
SLIDE 8

Supervised vs Unsupervised

slide-9
SLIDE 9

What is Dimensionality reduction?
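
As a rough illustration of the idea, the sketch below reduces 3-D points to a single feature using the leading principal component, estimated by power iteration. This is one simple technique chosen for illustration; the synthetic data and the function names `first_principal_component` and `project` are assumptions, not material from the slides.

```python
import math
import random


def first_principal_component(data, n_iter=200):
    """Estimate the leading principal component by power iteration
    on the sample covariance matrix."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(data, mean, component):
    """Map each point to its coordinate along the given component."""
    d = len(mean)
    return [sum((row[j] - mean[j]) * component[j] for j in range(d))
            for row in data]


rng = random.Random(0)
# Synthetic 3-D points that vary mostly along the direction (1, 1, 0).
data = []
for _ in range(200):
    t = rng.gauss(0.0, 3.0)
    data.append([t + rng.gauss(0, 0.1), t + rng.gauss(0, 0.1), rng.gauss(0, 0.1)])

mean, pc1 = first_principal_component(data)
reduced = project(data, mean, pc1)  # 3 features per point -> 1 feature per point
```

Because nearly all of the variance lies along one direction, the single projected coordinate retains most of the information in the original three features.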

slide-10
SLIDE 10
slide-11
SLIDE 11

What is Clustering Analysis?
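
A minimal sketch of one common clustering method, k-means (Lloyd's algorithm), on two synthetic blobs. The choice of k-means, the farthest-point initialisation, the data, and the names `k_means` and `assign` are assumptions made for illustration; the slides do not specify an algorithm here.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centres):
    """Index of the nearest centre for every point."""
    return [min(range(len(centres)), key=lambda j: squared_distance(p, centres[j]))
            for p in points]


def k_means(points, k, n_iter=50):
    """Lloyd's algorithm with a simple farthest-point initialisation."""
    centres = [list(points[0])]
    while len(centres) < k:
        # Next centre: the point farthest from its nearest existing centre.
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centres))
        centres.append(list(farthest))
    for _ in range(n_iter):
        labels = assign(points, centres)
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:  # assignments have stopped changing
            break
        centres = new_centres
    return centres, assign(points, centres)


rng = random.Random(1)
blob_a = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
blob_b = [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(50)]
points = blob_a + blob_b

centres, labels = k_means(points, k=2)
```

No labels are supplied: the grouping comes only from the geometry of the points, which is what makes clustering an unsupervised task.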

slide-12
SLIDE 12

What am I doing with ML?

slide-13
SLIDE 13

RFI mitigation

slide-14
SLIDE 14

Inpainting CMB

slide-15
SLIDE 15

Inpainting CMB

slide-16
SLIDE 16

Inpainting CMB

slide-17
SLIDE 17

Medical physics

slide-18
SLIDE 18

Mining cosmic datasets

+ some cool DS stuff

Alireza Vafaei Sadr

IPM, Tehran

Geneva University, Jan 2020

slide-19
SLIDE 19

Outline

  • Cosmology and BIG data
  • A Quick Review of Applications
  • Anomaly detection
  • DRAMA
  • Future directions

slide-20
SLIDE 20
slide-21
SLIDE 21
  • E. Siegel, with images derived from ESA/Planck and the DoE/NASA/NSF interagency task force on CMB research. From his book, Beyond The Galaxy.

Cosmology/Astrophysics

slide-22
SLIDE 22

Cosmology and Big data

slide-23
SLIDE 23
slide-24
SLIDE 24

It is getting hotter!

Number of submitted physics manuscripts that include “machine learning” in their abstracts.

slide-25
SLIDE 25

A quick review of what people have done

slide-26
SLIDE 26

Classification

http://cdn.spacetelescope.org/archives/images/screen/heic9902o.jpg

slide-27
SLIDE 27

https://www.galaxyzoo.org/

Galaxy zoo challenge

slide-28
SLIDE 28
slide-29
SLIDE 29

Detection

slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32

Data cleansing

slide-33
SLIDE 33
slide-34
SLIDE 34
slide-35
SLIDE 35
slide-36
SLIDE 36

Simulation

slide-37
SLIDE 37

https://towardsdatascience.com/do-gans-really-model-the-true-data-distribution-or-are-they-just-cleverly-fooling-us-d08df69f25eb

slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41

https://arxiv.org/pdf/1905.08233v1.pdf

slide-42
SLIDE 42

Light from quasar pairs reaches Earth, although some of it is absorbed by gas in the cosmic web. Credit: Springel et al. (2005) (cosmic web) / J. Neidel, MPIA

Anomaly detection

slide-43
SLIDE 43
slide-44
SLIDE 44


slide-45
SLIDE 45

Norris, R. P. (2017). Discovering the unexpected in astronomical survey data. Publications of the Astronomical Society of Australia, 34.

slide-46
SLIDE 46
slide-47
SLIDE 47
slide-48
SLIDE 48

Current projects in SKA (MeerKAT) DS team:

  • Source detection (Mightee, SKAch-I)
  • Anomaly detection (WTF, PLAsTiCC)
  • RFI simulation/mitigation
  • Extended source simulation
  • Observation strategy

slide-49
SLIDE 49
  • A. Vafaei Sadr, B. Bassett, M. Kunz

ISCMI-2019

slide-50
SLIDE 50

No Free Lunch Theorems

  • Any two optimization algorithms are equivalent when their performance is averaged across all possible problems.
  • No anomaly detection algorithm works for all anomalies.
  • No anomaly algorithm is “best” on average. Different algorithms work for different anomalies, so let's consider families of anomaly algorithms.

slide-51
SLIDE 51

Dimensionality Reduction Anomaly Meta Algorithm

https://github.com/vafaei-ar/drama

slide-52
SLIDE 52

Dimensionality reduction

slide-53
SLIDE 53

Clustering

slide-54
SLIDE 54

As an example: MNIST

slide-55
SLIDE 55

DRAMA (Dimensionality Reduction Anomaly Meta-Algorithm):

slide-56
SLIDE 56
slide-57
SLIDE 57

Dimensionality Reduction Technique

  • Autoencoder
  • Variational autoencoder
  • Principal component analysis
  • Independent component analysis
  • Non-negative matrix factorization

Also newly added:

  • Convolutional (V)AE (1D)
  • Convolutional (V)AE (2D)
  • Convolutional UMAP
slide-58
SLIDE 58

Clustering

slide-59
SLIDE 59

Clustering

slide-60
SLIDE 60

Distance metrics
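
A hedged sketch of how a distance metric can turn cluster centres into anomaly scores. The metric names (L1, L2, L4, Chebyshev, Canberra) follow those listed later in the deck; the scoring rule used here (distance to the nearest prototype) and every function name are assumptions for illustration and are not the DRAMA package's API.

```python
def minkowski(a, b, p):
    """L_p distance: p=1 is L1 (Manhattan), p=2 is L2 (Euclidean), p=4 is L4."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """L-infinity distance: the largest single-coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Canberra distance; terms with a zero denominator are counted as 0."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def anomaly_scores(points, prototypes, metric):
    """Assumed scoring rule: distance to the nearest prototype
    (a larger score means a more unusual point)."""
    return [min(metric(p, c) for c in prototypes) for p in points]


prototypes = [[0.0, 0.0], [5.0, 5.0]]            # hypothetical cluster centres
points = [[0.1, -0.2], [5.2, 4.9], [2.5, 9.0]]   # the last point is far from both

scores_l2 = anomaly_scores(points, prototypes, lambda a, b: minkowski(a, b, 2))
scores_cheb = anomaly_scores(points, prototypes, chebyshev)
most_anomalous = max(range(len(points)), key=lambda i: scores_l2[i])
```

Swapping the `metric` argument changes which points look unusual, which is why the choice of metric is treated as a tunable part of the method.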

slide-61
SLIDE 61

Comparison with LOF and i-forest:

Real Simulated

slide-62
SLIDE 62

Benchmark metrics:

slide-63
SLIDE 63
slide-64
SLIDE 64

AE, VAE, PCA, ICA, NMF, …?

L1, L2, L4, Chebyshev, Canberra, …?

Semi-supervised or active learning

DRAMA vs. LOF vs. iforest

slide-65
SLIDE 65
slide-66
SLIDE 66

Local anomaly in low dimension nf=100

slide-67
SLIDE 67

New class anomaly in low dimension nf=100

slide-68
SLIDE 68

Local anomaly in high dimension nf=3000

slide-69
SLIDE 69

New class anomaly in high dimension nf=3000

slide-70
SLIDE 70

Averaged on real data

slide-71
SLIDE 71

Widefield ouTlier Finder (WTF)

Includes 337 boring objects and 17 interesting ones.

AUC: 87, MCC: 26, RWS: 31

Legend: boring, interesting
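
For reference, a small sketch of two of the reported metrics under their standard definitions: AUC (area under the ROC curve) and MCC (Matthews correlation coefficient). The toy labels and scores are invented for illustration and are unrelated to the WTF results; RWS is not reproduced because the slides do not define it. Both functions return values on a 0 to 1 (or -1 to 1) scale, so reading the slide's figures as percentages is an assumption.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def matthews_corrcoef(labels, predictions):
    """MCC from the confusion-matrix counts; returns 0.0 if undefined."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: 1 = interesting (anomaly), 0 = boring.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]          # invented anomaly scores
predictions = [1 if s > 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
mcc = matthews_corrcoef(labels, predictions)
```

AUC depends only on the ranking of the scores, whereas MCC depends on a chosen threshold, so the two can disagree when classes are as imbalanced as 337 to 17.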

slide-72
SLIDE 72

Future directions

  • 2D convolutions
  • PLAsTiCC (Kaggle competitions)
  • Deep learning based clustering
  • Graphical user interface
  • Active learning mode
  • Detect and classify
  • ...
slide-73
SLIDE 73

Thanks for your attention.

You can find DRAMA here

https://github.com/vafaei-ar/drama

slide-74
SLIDE 74