

  1. Supervised Learning Part 1 — Theory Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017

  2. Content • Theory • Applications: Mathematica • Discussion

  3. Def: Supervised Learning Supervised learning is the machine learning task of inferring a function from labelled training data. Workflow: 
 1. Determine training examples 
 2. Prepare the training set 
 3. How to represent the input object 
 4. How to represent the output object 
 5. Determine your algorithm 
 6. Run the algorithm, adjust/determine parameters 
 7. Evaluate accuracy
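 A minimal Mathematica sketch of this workflow, on hypothetical toy data (points labelled by the sign of y), using the built-in Classify:

    (* steps 1-4: generate labelled examples; inputs are {x, y} pairs, outputs are class labels *)
    makeExample := With[{x = RandomReal[{-10, 10}], y = RandomReal[{-10, 10}]},
       {x, y} -> If[y > 0, "Class1", "Class2"]];
    train = Table[makeExample, {500}];
    test = Table[makeExample, {100}];

    (* steps 5-6: choose and run an algorithm (here chosen automatically) *)
    c = Classify[train];

    (* step 7: evaluate accuracy on held-out data *)
    ClassifierMeasurements[c, test, "Accuracy"]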

  4. Learning algorithms • Support Vector Machines • Naive Bayes • Linear discriminant analysis • Decision trees • k-nearest neighbour algorithm • Neural networks
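 Several of these are available directly as Method settings of Mathematica's Classify (method names as in the Classify documentation), reusing the toy data above:

    cSVM = Classify[train, Method -> "SupportVectorMachine"];
    cNB = Classify[train, Method -> "NaiveBayes"];
    cTree = Classify[train, Method -> "DecisionTree"];
    cKNN = Classify[train, Method -> "NearestNeighbors"];
    cNet = Classify[train, Method -> "NeuralNetwork"];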

  5. Known issues • Bias-variance tradeoff • Function complexity and amount of training data • Dimensionality of input space • Noise in output values • Heterogeneous data

  6. Examples • Geometric classification • Handwritten number recognition - the harmonic oscillator of ML • Voice recognition (spectral features)

  7. A 1st problem • Classify data into two classes: Class 1: above the line; Class 2: below the line • Input: data points (x, y) 
 [Scatter plot of the data points in the (x, y) plane, axes −10 to 10, with the separating line.]

  8. SVM • Which line? 
 [The same scatter plot with several candidate separating lines; axes −10 to 10.]

  9. SVM • SVMs (support vector machines) identify the line maximally separating the data sets: the two margin boundaries satisfy 
 w · x_i − b ≥ 1 (class 1),  w · x_i − b ≤ −1 (class 2), 
 and the margin between them has width 2/|w|. 
 [Scatter plot with the maximum-margin line and the two margin boundaries; axes −10 to 10.] 
 • Useful to be stable against “perturbations”

  10. SVM • How are these lines determined? Minimisation with constraints; the dual problem is obtained using Lagrange multipliers and can then be solved with quadratic programming algorithms. • These are readily implemented in standard environments (Mathematica, Matlab, Python, etc.) • In higher dimensions: plane, hyperplane
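 For instance, a sketch of the linear SVM for the first problem in Mathematica; the "KernelType" suboption name follows the Classify documentation and may vary between versions:

    svm = Classify[train,
       Method -> {"SupportVectorMachine", "KernelType" -> "Linear"}];
    svm[{3., 7.}]   (* classify a new data point *)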

  11. SVM: hard margin vs soft margin • A soft margin adds a penalty for outliers; it might be a better fit to the data 
 [Scatter plot with a separating line that tolerates a few points on the wrong side; axes −10 to 10.]
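 The soft-margin objective can also be written down and minimised directly: |w|^2/2 plus a penalty C for each margin violation. A minimal sketch with hypothetical data and labels y_i = ±1:

    pts = Table[{RandomReal[{-10, 10}], RandomReal[{-10, 10}]}, {50}];
    ys = If[Last[#] > 0, 1, -1] & /@ pts;   (* hypothetical labels *)
    cPen = 1.0;   (* penalty strength for outliers *)

    (* soft-margin objective: |w|^2/2 + C * sum of hinge losses *)
    obj = (w1^2 + w2^2)/2 + cPen*Total[MapThread[
         Max[0., 1. - #2*({w1, w2}.#1 - b)] &, {pts, ys}]];
    NMinimize[obj, {w1, w2, b}]   (* returns {objective value, {w1 -> ..., w2 -> ..., b -> ...}} *)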

  12. 2nd problem 
 [Two scatter plots in the (x, y) plane, axes −10 to 10: the two classes are not separable by a straight line.]

  13. SVM: Kernel trick • Different representation of the data via a kernel map, e.g. {x, y} → {x^2, y} or {x, y} → {x^2, y^2} 
 [Plots of the data before and after each map; after the map the classes become linearly separable.]
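 A sketch of the second map, {x, y} → {x^2, y^2}, on hypothetical data separated by a circle; after the map a linear classifier suffices:

    raw = Table[{RandomReal[{-10, 10}], RandomReal[{-10, 10}]}, {300}];
    label[p_] := If[Norm[p] < 6, "inside", "outside"];   (* circular classes *)

    (* apply the map, then train a classifier on the transformed points *)
    mapped = ({#[[1]]^2, #[[2]]^2} -> label[#]) & /@ raw;
    cMapped = Classify[mapped, Method -> "SupportVectorMachine"];
    cMapped[{2.^2, 3.^2}]   (* classify the image of the point {2, 3} *)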

  14. Linear discriminant analysis • Use information about the mean/variance of the data set to distinguish classes: 
 (x − μ_0)^T Σ_0^{−1} (x − μ_0) + log|Σ_0| − (x − μ_1)^T Σ_1^{−1} (x − μ_1) − log|Σ_1| < threshold 
 • Set the threshold to identify the class 
 [Scatter plot of the two classes and a histogram of discriminant values against the threshold T.]
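 A sketch of this discriminant computed from the sample means and covariances of two hypothetical Gaussian classes:

    class0 = RandomVariate[MultinormalDistribution[{-2., 0.}, IdentityMatrix[2]], 200];
    class1 = RandomVariate[MultinormalDistribution[{2., 0.}, IdentityMatrix[2]], 200];
    {m0, m1} = Mean /@ {class0, class1};
    {s0, s1} = Covariance /@ {class0, class1};

    (* the discriminant from this slide; |Sigma| is the determinant *)
    disc[x_] := (x - m0).Inverse[s0].(x - m0) + Log[Det[s0]] -
       (x - m1).Inverse[s1].(x - m1) - Log[Det[s1]];

    disc[{1.5, 0.3}] < 0   (* True -> assign class 0, False -> class 1 *)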

  15. k-nearest neighbour • Classify a data point according to its k nearest neighbours 
 [Three scatter plots: with more noise the class boundaries are less clear, with less noise they are clearer; k controls the smoothing.]
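 A k-nearest-neighbour sketch with Classify, again on the toy data; the "NeighborsNumber" suboption name follows the Classify documentation:

    (* smaller k -> more flexible but noisier boundaries; larger k -> smoother *)
    knn = Classify[train, Method -> {"NearestNeighbors", "NeighborsNumber" -> 5}];
    knn[{1., 2.}]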

  16. Neural Network 
 [Diagram: input layer → hidden layer → output layer; each layer carries data d, weights w, biases b.] 
 • d (data), w (weight), b (bias). Linear layer: d_i ↦ w_{ij} d_j + b_i • Softmax layer: softmax(d)_i = e^{d_i} / Σ_j e^{d_j} • Loss function, capturing how far the network output is from the true (desired) output
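 In Mathematica's neural network framework (version 11.1 or later) this structure is, as a sketch:

    (* a linear layer d -> w.d + b followed by a softmax over 2 classes *)
    net = NetInitialize@NetChain[{LinearLayer[2], SoftmaxLayer[]}, "Input" -> 4];
    net[{0.1, 0.5, -0.3, 0.7}]   (* two numbers summing to 1: class probabilities *)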

  17. Handwritten number recognition 
 [28×28 binary pixel matrix of a handwritten digit.] 
 • Input: 28×28 matrix with entries {0, 1} • With a simple network, taking every entry as an input and using one layer (w.x + b), a success rate of 89% is achieved • More sophisticated networks achieve incredible accuracy
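 A sketch of the one-layer digit classifier; MNIST is assumed to be available via the Wolfram Data Repository, and the exact loading call may differ between versions:

    mnist = ResourceData["MNIST"];   (* list of (image -> digit) rules *)

    (* flatten the 28x28 image, apply w.x + b, take a softmax over the 10 digits *)
    net = NetChain[{FlattenLayer[], LinearLayer[10], SoftmaxLayer[]},
       "Input" -> NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}],
       "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

    trained = NetTrain[net, mnist];
    trained[mnist[[1, 1]]]   (* predict the digit of the first training image *)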

  18. Voice recognition • A voice signal 
 [Waveform plot: amplitude about ±0.5 over roughly 7 seconds.] 
 • Representing it via a wavelet transform (spectrogram) 
 [Spectrogram plot.]
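 A sketch with a synthetic test signal; Spectrogram is built into Mathematica, and a real recording (an Audio object) works the same way:

    (* a rising-frequency test tone, 2 seconds at 8 kHz *)
    signal = Table[Sin[2. Pi (200. + 50. t) t], {t, 0., 2., 1./8000.}];
    Spectrogram[signal, SampleRate -> 8000]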

  19. String theory example: Dimers 
 [Dimer/quiver diagram for dP_1 with nodes labelled 1, 2, 3, 6.] 
 W_{dP_1} = X_{23} Y_{31} Z_{12} − X_{12} Y_{31} Z_{23} + X_{36} Y_{62} Z_{23} − X_{23} Y_{62} Z_{36} − X_{36} Y_{23} Z_{12} Φ_{61} + X_{12} Y_{23} Z_{36} Φ_{61} 
 • bounds on number of families (1002.1790)

  20. end of part 1

  21. Supervised Learning Part 2 — Applications Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017

  22. Disclaimer • There are many tools you can use… • I just talk about one at a very basic level: Mathematica 
 … simply because it’s quick for me and I assume people are familiar with it.

  23. Mathematica

  24. Mathematica • You need version 11.1.1 or later…

  25. Example 1: basic SVM • Let’s switch to notebook01.nb

  26. Example 2: kernel trick • Let’s look at notebook02.nb

  27. Example 3: kernel trick • Let’s look at notebook03.nb

  28. Thank you. Let’s discuss applications…

  29. Supervised Learning Part 3 — Discussion Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017
