SLIDE 1

Supervised Learning

Part 1 — Theory

Sven Krippendorf
 Workshop on Big Data in String Theory Boston, 01.12.2017

SLIDE 2

Content

  • Theory
  • Applications: Mathematica
  • Discussion
SLIDE 3

Def: Supervised Learning

Supervised learning is the machine learning task of inferring a function from labelled training data. Workflow (a code sketch follows the list):

  1. Determine training examples
  2. Prepare the training set
  3. Decide how to represent the input objects
  4. Decide how to represent the output objects
  5. Choose your algorithm
  6. Run the algorithm; adjust/determine its parameters
  7. Evaluate the accuracy
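
A minimal sketch of this workflow in Mathematica; the data, labels, and test points here are invented for illustration:

```mathematica
(* Steps 1-4: hypothetical labelled examples; inputs are points, outputs are class labels *)
trainingData = {{1., 2.} -> "above", {2., 1.} -> "below",
   {3., 7.} -> "above", {6., 2.} -> "below"};

(* Steps 5-6: Classify selects and tunes an algorithm automatically *)
classifier = Classify[trainingData];

(* Step 7: evaluate accuracy on held-out test examples *)
testData = {{2., 5.} -> "above", {5., 1.} -> "below"};
ClassifierMeasurements[classifier, testData, "Accuracy"]
```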
SLIDE 4

Learning algorithms

  • Support Vector Machines
  • Naive Bayes
  • Linear discriminant analysis
  • Decision trees
  • k-nearest neighbour algorithm
  • Neural networks
SLIDE 5

Known issues

  • Bias-variance tradeoff
  • Function complexity and amount of training data
  • Dimensionality of input space
  • Noise in output values
  • Heterogeneous data
SLIDE 6

Examples

  • Geometric classification
  • Handwritten number recognition (the "harmonic oscillator" of ML)
  • Voice recognition (spectral features)
SLIDE 7

A first problem

[Plot: data points in the (x, y) plane, axes 0 to 10, split by a straight line]

Classify the data into two classes. Class 1: above the line; Class 2: below the line. Input: data points.

SLIDE 8

SVM

[Plot: the same data with several candidate separating lines]

Which line?

SLIDE 9

SVM

  • An SVM (support vector machine) identifies the line that maximally separates the two data sets.
  • This makes the classification stable against "perturbations" of the data.

[Plot: the two classes with the maximum-margin line and its parallel margin boundaries]

The margin boundaries satisfy w·x_i − b ≥ 1 for one class and w·x_i − b ≤ −1 for the other; the distance between them is 2/|w|.

SLIDE 10

SVM

  • How is this line determined? By constrained minimisation: passing to the dual problem via Lagrange multipliers turns it into a problem for quadratic-programming algorithms.
  • These are readily implemented in standard environments (Mathematica, Matlab, Python, etc.); see the sketch below.
  • In higher dimensions the separating line becomes a plane or hyperplane.
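
For instance, an SVM can be requested explicitly through Classify; a minimal sketch with invented data:

```mathematica
(* Hypothetical data: points labelled by whether they lie above the line y = x *)
pts = RandomReal[{0, 10}, {200, 2}];
labels = If[#[[2]] > #[[1]], "above", "below"] & /@ pts;

(* Method -> "SupportVectorMachine" selects the SVM algorithm *)
svm = Classify[Thread[pts -> labels], Method -> "SupportVectorMachine"];
svm[{3., 8.}]  (* expected: "above" *)
```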
SLIDE 11

SVM: hard margin vs soft margin

  • A soft margin permits outliers at a penalty; this can be a better fit to noisy data.

[Plot: soft-margin SVM with a few outliers lying inside the margin]

SLIDE 12

A second problem

[Plots: two classes of points in the (x, y) plane that no straight line can separate]

SLIDE 13

SVM: Kernel trick

Different representation of data via kernel map:

{x, y} → {x², y²}   or   {x, y} → {x², y}

[Plots: the data before and after the kernel map; after the map the two classes are linearly separable]
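
A sketch of the same idea with hypothetical radial data: applying the map by hand before training turns the circular boundary into a straight line.

```mathematica
(* Hypothetical data: the class depends only on the distance from the origin *)
pts = RandomReal[{-3, 3}, {300, 2}];
labels = If[Norm[#] < 2, "inside", "outside"] & /@ pts;

(* Kernel map {x, y} -> {x^2, y^2}: the circle x^2 + y^2 = 4 becomes a line *)
mapped = {#[[1]]^2, #[[2]]^2} & /@ pts;
svm = Classify[Thread[mapped -> labels], Method -> "SupportVectorMachine"];

(* New points must pass through the same map before classification *)
svm[{1.^2, 0.5^2}]  (* expected: "inside" *)
```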
SLIDE 14

Linear discriminant analysis

  • Use information about the mean and covariance of the data set to distinguish classes.
  • Set a threshold on the discriminant to identify the class:

[Plots: two Gaussian point clouds, and a histogram of discriminant values with the threshold T marked]

(x − μ₀)ᵀ Σ₀⁻¹ (x − μ₀) + log|Σ₀| − (x − μ₁)ᵀ Σ₁⁻¹ (x − μ₁) − log|Σ₁| < threshold
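
The discriminant can be coded directly; a minimal sketch with two invented Gaussian classes and threshold 0:

```mathematica
(* Two hypothetical Gaussian classes *)
class0 = RandomVariate[MultinormalDistribution[{2, 2}, IdentityMatrix[2]], 200];
class1 = RandomVariate[MultinormalDistribution[{5, 5}, IdentityMatrix[2]], 200];

(* Estimate the mean and covariance of each class *)
{m0, m1} = Mean /@ {class0, class1};
{s0, s1} = Covariance /@ {class0, class1};

(* The discriminant above; a value below the threshold means class 0 *)
disc[x_] := (x - m0).Inverse[s0].(x - m0) + Log[Det[s0]] -
    (x - m1).Inverse[s1].(x - m1) - Log[Det[s1]];

disc[{2.5, 2.5}] < 0  (* expected: True, i.e. class 0 *)
```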

SLIDE 15

k-nearest neighbour

  • Classify a data point according to the classes of its k nearest neighbours.
  • Choosing k is a trade-off: small k gives clear boundaries but more noise; large k gives less noise but less clear boundaries.

[Plots: k-nearest-neighbour decision regions for small and large k]
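
In Classify the neighbour count can be set by hand; a sketch with invented noisy data ("NeighborsNumber" is, as far as I know, the relevant suboption name):

```mathematica
(* Hypothetical noisy two-class data *)
pts = RandomReal[{0, 10}, {200, 2}];
labels = If[#[[2]] - #[[1]] + RandomReal[{-1, 1}] > 0, "A", "B"] & /@ pts;

(* Small k: sharp but noisy boundary; large k: smooth boundary *)
knn3 = Classify[Thread[pts -> labels],
   Method -> {"NearestNeighbors", "NeighborsNumber" -> 3}];
knn15 = Classify[Thread[pts -> labels],
   Method -> {"NearestNeighbors", "NeighborsNumber" -> 15}];

{knn3[{5., 5.}], knn15[{5., 5.}]}
```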

SLIDE 16

Neural Network

  • d (data), w (weights), b (biases). Linear layer: w_ij d_j + b_i
  • Softmax layer: softmax(d_i) = e^{d_i} / Σ_j e^{d_j}
  • Loss function: captures how far the network's output is from the desired (true) output.

[Diagram: input layer, hidden layers, and output layer, each connection carrying d, w, b]
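
These building blocks exist directly in Mathematica's neural-network framework (11.1+); a minimal sketch with invented data:

```mathematica
(* Linear layer (w.d + b) followed by a softmax over two classes *)
net = NetChain[{LinearLayer[2], SoftmaxLayer[]}, "Input" -> 2];

(* Hypothetical training data: 2d inputs mapped to class indices 1 or 2 *)
data = {{1., 2.} -> 1, {2., 1.} -> 2, {0.5, 3.} -> 1, {4., 0.5} -> 2};

(* Cross-entropy loss captures how far the output is from the desired one *)
trained = NetTrain[net, data, LossFunction -> CrossEntropyLossLayer["Index"]];
trained[{1., 3.}]  (* probability vector over the two classes *)
```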

SLIDE 17

Hand-written number recognition

  • Input: a 28×28 matrix with entries {0, 1}
  • A simple network taking every entry as an input and using one layer (w.x + b) achieves a success rate of 89%.
  • More sophisticated networks achieve far higher accuracy.

[Figure: the 28×28 binary pixel matrix of a handwritten digit]
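
A sketch of this single-layer model, assuming the MNIST set available through ExampleData:

```mathematica
(* MNIST: lists of rules, image -> digit label *)
trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

(* One linear layer (w.x + b) plus a softmax over the 10 digits *)
net = NetChain[{FlattenLayer[], LinearLayer[10], SoftmaxLayer[]},
   "Input" -> NetEncoder[{"Image", {28, 28}, ColorSpace -> "Grayscale"}],
   "Output" -> NetDecoder[{"Class", Range[0, 9]}]];

trained = NetTrain[net, trainingData];

(* Fraction of correct predictions on the test set; roughly 90% for this model *)
predictions = trained[Keys[testData]];
N[Mean[Boole[MapThread[Equal, {predictions, Values[testData]}]]]]
```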

SLIDE 18

Voice recognition

  • A voice signal:

[Plot: waveform of the voice signal, amplitude versus sample number]

  • Representing it via a wavelet transform (spectrogram):

[Plot: spectrogram of the signal]
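
In Mathematica this preprocessing is essentially a one-liner (the file name is hypothetical):

```mathematica
(* Hypothetical recording; Import returns an Audio object *)
audio = Import["voice.wav"];

(* Time-frequency representation; the image can then be fed to a classifier *)
Spectrogram[audio]
```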

SLIDE 19

String theory example: Dimers

[Figure: dimer diagram for dP1 with labelled nodes]

W_dP1 = X₂₃ Y₃₁ Z₁₂ − X₁₂ Y₃₁ Z₂₃ + X₃₆ Y₆₂ Z₂₃ − X₂₃ Y₆₂ Z₃₆ − X₃₆ Y₂₃ Z₁₂ Φ₆₁ + X₁₂ Y₂₃ Z₃₆ Φ₆₁

  • Bounds on the number of families (arXiv:1002.1790)
SLIDE 20

End of Part 1

SLIDE 21

Supervised Learning

Part 2 — Applications

Sven Krippendorf
 Workshop on Big Data in String Theory Boston, 01.12.2017

SLIDE 22

Disclaimer

  • There are many tools you can use…
  • I will just talk about one at a very basic level: Mathematica

… simply because it's quick for me and I assume people are familiar with it.

SLIDE 23

Mathematica

SLIDE 24

Mathematica

  • You need version 11.1.1 or later…
SLIDE 25

Example 1: basic SVM

  • Let’s switch to notebook01.nb
SLIDE 26

Example 2: kernel trick

  • Let’s look at notebook02.nb
SLIDE 27

Example 3: kernel trick

  • Let’s look at notebook03.nb
SLIDE 28

Thank you.

Let’s discuss applications…

SLIDE 29

Supervised Learning

Part 3 — Discussion

Sven Krippendorf
 Workshop on Big Data in String Theory Boston, 01.12.2017