Supervised Learning: Linear Methods (1/2)
Applied Multivariate Statistics, Spring 2013
Overview
- Review: Conditional Probability
- LDA / QDA: Theory
- Fisher’s Discriminant Analysis
- LDA: Example
- Quality control: Test set and cross-validation
- Case study: Digit recognition
Conditional Probability
- T: medical test positive; C: patient has cancer
- (Marginal) probability: P(T), P(C); conditional probability: P(T|C), P(C|T)
- Conditioning changes the sample space: from all people to people with cancer (for P(T|C)) or to people with a positive test (for P(C|T)). Hence P(T|C) can be large while P(C|T) is small.
- Bayes' theorem: P(C|T) = P(T|C) P(C) / P(T)
  (posterior = class-conditional probability x prior / P(T))
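To make the "P(T|C) large, P(C|T) small" effect concrete, here is a minimal R sketch with made-up numbers (the prevalence and test accuracies are illustrative assumptions, not from the slides):

```r
p_C      <- 0.01   # prior / prevalence P(C): rare disease (made-up)
p_T_C    <- 0.95   # sensitivity P(T|C): large (made-up)
p_T_notC <- 0.05   # false-positive rate P(T|not C) (made-up)

p_T   <- p_T_C * p_C + p_T_notC * (1 - p_C)  # law of total probability: P(T)
p_C_T <- p_T_C * p_C / p_T                   # Bayes' theorem: P(C|T)
p_C_T                                        # approx. 0.16: small despite P(T|C) = 0.95
```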
One approach to supervised learning
- Bayes rule: P(C|X) = P(C) P(X|C) / P(X) ∝ P(C) P(X|C)
- Choose the class where P(C|X) is maximal (the rule is "optimal" if all types of error are equally costly)
- Special case, two classes (0/1):
  - choose c = 1 if P(C=1|X) > 0.5, or
  - choose c = 1 if the posterior odds P(C=1|X) / P(C=0|X) > 1
- Prior / prevalence P(C): fraction of samples in that class
- Assume: X|C=c ~ N(μ_c, Σ_c)
- In practice: estimate P(C), μ_c, Σ_c from the training data (a sketch follows below)
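A minimal R sketch of these estimates, using the iris data that appears later in the lecture (the variable names are mine):

```r
X   <- iris[, 1:4]      # predictors
cls <- iris$Species     # class labels

prior <- table(cls) / length(cls)         # P(C): fraction of samples per class
mu    <- lapply(split(X, cls), colMeans)  # mu_c: mean vector per class
Sigma <- lapply(split(X, cls), cov)       # Sigma_c: covariance matrix per class
```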
QDA: Doing the math…
- P(C|X) ∝ P(C) P(X|C), with X|C=c ~ N(μ_c, Σ_c), i.e. density
  P(x|C=c) = (2π)^{-d/2} |Σ_c|^{-1/2} exp(-(1/2) (x - μ_c)^T Σ_c^{-1} (x - μ_c))
- Use the fact: maximizing P(C|X) is equivalent to maximizing log P(C|X)
- Discriminant function:
  δ_c(x) = log P(C=c) + log P(x|C=c)
         = log P(C=c) - (1/2) log|Σ_c| - (1/2) (x - μ_c)^T Σ_c^{-1} (x - μ_c) + const
  (prior term + additional class-dependent term + squared Mahalanobis distance)
- Choose the class where δ_c(x) is maximal (sketch below)
- Special case, two classes: the decision boundary, i.e. the values of x where δ_0(x) = δ_1(x), is quadratic in x
- Quadratic Discriminant Analysis (QDA)
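A sketch of this discriminant function in R, reusing the estimates from the snippet above (repeated here so the block runs on its own; `delta` is my own helper, not the lecture's code):

```r
X   <- iris[, 1:4]; cls <- iris$Species
prior <- table(cls) / length(cls)
mu    <- lapply(split(X, cls), colMeans)
Sigma <- lapply(split(X, cls), cov)

# delta_c(x) = log P(C=c) - (1/2) log|Sigma_c| - (1/2) (x - mu_c)^T Sigma_c^{-1} (x - mu_c)
delta <- function(x, p, m, S)
  log(p) - 0.5 * log(det(S)) - 0.5 * drop(t(x - m) %*% solve(S) %*% (x - m))

x0     <- as.numeric(X[1, ])  # one observation
scores <- mapply(function(p, m, S) delta(x0, p, m, S), prior, mu, Sigma)
names(which.max(scores))      # classify to the class with maximal delta_c(x0)
```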
Simplification
- Assume the same covariance matrix in all classes, i.e. X|C=c ~ N(μ_c, Σ)
- δ_c(x) = log P(C=c) - (1/2) log|Σ| - (1/2) (x - μ_c)^T Σ^{-1} (x - μ_c) + const
         = log P(C=c) - (1/2) (x - μ_c)^T Σ^{-1} (x - μ_c) + const'
         (= log P(C=c) + x^T Σ^{-1} μ_c - (1/2) μ_c^T Σ^{-1} μ_c + const'')
  since -(1/2) log|Σ| is now fixed for all classes, and after expanding the squared Mahalanobis distance the quadratic term -(1/2) x^T Σ^{-1} x does not depend on the class either; the prior term stays
- The decision boundary is now linear in x: Linear Discriminant Analysis (LDA, sketch below)
- Example: to which class should a point be classified when its physical distance in space to both class means is equal (assuming equal priors)?
  To class 0, since its Mahalanobis distance to μ_0 is smaller.
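The corresponding linear score in R, with a pooled covariance estimate (a sketch under the equal-covariance assumption; again the helper names are mine):

```r
X   <- iris[, 1:4]; cls <- iris$Species
prior <- table(cls) / length(cls)
mu    <- lapply(split(X, cls), colMeans)
Sp    <- Reduce(`+`, lapply(split(X, cls), function(g) (nrow(g) - 1) * cov(g))) /
         (nrow(X) - nlevels(cls))  # pooled covariance estimate of Sigma

# linear score: log P(C=c) + x^T Sigma^{-1} mu_c - (1/2) mu_c^T Sigma^{-1} mu_c
lin_score <- function(x, p, m, S)
  log(p) + drop(t(x) %*% solve(S) %*% m) - 0.5 * drop(t(m) %*% solve(S) %*% m)

x0     <- as.numeric(X[1, ])
scores <- mapply(function(p, m) lin_score(x0, p, m, Sp), prior, mu)
names(which.max(scores))  # score is linear in x0: LDA classification
```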
LDA vs. QDA
LDA:
+ Only few parameters to estimate; accurate estimates
- Inflexible (linear decision boundary)

QDA:
- Many parameters to estimate; less accurate estimates
+ More flexible (quadratic decision boundary)
Fisher’s Discriminant Analysis: Idea
Find direction(s) in which groups are separated best
[Figure: scatter plot of two groups, with arrows for the 1st Principal Component and the 1st Linear Discriminant = 1st Canonical Variable]
- Class Y, predictors X = (X_1, ..., X_p)
- Projection Z = w^T X
- Find w so that the groups are separated best along Z
- Measure of separation: Rayleigh coefficient
  J(w) = D(Z) / Var(Z), where D(Z) = (E[Z|Y=0] - E[Z|Y=1])^2
- With E[X|Y=k] = μ_k and Var(X|Y=k) = Σ:
  E[Z|Y=k] = w^T μ_k, Var(Z) = w^T Σ w (two-group sketch below)
- Concept extendable to many groups
[Figure: two projection directions for the same data; J(w) is large when D(Z) is large relative to Var(Z), and small otherwise]
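For two groups, maximizing J(w) has the closed-form solution w ∝ Σ^{-1}(μ_0 - μ_1). A small R sketch, using two of the iris species as stand-in groups (my choice of data, not from the slides):

```r
two <- droplevels(iris[iris$Species != "virginica", ])  # two groups of equal size
X <- two[, 1:4]; y <- two$Species

mu0 <- colMeans(X[y == levels(y)[1], ])
mu1 <- colMeans(X[y == levels(y)[2], ])
Sw  <- (cov(X[y == levels(y)[1], ]) + cov(X[y == levels(y)[2], ])) / 2  # pooled (equal n)

w <- solve(Sw, mu0 - mu1)   # direction with maximal Rayleigh coefficient J(w)
z <- as.matrix(X) %*% w     # projection Z = w^T X; the groups separate along z
```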
LDA and Linear Discriminants
- Direction with largest J(w): 1st linear discriminant (LD 1)
- Orthogonal to LD 1, again with largest J(w): LD 2
- etc.
- At most min(number of dimensions, number of groups - 1) LDs;
  e.g., 3 groups in 10 dimensions need 2 LDs
- Computed using an eigenvalue decomposition or a singular value decomposition
- Proportion of trace: % of the variance between group means captured by each LD
- R: function «lda» in package MASS does LDA and computes the linear discriminants («qda» is also available)
Example: Classification of Iris flowers
[Figure: photos of Iris setosa, Iris versicolor, and Iris virginica]
Task: classify the species according to sepal/petal length/width
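In R, with the built-in iris data (a minimal sketch):

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)  # LDA on sepal/petal length/width
fit                                   # prints priors, group means, LD coefficients,
                                      # and the proportion of trace per LD
head(predict(fit)$x)                  # scores on the linear discriminants LD1, LD2
```

Note that judging this fit on the same data it was trained on is over-optimistic, which is exactly the point of the next slide.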
Quality of classification
- Using the training data also as test data leads to overfitting:
  the error estimate is too optimistic for new data
- Better: a separate test set
- Cross-validation (CV; e.g., "leave-one-out" cross-validation):
  every row is the test case once, with the remaining rows as training data (sketch below)
[Figure: the data split into a training part and a test part]
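«lda» in MASS performs leave-one-out cross-validation directly via `CV = TRUE`; a sketch on the iris data:

```r
library(MASS)

cv <- lda(Species ~ ., data = iris, CV = TRUE)  # leave-one-out CV
head(cv$class)                                  # class predicted for each row when
                                                # that row was left out of training
```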
Measures for prediction error
- Confusion matrix (e.g., 100 samples):

                 Truth = 0   Truth = 1   Truth = 2
  Estimate = 0       23          7           6
  Estimate = 1        3         27           4
  Estimate = 2        3          1          26

- Error rate:
  1 - sum(diagonal entries) / (number of samples) = 1 - 76/100 = 0.24
- We expect that our classifier predicts 24% of new observations incorrectly (this is just a rough estimate)
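In R, the confusion matrix and error rate for, e.g., the leave-one-out predictions from the sketch above:

```r
library(MASS)

cv  <- lda(Species ~ ., data = iris, CV = TRUE)          # LOO predictions
tab <- table(Estimate = cv$class, Truth = iris$Species)  # confusion matrix
1 - sum(diag(tab)) / sum(tab)                            # error rate
```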
Example: Digit recognition
- 7129 hand-written digits
- Each (centered) digit was put on a 16x16 grid
- Measure the grey value in each cell of the grid, i.e. 256 grey values per digit
[Figure: sample of digits; example with an 8x8 grid]
Concepts to know
- Idea of LDA / QDA
- Meaning of Linear Discriminants
- Cross Validation
- Confusion matrix, error rate
R functions to know
- lda