Pattern Recognition
Bertrand Thirion and John Ashburner

Outline: Introduction (Definitions; Classification and Regression; Curse of Dimensionality), Generalization, Overview of the main methods, Resources.
Supervised learning: the data come with additional attributes that we want to predict ⇒ classification and regression.
Unsupervised learning: no target values. Discover groups of similar examples within the data (clustering), determine the distribution of data within the input space (density estimation), or project the data down to two or three dimensions for visualization.
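These two settings can be sketched in a few lines of numpy (the toy data, the nearest-centroid rule, and the 2-means loop are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian blobs in 2-D: class 0 around (0,0), class 1 around (4,4).
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised: use the labels to fit a nearest-centroid classifier.
centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
def predict(x_new):
    return int(np.argmin(np.linalg.norm(centroids - x_new, axis=1)))

# Unsupervised: ignore the labels and run a few iterations of 2-means.
means = X[[0, -1]].copy()
for _ in range(10):
    assign = np.argmin(((X[:, None, :] - means[None]) ** 2).sum(-1), axis=1)
    means = np.array([X[assign == k].mean(axis=0) for k in (0, 1)])
```

With labels, the per-class centroids are given directly; without them, 2-means has to discover a similar grouping on its own.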
We have a training dataset of n observations, each consisting of an input xi and a target yi. Each input xi consists of a vector of p features:
D = {(xi, yi) | i = 1, …, n}
The aim is to predict the target for a new input x∗.
Targets (y) are categorical labels. Train with D and use the result to make a best guess of the label for a new input x∗.
[Figure: classification example showing two classes of points in a two-dimensional feature space.]
Targets (y) are categorical labels. Train with D and compute P(y∗ = k|x∗, D).
[Figure: probabilistic classification example; shading indicates P(y∗ = k|x∗, D) across a two-dimensional feature space.]
Targets (y) are continuous real variables. Train with D and compute p(y∗|x∗, D).
[Figure: regression example with continuous target values plotted over a two-dimensional feature space.]
Multi-class classification: when there are more than two possible categories.
Ordinal regression: for classification when there is a natural ordering among the categories.
Chu, Wei, and Zoubin Ghahramani. "Gaussian processes for ordinal regression." Journal of Machine Learning Research 6 (2005): 1019-1041.
Multi-task learning: when there are multiple targets to predict, which may be related. Etc.
Multinomial logistic regression: theoretically optimal, but an expensive optimization.
One-versus-all classification [SVMs]: among several hyperplanes, choose the one with maximal margin ⇒ recommended.
One-versus-one classification: vote across each pair of classes. Expensive, and not optimal.
[Figure: decision boundaries in a two-dimensional feature space, showing sharp corners.]
The separations are not nice and smooth: there are lots of sharp corners. This may be improved with K-nearest neighbours.
[Figure: volume of a hyper-sphere of radius 1/2 versus number of dimensions (circle area = πr²; sphere volume = (4/3)πr³). The enclosed volume falls rapidly towards zero as the dimensionality grows.]
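The shrinking volume in the figure follows from the closed-form volume of a p-dimensional ball, V = π^(p/2) r^p / Γ(p/2 + 1), which generalizes the circle-area and sphere-volume formulas above; a quick check:

```python
from math import pi, gamma

def ball_volume(p, r=0.5):
    """Volume of a p-dimensional ball of radius r."""
    return pi ** (p / 2) * r ** p / gamma(p / 2 + 1)

# Fraction of the unit cube occupied by the inscribed ball (r = 1/2):
# it collapses towards zero as the number of dimensions grows.
fractions = {p: ball_volume(p) for p in (2, 3, 10, 20)}
```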
Generalization: Assessing Generalizability; Accuracy Measures
“Everything should be kept as simple as possible, but no simpler.”
— Einstein (allegedly)
Complex models (with many estimated parameters) usually explain training data better than simpler models. However, simpler models often generalise better to new data than more complex models. We need to find the model with the optimal bias/variance tradeoff.
Real Bayesians don't cross-validate (except when they need to).
P(M|D) = p(D|M) P(M) / p(D)
The Bayes factor allows the plausibility of two models (M1 and M2) to be compared:
K = p(D|M1) / p(D|M2) = ∫ p(D|θ1, M1) p(θ1|M1) dθ1 / ∫ p(D|θ2, M2) p(θ2|M2) dθ2
This is usually too costly in practice, so approximations are used.
Some approximations/alternatives to the Bayesian approach:
Laplace approximations: find the MAP/ML solution and use a Gaussian approximation to the parameter uncertainty.
Minimum Message Length (MML): an information theoretic approach.
Minimum Description Length (MDL): an information theoretic approach based on how well the model compresses the data.
Akaike Information Criterion (AIC): −2 log p(D|θ) + 2k, where k is the number of estimated parameters.
Bayesian Information Criterion (BIC): −2 log p(D|θ) + k log q, where q is the number of observations.
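As a small illustration of the AIC/BIC formulas (the toy Gaussian models and data below are made up for this sketch), compare a model with a fixed mean (k = 1 free parameter) against one that also fits the mean (k = 2):

```python
import math, random

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
q = len(data)  # number of observations

def gauss_loglik(xs, mu, var):
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in xs)

# Model 1: mu fixed at 0, fit the variance (k = 1).
# Model 2: fit both mu and the variance (k = 2).
mu_hat = sum(data) / q
var0 = sum(x * x for x in data) / q
var1 = sum((x - mu_hat) ** 2 for x in data) / q

ll1 = gauss_loglik(data, 0.0, var0)
ll2 = gauss_loglik(data, mu_hat, var1)

aic1, aic2 = -2 * ll1 + 2 * 1, -2 * ll2 + 2 * 2
bic1, bic2 = -2 * ll1 + 1 * math.log(q), -2 * ll2 + 2 * math.log(q)
```

The richer model always achieves a log-likelihood at least as high; the 2k and k log q terms penalize it for the extra parameter.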
An inner cross-validation loop is used to set hyper-parameters, with an outer loop used to evaluate the model's performance.
Safe, but costly. Supported by some libraries (e.g. scikit-learn).
Some estimators have a path of models, allowing faster evaluation (e.g. LASSO).
Randomized techniques also exist, and are sometimes more efficient.
Caveat: the inner cross-validation loop (for parameter selection) must be kept distinct from the outer cross-validation loop (for performance evaluation).
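A pure-numpy sketch of the nested scheme (ridge regression and the λ grid are illustrative stand-ins; libraries such as scikit-learn provide ready-made tools for this):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=60)

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

lambdas = [0.01, 0.1, 1.0, 10.0]
outer_folds = np.array_split(np.arange(60), 3)
outer_scores = []
for test_idx in outer_folds:
    train_idx = np.setdiff1d(np.arange(60), test_idx)
    Xtr, ytr = X[train_idx], y[train_idx]
    # Inner loop: choose lambda by 2-fold CV on the training part only.
    inner = np.array_split(np.arange(len(train_idx)), 2)
    def inner_score(lam):
        s = 0.0
        for val in inner:
            fit = np.setdiff1d(np.arange(len(train_idx)), val)
            w = ridge_fit(Xtr[fit], ytr[fit], lam)
            s += mse(Xtr[val], ytr[val], w)
        return s
    best_lam = min(lambdas, key=inner_score)
    # Outer loop: test error of the refit model with the chosen lambda.
    w = ridge_fit(Xtr, ytr, best_lam)
    outer_scores.append(mse(X[test_idx], y[test_idx], w))
```

The outer test folds never influence the choice of λ, which is the point of the caveat above.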
Root-mean-square error: for point predictions.
Correlation coefficient: for point predictions.
Log predictive probability: for probabilistic predictions.
Expected loss/risk: for point predictions used in decision making.
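A minimal sketch of the first three measures (the numbers are made up for illustration):

```python
import math

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
n = len(y_true)

# Root-mean-square error for point predictions.
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Pearson correlation coefficient between targets and predictions.
mt = sum(y_true) / n
mp = sum(y_pred) / n
cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
corr = cov / math.sqrt(sum((t - mt) ** 2 for t in y_true)
                       * sum((p - mp) ** 2 for p in y_pred))

# Mean log predictive probability for probabilistic binary predictions.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.7]          # predicted P(y = 1)
logp = sum(math.log2(p if t == 1 else 1 - p)
           for t, p in zip(labels, probs)) / len(labels)
```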
Wikipedia contributors, "Sensitivity and specificity," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Sensitivity_and_specificity&oldid=655245669 (accessed April 9, 2015).
The Receiver operating characteristic (ROC) curve is a plot of true-positive rate (sensitivity) versus false-positive rate (1-specificity) over the full range of possible thresholds. The area under the curve (AUC) is the integral under the ROC curve.
[Figure: ROC curve (AUC = 0.9769), sensitivity versus 1 − specificity.]
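The threshold sweep and trapezoidal AUC can be sketched directly (labels and scores below are made up for illustration):

```python
# Labels and classifier scores (higher score = more likely positive).
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pos = sum(labels)
neg = len(labels) - pos

# Sweep the threshold over every score: each threshold gives one ROC point.
points = []
for thr in sorted(set(scores), reverse=True):
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= thr)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= thr)
    points.append((fp / neg, tp / pos))      # (1 - specificity, sensitivity)
points = [(0.0, 0.0)] + points

# AUC by the trapezoidal rule along the swept curve.
auc = sum((x1 - x0) * (y0 + y1) / 2
          for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For these scores the AUC is 15/16 = 0.9375, matching the rank-based reading of AUC as the fraction of positive/negative pairs ranked correctly.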
Some data are more easily classified than others. Probabilistic classifiers provide a level of confidence for each prediction: p(y∗|x∗, y, X, θ). The quality of predictions can be assessed using the test log predictive probability:
(1/m) Σ_{i=1..m} log2 p(y∗i = ti | x∗i, y, X, θ)
After subtracting a baseline measure, this shows the average number of bits of information that each prediction conveys about the true label.
Rasmussen & Williams. "Gaussian Processes for Machine Learning", MIT Press (2006). http://www.gaussianprocess.org/gpml/
Overview of the main methods: Simple Generative Models (Naive Bayes, Linear Discriminant Analysis); Simple Discriminative Models (Gaussian Processes, Support Vector Machines); Model Averaging
Only one rule: No tool wins in all situations.
P(y=k|x) = P(y=k) p(x|y=k) / p(x)
[Figure: ground-truth class labels in a two-dimensional feature space.]
P(y=k|x) = P(y=k) p(x|y=k) / p(x)
Assumes a shared covariance: p(x|y=k) = N(x|μk, Σ)
[Figure panels: joint densities p(x,y=0) = p(x|y=0)p(y=0) and p(x,y=1) = p(x|y=1)p(y=1), the evidence p(x) = p(x,y=0) + p(x,y=1), and the posterior p(y=0|x) = p(x,y=0)/p(x), over a two-dimensional feature space.]
Model has 2p + p(p+1)/2 parameters to estimate (two means and a single symmetric covariance). Number of observations is pn (size of inputs).
P(y=k|x) = P(y=k) p(x|y=k) / p(x)
Assumes different covariances: p(x|y=k) = N(x|μk, Σk)
[Figure panels: joint densities p(x,y=0) and p(x,y=1), the evidence p(x), and the posterior p(y=0|x), over a two-dimensional feature space, now with class-specific covariances.]
Model has 2p + p(p+1) parameters to estimate (two means and two symmetric covariances). Number of observations is pn.
P(y=k|x) = P(y=k) p(x|y=k) / p(x)
Assumes that the features are independent: p(x|y=k) = ∏_{i=1..p} p(xi|y=k)
[Figure panels: joint densities p(x,y=0) and p(x,y=1), the evidence p(x), and the posterior p(y=0|x), over a two-dimensional feature space, under the independence assumption.]
Model has a variable number of parameters to estimate, but the above example has 3p. Number of observations is pn.
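The naive-Bayes factorization above can be sketched directly with per-class, per-feature Gaussians (toy data and parameter choices are illustrative, not the slides' example):

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy 2-class data with p = 2 features per class.
X0 = rng.normal([0, 0], [1.0, 0.5], size=(100, 2))
X1 = rng.normal([3, 2], [1.0, 0.5], size=(100, 2))

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Per-class, per-feature means and variances (here each class has its own
# variances; sharing them across classes gives the 3p-style count).
stats = [(Xk.mean(0), Xk.var(0)) for Xk in (X0, X1)]
prior = [0.5, 0.5]

def posterior(x):
    # p(x, y=k) = P(y=k) * prod_i p(x_i | y=k), then normalize by p(x).
    joint = np.array([prior[k] * np.prod(gauss(np.asarray(x), *stats[k]))
                      for k in (0, 1)])
    return joint / joint.sum()

p_at_class1_centre = posterior([3.0, 2.0])
```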
A simple way to do regression is: f(x∗) = wᵀx∗
Assuming Gaussian noise on y, the ML estimate of w is given by: ŵ = (XᵀX)⁻¹Xᵀy
where X = [x1 x2 … xn]ᵀ and y = [y1 y2 … yn]ᵀ.
Model has p parameters to estimate. Number of observations is n (number of targets). Usually needs dimensionality reduction, with (e.g.) SVD.
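A minimal numpy sketch of the ML estimate (the toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = rng.normal(size=(n, p))                  # rows of X are the inputs x_i
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.05 * rng.normal(size=n)   # Gaussian noise on the targets

# ML estimate w_hat = (X^T X)^{-1} X^T y; solve the normal equations
# rather than forming the inverse explicitly.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

def f(x_star):
    return float(w_hat @ x_star)
```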
We may have prior knowledge about various distributions:
p(y∗|x∗, w) = N(wᵀx∗, σ²)
p(w) = N(0, Σ0)
Therefore, p(w|y, X) = N(σ⁻²B⁻¹Xᵀy, B⁻¹), where B = σ⁻²XᵀX + Σ0⁻¹
The maximum a posteriori (MAP) estimate of w is: ŵ = σ⁻²B⁻¹Xᵀy
We may have prior knowledge about various distributions:
p(y∗|x∗, w) = N(wᵀx∗, σ²)
p(w) = N(0, Σ0)
Therefore, p(w|y, X) = N(σ⁻²B⁻¹Xᵀy, B⁻¹), where B = σ⁻²XᵀX + Σ0⁻¹
Predictions are made by integrating out the uncertainty of the weights, rather than estimating them:
p(y∗|x∗, y, X) = ∫ p(y∗|x∗, w) p(w|y, X) dw = N(σ⁻²x∗ᵀB⁻¹Xᵀy, x∗ᵀB⁻¹x∗ + σ²)
Estimated parameters may be σ2, and parameters encoding Σ0.
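The posterior and the weight-integrated prediction can be sketched as follows (toy data; the noise variance is treated as known for simplicity):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 40, 2
X = rng.normal(size=(n, p))
w_true = np.array([1.0, -1.0])
sigma2 = 0.1 ** 2                         # noise variance, assumed known
y = X @ w_true + np.sqrt(sigma2) * rng.normal(size=n)

Sigma0 = np.eye(p)                        # prior covariance of the weights
B = X.T @ X / sigma2 + np.linalg.inv(Sigma0)
B_inv = np.linalg.inv(B)
w_map = B_inv @ X.T @ y / sigma2          # posterior mean = MAP estimate

def predict(x_star):
    """Predictive mean and variance with the weights integrated out."""
    mean = float(x_star @ w_map)
    var = float(x_star @ B_inv @ x_star + sigma2)
    return mean, var

m, v = predict(np.array([1.0, 1.0]))
```

The predictive variance always exceeds the noise floor σ², and grows for test inputs poorly covered by the training data.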
B⁻¹ = (σ⁻²XᵀX + Σ0⁻¹)⁻¹ (invert a p × p matrix)
    = Σ0 − Σ0Xᵀ(Iσ² + XΣ0Xᵀ)⁻¹XΣ0 (invert an n × n matrix)
Wikipedia contributors, "Woodbury matrix identity," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Woodbury_matrix_identity&oldid=638370219 (accessed April 1, 2015). (A + UCV)⁻¹ = A⁻¹ − A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹.
The predicted distribution is:
p(y∗|x∗, y, X) = N(kᵀC⁻¹y, c − kᵀC⁻¹k)
where:
C = XΣ0Xᵀ + Iσ²
k = XΣ0x∗
c = x∗ᵀΣ0x∗ + σ²
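A numpy sketch of these predictive equations, with a squared-exponential kernel playing the role of XΣ0Xᵀ (an illustrative choice; toy 1-D data):

```python
import numpy as np

rng = np.random.default_rng(5)
# 1-D toy training data from a smooth function, with a little noise.
X = np.linspace(-3, 3, 20).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=20)
sigma2 = 0.05 ** 2

def kern(A, B, ell=1.0):
    """Squared-exponential kernel standing in for X Sigma0 X^T."""
    d2 = (A[:, None, 0] - B[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / ell ** 2)

C = kern(X, X) + sigma2 * np.eye(len(X))      # C = K + I sigma^2
alpha = np.linalg.solve(C, y)                 # C^{-1} y

def predict(x_star):
    """Predictive mean k^T C^{-1} y and variance c - k^T C^{-1} k."""
    k = kern(X, np.array([[x_star]]))[:, 0]
    c = 1.0 + sigma2                          # k(x*, x*) + sigma^2
    return float(k @ alpha), float(c - k @ np.linalg.solve(C, k))

m_in, v_in = predict(0.0)     # inside the range of the data
m_far, v_far = predict(10.0)  # far from the data: variance grows
```

Far from the training inputs the mean reverts to the prior and the variance approaches c, which is the behaviour that makes the predictions usefully probabilistic.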
Sometimes, we want alternatives to C = XΣ0Xᵀ + Iσ². Nonlinearity is achieved by replacing the matrix K = XΣ0Xᵀ with some function of the data that gives a positive definite matrix encoding similarities, e.g.:
k(xi, xj) = θ1 + θ2 xiᵀxj + θ3 exp(−‖xi − xj‖² / (2θ4²))
Non-linear methods are useful in low-dimensional settings to adapt the shape of decision boundaries. For large-p, small-n problems, nonlinear methods do not seem to help much. Nonlinearity also reduces interpretability.
Regression: continuous targets, y ∈ R. Usually assume a Gaussian distribution: p(y|x, w) = N(f(x, w), σ²), where σ² is a variance.
Binary classification: categorical targets, y ∈ {0, 1}. Usually assume a binomial distribution: p(y|x, w) = σ(f(x, w))^y (1 − σ(f(x, w)))^(1−y), where σ is a squashing function.
For binary classification: p(y∗ = 1|x∗, w) = σ(f (x∗, w)) where σ is some squashing function, eg: Logistic sigmoid function (inverse of Logit). Normal CDF (inverse of Probit).
[Figure: the logistic sigmoid and the inverse probit (normal CDF) squashing functions σ(f∗).]
Integrating over the uncertainty allows probabilistic predictions further from the training data. This is not usually done for methods such as the relevance vector machine (RVM).
Rasmussen, Carl Edward, and Joaquin Quinonero-Candela. “Healing the relevance vector machine through augmentation.” In Proceedings of the 22nd international conference on Machine learning, pp. 689-696. ACM, 2005.
[Figure: decision boundaries and hyperplane uncertainty for simple logistic regression versus Bayesian logistic regression in a two-dimensional feature space.]
Making probabilistic predictions involves:
1. Computing the distribution of a latent variable corresponding to the test data (cf. regression):
p(f∗|x∗, y, X) = ∫ p(f∗|x∗, f) p(f|y, X) df
2. Using this distribution to give a probabilistic prediction:
P(y∗ = 1|x∗, y, X) = ∫ σ(f∗) p(f∗|x∗, y, X) df∗
Unfortunately, these integrals are analytically intractable, so approximations are needed.
Approximate methods for probabilistic classification include:
The Laplace Approximation (LA): fastest, but less accurate.
Expectation Propagation (EP): more accurate than the Laplace approximation, but slightly slower.
MCMC methods: the "gold standard", but very slow because many random samples must be drawn.
Nickisch, Hannes, and Carl Edward Rasmussen. "Approximations for Binary Gaussian Process Classification." Journal of Machine Learning Research 9 (2008): 2035-2078.
t = σ(f (x∗)) where σ is some squashing function, eg: Logistic function (inverse of Logit). Normal CDF (inverse of Probit). Hinge loss (support vector machines)
In practice, the hinge and logistic losses yield a convex estimation problem and are preferred:
min_w Σ_{i=1..n} L(yi, xi, w) + λR(w)  (M-estimators framework)
L is the loss function (hinge, logistic, quadratic, ...)
R is the regularizer (typically a norm on w)
λ > 0 balances the two terms L and R
Convex → unique minimizer (SVMs, ℓ2-logistic, ℓ1-logistic).
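The M-estimator template can be sketched for the ℓ2-regularized logistic loss (toy data; plain gradient descent is an illustrative solver, not the slides' method):

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy, nearly separable two-class data.
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# M-estimator: min_w (1/n) sum_i L_logistic(y_i, x_i, w) + (lam/2)||w||^2.
# The objective is convex, so gradient descent reaches the unique minimizer.
lam = 0.1
w = np.zeros(2)
for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w
    w -= 0.1 * grad

accuracy = float(np.mean((sigmoid(X @ w) > 0.5) == (y == 1)))
```

Swapping the logistic loss for the hinge loss (and the solver accordingly) gives an SVM-style estimator within the same template.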
SVMs are reasonably fast, accurate, and easy to tune (C = 10³ is a good default, with no dramatic failure). Multi-class: one-versus-one or one-versus-all.
Combining predictions from weak learners:
Bootstrap aggregating (bagging): train several weak classifiers, with different models or randomly drawn subsets of the data, then average their predictions with equal weight.
Boosting: a family of approaches in which models are weighted according to their accuracy. AdaBoost is popular, but has problems with target noise.
Bayesian model averaging: really a model selection method; relatively ineffective for combining models.
Bayesian model combination: shows promise.
Monteith, et al. "Turning Bayesian model averaging into Bayesian model combination." Neural Networks (IJCNN), The 2011 International Joint Conference on. IEEE, 2011.
Boosting: sequentially reduce the bias of the combined estimator. Examples: AdaBoost, Gradient Tree Boosting, ...
Averaging: build several estimators independently and average their predictions. Examples: bagging methods, forests of randomized trees, ...
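A bagging sketch with a deliberately weak base learner (nearest-centroid on bootstrap resamples; the data and learner are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def fit_centroid(Xb, yb):
    """Weak learner: nearest-centroid classifier on one bootstrap sample."""
    c = np.array([Xb[yb == k].mean(axis=0) for k in (0, 1)])
    return lambda x: np.argmin(np.linalg.norm(c - x, axis=1))

# Bagging: train each learner on a random bootstrap resample of the data,
# then average their predictions with equal weight (a majority vote).
learners = []
for _ in range(25):
    idx = rng.integers(0, len(y), len(y))
    learners.append(fit_centroid(X[idx], y[idx]))

def predict(x):
    votes = [clf(x) for clf in learners]
    return int(round(sum(votes) / len(votes)))
```

Averaging over resamples reduces the variance of the weak learner without changing its bias, which is why it pairs well with high-variance base models such as deep decision trees.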
Resources
The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009). http://statweb.stanford.edu/~tibs/ElemStatLearn/
An Introduction to Statistical Learning with Applications in R. Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (2013). http://www-bcf.usc.edu/%7Egareth/ISL/
Introduction to Machine Learning. Amnon Shashua (2008). http://arxiv.org/pdf/0904.3664.pdf
Bayesian Reasoning and Machine Learning. David Barber (2014). http://www.cs.ucl.ac.uk/staff/d.barber/brml/
Gaussian Processes for Machine Learning. Carl Edward Rasmussen and Christopher K. I. Williams (2006). http://www.gaussianprocess.org/gpml/chapters/
Information Theory, Inference, and Learning Algorithms. David J.C. MacKay (2003). http://www.inference.phy.cam.ac.uk/itila/book.html
Kernel Machines. http://www.kernel-machines.org/
The Gaussian Processes Web Site includes links to software.
SVM - Support Vector Machines includes links to software. http://www.support-vector-machines.org/
Pascal Video Lectures. http://videolectures.net/pascal
Spider: object-oriented environment for machine learning in MATLAB.
GPML: Gaussian processes for supervised learning.
Pronto: MATLAB machine learning toolbox for neuroimaging, with a GUI. Implements many ML concepts; continuity with SPM.
Scikit-learn: generic ML in Python. Complete, high-quality, well-documented reference implementations.
Nilearn: Python interface to Scikit-learn for neuroimaging. Easy to use and install; good visualization.
PyMVPA: Python tool for ML, with advanced features (pipelines, hyperalignment).