SLIDE 1
PATTERN RECOGNITION
AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION
SLIDE 2 Pattern Recognition
Pattern: any regularity in data x.
Pattern recognition: the automatic discovery of regularities in data through computer algorithms, and the use of these regularities to take actions (such as classification).
Pattern Recognition: input x → output y(x)
SLIDE 3
Example
Handwritten Digit Recognition
SLIDE 4 Some Terminologies
Supervised learning: inputs with their corresponding outputs are known.
- Classification: predicting the output as one of a finite number of discrete categories, after supervised learning.
- Regression: predicting the output as a continuous variable, after supervised learning.
Unsupervised learning (density estimation):
- Clustering the data into groups
SLIDE 5 Some Terminologies
Training set: a given set of sample input data used to tune the model parameters.
Target vector: represents the desired output for a given input.
Training phase: determining the precise form of y(x) based on the training data.
Generalization: the ability to correctly predict new data.
Pre-processing: reducing the dimensionality of x.
SLIDE 6
Polynomial Curve Fitting
SLIDE 7
Sum-of-Squares Error Function
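As a small illustration of the sum-of-squares error E(w) = (1/2) Σ_n (y(x_n, w) − t_n)² for a polynomial model y(x, w), a minimal sketch (function and variable names are illustrative):

```python
def poly(x, w):
    """Evaluate the polynomial y(x, w) = w0 + w1*x + ... + wM*x^M."""
    return sum(wj * x**j for j, wj in enumerate(w))

def sum_of_squares_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 over the training set."""
    return 0.5 * sum((poly(x, w) - t) ** 2 for x, t in zip(xs, ts))
```

A perfect fit gives E(w) = 0; larger residuals are penalized quadratically.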
SLIDE 8
0th Order Polynomial
SLIDE 9
1st Order Polynomial
SLIDE 10
3rd Order Polynomial
SLIDE 11
9th Order Polynomial
SLIDE 12
Over-fitting
Root-Mean-Square (RMS) Error: E_RMS = sqrt(2 E(w*) / N)
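The division by N lets us compare training and test sets of different sizes, and the square root puts E_RMS on the same scale as the targets t. A minimal sketch:

```python
import math

def rms_error(E, N):
    """E_RMS = sqrt(2 * E(w*) / N) for a sum-of-squares error E on N points."""
    return math.sqrt(2.0 * E / N)
```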
SLIDE 13
Polynomial Coefficients
SLIDE 14
Data Set Size:
9th Order Polynomial
SLIDE 15
Data Set Size:
9th Order Polynomial
SLIDE 16
Regularization
Penalize large coefficient values: E~(w) = (1/2) Σ_n (y(x_n, w) − t_n)² + (λ/2) ||w||²
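A sketch of the regularized ("ridge") error E~(w) = E(w) + (λ/2)||w||², assuming the polynomial model of the earlier slides (names illustrative):

```python
def regularized_error(w, xs, ts, lam):
    """E~(w) = 1/2 sum_n (y(x_n, w) - t_n)^2 + (lam / 2) * ||w||^2."""
    data_term = 0.5 * sum(
        (sum(wj * x**j for j, wj in enumerate(w)) - t) ** 2
        for x, t in zip(xs, ts)
    )
    penalty = 0.5 * lam * sum(wj * wj for wj in w)
    return data_term + penalty
```

With λ = 0 this reduces to the plain sum-of-squares error; larger λ shrinks the coefficients.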
SLIDE 17
Regularization:
SLIDE 18
Regularization:
SLIDE 19
Regularization: E_RMS vs. ln λ
SLIDE 20
Polynomial Coefficients
SLIDE 21
Probability Theory
Apples and Oranges
SLIDE 22
Probability Theory
Marginal probability
Conditional probability
Joint probability
SLIDE 23
Probability Theory
Sum Rule Product Rule
SLIDE 24
The Rules of Probability
Sum rule: p(X) = Σ_Y p(X, Y)
Product rule: p(X, Y) = p(Y|X) p(X)
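Both rules can be checked numerically on a toy joint distribution (the probabilities below are made up for illustration):

```python
# Toy joint distribution p(X, Y); the probabilities are illustrative.
p_xy = {('a', 0): 0.2, ('a', 1): 0.3, ('b', 0): 0.4, ('b', 1): 0.1}

# Sum rule: p(X) = sum_Y p(X, Y)
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Product rule: p(X, Y) = p(Y|X) p(X), i.e. p(Y|X) = p(X, Y) / p(X)
p_y_given_x = {(x, y): p / p_x[x] for (x, y), p in p_xy.items()}
```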
SLIDE 25
Bayes’ Theorem
p(Y|X) = p(X|Y) p(Y) / p(X), i.e. posterior ∝ likelihood × prior
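The apples-and-oranges setting from slide 21 can be worked through numerically. Assuming the usual numbers for that example (red box: 2 apples, 6 oranges; blue box: 3 apples, 1 orange; the red box is picked with probability 0.4), Bayes' theorem gives the posterior over boxes after observing an orange:

```python
# Prior over boxes and likelihood of drawing an orange from each box
# (numbers assumed for illustration).
prior = {'red': 0.4, 'blue': 0.6}
p_orange_given_box = {'red': 6 / 8, 'blue': 1 / 4}

# Evidence p(F = orange) via the sum and product rules.
p_orange = sum(p_orange_given_box[b] * prior[b] for b in prior)

# Posterior: p(box | orange) = likelihood * prior / evidence.
posterior = {b: p_orange_given_box[b] * prior[b] / p_orange for b in prior}
```

Observing an orange raises the probability of the red box from the prior 0.4 to 2/3.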
SLIDE 26
Probability Densities
SLIDE 27
Transformed Densities
SLIDE 28 Expectations
Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
SLIDE 29
Variances and Covariances
SLIDE 30
The Gaussian Distribution
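A minimal sketch of the univariate Gaussian density N(x | μ, σ²):

```python
import math

def gaussian(x, mu, sigma2):
    """N(x | mu, sigma^2) = (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma^2))."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
```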
SLIDE 31
Gaussian Mean and Variance
SLIDE 32
The Multivariate Gaussian
SLIDE 33
Gaussian Parameter Estimation
Likelihood function
SLIDE 34
Maximum (Log) Likelihood
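For a Gaussian, maximizing the log likelihood gives closed-form estimates: the sample mean, and the sample variance with a 1/N factor. A sketch:

```python
def ml_gaussian(xs):
    """Maximum-likelihood estimates for a univariate Gaussian:
    mu_ML = sample mean; sigma2_ML = sample variance with a 1/N factor,
    which is biased: E[sigma2_ML] = (N - 1)/N * sigma^2."""
    N = len(xs)
    mu = sum(xs) / N
    sigma2 = sum((x - mu) ** 2 for x in xs) / N
    return mu, sigma2
```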
SLIDE 35
Properties of μ_ML and σ²_ML
SLIDE 36
Curve Fitting Re-visited
SLIDE 37 Maximum Likelihood
Determine w_ML by minimizing the sum-of-squares error, E(w).
SLIDE 38
Predictive Distribution
SLIDE 39 MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error, E~(w).
SLIDE 40
Bayesian Curve Fitting
SLIDE 41
Bayesian Predictive Distribution
SLIDE 42
Model Selection
Cross-Validation
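A minimal sketch of S-fold cross-validation index splitting (the model-fitting and scoring steps are omitted; names are illustrative):

```python
def s_fold_splits(n, s):
    """Yield (train_indices, validation_indices) pairs for S-fold
    cross-validation over n data points; the last fold absorbs any remainder."""
    idx = list(range(n))
    fold = n // s
    for k in range(s):
        val = idx[k * fold:(k + 1) * fold] if k < s - 1 else idx[k * fold:]
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        yield train, val
```

Each point appears in exactly one validation fold; setting S = n gives leave-one-out cross-validation.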
SLIDE 43
Curse of Dimensionality
SLIDE 44
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian densities in higher dimensions
SLIDE 45
Decision Theory
Inference step: determine either p(x, t) or p(t|x).
Decision step: for a given x, determine the optimal t.
SLIDE 46
Minimum Misclassification Rate
SLIDE 47 Minimum Expected Loss
Example: classify medical images as 'cancer' or 'normal'.
Loss matrix L_kj: rows indexed by the true class, columns by the decision.
SLIDE 48
Minimum Expected Loss
Decision regions R_j are chosen to minimize the expected loss E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx.
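For the cancer/normal example, picking the decision that minimizes the expected loss Σ_k L_kj p(C_k|x) can be sketched as follows (the loss values are assumed for illustration):

```python
# Hypothetical loss matrix L[(truth, decision)]: missing a cancer is
# penalized far more heavily than a false alarm (values assumed).
loss = {('cancer', 'cancer'): 0.0, ('cancer', 'normal'): 1000.0,
        ('normal', 'cancer'): 1.0, ('normal', 'normal'): 0.0}

def best_decision(posterior):
    """Pick the decision j minimizing sum_k L_kj * p(C_k | x)."""
    decisions = ('cancer', 'normal')
    return min(decisions,
               key=lambda d: sum(loss[(k, d)] * p for k, p in posterior.items()))
```

With this loss matrix even a 1% cancer posterior triggers a 'cancer' decision, since a miss costs 1000 times a false alarm.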
SLIDE 49
Reject Option
SLIDE 50 Why Separate Inference and Decision?
- Minimizing risk (loss matrix may change over time)
- Reject option
- Unbalanced class priors
- Combining models
SLIDE 51
Decision Theory for Regression
Inference step: determine p(x, t).
Decision step: for a given x, make an optimal prediction y(x) for t.
Loss function: L(t, y(x)).
SLIDE 52
The Squared Loss Function
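Under squared loss the optimal prediction is the conditional mean E[t|x]. A quick numeric check (toy numbers) that the mean of a sample minimizes the average squared loss:

```python
ts = [0.0, 1.0, 2.0, 5.0]      # toy target values observed at some x
mean_t = sum(ts) / len(ts)     # conditional mean estimate

def avg_squared_loss(y, ts):
    """Average of (y - t)^2 over the observed targets."""
    return sum((y - t) ** 2 for t in ts) / len(ts)
```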
SLIDE 53
Generative vs Discriminative
Generative approach: model the joint distribution p(x, C_k), then use Bayes' theorem to obtain the posterior p(C_k|x).
Discriminative approach: model the posterior p(C_k|x) directly.
SLIDE 54 Entropy
Important quantity in
- coding theory
- statistical physics
- machine learning
SLIDE 55
Entropy
Coding theory: x is discrete with 8 possible states; how many bits are needed to transmit the state of x? If all states are equally likely, we need log2 8 = 3 bits.
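The entropy H[x] = −Σ_x p(x) log2 p(x) gives the bit count directly; a minimal sketch:

```python
import math

def entropy_bits(ps):
    """H[x] = -sum_x p(x) * log2 p(x); terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in ps if p > 0)
```

Eight equally likely states need 3 bits; a skewed distribution over the same eight states needs fewer on average.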
SLIDE 56
Entropy
SLIDE 57
Entropy
In how many ways can N identical objects be allocated to M bins? Entropy is maximized when all bins are equally likely, p_i = 1/M.
SLIDE 58
Entropy
SLIDE 59
Differential Entropy
Put bins of width Δ along the real line. Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian, in which case H[x] = (1/2)(1 + ln(2πσ²)).
SLIDE 60
Conditional Entropy
SLIDE 61
The Kullback-Leibler Divergence
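A sketch of KL(p‖q) for discrete distributions; it is non-negative and zero only when p = q:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) * ln(p(x) / q(x)), with 0 * ln(0) taken as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Note the asymmetry: KL(p‖q) ≠ KL(q‖p) in general.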
SLIDE 62
Mutual Information
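Mutual information can be computed as I[x, y] = KL(p(x, y) ‖ p(x)p(y)); it vanishes exactly when x and y are independent. A sketch from a discrete joint table:

```python
import math

def mutual_information(p_xy):
    """I[x, y] = sum over (x, y) of p(x, y) * ln( p(x, y) / (p(x) p(y)) )."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():      # marginals by the sum rule
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)
```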