Introduction to Statistical Machine Learning, Marcus Hutter

Introduction to Statistical Machine Learning - 1 - Marcus Hutter Introduction to Statistical Machine Learning Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ ANU RSISE NICTA Machine Learning Summer School MLSS-2009, 26 January – 6 February, Canberra

Introduction to Statistical Machine Learning - 2 - Marcus Hutter Abstract This course provides a brief overview of the methods and practice of statistical machine learning. Its purpose is to (a) give a mini-introduction and background to logicians interested in the AI courses, and (b) summarize the core concepts covered by the machine learning courses during this week. Topics covered include Bayesian inference and maximum likelihood modeling; regression, classification, density estimation, clustering, principal component analysis; parametric, semi-parametric, and non-parametric models; basis functions, neural networks, kernel methods, and graphical models; deterministic and stochastic optimization; overfitting, regularization, and validation.

Introduction to Statistical Machine Learning - 3 - Marcus Hutter Table of Contents 1. Introduction / Overview / Preliminaries 2. Linear Methods for Regression 3. Nonlinear Methods for Regression 4. Model Assessment & Selection 5. Large Problems 6. Unsupervised Learning 7. Sequential & (Re)Active Settings 8. Summary

Intro/Overview/Preliminaries - 4 - Marcus Hutter 1 INTRO/OVERVIEW/PRELIMINARIES • What is Machine Learning? Why Learn? • Related Fields • Applications of Machine Learning • Supervised ↔ Unsupervised ↔ Reinforcement Learning • Dichotomies in Machine Learning • Mini-Introduction to Probabilities

Intro/Overview/Preliminaries - 5 - Marcus Hutter What is Machine Learning? Machine Learning is concerned with the development of algorithms and techniques that allow computers to learn. Learning in this context is the process of gaining understanding by constructing models of observed data with the intention to use them for prediction. Related fields • Artificial Intelligence: smart algorithms • Statistics: inference from a sample • Data Mining: searching through large volumes of data • Computer Science: efficient algorithms and complex models

Intro/Overview/Preliminaries - 6 - Marcus Hutter Why ‘Learn’? There is no need to “learn” to calculate payroll. Learning is used when: • Human expertise does not exist (navigating on Mars), • Humans are unable to explain their expertise (speech recognition), • Solution changes in time (routing on a computer network), • Solution needs to be adapted to particular cases (user biometrics). Example: It is easier to write a program that learns to play checkers or backgammon well by self-play than to convert the expertise of a master player into a program.

Intro/Overview/Preliminaries - 7 - Marcus Hutter Handwritten Character Recognition an example of a difficult machine learning problem Task: Learn general mapping from pixel images to digits from examples

Intro/Overview/Preliminaries - 8 - Marcus Hutter Applications of Machine Learning Machine learning has a wide spectrum of applications including: • natural language processing, • search engines, • medical diagnosis, • detecting credit card fraud, • stock market analysis, • bio-informatics, e.g. classifying DNA sequences, • speech and handwriting recognition, • object recognition in computer vision, • playing games – learning by self-play: Checkers, Backgammon, • robot locomotion.

Intro/Overview/Preliminaries - 9 - Marcus Hutter Some Fundamental Types of Learning • Supervised Learning: Classification, Regression • Unsupervised Learning: Association, Clustering, Density Estimation • Reinforcement Learning: Agents • Others: SemiSupervised Learning, Active Learning

Intro/Overview/Preliminaries - 10 - Marcus Hutter Supervised Learning • Prediction of future cases: Use the rule to predict the output for future inputs • Knowledge extraction: The rule is easy to understand • Compression: The rule is simpler than the data it explains • Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

Intro/Overview/Preliminaries - 11 - Marcus Hutter Classification Example: Credit scoring Differentiating between low-risk and high-risk customers from their Income and Savings. Discriminant: IF income > $\theta_1$ AND savings > $\theta_2$ THEN low-risk ELSE high-risk
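
The discriminant on this slide can be written as a two-threshold rule. The following is a minimal sketch, not a fitted model: the function name `credit_risk`, the threshold values, and the customer records are all hypothetical and chosen only to show how the rule behaves.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the slide's rule: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds and customers, for illustration only.
THETA1, THETA2 = 30_000, 10_000
customers = [(45_000, 20_000), (45_000, 5_000), (20_000, 50_000)]
labels = [credit_risk(inc, sav, THETA1, THETA2) for inc, sav in customers]
print(labels)
```

In a learning setting the two thresholds would be chosen from labelled examples rather than fixed by hand.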

Intro/Overview/Preliminaries - 12 - Marcus Hutter Regression Example: Price $y = f(x) + \text{noise}$ of a used car as a function of age $x$

Intro/Overview/Preliminaries - 13 - Marcus Hutter Unsupervised Learning • Learning “what normally happens” • No output • Clustering: Grouping similar instances • Example applications: Customer segmentation in CRM; Image compression: Color quantization; Bioinformatics: Learning motifs

Intro/Overview/Preliminaries - 14 - Marcus Hutter Reinforcement Learning • Learning a policy: A sequence of outputs • No supervised output but delayed reward • Credit assignment problem • Game playing • Robot in a maze • Multiple agents, partial observability, ...

Intro/Overview/Preliminaries - 15 - Marcus Hutter Dichotomies in Machine Learning • (machine) learning / statistical ⇔ logic/knowledge-based (GOFAI) • induction ⇔ prediction ⇔ decision ⇔ action • regression ⇔ classification • independent identically distributed ⇔ sequential / non-iid • online learning ⇔ offline/batch learning • passive prediction ⇔ active learning • parametric ⇔ non-parametric • conceptual/mathematical ⇔ computational issues • exact/principled ⇔ heuristic • supervised learning ⇔ unsupervised ⇔ RL learning

Intro/Overview/Preliminaries - 16 - Marcus Hutter Probability Basics Probability is used to describe uncertain events; the chance or belief that something is or will be true. Example: Fair Six-Sided Die: • Sample space: $\Omega = \{1, 2, 3, 4, 5, 6\}$ • Events: $\text{Even} = \{2, 4, 6\}$, $\text{Odd} = \{1, 3, 5\} \subseteq \Omega$ • Probability: $P(6) = \frac{1}{6}$, $P(\text{Even}) = P(\text{Odd}) = \frac{1}{2}$ • Outcome: $6 \in E$. • Conditional probability: $P(6 \mid \text{Even}) = \frac{P(6 \text{ and } \text{Even})}{P(\text{Even})} = \frac{1/6}{1/2} = \frac{1}{3}$ General Axioms: • $P(\{\}) = 0 \le P(A) \le 1 = P(\Omega)$, • $P(A \cup B) + P(A \cap B) = P(A) + P(B)$, • $P(A \cap B) = P(A \mid B)\,P(B)$.
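
The die example can be checked by counting outcomes. This is a small sketch assuming a uniform distribution over the six faces; the helper names `prob` and `cond_prob` are illustrative and not part of the slides. Exact fractions are used so the results match the slide's values without rounding.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})   # sample space of a fair die
EVEN = frozenset({2, 4, 6})
ODD = frozenset({1, 3, 5})


def prob(event):
    """P(event) under the uniform distribution on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)


six = frozenset({6})
print(prob(six), prob(EVEN), cond_prob(six, EVEN))  # 1/6 1/2 1/3

# The general axioms from the slide, checked on two example events.
A, B = EVEN, frozenset({1, 2, 3})
assert prob(frozenset()) == 0 and prob(OMEGA) == 1
assert prob(A | B) + prob(A & B) == prob(A) + prob(B)
assert prob(A & B) == cond_prob(A, B) * prob(B)
```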

Intro/Overview/Preliminaries - 17 - Marcus Hutter Probability Jargon Example: (Un)fair coin: $\Omega = \{\text{Tail}, \text{Head}\} \simeq \{0, 1\}$. $P(1) = \theta \in [0, 1]$: • Likelihood: $P(1101 \mid \theta) = \theta \times \theta \times (1 - \theta) \times \theta$ • Maximum Likelihood (ML) estimate: $\hat\theta = \arg\max_\theta P(1101 \mid \theta) = \frac{3}{4}$ • Prior: If we are indifferent, then $P(\theta) = \text{const}$. • Evidence: $P(1101) = \sum_\theta P(1101 \mid \theta)\,P(\theta) = \frac{1}{20}$ (actually $\int$) • Posterior: $P(\theta \mid 1101) = \frac{P(1101 \mid \theta)\,P(\theta)}{P(1101)} \propto \theta^3 (1 - \theta)$ (BAYES RULE!) • Maximum a Posteriori (MAP) estimate: $\hat\theta = \arg\max_\theta P(\theta \mid 1101) = \frac{3}{4}$ • Predictive distribution: $P(1 \mid 1101) = \frac{P(11011)}{P(1101)} = \frac{2}{3}$ • Expectation: $E[f \mid \ldots] = \sum_\theta f(\theta)\,P(\theta \mid \ldots)$, e.g. $E[\theta \mid 1101] = \frac{2}{3}$ • Variance: $\text{Var}(\theta) = E[(\theta - E\theta)^2 \mid 1101] = \frac{2}{63}$ • Probability density: $P(\theta) = \frac{1}{\varepsilon} P([\theta, \theta + \varepsilon])$ for $\varepsilon \to 0$
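
The numbers on this slide can be reproduced exactly. The sketch below assumes the uniform prior $P(\theta) = 1$ on $[0, 1]$ and uses the standard identity $\int_0^1 \theta^a (1 - \theta)^b \, d\theta = \frac{a!\, b!}{(a + b + 1)!}$ for non-negative integers $a, b$. The function names are illustrative, not from the slides.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a, b):
    """Exact value of the integral of theta^a (1 - theta)^b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def counts(seq):
    """Number of heads ('1') and tails ('0') in a string such as '1101'."""
    return seq.count("1"), seq.count("0")


def evidence(seq):
    """P(seq) under a uniform prior on theta."""
    heads, tails = counts(seq)
    return beta_integral(heads, tails)


def posterior_moment(seq, k):
    """E[theta^k | seq] under a uniform prior on theta."""
    heads, tails = counts(seq)
    return beta_integral(heads + k, tails) / beta_integral(heads, tails)


data = "1101"
heads, tails = counts(data)
ml_estimate = Fraction(heads, heads + tails)      # also the MAP under a flat prior
predictive = evidence(data + "1") / evidence(data)
mean = posterior_moment(data, 1)
variance = posterior_moment(data, 2) - mean ** 2

print(ml_estimate, evidence(data), predictive, mean, variance)
# 3/4 1/20 2/3 2/3 2/63
```

Note that the predictive probability of another head equals the posterior mean of $\theta$, which differs from the ML/MAP estimate of 3/4.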

Linear Methods for Regression - 18 - Marcus Hutter 2 LINEAR METHODS FOR REGRESSION • Linear Regression • Coefficient Subset Selection • Coefficient Shrinkage • Linear Methods for Classification • Linear Basis Function Regression (LBFR) • Piecewise linear, Splines, Wavelets • Local Smoothing & Kernel Regression • Regularization & 1D Smoothing Splines

Linear Methods for Regression - 19 - Marcus Hutter Linear Regression fitting a linear function to the data • Input “feature” vector $x := (1 \equiv x^{(0)}, x^{(1)}, \ldots, x^{(d)}) \in \mathbb{R}^{d+1}$ • Real-valued noisy response $y \in \mathbb{R}$. • Linear regression model: $\hat{y} = f_w(x) = w_0 x^{(0)} + \ldots + w_d x^{(d)}$ • Data: $D = (x_1, y_1), \ldots, (x_n, y_n)$ • Error or loss function: Example: Residual sum of squares: $\text{Loss}(w) = \sum_{i=1}^{n} (y_i - f_w(x_i))^2$ • Least squares (LSQ) regression: $\hat{w} = \arg\min_w \text{Loss}(w)$ • Example: Person’s weight $y$ as a function of age $x_1$, height $x_2$.
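
One way to compute the least squares weights is to solve the normal equations $X^\top X\, w = X^\top y$. The sketch below does this in plain Python under the assumption that $X^\top X$ is invertible (more data points than features, no collinear features). The helper names `solve`, `least_squares`, and `predict` and the data are synthetic illustrations, not taken from the slides.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                      # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(xs, ys):
    """Return w minimising the residual sum of squares; w[0] is the intercept."""
    rows = [[1.0] + list(x) for x in xs]              # prepend x^(0) = 1
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, x):
    """Evaluate f_w(x) = w_0 + w_1 x^(1) + ... + w_d x^(d)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Synthetic, noise-free data generated from y = 1 + 2 x1 - 3 x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * a - 3 * b for a, b in xs]
w_hat = least_squares(xs, ys)
print([round(w, 6) for w in w_hat])
```

With noisy data the recovered weights would only approximate the generating ones.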

Linear Methods for Regression - 20 - Marcus Hutter Coefficient Subset Selection Problems with least squares regression if $d$ is large: • Overfitting: The plane fits the data well (perfect for $d \ge n$), but predicts (generalizes) badly. • Interpretation: We want to identify a small subset of features important/relevant for predicting $y$. Solution 1: Subset selection: Take those $k$ out of $d$ features that minimize the LSQ error.
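
Subset selection as stated here can be sketched as an exhaustive search: fit least squares on every choice of $k$ features and keep the one with the smallest residual sum of squares. This is only feasible for small $d$, since the number of subsets grows combinatorially. The helper names and the synthetic data are assumptions for illustration, and the solver assumes each candidate subset gives an invertible system.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, ys):
    """Least squares weights for rows that already contain the constant 1."""
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def rss(w, rows, ys):
    """Residual sum of squares of the linear fit w on the given rows."""
    return sum((y - sum(wi * ri for wi, ri in zip(w, r))) ** 2
               for r, y in zip(rows, ys))


def best_subset(xs, ys, k):
    """Return (error, feature indices, weights) of the best k-feature fit."""
    best = None
    for subset in combinations(range(len(xs[0])), k):
        rows = [[1.0] + [x[j] for j in subset] for x in xs]
        w = fit(rows, ys)
        err = rss(w, rows, ys)
        if best is None or err < best[0]:
            best = (err, subset, w)
    return best


# Synthetic data: y depends on features 0 and 2 only; feature 1 is irrelevant.
xs = [(0, 1, 0), (1, 0, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (1, 2, 2)]
ys = [1 + 2 * x[0] - 3 * x[2] for x in xs]
error, chosen, weights = best_subset(xs, ys, k=2)
print(chosen)
```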

Linear Methods for Regression - 21 - Marcus Hutter Coefficient Shrinkage Solution 2: Shrinkage methods: Shrink the least squares $w$ by penalizing the Loss: Ridge regression: Add $\propto \|w\|_2^2$. Lasso: Add $\propto \|w\|_1$. Bayesian linear regression: Compute the MAP estimate $\arg\max_w P(w \mid D)$ from prior $P(w)$ and sampling model $P(D \mid w)$. Weights of low variance components shrink most.
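
The effect of the two penalties is easiest to see with a single feature and no intercept, where both have closed forms. This is a simplified sketch under that assumption; the penalty weight `lam` and the function names are hypothetical. Ridge minimises $\sum_i (y_i - w x_i)^2 + \lambda w^2$, giving $w = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + \lambda}$. Lasso minimises $\sum_i (y_i - w x_i)^2 + \lambda |w|$, giving a soft-thresholded estimate.

```python
import math


def lsq_slope(xs, ys):
    """Unpenalised least squares slope for the model y = w x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ridge_slope(xs, ys, lam):
    """Minimiser of sum (y - w x)^2 + lam * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimiser of sum (y - w x)^2 + lam * |w| (soft thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Synthetic data with true slope 2.
xs, ys = [1, 2, 3], [2, 4, 6]
for lam in (0, 14, 28, 56):
    print(lam, ridge_slope(xs, ys, lam), lasso_slope(xs, ys, lam))
```

Ridge shrinks the weight smoothly toward zero without reaching it, while lasso sets it exactly to zero once the penalty is large enough, which is why lasso also performs feature selection.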
