
Introduction to Machine Learning ML-Basics: Components of Supervised Learning

  1. Introduction to Machine Learning ML-Basics: Components of Supervised Learning. Learning goals: Know the three components of a learner: hypothesis space, risk, optimization. Understand that defining these separately is the basic design of a learner. Know a variety of choices for all three components. [Figure: fitted regression line with θ = (−1.7, 1.3)^⊤ and R_emp = 5.88]

  2. COMPONENTS OF SUPERVISED LEARNING Summarizing what we have seen before, many supervised learning algorithms can be described in terms of three components: Learning = Hypothesis Space + Risk + Optimization. Hypothesis Space: Defines (and restricts!) what kind of model f can be learned from the data. Risk: Quantifies how well a specific model performs on a given data set. This allows us to rank candidate models in order to choose the best one. Optimization: Defines how to search for the best model in the hypothesis space, i.e., the model with the smallest risk.
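As an illustration of this decomposition, here is a minimal sketch in Python (the function names, toy data, and naive grid search are mine, not part of the slides): the hypothesis space is a parametrized family of functions, the risk scores each candidate on the data, and optimization searches the parameter space for the smallest risk.

    import numpy as np

    # Hypothesis space: univariate linear models f(x) = theta0 + theta1 * x
    def f(x, theta):
        return theta[0] + theta[1] * x

    # Risk: empirical risk of a candidate theta, here the sum of squared errors on the data
    def risk(theta, x, y):
        return np.sum((y - f(x, theta)) ** 2)

    # Optimization: naive grid search over a small parameter grid
    def optimize(x, y, grid=np.linspace(-3, 3, 61)):
        return min(((risk((t0, t1), x, y), (t0, t1)) for t0 in grid for t1 in grid))[1]

    # Usage on toy data
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([-1.6, -0.3, 1.0, 2.2])
    theta_hat = optimize(x, y)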

  3. COMPONENTS OF SUPERVISED LEARNING This can be extended by regularization, where the model complexity is accounted for in the risk: Learning = Hypothesis Space + Risk + Optimization becomes Learning = Hypothesis Space + Loss (+ Regularization) + Optimization. For now you can just think of the risk as the sum of the losses. While this is a useful framework for most supervised ML problems, it does not cover all special cases: some ML methods are not defined via risk minimization, and for some models it is not possible (or very hard) to explicitly define the hypothesis space.
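Written out, with the standard symbols λ for the regularization strength and J(f) for the complexity penalty (neither is introduced on the slide itself), the regularized risk is the sum of the per-observation losses plus the penalty:

    R_reg(f) = R_emp(f) + λ J(f) = \sum_{i=1}^{n} L(y^{(i)}, f(x^{(i)})) + λ J(f)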

  4. VARIETY OF LEARNING COMPONENTS The framework is a good orientation so we do not get lost in the variety of options. Hypothesis Space: step functions, linear functions, sets of rules, neural networks, Voronoi tessellations, ... Risk / Loss: mean squared error, misclassification rate, negative log-likelihood, information gain, ... Optimization: analytical solution, gradient descent, combinatorial optimization, genetic algorithms, ...

  5. SUPERVISED LEARNING, FORMALIZED A learner (or inducer) I is a program or algorithm which: receives a training set D ⊂ X × Y and, for a given hypothesis space H of models f : X → ℝ^g, uses a risk function R_emp(f) to evaluate f ∈ H on D (or R_emp(θ) to evaluate f's parametrization θ on D), and uses an optimization procedure to find f̂ = arg min_{f ∈ H} R_emp(f) or θ̂ = arg min_{θ ∈ Θ} R_emp(θ). So the inducer mapping (including hyperparameters Λ) is: I : D × Λ → H. We can also adapt this concept to finding θ̂ for parametric models: I : D × Λ → Θ.
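As a sketch of the inducer mapping for a parametric model, the whole learner can be written as one function from a data set and hyperparameters to fitted parameters; delegating the optimization step to scipy.optimize.minimize is my assumption here, since the slides do not prescribe an optimizer.

    import numpy as np
    from scipy.optimize import minimize

    def inducer(D, lam=0.0):
        """I : D x Lambda -> Theta for a univariate linear model; lam is a hyperparameter."""
        x, y = D

        # Empirical risk of theta on D, optionally L2-regularized via lam
        def r_emp(theta):
            residuals = y - (theta[0] + theta[1] * x)
            return np.sum(residuals ** 2) + lam * theta[1] ** 2

        # Optimization: numerical search for the risk-minimizing parametrization
        return minimize(r_emp, x0=np.zeros(2)).x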

  6. EXAMPLE: LINEAR REGRESSION ON 1D The hypothesis space in univariate linear regression is the set of all linear functions, with θ = (θ_0, θ)^⊤: H = { f(x) = θ_0 + θ x : θ_0, θ ∈ ℝ }. [Figure: scatterplot of the 1D training data] Design choice: We could add more flexibility by allowing polynomial effects or by using a spline basis.
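A quick sketch of that design choice using only numpy (variable names and toy inputs are illustrative): replacing the linear basis with a polynomial basis enlarges the hypothesis space while the model stays linear in θ.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

    # Linear hypothesis space: design matrix with columns (1, x), i.e. f(x) = theta0 + theta1 * x
    X_linear = np.vander(x, N=2, increasing=True)

    # Degree-3 polynomial hypothesis space: columns (1, x, x^2, x^3)
    X_poly = np.vander(x, N=4, increasing=True)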

  7. EXAMPLE: LINEAR REGRESSION ON 1D We might use the squared error as the loss function in our risk, punishing larger distances more severely: R_emp(θ) = \sum_{i=1}^{n} (y^{(i)} − θ_0 − θ x^{(i)})^2. [Figure: flat line for θ = (0.3, 0)^⊤ with R_emp = 40.96] Design choice: Use the absolute error / L1 loss to create a more robust model that is less sensitive to outliers.
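A small sketch of this risk in Python, together with the L1 alternative from the design-choice note (function and argument names are mine; the data behind the slide's R_emp = 40.96 are not shown, so no numbers are reproduced here):

    import numpy as np

    def r_emp(theta, x, y, loss="squared"):
        """Empirical risk of f(x) = theta0 + theta1 * x under the chosen loss."""
        residuals = y - (theta[0] + theta[1] * x)
        if loss == "squared":
            return np.sum(residuals ** 2)    # L2: punishes large distances more severely
        return np.sum(np.abs(residuals))     # L1: more robust against outliers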

  8. EXAMPLE: LINEAR REGRESSION ON 1D Optimization will usually mean deriving the ordinary-least-squares (OLS) estimator θ̂ analytically. [Figures: fitted line for θ̂ = (−1.7, 1.3)^⊤ with R_emp = 5.88, and the R_emp surface over intercept and slope] Design choice: We could use stochastic gradient descent to scale better to very large or out-of-memory data.
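Both routes can be sketched in a few lines of Python; the simulated data, learning rate, and epoch count below are illustrative choices of mine, only θ = (−1.7, 1.3)^⊤ is taken from the slide.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 8, size=50)
    y = -1.7 + 1.3 * x + rng.normal(scale=1.0, size=50)   # simulated data around the slide's theta
    X = np.column_stack([np.ones_like(x), x])             # design matrix with columns (1, x)

    # Analytic OLS estimator: solve the normal equations (X'X) theta = X'y
    theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # Alternative: stochastic gradient descent on the squared error, one observation at a time
    theta, lr = np.zeros(2), 0.01
    for epoch in range(200):
        for i in rng.permutation(len(y)):
            grad = -2 * (y[i] - X[i] @ theta) * X[i]
            theta -= lr * grad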

  9. SUMMARY By decomposing learners into these building blocks, we have a framework to better understand how they work, we can more easily evaluate in which settings they may be more or less suitable, and we can tailor learners to specific problems by a clever choice of each of the three components. Getting this right takes a considerable amount of experience.
