CS534: Machine Learning
Thomas G. Dietterich
221C Dearborn Hall
tgd@cs.orst.edu
http://www.cs.orst.edu/~tgd/classes/534
Course Overview

Introduction:
– Basic problems and questions in machine learning. Example applications

Linear Classifiers

Five Popular Algorithms
– Decision trees (C4.5)
– Neural networks (backpropagation)
– Probabilistic networks (Naïve Bayes; Mixture models)
– Support Vector Machines (SVMs)
– Nearest Neighbor Method

Theories of Learning:
– PAC, Bayesian, Bias-Variance analysis

Optimizing Test Set Performance:
– Overfitting, Penalty methods, Holdout methods, Ensembles

Sequential and Spatial Data
– Hidden Markov models, Conditional Random Fields; Hidden Markov SVMs

Problem Formulation
– Designing Input and Output representations
Supervised Learning

– Given: Training examples ⟨x, f(x)⟩ for some unknown function f.
– Find: A good approximation to f.

Example Applications
– Handwriting recognition
  x: data from pen motion
  f(x): letter of the alphabet
– Disease Diagnosis
  x: properties of patient (symptoms, lab tests)
  f(x): disease (or maybe, recommended therapy)
– Face Recognition
  x: bitmap picture of person's face
  f(x): name of person
– Spam Detection
  x: email message
  f(x): spam or not spam
Appropriate Applications for Supervised Learning

Situations where there is no human expert
– x: bond graph of a new molecule
– f(x): predicted binding strength to AIDS protease molecule

Situations where humans can perform the task but can't describe how they do it
– x: bitmap picture of hand-written character
– f(x): ascii code of the character

Situations where the desired function is changing frequently
– x: description of stock prices and trades for last 10 days
– f(x): recommended stock transactions

Situations where each user needs a customized function f
– x: incoming email message
– f(x): importance score for presenting to the user (or deleting without presenting)
Formal Setting

Training examples are drawn independently at random according to an unknown probability distribution P(x, y). The learning algorithm analyzes the examples and produces a classifier f. Given a new data point ⟨x, y⟩ drawn from P, the classifier is given x and predicts ŷ = f(x). The loss L(ŷ, y) is then measured.

Goal of the learning algorithm: find the f that minimizes the expected loss.

[Diagram: P(x, y) generates the training sample, which the learning algorithm turns into f; P also generates a test point ⟨x, y⟩; f predicts ŷ from x, and the loss function computes L(ŷ, y).]
Formal Version of Spam Detection

P(x, y): distribution of email messages x and their true labels y ("spam" or "not spam")
training sample: a set of email messages that have been labeled by the user
learning algorithm: what we study in this course!
f: the classifier output by the learning algorithm
test point: a new email message x (with its true, but hidden, label y)
loss function L(ŷ, y):

                      true label y
predicted label ŷ     spam    not spam
spam                    0        10
not spam                1         0
Three Main Approaches to Machine Learning

Learn a classifier: a function f.
Learn a conditional distribution: a conditional distribution P(y | x)
Learn the joint probability distribution: P(x, y)

In the first two weeks, we will study one example of each method:
– Learn a classifier: The LMS algorithm
– Learn a conditional distribution: Logistic regression
– Learn the joint distribution: Linear discriminant analysis
Inferring a classifier f from P(y | x)

Predict the ŷ that minimizes the expected loss:

f(x) = argmin_ŷ E_{y|x}[L(ŷ, y)] = argmin_ŷ Σ_y P(y|x) L(ŷ, y)
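As a sketch (not from the slides), this decision rule is a few lines of Python; the loss matrix below uses the spam values from this lecture:

```python
# Optimal decision rule: pick the label yhat that minimizes the
# expected loss sum_y P(y|x) * L(yhat, y).

def optimal_prediction(p_y_given_x, loss):
    """p_y_given_x: dict label -> P(y|x); loss: dict (yhat, y) -> L(yhat, y)."""
    labels = list(p_y_given_x)
    def expected_loss(yhat):
        return sum(p_y_given_x[y] * loss[(yhat, y)] for y in labels)
    return min(labels, key=expected_loss)

# Spam loss matrix from this lecture: predicting spam when the message is
# not spam costs 10; missing a spam message costs 1; correct decisions cost 0.
loss = {("spam", "spam"): 0, ("spam", "not spam"): 10,
        ("not spam", "spam"): 1, ("not spam", "not spam"): 0}

print(optimal_prediction({"spam": 0.6, "not spam": 0.4}, loss))  # -> not spam
```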
Example: Making the spam decision

Suppose our spam detector predicts that P(y = "spam" | x) = 0.6. What is the optimal classification decision ŷ?

Expected loss of ŷ = "spam" is 0 · 0.6 + 10 · 0.4 = 4
Expected loss of ŷ = "not spam" is 1 · 0.6 + 0 · 0.4 = 0.6
Therefore, the optimal prediction is "not spam"

P(y|x):  P(spam | x) = 0.6,  P(not spam | x) = 0.4

                      true label y
predicted label ŷ     spam    not spam
spam                    0        10
not spam                1         0
Inferring a classifier from the joint distribution P(x, y)

We can compute the conditional distribution according to the definition of conditional probability:

P(y = k | x) = P(x, y = k) / Σ_j P(x, y = j)

In words, compute P(x, y = k) for each value of k. Then normalize these numbers. Compute ŷ using the method from the previous slide.
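A minimal sketch of this normalization step, with illustrative joint probabilities:

```python
# Recover P(y=k|x) from a joint model by normalization:
# P(y=k|x) = P(x, y=k) / sum_j P(x, y=j).

def conditional_from_joint(joint_at_x):
    """joint_at_x: dict k -> P(x, y=k) for the observed x."""
    total = sum(joint_at_x.values())
    return {k: v / total for k, v in joint_at_x.items()}

# Illustrative numbers: P(x, y=0) = 0.03, P(x, y=1) = 0.01.
print(conditional_from_joint({0: 0.03, 1: 0.01}))  # roughly {0: 0.75, 1: 0.25}
```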
Fundamental Problem of Machine Learning: It is ill-posed

Example  x1  x2  x3  x4  y
   1      0   0   1   0  0
   2      0   1   0   0  0
   3      0   0   1   1  1
   4      1   0   0   1  1
   5      0   1   1   0  0
   6      1   1   0   0  0
   7      0   1   0   1  0
Learning Appears Impossible

There are 2^16 = 65536 possible boolean functions over four input features. We can't figure out which one is correct until we've seen every possible input-output pair. After 7 examples, we still have 2^9 possibilities.

[Table: the full 16-row truth table over x1–x4, with the 7 observed rows labeled and the remaining 9 rows marked "?".]
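The count 2^9 can be checked by brute force. This sketch encodes each boolean function of four inputs as a 16-bit truth table; the seven rows below follow the example table as reconstructed from the counterexample columns on later slides, but any seven distinct input rows yield the same count:

```python
from itertools import product

# Seven observed examples (inputs, label).
data = [((0,0,1,0), 0), ((0,1,0,0), 0), ((0,0,1,1), 1), ((1,0,0,1), 1),
        ((0,1,1,0), 0), ((1,1,0,0), 0), ((0,1,0,1), 0)]

rows = list(product([0, 1], repeat=4))            # all 16 possible inputs
observed = [(rows.index(x), y) for x, y in data]  # positions of the known bits

# Each candidate function is one of the 2^16 possible 16-bit truth tables;
# count the tables that agree with all 7 observed bits.
count = sum(all(table[i] == y for i, y in observed)
            for table in product([0, 1], repeat=16))
print(count)  # -> 512 = 2^(16-7)
```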
Solution: Work with a restricted hypothesis space

Either by applying prior knowledge or by guessing, we choose a space of hypotheses H that is smaller than the space of all possible functions:
– simple conjunctive rules
– m-of-n rules
– linear functions
– multivariate Gaussian joint probability distributions
– etc.
Illustration: Simple Conjunctive Rules

There are only 16 simple conjunctions (no negation). However, no simple rule explains the data. The same is true for simple clauses.

Rule                        Counterexample
true ⇔ y                         1
x1 ⇔ y                           3
x2 ⇔ y                           2
x3 ⇔ y                           1
x4 ⇔ y                           7
x1 ∧ x2 ⇔ y                      3
x1 ∧ x3 ⇔ y                      3
x1 ∧ x4 ⇔ y                      3
x2 ∧ x3 ⇔ y                      3
x2 ∧ x4 ⇔ y                      3
x3 ∧ x4 ⇔ y                      4
x1 ∧ x2 ∧ x3 ⇔ y                 3
x1 ∧ x2 ∧ x4 ⇔ y                 3
x1 ∧ x3 ∧ x4 ⇔ y                 3
x2 ∧ x3 ∧ x4 ⇔ y                 3
x1 ∧ x2 ∧ x3 ∧ x4 ⇔ y            3
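The exhaustive check behind this table can be sketched directly; the data rows follow the seven-example table used in this lecture (as reconstructed from the counterexample columns):

```python
from itertools import combinations

data = [((0,0,1,0), 0), ((0,1,0,0), 0), ((0,0,1,1), 1), ((1,0,0,1), 1),
        ((0,1,1,0), 0), ((1,1,0,0), 0), ((0,1,0,1), 0)]

def conjunction(subset, x):
    """Predict 1 iff every variable in subset is 1; the empty subset is 'true'."""
    return int(all(x[j] for j in subset))

# All 16 simple conjunctions = all subsets of the four variables.
consistent_rules = [s for r in range(5) for s in combinations(range(4), r)
                    if all(conjunction(s, x) == y for x, y in data)]
print(len(consistent_rules))  # -> 0: no simple conjunction explains the data
```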
A larger hypothesis space: m-of-n rules

At least m of the n variables must be true. There are 32 possible rules. Only one rule is consistent!

Counterexamples:
variables          1-of  2-of  3-of  4-of
{x1}                3     –     –     –
{x2}                2     –     –     –
{x3}                1     –     –     –
{x4}                7     –     –     –
{x1, x2}            3     3     –     –
{x1, x3}            4     3     –     –
{x1, x4}            6     3     –     –
{x2, x3}            2     3     –     –
{x2, x4}            2     3     –     –
{x3, x4}            4     4     –     –
{x1, x2, x3}        1     3     3     –
{x1, x2, x4}        2     3     3     –
{x1, x3, x4}        1    ***    3     –
{x2, x3, x4}        1     5     3     –
{x1, x2, x3, x4}    1     5     3     3

(*** marks the single consistent rule: 2-of {x1, x3, x4}.)
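A brute-force sketch of the same enumeration, on the seven-example table as reconstructed from the counterexample columns, confirming that exactly one of the 32 m-of-n rules fits the data:

```python
from itertools import combinations

data = [((0,0,1,0), 0), ((0,1,0,0), 0), ((0,0,1,1), 1), ((1,0,0,1), 1),
        ((0,1,1,0), 0), ((1,1,0,0), 0), ((0,1,0,1), 0)]

def m_of_n(m, subset, x):
    """Predict 1 iff at least m of the variables in subset are 1."""
    return int(sum(x[j] for j in subset) >= m)

# All m-of-n rules: nonempty subsets of the 4 variables, thresholds 1..|subset|.
rules = [(m, s) for r in range(1, 5) for s in combinations(range(4), r)
         for m in range(1, r + 1)]
consistent = [(m, s) for m, s in rules
              if all(m_of_n(m, s, x) == y for x, y in data)]
print(len(rules), consistent)  # 32 rules; only 2-of {x1, x3, x4} (indices 0, 2, 3)
```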
Two Views of Learning

View 1: Learning is the removal of our remaining uncertainty
– Suppose we knew that the unknown function was an m-of-n boolean function. Then we could use the training examples to deduce which function it is.

View 2: Learning requires guessing a good, small hypothesis class
– We can start with a very small class and enlarge it until it contains an hypothesis that fits the data
We could be wrong!

Our prior "knowledge" might be wrong
Our guess of the hypothesis class could be wrong
– The smaller the class, the more likely we are wrong
Two Strategies for Machine Learning

Develop Languages for Expressing Prior Knowledge
– Rule grammars, stochastic models, Bayesian networks
– (Corresponds to the Prior Knowledge view)

Develop Flexible Hypothesis Spaces
– Nested collections of hypotheses: decision trees, neural networks, cases, SVMs
– (Corresponds to the Guessing view)

In either case we must develop algorithms for finding an hypothesis that fits the data
Terminology

Training example. An example of the form ⟨x, y⟩. x is usually a vector of features, y is called the class label. We will index the features by j; hence x_j is the j-th feature of x. The number of features is n.

Target function. The true function f, the true conditional distribution P(y | x), or the true joint distribution P(x, y).

Hypothesis. A proposed function or distribution h believed to be similar to f or P.

Concept. A boolean function. Examples for which f(x) = 1 are called positive examples or positive instances of the concept. Examples for which f(x) = 0 are called negative examples or negative instances.
Terminology (2)

Classifier. A discrete-valued function. The possible values f(x) ∈ {1, …, K} are called the classes or class labels.

Hypothesis space. The space of all hypotheses that can, in principle, be output by a particular learning algorithm.

Version space. The space of all hypotheses in the hypothesis space that have not yet been ruled out by a training example.

Training sample (or training set or training data): a set of N training examples drawn according to P(x, y).

Test set: a set of training examples used to evaluate a proposed hypothesis h.

Validation set: a set of training examples (typically a subset of the training set) used to guide the learning algorithm and prevent overfitting.
Key Issues in Machine Learning

What are good hypothesis spaces?
– Which spaces have been useful in practical applications?

What algorithms can work with these spaces?
– Are there general design principles for learning algorithms?

How can we optimize accuracy on future data points?
– This is related to the problem of "overfitting"

How can we have confidence in the results? (the statistical question)
– How much training data is required to find an accurate hypothesis?

Are some learning problems computationally intractable? (the computational question)

How can we formulate application problems as machine learning problems? (the engineering question)
A framework for hypothesis spaces

Size: Does the hypothesis space have a fixed size or a variable size?
– Fixed-sized spaces are easier to understand, but variable-sized spaces are generally more useful. Variable-sized spaces introduce the problem of overfitting.

Stochasticity: Is the hypothesis a classifier, a conditional distribution, or a joint distribution?
– This affects how we evaluate hypotheses. For a deterministic hypothesis, a training example is either consistent (correctly predicted) or inconsistent (incorrectly predicted). For a stochastic hypothesis, a training example is more likely or less likely.

Parameterization: Is each hypothesis described by a set of symbolic (discrete) choices or is it described by a set of continuous parameters? If both are required, we say the space has a mixed parameterization.
– Discrete parameters must be found by combinatorial search methods; continuous parameters can be found by numerical search methods.
A Framework for Hypothesis Spaces (2)
A Framework for Learning Algorithms

Search Procedure
– Direct Computation: solve for the hypothesis directly
– Local Search: start with an initial hypothesis, make small improvements until a local maximum
– Constructive Search: start with an empty hypothesis, gradually add structure to it until a local optimum

Timing
– Eager: analyze training data and construct an explicit hypothesis
– Lazy: store the training data and wait until a test data point is presented, then construct an ad hoc hypothesis to classify that one data point

Online vs. Batch (for eager algorithms)
– Online: analyze each training example as it is presented
– Batch: collect examples, analyze them in a batch, output an hypothesis
A Framework for Learning Algorithms (2)
Linear Threshold Units

We assume that each feature x_j and each weight w_j is a real number (we will relax this later). We will study three different algorithms for learning linear threshold units:
– Perceptron: classifier
– Logistic Regression: conditional distribution
– Linear Discriminant Analysis: joint distribution
h(x) = +1 if w1·x1 + … + wn·xn ≥ w0
       −1 otherwise
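As a sketch, a linear threshold unit is a one-line decision function; the weights and threshold below are illustrative, not learned:

```python
def ltu(w, w0, x):
    """h(x) = +1 if w . x >= w0, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= w0 else -1

# Illustrative weights: the decision boundary is the line 2*x1 - x2 = 1.
w, w0 = [2.0, -1.0], 1.0
print(ltu(w, w0, [1.0, 0.0]))  # -> 1   (2*1 - 0 >= 1)
print(ltu(w, w0, [0.0, 1.0]))  # -> -1  (-1 < 1)
```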
What can be represented by an LTU:

Conjunctions:
x1 ∧ x2 ∧ x4 ⇔ y    is encoded as    1·x1 + 1·x2 + 0·x3 + 1·x4 ≥ 3

At least m-of-n:
at-least-2-of {x1, x3, x4} ⇔ y    is encoded as    1·x1 + 0·x2 + 1·x3 + 1·x4 ≥ 2
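Both encodings can be verified exhaustively over all 16 boolean inputs:

```python
from itertools import product

def ltu(w, threshold, x):
    return int(sum(wj * xj for wj, xj in zip(w, x)) >= threshold)

for x in product([0, 1], repeat=4):
    x1, x2, x3, x4 = x
    conj = int(x1 == 1 and x2 == 1 and x4 == 1)   # x1 AND x2 AND x4
    two_of = int(x1 + x3 + x4 >= 2)               # at-least-2-of {x1, x3, x4}
    assert ltu([1, 1, 0, 1], 3, x) == conj
    assert ltu([1, 0, 1, 1], 2, x) == two_of
print("both LTU encodings agree on all 16 inputs")
```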
Things that cannot be represented:

Non-trivial disjunctions:
(x1 ∧ x2) ∨ (x3 ∧ x4) ⇔ y.  The attempt 1·x1 + 1·x2 + 1·x3 + 1·x4 ≥ 2 incorrectly predicts f(⟨0110⟩) = 1.

Exclusive-OR:
(x1 ∧ ¬x2) ∨ (¬x1 ∧ x2) ⇔ y
A canonical representation

Given a training example of the form (⟨x1, x2, x3, x4⟩, y), transform it to (⟨1, x1, x2, x3, x4⟩, y). The parameter vector will then be w = ⟨w0, w1, w2, w3, w4⟩.

We will call u(x, w) the unthresholded hypothesis:

u(x, w) = w · x

Each hypothesis can be written h(x) = sgn(u(x, w)).

Our goal is to find w.
The LTU Hypothesis Space

Fixed size: there are O(2^(n²)) distinct linear threshold units over n boolean features
Deterministic
Continuous parameters
Geometrical View

Consider three training examples:
(⟨1.0, 1.0⟩, +1)
(⟨0.5, 3.0⟩, +1)
(⟨2.0, 2.0⟩, −1)

We want a classifier that separates the two positive examples from the negative example with a line in the plane.
The Unthresholded Discriminant Function is a Hyperplane

The equation u(x) = w · x defines a plane:

ŷ = +1 if u(x) ≥ 0
    −1 otherwise
Machine Learning and Optimization

When learning a classifier, the natural way to formulate the learning problem is the following:
– Given:
  A set of N training examples {(x1, y1), (x2, y2), …, (xN, yN)}
  A loss function L
– Find:
  The weight vector w that minimizes the expected loss on the training data

J(w) = (1/N) Σ_{i=1}^{N} L(sgn(w · xi), yi)

In general, machine learning algorithms apply some optimization algorithm to find a good hypothesis. In this case, J is piecewise constant, which makes this a difficult problem.
Approximating the expected loss by a smooth function

Simplify the optimization problem by replacing the original objective function by a smooth, differentiable function. For example, consider the hinge loss:

J̃(w) = (1/N) Σ_{i=1}^{N} max(0, 1 − yi w · xi)

[Figure: the hinge loss plotted against w · x for y = 1, compared with the 0/1 loss.]
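A direct sketch of this objective; the three points are the ones from the geometrical-view slide:

```python
# J~(w) = (1/N) sum_i max(0, 1 - y_i * (w . x_i)), computed directly.

def hinge_objective(w, X, y):
    N = len(X)
    return sum(max(0.0, 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi)))
               for xi, yi in zip(X, y)) / N

X = [[1.0, 1.0], [0.5, 3.0], [2.0, 2.0]]
y = [+1, +1, -1]
print(hinge_objective([0.0, 0.0], X, y))  # -> 1.0: at w = 0 every margin is zero
```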
Minimizing J̃ by Gradient Descent Search

Start with weight vector w0
Compute the gradient ∇J̃(w0) = (∂J̃(w0)/∂w0, ∂J̃(w0)/∂w1, …, ∂J̃(w0)/∂wn)
Compute w1 = w0 − η ∇J̃(w0), where η is a "step size" parameter
Repeat until convergence
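The loop itself is simple. This sketch runs it on a small smooth objective chosen purely for illustration (not one from the slides):

```python
# Gradient descent on J(w) = (w0 - 3)^2 + (w1 + 1)^2, whose minimizer
# is w = (3, -1); the gradient is computed analytically.

def gradient(w):
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w = [0.0, 0.0]
eta = 0.1                      # the "step size" parameter
for _ in range(200):           # repeat until (approximate) convergence
    g = gradient(w)
    w = [wj - eta * gj for wj, gj in zip(w, g)]
print([round(wj, 4) for wj in w])  # converges to the minimizer [3, -1]
```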
Computing the Gradient

Let J̃i(w) = max(0, −yi w · xi). Then

∂J̃(w)/∂wk = ∂/∂wk [ (1/N) Σ_{i=1}^{N} J̃i(w) ] = (1/N) Σ_{i=1}^{N} ∂J̃i(w)/∂wk

∂J̃i(w)/∂wk = ∂/∂wk max(0, −yi Σ_j wj·xij)
            = 0          if yi Σ_j wj·xij > 0
              −yi·xik    otherwise
Batch Perceptron Algorithm

Given: training examples (xi, yi), i = 1 … N
Let w = (0, 0, …, 0) be the initial weight vector.
Repeat until convergence:
  Let g = (0, 0, …, 0) be the gradient vector.
  For i = 1 to N do
    ui = w · xi
    If (yi · ui < 0)
      For j = 1 to n do
        gj = gj − yi · xij
  g := g / N
  w := w − ηg

Simplest case: η = 1, don't normalize g: the "Fixed Increment Perceptron"
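One epoch of this computation can be sketched as follows, on the three points from the geometrical-view slide with an added bias feature. Note that the test yi · ui < 0 never fires when w starts at zero (every ui = 0), so this sketch uses ≤ 0:

```python
# One epoch of the batch perceptron gradient step, eta = 1.
X = [[1, 1.0, 1.0], [1, 0.5, 3.0], [1, 2.0, 2.0]]   # x0 = 1 absorbs w0
y = [+1, +1, -1]
N, n = len(X), len(X[0])
w = [0.0] * n
eta = 1.0

g = [0.0] * n
for xi, yi in zip(X, y):
    ui = sum(wj * xj for wj, xj in zip(w, xi))
    if yi * ui <= 0:                       # misclassified (or on the boundary)
        for j in range(n):
            g[j] -= yi * xi[j]
g = [gj / N for gj in g]                   # normalize the gradient
w = [wj - eta * gj for wj, gj in zip(w, g)]
print(w)  # from w = 0, the first epoch updates on every example
```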
Online Perceptron Algorithm

Let w = (0, 0, …, 0) be the initial weight vector.
Repeat forever:
  Accept training example i: ⟨xi, yi⟩
  ui = w · xi
  If (yi · ui < 0)
    For j = 1 to n do
      gj := yi · xij
    w := w + ηg

This is called stochastic gradient descent because the overall gradient is approximated by the gradient from each individual example.
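A sketch of the online algorithm with η = 1, cycling through the data until a full pass makes no mistakes; for linearly separable data (these three points are), the perceptron convergence theorem guarantees termination:

```python
X = [[1, 1.0, 1.0], [1, 0.5, 3.0], [1, 2.0, 2.0]]   # bias feature x0 = 1
y = [+1, +1, -1]
w = [0.0] * 3
eta = 1.0

converged = False
while not converged:
    converged = True
    for xi, yi in zip(X, y):
        ui = sum(wj * xj for wj, xj in zip(w, xi))
        if yi * ui <= 0:                  # mistake: move w toward yi * xi
            w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
            converged = False

predictions = [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else -1
               for xi in X]
print(predictions)  # -> [1, 1, -1]: all training points correctly classified
```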
Learning Rates and Convergence

The learning rate η must decrease to zero in order to guarantee convergence. The online case is known as the Robbins-Monro algorithm. It is guaranteed to converge under the following assumptions:

lim_{t→∞} ηt = 0        Σ_{t=0}^{∞} ηt = ∞        Σ_{t=0}^{∞} ηt² < ∞

(For example, ηt = 1/(t+1) satisfies all three conditions.)

The learning rate is also called the step size. Some algorithms (e.g., Newton's method, conjugate gradient) choose the step size automatically and converge faster. There is only one "basin" for linear threshold units, so a local minimum is the global minimum. Choosing a good starting point can make the algorithm converge faster.
Decision Boundaries

A classifier can be viewed as partitioning the input space or feature space X into decision regions. A linear threshold unit always produces a linear decision boundary. A set of points that can be separated by a linear decision boundary is said to be linearly separable.
Exclusive-OR is Not Linearly Separable
Extending Perceptron to More than Two Classes

If we have K > 2 classes, we can learn a separate LTU for each class. Let wk be the weight vector for class k. We train it by treating examples from class y = k as the positive examples and treating the examples from all other classes as negative examples. Then we classify a new data point x according to

ŷ = argmax_k wk · x
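A sketch of this one-vs-rest prediction rule; the weight vectors below are illustrative, not trained:

```python
# Multiclass prediction with K weight vectors: yhat = argmax_k w_k . x.

def predict(W, x):
    scores = [sum(wj * xj for wj, xj in zip(wk, x)) for wk in W]
    return max(range(len(W)), key=lambda k: scores[k])

W = [[1.0, 0.0],    # class 0 scores high when x[0] is large
     [0.0, 1.0],    # class 1 scores high when x[1] is large
     [-1.0, -1.0]]  # class 2 scores high when both features are small
print(predict(W, [2.0, 0.1]))    # -> 0
print(predict(W, [-1.0, -2.0]))  # -> 2
```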
Summary of Perceptron algorithm for LTUs

Directly learns a classifier

Local search
– Begins with an initial weight vector. Modifies it iteratively to minimize an error function. The error function is loosely related to the goal of minimizing the number of classification errors.

Eager
– The classifier is constructed from the training examples
– The training examples can then be discarded

Online or Batch
– Both variants of the algorithm can be used
Logistic Regression

Learn the conditional distribution P(y | x). Let py(x; w) be our estimate of P(y | x), where w is a vector of adjustable parameters. Assume only two classes, y = 0 and y = 1, and

p1(x; w) = exp(w · x) / (1 + exp(w · x))
p0(x; w) = 1 − p1(x; w)

On the homework, you will show that this is equivalent to

log [ p1(x; w) / p0(x; w) ] = w · x

In other words, the log odds of class 1 is a linear function of x.
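A quick numeric check of the claimed equivalence, with illustrative w and x:

```python
import math

def p1(x, w):
    """p1(x; w) = exp(w . x) / (1 + exp(w . x))."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return math.exp(z) / (1.0 + math.exp(z))

w, x = [0.5, -1.0, 2.0], [1.0, 2.0, 0.5]
z = sum(wj * xj for wj, xj in zip(w, x))            # w . x = -0.5
log_odds = math.log(p1(x, w) / (1.0 - p1(x, w)))
print(round(log_odds, 6), round(z, 6))  # -> -0.5 -0.5: log odds equal w . x
```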
Why the exp function?

One reason: a linear function has a range from [−∞, ∞], and we need to force it to be positive and sum to 1 in order to be a probability.
Deriving a Learning Algorithm

Since we are fitting a conditional probability distribution, we no longer seek to minimize the loss on the training data. Instead, we seek to find the probability distribution h that is most likely given the training data.

Let S be the training sample. Our goal is to find h to maximize P(h | S):

argmax_h P(h|S) = argmax_h P(S|h) P(h) / P(S)    by Bayes' rule
                = argmax_h P(S|h) P(h)           because P(S) doesn't depend on h
                = argmax_h P(S|h)                if we assume P(h) = uniform
                = argmax_h log P(S|h)            because log is monotonic

The distribution P(S|h) is called the likelihood function. The log likelihood is frequently used as the objective function for learning. It is often written as ℓ(w). The h that maximizes the likelihood on the training data is called the maximum likelihood estimator (MLE).
47 47
Computing the Likelihood

In our framework, we assume that each training example (xi, yi) is drawn from the same (but unknown) probability distribution P(x, y). This means that the log likelihood of S is the sum of the log likelihoods of the individual training examples:

  log P(S | h) = log ∏i P(xi, yi | h) = ∑i log P(xi, yi | h)

48
Computing the Likelihood (2)

Recall that any joint distribution P(a, b) can be factored as P(a | b) P(b). Hence, we can write

  argmax_h log P(S | h) = argmax_h ∑i log P(yi | xi, h) P(xi | h)

In our case, P(x | h) = P(x), because it does not depend on h, so

  argmax_h log P(S | h) = argmax_h ∑i log P(yi | xi, h) P(xi | h)
                        = argmax_h ∑i log P(yi | xi, h)

49
Log Likelihood for Conditional Log Likelihood for Conditional Probability Estimators Probability Estimators
We can express the log likelihood in a compact We can express the log likelihood in a compact form known as the form known as the cross entropy cross entropy. . Consider an example ( Consider an example (x xi
i,
,y yi
i)
)
– – If If y yi
i = 0, the log likelihood is log [1
= 0, the log likelihood is log [1 – – p p1
1(
(x x; ; w w)] )] – – if if y yi
i = 1, the log likelihood is log [p
= 1, the log likelihood is log [p1
1(
(x x; ; w w)] )]
These cases are mutually exclusive, so we can These cases are mutually exclusive, so we can combine them to obtain: combine them to obtain:
ℓ ℓ( (y yi
i;
; x xi
i,
,w w) = log P( ) = log P(y yi
i |
| x xi
i,
,w w) = (1 ) = (1 – – y yi
i) log[1
) log[1 – – p p1
1(
(x xi
i;
;w w)] + y )] + yi
i log p
log p1
1(
(x xi
i;
;w w) )
The goal of our learning algorithm will be to find The goal of our learning algorithm will be to find w w to maximize to maximize
J( J(w w) = ) = ∑ ∑i
i ℓ
ℓ( (y yi
i;
; x xi
i,
,w w) )
50 50
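The combined cross-entropy expression can be sketched directly. Because yi is either 0 or 1, one of the two terms vanishes for each example (the value p = 0.8 below is made up):

```python
import numpy as np

def log_likelihood(y, p1):
    """Cross-entropy log likelihood l(y; x, w) of one example,
    given y in {0, 1} and p1 = p1(x; w)."""
    return (1 - y) * np.log(1 - p1) + y * np.log(p1)

# When y = 1, only the log p1 term survives; when y = 0, only log(1 - p1).
p = 0.8
print(log_likelihood(1, p) == np.log(p))      # True
print(log_likelihood(0, p) == np.log(1 - p))  # True
```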
Fitting Logistic Regression by Gradient Ascent

  ∂J(w)/∂wj = ∑i ∂ℓ(yi; xi, w)/∂wj

  ∂ℓ(yi; xi, w)/∂wj
    = ∂/∂wj ( (1 − yi) log[1 − p1(xi; w)] + yi log p1(xi; w) )
    = (1 − yi) · (1 / (1 − p1(xi; w))) · (−∂p1(xi; w)/∂wj) + yi · (1 / p1(xi; w)) · (∂p1(xi; w)/∂wj)
    = [ yi / p1(xi; w) − (1 − yi) / (1 − p1(xi; w)) ] · (∂p1(xi; w)/∂wj)
    = [ ( yi (1 − p1(xi; w)) − (1 − yi) p1(xi; w) ) / ( p1(xi; w) (1 − p1(xi; w)) ) ] · (∂p1(xi; w)/∂wj)
    = [ ( yi − p1(xi; w) ) / ( p1(xi; w) (1 − p1(xi; w)) ) ] · (∂p1(xi; w)/∂wj)

51
Gradient Computation (continued)

Note that p1 can also be written as

  p1(xi; w) = 1 / (1 + exp[−w · xi])

From this, we obtain:

  ∂p1(xi; w)/∂wj = −(1 / (1 + exp[−w · xi])²) · ∂/∂wj (1 + exp[−w · xi])
                 = −(1 / (1 + exp[−w · xi])²) · exp[−w · xi] · ∂/∂wj (−w · xi)
                 = −(1 / (1 + exp[−w · xi])²) · exp[−w · xi] · (−xij)
                 = p1(xi; w) (1 − p1(xi; w)) xij

52
Completing the Gradient Computation

The gradient of the log likelihood of a single point is therefore

  ∂ℓ(yi; xi, w)/∂wj = [ ( yi − p1(xi; w) ) / ( p1(xi; w) (1 − p1(xi; w)) ) ] · (∂p1(xi; w)/∂wj)
                    = [ ( yi − p1(xi; w) ) / ( p1(xi; w) (1 − p1(xi; w)) ) ] · p1(xi; w) (1 − p1(xi; w)) xij
                    = (yi − p1(xi; w)) xij

The overall gradient is

  ∂J(w)/∂wj = ∑i (yi − p1(xi; w)) xij

53
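The simplified gradient (yi − p1(xi; w)) xij can be checked against a numerical finite-difference gradient of J(w). A sketch with a tiny made-up dataset:

```python
import numpy as np

def p1(X, w):
    return 1.0 / (1.0 + np.exp(-X @ w))

def J(X, y, w):
    # Log likelihood J(w) = sum_i l(y_i; x_i, w).
    p = p1(X, w)
    return np.sum((1 - y) * np.log(1 - p) + y * np.log(p))

def gradient(X, y, w):
    # dJ/dw_j = sum_i (y_i - p1(x_i; w)) x_ij, vectorized.
    return X.T @ (y - p1(X, w))

X = np.array([[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

# Compare against central finite differences.
eps = 1e-6
num = np.array([(J(X, y, w + eps * e) - J(X, y, w - eps * e)) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(gradient(X, y, w), num, atol=1e-5))  # True
```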
Batch Gradient Ascent for Logistic Regression

  Given: training examples (xi, yi), i = 1 … N
  Let w = (0, 0, …, 0) be the initial weight vector.
  Repeat until convergence:
    Let g = (0, 0, …, 0) be the gradient vector.
    For i = 1 to N do
      pi = 1 / (1 + exp[−w · xi])
      errori = yi − pi
      For j = 1 to n do
        gj = gj + errori · xij
    w := w + ηg   (step in the direction of the increasing gradient)

An online gradient ascent algorithm can be constructed, of course.

Most statistical packages use a second-order (Newton-Raphson) algorithm for faster convergence. Each iteration of the second-order method can be viewed as a weighted least squares computation, so the algorithm is known as Iteratively Reweighted Least Squares (IRLS).

54
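The pseudocode above translates directly into a vectorized batch implementation. A sketch, assuming illustrative data and a fixed learning rate and iteration count in place of a convergence test:

```python
import numpy as np

def train_logistic(X, y, eta=0.1, iters=1000):
    """Batch gradient ascent for logistic regression (no intercept term,
    matching the slides; append a constant-1 feature to X if one is needed)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))  # p_i = p1(x_i; w)
        g = X.T @ (y - p)                 # gradient of the log likelihood
        w = w + eta * g                   # step uphill
    return w

# A linearly separable toy problem: y = 1 when x1 > x2.
X = np.array([[2.0, 1.0], [1.0, 2.0], [3.0, 0.5], [0.5, 3.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = train_logistic(X, y)
preds = (X @ w > 0).astype(float)
print(np.array_equal(preds, y))  # True
```

The inner double loop of the pseudocode becomes the single matrix product `X.T @ (y - p)`.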
Logistic Regression Implements a Linear Discriminant Function

In the 2-class 0/1 loss function case, we should predict ŷ = 1 if

  E_{y|x}[L(0, y)] > E_{y|x}[L(1, y)]
  ∑y P(y | x) L(0, y) > ∑y P(y | x) L(1, y)
  P(y = 0 | x) L(0, 0) + P(y = 1 | x) L(0, 1) > P(y = 0 | x) L(1, 0) + P(y = 1 | x) L(1, 1)
  P(y = 1 | x) > P(y = 0 | x)
  P(y = 1 | x) / P(y = 0 | x) > 1    (if P(y = 0 | x) ≠ 0)
  log [P(y = 1 | x) / P(y = 0 | x)] > 0
  w · x > 0

A similar derivation can be done for arbitrary L(0, 1) and L(1, 0).

55
Extending Logistic Regression to K > 2 classes

Choose class K to be the "reference class" and represent each of the other classes as a logistic function of the odds of class k versus class K:

  log [P(y = 1 | x) / P(y = K | x)] = w1 · x
  log [P(y = 2 | x) / P(y = K | x)] = w2 · x
  …
  log [P(y = K − 1 | x) / P(y = K | x)] = wK−1 · x

Gradient ascent can be applied to simultaneously train all of these weight vectors wk.

56
Logistic Regression for K > 2 (continued)

The conditional probability for class k ≠ K can be computed as

  P(y = k | x) = exp(wk · x) / (1 + ∑_{ℓ=1}^{K−1} exp(wℓ · x))

For class K, the conditional probability is

  P(y = K | x) = 1 / (1 + ∑_{ℓ=1}^{K−1} exp(wℓ · x))

57
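The K-class probabilities with class K as the reference class can be sketched as follows (the weight vectors here are made-up values). Note that the K probabilities sum to 1 by construction:

```python
import numpy as np

def class_probs(x, W):
    """P(y = k | x) for k = 1..K, where the rows of W are the K-1 weight
    vectors w_1 ... w_{K-1}; class K is the reference class."""
    scores = np.exp(W @ x)       # exp(w_k . x) for k < K
    denom = 1.0 + scores.sum()   # 1 + sum_l exp(w_l . x)
    return np.append(scores / denom, 1.0 / denom)

# K = 3 classes, so two weight vectors (illustrative values).
W = np.array([[1.0, -0.5],
              [0.2,  0.3]])
x = np.array([0.4, 1.0])
p = class_probs(x, W)
print(np.isclose(p.sum(), 1.0))  # True
```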
Summary of Logistic Regression

Learns the conditional probability distribution P(y | x).

Local Search
– Begins with an initial weight vector and modifies it iteratively to maximize the log likelihood of the data.

Eager
– The classifier is constructed from the training examples, which can then be discarded.

Online or Batch
– Both online and batch variants of the algorithm exist.

58
Linear Discriminant Analysis Linear Discriminant Analysis
Learn P( Learn P(x x, ,y y). This is sometimes ). This is sometimes called the called the generative generative approach, approach, because we can think of P( because we can think of P(x x, ,y y) as a ) as a model of how the data is generated. model of how the data is generated.
– – For example, if we factor the joint For example, if we factor the joint distribution into the form distribution into the form
P( P(x x, ,y y) = P( ) = P(y y) P( ) P(x x | | y y) )
– – we can think of P( we can think of P(y y) as ) as “ “generating generating” ” a a value for value for y y according to P( according to P(y y). Then we ). Then we can think of P( can think of P(x x | | y y) as generating a value ) as generating a value for for x x given the previously given the previously-
- generated
generated value for value for y y. . – – This can be described as a Bayesian This can be described as a Bayesian network network
y x
59 59
Linear Discriminant Analysis (2)

P(y) is a discrete multinomial distribution.
– Example: P(y = 0) = 0.31, P(y = 1) = 0.69 will generate 31% negative examples and 69% positive examples.

For LDA, we assume that P(x | y) is a multivariate normal distribution with mean µk and covariance matrix Σ:

  P(x | y = k) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp( −½ [x − µk]ᵀ Σ⁻¹ [x − µk] )

60
Multivariate Normal Distributions: A Tutorial

Recall that the univariate normal (Gaussian) distribution has the formula

  p(x) = (1 / ((2π)^{1/2} σ)) exp( −½ (x − µ)² / σ² )

where µ is the mean and σ² is the variance. Graphically, it looks like this: [figure omitted]

61
The Multivariate Gaussian The Multivariate Gaussian
A 2 A 2-
- dimensional Gaussian is defined by a
dimensional Gaussian is defined by a mean vector mean vector µ µ = ( = (µ µ1
1,
,µ µ2
2) and a covariance
) and a covariance matrix matrix where where σ σ2
2 i,j i,j = E[(x
= E[(xi
i –
– µ µi
i)(x
)(xj
j -
- µ
µj
j)] is the
)] is the variance (if variance (if i = j i = j) or co ) or co-
- variance (if
variance (if i i ≠ ≠ j). j). Σ Σ is symmetrical and positive is symmetrical and positive-
- definite.
definite.
Σ =
" σ2 1,1 σ2 1,2
σ2
1,2 σ2 2,2 #
62 62
The Multivariate Gaussian (2) The Multivariate Gaussian (2)
If If Σ Σ is the identity matrix and is the identity matrix and µ µ = (0, 0), we get the standard normal = (0, 0), we get the standard normal distribution: distribution:
Σ =
"
1 1
#
63 63
The Multivariate Gaussian (3) The Multivariate Gaussian (3)
If If Σ Σ is a diagonal matrix, then is a diagonal matrix, then x x1
1, and
, and x x2
2 are independent random
are independent random variables, and lines of equal probability are ellipses parallel variables, and lines of equal probability are ellipses parallel to the to the coordinate axes. For example, when coordinate axes. For example, when
and and we obtain we obtain
Σ =
"
2 1
#
µ = (2, 3)
64 64
The Multivariate Gaussian (4) The Multivariate Gaussian (4)
Finally, if Finally, if Σ Σ is an arbitrary matrix, then x is an arbitrary matrix, then x1
1 and x
and x2
2 are
are dependent, and lines of equal probability are ellipses dependent, and lines of equal probability are ellipses tilted relative to the coordinate axes. For example, when tilted relative to the coordinate axes. For example, when
and and we obtain we obtain
µ = (2, 3)
Σ =
"
2 0.5 0.5 1
#
65 65
Estimating a Multivariate Gaussian Estimating a Multivariate Gaussian
Given a set of N data points { Given a set of N data points {x x1
1,
, … …, , x xN
N}, we can compute
}, we can compute the maximum likelihood estimate for the multivariate the maximum likelihood estimate for the multivariate Gaussian distribution as follows: Gaussian distribution as follows:
ˆ µ = 1 N
X i
xi
ˆ Σ = 1 N
X i
(xi − ˆ µ) · (xi − ˆ µ)T
Note that the dot product in the second equation is an Note that the dot product in the second equation is an
- uter product
- uter product. The outer product of two vectors is a
. The outer product of two vectors is a matrix: matrix:
x·yT =
⎡ ⎢ ⎣
x1 x2 x3
⎤ ⎥ ⎦·[y1 y2 y3] = ⎡ ⎢ ⎣
x1y1 x1y2 x1y3 x2y1 x2y2 x2y3 x3y1 x3y2 x3y3
⎤ ⎥ ⎦
For comparison, the usual dot product is written as For comparison, the usual dot product is written as x xT
T·
· y y
66 66
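The MLE formulas above, including the sum of outer products, correspond to a few lines of numpy. A sketch on purely illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # N = 500 points in R^3

# mu_hat = (1/N) sum_i x_i
mu_hat = X.mean(axis=0)

# Sigma_hat = (1/N) sum_i (x_i - mu_hat)(x_i - mu_hat)^T,
# i.e. a sum of outer products, written as one matrix product.
D = X - mu_hat
Sigma_hat = (D.T @ D) / len(X)

# Equivalent to np.cov with ddof=0 (the 1/N maximum likelihood version).
print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, ddof=0)))  # True
```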
The LDA Model

Linear discriminant analysis assumes that the joint distribution has the form

  P(x, y) = P(y) · (1 / ((2π)^{n/2} |Σ|^{1/2})) exp( −½ [x − µy]ᵀ Σ⁻¹ [x − µy] )

where each µy is the mean of a multivariate Gaussian for examples belonging to class y, and Σ is a single covariance matrix shared by all classes.

67
Fitting the LDA Model

It is easy to learn the LDA model in a single pass through the data:
– Let π̂k be our estimate of P(y = k)
– Let Nk be the number of training examples belonging to class k

  π̂k = Nk / N
  µ̂k = (1/Nk) ∑_{i : yi = k} xi
  Σ̂ = (1/N) ∑i (xi − µ̂yi)(xi − µ̂yi)ᵀ

Note that each xi is subtracted from its corresponding µ̂yi prior to taking the outer product. This gives us the "pooled" estimate of Σ.

68
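The single-pass fit of π̂k, µ̂k, and the pooled Σ̂ can be sketched as follows (the four-point dataset is made up for illustration):

```python
import numpy as np

def fit_lda(X, y):
    """One pass: class priors, class means, and the pooled covariance."""
    N = len(X)
    classes = np.unique(y)
    pi = {k: np.mean(y == k) for k in classes}         # pi_hat_k = N_k / N
    mu = {k: X[y == k].mean(axis=0) for k in classes}  # mu_hat_k
    # Subtract each x_i's own class mean before the outer product ("pooled").
    D = X - np.array([mu[k] for k in y])
    Sigma = (D.T @ D) / N
    return pi, mu, Sigma

X = np.array([[1.0, 2.0], [2.0, 1.5], [4.0, 4.0], [5.0, 3.5]])
y = np.array([0, 0, 1, 1])
pi, mu, Sigma = fit_lda(X, y)
print(pi[0], pi[1])  # 0.5 0.5
```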
LDA learns an LTU LDA learns an LTU
Consider the 2 Consider the 2-
- class case with a 0/1 loss function. Recall that
class case with a 0/1 loss function. Recall that Also recall from our derivation of the Logistic Regression class Also recall from our derivation of the Logistic Regression classifier ifier that we should classify into class that we should classify into class ŷ ŷ = 1 if = 1 if Hence, for LDA, we should classify into Hence, for LDA, we should classify into ŷ ŷ = 1 if = 1 if because the denominators cancel because the denominators cancel
P(y = 0|x) = P(x, y = 0) P(x, y = 0) + P(x, y = 1) P(y = 1|x) = P(x, y = 1) P(x, y = 0) + P(x, y = 1)
log P (y = 1|x) P (y = 0|x) > 0 log P(x, y = 1) P(x, y = 0) > 0
69 69
LDA learns an LTU (2) LDA learns an LTU (2)
P(x, y) = P (y) 1 (2π)n/2|Σ|1/2 exp
µ
−1 2[x − µy]T Σ−1[x − µy]
¶
P (x, y = 1) P (x, y = 0) = P(y = 1)
1 (2π)n/2|Σ|1/2 exp ³
−1
2[x − µ1]T Σ−1[x − µ1] ´
P(y = 0)
1 (2π)n/2|Σ|1/2 exp ³
−1
2[x − µ0]T Σ−1[x − µ0] ´
P (x, y = 1) P (x, y = 0) = P(y = 1) exp
³
−1
2[x − µ1]TΣ−1[x − µ1] ´
P(y = 0) exp
³
−1
2[x − µ0]TΣ−1[x − µ0] ´
log P (x, y = 1) P (x, y = 0) = log P (y = 1) P (y = 0) − 1 2
³
[x − µ1]TΣ−1[x − µ1] − [x − µ0]TΣ−1[x − µ0]
´
70 70
LDA learns an LTU (3) LDA learns an LTU (3)
Let Let’ ’s focus on the term in brackets: s focus on the term in brackets:
³
[x − µ1]TΣ−1[x − µ1] − [x − µ0]T Σ−1[x − µ0]
´
Expand the quadratic forms as follows: Expand the quadratic forms as follows:
[x − µ1]TΣ−1[x − µ1] = xT Σ−1x − xTΣ−1µ1 − µT
1Σ−1x + µT 1Σ−1µ1
[x − µ0]TΣ−1[x − µ0] = xT Σ−1x − xTΣ−1µ0 − µT
0Σ−1x + µT 0Σ−1µ0
Subtract the lower from the upper line and collect similar Subtract the lower from the upper line and collect similar
- terms. Note that the quadratic terms cancel! This
- terms. Note that the quadratic terms cancel! This
leaves only terms linear in leaves only terms linear in x x. .
xTΣ−1(µ0−µ1)+(µ0−µ1)Σ−1x+µT
1 Σ−1µ1−µT 0Σ−1µ0
71 71
LDA learns an LTU (4) LDA learns an LTU (4)
xTΣ−1(µ0−µ1)+(µ0−µ1)Σ−1x+µT
1 Σ−1µ1−µT 0Σ−1µ0
Note that since Note that since Σ Σ-
- 1
1 is symmetric
is symmetric for any two vectors for any two vectors a a and and b
- b. Hence, the first two terms
. Hence, the first two terms can be combined to give can be combined to give
aTΣ−1b = bTΣ−1a
2xTΣ−1(µ0 − µ1) + µT
1Σ−1µ1 − µT 0Σ−1µ0.
Now plug this back in Now plug this back in… …
log P (x, y = 1) P (x, y = 0) = log P (y = 1) P (y = 0) − 1 2
h
2xTΣ−1(µ0 − µ1) + µT
1 Σ−1µ1 − µT 0Σ−1µ0 i
log P (x, y = 1) P (x, y = 0) = log P (y = 1) P (y = 0) + xTΣ−1(µ1 − µ0) − 1 2µT
1 Σ−1µ1 + 1
2µT
0Σ−1µ0
72 72
LDA learns an LTU (5) LDA learns an LTU (5)
log P (x, y = 1) P (x, y = 0) = log P (y = 1) P (y = 0) + xTΣ−1(µ1 − µ0) − 1 2µT
1 Σ−1µ1 + 1
2µT
0Σ−1µ0
Let
w = Σ−1(µ1 − µ0)
c = log P (y = 1) P (y = 0) − 1 2µT
1Σ−1µ1 + 1
2µT
0Σ−1µ0
Then we will classify into class ˆ y = 1 if
w · x + c > 0.
This is an LTU.
73 73
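Given estimates of µ0, µ1, Σ, and the class priors, the LTU weights follow directly from these formulas. A sketch with made-up parameters, checked against the full log-odds expression:

```python
import numpy as np

mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
p_y1, p_y0 = 0.6, 0.4  # illustrative class priors

# w = Sigma^-1 (mu1 - mu0);  c = log(P1/P0) - 0.5 mu1' S^-1 mu1 + 0.5 mu0' S^-1 mu0
Si = np.linalg.inv(Sigma)
w = Si @ (mu1 - mu0)
c = np.log(p_y1 / p_y0) - 0.5 * mu1 @ Si @ mu1 + 0.5 * mu0 @ Si @ mu0

def log_ratio(x):
    # Direct log P(x, y=1)/P(x, y=0) from the Gaussian quadratic forms.
    return np.log(p_y1 / p_y0) - 0.5 * ((x - mu1) @ Si @ (x - mu1)
                                        - (x - mu0) @ Si @ (x - mu0))

# Classify x into y_hat = 1 if w . x + c > 0; the linear form agrees
# with the full log ratio, as the derivation shows.
x = np.array([1.5, 0.5])
print(np.isclose(w @ x + c, log_ratio(x)))  # True
```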
Two Geometric Views of LDA
View 1: Mahalanobis Distance

The quantity

  DM(x, u)² = (x − u)ᵀ Σ⁻¹ (x − u)

is known as the (squared) Mahalanobis distance between x and u. We can think of the matrix Σ⁻¹ as a linear distortion of the coordinate system that converts the standard Euclidean distance into the Mahalanobis distance.

Note that

  log P(x | y = k) ∝ log πk − ½ [ (x − µk)ᵀ Σ⁻¹ (x − µk) ]
  log P(x | y = k) ∝ log πk − ½ DM(x, µk)²

Therefore, we can view LDA as computing DM(x, µ0)² and DM(x, µ1)², and then classifying x according to which mean µ0 or µ1 is closest in Mahalanobis distance (corrected by log πk).

74
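This view of LDA, classifying to the class whose mean is closest in Mahalanobis distance after the log πk correction, can be sketched as follows (means, priors, and covariance are made-up values):

```python
import numpy as np

def mahalanobis_sq(x, u, Sigma_inv):
    # D_M(x, u)^2 = (x - u)' Sigma^-1 (x - u)
    d = x - u
    return d @ Sigma_inv @ d

def classify_lda(x, mus, pis, Sigma):
    """Pick the k maximizing log pi_k - 0.5 * D_M(x, mu_k)^2."""
    Si = np.linalg.inv(Sigma)
    scores = [np.log(pi) - 0.5 * mahalanobis_sq(x, mu, Si)
              for mu, pi in zip(mus, pis)]
    return int(np.argmax(scores))

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_0, mu_1
pis = [0.5, 0.5]
Sigma = np.eye(2)

print(classify_lda(np.array([0.5, 0.2]), mus, pis, Sigma))  # 0
print(classify_lda(np.array([2.5, 3.1]), mus, pis, Sigma))  # 1
```

With Σ = I and equal priors this reduces to nearest-mean classification in Euclidean distance; a non-identity Σ distorts the geometry as described above.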
View 2: Most Informative Low-Dimensional Projection

LDA can also be viewed as finding a hyperplane of dimension K − 1 such that x and the {µk} are projected down into this hyperplane, and then x is classified to the nearest µk using Euclidean distance inside this hyperplane.

75
Generalizations of LDA

General Gaussian Classifier
– Instead of assuming that all classes share the same Σ, we can allow each class k to have its own Σk. In this case, the resulting classifier will be a quadratic threshold unit (instead of an LTU).

Naïve Gaussian Classifier
– Allow each class to have its own Σk, but require that each Σk be diagonal. This means that within each class, any pair of features xj1 and xj2 will be assumed to be statistically independent. The resulting classifier is still a quadratic threshold unit (but with a restricted form).

76
Summary of Summary of Linear Discriminant Analysis Linear Discriminant Analysis
Learns the joint probability distribution P( Learns the joint probability distribution P(x x, , y y). ). Direct Computation. The maximum likelihood estimate Direct Computation. The maximum likelihood estimate
- f P(
- f P(x
x, ,y y) can be computed from the data without search. ) can be computed from the data without search. However, inverting the However, inverting the Σ Σ matrix requires O(n matrix requires O(n3
3) time.
) time.
- Eager. The classifier is constructed from the training
- Eager. The classifier is constructed from the training
- examples. The examples can then be discarded.
- examples. The examples can then be discarded.
- Batch. Only a batch algorithm is available. An online
- Batch. Only a batch algorithm is available. An online
algorithm could be constructed if there is an online algorithm could be constructed if there is an online algorithm for incrementally updated algorithm for incrementally updated Σ Σ-
- 1
- 1. [This is easy for
. [This is easy for the case where the case where Σ Σ is diagonal.] is diagonal.]
77 77
Comparing Perceptron, Logistic Comparing Perceptron, Logistic Regression, and LDA Regression, and LDA
How should we choose among these three How should we choose among these three algorithms? algorithms? There is a big debate within the machine There is a big debate within the machine learning community! learning community!
78 78
Issues in the Debate Issues in the Debate
Statistical Efficiency. Statistical Efficiency. If the generative model If the generative model P( P(x x, ,y y) is correct, then LDA usually gives the ) is correct, then LDA usually gives the highest accuracy, particularly when the amount highest accuracy, particularly when the amount
- f training data is small. If the model is correct,
- f training data is small. If the model is correct,
LDA requires 30% less data than Logistic LDA requires 30% less data than Logistic Regression in theory Regression in theory Computational Efficiency Computational Efficiency. Generative models . Generative models typically are the easiest to learn. In our typically are the easiest to learn. In our example, LDA can be computed directly from the example, LDA can be computed directly from the data without using gradient descent. data without using gradient descent.
79 79
Issues in the Debate Issues in the Debate
Robustness to changing loss functions Robustness to changing loss functions. Both generative . Both generative and conditional probability models allow the loss function and conditional probability models allow the loss function to be changed at run time without re to be changed at run time without re-
- learning.
learning. Perceptron requires re Perceptron requires re-
- training the classifier when the
training the classifier when the loss function changes. loss function changes. Robustness to model assumptions Robustness to model assumptions. The generative . The generative model usually performs poorly when the assumptions model usually performs poorly when the assumptions are violated. For example, if P( are violated. For example, if P(x x | | y y) is very non ) is very non-
- Gaussian, then LDA won
Gaussian, then LDA won’ ’t work well. Logistic t work well. Logistic Regression is more robust to model assumptions, and Regression is more robust to model assumptions, and Perceptron is even more robust. Perceptron is even more robust. Robustness to missing values and noise Robustness to missing values and noise. In many . In many applications, some of the features x applications, some of the features xij
ij may be missing or
may be missing or corrupted in some of the training examples. Generative corrupted in some of the training examples. Generative models typically provide better ways of handling this than models typically provide better ways of handling this than non non-
- generative models.