Machine Learning

1
Machine Learning in a Nutshell

[Diagram: the Machine Learner takes Data and a Performance Measure and produces a Model]
2
Machine Learning in a Nutshell

Data with attributes:

ID  A1   Reflex     RefLow  RefHigh  Label
1   5.6  Normal     3.4     7        No
2   5.5  Normal     2.4     5.7      No
3   5.3  Normal     2.4     5.7      Yes
4   5.3  Elevated   2.4     5.7      No
5   6.3  Normal     3.4     7        No
6   3.3  Normal     2.4     5.7      Yes
7   5.1  Decreased  2.4     5.7      Yes
8   4.2  Normal     2.4     5.7      Yes
…   …    …          …       …        …

Each row is an instance x_i ∈ X with its label y_i ∈ Y.
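To make the notation concrete, here is a minimal Python sketch of the data as instances x_i paired with labels y_i; the values are copied from the table above, while the variable names and structure are my own choices for illustration:

```python
# Each instance x_i is a dict of attribute values; each label y_i is "Yes"/"No".
X = [
    {"A1": 5.6, "Reflex": "Normal",    "RefLow": 3.4, "RefHigh": 7.0},
    {"A1": 5.5, "Reflex": "Normal",    "RefLow": 2.4, "RefHigh": 5.7},
    {"A1": 5.3, "Reflex": "Normal",    "RefLow": 2.4, "RefHigh": 5.7},
    {"A1": 5.3, "Reflex": "Elevated",  "RefLow": 2.4, "RefHigh": 5.7},
    {"A1": 6.3, "Reflex": "Normal",    "RefLow": 3.4, "RefHigh": 7.0},
    {"A1": 3.3, "Reflex": "Normal",    "RefLow": 2.4, "RefHigh": 5.7},
    {"A1": 5.1, "Reflex": "Decreased", "RefLow": 2.4, "RefHigh": 5.7},
    {"A1": 4.2, "Reflex": "Normal",    "RefLow": 2.4, "RefHigh": 5.7},
]
y = ["No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"]

# The dataset the learner consumes is the list of (x_i, y_i) pairs.
data = list(zip(X, y))
```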
3
Machine Learning in a Nutshell

Model: the learner's output. Example model families: logistic regression, support vector machines, hierarchical Bayesian networks, mixture models.
A model is a function f : X → Y learned from the data with attributes (table as on the previous slide): it maps each instance x_i ∈ X to a label y_i ∈ Y.
4
Machine Learning in a Nutshell

Evaluation: measure predicted labels vs. actual labels on test data.

[Figure: learning curve plotting performance against the number of training examples]
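One way to produce such a learning curve is to train on ever larger prefixes of the training set and score each resulting model on held-out test data. A minimal sketch, using a majority-class baseline as a stand-in for any learner (the labels here are illustrative):

```python
from collections import Counter

def majority_class(train_labels):
    """Trivial 'model': always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted_label, test_labels):
    return sum(1 for y in test_labels if y == predicted_label) / len(test_labels)

# Illustrative labels only; a real curve would use a real model and dataset.
train_y = ["No", "No", "Yes", "No", "No", "Yes", "Yes", "Yes"]
test_y  = ["Yes", "No", "Yes", "No"]

for n in range(1, len(train_y) + 1):
    model = majority_class(train_y[:n])   # train on the first n examples
    print(n, accuracy(model, test_y))     # performance vs. # training examples
```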
5
(Recap slide: the model f : X → Y, the example model families, and the labeled data table from the previous slides.)
6
A training set
7
ID3-induced decision tree
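ID3 grows such a tree greedily, splitting at each node on the attribute with the highest information gain. A minimal sketch of that scoring step, assuming the standard entropy and gain definitions (the toy rows are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    gain = entropy(labels)
    n = len(rows)
    for v in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

# Illustrative data: ID3 splits on the attribute with the largest gain.
rows   = [{"Color": "red"}, {"Color": "red"}, {"Color": "blue"}, {"Color": "blue"}]
labels = ["+", "+", "-", "+"]
print(information_gain(rows, labels, "Color"))  # ≈ 0.311
```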
8
Model spaces
[Figure: the same + and − training points partitioned by three model spaces, one panel each: nearest neighbor, version space, decision tree]
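Of the three spaces, nearest neighbor is the simplest to sketch: the "model" is just the stored training set, and a query point takes the label of its closest stored example. A minimal 1-NN sketch (the points are illustrative):

```python
def nearest_neighbor(train, query):
    """1-NN: return the label of the training point closest to the query."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    point, label = min(train, key=lambda pl: dist2(pl[0], query))
    return label

# Illustrative +/- points in the plane, as in the figure.
train = [((1.0, 1.0), "+"), ((2.0, 1.5), "+"), ((4.0, 4.0), "-")]
print(nearest_neighbor(train, (1.5, 1.2)))  # -> "+"
```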
9
Decision tree-induced partition – example
[Figure: a decision tree testing Color (red / green / blue), then Shape (round / square) and Size (big / small), and the +/− partition of the instance space it induces]
19
The Naïve Bayes Classifier
Some material adapted from slides by Tom Mitchell, CMU.
20
The Naïve Bayes Classifier
- Recall Bayes rule:

    P(Y_i | X_j) = P(X_j | Y_i) P(Y_i) / P(X_j)

- Which is short for:

    P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / P(X = x_j)

- We can re-write this as:

    P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)
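A quick numeric check of the re-written rule, with made-up prior and likelihood values; note that the denominator is the same product summed over every value of Y:

```python
# Made-up numbers, purely to exercise the formula above.
prior = {"yes": 0.75, "no": 0.25}      # P(Y = y_k)
likelihood = {"yes": 0.9, "no": 0.2}   # P(X = x_j | Y = y_k)

# P(Y = yes | X = x_j) = P(x_j | yes) P(yes) / sum_k P(x_j | y_k) P(y_k)
evidence = sum(likelihood[y] * prior[y] for y in prior)
posterior_yes = likelihood["yes"] * prior["yes"] / evidence
print(posterior_yes)  # 0.675 / 0.725 ≈ 0.931
```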
21
Deriving Naïve Bayes
- Idea: use the training data to directly estimate P(Y) and P(X | Y).
- Then, we can use these values to estimate P(Y | X_new) using Bayes rule.
- Recall that representing the full joint probability P(X_1, X_2, …, X_n | Y) is not practical.
22
Deriving Naïve Bayes
- However, if we make the assumption that the attributes are independent, estimation is easy!

    P(X_1, …, X_n | Y) = Π_i P(X_i | Y)

- In other words, we assume all attributes are conditionally independent given Y.
- Often this assumption is violated in practice, but more on that later…
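The payoff is that the conditional joint collapses into a product of per-attribute terms, each easy to estimate. A tiny sketch with made-up conditionals:

```python
from math import prod

# Made-up conditionals P(X_i = x_i | Y = y) for three attributes.
p_given_y = [0.9, 0.6, 0.8]

# Under conditional independence: P(X_1..X_3 | Y = y) = prod_i P(X_i | Y = y).
print(prod(p_given_y))  # 0.432
```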
23
Deriving Naïve Bayes
- Let X_1, …, X_n and label Y be discrete.
- Then, we can estimate P(Y_i) and P(X_i | Y_i) directly from the training data by counting!

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(Sky = sunny | Play = yes) = ?
P(Humid = high | Play = yes) = ?
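Answering the slide's two questions by counting, a minimal sketch with the four rows above hard-coded:

```python
# The four training rows from the table above.
rows = [
    {"Sky": "sunny", "Temp": "warm", "Humid": "normal", "Wind": "strong", "Water": "warm", "Forecast": "same",   "Play": "yes"},
    {"Sky": "sunny", "Temp": "warm", "Humid": "high",   "Wind": "strong", "Water": "warm", "Forecast": "same",   "Play": "yes"},
    {"Sky": "rainy", "Temp": "cold", "Humid": "high",   "Wind": "strong", "Water": "warm", "Forecast": "change", "Play": "no"},
    {"Sky": "sunny", "Temp": "warm", "Humid": "high",   "Wind": "strong", "Water": "cool", "Forecast": "change", "Play": "yes"},
]

def cond_prob(attr, value, label):
    """Estimate P(attr = value | Play = label) by counting."""
    matching = [r for r in rows if r["Play"] == label]
    return sum(1 for r in matching if r[attr] == value) / len(matching)

print(cond_prob("Sky", "sunny", "yes"))   # 3/3 = 1.0
print(cond_prob("Humid", "high", "yes"))  # 2/3 ≈ 0.67
```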
24
The Naïve Bayes Classifier
- Now we have:

    P(Y = y_j | X_1, …, X_n) = P(Y = y_j) Π_i P(X_i | Y = y_j) / Σ_k P(Y = y_k) Π_i P(X_i | Y = y_k)

  which is just a one-level Bayesian network:

  [Figure: the label Y (hypotheses) as the root node with the attributes X_1 … X_i … X_n (evidence) as its children; the parameters are the prior P(Y_j) and the conditionals P(X_i | Y_j)]

- To classify a new point X_new:

    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)
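Given the estimated prior and conditionals, classification is exactly this argmax. A minimal sketch (the probability values are made up for illustration):

```python
from math import prod

# Made-up parameters: prior over labels and per-attribute conditionals
# evaluated at the new point's attribute values.
prior = {"yes": 0.75, "no": 0.25}
cond = {
    "yes": [1.0, 0.67],  # P(X_i = x_i_new | Y = yes) for each attribute i
    "no":  [0.0, 1.0],   # P(X_i = x_i_new | Y = no)
}

# Y_new <- argmax_k P(Y = y_k) * prod_i P(X_i | Y = y_k)
y_new = max(prior, key=lambda y: prior[y] * prod(cond[y]))
print(y_new)  # "yes": 0.75 * 1.0 * 0.67 beats 0.25 * 0.0 * 1.0
```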
25
The Naïve Bayes Algorithm
- For each value y_k:
  - Estimate P(Y = y_k) from the data.
  - For each value x_ij of each attribute X_i:
    - Estimate P(X_i = x_ij | Y = y_k).
- Classify a new point via:

    Y_new ← argmax_{y_k} P(Y = y_k) Π_i P(X_i | Y = y_k)

- In practice, the independence assumption doesn't often hold true, but Naïve Bayes performs very well despite it.
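Putting the algorithm together, a sketch that estimates every probability by counting over two columns of the slide-23 table and classifies a new day. Real implementations typically also smooth the counts (e.g. Laplace smoothing) and work with log-probabilities to avoid underflow; neither is shown here:

```python
from collections import Counter, defaultdict
from math import prod

rows = [  # Sky and Humid columns of the slide-23 table (others omitted for brevity)
    {"Sky": "sunny", "Humid": "normal", "Play": "yes"},
    {"Sky": "sunny", "Humid": "high",   "Play": "yes"},
    {"Sky": "rainy", "Humid": "high",   "Play": "no"},
    {"Sky": "sunny", "Humid": "high",   "Play": "yes"},
]
attrs = ["Sky", "Humid"]

# Estimate P(Y = y_k) and P(X_i = x_ij | Y = y_k) by counting.
label_counts = Counter(r["Play"] for r in rows)
prior = {y: c / len(rows) for y, c in label_counts.items()}
cond = defaultdict(lambda: defaultdict(Counter))
for r in rows:
    for a in attrs:
        cond[r["Play"]][a][r[a]] += 1

def p_cond(attr, value, label):
    return cond[label][attr][value] / label_counts[label]

def classify(x):
    # Y_new <- argmax_k P(y_k) * prod_i P(x_i | y_k)
    return max(prior, key=lambda y: prior[y] * prod(p_cond(a, x[a], y) for a in attrs))

print(classify({"Sky": "sunny", "Humid": "high"}))  # -> "yes"
```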
26
Naïve Bayes Applications
- Text classification:
  - Which e-mails are spam? (see the sketch after this list)
  - Which e-mails are meeting notices?
  - Which author wrote a document?
- Classifying mental states:
  - Learning P(BrainActivity | WordCategory), e.g. brain activity evoked by "people words" vs. "animal words".
  - Pairwise classification accuracy: 85%.
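For the spam example, a common recipe is bag-of-words counts fed into a multinomial Naïve Bayes model. A sketch using scikit-learn (the miniature corpus is made up for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up miniature corpus, for illustration only.
emails = [
    "win money now claim your prize",
    "meeting notice agenda attached for monday",
    "cheap prize win win",
    "please review the meeting agenda",
]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()             # bag-of-words counts
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)     # estimates P(word | class) with smoothing

print(model.predict(vectorizer.transform(["claim your prize now"])))  # likely "spam"
```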