SLIDE 1

Artificial Intelligence: Representation and Problem Solving

15-381 April 17, 2007

Probabilistic Learning

Michael S. Lewicki Carnegie Mellon Artificial Intelligence: Probabilistic Learning

Reminder

  • No class on Thursday - spring carnival.

SLIDE 2

Recall the basic algorithm for learning decision trees:

  • 1. Start with the whole training data.
  • 2. Select the attribute or value along a dimension that gives the “best” split, using information gain or another criterion.
  • 3. Create child nodes based on the split.
  • 4. Recurse on each child using the child’s data until a stopping criterion is reached:
    • all examples have the same class
    • the amount of data is too small
    • the tree is too large
  • Does this capture probabilistic relationships?


Quantifying the certainty of decisions

<2 years at current job?   missed payments?   defaulted?
N                          N                  N
Y                          N                  Y
N                          N                  N
N                          N                  N
N                          Y                  Y
Y                          N                  N
N                          Y                  N
N                          Y                  Y
Y                          N                  N
Y                          N                  N

  • Predicting credit risk
  • Suppose instead of a yes or no answer, we want some estimate of how strongly we believe a loan applicant is a credit risk.
  • This might be useful if we want some flexibility in adjusting our decision criteria.
  • E.g., suppose we’re willing to take more risk if times are good.
  • Or, we may want to examine cases we believe are higher risk more carefully.

SLIDE 3

The mushroom data

  • Or suppose we wanted to know how likely a mushroom is to be safe to eat.
  • Do decision trees give us that information?

   EDIBLE?    CAP-SHAPE  CAP-SURFACE
 1 edible     flat       fibrous
 2 poisonous  convex     smooth
 3 edible     flat       fibrous
 4 edible     convex     scaly
 5 poisonous  convex     smooth
 6 edible     convex     fibrous
 7 poisonous  flat       scaly
 8 poisonous  flat       scaly
 9 poisonous  convex     fibrous
10 poisonous  convex     fibrous
11 poisonous  flat       smooth
12 edible     convex     smooth
13 poisonous  knobbed    scaly
14 poisonous  flat       smooth
15 poisonous  flat       fibrous
 ...

Mushroom data

Fisher’s Iris data

[Figure: scatter plot of petal length (cm) vs. petal width (cm) for Iris virginica, Iris setosa, and Iris versicolor]

In which example would you be more confident about the class? Decision trees provide a classification but not uncertainty.

SLIDE 4

The general classification problem

  • The input is a set of T observations, D = {x1, ..., xT}, each an N-dimensional vector xi = (x1, ..., xN)i (binary, discrete, or continuous).
  • The desired output is a binary classification vector y = {y1, ..., yK}, where yi = 1 if x ∈ Ci ≡ class i, and 0 otherwise.
  • The model (e.g. a decision tree) is defined by M parameters, θ = {θ1, ..., θM}.
  • Given data, we want to learn a model that can correctly classify novel observations.

How do we approach this probabilistically?


The answer to all questions of uncertainty

  • Let’s apply Bayes’ rule to infer the most probable class given the observation:
  • This is the answer, but what does it mean?
  • How do we specify the terms?
  • p(Ck) is the prior probability of the different classes
  • p(x|Ck) is the data likelihood, i.e. the probability of x given class Ck
  • How should we define this?

p(Ck|x) = p(x|Ck) p(Ck) / p(x) = p(x|Ck) p(Ck) / Σk' p(x|Ck') p(Ck')
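As a concrete numeric check of Bayes’ rule, the sketch below normalizes likelihood × prior over two classes; the prior and likelihood values are made up for illustration:

```python
# Hypothetical numbers: two classes with known priors and likelihoods
# for a single observation x.
priors = {"C1": 0.3, "C2": 0.7}        # p(Ck), assumed values
likelihoods = {"C1": 0.5, "C2": 0.1}   # p(x | Ck), assumed values

# p(x) = sum_k p(x | Ck) p(Ck): the normalizing constant
evidence = sum(likelihoods[k] * priors[k] for k in priors)

# p(Ck | x) = p(x | Ck) p(Ck) / p(x)
posterior = {k: likelihoods[k] * priors[k] / evidence for k in priors}
print(posterior)  # the posteriors sum to 1
```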
SLIDE 5

What classifier would give “optimal” performance?

  • Consider the iris data again.
  • How would we minimize the number of future misclassifications?
  • We would need to know the true distribution of the classes.
  • Assume they follow a Gaussian distribution.
  • The number of samples in each class is the same (50), so assume p(Ck) is equal for all classes.
  • Because p(x) is the same for all classes, we have:

p(Ck|x) = p(x|Ck) p(Ck) / p(x) ∝ p(x|Ck) p(Ck)

[Figure: the iris scatter plot with the class-conditional densities p(petal length | C2) and p(petal length | C3)]

Where do we put the boundary?

[Figure: the class-conditional densities p(petal length | C2) and p(petal length | C3)]

SLIDE 6

Where do we put the boundary?

[Figure: the two densities with a decision boundary; R32 = region where C3 is misclassified as C2, R23 = region where C2 is misclassified as C3]

Where do we put the boundary?

Shifting the boundary trades off the two errors: R32 (C3 misclassified as C2) against R23 (C2 misclassified as C3).

SLIDE 7

Where do we put the boundary?

  • The misclassification error is defined by

p(error) = ∫R32 p(C3|x) dx + ∫R23 p(C2|x) dx

where R32 is the region in which C3 is misclassified as C2 and R23 is the region in which C2 is misclassified as C3.

  • In our case this is proportional to the data likelihood.

Where do we put the boundary?

p(error) = ∫R32 p(C3|x) dx + ∫R23 p(C2|x) dx

  • With the boundary shifted, there is a region where p(C3|x) > p(C2|x), but we’re still classifying that region as C2!

SLIDE 8

The optimal decision boundary

  • The minimal misclassification error is at the point where

p(C3|x) = p(C2|x)
⇒ p(x|C3) p(C3) / p(x) = p(x|C2) p(C2) / p(x)
⇒ p(x|C3) = p(x|C2)     (using the equal priors)

[Figure: the posteriors p(C2 | petal length) and p(C3 | petal length), crossing at the optimal decision boundary]

Note: this assumes we have only two classes.
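A minimal sketch of finding this boundary numerically, assuming two hypothetical Gaussian class-conditional densities with equal priors and a shared standard deviation (the means and σ below are illustrative, not the actual iris fits):

```python
from math import exp, pi, sqrt

def gauss(x, mu, sigma):
    """Gaussian density with mean mu and std dev sigma."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Assumed class-conditional densities for petal length
mu2, mu3, sigma = 4.3, 5.5, 0.5

# With equal priors, the boundary is where p(x|C2) = p(x|C3);
# scan a grid and take the point where the densities are closest.
xs = [i / 1000 for i in range(3000, 7000)]
boundary = min(xs, key=lambda x: abs(gauss(x, mu2, sigma) - gauss(x, mu3, sigma)))
print(boundary)  # for equal variances this is the midpoint of the means, 4.9
```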


Bayesian classification for more complex models

  • Recall the class conditional probability:

p(Ck|x) = p(x|Ck) p(Ck) / p(x) = p(x|Ck) p(Ck) / Σk' p(x|Ck') p(Ck')

  • How do we define the data likelihood p(x|Ck), i.e. the probability of x given class Ck?
SLIDE 9

Defining a probabilistic classification model

  • How would we define the credit risk problem?
  • Class:

C1 = “defaulted” C2 = “didn’t default”

  • Data:

x = { “<2 years”, “missed payments” }

  • Prior (from data):

p(C1) = 3/10; p(C2) = 7/10;

  • Likelihood:

p(x1, x2 | C1) = ? p(x1, x2 | C2) = ?

  • How would we determine these?


  • Predicting credit risk


Defining a probabilistic model by counting

  • The “prior” is obtained by counting the number of examples of each class in the data:

p(Ck = k) = Count(Ck = k) / #records

  • The likelihood is obtained the same way:

p(x = v|Ck = k) = Count(x = v ∧ Ck = k) / Count(Ck = k)

p(x1 = v1, ..., xN = vN|Ck = k) = Count(x1 = v1 ∧ ... ∧ xN = vN ∧ Ck = k) / Count(Ck = k)

  • This is the maximum likelihood estimate (MLE) of the probabilities.
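The counting estimates can be sketched in code. The records below follow the credit-risk table; since that table was reconstructed from a flattened extraction, treat the exact rows as illustrative:

```python
from collections import Counter

# (x1 = "<2 years at current job", x2 = "missed payments", defaulted)
records = [
    ("N", "N", "N"), ("Y", "N", "Y"), ("N", "N", "N"), ("N", "N", "N"),
    ("N", "Y", "Y"), ("Y", "N", "N"), ("N", "Y", "N"), ("N", "Y", "Y"),
    ("Y", "N", "N"), ("Y", "N", "N"),
]

# Prior: count classes / #records
class_counts = Counter(y for _, _, y in records)
prior = {c: n / len(records) for c, n in class_counts.items()}

def likelihood(x1, x2, c):
    """MLE of p(x1, x2 | C=c): joint count divided by class count."""
    joint = sum(1 for a, b, y in records if (a, b, y) == (x1, x2, c))
    return joint / class_counts[c]

print(prior["Y"])                 # 3/10 defaulted
print(likelihood("N", "Y", "Y"))  # 2 of the 3 defaulters had (N, Y)
```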

SLIDE 10

Defining a probabilistic classification model

  • Determining the likelihood:

p(x1, x2 | C1) = ?   p(x1, x2 | C2) = ?

  • Simple approach: look at counts in the data (predicting credit risk).

x1 (<2 years at current job?)   x2 (missed payments?)   C1 (did default)   C2 (did not default)
N                               N
N                               Y
Y                               N
Y                               Y

Filling in the table cell by cell from the counts:

x1 (<2 years at current job?)   x2 (missed payments?)   C1 (did default)   C2 (did not default)
N                               N                       0/3                3/3
N                               Y                       2/3                1/3
Y                               N                       1/4                3/4
Y                               Y                       0/0                0/0

What do we do about the 0/0 entries?

SLIDE 13

Being (proper) Bayesians: Recall our coin-flipping example

  • In Bernoulli trials, each sample is either 1 (e.g. heads) with probability θ, or 0 (tails) with probability 1 − θ.
  • The binomial distribution specifies the probability of the total number of heads, y, out of n trials:

p(y|θ, n) = (n choose y) θ^y (1 − θ)^(n−y)

[Figure: p(y | θ = 0.25, n = 10) plotted against y]
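A small sketch of the binomial pmf, using Python’s math.comb for the binomial coefficient:

```python
from math import comb

def binom_pmf(y, n, theta):
    """p(y | theta, n): probability of exactly y heads in n Bernoulli trials."""
    return comb(n, y) * theta ** y * (1 - theta) ** (n - y)

# The distribution over #heads for theta = 0.25, n = 10, as in the plot
pmf = [binom_pmf(y, 10, 0.25) for y in range(11)]
mode = max(range(11), key=lambda y: pmf[y])
print(sum(pmf))  # sums to 1 (up to rounding)
print(mode)      # 2, near n * theta = 2.5
```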


Applying Bayes’ rule

  • Given n trials with y heads, what do we know about θ?
  • We can apply Bayes’ rule to see how our knowledge changes as we acquire new observations:

p(θ|y, n) = p(y|θ, n) p(θ|n) / p(y|n)

posterior = likelihood × prior / normalizing constant, where p(y|n) = ∫ p(y|θ, n) p(θ|n) dθ.

  • We know the likelihood; what about the prior? Uniform on [0, 1] is a reasonable assumption, i.e. “we don’t know anything”.
  • In this case, the posterior is just proportional to the likelihood:

p(θ|y, n) ∝ (n choose y) θ^y (1 − θ)^(n−y)

  • What is the form of the posterior?

SLIDE 14

Evaluating the posterior

[Figure: p(θ | y=0, n=0), uniform over θ ∈ [0, 1]]

What do we know initially, before observing any trials?


Coin tossing

[Figure: p(θ | y=0, n=1)]

What is our belief about θ after observing one “tail”?

SLIDE 15

Coin tossing

[Figure: p(θ | y=1, n=2)]

Now after two trials we observe 1 head and 1 tail.


Coin tossing

[Figure: p(θ | y=1, n=3)]

3 trials: 1 head and 2 tails.

SLIDE 16

Coin tossing

[Figure: p(θ | y=1, n=4)]

4 trials: 1 head and 3 tails.


Coin tossing

[Figure: p(θ | y=1, n=5)]

5 trials: 1 head and 4 tails.

SLIDE 17

Evaluating the normalizing constant

  • To get proper probability density functions, we need to evaluate p(y|n):

p(θ|y, n) = p(y|θ, n) p(θ|n) / p(y|n)

Bayes, in his original 1763 paper, showed that:

p(y|n) = ∫₀¹ p(y|θ, n) p(θ|n) dθ = 1/(n + 1)

⇒ p(θ|y, n) = (n choose y) θ^y (1 − θ)^(n−y) (n + 1)


The ratio estimate

  • What about after just one trial: 0 heads and 1 tail?

[Figure: p(θ | y=0, n=1), with its maximum at θ = 0]

The MAP and ratio estimates would both say 0: y/n = 0, and the MAP estimate is 0. Does this make sense? What would a better estimate be?

SLIDE 18

The expected value estimate

  • The expected value of the pdf is:

E(θ|y, n) = ∫₀¹ θ p(θ|y, n) dθ = (y + 1)/(n + 2)

[Figure: p(θ | y=0, n=1)]

After one tail: E(θ|y = 0, n = 1) = 1/3. What happens for zero trials? E(θ|y = 0, n = 0) = 1/2.

This is called “smoothing” or “regularization”.
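Both estimators side by side, as a sketch:

```python
def ratio_estimate(y, n):
    """MLE / ratio estimate y/n; undefined for zero trials."""
    return y / n if n else float("nan")

def expected_value_estimate(y, n):
    """Posterior mean under a uniform prior: (y + 1) / (n + 2)."""
    return (y + 1) / (n + 2)

# One trial, zero heads: the ratio estimate says theta = 0,
# while the posterior mean hedges toward 1/2.
print(ratio_estimate(0, 1))           # 0.0
print(expected_value_estimate(0, 1))  # 1/3
print(expected_value_estimate(0, 0))  # 1/2 before any data
```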


On to the mushrooms!

   EDIBLE?    CAP-SHAPE  CAP-SURFACE  CAP-COLOR  ODOR   STALK-SHAPE  POPULATION  HABITAT
 1 edible     flat       fibrous      red        none   tapering     several     woods
 2 poisonous  convex     smooth       red        foul   tapering     several     paths
 3 edible     flat       fibrous      brown      none   tapering     abundant    grasses
 4 edible     convex     scaly        gray       none   tapering     several     woods
 5 poisonous  convex     smooth       red        foul   tapering     several     woods
 6 edible     convex     fibrous      gray       none   tapering     several     woods
 7 poisonous  flat       scaly        brown      fishy  tapering     several     leaves
 8 poisonous  flat       scaly        brown      spicy  tapering     several     leaves
 9 poisonous  convex     fibrous      yellow     foul   enlarging    several     paths
10 poisonous  convex     fibrous      yellow     foul   enlarging    several     woods
11 poisonous  flat       smooth       brown      spicy  tapering     several     woods
12 edible     convex     smooth       yellow     anise  tapering     several     woods
13 poisonous  knobbed    scaly        red        foul   tapering     several     leaves
14 poisonous  flat       smooth       brown      foul   tapering     several     leaves
15 poisonous  flat       fibrous      gray       foul   enlarging    several     woods
16 edible     sunken     fibrous      brown      none   enlarging    solitary    urban
17 poisonous  flat       smooth       brown      foul   tapering     several     woods
18 poisonous  convex     smooth       white      foul   tapering     scattered   urban
19 poisonous  flat       scaly        yellow     foul   enlarging    solitary    paths
20 edible     convex     fibrous      gray       none   tapering     several     woods
 ...
SLIDE 19

The scaling problem

p(x = v|Ck) = Count(x = v ∧ Ck = k) / Count(Ck = k)

p(x1 = v1, ..., xN = vN|Ck = k) = Count(x1 = v1 ∧ ... ∧ xN = vN ∧ Ck = k) / Count(Ck = k)

  • The prior is easy enough.
  • But for the likelihood, the table is huge!

Mushroom attributes and values

EDIBLE: edible poisonous (2 values)
CAP-SHAPE: bell conical convex flat knobbed sunken (6)
CAP-SURFACE: fibrous grooves scaly smooth (4)
CAP-COLOR: brown buff cinnamon gray green pink purple red white yellow (10)
BRUISES: bruises no (2)
ODOR: almond anise creosote fishy foul musty none pungent spicy (9)
GILL-ATTACHMENT: attached free (2)
GILL-SPACING: close crowded (2)
GILL-SIZE: broad narrow (2)
GILL-COLOR: black brown buff chocolate gray green orange pink purple red white yellow (12)
STALK-SHAPE: enlarging tapering (2)
STALK-ROOT: bulbous club equal rooted (4)
STALK-SURFACE-ABOVE-RING: fibrous scaly silky smooth (4)
STALK-SURFACE-BELOW-RING: fibrous scaly silky smooth (4)
STALK-COLOR-ABOVE-RING: brown buff cinnamon gray orange pink red white yellow (9)
STALK-COLOR-BELOW-RING: brown buff cinnamon gray orange pink red white yellow (9)
VEIL-TYPE: partial universal (2)
VEIL-COLOR: brown orange white yellow (4)
RING-NUMBER: none one two (3)
RING-TYPE: evanescent flaring large none pendant (5)
SPORE-PRINT-COLOR: black brown buff chocolate green orange purple white yellow (9)
POPULATION: abundant clustered numerous scattered several solitary (6)
HABITAT: grasses leaves meadows paths urban waste woods (7)

22 attributes with an average of 5 values!

SLIDE 20

Simplifying with “Naïve” Bayes

  • What if we assume the features are independent?
  • We know that’s not precisely true, but it might make a good approximation.
  • Now we only need to specify N different likelihoods:

p(x|Ck) = p(x1, ..., xN|Ck) = ∏n=1..N p(xn|Ck)

p(xi = vi|Ck = k) = Count(xi = vi ∧ Ck = k) / Count(Ck = k)

  • Huge savings in the number of parameters.


Inference with Naïve Bayes

  • Inference is just like before, but with the independence approximation:

p(Ck|x) = p(x|Ck) p(Ck) / p(x) = p(Ck) ∏n p(xn|Ck) / Σk' p(Ck') ∏n p(xn|Ck')

  • Classification performance is often surprisingly good.
  • Easy to implement.
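A minimal naïve Bayes classifier by counting, on a toy mushroom-like dataset; the rows and attribute values are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy categorical data: (cap_shape, odor) -> class
data = [
    (("flat", "none"), "edible"),
    (("convex", "foul"), "poisonous"),
    (("flat", "none"), "edible"),
    (("convex", "none"), "edible"),
    (("flat", "foul"), "poisonous"),
]

class_counts = Counter(c for _, c in data)
# (feature index, class) -> counts of each value
feature_counts = defaultdict(Counter)
for x, c in data:
    for i, v in enumerate(x):
        feature_counts[(i, c)][v] += 1

def posterior(x):
    """p(Ck | x) under the naive independence assumption."""
    scores = {}
    for c, nc in class_counts.items():
        p = nc / len(data)                       # prior p(Ck)
        for i, v in enumerate(x):
            p *= feature_counts[(i, c)][v] / nc  # p(xi | Ck) by counting
        scores[c] = p
    z = sum(scores.values())                     # normalize
    return {c: s / z for c, s in scores.items()}

print(posterior(("flat", "none")))  # strongly favors "edible"
```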

SLIDE 21

Implementation issues

  • If you implement Naïve Bayes naïvely, you’ll run into trouble. Why?
  • It’s never good to compute products of a long list of numbers
  • They’ll quickly go to zero with machine precision, even using doubles (64 bit)
  • Strategy: compute log probabilities
  • What about that constant? It still has a product.

p(Ck|x) = p(Ck) ∏n p(xn|Ck) / Σk' p(Ck') ∏n p(xn|Ck')

log p(Ck|x) = log p(Ck) + Σn log p(xn|Ck) − log Σk' p(Ck') ∏n p(xn|Ck')
            = log p(Ck) + Σn log p(xn|Ck) − constant


Converting back to probabilities

  • The only requirement of the denominator is that it normalize the numerator to

yield a valid probability distribution.

  • We used a log transformation: gi = log pi + constant
  • The form of the probability is the same for any constant c:

pi / Σi pi = e^gi / Σi e^gi = e^c e^gi / Σi e^c e^gi = e^(gi + c) / Σi e^(gi + c)

  • A common choice: choose c so that the log probabilities are shifted to zero:

c = − maxi gi
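A sketch of this shift-then-normalize trick; with log scores around −1000, calling exp() directly would underflow to zero:

```python
from math import exp

def normalize_log_probs(gs):
    """Convert unnormalized log probabilities to probabilities,
    shifting by c = -max(g) so exp() never underflows."""
    c = -max(gs)
    ws = [exp(g + c) for g in gs]  # largest term becomes exp(0) = 1
    z = sum(ws)
    return [w / z for w in ws]

gs = [-1000.0, -1001.0, -1002.0]  # exp(-1000) alone underflows to 0.0
ps = normalize_log_probs(gs)
print(ps)  # same ratios as exp(0) : exp(-1) : exp(-2)
```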

SLIDE 22

Recall another simplifying assumption: Noisy-OR

  • We assume each cause Cj can produce effect Ei with probability fij.
  • The noisy-OR model assumes the parent causes of effect Ei contribute independently.
  • The probability that none of them caused effect Ei is simply the product of the probabilities that each one did not cause Ei.
  • The probability that any of them caused Ei is just one minus the above, i.e.

P(Ei|par(Ei)) = P(Ei|C1, ..., Cn) = 1 − ∏j (1 − P(Ei|Cj)) = 1 − ∏j (1 − fij)

Example: catch cold (C) from hit by viral droplet (D), touch contaminated object (O), or eat contaminated food (F):

P(C|D, O, F) = 1 − (1 − fCD)(1 − fCO)(1 − fCF)
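The noisy-OR combination rule as a sketch, with made-up per-cause probabilities for the cold example:

```python
def noisy_or(f_active):
    """P(effect | active causes) = 1 - prod(1 - f) over the active causes,
    where each f is the probability that that cause alone produces the effect."""
    p_none = 1.0
    for f in f_active:
        p_none *= 1.0 - f  # probability this cause did NOT produce the effect
    return 1.0 - p_none

# Hypothetical per-cause probabilities for catching a cold
f_droplet, f_object, f_food = 0.4, 0.2, 0.1
print(noisy_or([f_droplet, f_object, f_food]))  # 1 - 0.6*0.8*0.9 = 0.568
```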

A general one-layer causal network

  • Could either model causes and effects, or equivalently stochastic binary features.
  • Each input xi encodes the probability that the ith binary input feature is present.
  • The set of features represented by j is defined by weights fij, which encode the probability that feature i is an instance of j.

SLIDE 23

The data: a set of stochastic binary patterns

  • Each column is a distinct eight-dimensional binary feature.
  • There are five underlying causal feature patterns. What are they?

[Figure: the true hidden causes of the data and the inferred causes of the data]
SLIDE 24


Hierarchical Statistical Models

A Bayesian belief network: states Si, each with parents pa(Si), generating data D.

The joint probability of binary states is

P(S|W) = ∏i P(Si|pa(Si), W)

The probability of Si depends only on its parents:

P(Si|pa(Si), W) = h(Σj Sj wji)       if Si = 1
P(Si|pa(Si), W) = 1 − h(Σj Sj wji)   if Si = 0

The function h specifies how causes are combined: h(u) = 1 − exp(−u), u > 0.

Main points:

  • hierarchical structure allows the model to form high-order representations
  • upper states are priors for lower states
  • weights encode higher-order features

  • Model represents stochastic binary features.
  • Each input xi encodes the probability that the ith binary input feature is present.
  • The set of features represented by j is defined by weights fij, which encode the probability that feature i is an instance of j.
  • Trick: it’s easier to adapt weights in an unbounded space, so use the transformation fij = 1 − exp(−wij) and optimize in w-space.

Gibbs sampling (back to the example from last lecture)

SLIDE 25

The data: a set of stochastic binary patterns

  • Each column is a distinct eight-dimensional binary feature.

[Figure: the true hidden causes of the data and the inferred causes of the data]

Hierarchical Statistical Models

A Bayesian belief network: the joint probability of binary states is

P(S|W) = ∏i P(Si|pa(Si), W)

where Si depends only on its parents:

P(Si|pa(Si), W) = h(Σj Sj wji) if Si = 1, and 1 − h(Σj Sj wji) if Si = 0,

with h(u) = 1 − exp(−u), u > 0. Main points:

  • hierarchical structure allows the model to form high-order representations
  • upper states are priors for lower states
  • weights encode higher-order features

SLIDE 26


Learning Objective

Adapt W to find the most probable explanation of the input patterns. The probability of the data is

P(D1:N|W) = ∏n P(Dn|W)

P(Dn|W) is computed by marginalizing over the states:

P(Dn|W) = Σk P(Dn|Sk, W) P(Sk|W)

Computing this sum exactly is intractable, but we can still make accurate approximations.


Approximating P(Dn|W)

Good representations should have just one or a few possible explanations for most patterns.

  • in this case, most P(Dn|Sk, W) will be zero
  • the following approximation will be very accurate once weights have adapted

P(Dn|W) ≈ P(Dn|Ŝ, W) P(Ŝ|W), where Ŝ is the most probable explanation.

  • approximation becomes increasingly accurate as learning proceeds


SLIDE 27

Adapting the Weights

The complexity of the model is controlled by placing a prior on the weights.

  • assume the prior to be a product of gamma distributions
  • the objective function becomes

L = P(D1:N|W) P(W|α, β)

A simple and efficient EM formula for adapting the weights can be derived using the transformations fij = 1 − exp(−wij) and gi = 1 − exp(−ui):

fij ← (α − 1 + 2 fij + Σn Si^(n) Sj^(n) fij / gj^(n)) / (α + β + Σn Si^(n))

  • fij can be interpreted as the frequency of state Sj given cause Si
  • fij is a weighted average of the number of times Sj was active given Si
  • the ratio fij/gj inversely weights each term by the number of causes for Sj

Using EM to adapt the network parameters


Inferring the best representation of the observed variables

  • Given the input D, there is no simple way to determine which states are the input’s most likely causes.
  • Computing the most probable network state is an inference process:
  • we want to find the explanation of the data with the highest probability
  • this can be done efficiently with Gibbs sampling
  • Gibbs sampling is another example of an MCMC method
  • Key idea: the samples are guaranteed to converge to the true posterior probability distribution

SLIDE 28

Gibbs Sampling

Gibbs sampling is a way to select an ensemble of states that are representative of the posterior distribution P(S|D, W).

  • Each state of the network is updated iteratively according to the probability of Si given the remaining states.
  • This conditional probability can be computed using (Neal, 1992):

P(Si = a | Sj : j ≠ i, W) ∝ P(Si = a | pa(Si), W) ∏j∈ch(Si) P(Sj | pa(Sj), Si = a, W)

  • the limiting ensemble of states will be typical samples from P(S|D, W)
  • this also works if any subset of states is fixed and the rest are sampled

Network Interpretation of the Gibbs Sampling Equation

The probability of Si changing state given the remaining states is

P(Si = 1 − Si | Sj : j ≠ i, W) = 1 / (1 + exp(−Δxi))

Δxi indicates how much changing the state Si changes the probability of the whole network state:

Δxi = log h(ui; 1 − Si) − log h(ui; Si) + Σj∈ch(Si) [log h(uj + δij; Sj) − log h(uj; Sj)]

  • ui is the causal input to Si: ui = Σk Sk wki
  • δij specifies the change in uj for a change in Si: δij = +Sj wij if Si = 0, or −Sj wij if Si = 1

The Gibbs sampling equations (derivation omitted)
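The flip rule has the familiar logistic form. A minimal sketch (Δxi itself depends on the whole network, so here it is simply taken as an input):

```python
from math import exp

def flip_probability(delta_x):
    """Probability of flipping state S_i, given the change delta_x in the
    log probability of the whole network state: a logistic function."""
    return 1.0 / (1.0 + exp(-delta_x))

# If flipping S_i makes the joint state much more probable, the flip is
# near-certain; much less probable, near-impossible; no change, a coin flip.
print(flip_probability(5.0))   # close to 1
print(flip_probability(-5.0))  # close to 0
print(flip_probability(0.0))   # 0.5
```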

SLIDE 29

Interpretation of the Gibbs sampling equation

  • The Gibbs equation can be interpreted as feedback + feedforward:
  • feedback: how consistent is Si with the current causes?
  • feedforward: how likely is Si a cause of its children?
  • feedback allows the lower-level units to use information only computable at higher levels
  • feedback determines (disambiguates) the state when the feedforward input is ambiguous


The higher-order lines problem

[Figure: the true generative model, with its connection probabilities, and patterns sampled from the model]

Can we infer the structure of the network given only the patterns?

SLIDE 30

Weights in a 25-10-5 belief network after learning

The first layer of weights learns that the patterns are combinations of lines. The second layer learns combinations of the first-layer features.


The Shifter Problem

[Figure: shift patterns (A, B) and the weights of a 32-20-2 network after learning]

SLIDE 31

Gibbs sampling: feedback disambiguates lower-level states

Once the structure is learned, Gibbs updating converges in two sweeps.


Next time (which is next Tuesday)


  • classifying with other models

Don’t forget: Spring Carnival this week - No class on Thursday.