SLIDE 1

Machine Learning: Chenhao Tan

University of Colorado Boulder

LECTURE 4

Slides adapted from Jordan Boyd-Graber and Chris Ketelsen

SLIDE 2

Logistics

  • Piazza: https://piazza.com/colorado/fall2017/csci5622/
  • Office hour
  • HW1 due
  • Final projects
  • Feedback

SLIDE 3

Recap

  • Supervised learning
  • K-nearest neighbor
  • Training/validation/test, overfitting/underfitting

SLIDE 4

Overview

  • Generative vs. Discriminative models
  • Naïve Bayes Classifier
      • Motivating Naïve Bayes Example
      • Naïve Bayes Definition
      • Estimating Probability Distributions
  • Logistic regression
      • Logistic Regression Example

SLIDE 5

Generative vs. Discriminative models

Outline

  • Generative vs. Discriminative models
  • Naïve Bayes Classifier
      • Motivating Naïve Bayes Example
      • Naïve Bayes Definition
      • Estimating Probability Distributions
  • Logistic regression
      • Logistic Regression Example

SLIDE 7

Generative vs. Discriminative models

Probabilistic Models

  • hypothesis function h : X → Y.

In this special case, we define h based on estimating a probabilistic model P(X, Y).

SLIDE 8

Generative vs. Discriminative models

Probabilistic Classification

Input: training examples $S_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N}$, with labels $y_i \in \{c_1, c_2, \ldots, c_J\}$

Goal: $h : X \rightarrow Y$

For each class $c_j$, estimate $P(y = c_j \mid x, S_{\text{train}})$, then assign to $x$ the class with the highest probability:

$$\hat{y} = h(x) = \arg\max_{c} P(y = c \mid x, S_{\text{train}})$$
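
As a minimal sketch of this decision rule (not from the slides; the function name and probability table are illustrative), assuming the per-class estimates have already been computed:

```python
# Illustrative sketch: `probs` maps each class c_j to an estimate of
# P(y = c_j | x, S_train); the classifier returns the arg max.
def predict(probs):
    return max(probs, key=probs.get)

print(predict({"c1": 0.2, "c2": 0.5, "c3": 0.3}))  # -> c2
```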

SLIDE 9

Generative vs. Discriminative models

Generative vs. Discriminative Models

Generative

Model the joint probability p(x, y), including the data x. Example: Naïve Bayes.

  • Uses Bayes rule to reverse the conditioning: p(x|y) → p(y|x)
  • “Naïve” because it ignores joint probabilities within the data distribution

Discriminative

Model only the conditional probability p(y|x), excluding the data x. Example: logistic regression.

  • Logistic: a special mathematical function it uses
  • Regression: combines a weight vector with observations to create an answer
  • A general cookbook for building conditional probability distributions

SLIDE 10

Naïve Bayes Classifier

Outline

  • Generative vs. Discriminative models
  • Naïve Bayes Classifier
      • Motivating Naïve Bayes Example
      • Naïve Bayes Definition
      • Estimating Probability Distributions
  • Logistic regression
      • Logistic Regression Example

SLIDE 12

Naïve Bayes Classifier | Motivating Naïve Bayes Example

A Classification Problem

  • Suppose that I have two coins, C1 and C2
  • Now suppose I pull a coin out of my pocket, flip it a bunch of times, record the coin and outcomes, and repeat many times:

    C1: 0 1 1 1 1
    C1: 1 1 0
    C2: 1 0 0 0 0 0 0 1
    C1: 0 1
    C1: 1 1 0 1 1 1
    C2: 0 0 1 1 0 1
    C2: 1 0 0 0

  • Now suppose I am given a new sequence, 0 0 1; which coin is it from?

SLIDE 15

Naïve Bayes Classifier | Motivating Naïve Bayes Example

A Classification Problem

This problem has particular challenges:

  • different numbers of covariates for each observation
  • number of covariates can be large

However, there is some structure:

  • Easy to get P(C1) = 4/7 and P(C2) = 3/7
  • Also easy to get P(Xi = 1 | C1) = 12/16 and P(Xi = 1 | C2) = 6/18
  • By conditional independence,

    P(X = 0 0 1 | C1) = P(X1 = 0 | C1) P(X2 = 0 | C1) P(X3 = 1 | C1)

  • Can we use these to get P(C1 | X = 0 0 1)?
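
These counts can be checked directly; a small sketch, with the sequences hard-coded from above and illustrative variable names:

```python
from collections import Counter

# The recorded (coin, outcomes) pairs from the slide.
data = [
    ("C1", [0, 1, 1, 1, 1]),
    ("C1", [1, 1, 0]),
    ("C2", [1, 0, 0, 0, 0, 0, 0, 1]),
    ("C1", [0, 1]),
    ("C1", [1, 1, 0, 1, 1, 1]),
    ("C2", [0, 0, 1, 1, 0, 1]),
    ("C2", [1, 0, 0, 0]),
]

seqs = Counter(coin for coin, _ in data)
flips = {c: sum(len(o) for coin, o in data if coin == c) for c in seqs}
heads = {c: sum(sum(o) for coin, o in data if coin == c) for c in seqs}

print(seqs["C1"], "of", len(data))     # P(C1)       = 4/7
print(heads["C1"], "of", flips["C1"])  # P(X=1 | C1) = 12/16
print(heads["C2"], "of", flips["C2"])  # P(X=1 | C2) = 6/18
```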

SLIDE 16

Naïve Bayes Classifier | Motivating Naïve Bayes Example

A Classification Problem

Summary: we have P(data | class), but we want P(class | data). Solution: Bayes’ rule!

$$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{P(\text{data})} = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{\sum_{\text{class}'=1}^{C} P(\text{data} \mid \text{class}')\, P(\text{class}')}$$

To compute this, we need to estimate P(data | class) and P(class) for all classes.

SLIDE 17

Naïve Bayes Classifier | Motivating Naïve Bayes Example

A Classification Problem

However, there is some structure:

  • Easy to get P(C1) = 4/7 and P(C2) = 3/7
  • Also easy to get P(Xi = 1 | C1) = 12/16 and P(Xi = 1 | C2) = 6/18
  • By conditional independence,

    P(X = 0 0 1 | C1) = P(X1 = 0 | C1) P(X2 = 0 | C1) P(X3 = 1 | C1)

$$P(C_1 \mid X = \texttt{0 0 1}) = \frac{4/7 \times 4/16 \times 4/16 \times 12/16}{4/7 \times 4/16 \times 4/16 \times 12/16 + 3/7 \times 12/18 \times 12/18 \times 6/18}$$
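
Carrying out the arithmetic (not shown on the slide): the numerator is 4/7 × (4/16)² × 12/16 ≈ 0.0268, the C2 term is 3/7 × (12/18)² × 6/18 ≈ 0.0635, so P(C1 | X = 0 0 1) ≈ 0.0268 / 0.0903 ≈ 0.30. The sequence 0 0 1 is therefore more likely to have come from C2.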

SLIDE 20

Naïve Bayes Classifier | Naïve Bayes Definition

The Naïve Bayes classifier

  • The Naïve Bayes classifier is a probabilistic classifier.
  • We compute the probability of a document d being in a class c as follows:

$$P(c \mid d) \propto P(c, d) = P(c) \prod_{1 \le i \le n_d} P(w_i \mid c)$$

  • nd is the length of the document (number of tokens).
  • P(wi|c) is the conditional probability of term wi occurring in a document of class c.
  • P(wi|c) can be read as a measure of how much evidence wi contributes that c is the correct class.
  • P(c) is the prior probability of c.
  • If a document’s terms do not provide clear evidence for one class vs. another, we choose the c with the higher P(c).

SLIDE 21

Naïve Bayes Classifier | Naïve Bayes Definition

Maximum a posteriori class

  • Our goal is to find the “best” class.
  • The best class in Naïve Bayes classification is the most likely, or maximum a posteriori (MAP), class cMAP:

$$c_{\text{MAP}} = \arg\max_{c_j \in C} \hat{P}(c_j \mid d) = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j)$$

  • We write $\hat{P}$ for $P$ because these values are estimates from the training set.

SLIDE 22

Naïve Bayes Classifier | Naïve Bayes Definition

Naive Bayes Classifier: More examples

This works because the coin flips are independent given the coin parameter. What about this case:

  • want to identify the type of fruit given a set of features: color, shape and size
  • color: red, green, yellow or orange (discrete)
  • shape: round, oval or long+skinny (discrete)
  • size: diameter in inches (continuous)

SLIDE 23

Naïve Bayes Classifier | Naïve Bayes Definition

Naive Bayes Classifier: More examples

Conditioned on the type of fruit, these features are not necessarily independent. Given the category “apple,” the color “green” has a higher probability when “size < 2”:

P(green | size < 2, apple) > P(green | apple)

SLIDE 24

Naïve Bayes Classifier | Naïve Bayes Definition

Naive Bayes Classifier: More examples

Using the chain rule and Bayes’ rule,

$$P(\text{apple} \mid \text{green}, \text{round}, \text{size}=2) = \frac{P(\text{green}, \text{round}, \text{size}=2 \mid \text{apple})\, P(\text{apple})}{\sum_{j} P(\text{green}, \text{round}, \text{size}=2 \mid \text{fruit}_j)\, P(\text{fruit}_j)}$$

$$\propto P(\text{green} \mid \text{round}, \text{size}=2, \text{apple})\, P(\text{round} \mid \text{size}=2, \text{apple})\, P(\text{size}=2 \mid \text{apple})\, P(\text{apple})$$

But computing these conditional probabilities is hard! There are many combinations of (color, shape, size) for each fruit.

SLIDE 25

Naïve Bayes Classifier | Naïve Bayes Definition

Naive Bayes Classifier: More examples

Idea: assume conditional independence for all features given the class:

P(green | round, size = 2, apple) = P(green | apple)
P(round | green, size = 2, apple) = P(round | apple)
P(size = 2 | green, round, apple) = P(size = 2 | apple)

SLIDE 29

Naïve Bayes Classifier | Estimating Probability Distributions

How do we estimate a probability?

  • Suppose we want to estimate P(wn = “buy” | y = SPAM).
  • Example SPAM documents (shown as a word cloud on the slide):

    buy buy nigeria opportunity viagra nigeria opportunity viagra fly money fly buy nigeria fly buy money buy fly nigeria viagra

  • The maximum likelihood (ML) estimate of the probability is:

$$\hat{P}(w_i \mid \text{SPAM}) = \frac{n_i}{\sum_k n_k} \qquad (1)$$

    where $n_i$ is the number of times $w_i$ occurs in SPAM documents.

  • Is this reasonable?
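
A quick sketch of the ML estimate on these tokens (illustrative names, assuming the word cloud above is the whole SPAM sample):

```python
from collections import Counter

# Tokens of the example SPAM documents shown on the slide.
spam_tokens = ("buy buy nigeria opportunity viagra nigeria opportunity viagra "
               "fly money fly buy nigeria fly buy money buy fly nigeria viagra").split()

counts = Counter(spam_tokens)
total = sum(counts.values())

# ML estimate: P(w | SPAM) = n_w / sum_k n_k
print(counts["buy"], "/", total)   # 5 / 20, so P("buy" | SPAM) = 0.25
```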

SLIDE 30

Naïve Bayes Classifier | Estimating Probability Distributions

The problem with maximum likelihood estimates: Zeros (cont)

  • If there were no occurrences of “bagel” in documents in class SPAM, we’d get a zero estimate:

$$\hat{P}(\text{“bagel”} \mid \text{SPAM}) = \frac{T_{\text{SPAM},\text{“bagel”}}}{\sum_{w' \in V} T_{\text{SPAM},w'}} = 0,$$

    where $T_{c,w}$ is the count of token $w$ in documents with label $c$.

  • → We will get P(SPAM | d) = 0 for any document that contains “bagel”!

SLIDE 32

Naïve Bayes Classifier | Estimating Probability Distributions

How do we estimate a probability?

  • For a multinomial distribution (i.e. a discrete distribution, like one over words):

$$\hat{P}(w_i \mid c) = \frac{n_i + \alpha_i}{\sum_k (n_k + \alpha_k)} \qquad (2)$$

  • αi is called a smoothing factor, a pseudocount, etc.
  • When αi = 1 for all i, it’s called “Laplace smoothing” and corresponds to a uniform prior over all multinomial distributions (just do this).
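
A small sketch of the smoothed estimate, reusing the toy SPAM sample from above and an assumed vocabulary (illustrative, not from the slides):

```python
from collections import Counter

spam_tokens = ("buy buy nigeria opportunity viagra nigeria opportunity viagra "
               "fly money fly buy nigeria fly buy money buy fly nigeria viagra").split()
counts = Counter(spam_tokens)
total = sum(counts.values())
vocab = set(spam_tokens) | {"bagel"}  # add an unseen word for illustration
alpha = 1.0                           # Laplace smoothing: alpha_i = 1 for all i

def smoothed_p(word):
    """P(w | SPAM) with pseudocounts: (n_w + alpha) / (sum_k n_k + alpha * |V|)."""
    return (counts[word] + alpha) / (total + alpha * len(vocab))

print(smoothed_p("buy"))    # (5 + 1) / (20 + 7) ~ 0.222
print(smoothed_p("bagel"))  # (0 + 1) / (20 + 7) ~ 0.037, unseen but not zero
```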

SLIDE 34

Naïve Bayes Classifier | Estimating Probability Distributions

How do we estimate a probability?

  • For many applications, we often have a prior notion of what our probability distributions are going to look like (for example, non-zero, sparse, uniform, etc.).
  • This estimate of a probability distribution is called the maximum a posteriori (MAP) estimate:

$$\beta_{\text{MAP}} = \arg\max_{\beta} f(x \mid \beta)\, g(\beta) \qquad (3)$$

SLIDE 35

Naïve Bayes Classifier | Estimating Probability Distributions

Naïve Bayes for document classification

To reduce the number of parameters to a manageable size, recall the Naïve Bayes conditional independence assumption:

$$P(d \mid c_j) = P(w_1, \ldots, w_{n_d} \mid c_j) = \prod_{1 \le i \le n_d} P(X_i = w_i \mid c_j)$$

We assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(Xi = wi | cj). Our estimates for the priors and conditional probabilities, with Laplace smoothing:

$$\hat{P}(c_j) = \frac{N_{c_j} + 1}{N + |C|} \qquad \text{and} \qquad \hat{P}(w \mid c) = \frac{T_{cw} + 1}{\left(\sum_{w' \in V} T_{cw'}\right) + |V|}$$
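
A minimal training sketch that follows these two estimates, assuming documents arrive as (tokens, label) pairs; all names are illustrative, not from the slides:

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns log-priors, log-likelihoods, vocab."""
    n_docs = Counter(label for _, label in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        token_counts[label].update(tokens)
        vocab.update(tokens)

    classes = list(n_docs)
    # Laplace-smoothed prior: (N_c + 1) / (N + |C|)
    log_prior = {c: math.log((n_docs[c] + 1) / (len(docs) + len(classes)))
                 for c in classes}
    # Laplace-smoothed likelihood: (T_cw + 1) / (sum_w' T_cw' + |V|)
    log_lik = {}
    for c in classes:
        total = sum(token_counts[c].values())
        log_lik[c] = {w: math.log((token_counts[c][w] + 1) / (total + len(vocab)))
                      for w in vocab}
    return log_prior, log_lik, vocab
```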

SLIDE 36

Naïve Bayes Classifier | Estimating Probability Distributions

Implementation Detail: Taking the log

  • Multiplying lots of small probabilities can result in floating point underflow.
  • From last time: lg is logarithm base 2; ln is logarithm base e.

$$\lg x = a \Leftrightarrow 2^a = x \qquad \ln x = a \Leftrightarrow e^a = x \qquad (4)$$

  • Since lg(xy) = lg(x) + lg(y), we can sum log probabilities instead of multiplying probabilities.
  • Since lg is a monotonic function, the class with the highest score does not change.
  • So what we usually compute in practice is:

$$c_{\text{MAP}} = \arg\max_{c_j \in C} \Big[ \hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j) \Big] = \arg\max_{c_j \in C} \Big[ \ln \hat{P}(c_j) + \sum_{1 \le i \le n_d} \ln \hat{P}(w_i \mid c_j) \Big]$$
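
The matching log-space prediction, a sketch that pairs with `train_nb` above (skipping out-of-vocabulary words is one of several reasonable conventions, assumed here for simplicity):

```python
def predict_nb(tokens, log_prior, log_lik, vocab):
    """Return argmax_c [ ln P(c) + sum_i ln P(w_i|c) ]; out-of-vocab words are skipped."""
    scores = {c: lp + sum(log_lik[c][w] for w in tokens if w in vocab)
              for c, lp in log_prior.items()}
    return max(scores, key=scores.get)

# Usage with the training sketch above (toy data):
log_prior, log_lik, vocab = train_nb([
    (["buy", "viagra", "nigeria"], "SPAM"),
    (["meeting", "notes", "work"], "HAM"),
])
print(predict_nb(["buy", "nigeria"], log_prior, log_lik, vocab))  # SPAM
```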

SLIDE 39

Logistic regression

Outline

  • Generative vs. Discriminative models
  • Naïve Bayes Classifier
      • Motivating Naïve Bayes Example
      • Naïve Bayes Definition
      • Estimating Probability Distributions
  • Logistic regression
      • Logistic Regression Example

SLIDE 40

Logistic regression

What are we talking about?

  • Statistical classification: p(y|x)
  • Classification uses: ad placement, spam detection
  • Building block of other machine learning methods

SLIDE 41

Logistic regression

Logistic Regression: Definition

  • Weight vector βi
  • Observations Xi
  • “Bias” β0 (like the intercept in linear regression)

$$P(Y = 0 \mid X) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \qquad (5)$$

$$P(Y = 1 \mid X) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]} \qquad (6)$$

SLIDE 42

Logistic regression

Logistic Regression: Definition

  • Weight vector βi
  • Observations Xi
  • For shorthand, we’ll say that

$$P(Y = 1 \mid X) = \sigma\Big(\beta_0 + \sum_i \beta_i X_i\Big) \qquad (7)$$

$$P(Y = 0 \mid X) = 1 - \sigma\Big(\beta_0 + \sum_i \beta_i X_i\Big) \qquad (8)$$

  • where $\sigma(z) = \frac{1}{1 + \exp[-z]}$
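
A minimal sketch of these two formulas, with an illustrative sparse-feature representation (names are not from the slides):

```python
import math

def sigmoid(z):
    """The logistic function, sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def p_y1(x, beta, beta0):
    """P(Y = 1 | X) = sigma(beta0 + sum_i beta_i * x_i) for a feature dict x."""
    return sigmoid(beta0 + sum(beta.get(f, 0.0) * v for f, v in x.items()))
```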

SLIDE 44

Logistic regression

What’s this “exp” doing?

Exponential

  • exp [x] is shorthand for e^x
  • e is a special number, about 2.71828
  • e^x is the limit of the compound interest formula as compounds become infinitely small
  • It’s the function whose derivative is itself

Logistic

  • The “logistic” function is $\sigma(z) = \frac{1}{1 + e^{-z}}$
  • Looks like an “S”
  • Always between 0 and 1
  • Allows us to model probabilities
  • Different from linear regression

SLIDE 47

Logistic regression | Logistic Regression Example

Logistic Regression Example

feature     coefficient   weight
bias        β0             0.1
“viagra”    β1             2.0
“mother”    β2            −1.0
“work”      β3            −0.5
“nigeria”   β4             3.0

  • Y = 1: spam

Example 1: Empty Document?

X = {}

  • P(Y = 0) = 1 / (1 + exp[0.1]) = 0.48
  • P(Y = 1) = exp[0.1] / (1 + exp[0.1]) = 0.52
  • The bias β0 encodes the prior probability of a class

SLIDE 50

Logistic regression | Logistic Regression Example

Logistic Regression Example

feature     coefficient   weight
bias        β0             0.1
“viagra”    β1             2.0
“mother”    β2            −1.0
“work”      β3            −0.5
“nigeria”   β4             3.0

  • Y = 1: spam

Example 2

X = {Mother, Nigeria}

  • P(Y = 0) = 1 / (1 + exp[0.1 − 1.0 + 3.0]) = 0.11
  • P(Y = 1) = exp[0.1 − 1.0 + 3.0] / (1 + exp[0.1 − 1.0 + 3.0]) = 0.89
  • Include the bias, and sum the other weights

SLIDE 53

Logistic regression | Logistic Regression Example

Logistic Regression Example

feature     coefficient   weight
bias        β0             0.1
“viagra”    β1             2.0
“mother”    β2            −1.0
“work”      β3            −0.5
“nigeria”   β4             3.0

  • Y = 1: spam

Example 3

X = {Mother, Work, Viagra, Mother}

  • P(Y = 0) = 1 / (1 + exp[0.1 − 1.0 − 0.5 + 2.0 − 1.0]) = 0.60
  • P(Y = 1) = exp[0.1 − 1.0 − 0.5 + 2.0 − 1.0] / (1 + exp[0.1 − 1.0 − 0.5 + 2.0 − 1.0]) = 0.40
  • Multiply each feature’s count by its weight (“mother” appears twice, so β2 is added twice)
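
A quick check of all three examples (a sketch; names are illustrative, weights are from the table above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights from the table above.
beta0 = 0.1
beta = {"viagra": 2.0, "mother": -1.0, "work": -0.5, "nigeria": 3.0}

def p_spam(words):
    # Repeated words add their weight once per occurrence.
    z = beta0 + sum(beta.get(w.lower(), 0.0) for w in words)
    return sigmoid(z)

for doc in ([], ["Mother", "Nigeria"], ["Mother", "Work", "Viagra", "Mother"]):
    print(f"{p_spam(doc):.2f}")  # 0.52, 0.89, 0.40
```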

SLIDE 55

Logistic regression | Logistic Regression Example

How is Logistic Regression Used?

  • Given a set of weights β, we know how to compute the conditional likelihood P(y|β, x)
  • Find the set of weights β that maximizes the conditional likelihood on training data (next week)
  • Intuition: a higher weight means that the feature is a stronger indicator of the positive class
  • Naïve Bayes is a special case of logistic regression that uses Bayes rule and conditional probabilities to set these weights:

$$\arg\max_{c_j \in C} \Big[ \ln \hat{P}(c_j) + \sum_{1 \le i \le n_d} \ln \hat{P}(w_i \mid c_j) \Big]$$
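
To make the connection concrete (a sketch not on the slides, for the binary case with token counts $n_w(d)$):

$$\ln \frac{P(c_1 \mid d)}{P(c_0 \mid d)} = \ln \frac{\hat{P}(c_1)}{\hat{P}(c_0)} + \sum_{w \in V} n_w(d)\, \ln \frac{\hat{P}(w \mid c_1)}{\hat{P}(w \mid c_0)}$$

This is linear in the counts $n_w(d)$, with bias $\beta_0 = \ln \hat{P}(c_1)/\hat{P}(c_0)$ and per-word weights $\beta_w = \ln \hat{P}(w \mid c_1)/\hat{P}(w \mid c_0)$, which is exactly the functional form logistic regression assumes; the two differ only in how the weights are estimated.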

SLIDE 59

Logistic regression | Logistic Regression Example

Contrasting Naïve Bayes and Logistic Regression

  • Naïve Bayes easier
  • Naïve Bayes better on smaller datasets
  • Logistic regression better on medium-sized datasets
  • On huge datasets, it doesn’t really matter (data always win)
  • Optional reading by Ng and Jordan has proofs and experiments
  • Logistic regression allows arbitrary features (biggest difference!)
  • Don’t need to memorize (or work through) the previous slide; just understand that naïve Bayes is a special case of logistic regression
