  1. Machine Learning: Chenhao Tan, University of Colorado Boulder. Lecture 4. Slides adapted from Jordan Boyd-Graber and Chris Ketelsen.

  2. Logistics
     • Piazza: https://piazza.com/colorado/fall2017/csci5622/
     • Office hour
     • HW1 due
     • Final projects
     • Feedback

  3. Recap
     • Supervised learning
     • K-nearest neighbor
     • Training/validation/test, overfitting/underfitting

  4. Overview
     Generative vs. Discriminative models
     Naïve Bayes Classifier
       Motivating Naïve Bayes Example
       Naïve Bayes Definition
       Estimating Probability Distributions
     Logistic regression
       Logistic Regression Example

  5. Outline: Generative vs. Discriminative models

  6. Generative vs. Discriminative models | Probabilistic Models
     • Hypothesis function $h : X \to Y$. In this special case, we define $h$ based on estimating a probabilistic model $P(X, Y)$.

  7. Generative vs. Discriminative models | Probabilistic Classification
     Input: training examples $S_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{N}$, with $y_i \in \{c_1, c_2, \ldots, c_J\}$
     Goal: $h : X \to Y$
     • For each class $c_j$, estimate $P(y = c_j \mid x, S_{\text{train}})$
     • Assign to $x$ the class with the highest probability:
       $\hat{y} = h(x) = \arg\max_c P(y = c \mid x, S_{\text{train}})$
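
A minimal sketch of this decision rule (illustrative, not from the slides): given estimated class posteriors for a new example, pick the arg max.

```python
# Minimal sketch of the probabilistic decision rule: choose the class with
# the highest estimated posterior. `posteriors` maps each class label to a
# (hypothetical) estimate of P(y = c | x, S_train).

def classify(posteriors):
    # arg max over classes c of P(y = c | x, S_train)
    return max(posteriors, key=posteriors.get)

# Example with three classes whose estimated posteriors sum to 1:
print(classify({"c1": 0.2, "c2": 0.5, "c3": 0.3}))  # -> c2
```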

  8. Generative vs. Discriminative models | Generative vs. Discriminative Models
     Discriminative:
     • Model only the conditional probability $p(y \mid x)$, excluding the data $x$
     • Example: logistic regression
       ◦ Logistic: a special mathematical function it uses
       ◦ Regression: combines a weight vector with observations to create an answer
       ◦ A general cookbook for building conditional probability distributions
     Generative:
     • Model the joint probability $p(x, y)$, including the data $x$
     • Example: Naïve Bayes
       ◦ Uses Bayes' rule to reverse the conditioning: $p(x \mid y) \to p(y \mid x)$
       ◦ Naïve because it ignores joint probabilities within the data distribution
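
As an illustration of the two families (a sketch, assuming scikit-learn is available; not from the slides): GaussianNB fits a generative model of $p(x, y)$, while LogisticRegression fits $p(y \mid x)$ directly, yet both can be queried for conditional class probabilities.

```python
# Illustrative comparison: the same toy binary task fit with a generative
# model (GaussianNB, which models p(x, y)) and a discriminative model
# (LogisticRegression, which models p(y | x) directly).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # class 0 cluster
               rng.normal(2, 1, (50, 2))])  # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

for model in (GaussianNB(), LogisticRegression()):
    model.fit(X, y)
    # Both expose p(y | x) at prediction time, however they were fit.
    print(type(model).__name__, model.predict_proba(X[:1]).round(3))
```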

  9. Outline: Naïve Bayes Classifier

  10. Naïve Bayes Classifier | Motivating Naïve Bayes Example: A Classification Problem
      • Suppose that I have two coins, C1 and C2
      • Now suppose I pull a coin out of my pocket, flip it a bunch of times, record the coin and the outcomes, and repeat many times:
        C1: 0 1 1 1 1
        C1: 1 1 0
        C2: 1 0 0 0 0 0 0 1
        C1: 0 1
        C1: 1 1 0 1 1 1
        C2: 0 0 1 1 0 1
        C2: 1 0 0 0
      • Now suppose I am given a new sequence, 0 0 1; which coin is it from?

  11. Naïve Bayes Classifier | Motivating Naïve Bayes Example: A Classification Problem
      This problem has particular challenges:
      • different numbers of covariates for each observation
      • the number of covariates can be large
      However, there is some structure:
      • Easy to get $P(C_1) = 4/7$ and $P(C_2) = 3/7$
      • Also easy to get $P(X_i = 1 \mid C_1) = 12/16$ and $P(X_i = 1 \mid C_2) = 6/18$
      • By conditional independence, $P(X = 001 \mid C_1) = P(X_1 = 0 \mid C_1)\, P(X_2 = 0 \mid C_1)\, P(X_3 = 1 \mid C_1)$
      • Can we use these to get $P(C_1 \mid X = 001)$?

  12. Naïve Bayes Classifier | Motivating Naïve Bayes Example: A Classification Problem
      Summary: we have $P(\text{data} \mid \text{class})$ but want $P(\text{class} \mid \text{data})$
      Solution: Bayes' rule!
      $$P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{P(\text{data})} = \frac{P(\text{data} \mid \text{class})\, P(\text{class})}{\sum_{c=1}^{C} P(\text{data} \mid c)\, P(c)}$$
      To compute this, we need to estimate $P(\text{data} \mid \text{class})$ and $P(\text{class})$ for every class

  13. Naïve Bayes Classifier | Motivating Naïve Bayes Example: A Classification Problem
      However, there is some structure:
      • Easy to get $P(C_1) = 4/7$ and $P(C_2) = 3/7$
      • Also easy to get $P(X_i = 1 \mid C_1) = 12/16$ and $P(X_i = 1 \mid C_2) = 6/18$
      • By conditional independence, $P(X = 001 \mid C_1) = P(X_1 = 0 \mid C_1)\, P(X_2 = 0 \mid C_1)\, P(X_3 = 1 \mid C_1)$
      Putting it together:
      $$P(C_1 \mid X = 001) = \frac{4/7 \times 4/16 \times 4/16 \times 12/16}{4/7 \times 4/16 \times 4/16 \times 12/16 \;+\; 3/7 \times 12/18 \times 12/18 \times 6/18} \approx 0.297$$
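
The same computation in code (a short sketch using the counts from the slides):

```python
# Reproducing the coin example numerically. From the data above: C1 showed
# 12 heads in 16 flips, C2 showed 6 heads in 18 flips, and 4 of the 7
# recorded sequences came from C1.
p_c1, p_c2 = 4 / 7, 3 / 7            # priors P(C1), P(C2)
p_h_c1, p_h_c2 = 12 / 16, 6 / 18     # P(X_i = 1 | C1), P(X_i = 1 | C2)

def likelihood(seq, p_heads):
    # P(X = seq | coin): product of per-flip probabilities, valid because
    # flips are conditionally independent given the coin.
    prob = 1.0
    for x in seq:
        prob *= p_heads if x == 1 else 1 - p_heads
    return prob

seq = [0, 0, 1]
num = p_c1 * likelihood(seq, p_h_c1)
den = num + p_c2 * likelihood(seq, p_h_c2)
print(num / den)  # P(C1 | X = 001) ~= 0.297, so the sequence more likely came from C2
```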

  14. Naïve Bayes Classifier | Naïve Bayes Definition: The Naïve Bayes classifier
      • The Naïve Bayes classifier is a probabilistic classifier.
      • We compute the probability of a document $d$ being in a class $c$ as follows:
        $$P(c \mid d) \propto P(c, d) = P(c) \prod_{1 \le i \le n_d} P(w_i \mid c)$$
      • $n_d$ is the length of the document (number of tokens)
      • $P(w_i \mid c)$ is the conditional probability of term $w_i$ occurring in a document of class $c$
      • We can read $P(w_i \mid c)$ as a measure of how much evidence $w_i$ contributes that $c$ is the correct class
      • $P(c)$ is the prior probability of $c$
      • If a document's terms do not provide clear evidence for one class vs. another, we choose the class with the higher $P(c)$

  15. Naïve Bayes Classifier | Naïve Bayes Definition: Maximum a posteriori class
      • Our goal is to find the "best" class.
      • The best class in Naïve Bayes classification is the most likely, or maximum a posteriori (MAP), class $c_{\text{MAP}}$:
        $$c_{\text{MAP}} = \arg\max_{c_j \in C} \hat{P}(c_j \mid d) = \arg\max_{c_j \in C} \hat{P}(c_j) \prod_{1 \le i \le n_d} \hat{P}(w_i \mid c_j)$$
      • We write $\hat{P}$ rather than $P$ because these values are estimates from the training set.
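
A minimal sketch of this MAP decision, with invented probability tables ($\hat{P}(c)$ and $\hat{P}(w \mid c)$ would normally be estimated from training counts); sums of logs replace the product to avoid floating-point underflow on long documents:

```python
import math

# Invented estimates for a two-class toy problem, for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.05, "money": 0.04, "meeting": 0.001},
    "ham":  {"free": 0.005, "money": 0.004, "meeting": 0.03},
}

def c_map(doc):
    # arg max over c of log P(c) + sum_i log P(w_i | c)
    def score(c):
        return math.log(priors[c]) + sum(math.log(likelihoods[c][w]) for w in doc)
    return max(priors, key=score)

print(c_map(["free", "money"]))  # -> spam
print(c_map(["meeting"]))        # -> ham
```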

  16. Naïve Bayes Classifier | Naïve Bayes Definition: More examples
      This works because the coin flips are independent given the coin parameter. What about this case?
      • We want to identify the type of fruit given a set of features: color, shape, and size
        ◦ color: red, green, yellow, or orange (discrete)
        ◦ shape: round, oval, or long+skinny (discrete)
        ◦ size: diameter in inches (continuous)

  17. Naïve Bayes Classifier | Naïve Bayes Definition: More examples
      Conditioned on the type of fruit, these features are not necessarily independent. Given the category "apple," the color "green" has a higher probability when "size < 2":
      $$P(\text{green} \mid \text{size} < 2, \text{apple}) > P(\text{green} \mid \text{apple})$$
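
Even when the assumption is violated, Naïve Bayes still factors the likelihood feature by feature. A hypothetical sketch for the fruit example, treating color and shape as categorical and size as Gaussian (all parameters invented for illustration):

```python
import math

# Hypothetical per-class parameters for the fruit example. Naive Bayes
# multiplies one likelihood term per feature, ignoring correlations such as
# the green/size dependence noted above.
classes = {
    "apple": {"prior": 0.5,
              "color": {"red": 0.5, "green": 0.4, "yellow": 0.1},
              "shape": {"round": 0.9, "oval": 0.1},
              "size_mu": 3.0, "size_sigma": 0.5},   # size in inches
    "banana": {"prior": 0.5,
               "color": {"yellow": 0.8, "green": 0.2},
               "shape": {"long+skinny": 0.95, "oval": 0.05},
               "size_mu": 1.5, "size_sigma": 0.3},
}

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def classify(color, shape, size):
    def score(c):
        p = classes[c]
        return (p["prior"]
                * p["color"].get(color, 1e-6)   # tiny floor for unseen values
                * p["shape"].get(shape, 1e-6)
                * gaussian_pdf(size, p["size_mu"], p["size_sigma"]))
    return max(classes, key=score)

print(classify("green", "round", 2.8))  # -> apple
```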
