Introduction to Big Data and Machine Learning: Classification


  1. Introduction to Big Data and Machine Learning: Classification. Dr. Mihail, September 19, 2019.

  2. Linear models for classification. Goal of classification: take an input vector x and assign it to one of K discrete classes C_k, where k = 1, ..., K. The input space is thereby divided into decision regions whose boundaries are called "decision boundaries" or "decision surfaces". Here we consider linear models, where the decision boundaries are linear functions of the input vector x and hence are defined by (D − 1)-dimensional hyperplanes within the D-dimensional input space. Data sets that can be separated exactly by linear decision surfaces are said to be "linearly separable".
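
A minimal sketch of the two-class case, with a hand-picked weight vector w and bias b rather than learned ones: the decision boundary is the hyperplane w·x + b = 0, and the side of the hyperplane a point falls on determines its class.

```python
import numpy as np

# Illustrative (not learned) parameters of a two-class linear discriminant
# in D = 2 dimensions; the boundary w.x + b = 0 is a 1-dimensional hyperplane.
w = np.array([1.0, -2.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias

def predict(x):
    """Assign C1 if x lies on the positive side of the hyperplane, else C2."""
    return "C1" if w @ x + b > 0 else "C2"

print(predict(np.array([2.0, 0.0])))   # C1
print(predict(np.array([0.0, 2.0])))   # C2
```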

  3. Probabilistic models. For probabilistic models, the most convenient representation in the two-class case is the binary one, in which there is a single target variable t ∈ {0, 1}. For K > 2 classes it is convenient to use a 1-of-K coding scheme, in which t is a vector of length K such that if the class is C_j, then all elements t_k of t are zero except t_j, which equals 1. For instance, with 5 classes, a pattern from class 2 would be given by the target vector t = (0, 1, 0, 0, 0)^T.
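
A small sketch of the 1-of-K coding; the function name `one_hot` and the 1-based class index are my own choices for illustration.

```python
import numpy as np

def one_hot(class_index, K):
    """Return the length-K target vector with a 1 at position class_index (1-based)."""
    t = np.zeros(K)
    t[class_index - 1] = 1.0
    return t

print(one_hot(2, 5))   # [0. 1. 0. 0. 0.]  -- the class-2 example from the slide
```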

  4. Using Bayes' theorem. Model the posterior class probability: p(C_k | x) = p(x | C_k) p(C_k) / p(x). Notice the denominator is not a function of C_k. Prior class distribution: p(C_k). Class-conditional density: p(x | C_k).
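
A sketch of the posterior computation with made-up priors and class-conditional values, mainly to show that the denominator p(x) is the same normalizer for every class.

```python
import numpy as np

priors = np.array([0.6, 0.4])          # p(C_k), hypothetical numbers
likelihoods = np.array([0.02, 0.05])   # p(x | C_k) at one observed x, hypothetical

joint = likelihoods * priors           # p(x | C_k) p(C_k)
evidence = joint.sum()                 # p(x): identical for all classes
posteriors = joint / evidence          # p(C_k | x)
print(posteriors)                      # [0.375 0.625], sums to 1
```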

  5. Discriminative models. A discriminative model directly models P(c | x). To train a discriminative classifier, all training examples of the different classes must be used jointly to build a single classifier. A probabilistic discriminative classifier outputs K probabilities for the K class labels, while a non-probabilistic classifier produces a single label.
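
As an illustration, a softmax over linear scores is one common way a discriminative probabilistic classifier produces K probabilities directly from x; the weights below are illustrative, not trained, and the deck does not prescribe this particular model.

```python
import numpy as np

W = np.array([[ 1.0, -0.5],
              [-1.0,  0.5],
              [ 0.2,  0.2]])            # hypothetical weights: K = 3 classes, D = 2 features
b = np.zeros(3)

def predict_proba(x):
    """Map x directly to K class probabilities P(c | x) via a softmax of linear scores."""
    scores = W @ x + b
    scores -= scores.max()               # for numerical stability
    p = np.exp(scores)
    return p / p.sum()

print(predict_proba(np.array([1.0, 2.0])))   # three probabilities summing to 1
```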

  6. Discriminative classifier

  7. Generative classifier. Model P(x | c) for c = c_1, ..., c_K, with x = (x_1, ..., x_n). K probabilistic models have to be trained independently, each trained only on the examples of its own label. For a given input, the K models output K probabilities. "Generative" means the model can produce data by sampling from its distribution.
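
A sketch of the generative route, assuming a 1-D Gaussian density per class fit independently to synthetic per-class samples; the same fitted models can also be sampled to generate new data.

```python
import numpy as np

# One density model P(x | c) per class, each fit only on that class's examples.
# The training data are synthetic, for illustration only.
rng = np.random.default_rng(0)
data = {"c1": rng.normal(0.0, 1.0, 200), "c2": rng.normal(3.0, 1.0, 200)}

# Learning: fit each class model independently from its own samples.
params = {c: (xs.mean(), xs.std()) for c, xs in data.items()}

def likelihood(x, c):
    """Gaussian density P(x | c) under the fitted per-class model."""
    mu, sigma = params[c]
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# For a given input, the K models output K values of P(x | c).
print({c: round(likelihood(1.0, c), 4) for c in params})

# "Generative": the same models can produce new data via sampling.
mu, sigma = params["c1"]
print(rng.normal(mu, sigma, size=3))
```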

  8. Generative classifier

  9. Maximum a posteriori (MAP). For an input x, find the largest of the K probabilities output by a discriminative probabilistic classifier, P(c_1 | x), ..., P(c_K | x), and assign x the label c∗ if P(c∗ | x) is the largest. Generative classification with the MAP rule: P(c_i | x) = P(x | c_i) P(c_i) / P(x) ∝ P(x | c_i) P(c_i). (1)
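
A sketch of the MAP rule with hypothetical priors and likelihoods; since P(x) is common to every class, comparing the unnormalized products P(x | c_i) P(c_i) is enough.

```python
import numpy as np

classes = ["c1", "c2", "c3"]
priors = np.array([0.5, 0.3, 0.2])          # P(c_i), hypothetical
likelihoods = np.array([0.01, 0.08, 0.02])  # P(x | c_i) at the observed x, hypothetical

unnormalized = likelihoods * priors          # proportional to the posteriors P(c_i | x)
c_star = classes[int(np.argmax(unnormalized))]
print(c_star)                                # c2: largest P(x | c) P(c), hence largest posterior
```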

  10. Naïve Bayes. Bayes classification: P(c | x) ∝ P(x | c) P(c) = P(x_1, ..., x_n | c) P(c), for c = c_1, ..., c_K. (2)

  11. Naïve Bayes (continued). Problem: the joint probability P(x_1, ..., x_n | c) is not feasible to learn directly.

  12. Naïve Bayes (continued). Solution: assume all input features are class-conditionally independent!

  13. Bayes model. By the chain rule and the conditional-independence assumption: P(x_1, x_2, ..., x_n | c) = P(x_1 | x_2, ..., x_n, c) P(x_2, ..., x_n | c) = P(x_1 | c) P(x_2, ..., x_n | c) = ... = P(x_1 | c) P(x_2 | c) ... P(x_n | c). (3)
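
A sketch of what the factorization buys computationally: with illustrative per-feature probabilities, the class-conditional term becomes a product of n one-dimensional terms instead of one high-dimensional joint.

```python
import numpy as np

# P(x_j = observed value | c) for each feature j; hypothetical numbers, not estimates.
per_feature = {
    "c1": [0.2, 0.7, 0.5],
    "c2": [0.6, 0.1, 0.4],
}

def class_conditional(c):
    """P(x_1 | c) P(x_2 | c) ... P(x_n | c) under the independence assumption."""
    return float(np.prod(per_feature[c]))

print(class_conditional("c1"), class_conditional("c2"))   # 0.07 vs 0.024
```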

  14. Algorithm (discrete-valued features). Learning phase: given a training set S with F features and K classes, for each target value c_i (c_i = c_1, ..., c_K), estimate P̂(c_i) from the examples in S; for every feature value x_jk of each feature x_j (j = 1, ..., F; k = 1, ..., N), estimate P̂(x_j = x_jk | c_i) from the samples in S. Output: F × K conditional probabilistic (generative) models. Test phase: given an unknown instance x′ = (a′_1, ..., a′_n), assign label c∗ to x′ if [P̂(a′_1 | c∗) ... P̂(a′_n | c∗)] P̂(c∗) > [P̂(a′_1 | c_i) ... P̂(a′_n | c_i)] P̂(c_i) (4) for all c_i ≠ c∗, c_i = c_1, ..., c_K.
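
A compact sketch of the learning and test phases for discrete features, using simple counting (maximum-likelihood) estimates; the function names, variable names, and toy data are my own.

```python
from collections import Counter, defaultdict

def train(X, y):
    """Learning phase. X: list of feature tuples, y: list of class labels."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: class_counts[c] / n for c in class_counts}      # estimates of P(c_i)
    cond = defaultdict(Counter)                                   # (class, feature j) -> value counts
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            cond[(c, j)][v] += 1
    def likelihood(c, j, v):
        return cond[(c, j)][v] / class_counts[c]                  # estimate of P(x_j = v | c)
    return priors, likelihood

def predict(x, priors, likelihood):
    """Test phase: MAP rule over the factorized class-conditional probabilities."""
    scores = {}
    for c, p in priors.items():
        for j, v in enumerate(x):
            p *= likelihood(c, j, v)
        scores[c] = p
    return max(scores, key=scores.get)

# Toy usage with two discrete features:
X = [("a", "p"), ("a", "q"), ("b", "q"), ("b", "q")]
y = ["yes", "yes", "no", "no"]
priors, likelihood = train(X, y)
print(predict(("a", "q"), priors, likelihood))   # "yes"
```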

  15. Example

  16. Learning phase

  17. Test phase. Given a new instance, predict its label: x′ = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong). Look up the conditional-probability tables estimated in the learning phase and make the decision with the MAP rule.
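
A worked sketch of this test-phase computation. The probability estimates below assume the standard 14-example PlayTennis training set commonly used with this example; the actual lookup tables from the learning-phase slide are not reproduced in this transcript, so treat the numbers as illustrative.

```python
# x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
# Assumed estimates: P(Yes)=9/14, P(No)=5/14, and the per-feature conditionals below.
p_yes = (9/14) * (2/9) * (3/9) * (3/9) * (3/9)   # P(Yes) * prod of P(feature value | Yes)
p_no  = (5/14) * (3/5) * (1/5) * (4/5) * (3/5)   # P(No)  * prod of P(feature value | No)

print(f"P(x'|Yes)P(Yes) = {p_yes:.4f}")          # about 0.0053
print(f"P(x'|No)P(No)   = {p_no:.4f}")           # about 0.0206
print("MAP decision:", "Yes" if p_yes > p_no else "No")   # No
```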
