Naïve Bayes
Jia-Bin Huang Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Administrative
HW 1 out today. Please start early!
Office hours
Chen: Wed 4pm-5pm
Shih-Yang: Fri 3pm-4pm
Location: Whittemore 266
Linear Regression
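$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n = \theta^\top x$
$\frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Repeat until convergence {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
}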
Can combine features; can use different functions to generate features (e.g., polynomial)
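The gradient descent update for linear regression can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference implementation: the toy data (generated from y = 1 + 2x), the step size `alpha`, and the iteration count are all assumptions chosen so that the loop converges.

```python
def predict(theta, x):
    """h_theta(x) = theta^T x, with x[0] = 1 as the intercept feature."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent: all theta_j are updated simultaneously."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: y = 1 + 2 * x, first column is the constant feature x0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(X, y)
print(theta, cost(theta, X, y))
```

On real features with very different scales (such as house size versus number of bedrooms), the same loop would likely need feature scaling or a much smaller step size.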
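| $x_0$ | Size in feet^2 ($x_1$) | Number of bedrooms ($x_2$) | Number of floors ($x_3$) | Age of home (years) ($x_4$) | Price ($) in 1000's ($y$) |
|---|---|---|---|---|---|
| 1 | 2104 | 5 | 1 | 45 | 460 |
| 1 | 1416 | 3 | 2 | 40 | 232 |
| 1 | 1534 | 3 | 2 | 30 | 315 |
| 1 | 852 | 2 | 1 | 36 | 178 |
| … | … | … | … | … | … |

$y = [460,\ 232,\ 315,\ 178]^\top$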
Slide credit: Andrew Ng
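$\frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \| X\theta - y \|_2^2$
$\frac{\partial}{\partial \theta} J(\theta) = 0$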
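$X = \begin{bmatrix} (x^{(1)})^\top \\ (x^{(2)})^\top \\ \vdots \\ (x^{(m)})^\top \end{bmatrix}$ (one row per training example; equivalently, one column per feature)
[Figure: $y$, the column space of $X$, $X\theta$, and $X\theta - y$]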
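Setting the gradient of $J(\theta)$ to zero leads to the normal equation $X^\top X \theta = X^\top y$ (a standard result that the slide text does not spell out). The sketch below solves it with a small hand-written Gaussian elimination so that it runs without external libraries; the helper names and the toy data are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y."""
    n = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(n)] for a in range(n)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(n)]
    return solve(XtX, Xty)


# Hypothetical toy data: y = 1 + 2 * x, with x0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)
print(theta)
```

Geometrically, the resulting $X\theta$ is the point in the column space of $X$ closest to $y$, so the residual $X\theta - y$ is orthogonal to every column of $X$.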
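$p_\theta(y^{(i)} \mid x^{(i)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} \left( y^{(i)} - \theta^\top x^{(i)} \right)^2 \right)$
$\arg\max_\theta \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) = \arg\min_\theta \, -\log\left( \prod_{i=1}^{m} p_\theta(y^{(i)} \mid x^{(i)}) \right) = \arg\min_\theta \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2$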
Image credit: CS 446@UIUC
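The link between the Gaussian likelihood and least squares can be checked numerically: the negative log-likelihood differs from the scaled sum of squared errors only by a constant that does not depend on $\theta$, so both have the same minimizer. The sketch below assumes made-up data and an arbitrary noise level `sigma`.

```python
import math


def predict(theta, x):
    return sum(t * v for t, v in zip(theta, x))


def sum_squared_error(theta, X, y):
    return sum((yi - predict(theta, xi)) ** 2 for xi, yi in zip(X, y))


def neg_log_likelihood(theta, X, y, sigma):
    """-log prod_i p_theta(y_i | x_i) under Gaussian noise with std sigma."""
    nll = 0.0
    for xi, yi in zip(X, y):
        density = math.exp(-((yi - predict(theta, xi)) ** 2) / (2 * sigma ** 2))
        density /= math.sqrt(2 * math.pi * sigma ** 2)
        nll -= math.log(density)
    return nll


# Hypothetical data and noise level, chosen only for illustration.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]
sigma = 0.5


def offset(theta):
    """NLL minus SSE / (2 sigma^2); should not depend on theta."""
    return neg_log_likelihood(theta, X, y, sigma) - sum_squared_error(theta, X, y) / (2 * sigma ** 2)


offsets = [offset(t) for t in ([0.0, 0.0], [1.0, 2.0], [2.0, 1.0])]
expected = len(y) / 2 * math.log(2 * math.pi * sigma ** 2)
print(offsets, expected)
```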
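$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\left( h_\theta(x^{(i)}), y^{(i)} \right)$
$\mathrm{cost}(y, \hat{y}) = \frac{1}{2} \| y - \hat{y} \|_2^2$: Least squares loss
$\frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\left( y^{(i)}, \hat{y}^{(i)} \right)$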
Slide credit: Andrew Ng
Sample space Area = 1
$P(A \mid B) = \frac{P(A, B)}{P(B)}$
Corollary: The chain rule
$P(A, B) = P(A \mid B) P(B) = P(B \mid A) P(A)$
$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A) P(A)}{P(B)}$
Thomas Bayes
$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$
$P(A \mid B, X) = \frac{P(B \mid A, X) P(A, X)}{P(B, X)}$
$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B \mid A) P(A) + P(B \mid \sim A) P(\sim A)}$
B = you just coughed
$P(A \mid B) = \frac{0.8 \times 0.05}{0.8 \times 0.05 + 0.2 \times 0.95} \approx 0.17$
Slide credit: Tom Mitchell
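The arithmetic of the coughing example can be reproduced with a small helper. The function and argument names are illustrative assumptions; the numbers 0.05, 0.8, and 0.2 are read off the formula above as $P(A)$, $P(B \mid A)$, and $P(B \mid \sim A)$.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes rule with the denominator expanded over A and not-A."""
    numerator = p_b_given_a * prior_a
    evidence = numerator + p_b_given_not_a * (1.0 - prior_a)
    return numerator / evidence


p_a_given_b = posterior(prior_a=0.05, p_b_given_a=0.8, p_b_given_not_a=0.2)
print(round(p_a_given_b, 2))  # 0.17
```

Even though $P(B \mid A)$ is high, the posterior stays small because the prior $P(A)$ is only 0.05.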
Hypothesis
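| A | B | C | Prob |
|---|---|---|---|
| 0 | 0 | 0 | 0.30 |
| 0 | 0 | 1 | 0.05 |
| 0 | 1 | 0 | 0.10 |
| 0 | 1 | 1 | 0.05 |
| 1 | 0 | 0 | 0.05 |
| 1 | 0 | 1 | 0.10 |
| 1 | 1 | 0 | 0.25 |
| 1 | 1 | 1 | 0.10 |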
Slide credit: Tom Mitchell
involving these variables
$P(E_1 \mid E_2) = \frac{\sum_{\text{rows matching } E_1 \text{ and } E_2} P(\text{row})}{\sum_{\text{rows matching } E_2} P(\text{row})}$
Slide credit: Tom Mitchell
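Answering a conditional query from a joint distribution amounts to summing matching rows. The sketch below uses the eight-row A, B, C table from the earlier slide; the choice of query, $P(A = 1 \mid B = 1)$, and the helper names are assumptions made for illustration.

```python
# Joint distribution over Boolean A, B, C (rows of the table, keyed by (A, B, C)).
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}


def prob(event):
    """Sum P(row) over all rows where event(a, b, c) is true."""
    return sum(p for row, p in joint.items() if event(*row))


def conditional(e1, e2):
    """P(E1 | E2) = sum over rows matching E1 and E2 / sum over rows matching E2."""
    return prob(lambda a, b, c: e1(a, b, c) and e2(a, b, c)) / prob(e2)


p_a_given_b = conditional(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
print(p_a_given_b)  # about 0.7
```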
$2^{100} \ge 10^{30}$
$10^{9}$
Slide credit: Tom Mitchell
how we estimate probabilities from sparse data
how to represent joint distributions
Slide credit: Tom Mitchell
$P(X = 1) = ?$
$X = 1$, $X = 0$
Slide credit: Tom Mitchell
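Choose $\theta$ that maximizes probability of observed data
$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta P(\mathrm{Data} \mid \theta)$
Choose $\theta$ that is most probable given prior probability and data
$\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta \frac{P(\mathrm{Data} \mid \theta) P(\theta)}{P(\mathrm{Data})}$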
Slide credit: Tom Mitchell
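Choose $\theta$ that maximizes $P(\mathrm{Data} \mid \theta)$: $\hat{\theta}_{\mathrm{MLE}} = \frac{\alpha_1}{\alpha_1 + \alpha_0}$
Choose $\theta$ that maximizes $P(\theta \mid \mathrm{Data})$: $\hat{\theta}_{\mathrm{MAP}} = \frac{\alpha_1 + \#\text{hallucinated 1s}}{(\alpha_1 + \#\text{hallucinated 1s}) + (\alpha_0 + \#\text{hallucinated 0s})}$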
Slide credit: Tom Mitchell
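$X \sim \mathrm{Bernoulli}$: $P(X) = \theta^{X} (1 - \theta)^{1 - X}$
flips, produces $\alpha_1$ ones, $\alpha_0$ zeros
$P(D \mid \theta) = P(\alpha_1, \alpha_0 \mid \theta) = \theta^{\alpha_1} (1 - \theta)^{\alpha_0}$
$\hat{\theta} = \arg\max_\theta P(D \mid \theta) = \frac{\alpha_1}{\alpha_1 + \alpha_0}$
$X = 1$, $X = 0$: $P(X = 1) = \theta$, $P(X = 0) = 1 - \theta$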
Slide credit: Tom Mitchell
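The closed-form MLE $\alpha_1 / (\alpha_1 + \alpha_0)$ can be compared against a brute-force search over the log-likelihood. The flip sequence below is made up (7 ones, 3 zeros), and the grid resolution is an arbitrary assumption.

```python
import math


def bernoulli_mle(flips):
    """Closed-form MLE: fraction of ones among the observed flips."""
    alpha1 = sum(flips)
    alpha0 = len(flips) - alpha1
    return alpha1 / (alpha1 + alpha0)


def log_likelihood(theta, alpha1, alpha0):
    """log of theta^alpha1 * (1 - theta)^alpha0."""
    return alpha1 * math.log(theta) + alpha0 * math.log(1 - theta)


# Hypothetical data: 7 ones and 3 zeros.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
theta_mle = bernoulli_mle(flips)

# Brute-force check on a grid of candidate theta values in (0, 1).
grid = [k / 100 for k in range(1, 100)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, sum(flips), len(flips) - sum(flips)))
print(theta_mle, theta_grid)
```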
$P(\theta) = \mathrm{Beta}(\beta_1, \beta_0) = \frac{1}{B(\beta_1, \beta_0)} \theta^{\beta_1 - 1} (1 - \theta)^{\beta_0 - 1}$
Slide credit: Tom Mitchell
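produces $\alpha_1$ ones, $\alpha_0$ zeros
$P(D \mid \theta) = P(\alpha_1, \alpha_0 \mid \theta) = \theta^{\alpha_1} (1 - \theta)^{\alpha_0}$
$P(\theta) = \mathrm{Beta}(\beta_1, \beta_0) = \frac{1}{B(\beta_1, \beta_0)} \theta^{\beta_1 - 1} (1 - \theta)^{\beta_0 - 1}$
$\hat{\theta} = \arg\max_\theta P(D \mid \theta) P(\theta) = \frac{\alpha_1 + \beta_1 - 1}{(\alpha_1 + \beta_1 - 1) + (\alpha_0 + \beta_0 - 1)}$
$X = 1$, $X = 0$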
Slide credit: Tom Mitchell
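The MAP estimate under a Beta prior only adds pseudo-counts to the observed counts. The following sketch evaluates the formula for a few made-up count and prior settings; the function name and the example numbers are assumptions.

```python
def map_estimate(alpha1, alpha0, beta1, beta0):
    """MAP estimate of theta for a Bernoulli likelihood with a Beta(beta1, beta0) prior."""
    ones = alpha1 + beta1 - 1
    zeros = alpha0 + beta0 - 1
    return ones / (ones + zeros)


# Three flips, all ones: the MLE would be 1.0, the prior pulls the estimate back.
small_sample = map_estimate(alpha1=3, alpha0=0, beta1=3, beta0=3)    # 5/7

# With more data the prior matters less.
large_sample = map_estimate(alpha1=70, alpha0=30, beta1=3, beta0=3)  # 72/104

# A Beta(1, 1) prior adds no pseudo-counts, so MAP coincides with the MLE.
flat_prior = map_estimate(alpha1=70, alpha0=30, beta1=1, beta0=1)    # 0.7
print(small_sample, large_sample, flat_prior)
```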
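Prior $P(\theta)$ is the conjugate prior for a likelihood function $P(\mathrm{Data} \mid \theta)$ if the prior $P(\theta)$ and the posterior $P(\theta \mid \mathrm{Data})$ have the same form.
Likelihood $P(\mathrm{Data} \mid \theta)$: Binomial $\theta^{\alpha_1} (1 - \theta)^{\alpha_0}$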
Slide credit: Tom Mitchell
$X_i$ and $Y$ are Boolean random variables
To estimate $P(Y \mid X_1, \cdots, X_n)$:
When $n = 2$ (Gender, Hours-worked)?
When $n = 30$?
Slide credit: Tom Mitchell
$P(Y \mid X) = \frac{P(X \mid Y) P(Y)}{P(X)}$
Parameters for $P(X \mid Y)$: $(2^n - 1) \times 2$
Parameters for $P(Y)$: $1$
Slide credit: Tom Mitchell
$P(X_1, \cdots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$
i.e., $X_i$ and $X_j$ are conditionally independent given $Y$ for $i \neq j$
Slide credit: Tom Mitchell
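$X$ is conditionally independent of $Y$ given $Z$ if the probability distribution governing $X$ is independent of the value of $Y$, given the value of $Z$:
$\forall i, j, k:\ P(X = x_i \mid Y = y_j, Z = z_k) = P(X = x_i \mid Z = z_k)$
$P(X \mid Y, Z) = P(X \mid Z)$
Example: $P(\text{Thunder} \mid \text{Rain}, \text{Lightning}) = P(\text{Thunder} \mid \text{Lightning})$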
Slide credit: Tom Mitchell
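The definition of conditional independence can be verified on a small joint distribution. In this sketch the joint over Boolean $X, Y, Z$ is built as $P(Z) P(X \mid Z) P(Y \mid Z)$ from made-up numbers, and the check confirms that $P(X \mid Y, Z) = P(X \mid Z)$ for every value of $Y$ and $Z$.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration.
p_z1 = 0.4                        # P(Z = 1)
p_x1_given_z = {0: 0.2, 1: 0.9}   # P(X = 1 | Z = z)
p_y1_given_z = {0: 0.5, 1: 0.7}   # P(Y = 1 | Z = z)


def bernoulli(p_one, value):
    return p_one if value == 1 else 1.0 - p_one


# Joint table keyed by (x, y, z), constructed so that X and Y are independent given Z.
joint = {
    (x, y, z): bernoulli(p_z1, z) * bernoulli(p_x1_given_z[z], x) * bernoulli(p_y1_given_z[z], y)
    for x, y, z in product((0, 1), repeat=3)
}


def prob(x=None, y=None, z=None):
    """Marginal probability of the fixed values; None means 'sum this variable out'."""
    query = (x, y, z)
    return sum(p for outcome, p in joint.items()
               if all(q is None or q == o for q, o in zip(query, outcome)))


max_gap = 0.0
for y, z in product((0, 1), repeat=2):
    p_x_given_yz = prob(x=1, y=y, z=z) / prob(y=y, z=z)
    p_x_given_z = prob(x=1, z=z) / prob(z=z)
    max_gap = max(max_gap, abs(p_x_given_yz - p_x_given_z))
print(max_gap)  # numerically zero
```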
e.g., $P(X_1 \mid X_2, Y) = P(X_1 \mid Y)$
$P(X_1, X_2 \mid Y) = P(X_1 \mid X_2, Y) P(X_2 \mid Y) = P(X_1 \mid Y) P(X_2 \mid Y)$
General form: $P(X_1, \cdots, X_n \mid Y) = \prod_{i=1}^{n} P(X_i \mid Y)$
How many parameters to describe $P(X_1, \cdots, X_n \mid Y)$? $P(Y)$?
Slide credit: Tom Mitchell
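The parameter counts can be compared directly for Boolean $X_i$ and $Y$. Without any independence assumption, $P(X_1, \cdots, X_n \mid Y)$ needs $(2^n - 1) \times 2$ parameters; under the Naïve Bayes assumption each $P(X_i \mid Y)$ needs one parameter per class, so $2n$ in total (a standard count, stated here as an answer to the question above). Both cases add one parameter for $P(Y)$. The function names are illustrative.

```python
def full_joint_params(n):
    """(2^n - 1) free parameters per class for P(X1..Xn | Y), two classes, plus 1 for P(Y)."""
    return (2 ** n - 1) * 2 + 1


def naive_bayes_params(n):
    """One parameter P(Xi = 1 | Y = y) per attribute and class, plus 1 for P(Y)."""
    return 2 * n + 1


for n in (2, 30):
    print(n, full_joint_params(n), naive_bayes_params(n))
# n = 2:  7 vs 5
# n = 30: 2147483647 vs 61
```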
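$P(Y = y_k \mid X_1, \cdots, X_n) = \frac{P(Y = y_k) P(X_1, \cdots, X_n \mid Y = y_k)}{\sum_j P(Y = y_j) P(X_1, \cdots, X_n \mid Y = y_j)}$
$P(Y = y_k \mid X_1, \cdots, X_n) = \frac{P(Y = y_k) \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i \mid Y = y_j)}$
$\hat{Y} \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$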
Slide credit: Tom Mitchell
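Estimate $\pi_k = P(Y = y_k)$
For each value $x_{ij}$ of each attribute $X_i$:
Estimate $\theta_{ijk} = P(X_i = x_{ij} \mid Y = y_k)$
$\hat{Y} \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i^{\mathrm{test}} \mid Y = y_k)$
$\hat{Y} \leftarrow \arg\max_{y_k} \pi_k \prod_i \theta_{ijk}$
$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$
$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}$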
Slide credit: Tom Mitchell
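The training and classification steps for discrete-valued attributes can be sketched with simple counting. This is a minimal illustration using the maximum likelihood estimates; the function names, the dictionary layout, and the six-example toy dataset are assumptions, and unseen (attribute value, class) pairs get probability zero.

```python
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """Return MLE estimates pi[k] = P(Y = k) and theta[(i, value, k)] = P(X_i = value | Y = k)."""
    class_counts = Counter(labels)
    pi = {k: c / len(labels) for k, c in class_counts.items()}
    joint_counts = defaultdict(int)
    for x, k in zip(examples, labels):
        for i, value in enumerate(x):
            joint_counts[(i, value, k)] += 1
    theta = {key: c / class_counts[key[2]] for key, c in joint_counts.items()}
    return pi, theta


def classify(x, pi, theta):
    """Return argmax_k pi[k] * prod_i theta[(i, x_i, k)]."""
    def score(k):
        s = pi[k]
        for i, value in enumerate(x):
            s *= theta.get((i, value, k), 0.0)  # unseen combination: MLE is zero
        return s
    return max(pi, key=score)


# Hypothetical toy data: two Boolean attributes, Boolean label.
examples = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0)]
labels = [1, 1, 1, 0, 0, 0]
pi, theta = train_naive_bayes(examples, labels)
print(classify((1, 1), pi, theta), classify((0, 0), pi, theta))
```

For many attributes, summing log-probabilities instead of multiplying probabilities would avoid numerical underflow.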
$P(F = 1) =$ ; $P(F = 0) =$
$P(S = 1 \mid F = 1) =$ ; $P(S = 0 \mid F = 1) =$
$P(S = 1 \mid F = 0) =$ ; $P(S = 0 \mid F = 0) =$
$P(D = 1 \mid F = 1) =$ ; $P(D = 0 \mid F = 1) =$
$P(D = 1 \mid F = 0) =$ ; $P(D = 0 \mid F = 0) =$
$P(G = 1 \mid F = 1) =$ ; $P(G = 0 \mid F = 1) =$
$P(G = 1 \mid F = 0) =$ ; $P(G = 0 \mid F = 0) =$
$P(F \mid S, D, G) \propto P(F) P(S \mid F) P(D \mid F) P(G \mid F)$
& Pazzani, 1996])
$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$
Slide credit: Tom Mitchell
MLE estimate for $P(X_i \mid Y = y_k)$ might be zero (for example, $X_i$ = birthdate, $X_i$ = Feb_4_1995).
$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$
Slide credit: Tom Mitchell
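Maximum likelihood estimates:
$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}$
$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij}, Y = y_k\}}{\#D\{Y = y_k\}}$
MAP estimates:
$\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\} + (\beta_k - 1)}{|D| + \sum_m (\beta_m - 1)}$
$\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij}, Y = y_k\} + (\beta_k - 1)}{\#D\{Y = y_k\} + \sum_m (\beta_m - 1)}$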
Slide credit: Tom Mitchell
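The effect of the prior pseudo-counts on a zero count can be seen with a one-line estimator. In this sketch `pseudo_counts[m]` plays the role of $\beta_m - 1$; the function name, the argument layout, and the example counts are assumptions for illustration.

```python
def smoothed_estimate(count, total, pseudo_counts, index):
    """(count + (beta_index - 1)) / (total + sum_m (beta_m - 1))."""
    return (count + pseudo_counts[index]) / (total + sum(pseudo_counts))


# Hypothetical case: an attribute value never observed with this class in 10 examples.
mle = smoothed_estimate(count=0, total=10, pseudo_counts=[0, 0], index=0)   # plain MLE: 0.0
smoothed = smoothed_estimate(count=0, total=10, pseudo_counts=[1, 1], index=0)  # 1/12
print(mle, smoothed)
```

With all pseudo-counts set to zero the estimator reduces to the maximum likelihood estimate, so a single unseen attribute value would zero out the whole product $\prod_i P(X_i \mid Y = y_k)$.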
$P(X_i = x \mid Y = y_k) = \frac{1}{\sqrt{2\pi}\, \sigma_{ik}} \exp\left( -\frac{(x - \mu_{ik})^2}{2 \sigma_{ik}^2} \right)$
Slide credit: Tom Mitchell
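Estimate $\pi_k = P(Y = y_k)$
For each attribute $X_i$ estimate class conditional mean $\mu_{ik}$, variance $\sigma_{ik}^2$
$\hat{Y} \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i^{\mathrm{test}} \mid Y = y_k)$
$\hat{Y} \leftarrow \arg\max_{y_k} \pi_k \prod_i \mathrm{Normal}(X_i^{\mathrm{test}}; \mu_{ik}, \sigma_{ik})$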
Slide credit: Tom Mitchell
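For continuous attributes, the same train/classify pattern can be sketched with per-class Gaussian estimates. This is a minimal illustration under several assumptions: the function names and toy data are made up, the variance uses the maximum likelihood form (dividing by the class count), scores are compared in log space to avoid underflow, and every attribute is assumed to have nonzero variance within each class.

```python
import math
from collections import defaultdict


def log_gaussian(x, mu, sigma):
    """log of the Normal density with mean mu and standard deviation sigma."""
    return -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def train_gnb(examples, labels):
    """Estimate pi[k], mu[(i, k)], and sigma[(i, k)] for each class k and attribute i."""
    by_class = defaultdict(list)
    for x, k in zip(examples, labels):
        by_class[k].append(x)
    pi, mu, sigma = {}, {}, {}
    for k, rows in by_class.items():
        pi[k] = len(rows) / len(labels)
        for i in range(len(rows[0])):
            values = [r[i] for r in rows]
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)  # MLE variance
            mu[(i, k)] = mean
            sigma[(i, k)] = math.sqrt(var)
    return pi, mu, sigma


def classify_gnb(x, pi, mu, sigma):
    """argmax_k of log pi[k] + sum_i log Normal(x_i; mu_ik, sigma_ik)."""
    def log_score(k):
        return math.log(pi[k]) + sum(
            log_gaussian(v, mu[(i, k)], sigma[(i, k)]) for i, v in enumerate(x))
    return max(pi, key=log_score)


# Hypothetical toy data with two real-valued attributes and two classes.
examples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 0.5), (3.2, 0.7), (2.8, 0.3)]
labels = ["a", "a", "a", "b", "b", "b"]
pi, mu, sigma = train_gnb(examples, labels)
print(classify_gnb((1.1, 2.1), pi, mu, sigma), classify_gnb((3.1, 0.4), pi, mu, sigma))
```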
$P(Y = y_k \mid X_1, \cdots, X_n) \propto P(Y = y_k) \prod_i P(X_i \mid Y = y_k)$