SLIDE 1

Review

  • Linear separability (and use of features)
  • Class probabilities for linear discriminants

sigmoid (logistic) function

  • Applications: USPS, fMRI

!! !" # " ! # #$" #$! #$% #$& ' ( ) φ1 φ2 0.5 1 0.5 1

figure from book
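As a concrete reminder of the "class probabilities" bullet, here is a minimal sketch (not from the slides; the weights and features are made up) of turning a linear discriminant's score into a class probability with the logistic function:

```python
import numpy as np

def sigmoid(a):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical weight vector and feature vector (phi_1, phi_2, bias).
w = np.array([1.5, -2.0, 0.3])
phi = np.array([0.8, 0.4, 1.0])

print(sigmoid(w @ phi))   # P(Y = 1 | phi, w) for a linear discriminant
```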

SLIDE 2

Review

  • Generative vs. discriminative

maximum conditional likelihood

  • Logistic regression
  • Weight space

each example adds a penalty to all weight vectors that misclassify it; the penalty is approximately piecewise linear

!! !" !# $ # " ! $ % # & " ' ! ()*+ !,-./0)1/2/*+

SLIDE 3

Example

!! !" # " ! $ # #%! #%& #%' #%( "

SLIDE 4

–log(P(Y1..3 | X1..3, W))

!" !# !$ !% " % $ & ' !& !$ !% " %

SLIDE 5

Generalization: multiple classes

  • One weight vector per class: Y ∈ {1,2,…,C}

P(Y=k | x, W) = exp(wk · x) / Z,   where Z = Σj exp(wj · x)

  • In 2-class case: this reduces to the sigmoid, P(Y=1 | x) = 1 / (1 + exp(−(w1 − w2) · x))
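A minimal sketch (mine; the weight vectors and features are made up) of the multiclass probabilities above, one weight vector per class normalized by Z:

```python
import numpy as np

def class_probs(W, x):
    """P(Y = k | x) = exp(w_k . x) / Z, with Z = sum_j exp(w_j . x).
    W holds one weight vector per row (C classes)."""
    scores = W @ x
    scores = scores - scores.max()   # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()               # divide by Z

W = np.array([[1.0, -0.5],           # hypothetical weight vectors for C = 3 classes
              [0.2, 0.8],
              [-1.0, 0.3]])
x = np.array([0.5, 1.5])
print(class_probs(W, x))             # probabilities sum to 1
```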

SLIDE 6

Multiclass example


figure from book

SLIDE 7

Priors and conditional MAP

  • P(Y | X, W) = Πi exp(wYi · Xi) / Zi,   where Zi = Σk exp(wk · Xi)

  • As in linear regression, can put prior on W

common priors: L2 (ridge), L1 (sparsity)

  • maxw P(W=w | X, Y)
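A sketch of the conditional MAP objective in the binary case with an L2 (ridge) prior; the toy data and the prior strength lam are made up. Maximizing P(W=w | X, Y) is the same as minimizing the negative conditional log-likelihood plus the negative log-prior:

```python
import numpy as np

def neg_log_posterior(w, X, y, lam):
    """- log P(y | X, w) - log P(w), up to a constant, for labels y in {-1, +1}
    and a Gaussian (L2 / ridge) prior with strength lam."""
    nll = np.sum(np.log1p(np.exp(-y * (X @ w))))   # conditional likelihood term
    return nll + lam * np.sum(w ** 2)              # L2 penalty from the prior

X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5]])
y = np.array([+1, -1, +1])
print(neg_log_posterior(np.zeros(2), X, y, lam=0.1))   # objective at w = 0
```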

SLIDE 8

Software

  • Logistic regression software is easily available: most stats packages provide it

e.g., glm function in R, or http://www.cs.cmu.edu/~ggordon/IRLS-example/

  • Most common algorithm: Newton’s method on the log-likelihood (or its L2-penalized version)

called “iteratively reweighted least squares”; for L1, slightly harder (less software available)
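A bare-bones sketch of the Newton / IRLS update for binary logistic regression (unpenalized; the toy data are made up, and this is not the code at the URL above). A real implementation would add a convergence check and, for separable data, some regularization:

```python
import numpy as np

def irls(X, y, iters=10):
    """Newton's method for binary logistic regression, labels y in {0, 1}.
    Each step solves a weighted least-squares system: w += (X'RX)^-1 X'(y - p)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # current predicted probabilities
        r = p * (1 - p)                            # diagonal of the reweighting matrix R
        H = X.T @ (r[:, None] * X)                 # Hessian of the negative log-likelihood
        w = w + np.linalg.solve(H, X.T @ (y - p))  # Newton step
    return w

X = np.array([[1, 0.5], [1, 1.5], [1, 2.0], [1, 2.5], [1, 3.5]], dtype=float)  # bias + feature
y = np.array([0, 0, 1, 0, 1], dtype=float)
print(irls(X, y))
```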

SLIDE 9

Historical application: Fisher iris data

[figure: P(I. virginica) as a function of petal length]

SLIDE 10

SLIDE 11

Bayesian regression

  • In linear and logistic regression, we’ve looked at

conditional MLE: maxw P(Y | X, w)
conditional MAP: maxw P(W=w | X, Y)

  • But of course, a true Bayesian would turn up their nose at both

why?
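A minimal sketch (mine) of what the next two slides illustrate: rather than committing to one w, average the prediction P(y | x, w) over samples of w from the posterior. The Gaussian used for the "posterior samples" below is only a stand-in; in practice they would come from the randomized inference algorithms discussed later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for samples from the posterior P(w | X, Y).
posterior_samples = rng.normal(loc=[1.0, -0.5], scale=0.3, size=(1000, 2))

x_new = np.array([0.8, 1.2])
p_each = 1.0 / (1.0 + np.exp(-(posterior_samples @ x_new)))  # P(y=1 | x, w) for each sampled w
print(p_each.mean())   # predictive probability: the average over posterior samples of w
```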

SLIDE 12

Sample from posterior

!" !# !$ " $ #" !% !& !' " ' &

SLIDE 13

Predictive distribution

!!" !#" " #" !" " "$! "$% "$& "$' #

SLIDE 14

Overfitting

  • Overfit: training likelihood ≫ test likelihood
  • often a result of overconfidence
  • Overfitting is an indicator that the MLE or MAP approximation is a bad one

  • Bayesian inference rarely overfits

may still lead to bad results for other reasons! e.g., not enough data, bad model class, …

SLIDE 15

So, we want the predictive distribution

  • Most of the time…

Graphical model is big and highly connected
Variables are high-arity or continuous

  • Can’t afford exact inference

Inference reduces to numerical integration (and/or summation)

  • We’ll look at randomized algorithms

SLIDE 16

Numerical integration

!! !"#$ " "#$ ! !! !"#% !"#& !"#' !"#( " "#( "#' "#& "#% ! " ! ( ) ' $ * + ,-+.*/

SLIDE 17

2D is 2 easy!

  • We care about high-D problems
  • Often, much of the mass is hidden in a tiny fraction of the volume

must simultaneously try to discover it and estimate its amount

SLIDE 18

Application: SLAM

SLIDE 19

Integrals in multi-million-D

Eliazar and Parr, IJCAI-03

SLIDE 20

Simple 1D problem

!! !"#$ " "#$ ! " !" %" &" '" $" (" )"

SLIDE 21

Uniform sampling

!! !"#$ " "#$ ! " !" %" &" '" $" (" )"

SLIDE 22

Uniform sampling

E(f(X)) = (1/V) ∫ f(x) dx   for X ~ Uniform over a region of volume V

  • So, V · (1/N) Σi f(Xi) estimates the desired integral ∫ f(x) dx
  • But its standard deviation can be big
  • Can reduce it by averaging many samples
  • But the variance falls only at rate 1/N (standard deviation 1/√N)
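A small numerical sketch (mine; the integrand is made up) of this estimate: sample uniformly over an interval of length V, average f, and multiply by V.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.exp(-8 * x**2)   # made-up integrand on [-1, 1]
a, b = -1.0, 1.0
V = b - a

N = 100_000
X = rng.uniform(a, b, size=N)     # X_i ~ uniform over the region
print(V * f(X).mean())            # V * (1/N) sum_i f(X_i), estimates the integral (about 0.627)
```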

SLIDE 23


Importance sampling

  • Instead of X ~ uniform, use X ~ Q(x)
  • Q = proposal distribution
  • Should have Q(x) large where f(x) is large
  • Problem: the naive average no longer estimates the right thing:

EQ(f(X)) = ∫ Q(x) f(x) dx ≠ ∫ f(x) dx

SLIDE 24

Importance sampling

h(x) ≡ f(x)/Q(x)

EQ(h(X)) = ∫ Q(x) h(x) dx = ∫ Q(x) f(x)/Q(x) dx = ∫ f(x) dx

SLIDE 25

Importance sampling

  • So, take samples of h(X) instead of f(X)
  • Wi = 1/Q(Xi) is importance weight
  • Q = 1/V yields uniform sampling
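A small sketch (mine; same made-up integrand as in the uniform-sampling example) of this version of importance sampling: draw Xi from a proposal Q that puts its mass where f is large, and average Wi f(Xi) with Wi = 1/Q(Xi). The Gaussian proposal is an assumption, not something from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.exp(-8 * x**2)   # same made-up integrand

# Proposal Q: a Gaussian centred where f is large.
sigma = 0.3
q_pdf = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

N = 100_000
X = rng.normal(0.0, sigma, size=N)   # X_i ~ Q
W = 1.0 / q_pdf(X)                   # importance weights W_i = 1/Q(X_i)
print(np.mean(W * f(X)))             # estimates the integral of f (about 0.627)
```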

SLIDE 26

Importance sampling

!! !"#$ " "#$ ! " !" %" &" '" $" (" )"

SLIDE 27

Variance

  • How does this help us control variance?
  • Suppose: f big where Q small
  • Then h = f/Q: huge on that region, so a few rare samples dominate
  • Variance of each weighted sample is ∫ f(x)²/Q(x) dx − (∫ f(x) dx)²
  • Optimal Q? Q(x) ∝ |f(x)|, which makes h constant
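A tiny experiment (mine, same made-up integrand) illustrating the point: with the same number of samples, the estimates spread out far more when Q puts little mass where f is big than when Q roughly matches the shape of f.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-8 * x**2)

def is_estimates(sample, q_pdf, n=1000, reps=200):
    """Return `reps` independent importance-sampling estimates of the integral of f."""
    return np.array([np.mean(f(X) / q_pdf(X)) for X in (sample(n) for _ in range(reps))])

# Q1: roughly matched to f (narrow Gaussian).  Q2: mostly misses the peak (wide uniform).
good = is_estimates(lambda n: rng.normal(0.0, 0.3, n),
                    lambda x: np.exp(-x**2 / 0.18) / (0.3 * np.sqrt(2 * np.pi)))
bad = is_estimates(lambda n: rng.uniform(-5.0, 5.0, n),
                   lambda x: np.full_like(x, 0.1))
print(good.std(), bad.std())   # the mismatched proposal gives a much larger spread
```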

SLIDE 28

Importance sampling, part II

  • Suppose we want ∫ f(x) dx = ∫ P(x) g(x) dx = EP(g(X))
  • Pick N samples Xi from proposal Q(X)
  • Average Wi g(Xi), where the importance weight is Wi = P(Xi)/Q(Xi)

EQ(W g(X)) = ∫ Q(x) [P(x)/Q(x)] g(x) dx = ∫ P(x) g(x) dx
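A minimal sketch (mine, with a made-up target distribution P and proposal Q) of this second variant: average Wi g(Xi) with Wi = P(Xi)/Q(Xi) to estimate EP(g(X)).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up normalized target P: Gaussian with mean 2, sd 1.  We want E_P(g(X)).
p_pdf = lambda x: np.exp(-(x - 2.0)**2 / 2.0) / np.sqrt(2 * np.pi)
g = lambda x: x**2                 # true answer: E_P(X^2) = 2^2 + 1 = 5

# Proposal Q: a broad Gaussian that covers P (an assumption).
q_pdf = lambda x: np.exp(-x**2 / 18.0) / (3.0 * np.sqrt(2 * np.pi))

N = 100_000
X = rng.normal(0.0, 3.0, size=N)   # X_i ~ Q
W = p_pdf(X) / q_pdf(X)            # importance weights W_i = P(X_i)/Q(X_i)
print(np.mean(W * g(X)))           # estimates E_P(g(X)), close to 5
```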

SLIDE 29

Two variants of IS

  • Same algorithm, different terminology

want ∫ f(x) dx vs. EP(f(X))
W = 1/Q vs. W = P/Q

SLIDE 30

Parallel importance sampling

  • Suppose we want ∫ f(x) dx = ∫ P(x) g(x) dx = EP(g(X))
  • But P(x) is unnormalized (e.g., represented by a factor graph); we know only Z P(x)

SLIDE 31

Parallel IS

  • Pick N samples Xi from proposal Q(X)
  • If we knew Wi = P(Xi)/Q(Xi), could do IS
  • Instead, set w̃i = Z P(Xi) / Q(Xi)   (computable from the unnormalized density)

and normalize:  Wi = w̃i / Σj w̃j

  • Then: the unknown Z cancels, since (1/N) Σj w̃j estimates EQ(Z P(X)/Q(X)) = Z

SLIDE 32

Parallel IS

  • Final estimate: Σi Wi g(Xi) = Σi w̃i g(Xi) / Σj w̃j
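A minimal sketch (mine, with a made-up unnormalized target) of the final estimate: the weights are computed from the unnormalized density only, and normalizing them by their sum cancels the unknown Z.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up unnormalized target Z*P(x): proportional to a Gaussian with mean 2 (Z unknown to us).
zp = lambda x: 5.0 * np.exp(-(x - 2.0)**2 / 2.0)
g = lambda x: x                     # we want E_P(g(X)); here the true answer is the mean, 2

# Proposal Q: a broad Gaussian that covers P (an assumption).
mu_q, sigma_q = 0.0, 3.0
q_pdf = lambda x: np.exp(-(x - mu_q)**2 / (2 * sigma_q**2)) / (sigma_q * np.sqrt(2 * np.pi))

N = 100_000
X = rng.normal(mu_q, sigma_q, size=N)
w_tilde = zp(X) / q_pdf(X)          # unnormalized weights, computable without knowing Z
W = w_tilde / w_tilde.sum()         # normalized weights W_i
print(np.sum(W * g(X)))             # final estimate of E_P(g(X)), close to 2
```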
