
Don't Make Me Get Non-Linear! A Grounding Example: Linear Regression



1. What is Machine Learning?

• Many different forms of "Machine Learning"
  ▪ We focus on the problem of prediction
• Want to make a prediction based on observations
  ▪ Vector X of m observed variables: <X1, X2, …, Xm>
    o X1, X2, …, Xm are called "input features/variables"
    o Also called "independent variables," but this can be misleading!
      X1, X2, …, Xm need not be (and usually are not) independent
  ▪ Based on observed X, want to predict an unseen variable Y
    o Y is called the "output feature/variable" (or the "dependent variable")
  ▪ Seek to "learn" a function g(X) to predict Y: Ŷ = g(X)
    o When Y is discrete, prediction of Y is called "classification"
    o When Y is continuous, prediction of Y is called "regression"

A (Very Short) List of Applications

• Machine learning is widely used in many contexts
  ▪ Stock price prediction
    o Using economic indicators, predict if a stock will go up/down
  ▪ Computational biology and medical diagnosis
    o Predicting gene expression based on DNA
    o Determine the likelihood of cancer using clinical/demographic data
  ▪ Predict people likely to purchase a product or click on an ad
    o "Based on past purchases, you might want to buy…"
  ▪ Credit card fraud and telephone fraud detection
    o Based on past purchases/phone calls, is a new one fraudulent?
    o Saves companies billions(!) of dollars annually
  ▪ Spam E-mail detection (gmail, hotmail, many others)
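To make the classification/regression distinction concrete, here is a minimal sketch; the two toy predictors and their weights are invented for illustration and are not from the slides:

```python
# The only difference between classification and regression is the type of Y.

def g_classify(x):
    """Classification: Y is discrete (e.g. spam / not-spam)."""
    # toy rule: flag as spam if more than half the binary features fire
    return "spam" if sum(x) > len(x) / 2 else "not-spam"

def g_regress(x):
    """Regression: Y is continuous (e.g. a stock price)."""
    # toy rule: prediction is a weighted sum of the input features
    weights = [0.5, 1.0, -0.25]
    return sum(w * xi for w, xi in zip(weights, x))

x = [1, 1, 0]          # one observed vector <X1, X2, X3>
print(g_classify(x))   # -> spam
print(g_regress(x))    # -> 1.5
```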
Spam, Spam… Go Away!

• The constant battle with spam
• This is spam: (screenshot of a spam e-mail)
• "And machine-learning algorithms developed to merge and rank large sets of Google search results allow us to combine hundreds of factors to classify spam."
  Source: http://www.google.com/mail/help/fightspam/spamexplained.html

What is Bayes Doing in My Mail Server?

• Let's get Bayesian on your spam:

    Content analysis details: (49.5 hits, 7.0 required)
    0.9  RCVD_IN_PBL     RBL: Received via a relay in Spamhaus PBL
                         [93.40.189.29 listed in zen.spamhaus.org]
    1.5  URIBL_WS_SURBL  Contains an URL listed in the WS SURBL blocklist
                         [URIs: recragas.cn]
    5.0  URIBL_JP_SURBL  Contains an URL listed in the JP SURBL blocklist
                         [URIs: recragas.cn]
    5.0  URIBL_OB_SURBL  Contains an URL listed in the OB SURBL blocklist
                         [URIs: recragas.cn]
    5.0  URIBL_SC_SURBL  Contains an URL listed in the SC SURBL blocklist
                         [URIs: recragas.cn]
    2.0  URIBL_BLACK     Contains an URL listed in the URIBL blacklist
                         [URIs: recragas.cn]
    8.0  BAYES_99        BODY: Bayesian spam probability is 99 to 100%
                         [score: 1.0000]

• Who was crazy enough to think of that?

Training a Learning Machine

• We consider the statistical learning paradigm here
  ▪ We are given a set of N "training" instances
    o Each training instance is a pair: (<x1, x2, …, xm>, y)
    o Training instances are previously observed data
    o Gives the output value y associated with each observed vector of input values <x1, x2, …, xm>
  ▪ Training data: set of N pre-classified data instances
    o N training pairs: (<x>(1), y(1)), (<x>(2), y(2)), …, (<x>(N), y(N))
    o Use superscripts to denote the i-th training instance
  ▪ Learning algorithm: method for determining g(X)
    o Given a new input observation of X = <X1, X2, …, Xm>

The Machine Learning Process

• (Diagram) Training data → Learning algorithm → g(X) (Classifier); new input X → g(X) → Output (Class)
  ▪ Learning: use the training data to specify g(X)
    o Generally, first select a parametric form for g(X)
    o Then, estimate the parameters of the model g(X) using the training data
    o For regression, usually want the g(X) that minimizes E[(Y − g(X))²]
      • Mean squared error (MSE) "loss" function. (Others exist.)
    o For classification, generally the best choice is g(X) = argmax_y P(Y = y | X)
    o Use g(X) to compute a corresponding output (prediction) Ŷ
    o When the prediction is discrete, we call g(X) a "classifier" and call the output the predicted "class" of the input
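A minimal sketch of that MSE loss on a toy training set (the three pairs and the candidate g below are invented for illustration, not from the slides):

```python
# Evaluate a candidate g(X) on training pairs by its mean squared error,
# the "loss" the slides say regression tries to minimize.

def mse(g, pairs):
    """Mean squared error of predictor g over training pairs (x, y)."""
    return sum((y - g(x)) ** 2 for x, y in pairs) / len(pairs)

# N = 3 training pairs (<x>(i), y(i)), each with a single input feature
pairs = [(1, 2.0), (2, 4.1), (3, 5.9)]

g = lambda x: 2 * x    # candidate predictor
print(mse(g, pairs))   # small value (~0.0067): g fits this data well
```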

2. Don't Make Me Get Non-Linear! A Grounding Example: Linear Regression

• Predict a real value Y based on observing variable X
  ▪ Assume the model is linear: Ŷ = g(X) = aX + b
  ▪ Training data
    o Each vector X has one observed variable: <X1> (just call it X)
    o Y is a continuous output variable
    o Given N training pairs: (<x>(1), y(1)), (<x>(2), y(2)), …, (<x>(N), y(N))
      (use superscripts to denote the i-th training instance)
  ▪ Determine the a and b minimizing E[(Y − g(X))²]
    o First, rewrite the objective function:
      E[(Y − g(X))²] = E[(Y − (aX + b))²] = E[(Y − aX − b)²]
• Minimize the objective function E[(Y − aX − b)²]
  ▪ Compute derivatives w.r.t. a and b:
    ∂/∂a E[(Y − aX − b)²] = E[−2X(Y − aX − b)] = −2E[XY] + 2aE[X²] + 2bE[X]
    ∂/∂b E[(Y − aX − b)²] = E[−2(Y − aX − b)] = −2E[Y] + 2aE[X] + 2b
  ▪ Set the derivatives to 0 and solve the simultaneous equations:
    a = (E[XY] − E[X]E[Y]) / (E[X²] − (E[X])²) = Cov(X, Y) / Var(X) = ρ(X, Y) (σ_Y / σ_X)
    b = E[Y] − aE[X] = μ_Y − aμ_X
  ▪ Substitution yields:
    Ŷ = ρ(X, Y) (σ_Y / σ_X) (X − μ_X) + μ_Y
  ▪ Estimate the parameters from the observed training data:
    Ŷ = g(X) = ρ̂(X, Y) (σ̂_Y / σ̂_X) (X − μ̂_X) + μ̂_Y

A Simple Classification Example

• Predict Y based on observing variable X
  ▪ X has a discrete value from {1, 2, 3, 4}
    o X denotes today's temperature range: <50, 50-60, 60-70, >70
  ▪ Y has a discrete value from {rain, sun}
    o Y denotes the general weather outlook tomorrow
  ▪ Given training data, estimate the joint PMF: p̂_X,Y(x, y)

Estimating the Joint PMF

• Given training data, compute the joint PMF: p̂_X,Y(x, y)
  ▪ MLE: count the number of times each pair (x, y) appears:
    p̂_MLE(x, y) = (count in cell) / (total # data points)
  ▪ MAP using a Laplace prior: add 1 to all the MLE counts:
    p̂_Laplace(x, y) = (count in cell + 1) / (total # data points + total # cells)
  ▪ Either way, normalize to get a true distribution (sums to 1)
  ▪ Observed 50 data points:

               X=1   X=2   X=3   X=4
     Y=rain      5     3     2     0
     Y=sun       3     7    10    20
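As a sanity check on the closed-form linear-regression solution derived above (a = Cov(X, Y)/Var(X), b = E[Y] − aE[X]), here is a minimal sketch that fits toy data invented for illustration:

```python
# Estimate a and b from training pairs using the closed-form solution:
#   a = Cov(X, Y) / Var(X),  b = E[Y] - a * E[X]

def fit_linear(xs, ys):
    n = len(xs)
    mx = sum(xs) / n                                              # sample mean of X
    my = sum(ys) / n                                              # sample mean of Y
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n    # Cov(X, Y)
    var = sum((x - mx) ** 2 for x in xs) / n                      # Var(X)
    a = cov / var           # slope
    b = my - a * mx         # intercept
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1
a, b = fit_linear(xs, ys)
print(a, b)                 # -> 2.0 1.0
```

On noisy data the recovered a and b will not reproduce any exact line; they are simply the pair minimizing the sample MSE.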
• Note Bayes' rule: P(Y | X) = p_X,Y(x, y) / p_X(x) = p_X|Y(x | y) p_Y(y) / p_X(x)
• For a new X, predict Ŷ = g(X) = argmax_y P(Y = y | X)
  ▪ Note that p_X(x) is not affected by the choice of y, yielding:
    Ŷ = g(X) = argmax_y P(Y | X) = argmax_y P(X, Y) = argmax_y P(X | Y) P(Y)
• The resulting estimates:

    MLE estimate:
               X=1   X=2   X=3   X=4   p_Y(y)
    rain      0.10  0.06  0.04  0.00    0.20
    sun       0.06  0.14  0.20  0.40    0.80
    p_X(x)    0.16  0.20  0.24  0.40    1.00

    Laplace (MAP) estimate:
               X=1    X=2    X=3    X=4    p_Y(y)
    rain      0.103  0.069  0.052  0.017   0.241
    sun       0.069  0.138  0.190  0.362   0.759
    p_X(x)    0.172  0.207  0.242  0.379   1.00

Classify New Observation

• Say today's temperature is 75, so X = 4
  ▪ Recall the X temperature ranges: <50, 50-60, 60-70, >70
  ▪ Prediction for Y (weather outlook tomorrow):
    Ŷ = argmax_y P(X, Y) = argmax_y P(X | Y) P(Y)
    o Under both the MLE and Laplace estimates, Ŷ = sun
  ▪ What if we asked for the probability of rain tomorrow?
    o MLE: absolutely, positively no chance of rain!
    o Laplace estimate: a very small (~2%) chance ("never say never")

Classification with Multiple Observables

• Say we have m input values X = <X1, X2, …, Xm>
  ▪ Note that the variables X1, X2, …, Xm can be dependent!
  ▪ In theory, could predict Y as before, using
    Ŷ = argmax_y P(X, Y) = argmax_y P(X | Y) P(Y)
    o Why won't this necessarily work?
  ▪ Need to estimate P(X1, X2, …, Xm | Y)
    o Fine if m is small, but what if m = 10 or 100 or 10,000?
    o Note: the size of the PMF table is exponential in m (e.g., O(2^m) for binary features)
    o Need a ridiculous amount of data for good probability estimates!
    o Likely to have many 0's in the table (bad times)
  ▪ Need to consider a simpler model
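The single-observable estimates and predictions above can be reproduced directly from the 50 observed counts; a minimal sketch (the function and variable names are my own, not from the slides):

```python
# Reproduce the MLE and Laplace (MAP) estimates from the count table,
# then classify with y-hat = argmax_y P(X = x, Y = y).

counts = {"rain": [5, 3, 2, 0], "sun": [3, 7, 10, 20]}   # rows of the count table
total = sum(sum(row) for row in counts.values())          # 50 data points
cells = sum(len(row) for row in counts.values())          # 8 table cells

# MLE: count / total.   Laplace: (count + 1) / (total + #cells).
mle = {y: [c / total for c in row] for y, row in counts.items()}
laplace = {y: [(c + 1) / (total + cells) for c in row] for y, row in counts.items()}

def classify(pmf, x):
    """Predict y-hat = argmax_y P(X = x, Y = y); x is 1..4."""
    return max(pmf, key=lambda y: pmf[y][x - 1])

# Today's temperature is 75, so x = 4: both estimates predict sun.
print(classify(mle, 4), classify(laplace, 4))        # -> sun sun

# Joint probability of (X = 4, rain): MLE says exactly 0,
# while Laplace "never says never".
print(mle["rain"][3], round(laplace["rain"][3], 3))  # -> 0.0 0.017
```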
