Introduction to Data Science: Logistic Regression

Héctor Corrada Bravo
University of Maryland, College Park, USA
2020-04-05

Linear models for classification

The general classification setting is: can we predict a categorical response/output $Y$ from a set of predictors $X_1, X_2, \ldots, X_p$?

As in the regression case, we assume training data $(x_1, y_1), \ldots, (x_n, y_n)$. In this case, however, the responses $y_i$ are categorical and take one of a fixed set of values.

An example classification problem

An individual's choice of transportation mode to commute to work.

Predictors: income, cost and time required for each of the alternatives: driving/carpooling, biking, taking a bus, taking the train.

Response: whether the individual makes their commute by car, bike, bus or train.

Why not linear regression?

Why can't we use linear regression in the classification setting? For categorical responses with more than two values, if order and scale (units) don't make sense, then it's not a regression problem.

For binary (0/1) responses, it's a little better. We could use linear regression in this setting and interpret the response $Y$ as a probability (e.g., if $\hat{y} > 0.5$, predict drug overdose).

Classification as probability estimation problem

Instead of modeling classes 0 or 1 directly, we will model the conditional probability $p(Y=1 \mid X=x)$ and classify based on this probability. In general, classification approaches use discriminant (think of scoring) functions to do classification. Logistic regression is one way of estimating the class probability $p(Y=1 \mid X=x)$ (also denoted $p(x)$).

Logistic regression

The basic idea behind logistic regression is to build a linear model related to the class probability $p(x)$, since using linear regression directly (i.e., $p(x) = \beta_0 + \beta_1 x$) doesn't work. Instead we build a linear model of log-odds:

$$\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x$$

Here is how we compute a logistic regression model in R:

    # assumes Default (ISLR package), tidy() (broom) and %>% (magrittr/dplyr) are loaded
    default_fit <- glm(default ~ balance, data=Default, family=binomial)
    default_fit %>%
      tidy()

    ## # A tibble: 2 x 5
    ##   term  estimate std.error statistic
    ##   <chr>    <dbl>     <dbl>     <dbl>
    ## 1 (Int… -1.07e+1  0.361        -29.5
    ## 2 bala…  5.50e-3  0.000220      25.0
    ## # … with 1 more variable: p.value <dbl>

Interpretation of logistic regression models is slightly different than the linear regression model we looked at. In this case, the odds that a person defaults increase by a factor of $e^{0.0055} \approx 1.0055$ for every dollar in their account balance.

As before, the accuracy of $\hat{\beta}_1$ as an estimate of the population parameter is given by its standard error. We can again construct a confidence interval for this estimate as we've done before.

As before, we can do hypothesis testing of a relationship between account balance and the probability of default. In this case, we use a $Z$-statistic, $\hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)$, which plays the role of the t-statistic in linear regression: a scaled measure of our estimate (signal / noise).

As before, the P-value is the probability of seeing a Z-value as large (e.g., 24.95) under the null hypothesis that there is no relationship between balance and the probability of defaulting, i.e., $\beta_1 = 0$ in the population.

We require an algorithm to estimate parameters $\beta_0$ and $\beta_1$ according to a data fit criterion. In logistic regression we use the Bernoulli probability model we saw previously (think of flipping a coin weighted by $p(x)$), and estimate parameters to maximize the likelihood of the observed training data under this coin flipping (binomial) model.

Usually, we do this by minimizing the negative of the log likelihood of the model. I.e., solve the following optimization problem:

$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left[ -y_i f(x_i) + \log\left(1 + e^{f(x_i)}\right) \right]$$

where $f(x_i) = \beta_0 + \beta_1 x_i$. This is a non-linear (but convex) optimization problem.

Making predictions

We can use a learned logistic regression model to make predictions. E.g., "on average, the probability that a person with a balance of \$1,000 defaults is":

$$\hat{p}(1000) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 \times 1000}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 \times 1000}} \approx \frac{e^{-10.6514 + 0.0055 \times 1000}}{1 + e^{-10.6514 + 0.0055 \times 1000}} \approx 0.00576$$

Multiple logistic regression

This is a classification analog to linear regression:

$$\log \frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

    fit <- glm(default ~ balance + income + student, data=Default, family="binomial")
    fit %>%
      tidy()

    ## # A tibble: 4 x 5
    ##   term  estimate std.error statistic
    ##   <chr>    <dbl>     <dbl>     <dbl>
    ## 1 (Int… -1.09e+1   4.92e-1   -22.1
    ## 2 bala…  5.74e-3   2.32e-4    24.7
    ## 3 inco…  3.03e-6   8.20e-6     0.370
    ## 4 stud… -6.47e-1   2.36e-1    -2.74
    ## # … with 1 more variable: p.value <dbl>

As in multiple linear regression it is essential to avoid confounding! Consider an example of single logistic regression of default vs. student status:

    fit1 <- glm(default ~ student, data=Default, family="binomial")
    fit1 %>% tidy()

    ## # A tibble: 2 x 5
    ##   term  estimate std.error statistic p.value
    ##   <chr>    <dbl>     <dbl>     <dbl>   <dbl>
    ## 1 (Int…   -3.50     0.0707    -49.6  0.
    ## 2 stud…    0.405    0.115       3.52 4.31e-4

and a multiple logistic regression:

    fit2 <- glm(default ~ balance + income + student, data=Default, family="binomial")
    fit2 %>% tidy()

    ## # A tibble: 4 x 5
    ##   term  estimate std.error statistic
    ##   <chr>    <dbl>     <dbl>     <dbl>
    ## 1 (Int… -1.09e+1   4.92e-1   -22.1
    ## 2 bala…  5.74e-3   2.32e-4    24.7
    ## 3 inco…  3.03e-6   8.20e-6     0.370
    ## 4 stud… -6.47e-1   2.36e-1    -2.74
    ## # … with 1 more variable: p.value <dbl>

The student coefficient is positive (0.405) in the single-predictor model but negative (-0.647) once balance and income are included.

Classifier evaluation

How do we determine how well classifiers are performing? One way is to compute the error rate of the classifier, the percent of mistakes it makes when predicting class.

We need a more precise language to describe classification mistakes:

|                   | True Class +        | True Class -        | Total |
|-------------------|---------------------|---------------------|-------|
| Predicted Class + | True Positive (TP)  | False Positive (FP) | P*    |
| Predicted Class - | False Negative (FN) | True Negative (TN)  | N*    |
| Total             | P                   | N                   |       |

Using these we can define statistics that describe classifier performance:

| Name                            | Definition | Synonyms                                       |
|---------------------------------|------------|------------------------------------------------|
| False Positive Rate (FPR)       | FP / N     | Type-I error, 1-Specificity                    |
| True Positive Rate (TPR)        | TP / P     | 1 - Type-II error, power, sensitivity, recall  |
| Positive Predictive Value (PPV) | TP / P*    | precision, 1-false discovery proportion        |
| Negative Predictive Value (NPV) | TN / N*    |                                                |

In the credit default case we may want to increase TPR (recall, make sure we catch all defaults) at the expense of FPR (1-Specificity, clients we lose because we think they will default).

This leads to a natural question: can we adjust our classifier's TPR and FPR?

Remember we are classifying Yes if

$$\log \frac{P(Y = \mathrm{Yes} \mid X)}{P(Y = \mathrm{No} \mid X)} > 0 \Rightarrow P(Y = \mathrm{Yes} \mid X) > 0.5$$

What would happen if we use $P(Y = \mathrm{Yes} \mid X) > 0.2$?

A way of describing the TPR and FPR tradeoff is by using the ROC curve (Receiver Operating Characteristic) and the AUROC (area under the ROC).

Another metric that is frequently used to understand classification errors and tradeoffs is the precision-recall curve.

Summary

We approach classification as a class probability estimation problem.

Logistic regression partitions predictor space with linear functions.

Logistic regression learns parameters using Maximum Likelihood (numerical optimization).

Error and accuracy statistics are not enough to understand classifier performance.

Classifications can be done using probability cutoffs to trade, e.g., TPR-FPR (ROC curve), or precision-recall (PR curve).

Area under the ROC or PR curve summarizes classifier performance across different cutoffs.
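The deck's \$1,000 prediction and its question about moving the cutoff from 0.5 to 0.2 can be checked with a few lines of base R. This is a minimal sketch, not the lecture's own code. The coefficients are the rounded values used in the deck's worked prediction (-10.6514 and 0.0055). The six clients and the helper names `predict_default` and `rates_at_cutoff` are hypothetical, made up for illustration only.

```r
# Sketch only: base R, no packages. Coefficients are the rounded values
# used in the deck's $1,000 prediction; the six clients below are made up.
beta0 <- -10.6514
beta1 <- 0.0055

# Inverse of the log-odds model: p(x) = e^f / (1 + e^f), f = beta0 + beta1 * x
predict_default <- function(balance) plogis(beta0 + beta1 * balance)

p_1000 <- predict_default(1000)   # close to the deck's 0.00576
odds_factor <- exp(beta1)         # multiplicative change in odds per dollar

# Hypothetical helper: confusion-matrix rates at a given probability cutoff
rates_at_cutoff <- function(prob, truth, cutoff) {
  pred <- prob > cutoff
  tp <- sum(pred & truth);  fp <- sum(pred & !truth)
  fn <- sum(!pred & truth); tn <- sum(!pred & !truth)
  c(TPR = tp / (tp + fn), FPR = fp / (fp + tn),
    PPV = tp / (tp + fp), NPV = tn / (tn + fn))
}

balances  <- c(500, 1700, 1800, 2000, 2200, 2500)      # illustrative balances
defaulted <- c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE)  # illustrative outcomes
probs <- predict_default(balances)

at_50 <- rates_at_cutoff(probs, defaulted, 0.5)
at_20 <- rates_at_cutoff(probs, defaulted, 0.2)
print(round(rbind(cutoff_0.5 = at_50, cutoff_0.2 = at_20), 3))
```

On this toy data, lowering the cutoff from 0.5 to 0.2 raises TPR from 2/3 to 1 and FPR from 1/3 to 2/3. This is the trade-off that the ROC curve traces across all cutoffs.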
