Sample statistics and linear regression, NEU 466M

Sample statistics and linear regression
NEU 466M
Instructor: Professor Ila R. Fiete
Spring 2016

Mean
$\{x_1, \dots, x_N\}$: N samples of variable x
Sample mean: $\langle x \rangle \equiv \frac{1}{N}\sum_{i=1}^{N} x_i$
Other notation: $\bar{x}$, mean(x)
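A minimal sketch of the sample mean in Python (standard library only, standing in for MATLAB's mean). The function name `sample_mean` and the data values are illustrative, not from the course material.

```python
def sample_mean(xs):
    """Sample mean <x> = (1/N) * sum_i x_i."""
    if not xs:
        raise ValueError("need at least one sample")
    return sum(xs) / len(xs)


samples = [2.0, 4.0, 4.0, 5.0]   # hypothetical data
print(sample_mean(samples))      # 15 / 4 = 3.75
```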

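Binned version of mean
$\{x_1, \dots, x_N\}$: N samples of variable x
$\{c_1, \dots, c_B\}$: B bins (bin centers)
$\{n_1, \dots, n_B\}$: counts per bin, with $\sum_{i=1}^{B} n_i = N$
Sample mean: $\langle x \rangle \approx \frac{1}{N}\sum_{i=1}^{B} n_i c_i$ (exact only when every sample lies at its bin center)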
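The binned formula can be sketched as follows, assuming unit-width bins whose centers serve as the $c_i$. The helper `binned_mean`, the bin edges, and the samples are all hypothetical; the point is that the histogram estimate is close to, but not equal to, the exact sample mean.

```python
def binned_mean(centers, counts):
    """Histogram estimate of <x>: (1/N) * sum_i n_i * c_i, where N = sum_i n_i."""
    if len(centers) != len(counts):
        raise ValueError("centers and counts must have the same length")
    n_total = sum(counts)
    if n_total == 0:
        raise ValueError("histogram is empty")
    return sum(n * c for n, c in zip(counts, centers)) / n_total


# Hypothetical data binned into unit-width bins [0, 1), [1, 2), [2, 3).
samples = [0.2, 0.7, 1.1, 1.4, 1.9, 2.6]
edges = [0.0, 1.0, 2.0, 3.0]
centers = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]  # 0.5, 1.5, 2.5
counts = [sum(1 for s in samples if lo <= s < hi)               # 2, 3, 1
          for lo, hi in zip(edges, edges[1:])]

print(binned_mean(centers, counts))   # 8/6, about 1.333 (binned estimate)
print(sum(samples) / len(samples))    # 7.9/6, about 1.317 (exact sample mean)
```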

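Variance
$\{x_1, \dots, x_N\}$
Sample variance: $\langle (x - \langle x \rangle)^2 \rangle \equiv \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \langle x \rangle)^2$
A measure of the "scatter" (spread) of the data around its mean value.
Homework: show that $\langle (x - \langle x \rangle)^2 \rangle = \langle x^2 \rangle - \langle x \rangle^2$ (exact when the outer average uses 1/N; with the 1/(N-1) normalization above, the right-hand side picks up a factor N/(N-1)).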
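A small Python check of the variance definition and of the homework identity, using made-up data. It shows that the 1/(N-1) sample variance and the moment form $\langle x^2 \rangle - \langle x \rangle^2$ differ exactly by the factor N/(N-1). The function names are illustrative.

```python
def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with the 1/(N-1) normalization."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def moment_variance(xs):
    """<x^2> - <x>^2, i.e. the same spread measured with a 1/N average."""
    m = sample_mean(xs)
    return sample_mean([x * x for x in xs]) - m * m


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical; mean is 5
n = len(data)
print(sample_variance(data))                # 32/7, about 4.571
print(moment_variance(data) * n / (n - 1))  # same value after rescaling 4.0 by N/(N-1)
```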

Standard deviation
$\{x_1, \dots, x_N\}$
Standard deviation: $\sqrt{\langle (x - \langle x \rangle)^2 \rangle}$

Covariance
$\{x_1, \dots, x_N\}$, $\{y_1, \dots, y_N\}$: N samples each of variables x, y
Sample covariance: $C(x, y) \equiv \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)$
($C(x, x)$ is simply the sample variance of x.)
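A direct Python transcription of the sample covariance, on hypothetical data. It also illustrates the remark that $C(x, x)$ reduces to the sample variance of x. The name `sample_covariance` is illustrative.

```python
def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_covariance(xs, ys):
    """C(x, y) = (1/(N-1)) * sum_i (x_i - <x>) * (y_i - <y>)."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("x and y must have the same number of samples")
    if n < 2:
        raise ValueError("need at least two samples")
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]
print(sample_covariance(x, y))   # 11/3, about 3.667
print(sample_covariance(x, x))   # 5/3: C(x, x) is the sample variance of x
```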

Covariance: what does it measure?
$C(x, y) \equiv \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \langle x \rangle)(y_i - \langle y \rangle)$
• If x and y deviate from their means together (both up, then both down), the terms in the sum are predominantly positive and C(x, y) > 0.
• If x and y deviate from their means independently of each other, the terms in the sum are randomly positive and negative, and C(x, y) ≈ 0.
• If x and y deviate from their means in opposite directions, the terms in the sum are predominantly negative and C(x, y) < 0.
Literally, covariance is a measure of co-variation.
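The three cases above can be illustrated with tiny hand-made series. Note the assumption: the "unrelated" pattern is hand-picked so that its products with the x-deviations cancel exactly; with genuinely independent random data the sum would only be close to zero.

```python
def sample_covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


x = [1.0, 2.0, 3.0, 4.0]
together = [10.0, 20.0, 30.0, 40.0]   # y rises and falls with x
opposite = [40.0, 30.0, 20.0, 10.0]   # y moves against x
unrelated = [1.0, -1.0, -1.0, 1.0]    # hand-picked: no linear trend with x

for label, y in [("together", together), ("opposite", opposite), ("unrelated", unrelated)]:
    print(label, sample_covariance(x, y))   # positive, negative, 0.0
```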

Covariance example I: x, y independent
x = randn(1000, 1);
y = randn(1000, 1);
C(x, y) = 0.009; C(x, x) = 1.069
[Scatter plot of y versus x, both axes from -4 to 4: for x > 0, y scatters around 0 without bias.]

Covariance example II: x, y independent
x = 0.2*randn(1000, 1);
y = 0.2*randn(1000, 1);
C(x, y) = 0.001; C(x, x) = 0.0407
[Scatter plot of y versus x, both axes from -4 to 4.]

Covariance example III: x, y not independent
x = randn(1000, 1);
y = 0.5*x + 0.5*randn(1000, 1);
C(x, x) = 0.907; C(x, y) = 0.464; C(y, y) = 0.469
[Scatter plot of y (-2 to 2.5) versus x (-3 to 4): for x > 0, y tends to be > 0.]
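The three MATLAB examples can be re-run in Python as a sketch, assuming `random.Random(...).gauss(0, 1)` as a stand-in for MATLAB's `randn`. Because the random number generator differs, the printed values will not match the slides digit for digit, but they should land near the theoretical values: about 0 and 1 for example I, about 0 and 0.04 for example II, and about 0.5 for both C(x, y) and C(y, y) in example III.

```python
import random


def sample_covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


def randn(n, rng):
    """n standard-normal samples, standing in for MATLAB's randn(n, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)   # fixed seed; numbers will differ from the slides
N = 1000

# Example I: independent, unit variance
x1, y1 = randn(N, rng), randn(N, rng)
# Example II: independent, scaled by 0.2 (variance 0.04)
x2 = [0.2 * v for v in randn(N, rng)]
y2 = [0.2 * v for v in randn(N, rng)]
# Example III: y depends linearly on x plus independent noise
x3 = randn(N, rng)
y3 = [0.5 * a + 0.5 * e for a, e in zip(x3, randn(N, rng))]

for name, x, y in [("I", x1, y1), ("II", x2, y2), ("III", x3, y3)]:
    print(name,
          "C(x,y) = %.3f" % sample_covariance(x, y),
          "C(x,x) = %.3f" % sample_covariance(x, x),
          "C(y,y) = %.3f" % sample_covariance(y, y))
```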

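Alternative notation
• Mean: $\langle x \rangle$, $\bar{x}$, $\mu_x$, $E(x)$
• Variance: $\langle x^2 \rangle - \langle x \rangle^2$, $\overline{x^2} - \bar{x}^2$, $\sigma_x^2$, var(x), C(x, x)
• Covariance: $\langle xy \rangle - \langle x \rangle \langle y \rangle$, $\overline{xy} - \bar{x}\bar{y}$, $\sigma_{xy}^2$, cov(x, y), C(x, y)
• Standard deviation: $\sqrt{\langle x^2 \rangle - \langle x \rangle^2}$, $\sqrt{\overline{x^2} - \bar{x}^2}$, $\sigma_x$, std(x)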

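Pearson's correlation coefficient
$\rho(x, y) = \frac{\langle (x - \langle x \rangle)(y - \langle y \rangle) \rangle}{\sqrt{\langle (x - \langle x \rangle)^2 \rangle \, \langle (y - \langle y \rangle)^2 \rangle}}$
Shorter-form notation: $\rho(x, y) = \frac{C(x, y)}{\sigma_x \sigma_y}$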
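A minimal Python sketch of Pearson's ρ. Because the same normalization factor appears in C(x, y), σ_x, and σ_y, it cancels, so raw sums of deviations are enough. The function name `pearson` and the data are illustrative; Python 3.10+ also ships `statistics.correlation` for the same quantity.

```python
import math


def pearson(xs, ys):
    """rho(x, y) = C(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with N >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when either variable is constant")
    # The 1/(N-1) factors of C, sigma_x and sigma_y cancel, so raw sums suffice.
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson(x, [2.0, 4.0, 6.0, 8.0]))   # perfectly linear, positive slope: 1.0
print(pearson(x, [8.0, 6.0, 4.0, 2.0]))   # perfectly linear, negative slope: -1.0
```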

Pearson's correlation coefficient and covariance only measure linear dependency.
[Figure from: https://en.wikipedia.org/wiki/Correlation_and_dependence]
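A concrete case of "only linear dependency": with x placed symmetrically around 0 and y = x², y is completely determined by x, yet the covariance (and therefore ρ) is exactly zero. The data are a hand-picked illustration.

```python
def sample_covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]           # y is a deterministic, nonlinear function of x
print(sample_covariance(x, y))   # 0.0: no *linear* co-variation
```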

Robust statistics?
• Mean and variance are easy to compute and widely used/useful.
• But they are not robust: they are sensitive to outliers.
• A more robust alternative to the mean: the median.
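A quick illustration with Python's `statistics` module and hypothetical data: a single corrupted value drags the mean far from the bulk of the data, while the median barely moves.

```python
import statistics

clean = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = clean + [1000.0]   # one corrupted measurement

print(statistics.mean(clean), statistics.median(clean))                # 3.0 3.0
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # about 169.17, 3.5
```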

Application
LINEAR REGRESSION IN TERMS OF SAMPLE STATISTICS

Regression: curve-fitting
Scalar explanatory variable (X) and response variable (Y); N samples $\{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$
Model: $\tilde{y}(x) = w_0 + w_1 x + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j$
Free parameters: $(w_0, w_1, \dots, w_M)$

Linear least-squares regression
$E = \frac{1}{2}\sum_{n=1}^{N} [\tilde{y}(x_n; w) - y_n]^2 = \frac{1}{2}\sum_{n=1}^{N} \Big[\sum_{j=0}^{M} w_j x_n^j - y_n\Big]^2$
M = 1 for linear regression:
$E = \frac{1}{2}\sum_{n=1}^{N} [w_0 + w_1 x_n - y_n]^2$
To solve for the best $w_0$, $w_1$: $\frac{dE}{dw_0} = 0$, $\frac{dE}{dw_1} = 0$
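The model $\tilde{y}(x; w)$ and the squared-error objective E can be sketched as below, assuming the weights are passed as a list `[w0, w1, ..., wM]`. The helper names `predict` and `squared_error` are hypothetical; `predict` evaluates the polynomial with Horner's rule.

```python
def predict(x, w):
    """Polynomial model y~(x; w) = sum_{j=0}^{M} w_j x^j, evaluated with Horner's rule."""
    result = 0.0
    for coeff in reversed(w):   # w = [w0, w1, ..., wM]
        result = result * x + coeff
    return result


def squared_error(xs, ys, w):
    """E(w) = (1/2) * sum_n [y~(x_n; w) - y_n]^2."""
    return 0.5 * sum((predict(x, w) - y) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]                       # exactly on the line y = 1 + 2x
print(squared_error(xs, ys, [1.0, 2.0]))   # 0.0: w0 = 1, w1 = 2 fits perfectly
print(squared_error(xs, ys, [0.0, 2.0]))   # 1.5: every prediction is off by 1
```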

Linear least-squares regression
$E = \frac{1}{2}\sum_{n=1}^{N} [w_0 + w_1 x_n - y_n]^2$
$\frac{dE}{dw_0} = \sum_{n=1}^{N} [w_0 + w_1 x_n - y_n] = N w_0 + N w_1 \langle x \rangle - N \langle y \rangle = 0$
$w_0 + w_1 \langle x \rangle - \langle y \rangle = 0$   (1)

Linear least-squares regression
$E = \frac{1}{2}\sum_{n=1}^{N} [w_0 + w_1 x_n - y_n]^2$
$\frac{dE}{dw_1} = \sum_{n=1}^{N} [w_0 + w_1 x_n - y_n]\, x_n = N w_0 \langle x \rangle + N w_1 \langle x^2 \rangle - N \langle xy \rangle = 0$
$w_0 \langle x \rangle + w_1 \langle x^2 \rangle - \langle xy \rangle = 0$   (2)
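One way to get from equations (1) and (2) to the slope and intercept: solve (1) for $w_0$, substitute into (2), and recognize the covariance. Both numerator and denominator are 1/N averages, so multiplying each by N/(N-1) turns them into the sample covariances without changing the ratio.

```latex
\begin{aligned}
\text{From (1):}\quad & w_0 = \langle y \rangle - w_1 \langle x \rangle \\
\text{Substitute into (2):}\quad & \left(\langle y \rangle - w_1 \langle x \rangle\right)\langle x \rangle + w_1 \langle x^2 \rangle - \langle xy \rangle = 0 \\
\Rightarrow\quad & w_1 \left(\langle x^2 \rangle - \langle x \rangle^2\right) = \langle xy \rangle - \langle x \rangle \langle y \rangle \\
\Rightarrow\quad & w_1 = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\langle x^2 \rangle - \langle x \rangle^2}
  = \frac{\tfrac{N}{N-1}\left(\langle xy \rangle - \langle x \rangle \langle y \rangle\right)}{\tfrac{N}{N-1}\left(\langle x^2 \rangle - \langle x \rangle^2\right)}
  = \frac{C(x, y)}{C(x, x)}
\end{aligned}
```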

Linear least-squares regression
Slope: $w_1 = \frac{C(x, y)}{C(x, x)}$
y-intercept: $w_0 = \langle y \rangle - w_1 \langle x \rangle$
In homework: check MATLAB's polyfit against this optimal expression for linear least-squares fitting.
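A Python sketch of the closed-form fit, with a hypothetical helper `fit_line` and made-up data. As a sanity check, the residuals of the fitted line should satisfy equations (1) and (2): they sum to zero, and so does their x-weighted sum.

```python
def fit_line(xs, ys):
    """Least-squares line: w1 = C(x, y) / C(x, x), w0 = <y> - w1 <x>."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with N >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    if cxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    w1 = cxy / cxx
    w0 = my - w1 * mx
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 5.0]
w0, w1 = fit_line(xs, ys)
print(w0, w1)   # both approximately 1.1
```

For the homework comparison outside MATLAB, NumPy's `numpy.polyfit(xs, ys, 1)` returns the coefficients highest power first, i.e. `[w1, w0]`.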

Linear least-squares regression
Slope: $w_1 = \frac{C(x, y)}{C(x, x)}$; y-intercept: $w_0 = \langle y \rangle - w_1 \langle x \rangle$
Contrast with $w_1$: Pearson's correlation $\rho(x, y) = \frac{C(x, y)}{\sigma_x \sigma_y}$
Different normalizations:
• Different correlation coefficient for the same slope but different amounts of x,y-scatter.
• Same correlation for different slopes and different x,y-scatter.
• Correlation: more strongly penalizes y-scatter, more weakly penalizes x-scatter.
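The first bullet can be checked numerically. In this sketch (hypothetical data), y = 0.5 x plus a hand-picked zero-mean noise pattern that is uncorrelated with x, scaled by increasing amounts. The fitted slope stays at 0.5 while ρ falls as the y-scatter grows.

```python
import math


def slope_and_rho(xs, ys):
    """Return (least-squares slope w1, Pearson rho); the 1/(N-1) factors cancel in both."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


xs = [-2.0, -1.0, 1.0, 2.0]
noise = [1.0, -1.0, -1.0, 1.0]   # hand-picked: zero mean, uncorrelated with xs
for scatter in (0.0, 1.0, 3.0):
    ys = [0.5 * x + scatter * e for x, e in zip(xs, noise)]
    slope, rho = slope_and_rho(xs, ys)
    print(f"scatter={scatter}: slope={slope:.3f}, rho={rho:.3f}")
```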

Slope versus Pearson's correlation coefficient
[Figure panels: same slope, different ρ; different slope, same ρ. From: https://en.wikipedia.org/wiki/Correlation_and_dependence]

Application
BACK TO SAMPLE STATISTICS: MULTIVARIATE

Multiple variables: covariance matrix
$\{x^\alpha_1, \dots, x^\alpha_N\}$: N samples of the α-th variable $x^\alpha$
K different variables $x^\alpha$, labeled by α, β ∈ {1, …, K}:
$C_{\alpha\beta} \equiv \frac{1}{N-1}\sum_{i=1}^{N} (x^\alpha_i - \langle x^\alpha \rangle)(x^\beta_i - \langle x^\beta \rangle) = \mathrm{cov}(x^\alpha, x^\beta)$
Sample covariance matrix: K × K, since there are K variables.
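A plain-Python sketch of the K × K sample covariance matrix, assuming the data are stored as one list of N samples per variable (`data[alpha]`). The function name and the three example variables are hypothetical.

```python
def covariance_matrix(data):
    """data[alpha] holds the N samples of variable x_alpha; returns the K x K matrix C."""
    k = len(data)
    n = len(data[0])
    if n < 2 or any(len(row) != n for row in data):
        raise ValueError("every variable needs the same number N >= 2 of samples")
    means = [sum(row) / n for row in data]
    dev = [[v - m for v in row] for row, m in zip(data, means)]
    # C[alpha][beta] = (1/(N-1)) * sum_i dev_alpha_i * dev_beta_i
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (n - 1) for j in range(k)]
            for i in range(k)]


data = [
    [1.0, 2.0, 3.0, 4.0],   # x_1
    [2.0, 4.0, 5.0, 9.0],   # x_2
    [4.0, 3.0, 2.0, 1.0],   # x_3
]
C = covariance_matrix(data)
for row in C:
    print(["%.3f" % v for v in row])
```

Because C is symmetric, only the K(K+1)/2 entries with β ≥ α actually need to be computed; this sketch computes all K² for clarity.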

Covariance matrix
• The (α, β) element is the covariance between $x^\alpha$ and $x^\beta$.
• The diagonal of the covariance matrix holds the variance of each variable: var($x^\alpha$) or C($x^\alpha$, $x^\alpha$).
• There are K² entries in total, but only half of the off-diagonal terms are independent because of symmetry: C($x^\beta$, $x^\alpha$) = C($x^\alpha$, $x^\beta$).
• Thus there are only (K² − K)/2 + K = K(K+1)/2 independent terms.
Q's: How do we do linear regression in the multivariate case? Will it involve the covariance matrix?

Covariance example I: x, y independent
x = randn(1000, 1);
y = randn(1000, 1);
$C = \begin{pmatrix} 0.959 & 0.009 \\ 0.009 & 1.069 \end{pmatrix}$
[Scatter plot of y versus x, both axes from -4 to 4.]

Covariance ¡example ¡III ¡ x, y not independent x = randn ( 1000 , 1 ) y = 0 . 5 ∗ x + 0 . 5 ∗ randn ( 1000 , 1 ) 2.5 2 1.5 1 0.5 y 0 − 0.5 − 1  � 0 . 907 0 . 464 C = − 1.5 0 . 464 0 . 469 − 2 − 3 − 2 − 1 0 1 2 3 4 x

Summary
• Defined the sample mean and variance of a variable.
• Defined the covariance between a pair of variables.
• Solved optimal (least-squares) linear regression between two variables in terms of the mean and covariance.
• Covariance matrix: covariance between all K(K+1)/2 unique pairs of K variables.
