Expectation
DS GA 1002 Probability and Statistics for Data Science
Carlos Fernandez-Granda


slide-1
SLIDE 1

Expectation

DS GA 1002 Probability and Statistics for Data Science

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17 Carlos Fernandez-Granda

slide-2
SLIDE 2

Aim

Describe random variables with a few numbers: mean, variance, covariance

slide-3
SLIDE 3

Expectation operator
Mean and variance
Covariance
Conditional expectation

slide-4
SLIDE 4

Discrete random variables

Average of the values of a function weighted by the pmf:

E(g(X)) = ∑_{x ∈ R} g(x) p_X(x)

E(g(X, Y)) = ∑_{x ∈ R_X} ∑_{y ∈ R_Y} g(x, y) p_{X,Y}(x, y)

E(g(X)) = ∑_{x_1} ∑_{x_2} · · · ∑_{x_n} g(x) p_X(x)   (X an n-dimensional random vector)

slide-5
SLIDE 5

Continuous random variables

Average of the values of a function weighted by the pdf:

E(g(X)) = ∫_{x=−∞}^{∞} g(x) f_X(x) dx

E(g(X, Y)) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy

E(g(X)) = ∫_{x_1=−∞}^{∞} ∫_{x_2=−∞}^{∞} · · · ∫_{x_n=−∞}^{∞} g(x) f_X(x) dx_1 dx_2 . . . dx_n   (X an n-dimensional random vector)

slide-6
SLIDE 6

Discrete and continuous random variables

E(g(C, D)) = ∫_{c=−∞}^{∞} ∑_{d ∈ R_D} g(c, d) f_C(c) p_{D|C}(d|c) dc = ∑_{d ∈ R_D} ∫_{c=−∞}^{∞} g(c, d) p_D(d) f_{C|D}(c|d) dc

slide-7
SLIDE 7

St Petersburg paradox

A casino offers you a game: flip an unbiased coin until it lands on heads. You get 2^k dollars, where k is the number of flips. Expected gain?

slide-8
SLIDE 8

St Petersburg paradox

E(Gain) = ∑_{k=1}^{∞} 2^k · (1/2^k)

slide-9
SLIDE 9

St Petersburg paradox

E(Gain) = ∑_{k=1}^{∞} 2^k · (1/2^k) = ∞
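The divergence can be checked numerically. Below is a quick Python sketch (not part of the slides; all names are illustrative): truncating the game at a maximum number of flips gives an expected gain equal to that maximum, since every term of the sum contributes exactly one dollar, and a Monte Carlo average of simulated plays is finite but keeps drifting upward as rare long runs of tails appear.

```python
import random

def truncated_expected_gain(max_flips: int) -> float:
    # Each term 2^k * (1/2)^k contributes exactly 1 dollar, so capping
    # the game at max_flips flips gives an expected gain of max_flips.
    return sum(2**k * 0.5**k for k in range(1, max_flips + 1))

def play_once(rng: random.Random) -> int:
    # Flip a fair coin until it lands on heads; the payoff is 2^k dollars,
    # where k is the total number of flips.
    k = 1
    while rng.random() < 0.5:  # tails with probability 1/2
        k += 1
    return 2**k

rng = random.Random(0)
empirical_mean = sum(play_once(rng) for _ in range(100_000)) / 100_000
```

The truncated sum grows without bound as the cap grows, which is exactly the statement E(Gain) = ∞.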

slide-10
SLIDE 10

Linearity of expectation

For any constants a and b and any functions g_1 and g_2:

E(a g_1(X, Y) + b g_2(X, Y)) = a E(g_1(X, Y)) + b E(g_2(X, Y))

Follows from linearity of sums and integrals:

∑_{x ∈ R_X} ∑_{y ∈ R_Y} (a g_1(x, y) + b g_2(x, y)) p_{X,Y}(x, y) = a ∑_{x ∈ R_X} ∑_{y ∈ R_Y} g_1(x, y) p_{X,Y}(x, y) + b ∑_{x ∈ R_X} ∑_{y ∈ R_Y} g_2(x, y) p_{X,Y}(x, y)

slide-11
SLIDE 11

Example: Coffee beans

◮ Company buys coffee beans from two local producers
◮ Beans from Colombia: C tons/year
◮ Beans from Vietnam: V tons/year
◮ Model:
  ◮ C uniform between 0 and 1
  ◮ V uniform between 0 and 2
  ◮ C and V independent
◮ What is the expected total amount of beans B?

slide-12
SLIDE 12

Example: Coffee beans

E (C + V )

slide-13
SLIDE 13

Example: Coffee beans

E (C + V ) = E (C) + E (V )

slide-14
SLIDE 14

Example: Coffee beans

E (C + V ) = E (C) + E (V ) = 0.5 + 1 = 1.5 tons

slide-15
SLIDE 15

Example: Coffee beans

E(C + V) = E(C) + E(V) = 0.5 + 1 = 1.5 tons. This holds even if C and V are not independent
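A simulation confirms the computation; this Python sketch (not part of the slides) draws from the model above and averages the totals.

```python
import random

rng = random.Random(1)
n = 200_000
# Model from the slides: C uniform on [0, 1], V uniform on [0, 2], independent
totals = [rng.uniform(0, 1) + rng.uniform(0, 2) for _ in range(n)]
mean_total = sum(totals) / n  # should be close to E(C) + E(V) = 0.5 + 1 = 1.5
```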

slide-16
SLIDE 16

Independence

If X, Y are independent then E (g (X) h (Y )) = E (g (X)) E (h (Y ))

slide-17
SLIDE 17

Independence

E(g(X) h(Y)) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy

slide-18
SLIDE 18

Independence

E(g(X) h(Y)) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x) h(y) f_X(x) f_Y(y) dx dy

slide-19
SLIDE 19

Independence

E(g(X) h(Y)) = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} g(x) h(y) f_X(x) f_Y(y) dx dy = E(g(X)) E(h(Y))
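The product rule for independent variables is easy to check by simulation. In this Python sketch (not from the slides), g and h are arbitrary illustrative functions and X, Y are independent uniforms, so E(g(X)h(Y)) should match E(g(X))E(h(Y)).

```python
import random

rng = random.Random(2)
n = 200_000
xs = [rng.uniform(0, 1) for _ in range(n)]
ys = [rng.uniform(0, 1) for _ in range(n)]  # drawn independently of xs

g = lambda x: x * x     # illustrative choices of g and h
h = lambda y: 2 * y + 1

lhs = sum(g(x) * h(y) for x, y in zip(xs, ys)) / n       # E(g(X) h(Y))
rhs = (sum(map(g, xs)) / n) * (sum(map(h, ys)) / n)      # E(g(X)) E(h(Y))
```

Here the exact value is E(X²) E(2Y + 1) = (1/3) · 2 = 2/3, and both estimates land close to it.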

slide-20
SLIDE 20

Expectation operator
Mean and variance
Covariance
Conditional expectation

slide-21
SLIDE 21

Mean

The mean or first moment of X is E(X). It is the center of mass of the distribution

slide-22
SLIDE 22

Bernoulli

E (X) = 0 · pX (0) + 1 · pX (1) = p

slide-23
SLIDE 23

Binomial

A binomial is a sum of n Bernoulli random variables: X = ∑_{i=1}^{n} B_i

slide-24
SLIDE 24

Binomial

A binomial is a sum of n Bernoulli random variables: X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i)

slide-25
SLIDE 25

Binomial

A binomial is a sum of n Bernoulli random variables: X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i) = ∑_{i=1}^{n} E(B_i)

slide-26
SLIDE 26

Binomial

A binomial is a sum of n Bernoulli random variables: X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i) = ∑_{i=1}^{n} E(B_i) = np
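The np formula can be verified empirically by building binomial draws exactly as the slide does, as a sum of Bernoulli draws. A Python sketch (not from the slides; n and p are illustrative):

```python
import random

rng = random.Random(3)
n, p = 20, 0.3
trials = 100_000
# Draw Binomial(n, p) explicitly as a sum of n Bernoulli(p) variables
draws = [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]
mean_x = sum(draws) / trials  # should be close to n * p = 6
```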

slide-27
SLIDE 27

Mean of important random variables

Random variable   Parameters   Mean
Bernoulli         p            p
Geometric         p            1/p
Binomial          n, p         np
Poisson           λ            λ
Uniform           a, b         (a + b)/2
Exponential       λ            1/λ
Gaussian          µ, σ         µ

slide-28
SLIDE 28

Cauchy random variable

(plot of the pdf f_X(x))  f_X(x) = 1 / (π(1 + x²))

slide-29
SLIDE 29

Cauchy random variable

E(X) = ∫_{−∞}^{∞} x / (π(1 + x²)) dx = ∫_{0}^{∞} x / (π(1 + x²)) dx − ∫_{0}^{∞} x / (π(1 + x²)) dx

slide-30
SLIDE 30

Cauchy random variable

E(X) = ∫_{−∞}^{∞} x / (π(1 + x²)) dx = ∫_{0}^{∞} x / (π(1 + x²)) dx − ∫_{0}^{∞} x / (π(1 + x²)) dx

∫_{0}^{∞} x / (π(1 + x²)) dx = ∫_{0}^{∞} 1 / (2π(1 + t)) dt = lim_{t→∞} log(1 + t) / (2π)

slide-31
SLIDE 31

Cauchy random variable

E(X) = ∫_{−∞}^{∞} x / (π(1 + x²)) dx = ∫_{0}^{∞} x / (π(1 + x²)) dx − ∫_{0}^{∞} x / (π(1 + x²)) dx

∫_{0}^{∞} x / (π(1 + x²)) dx = ∫_{0}^{∞} 1 / (2π(1 + t)) dt = lim_{t→∞} log(1 + t) / (2π) = ∞
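Because the Cauchy mean is undefined, the sample mean never stabilizes: a single extreme draw can move it arbitrarily far at any point. This Python sketch (not from the slides) draws Cauchy samples by inverse-CDF sampling and records the running mean.

```python
import math
import random

rng = random.Random(4)

def cauchy_sample() -> float:
    # Inverse-CDF sampling: if U ~ Uniform(0, 1), then tan(pi (U - 1/2))
    # is a standard Cauchy draw.
    return math.tan(math.pi * (rng.random() - 0.5))

# Track the running mean; unlike for distributions with a finite mean,
# it does not converge as the sample grows.
running_means = []
total = 0.0
for i in range(1, 100_001):
    total += cauchy_sample()
    if i % 10_000 == 0:
        running_means.append(total / i)
```

Plotting `running_means` for several seeds shows erratic jumps instead of convergence.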

slide-32
SLIDE 32

Mean of a random vector

Vector formed by the means of its components:

E(X) := ( E(X_1), E(X_2), . . . , E(X_n) )^T

By linearity of expectation, for any matrix A ∈ R^{m×n} and b ∈ R^m:

E(A X + b) = A E(X) + b

slide-33
SLIDE 33

The mean as a typical value

The mean is a typical value of the random variable. The probability that X equals E(X) can be zero. The mean can be severely distorted by a subset of extreme values

slide-34
SLIDE 34

Density with subset of extreme values

(plot of f_X) Uniform random variable X with support [−4.5, 4.5] ∪ [99.5, 100.5]

slide-35
SLIDE 35

Density with subset of extreme values

E(X) = ∫_{−4.5}^{4.5} x f_X(x) dx + ∫_{99.5}^{100.5} x f_X(x) dx = (1/10) · (100.5² − 99.5²)/2 = 10

slide-36
SLIDE 36

Density with subset of extreme values

(plot of f_X)

slide-37
SLIDE 37

Median

Midpoint of the distribution: a number m such that P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2

For continuous random variables:

F_X(m) = ∫_{−∞}^{m} f_X(x) dx = 1/2

slide-38
SLIDE 38

Density with subset of extreme values

F_X(m) = ∫_{−4.5}^{m} f_X(x) dx = (m + 4.5)/10

slide-39
SLIDE 39

Density with subset of extreme values

F_X(m) = ∫_{−4.5}^{m} f_X(x) dx = (m + 4.5)/10 = 1/2  ⇒  m = 0.5

slide-40
SLIDE 40

Density with subset of extreme values

(plot of f_X with the mean and the median marked)
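The gap between mean and median is easy to reproduce by sampling from this two-piece uniform density. A Python sketch (not from the slides):

```python
import random
import statistics

rng = random.Random(5)

def sample() -> float:
    # Uniform over [-4.5, 4.5] ∪ [99.5, 100.5]: the first piece carries
    # 9/10 of the probability mass, the extreme piece only 1/10.
    if rng.random() < 0.9:
        return rng.uniform(-4.5, 4.5)
    return rng.uniform(99.5, 100.5)

data = [sample() for _ in range(200_000)]
mean_x = statistics.fmean(data)     # dragged toward the extreme piece (near 10)
median_x = statistics.median(data)  # insensitive to it (near 0.5)
```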

slide-41
SLIDE 41

Variance

The mean square or second moment of X is E(X²)

The variance of X is

Var(X) := E((X − E(X))²) = E(X² − 2X E(X) + E²(X)) = E(X²) − E²(X)

The standard deviation of X is σ_X := √Var(X)
slide-42
SLIDE 42

Bernoulli

E(X²) = 0 · p_X(0) + 1 · p_X(1) = p

Var(X) = E(X²) − E²(X) = p − p² = p(1 − p)

slide-43
SLIDE 43

Variance of common random variables

Random variable   Parameters   Variance
Bernoulli         p            p(1 − p)
Geometric         p            (1 − p)/p²
Binomial          n, p         np(1 − p)
Poisson           λ            λ
Uniform           a, b         (b − a)²/12
Exponential       λ            1/λ²
Gaussian          µ, σ         σ²

slide-44
SLIDE 44

Geometric (p = 0.2)

(plot of the pmf p_X(k))

slide-45
SLIDE 45

Binomial (n = 20, p = 0.5)

(plot of the pmf p_X(k))

slide-46
SLIDE 46

Poisson (λ = 25)

(plot of the pmf p_X(k))

slide-47
SLIDE 47

Uniform [0, 1]

(plot of the pdf f_X(x))

slide-48
SLIDE 48

Exponential (λ = 1)

(plot of the pdf f_X(x))

slide-49
SLIDE 49

Gaussian (µ = 0, σ = 1)

(plot of the pdf f_X(x))

slide-50
SLIDE 50

Variance

The variance operator is not linear, but

Var(aX + b) = E((aX + b − E(aX + b))²) = E((aX + b − a E(X) − b)²) = a² E((X − E(X))²) = a² Var(X)

slide-51
SLIDE 51

Bounding probabilities using expectations

Aim: Characterize behavior of X to some extent using E (X) and Var (X)

slide-52
SLIDE 52

Markov’s inequality

For any nonnegative random variable X and any a > 0: P(X ≥ a) ≤ E(X)/a

slide-53
SLIDE 53

Markov’s inequality

Consider the indicator variable 1_{X≥a}: X − a · 1_{X≥a} ≥ 0

slide-54
SLIDE 54

Markov’s inequality

Consider the indicator variable 1_{X≥a}: X − a · 1_{X≥a} ≥ 0, so E(X) ≥ a E(1_{X≥a})

slide-55
SLIDE 55

Markov’s inequality

Consider the indicator variable 1_{X≥a}: X − a · 1_{X≥a} ≥ 0, so E(X) ≥ a E(1_{X≥a}) = a P(X ≥ a)

slide-56
SLIDE 56

Age of students at NYU

Mean: 20 years. How many are younger than 30?

slide-57
SLIDE 57

Age of students at NYU

Mean: 20 years. How many are younger than 30? P(A ≥ 30) ≤ E(A)/30

slide-58
SLIDE 58

Age of students at NYU

Mean: 20 years. How many are younger than 30? P(A ≥ 30) ≤ E(A)/30 = 2/3. At least 1/3 are younger than 30
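Markov's inequality can be watched in action with any nonnegative model for the ages. In this Python sketch (not from the slides), the exponential distribution is only an illustrative choice; the bound requires nothing but nonnegativity and the mean.

```python
import random

rng = random.Random(6)
n = 100_000
# Toy "age" variable: exponential with mean 20. The exponential is an
# illustrative choice; Markov's inequality needs only nonnegativity.
ages = [rng.expovariate(1 / 20) for _ in range(n)]

mean_age = sum(ages) / n
frac_at_least_30 = sum(a >= 30 for a in ages) / n
markov_bound = mean_age / 30  # P(A >= 30) <= E(A) / 30
```

The empirical fraction sits well below the bound, as expected: Markov's inequality is loose but universal.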

slide-59
SLIDE 59

Chebyshev’s inequality

For any positive constant a > 0: P(|X − E(X)| ≥ a) ≤ Var(X)/a²

slide-60
SLIDE 60

Chebyshev’s inequality

For any positive constant a > 0: P(|X − E(X)| ≥ a) ≤ Var(X)/a². Corollary: if Var(X) = 0 then P(X = E(X)) = 1

slide-61
SLIDE 61

Chebyshev’s inequality

For any positive constant a > 0: P(|X − E(X)| ≥ a) ≤ Var(X)/a². Corollary: if Var(X) = 0 then P(X = E(X)) = 1, since for any ε > 0, P(|X − E(X)| ≥ ε) ≤ Var(X)/ε² = 0

slide-62
SLIDE 62

Chebyshev’s inequality

Define Y := (X − E(X))². By Markov's inequality, P(|X − E(X)| ≥ a) = P(Y ≥ a²)
slide-63
SLIDE 63

Chebyshev’s inequality

Define Y := (X − E(X))². By Markov's inequality, P(|X − E(X)| ≥ a) = P(Y ≥ a²) ≤ E(Y)/a²

slide-64
SLIDE 64

Chebyshev’s inequality

Define Y := (X − E(X))². By Markov's inequality, P(|X − E(X)| ≥ a) = P(Y ≥ a²) ≤ E(Y)/a² = Var(X)/a²

slide-65
SLIDE 65

Age of students at NYU

Mean: 20 years, standard deviation: 3 years. How many are younger than 30?

slide-66
SLIDE 66

Age of students at NYU

Mean: 20 years, standard deviation: 3 years. How many are younger than 30? P(A ≥ 30) ≤ P(|A − 20| ≥ 10)

slide-67
SLIDE 67

Age of students at NYU

Mean: 20 years, standard deviation: 3 years. How many are younger than 30? P(A ≥ 30) ≤ P(|A − 20| ≥ 10) ≤ Var(A)/100 = 9/100. At least 91% are younger than 30
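The Chebyshev bound can also be checked numerically. In this Python sketch (not from the slides), the Gaussian is only an illustrative model for the ages; the bound holds for any distribution with mean 20 and standard deviation 3.

```python
import random

rng = random.Random(7)
n = 100_000
# Toy age model with mean 20 and standard deviation 3 (illustrative Gaussian)
ages = [rng.gauss(20, 3) for _ in range(n)]

frac_far = sum(abs(a - 20) >= 10 for a in ages) / n
chebyshev_bound = 9 / 100  # Var(A) / 10^2
```

For the Gaussian the true probability of being 10 or more from the mean is far below 9/100, which illustrates that knowing the variance gives a much tighter bound than Markov's, yet Chebyshev is still conservative.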

slide-68
SLIDE 68

Expectation operator
Mean and variance
Covariance
Conditional expectation

slide-69
SLIDE 69

Covariance

The covariance of X and Y is

Cov(X, Y) := E((X − E(X))(Y − E(Y))) = E(XY − Y E(X) − X E(Y) + E(X) E(Y)) = E(XY) − E(X) E(Y)

If Cov(X, Y) = 0, X and Y are uncorrelated

slide-70
SLIDE 70

Covariance

(scatter plots for Cov(X, Y) = 0.5, 0.9, 0.99 and Cov(X, Y) = −0.9, −0.99)
slide-71
SLIDE 71

Variance of the sum

Var(X + Y) = E((X + Y − E(X + Y))²) = E((X − E(X))²) + E((Y − E(Y))²) + 2 E((X − E(X))(Y − E(Y))) = Var(X) + Var(Y) + 2 Cov(X, Y)

slide-72
SLIDE 72

Variance of the sum

Var(X + Y) = E((X + Y − E(X + Y))²) = E((X − E(X))²) + E((Y − E(Y))²) + 2 E((X − E(X))(Y − E(Y))) = Var(X) + Var(Y) + 2 Cov(X, Y)

If X and Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y)

slide-73
SLIDE 73

Independence implies uncorrelation

If X and Y are independent: Cov(X, Y) = E(XY) − E(X) E(Y) = E(X) E(Y) − E(X) E(Y) = 0

slide-74
SLIDE 74

Uncorrelation does not imply independence

X, Y are independent Bernoulli with parameter 1/2

Let U = X + Y and V = X − Y. Are U and V independent? Are they uncorrelated?

slide-75
SLIDE 75

Uncorrelation does not imply independence

Compute p_U(0), p_V(0), and p_{U,V}(0, 0)

slide-76
SLIDE 76

Uncorrelation does not imply independence

p_U(0) = P(X = 0, Y = 0) = 1/4

slide-77
SLIDE 77

Uncorrelation does not imply independence

p_U(0) = P(X = 0, Y = 0) = 1/4,  p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2

slide-78
SLIDE 78

Uncorrelation does not imply independence

p_U(0) = P(X = 0, Y = 0) = 1/4,  p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2,  p_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4

slide-79
SLIDE 79

Uncorrelation does not imply independence

p_U(0) = P(X = 0, Y = 0) = 1/4,  p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2,  p_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4 ≠ p_U(0) p_V(0) = 1/8

slide-80
SLIDE 80

Uncorrelation does not imply independence

Cov(U, V) = E(UV) − E(U) E(V) = E((X + Y)(X − Y)) − E(X + Y) E(X − Y) = E(X²) − E(Y²) − E²(X) + E²(Y)

slide-81
SLIDE 81

Uncorrelation does not imply independence

Cov(U, V) = E(UV) − E(U) E(V) = E((X + Y)(X − Y)) − E(X + Y) E(X − Y) = E(X²) − E(Y²) − E²(X) + E²(Y) = 0
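Because X and Y take only four equally likely joint values, the whole example can be verified by exact enumeration rather than simulation. A Python sketch (not from the slides):

```python
from itertools import product

# X, Y independent Bernoulli(1/2): the four outcomes are equally likely
outcomes = list(product([0, 1], repeat=2))

def expect(f):
    # Expectation under the uniform distribution over the four outcomes
    return sum(f(x, y) for x, y in outcomes) / len(outcomes)

# Covariance of U = X + Y and V = X - Y
cov_uv = (expect(lambda x, y: (x + y) * (x - y))
          - expect(lambda x, y: x + y) * expect(lambda x, y: x - y))

p_u0 = sum(1 for x, y in outcomes if x + y == 0) / 4               # P(U = 0)
p_v0 = sum(1 for x, y in outcomes if x - y == 0) / 4               # P(V = 0)
p_uv00 = sum(1 for x, y in outcomes if x + y == 0 and x - y == 0) / 4
```

The enumeration gives Cov(U, V) = 0 while p_{U,V}(0, 0) differs from p_U(0) p_V(0): uncorrelated but not independent.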

slide-82
SLIDE 82

Correlation coefficient

Pearson correlation coefficient of X and Y: ρ_{X,Y} := Cov(X, Y) / (σ_X σ_Y). It is the covariance between X/σ_X and Y/σ_Y

slide-83
SLIDE 83

Correlation coefficient

σ_Y = 1, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.9
σ_Y = 3, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.3
σ_Y = 3, Cov(X, Y) = 2.7, ρ_{X,Y} = 0.9

slide-84
SLIDE 84

Cauchy-Schwarz inequality

For any X and Y: |E(XY)| ≤ √(E(X²) E(Y²)), and

E(XY) = √(E(X²) E(Y²))  ⟺  Y = √(E(Y²)/E(X²)) X

E(XY) = −√(E(X²) E(Y²))  ⟺  Y = −√(E(Y²)/E(X²)) X

slide-85
SLIDE 85

Cauchy-Schwarz inequality

We have Cov(X, Y) ≤ σ_X σ_Y, or equivalently |ρ_{X,Y}| ≤ 1

In addition, |ρ_{X,Y}| = 1 ⟺ Y = cX + d, where

c := σ_Y/σ_X if ρ_{X,Y} = 1,  c := −σ_Y/σ_X if ρ_{X,Y} = −1,  d := E(Y) − c E(X)

slide-86
SLIDE 86

Covariance matrix of a random vector

The covariance matrix of X is defined as

Σ_X = [ Var(X_1)       Cov(X_1, X_2)  · · ·  Cov(X_1, X_n)
        Cov(X_2, X_1)  Var(X_2)       · · ·  Cov(X_2, X_n)
        . . .          . . .          . . .  . . .
        Cov(X_n, X_1)  Cov(X_n, X_2)  · · ·  Var(X_n) ]

    = E(X X^T) − E(X) E(X)^T

slide-87
SLIDE 87

Covariance matrix after a linear transformation

Σ_{A X + b}

slide-88
SLIDE 88

Covariance matrix after a linear transformation

Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T

slide-89
SLIDE 89

Covariance matrix after a linear transformation

Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T

= A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T

slide-90
SLIDE 90

Covariance matrix after a linear transformation

Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T

= A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T

= A ( E(X X^T) − E(X) E(X)^T ) A^T

slide-91
SLIDE 91

Covariance matrix after a linear transformation

Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T

= A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T

= A ( E(X X^T) − E(X) E(X)^T ) A^T

= A Σ_X A^T
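The identity Σ_{AX+b} = A Σ_X A^T can be checked empirically. In this Python sketch (not from the slides; A, b, and the covariance are made-up illustrative values), the sample covariance of the transformed data matches the predicted matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative A, b, and covariance matrix (arbitrary values)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([5.0, -3.0])
cov_x = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

x = rng.multivariate_normal(mean=np.zeros(3), cov=cov_x, size=500_000)
y = x @ A.T + b  # rows are samples of A x + b

sample_cov_y = np.cov(y, rowvar=False)
predicted_cov_y = A @ cov_x @ A.T  # the identity derived above
```

Note that adding b shifts the mean but, as the derivation shows, leaves the covariance untouched.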

slide-92
SLIDE 92

Variance in a fixed direction

For any unit vector u: Var(u^T X) = u^T Σ_X u

slide-93
SLIDE 93

Direction of maximum variance

To find the direction of maximum variance we must solve arg max_{||u||_2 = 1} u^T Σ_X u

slide-94
SLIDE 94

Linear algebra

Symmetric matrices have orthogonal eigenvectors:

Σ_X = U Λ U^T = [ u_1 u_2 · · · u_n ] diag(λ_1, λ_2, . . . , λ_n) [ u_1 u_2 · · · u_n ]^T

slide-95
SLIDE 95

Linear algebra

λ_1 = max_{||u||_2 = 1} u^T A u,  u_1 = arg max_{||u||_2 = 1} u^T A u

λ_k = max_{||u||_2 = 1, u ⊥ u_1, ..., u_{k−1}} u^T A u,  u_k = arg max_{||u||_2 = 1, u ⊥ u_1, ..., u_{k−1}} u^T A u

slide-96
SLIDE 96

Direction of maximum variance

(scatter plots with principal axes: √λ_1 = 1.22, √λ_2 = 0.71; √λ_1 = 1, √λ_2 = 1; √λ_1 = 1.38, √λ_2 = 0.32)

slide-97
SLIDE 97

Coloring

Goal: Transform uncorrelated samples with unit variance so that they have a prescribed covariance matrix Σ

1. Compute the eigendecomposition Σ = U Λ U^T
2. Set y := U √Λ x, where √Λ := diag(√λ_1, √λ_2, . . . , √λ_n)
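The two steps above translate directly into code. A Python sketch (not from the slides; the target covariance is an illustrative positive definite matrix): coloring standard Gaussian noise produces samples whose covariance matches the target.

```python
import numpy as np

def color(target_cov: np.ndarray, white: np.ndarray) -> np.ndarray:
    """Map uncorrelated unit-variance samples (rows of `white`) so that
    they acquire covariance `target_cov`, via y = U sqrt(Lambda) x."""
    eigvals, U = np.linalg.eigh(target_cov)  # Sigma = U Lambda U^T
    return white @ (U @ np.diag(np.sqrt(eigvals))).T

rng = np.random.default_rng(1)
target = np.array([[4.0, 1.2],
                   [1.2, 1.0]])             # illustrative positive definite Sigma
white = rng.standard_normal((500_000, 2))   # uncorrelated, unit variance
colored = color(target, white)
sample_cov = np.cov(colored, rowvar=False)
```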

slide-98
SLIDE 98

Coloring

Σ_Y

slide-99
SLIDE 99

Coloring

Σ_Y = U √Λ Σ_X √Λ^T U^T

slide-100
SLIDE 100

Coloring

Σ_Y = U √Λ Σ_X √Λ^T U^T = U √Λ I √Λ^T U^T

slide-101
SLIDE 101

Coloring

Σ_Y = U √Λ Σ_X √Λ^T U^T = U √Λ I √Λ^T U^T = Σ

slide-102
SLIDE 102

Coloring

(scatter plots of x, √Λ x, and U √Λ x)

slide-103
SLIDE 103

Generating Gaussian random vectors

Goal: Sampling from an n-dimensional Gaussian random vector with mean µ and covariance matrix Σ

1. Generate n independent standard Gaussian samples x
2. Compute the eigendecomposition Σ = U Λ U^T
3. Set y := U √Λ x + µ

For non-Gaussian random vectors, coloring does not necessarily preserve the distribution
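The three steps above can be sketched in Python (not from the slides; the mean and covariance are illustrative values): color standard Gaussian noise and shift it by the mean.

```python
import numpy as np

def sample_gaussian(mu, sigma, n, rng):
    """Draw n samples from N(mu, sigma) by coloring standard Gaussian noise."""
    eigvals, U = np.linalg.eigh(sigma)     # step 2: Sigma = U Lambda U^T
    x = rng.standard_normal((n, len(mu)))  # step 1: iid standard Gaussians
    return x @ (U @ np.diag(np.sqrt(eigvals))).T + mu  # step 3: y = U sqrt(L) x + mu

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])  # illustrative mean and covariance

samples = sample_gaussian(mu, sigma, 400_000, rng)
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
```

This works because a linear map of a Gaussian vector is Gaussian; for non-Gaussian noise the output covariance is still Σ, but the distribution is not preserved.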

slide-104
SLIDE 104

For Gaussian rvs uncorrelation implies mutual independence

Uncorrelation implies

Σ_X = diag(σ_1², σ_2², . . . , σ_n²)

which in turn implies

f_X(x) = 1/√((2π)^n |Σ|) · exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) )

slide-105
SLIDE 105

For Gaussian rvs uncorrelation implies mutual independence

Uncorrelation implies

Σ_X = diag(σ_1², σ_2², . . . , σ_n²)

which in turn implies

f_X(x) = 1/√((2π)^n |Σ|) · exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ) = ∏_{i=1}^{n} 1/(√(2π) σ_i) · exp( −(x_i − µ_i)² / (2σ_i²) )

slide-106
SLIDE 106

For Gaussian rvs uncorrelation implies mutual independence

Uncorrelation implies

Σ_X = diag(σ_1², σ_2², . . . , σ_n²)

which in turn implies

f_X(x) = 1/√((2π)^n |Σ|) · exp( −(1/2) (x − µ)^T Σ^{−1} (x − µ) ) = ∏_{i=1}^{n} 1/(√(2π) σ_i) · exp( −(x_i − µ_i)² / (2σ_i²) ) = ∏_{i=1}^{n} f_{X_i}(x_i)

slide-107
SLIDE 107

Expectation operator
Mean and variance
Covariance
Conditional expectation

slide-108
SLIDE 108

Conditional expectation

Expectation of g(X, Y) given X = x:

E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

Can be interpreted as a function h(x) := E(g(X, Y) | X = x)

The conditional expectation of g(X, Y) given X is E(g(X, Y) | X) := h(X). It is a random variable

slide-109
SLIDE 109

Iterated expectation

For any X and Y and any function g : R² → R: E(g(X, Y)) = E(E(g(X, Y) | X))

slide-110
SLIDE 110

Iterated expectation

h(x) := E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

slide-111
SLIDE 111

Iterated expectation

h(x) := E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X))

slide-112
SLIDE 112

Iterated expectation

h(x) := E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{x=−∞}^{∞} h(x) f_X(x) dx

slide-113
SLIDE 113

Iterated expectation

h(x) := E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{x=−∞}^{∞} h(x) f_X(x) dx = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} f_X(x) f_{Y|X}(y|x) g(x, y) dy dx

slide-114
SLIDE 114

Iterated expectation

h(x) := E(g(X, Y) | X = x) = ∫_{y=−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{x=−∞}^{∞} h(x) f_X(x) dx = ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} f_X(x) f_{Y|X}(y|x) g(x, y) dy dx = E(g(X, Y))

slide-115
SLIDE 115

Example: Desert

◮ Car traveling through the desert
◮ Time until the car breaks down: T
◮ State of the motor: M
◮ State of the road: R
◮ Model:
  ◮ M uniform between 0 (no problem) and 1 (very bad)
  ◮ R uniform between 0 (no problem) and 1 (very bad)
  ◮ M and R independent
  ◮ T exponential with parameter M + R

slide-116
SLIDE 116

Example: Desert

E (T) = E (E (T|M, R))

slide-117
SLIDE 117

Example: Desert

E(T) = E(E(T | M, R)) = E(1/(M + R))

slide-118
SLIDE 118

Example: Desert

E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr

slide-119
SLIDE 119

Example: Desert

E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr = ∫_0^1 (log(r + 1) − log(r)) dr

slide-120
SLIDE 120

Example: Desert

E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr = ∫_0^1 (log(r + 1) − log(r)) dr = log 4 ≈ 1.39
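The iterated-expectation result is straightforward to verify by simulating the hierarchical model directly. A Python sketch (not from the slides):

```python
import math
import random

rng = random.Random(8)
n = 400_000
total = 0.0
for _ in range(n):
    m = rng.uniform(0, 1)            # state of the motor
    r = rng.uniform(0, 1)            # state of the road
    total += rng.expovariate(m + r)  # T | M, R is exponential with rate M + R
mean_t = total / n                   # should approach log 4, about 1.386
```

Note the simulation never uses the closed-form answer: it draws from the model exactly as specified, and the sample mean converges to log 4.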

slide-121
SLIDE 121

Grizzlies in Yellowstone

Model for the weight of grizzly bears in Yellowstone. Males: Gaussian with µ := 240 kg and σ := 40 kg. Females: Gaussian with µ := 140 kg and σ := 20 kg. There are about the same number of females and males

slide-122
SLIDE 122

Grizzlies in Yellowstone

E (W ) = E (E (W |S))

slide-123
SLIDE 123

Grizzlies in Yellowstone

E(W) = E(E(W | S)) = (E(W | S = 0) + E(W | S = 1)) / 2

slide-124
SLIDE 124

Grizzlies in Yellowstone

E(W) = E(E(W | S)) = (E(W | S = 0) + E(W | S = 1)) / 2 = (140 + 240) / 2 = 190 kg

slide-125
SLIDE 125

Bayesian coin flip

Bayesian methods often endow parameters of discrete distributions with a continuous marginal distribution

◮ You suspect a coin is biased
◮ You are uncertain about the bias, so you model it as a random variable B with pdf f_B(b) = 2b for b ∈ [0, 1]
◮ What is the expected value of the coin flip X?

slide-126
SLIDE 126

Bayesian coin flip

E (X) = E (E (X|B))

slide-127
SLIDE 127

Bayesian coin flip

E (X) = E (E (X|B)) = E (B)

slide-128
SLIDE 128

Bayesian coin flip

E(X) = E(E(X | B)) = E(B) = ∫_0^1 2b² db

slide-129
SLIDE 129

Bayesian coin flip

E(X) = E(E(X | B)) = E(B) = ∫_0^1 2b² db = 2/3
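The 2/3 answer can be reproduced by simulating the two-stage experiment: first draw the bias B, then flip the coin. A Python sketch (not from the slides); the bias is drawn by inverse-CDF sampling.

```python
import math
import random

rng = random.Random(9)
n = 200_000
heads = 0
for _ in range(n):
    # Sample the bias B with pdf f_B(b) = 2b on [0, 1] by inverse CDF:
    # F(b) = b^2, so F^{-1}(u) = sqrt(u).
    bias = math.sqrt(rng.random())
    heads += rng.random() < bias  # flip the coin once with this bias
mean_x = heads / n                # should approach E(B) = 2/3
```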