Descriptive Statistics DS GA 1002 Probability and Statistics for - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Descriptive Statistics

DS GA 1002 Probability and Statistics for Data Science

Carlos Fernandez-Granda

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17

slide-2
SLIDE 2

Descriptive statistics

◮ Techniques to visualize and summarize data
◮ Can often be interpreted within a probabilistic framework
◮ Often probabilistic assumptions do not hold, but techniques are still useful
◮ We describe them from a deterministic point of view

slide-3
SLIDE 3

◮ Histogram
◮ Empirical mean and variance
◮ Order statistics
◮ Empirical covariance
◮ Empirical covariance matrix

slide-4
SLIDE 4

Histogram

◮ Technique to visualize one-dimensional data
◮ Bin the range of the data, then count the number of instances in each bin
◮ The width of the bins can be adjusted to yield higher or lower resolution
◮ Approximation to the pmf or pdf of the data if they are iid
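The binning step can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the function name `histogram`, the use of equal-width bins, the choice to assign the maximum value to the last bin, and the data values are all assumptions made for the example.

```python
def histogram(data, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Assumes the data are not all equal, so that the bin width is positive.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bins are half-open [left, right); the maximum goes into the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


# Hypothetical values, for illustration only
data = [1.0, 1.5, 2.0, 2.5, 3.0, 4.5, 5.0]
edges, counts = histogram(data, 4)
print(edges)
print(counts)
```

Increasing `num_bins` narrows the bins and raises the resolution, at the cost of fewer points per bin.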

slide-5
SLIDE 5

Temperature in Oxford

[Figure: histograms of the temperature in Oxford in January and August; axis in degrees (Celsius)]

slide-6
SLIDE 6

GDP per capita of different countries

[Figure: histogram of GDP per capita; axis in thousands of dollars]

slide-8
SLIDE 8

Empirical mean

Let \(\{x_1, x_2, \ldots, x_n\}\) be a set of real-valued data

The empirical mean is defined as

\[
\operatorname{av}(x_1, x_2, \ldots, x_n) := \frac{1}{n} \sum_{i=1}^{n} x_i
\]

Temperature data: 6.73 °C in January and 21.3 °C in August

GDP per capita: $16 500

slide-9
SLIDE 9

Empirical mean

Let \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) be a set of d-dimensional real-valued data

The empirical mean is defined as

\[
\operatorname{av}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n) := \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i
\]
slide-10
SLIDE 10

Centering

Let \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) be a set of d-dimensional real-valued data

To center the data set we:

1. Compute the empirical mean
2. Subtract it from each vector:

\[
\vec{y}_i := \vec{x}_i - \operatorname{av}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n), \quad 1 \le i \le n
\]

\(\vec{y}_1, \ldots, \vec{y}_n\) are centered at the origin
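The two centering steps can be sketched as follows. The helper names `vector_mean` and `center` and the three data points are hypothetical, chosen only to make the example easy to check by hand.

```python
def vector_mean(vectors):
    """Entrywise empirical mean of a list of d-dimensional vectors."""
    n = len(vectors)
    d = len(vectors[0])
    return [sum(x[j] for x in vectors) / n for j in range(d)]


def center(vectors):
    """Subtract the empirical mean from every vector."""
    mean = vector_mean(vectors)
    return [[x[j] - mean[j] for j in range(len(mean))] for x in vectors]


# Hypothetical 2D data, for illustration only
data = [[1.0, 10.0], [2.0, 14.0], [6.0, 12.0]]
centered = center(data)
print(vector_mean(data))      # mean of the original data
print(centered)               # centered data
print(vector_mean(centered))  # mean after centering
```

After centering, the empirical mean of the new vectors is the zero vector, which is what "centered at the origin" means here.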

slide-11
SLIDE 11

Centering

[Figure: two panels, uncentered data and centered data]

slide-12
SLIDE 12

Empirical variance

Let \(\{x_1, x_2, \ldots, x_n\}\) be a set of real-valued data

The empirical variance is defined as

\[
\operatorname{var}(x_1, x_2, \ldots, x_n) := \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \operatorname{av}(x_1, x_2, \ldots, x_n) \right)^2
\]

The empirical standard deviation is the square root of the empirical variance

Temperature data: standard deviation of 1.99 °C in January and 1.73 °C in August

GDP per capita: standard deviation of $25 300
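A direct transcription of the definitions of the empirical mean, variance and standard deviation is sketched below. The function names mirror the notation av, var and std used in the slides; the data values are hypothetical.

```python
import math


def av(xs):
    """Empirical mean."""
    return sum(xs) / len(xs)


def var(xs):
    """Empirical variance with the 1 / (n - 1) normalization."""
    m = av(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def std(xs):
    """Empirical standard deviation."""
    return math.sqrt(var(xs))


# Hypothetical values, for illustration only
data = [1.0, 2.0, 3.0, 4.0]
print(av(data))   # mean: 10 / 4
print(var(data))  # variance: 5 / 3
print(std(data))  # standard deviation: sqrt(5 / 3)
```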

slide-14
SLIDE 14

Temperature dataset

In January the temperature in Oxford is around 6.73 °C, give or take 2 °C

slide-15
SLIDE 15

GDP dataset

Countries typically have a GDP per capita of about $16 500 give or take $25 300

slide-16
SLIDE 16

Quantiles and percentiles

Let \(x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}\) denote the ordered elements of a dataset \(\{x_1, x_2, \ldots, x_n\}\)

The q quantile of the data for \(0 < q < 1\) is \(x_{([q(n+1)])}\)

\([q(n+1)]\) is the closest integer to \(q(n+1)\)

The q quantile is also known as the 100 q percentile
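The quantile definition can be sketched as below. Two details are assumptions that the slide does not specify: ties in "closest integer" are rounded up, and the index is clamped to the range 1 to n so that it always refers to an existing element. The data values are hypothetical.

```python
import math


def quantile(xs, q):
    """q quantile defined as x_([q (n + 1)]), with 1-based ordered elements."""
    if not 0 < q < 1:
        raise ValueError("q must satisfy 0 < q < 1")
    ordered = sorted(xs)
    n = len(ordered)
    # Closest integer to q (n + 1); ties are rounded up (assumption).
    k = math.floor(q * (n + 1) + 0.5)
    # Clamp to a valid 1-based index (assumption).
    k = min(max(k, 1), n)
    return ordered[k - 1]


# Hypothetical values, for illustration only (n = 7, so q (n + 1) = 8 q)
data = [7, 1, 5, 3, 9, 11, 13]
print(quantile(data, 0.25))
print(quantile(data, 0.5))
print(quantile(data, 0.75))
```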

slide-17
SLIDE 17

Quartiles and median

The 0.25 and 0.75 quantiles are the first and third quartiles

The 0.5 quantile is the empirical median

If n is even, the empirical median is usually set to

\[
\frac{x_{(n/2)} + x_{(n/2+1)}}{2}
\]

The difference between the 3rd and 1st quartiles is the interquartile range (IQR)
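A small sketch of the median (with the even-n convention) and the interquartile range follows. The `quantile` helper uses the closest-integer rule for the order-statistic index, with ties rounded up and the index clamped to a valid range; both of those details, the function names and the data are assumptions made for illustration.

```python
import math


def quantile(xs, q):
    """q quantile as the order statistic with index closest to q (n + 1)."""
    ordered = sorted(xs)
    n = len(ordered)
    k = min(max(math.floor(q * (n + 1) + 0.5), 1), n)
    return ordered[k - 1]


def median(xs):
    """Empirical median; averages the two middle elements when n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    if n % 2 == 0:
        # x_(n/2) and x_(n/2 + 1) in 1-based indexing
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    return ordered[n // 2]


def iqr(xs):
    """Interquartile range: third quartile minus first quartile."""
    return quantile(xs, 0.75) - quantile(xs, 0.25)


# Hypothetical values, for illustration only
odd = [7, 1, 5, 3, 9, 11, 13]
even = [4, 1, 3, 2]
print(median(odd), median(even), iqr(odd))
```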

slide-18
SLIDE 18

Quartiles and median

◮ Temperature data (January):
  ◮ Sample mean: 6.73 °C
  ◮ Median: 6.80 °C
  ◮ Interquartile range: 2.9 °C

◮ Temperature data (August):
  ◮ Sample mean: 21.3 °C
  ◮ Median: 21.2 °C
  ◮ Interquartile range: 2.1 °C

slide-19
SLIDE 19

Quartiles and median

◮ GDP per capita:
  ◮ Sample mean: $16 500 (71% of the countries have lower GDP per capita!)
  ◮ Median: $6 350
  ◮ Interquartile range: $18 200
  ◮ Five-number summary: $130, $1 960, $6 350, $20 100, $188 000

slide-20
SLIDE 20

Boxplot of temperature data

[Figure: boxplots of the temperature data for January, April, August and November; axis in degrees (Celsius)]

slide-21
SLIDE 21

Boxplot of GDP data

[Figure: boxplot of the GDP data; axis in thousands of dollars]

slide-23
SLIDE 23

Multidimensional data

◮ Each dimension represents a feature
◮ We can visualize two-dimensional data using scatter plots

slide-24
SLIDE 24

Scatter plot

[Figure: scatter plot; axes labeled April and August]

slide-25
SLIDE 25

Scatter plot

[Figure: scatter plot; axes labeled Minimum temperature and Maximum temperature]

slide-26
SLIDE 26

Empirical covariance

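Data: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\)

The empirical covariance is defined as

\[
\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n)) := \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \operatorname{av}(x_1, \ldots, x_n) \right) \left( y_i - \operatorname{av}(y_1, \ldots, y_n) \right)
\]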
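The covariance formula translates directly into code. The sketch below uses hypothetical data in which y is exactly twice x; the function names follow the slide notation.

```python
def av(xs):
    """Empirical mean."""
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Empirical covariance with the 1 / (n - 1) normalization."""
    mx, my = av(xs), av(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# Hypothetical values, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(cov(xs, ys))  # 10 / 3
print(cov(xs, xs))  # covariance of a feature with itself is its variance, 5 / 3
```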

slide-27
SLIDE 27

Empirical correlation coefficient

Data: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\)

The empirical correlation coefficient is defined as

\[
\rho((x_1, y_1), \ldots, (x_n, y_n)) := \frac{\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))}{\operatorname{std}(x_1, \ldots, x_n) \, \operatorname{std}(y_1, \ldots, y_n)}
\]

Cauchy-Schwarz inequality: for any \(\vec{a}\), \(\vec{b}\)

\[
-1 \le \frac{\vec{a}^T \vec{b}}{\|\vec{a}\|_2 \, \|\vec{b}\|_2} \le 1
\]

Consequence:

\[
-1 \le \rho((x_1, y_1), \ldots, (x_n, y_n)) \le 1
\]
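As a quick check of the definition and of the bound, the sketch below computes the correlation coefficient for a few hypothetical datasets. It assumes neither feature is constant, so that both standard deviations are nonzero.

```python
import math


def av(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = av(xs), av(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std(xs):
    return math.sqrt(cov(xs, xs))


def rho(xs, ys):
    """Empirical correlation coefficient; assumes std(xs) and std(ys) are nonzero."""
    return cov(xs, ys) / (std(xs) * std(ys))


# Hypothetical values, for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
print(rho(xs, [2.0, 4.0, 6.0, 8.0]))      # perfectly linear, increasing
print(rho(xs, [-1.0, -2.0, -3.0, -4.0]))  # perfectly linear, decreasing
print(rho(xs, [1.0, 3.0, 2.0, 4.0]))      # partially correlated
```

The first two cases reach the extremes of the Cauchy-Schwarz bound; the third lies strictly between them.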

slide-28
SLIDE 28

ρ = 0.269

[Figure: scatter plot; axes labeled April and August]

slide-29
SLIDE 29

ρ = 0.962

[Figure: scatter plot; axes labeled Minimum temperature and Maximum temperature]

slide-31
SLIDE 31

Empirical covariance matrix

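Data: \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) (d features)

The empirical covariance matrix is defined as

\[
\Sigma(\vec{x}_1, \ldots, \vec{x}_n) := \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right)^T
\]

The (i, j) entry, \(1 \le i, j \le d\), is given by

\[
\Sigma(\vec{x}_1, \ldots, \vec{x}_n)_{ij} =
\begin{cases}
\operatorname{var}\left( (\vec{x}_1)_i, \ldots, (\vec{x}_n)_i \right) & \text{if } i = j, \\
\operatorname{cov}\left( \left( (\vec{x}_1)_i, (\vec{x}_1)_j \right), \ldots, \left( (\vec{x}_n)_i, (\vec{x}_n)_j \right) \right) & \text{if } i \ne j.
\end{cases}
\]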
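The definition can be sketched as a sum of outer products of centered vectors. The function name `covariance_matrix` and the three 2D points are hypothetical, picked so that the result is easy to verify by hand.

```python
def covariance_matrix(vectors):
    """Empirical covariance matrix of a list of d-dimensional vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(x[j] for x in vectors) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in vectors]
    # Entry (i, j) averages the products of the i-th and j-th centered features.
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


# Hypothetical 2D data, for illustration only
data = [[1.0, 10.0], [2.0, 14.0], [6.0, 12.0]]
sigma = covariance_matrix(data)
print(sigma)
```

The diagonal entries are the empirical variances of each feature and the off-diagonal entries are the empirical covariances, so the matrix is symmetric.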
slide-36
SLIDE 36

Empirical variance in a certain direction

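Let \(\vec{v}\) be a unit-norm vector aligned with a direction of interest

\begin{align*}
\operatorname{var}\left( \vec{v}^T \vec{x}_1, \ldots, \vec{v}^T \vec{x}_n \right)
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{v}^T \vec{x}_i - \operatorname{av}\left( \vec{v}^T \vec{x}_1, \ldots, \vec{v}^T \vec{x}_n \right) \right)^2 \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{v}^T \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \right)^2 \\
&= \vec{v}^T \left( \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right)^T \right) \vec{v} \\
&= \vec{v}^T \Sigma(\vec{x}_1, \ldots, \vec{x}_n) \, \vec{v}
\end{align*}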
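The identity can be checked numerically: project the data onto a unit vector, take the empirical variance of the projections, and compare with the quadratic form in the covariance matrix. The data and the direction (0.6, 0.8) are hypothetical, and the helper names are illustrative.

```python
def var(xs):
    """Empirical variance with the 1 / (n - 1) normalization."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance_matrix(vectors):
    """Empirical covariance matrix of a list of d-dimensional vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(x[j] for x in vectors) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in vectors]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def quadratic_form(v, sigma):
    """Compute v^T Sigma v."""
    d = len(v)
    return sum(v[i] * sigma[i][j] * v[j] for i in range(d) for j in range(d))


# Hypothetical data and unit-norm direction (0.6^2 + 0.8^2 = 1)
data = [[1.0, 10.0], [2.0, 14.0], [6.0, 12.0]]
v = [0.6, 0.8]

projections = [sum(v_j * x_j for v_j, x_j in zip(v, x)) for x in data]
lhs = var(projections)                            # variance of the projections
rhs = quadratic_form(v, covariance_matrix(data))  # v^T Sigma v
print(lhs, rhs)
```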

slide-37
SLIDE 37

Eigendecomposition of the covariance matrix

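Let \(\vec{v}\) be a unit-norm vector aligned with a direction of interest

\[
\Sigma(\vec{x}_1, \ldots, \vec{x}_n) = U \Lambda U^T =
\begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_d \end{bmatrix}
\begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_d
\end{bmatrix}
\begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_d \end{bmatrix}^T
\]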

slide-38
SLIDE 38

Eigendecomposition of the covariance matrix

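For any symmetric matrix \(A \in \mathbb{R}^{n \times n}\) with normalized eigenvectors \(\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n\) and corresponding eigenvalues \(\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n\)

\begin{align*}
\lambda_1 &= \max_{\|\vec{v}\|_2 = 1} \vec{v}^T A \vec{v} \\
\vec{u}_1 &= \arg\max_{\|\vec{v}\|_2 = 1} \vec{v}^T A \vec{v} \\
\lambda_k &= \max_{\|\vec{v}\|_2 = 1, \; \vec{v} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{v}^T A \vec{v} \\
\vec{u}_k &= \arg\max_{\|\vec{v}\|_2 = 1, \; \vec{v} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{v}^T A \vec{v}
\end{align*}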

slide-39
SLIDE 39

Principal component analysis

Compute eigenvectors of empirical covariance matrix to determine directions of maximum variation
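One simple way to find the direction of maximum variation is power iteration on the covariance matrix. This is a technique swapped in here for illustration because it needs only the standard library; the slides do not say how the eigenvectors are computed, and in practice a linear algebra library would normally be used. The matrix `sigma`, the starting vector and the iteration count are assumptions.

```python
import math


def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def leading_eigenpair(sigma, num_iters=500):
    """Approximate the largest eigenvalue and its unit-norm eigenvector.

    Assumes sigma is symmetric positive semidefinite with a unique largest
    eigenvalue, and that the starting vector is not orthogonal to the
    corresponding eigenvector.
    """
    d = len(sigma)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(num_iters):
        w = mat_vec(sigma, v)
        norm = math.sqrt(sum(w_j ** 2 for w_j in w))
        v = [w_j / norm for w_j in w]
    # v^T Sigma v is the empirical variance in the direction of v
    eigenvalue = sum(v_j * w_j for v_j, w_j in zip(v, mat_vec(sigma, v)))
    return eigenvalue, v


# Hypothetical 2 x 2 covariance matrix, for illustration only
sigma = [[7.0, 1.0], [1.0, 4.0]]
lam1, u1 = leading_eigenpair(sigma)
print(lam1, u1)
```

The returned eigenvalue is the empirical variance along the first principal direction `u1`. Subsequent directions could be found by repeating the procedure on the subspace orthogonal to the ones already computed.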

slide-40
SLIDE 40

Example: 2D data

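\(\sigma_1 / \sqrt{n} = 0.705\), \(\sigma_2 / \sqrt{n} = 0.690\)

[Figure: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\)]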

slide-41
SLIDE 41

Example: 2D data

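\(\sigma_1 / \sqrt{n} = 0.9832\), \(\sigma_2 / \sqrt{n} = 0.3559\)

[Figure: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\)]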

slide-42
SLIDE 42

Example: 2D data

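\(\sigma_1 / \sqrt{n} = 1.3490\), \(\sigma_2 / \sqrt{n} = 0.1438\)

[Figure: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\)]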

slide-43
SLIDE 43

Centering is important!

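\(\sigma_1 / \sqrt{n} = 5.077\), \(\sigma_2 / \sqrt{n} = 0.889\)

[Figure: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\)]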

slide-44
SLIDE 44

Centering is important!

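\(\sigma_1 / \sqrt{n} = 1.261\), \(\sigma_2 / \sqrt{n} = 0.139\)

[Figure: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\)]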

slide-45
SLIDE 45

Dimensionality reduction

◮ Projection of data onto a lower-dimensional space
◮ Applications: visualization / computational efficiency / denoising
◮ Example: seeds from 3 varieties of wheat (Kama, Rosa and Canadian)
◮ 7 features: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove

slide-46
SLIDE 46

PCA dimensionality reduction

[Figure: scatter plot; axes labeled Projection onto first PC and Projection onto second PC]

slide-47
SLIDE 47

PCA dimensionality reduction

[Figure: scatter plot; axes labeled Projection onto (d-1)th PC and Projection onto dth PC]

slide-48
SLIDE 48

Whitening

◮ Preprocessing procedure
◮ Linear transformation to eliminate skew in the data
◮ Enhances nonlinear structure
◮ After whitening, the data are uncorrelated

slide-49
SLIDE 49

Whitening

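Let \(\vec{x}_1, \ldots, \vec{x}_n\) be a set of d-dimensional centered data with a full-rank covariance matrix. To whiten the data we

1. Compute the eigendecomposition of the empirical covariance matrix

\[
\Sigma(\vec{x}_1, \ldots, \vec{x}_n) = U \Lambda U^T
\]

2. For \(i = 1, \ldots, n\) set

\[
\vec{y}_i := \sqrt{\Lambda}^{-1} U^T \vec{x}_i, \qquad
\sqrt{\Lambda} :=
\begin{bmatrix}
\sqrt{\lambda_1} & 0 & \cdots & 0 \\
0 & \sqrt{\lambda_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{\lambda_d}
\end{bmatrix}
\]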

slide-56
SLIDE 56

Whitening

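\begin{align*}
\Sigma(\vec{y}_1, \ldots, \vec{y}_n)
&:= \frac{1}{n-1} \sum_{i=1}^{n} \vec{y}_i \vec{y}_i^T \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \sqrt{\Lambda}^{-1} U^T \vec{x}_i \right) \left( \sqrt{\Lambda}^{-1} U^T \vec{x}_i \right)^T \\
&= \sqrt{\Lambda}^{-1} U^T \left( \frac{1}{n-1} \sum_{i=1}^{n} \vec{x}_i \vec{x}_i^T \right) U \sqrt{\Lambda}^{-1} \\
&= \sqrt{\Lambda}^{-1} U^T \, \Sigma(\vec{x}_1, \ldots, \vec{x}_n) \, U \sqrt{\Lambda}^{-1} \\
&= \sqrt{\Lambda}^{-1} U^T U \sqrt{\Lambda} \sqrt{\Lambda} U^T U \sqrt{\Lambda}^{-1} \\
&= I
\end{align*}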
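The result can be checked numerically on a small example. The sketch below is restricted to two dimensions, where the eigenvectors of a symmetric matrix can be written in closed form through a rotation angle; that restriction, the function names and the data are assumptions made to keep the example free of external libraries. The input points are assumed to be already centered, with a full-rank covariance matrix.

```python
import math


def covariance_2d(points):
    """Entries (a, b, c) of the covariance matrix [[a, b], [b, c]].

    Assumes the 2D points are centered, so no mean is subtracted.
    """
    n = len(points)
    a = sum(x * x for x, _ in points) / (n - 1)
    b = sum(x * y for x, y in points) / (n - 1)
    c = sum(y * y for _, y in points) / (n - 1)
    return a, b, c


def whiten_2d(points):
    """Apply sqrt(Lambda)^{-1} U^T to each centered 2D point."""
    a, b, c = covariance_2d(points)
    # Rotation angle that diagonalizes the symmetric 2 x 2 matrix
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    u1 = (math.cos(theta), math.sin(theta))
    u2 = (-math.sin(theta), math.cos(theta))
    # Eigenvalues as the quadratic forms u^T Sigma u
    lam1 = a * u1[0] ** 2 + 2.0 * b * u1[0] * u1[1] + c * u1[1] ** 2
    lam2 = a * u2[0] ** 2 + 2.0 * b * u2[0] * u2[1] + c * u2[1] ** 2
    return [
        (
            (u1[0] * x + u1[1] * y) / math.sqrt(lam1),
            (u2[0] * x + u2[1] * y) / math.sqrt(lam2),
        )
        for x, y in points
    ]


# Hypothetical centered data, for illustration only
centered = [(-2.0, -2.0), (-1.0, 2.0), (3.0, 0.0)]
whitened = whiten_2d(centered)
a, b, c = covariance_2d(whitened)
print(a, b, c)  # entries of the covariance matrix of the whitened data
```

Up to floating-point error, the covariance matrix of the whitened points is the identity: unit variance in each coordinate and zero covariance between them.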

slide-57
SLIDE 57

[Figure: data \(\vec{x}\)]

slide-58
SLIDE 58

[Figure: data after applying \(U^T\), that is, \(U^T \vec{x}\)]

slide-59
SLIDE 59

[Figure: whitened data \(\sqrt{\Lambda}^{-1} U^T \vec{x}\)]