UQ, STAT2201, 2017, Lecture 5 Unit 4 Joint Distributions and Unit - PowerPoint PPT Presentation

UQ, STAT2201, 2017, Lecture 5 Unit 4 – Joint Distributions and Unit 5 – Descriptive Statistics.

Unit 4 - Joint Probability Distributions

A joint probability distribution describes two (or more) random variables in the same experiment. With two random variables it is referred to as a bivariate probability distribution.

A joint probability mass function for discrete random variables $X$ and $Y$, denoted as $p_{XY}(x, y)$, satisfies the following properties:
(1) $p_{XY}(x, y) \ge 0$ for all $x, y$.
(2) $p_{XY}(x, y) = 0$ for $(x, y)$ not in the range.
(3) $\sum_{x} \sum_{y} p_{XY}(x, y) = 1$, where the summation is over all $(x, y)$ in the range.
(4) $p_{XY}(x, y) = P(X = x, Y = y)$.

Example: Throw two independent dice and look at $X \equiv$ Sum and $Y \equiv$ Product.
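The dice example can be tabulated by brute-force enumeration. The sketch below assumes both dice are fair (the slide only says they are independent) and uses exact fractions; the name `joint_pmf` is chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice (assumption)
# and accumulate the probability of each (sum, product) pair.
joint_pmf = Counter()
for d1, d2 in product(range(1, 7), repeat=2):
    joint_pmf[(d1 + d2, d1 * d2)] += Fraction(1, 36)

# Print the non-zero entries of p_XY(x, y), sorted by sum and then product.
for (x, y), p in sorted(joint_pmf.items()):
    print(f"p_XY({x}, {y}) = {p}")
```

Pairs such as (7, 12) receive probability from two outcomes, (3, 4) and (4, 3), while pairs that cannot occur are simply absent from the table, which matches property (2).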

A joint probability density function for continuous random variables $X$ and $Y$, denoted as $f_{XY}(x, y)$, satisfies the following properties:
(1) $f_{XY}(x, y) \ge 0$ for all $x, y$.
(2) $f_{XY}(x, y) = 0$ for $(x, y)$ not in the range.
(3) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = 1$.
(4) For small $\Delta x$, $\Delta y$: $f_{XY}(x, y)\, \Delta x\, \Delta y \approx P\big((X, Y) \in [x, x + \Delta x) \times [y, y + \Delta y)\big)$.
(5) For any region $R$ of two-dimensional space, $P\big((X, Y) \in R\big) = \iint_{R} f_{XY}(x, y)\, dx\, dy$.
Example: height and weight.

A joint probability density function can also be defined for $n > 2$ random variables (as can a joint probability mass function). The following needs to hold:
(1) $f_{X_1 X_2 \ldots X_n}(x_1, x_2, \ldots, x_n) \ge 0$.
(2) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1 X_2 \ldots X_n}(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1$.

The marginal distributions of $X$ and $Y$, as well as the conditional distributions of $X$ given a specific value $Y = y$ and vice versa, can be obtained from the joint distribution.

If the random variables $X$ and $Y$ are independent, then $f_{XY}(x, y) = f_X(x)\, f_Y(y)$, and similarly $p_{XY}(x, y) = p_X(x)\, p_Y(y)$ in the discrete case.
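For the discrete case, marginals, conditionals and the independence check can be sketched directly from a joint pmf stored as a dictionary. The helper names (`marginals`, `conditional_x_given_y`, `is_independent`) and the fair-dice data are assumptions made for this illustration, not part of the lecture.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return (p_X, p_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)


def conditional_x_given_y(joint, y):
    """Return p_{X|Y}(x | y) = p_XY(x, y) / p_Y(y) for a fixed value y."""
    _, p_y = marginals(joint)
    return {x: p / p_y[y] for (x, yy), p in joint.items() if yy == y}


def is_independent(joint):
    """Check p_XY(x, y) == p_X(x) * p_Y(y) on the full grid of values."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x, y in product(p_x, p_y))


# Illustrative data: two fair dice (assumption), first as the pair of faces,
# then transformed to (sum, product).
faces = range(1, 7)
dice = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}
sum_prod = defaultdict(Fraction)
for (a, b), p in dice.items():
    sum_prod[(a + b, a * b)] += p
sum_prod = dict(sum_prod)

print(is_independent(dice))                  # the two faces
print(is_independent(sum_prod))              # sum and product
print(conditional_x_given_y(sum_prod, 12))   # distribution of the sum given product 12
```

The two faces factorise, whereas the sum and the product do not: knowing the product restricts which sums are possible.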

Generalized Moments

The expected value of a function of two random variables is: $E\big(h(X, Y)\big) = \iint h(x, y)\, f_{XY}(x, y)\, dx\, dy$ for $X, Y$ continuous.

The covariance is a common measure of the relationship between two random variables (say $X$ and $Y$). It is denoted as $\mathrm{cov}(X, Y)$ or $\sigma_{XY}$, and is given by: $\sigma_{XY} = E\big((X - \mu_X)(Y - \mu_Y)\big) = E(XY) - \mu_X \mu_Y$. The covariance of a random variable with itself is its variance.

The correlation between the random variables $X$ and $Y$, denoted as $\rho_{XY}$, is $\rho_{XY} = \dfrac{\mathrm{cov}(X, Y)}{\sqrt{V(X)\, V(Y)}} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$. For any two random variables $X$ and $Y$, $-1 \le \rho_{XY} \le 1$.

If $X$ and $Y$ are independent random variables then $\sigma_{XY} = 0$ and $\rho_{XY} = 0$. The converse does not always hold: in general $\rho_{XY} = 0$ does not imply independence. For jointly Normal random variables it does. In any case, if $\rho_{XY} = 0$ then the random variables are called uncorrelated.
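A standard counterexample shows why zero correlation does not imply independence: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. The sketch below computes the covariance and correlation from the joint pmf; the example and the function names are illustrative choices, not taken from the slides.

```python
from fractions import Fraction
from math import sqrt


def expectation(joint, h):
    """E[h(X, Y)] for a discrete joint pmf given as {(x, y): probability}."""
    return sum(h(x, y) * p for (x, y), p in joint.items())


def covariance(joint):
    """cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = expectation(joint, lambda x, y: x)
    mu_y = expectation(joint, lambda x, y: y)
    return expectation(joint, lambda x, y: (x - mu_x) * (y - mu_y))


def correlation(joint):
    """rho_XY = cov(X, Y) / sqrt(V(X) V(Y)), returned as a float."""
    mu_x = expectation(joint, lambda x, y: x)
    mu_y = expectation(joint, lambda x, y: y)
    var_x = expectation(joint, lambda x, y: (x - mu_x) ** 2)
    var_y = expectation(joint, lambda x, y: (y - mu_y) ** 2)
    return float(covariance(joint)) / sqrt(var_x * var_y)


# X uniform on {-1, 0, 1} and Y = X^2 (illustrative choice).
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

print("cov =", covariance(joint), " rho =", correlation(joint))

# Dependence: P(X = 0, Y = 0) differs from P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print(joint[(0, 0)], "vs", p_x0 * p_y0)
```

Here $Y$ is a deterministic function of $X$, so the two are as dependent as possible, yet the covariance vanishes because the relationship is symmetric rather than linear.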

When considering several random variables, it is common to consider the (symmetric) Covariance Matrix, $\Sigma$, with $\Sigma_{i,j} = \mathrm{cov}(X_i, X_j)$.

Bivariate Normal

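The probability density function of a bivariate normal distribution is

$$f_{XY}(x, y; \sigma_X, \sigma_Y, \mu_X, \mu_Y, \rho) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ \frac{-1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma_X^2} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma_Y^2} \right] \right\}$$

for $-\infty < x < \infty$ and $-\infty < y < \infty$. The parameters are $\sigma_X > 0$, $\sigma_Y > 0$, $-\infty < \mu_X < \infty$, $-\infty < \mu_Y < \infty$, $-1 < \rho < 1$.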
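The density can be evaluated with a few lines of code. This is a minimal sketch using only the standard library; the function names and the numeric arguments are hypothetical. With $\rho = 0$ the joint density should equal the product of the two univariate Normal densities, which the example checks numerically.

```python
from math import exp, pi, sqrt


def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Evaluate the bivariate normal density at (x, y)."""
    if sigma_x <= 0 or sigma_y <= 0 or not -1 < rho < 1:
        raise ValueError("need sigma_x > 0, sigma_y > 0 and -1 < rho < 1")
    zx = (x - mu_x) / sigma_x          # standardised x
    zy = (y - mu_y) / sigma_y          # standardised y
    quad = zx * zx - 2 * rho * zx * zy + zy * zy
    norm_const = 2 * pi * sigma_x * sigma_y * sqrt(1 - rho * rho)
    return exp(-quad / (2 * (1 - rho * rho))) / norm_const


def normal_pdf(x, mu, sigma):
    """Univariate Normal density, used only for the comparison below."""
    return exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * sqrt(2 * pi))


# With rho = 0 the joint density factorises into the two marginals.
joint_value = bivariate_normal_pdf(0.5, -1.0, 0.0, 1.0, 2.0, 0.5, 0.0)
product_value = normal_pdf(0.5, 0.0, 2.0) * normal_pdf(-1.0, 1.0, 0.5)
print(joint_value, product_value)
```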

Linear Combinations of Random Variables

Given random variables $X_1, X_2, \ldots, X_n$ and constants $c_1, c_2, \ldots, c_n$, the (scalar) linear combination $Y = c_1 X_1 + c_2 X_2 + \cdots + c_n X_n$ is often a random variable of interest.

The mean of the linear combination is the linear combination of the means, $E(Y) = c_1 E(X_1) + c_2 E(X_2) + \cdots + c_n E(X_n)$. This holds even if the random variables are not independent.

The variance of the linear combination is as follows: $V(Y) = c_1^2 V(X_1) + c_2^2 V(X_2) + \cdots + c_n^2 V(X_n) + 2 \sum\sum_{i < j} c_i c_j\, \mathrm{cov}(X_i, X_j)$.

If $X_1, X_2, \ldots, X_n$ are independent (or even if they are just uncorrelated), then $V(Y) = c_1^2 V(X_1) + c_2^2 V(X_2) + \cdots + c_n^2 V(X_n)$.
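The mean and variance formulas for a linear combination translate directly into code once the means and the covariance matrix are known. The function name and the numbers below are made up for illustration.

```python
def linear_combination_moments(c, means, cov):
    """Return (E(Y), V(Y)) for Y = sum_i c[i] * X_i.

    `means[i]` is E(X_i) and `cov[i][j]` is cov(X_i, X_j), so the
    diagonal of `cov` holds the variances.
    """
    n = len(c)
    mean_y = sum(c[i] * means[i] for i in range(n))
    # Variance terms: c_i^2 V(X_i).
    var_y = sum(c[i] ** 2 * cov[i][i] for i in range(n))
    # Cross terms: 2 * c_i * c_j * cov(X_i, X_j) for i < j.
    var_y += 2 * sum(c[i] * c[j] * cov[i][j]
                     for i in range(n) for j in range(i + 1, n))
    return mean_y, var_y


# Illustrative numbers: Y = X_1 - 2 X_2.
c = [1, -2]
means = [3, 1]
cov = [[4, 1],
       [1, 9]]
print(linear_combination_moments(c, means, cov))

# Same variances but uncorrelated: the cross term drops out.
print(linear_combination_moments(c, means, [[4, 0], [0, 9]]))
```

In the first call the negative coefficient on $X_2$ combined with a positive covariance reduces the variance relative to the uncorrelated case.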

Example: Derive the mean and variance of the Binomial distribution.
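One way to carry out this derivation, sketched here with notation that is assumed rather than taken from the slide ($I_i$ for the Bernoulli indicators, $p$ for the success probability), is to write a Binomial$(n, p)$ random variable as a sum of $n$ independent Bernoulli$(p)$ indicators and apply the linear-combination rules with $c_1 = \cdots = c_n = 1$:

\[
X = I_1 + I_2 + \cdots + I_n, \qquad E(I_i) = p, \qquad V(I_i) = E(I_i^2) - E(I_i)^2 = p - p^2 = p(1 - p),
\]
\[
E(X) = \sum_{i=1}^{n} E(I_i) = np, \qquad V(X) = \sum_{i=1}^{n} V(I_i) = np(1 - p).
\]

The mean formula needs no independence; the variance formula uses the fact that the indicators are independent, so all covariance terms vanish.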

Linear Combinations of Normal Random Variables

Linear combinations of Normal random variables remain Normally distributed: if $X_1, \ldots, X_n$ are jointly Normal then the linear combination $Y = c_1 X_1 + \cdots + c_n X_n$ satisfies $Y \sim \mathrm{Normal}\big(E(Y), V(Y)\big)$.

i.i.d. Random Samples

A collection of random variables, $X_1, \ldots, X_n$, is said to be i.i.d., or independent and identically distributed, if they are mutually independent and identically distributed. The ($n$-dimensional) joint probability density is a product of the individual densities.

In the context of statistics, a random sample is often modelled as an i.i.d. vector of random variables $X_1, \ldots, X_n$. An important linear combination associated with a random sample is the sample mean: $\overline{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{1}{n} X_1 + \dfrac{1}{n} X_2 + \cdots + \dfrac{1}{n} X_n$.

If $X_i$ has mean $\mu$ and variance $\sigma^2$ then the sample mean (of an i.i.d. sample) has $E(\overline{X}) = \mu$ and $V(\overline{X}) = \dfrac{\sigma^2}{n}$.
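These two facts can be checked by a small Monte Carlo experiment. The sketch below assumes Uniform(0, 1) observations (mean 1/2, variance 1/12); the distribution, sample size, number of repetitions and seed are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(2017)  # arbitrary seed so the run is repeatable

n = 25                      # sample size
reps = 20_000               # number of simulated samples
mu, sigma2 = 0.5, 1 / 12    # mean and variance of Uniform(0, 1)

# Draw `reps` i.i.d. samples of size n and record each sample mean.
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(reps)]

est_mean = statistics.fmean(sample_means)
est_var = statistics.variance(sample_means)
print("mean of sample means:    ", est_mean, "vs", mu)
print("variance of sample means:", est_var, "vs", sigma2 / n)
```

The simulated values should land close to $\mu$ and $\sigma^2 / n$, with the agreement improving as the number of repetitions grows.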

Unit 5 – Descriptive Statistics

Descriptive statistics deals with summarizing data using numbers, qualitative summaries, tables and graphs. There are many possible data configurations...

Single sample: $x_1, x_2, \ldots, x_n$.

Single sample over time (time series): $x_{t_1}, x_{t_2}, \ldots, x_{t_n}$ with $t_1 < t_2 < \cdots < t_n$.

Two samples: $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.

Generalizations from two samples to $k$ samples (each of potentially different sample size, $n_1, \ldots, n_k$).

Observations in tuples: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

Generalizations from tuples to vector observations (each vector of length $\ell$): $(x_1^1, \ldots, x_1^\ell), \ldots, (x_n^1, \ldots, x_n^\ell)$.

Individual variables may be categorical or numerical. Categorical variables may be ordinal, meaning that they can be sorted (e.g. “a”, “b”, “c”, “d”), or not ordinal (e.g. “cat”, “dog”, “fish”).

A Statistic

A statistic is a quantity computed from a sample (assume here a single sample $x_1, \ldots, x_n$).

The sample mean: $\overline{x} = \dfrac{x_1 + \cdots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$.

The sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \overline{x})^2}{n - 1} = \dfrac{\sum_{i=1}^{n} x_i^2 - n \overline{x}^2}{n - 1}$.
The sample standard deviation: $s = \sqrt{s^2}$.
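Both forms of the sample variance can be computed and compared against Python's `statistics` module, which also divides by $n - 1$. The data values below are made up for illustration, and the function names are not from the lecture.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Definition form: sum of squared deviations divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    """Shortcut form: (sum of x_i^2 - n * x-bar^2) / (n - 1)."""
    n = len(xs)
    xbar = sample_mean(xs)
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

xbar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)
print(xbar, s2, s)
print(sample_variance_shortcut(data), statistics.variance(data))
```

The shortcut form is algebraically identical but can lose precision in floating point when the mean is large relative to the spread, so the definition form is usually the safer default.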

Order Statistics

Order statistics: Sort the sample to obtain the sequence of sorted observations, denoted $x_{(1)}, \ldots, x_{(n)}$, where $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. Some common order statistics:
The minimum: $\min(x_1, \ldots, x_n) = x_{(1)}$.
The maximum: $\max(x_1, \ldots, x_n) = x_{(n)}$.
The median:

$$\text{median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd}, \\ \frac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)} \right) & \text{if } n \text{ is even}. \end{cases}$$

The median is the 50th percentile and the 2nd quartile (see below).

The $q$th quantile ($q \in [0, 1]$), or alternatively the $p = 100q$ percentile (measured in percent instead of as a decimal), is the observation such that $p$ percent of the observations are less than it and $(100 - p)$ percent of the observations are greater than it. The first quartile, denoted $Q_1$, is the 25th percentile. The second quartile ($Q_2$) is the median. The third quartile, denoted $Q_3$, is the 75th percentile. Thus half of the observations lie between $Q_1$ and $Q_3$. In other words, the quartiles break the sample into 4 quarters. The difference $Q_3 - Q_1$ is the interquartile range. The sample range is $x_{(n)} - x_{(1)}$.
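The order statistics above can be computed as follows. The median follows the odd/even definition given earlier; for the quartiles the sketch relies on `statistics.quantiles` with `method="inclusive"`, which is one of several interpolation conventions in use, so other software may report slightly different quartiles for the same data. The sample is made up.

```python
import statistics


def median(xs):
    """Median via order statistics, following the odd/even definition."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]           # x_((n+1)/2), shifted to 0-based
    return (s[n // 2 - 1] + s[n // 2]) / 2   # average of x_(n/2) and x_(n/2+1)


data = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # made-up sample

s = sorted(data)
minimum, maximum = s[0], s[-1]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("min, max:    ", minimum, maximum)
print("median:      ", median(data))
print("Q1, Q2, Q3:  ", q1, q2, q3)
print("IQR:         ", q3 - q1)
print("sample range:", maximum - minimum)
```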

Interlude: The quantile of a probability distribution. Given $\alpha \in [0, 1]$: what is $x$ such that $P(X \le x) = \alpha$, that is, $F(x) = \alpha$? Or, for a continuous distribution with density $f$, $\int_{-\infty}^{x} f(u)\, du = \alpha$. To find the quantile, solve the equation for $x$.
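When $F(x) = \alpha$ has no convenient closed-form solution, it can be solved numerically. The sketch below uses bisection on a generic CDF and checks it against the exponential distribution, where the quantile is known to be $-\ln(1 - \alpha)/\lambda$. The exponential example, the rate value and the search interval are assumptions chosen for illustration.

```python
from math import exp, log


def quantile(cdf, alpha, lo, hi, tol=1e-12):
    """Solve cdf(x) = alpha by bisection.

    Assumes cdf is continuous and non-decreasing on [lo, hi]
    with cdf(lo) <= alpha <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < alpha:
            lo = mid     # the quantile lies to the right of mid
        else:
            hi = mid     # the quantile lies at or to the left of mid
    return (lo + hi) / 2


rate = 2.0  # hypothetical rate of an exponential distribution


def exp_cdf(x):
    """F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1 - exp(-rate * x)


alpha = 0.9
numeric = quantile(exp_cdf, alpha, 0.0, 100.0)
closed_form = -log(1 - alpha) / rate
print(numeric, closed_form)
```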

Visualization

Histogram (with Equal Bin Widths):
(1) Label the bin (class interval) boundaries on a horizontal scale.
(2) Mark and label the vertical scale with frequencies or counts.
(3) Above each bin, draw a rectangle whose height is equal to the frequency (or count).
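The three steps can be sketched as a small routine that computes bin edges and counts, then draws a text histogram. The function name, the data, and the conventions of spanning bins from the minimum to the maximum and placing the maximum in the last bin are assumptions for illustration; plotting libraries make their own choices.

```python
def histogram_counts(xs, num_bins):
    """Return (edges, counts) for equal-width bins spanning [min, max].

    Bins are half-open [a, b) except the last, which also includes
    the maximum. Assumes max(xs) > min(xs).
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in xs:
        k = min(int((x - lo) / width), num_bins - 1)  # clamp the maximum
        counts[k] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up sample
edges, counts = histogram_counts(data, 4)

# One row per bin: the labelled interval, then a bar of height `count`.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * count}  ({count})")
```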

A Kernel Density Estimate (KDE) is a way to construct a Smoothed Histogram. While construction is not as straightforward as steps (1)–(3) above, automated tools can be used.
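For intuition about what such tools do, a common form of the estimate is $\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\big(\frac{x - x_i}{h}\big)$ with a Gaussian kernel $K$ and bandwidth $h$. The slides do not specify a kernel or bandwidth, so the Gaussian kernel, the value $h = 0.5$ and the data below are assumptions for this sketch.

```python
from math import exp, pi, sqrt


def gaussian_kde(xs, bandwidth):
    """Return the function x -> (1 / (n h)) * sum_i K((x - x_i) / h),
    where K is the standard Normal density and h is the bandwidth."""
    n = len(xs)

    def density(x):
        total = sum(exp(-((x - xi) / bandwidth) ** 2 / 2) for xi in xs)
        return total / (n * bandwidth * sqrt(2 * pi))

    return density


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]   # made-up sample
f_hat = gaussian_kde(data, bandwidth=0.5)

# Evaluate the smoothed curve on a coarse grid and draw it as text.
for k in range(0, 21):
    x = k * 0.5
    print(f"{x:4.1f}  {'*' * round(100 * f_hat(x))}")
```

A smaller bandwidth produces a bumpier curve that follows individual observations, while a larger one smooths them together.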

Neither the histogram nor the KDE is unique in the way it summarizes data. With these methods, different settings (e.g. the number of bins in a histogram or the bandwidth in a KDE) may yield different representations of the same data set. Nevertheless, they are both very common, sensible and useful visualisations of data.
