

SLIDE 1

Probabilities and Statistics

An introduction to concepts and terminology Christoph Rosemann

DESY

  • 18. March 2013

CR (DESY) · Probabilities and Statistics · 18. March 2013 · 1 / 62

SLIDE 2

Outline

Probability Distributions and their characterization Examples for specific distributions Central limit theorem From univariate to multivariate Parameter transformation Error propagation

SLIDE 3

Literature

Short summaries: the PDG articles on probability and statistics (by Glen Cowan)

Introductory books:

  • R.J. Barlow, Statistics
  • Glen Cowan, Statistical Data Analysis

More advanced:

  • V. Blobel/E. Lohrmann, Statistische und numerische Methoden der Datenanalyse (in German), now also online: http://desy.de/~blobel/ebuch.html
  • Wes Metzger (online book): http://www.hef.ru.nl/~wes/stat_course/statistics.html

SLIDE 4

Probability

Fundamental motivation

In physics it is impossible to determine (or predict) the outcome of an experiment perfectly, regardless of whether the physics is classical (deterministic) or quantum. The outcome of an experiment is governed, more or less, by chance!

Inversion of problem

Determine some physically meaningful quantity including its uncertainty from an experiment with limited accuracy.

Goal of the(se) lecture(s):

to teach the mathematical methods to analyze data including their uncertainties.

SLIDE 5

An example from quantum mechanics

Take a radioactive substance with decay constant λ. Quantum mechanics yields the decay probability dp of a nucleus within a time interval dτ:

dp = λ dτ

For a given number n of these nuclei, the mean number of decays in an interval dτ is

−dn(τ) = n(τ) − n(τ + dτ) = n(τ) λ dτ  ⟹  dn(τ)/n(τ) = −λ dτ

Integration from τ = 0 to a time t yields the well-known decay law:

n(t) = n₀ e^(−λt)
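The relation above can be checked numerically: a minimal sketch (λ, n₀ and the step size are arbitrary example values, not from the slide) that integrates dn = −λ n dτ in small Euler steps and compares the result with the closed-form decay law.

```python
import math

# Integrate dn = -lambda * n * dtau in small Euler steps and compare
# with the closed-form decay law n(t) = n0 * exp(-lambda * t).
lam = 0.5          # decay constant (arbitrary units)
n0 = 1.0e6         # initial number of nuclei
t, dt = 2.0, 1e-5  # integrate up to time t in steps of dt

n = n0
for _ in range(int(t / dt)):
    n -= lam * n * dt  # -dn = n * lambda * dtau

analytic = n0 * math.exp(-lam * t)
rel_err = abs(n - analytic) / analytic  # should be tiny for small dt
```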

SLIDE 6

Remarks on the decay example

λ is a probability (per unit time). The decay law describes only the average number of decays; the actual number is not constant, there are statistical variations. How do I determine the decay constant λ (or the lifetime τ = 1/λ) from this fluctuating outcome?

The fundamental problem

How do I determine the underlying constant from statistically varying measurements?

SLIDE 7

Statistical and systematic errors

Mathematical theory exists for statistical uncertainties

Single measurements may have large uncertainties – determine their size
Averaging repeated measurements reduces the uncertainty – but that is not always possible
How long/often one has to measure to reach a given accuracy

Different story: systematic errors

E.g. limited accuracy of detector components, limited knowledge
No mathematical theory! They usually have to be determined by the experimentalist
Once determined, they can usually be incorporated into the result like statistical errors ⟹ special lecture

SLIDE 8

Definitions of probability

Classical (Laplace) – Principle of symmetry

p = (# positive cases) / (# all possible cases)

Frequentistic

The relative frequency determines the probability. In the limit (von Mises) this is the extension of the classical interpretation:

p = lim_{n→∞} k/n

where k counts the number of favorable outcomes in n tries.

SLIDE 9

Further definitions of probability

Subjective (or Bayesian)

Probability is a measure of the degree of belief. Usually this is prior to doing an experiment.

Axiomatic (Kolmogorov)

Let S be a sample space of events and A, B sub samples.

1. p(A) ∈ [0, 1]
2. p(A ∪ B) = p(A) + p(B), if A ∩ B = ∅
3. p(S) = 1

SLIDE 10

Frequentist vs. Subjectivist

Often calculations and results are the same Frequentist approach sometimes impossible or useless

◮ e.g. the underlying principle of betting
◮ e.g. searches for new particles
◮ harsh requirement: the experiment should be repeatable arbitrarily often under the same circumstances

Still, insurance companies are very good at predicting the future from past experience. The topic is far from trivial. Usually we will take a frequentist's view.

SLIDE 11

Conditional probabilities

The conditional probability p(A|B)

is the probability that an event A occurs under the assumption that event B also occurs; it is defined by

p(A|B) = p(A ∩ B) / p(B).

With p(A ∩ B) = p(A|B) p(B) = p(B|A) p(A) follows

Bayes’ Theorem

p(A|B) = p(B|A) p(A) / p(B)

(absolutely NO relation to the Bayesian view of probabilities)
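A short numeric sketch of Bayes' theorem (the scenario and all probabilities below are made-up example values, not from the slide): A = "condition present", B = "test positive".

```python
# Hypothetical numbers: prior, true-positive rate, false-positive rate.
p_A = 0.01             # assumed p(A)
p_B_given_A = 0.99     # assumed p(B|A)
p_B_given_notA = 0.05  # assumed p(B|not A)

# Total probability: p(B) = p(B|A)p(A) + p(B|not A)p(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: p(A|B) = p(B|A) p(A) / p(B)
p_A_given_B = p_B_given_A * p_A / p_B
```

Despite the accurate test, the posterior p(A|B) is only about 17% here, because the prior p(A) is small.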

SLIDE 12

Statistical independence

A and B are statistically independent

if the occurrence of B is independent of the occurrence of A:

p(B|A) = p(B) and p(A|B) = p(A)

In this case the probabilities can be multiplied to yield the joint probability:

p(A ∩ B) = p(A) p(B)

SLIDE 13

Example for statistical independence

Take the following events, picked at random from a long time period:
A: it's Saturday, p(A) = 1/7
B: it's a day with snowfall (in Hamburg), p(B) ≈ 1/40
For a random day these are independent, thus the probability to pick a Saturday on which it snowed is p(A ∩ B) = p(A|B) p(B) = p(A) p(B) ≈ 1/280.
Now change A to: it's January, p(A) ≈ 1/12. These events are not independent – the probability of snowfall depends heavily on the season!

SLIDE 14

Section 2

Description of probability: distributions and their characteristics

SLIDE 15

Description of probability

Most relevant: physics experiments whose outcome varies statistically. View the outcome of an experiment as a random variable.

Two major categories

discrete random variables: enumerable values k ∈ ℕ, k ∈ [a, b]
continuous random variables: continuous values x ∈ ℝ

SLIDE 16

Discrete random variables

Probability

The probability to obtain a certain value r is P(r). The axioms demand

Σ_{r=a}^{b} P(r) = 1

Example: counting decays in a time interval

SLIDE 17

Continuous random variables

Probability is only defined for an interval

The probability for a ≤ x < b is defined by the integral

P(a ≤ x < b) = ∫_a^b f(x) dx

f(x) is the probability density function, or pdf. The axioms demand

f(x) ≥ 0 ∀x in the definition range,  ∫_{−∞}^{∞} f(x) dx = 1

The cumulative distribution F(x) is used to describe the probability to obtain a value x or smaller:

F(x) = ∫_{−∞}^{x} f(x′) dx′.

SLIDE 18

Different definitions of probabilities

Discrete random variables

can be assigned probabilities directly; e.g. a single value within the boundaries has a certain probability.

Continuous random variables

have to be integrated over an interval; it doesn't make sense to determine the probability for a single value! Fundamental is the probability density function (pdf) f(x). The axioms of probability demand that f(x) ≥ 0 and that the sum/integral over all possible values is 1.

SLIDE 19

Characterizations of probability distributions I

The expectation value

with respect to a function h(x) is defined as

E[h(x)] = ∫_{−∞}^{∞} h(x) · f(x) dx.

If h(x) is of the form h(x) = xⁿ (with n ∈ ℕ), this expectation value is called the nth moment of the distribution:

E[xⁿ] = ∫_{−∞}^{∞} xⁿ · f(x) dx.

The most important is the first moment of the pdf, its mean value:

x̄ = E[x] = ∫_{−∞}^{∞} x · f(x) dx = µ.

SLIDE 20

Characterizations of probability distributions II

The variance

The width of a distribution can be described by moments with respect to its mean value, i.e. functions h(x) = (x − x̄)ⁿ, called central moments. The most important is the second central moment, the variance:

σ² = E[(x − x̄)²] = ∫_{−∞}^{∞} (x − x̄)² f(x) dx.

Higher central moments are much less important, but can sometimes help. Examples are the skewness (∼ x³) and the kurtosis (∼ x⁴).

SLIDE 21

Characterizations of probability distributions III

Other characterizations are also widely in use:

Describing the central value

The median x_m is the value where the probabilities to obtain a larger or a smaller value are equal: F(x_m) = 0.5.
The most probable value (mode) is the absolute maximum of the pdf.

Describing the width

The FWHM (full width at half maximum) is the positive difference of the two x-values where the pdf has dropped to half of its maximum value.

Describing the width w.r.t. the mean

The RMS (root mean square) is the square root of the second moment:

x_rms = √(E[x²]) = √(σ² + µ²).

Note: ROOT labels the standard deviation σ as "RMS".
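These sample characteristics, and the relation x_rms² = σ² + µ², can be illustrated on a toy data set (the numbers are arbitrary):

```python
import math
import statistics

data = [1.0, 2.0, 3.0, 4.0, 5.0]

mean = statistics.fmean(data)                         # first moment (mu)
second_moment = sum(x * x for x in data) / len(data)  # E[x^2]
var = second_moment - mean ** 2                       # sigma^2 (population)
rms = math.sqrt(second_moment)                        # x_rms = sqrt(E[x^2])
median = statistics.median(data)                      # F(median) = 0.5
```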

SLIDE 22

Section 3

Examples of important probability distributions

SLIDE 23

Examples for some probability distributions

Most relevant in physics:

Discrete probability distributions

Bernoulli, Binomial, Poisson

Continuous probability distributions

Uniform, Gauss, χ²

Notational conventions: f(x; y, z) and F(x)
• lower-case letters are usually used for pdfs
• capital letters are usually used for probabilities (and discrete distributions)
• the first argument of the distribution is the random variable
• the arguments after the semicolon are the parameters

SLIDE 24

Bernoulli

Description

An experiment with two possible outcomes; let p be the probability of success (k = 1) and q = 1 − p the probability of failure (k = 0). The distribution can be written as

P(k; p) = pᵏ q^(1−k) = pᵏ (1 − p)^(1−k).

Mean value: E[k] = Σ_{k=0}^{1} k · P(k; p) = p
Variance: E[(k − ⟨k⟩)²] = Σ_{k=0}^{1} (k − ⟨k⟩)² · P(k; p) = p(1 − p)

Examples: a single coin toss (heads/tails), a trigger fires (or not), a channel works (or not)

SLIDE 25

Binomial

Description

If a Bernoulli experiment is repeated n times (under constant conditions!), the probability to have success k times is given by the binomial distribution

P(k; p, n) = (n choose k) pᵏ (1 − p)^(n−k)

Mean value: E[k] = Σ_{k=0}^{n} k · P(k; p, n) = np
Variance: E[(k − k̄)²] = Σ_{k=0}^{n} (k − k̄)² · P(k; p, n) = np(1 − p)

Examples: Galton board, efficiencies of multi-layer detectors
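The mean np and variance np(1 − p) can be verified directly from the pmf; the sketch below uses n = 7, p = 1/2 as in the Galton-board example on the next slide.

```python
import math

n, p = 7, 0.5  # Galton-board example: 7 rows of pins, fair deflection

# Binomial pmf P(k; p, n) for k = 0..n
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

total = sum(pmf)                                          # must be 1
mean = sum(k * P for k, P in enumerate(pmf))              # expect n*p = 3.5
var = sum((k - mean) ** 2 * P for k, P in enumerate(pmf)) # expect n*p*(1-p) = 1.75
```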

SLIDE 26

Example to binomial distribution

Bernoulli experiment:

the ball starts at the top and falls either left or right with equal probability at each pin
"falls right" counts as a success
the random variable r counts the number of successes

The result is a binomial distribution: P(r; 1/2, 7)

SLIDE 27

Poisson

Description

For constant mean value µ = np, small probability p and large number n, the binomial distribution becomes the Poisson distribution with only one parameter:

P(k; µ) = (µᵏ / k!) e^(−µ).

The single parameter describes both the mean and the variance!
Mean value: E[k] = Σ_{k=0}^{∞} k · P(k; µ) = µ
Variance: E[(k − k̄)²] = Σ_{k=0}^{∞} (k − k̄)² · P(k; µ) = µ
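The Poisson limit can be checked numerically: for large n and small p with fixed µ = np, the binomial and Poisson probabilities nearly coincide (the values n = 1000, p = 0.003 are arbitrary example choices).

```python
import math

n, p = 1000, 0.003  # large n, small p
mu = n * p          # fixed mean: 3.0

def binom(k):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson(k):
    return mu ** k / math.factorial(k) * math.exp(-mu)

# Largest pointwise difference over the relevant k range
max_diff = max(abs(binom(k) - poisson(k)) for k in range(15))
norm = sum(poisson(k) for k in range(50))  # Poisson pmf sums to 1
```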

SLIDE 28

Example to Poisson distribution

The distribution becomes more and more symmetric with increasing µ; in the limit it approaches the continuous Gaussian/normal distribution.

SLIDE 29

Uniform Distribution

Description

A value distributed with constant amplitude within a certain interval; the simplest continuous distribution:

f(x) = 1/(b − a) for a ≤ x ≤ b,  0 otherwise

The mean value is µ = (a + b)/2
The variance is σ² = (b − a)²/12

A very important distribution as a tool to
• generate arbitrarily distributed random numbers
• perform statistical tests (e.g. in hypothesis tests)

In C/C++ the (pseudo-)random number generator function drand48() exists that creates floating point numbers in [0, 1).

SLIDE 30

Gaussian (Normal) Distribution

By far the most important distribution; its parameters are directly the mean µ and the variance σ²:

f(x; µ, σ²) = 1/√(2πσ²) · e^(−(x−µ)²/(2σ²))

With µ = 0 and σ² = 1 one obtains the standard normal distribution:

f(x; 0, 1) = 1/√(2π) · e^(−x²/2)

The great importance follows from the central limit theorem. Both the binomial and the Poisson distribution can be approximated by a Gaussian (in the central region!); in the limit n → ∞ (binomial) or µ → ∞ (Poisson) they become identical.

SLIDE 31

Example: approximation with Gaussian distribution

Consider 100 tosses of a fair coin. What is the probability to obtain exactly 55 times heads?
Exact: binomial distribution with p = 1/2, n = 100: P(r = 55; 1/2, 100) = 4.85%
Approximation with Gauss:

P(x) = ∫_{∆x} f(x′) dx′ ≈ f(x) ∆x = 1/√(2πσ²) · e^(−(x−µ)²/(2σ²)) · ∆x

with µ = np = 50, σ² = np(1 − p) = 25 and x = 55, ∆x = 1: P(55) = 4.84%
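The two numbers on this slide can be reproduced in a few lines:

```python
import math

n, p, r = 100, 0.5, 55

# Exact binomial probability P(r = 55; 1/2, 100)
exact = math.comb(n, r) * p ** r * (1 - p) ** (n - r)

# Gaussian density times bin width (delta x = 1)
mu = n * p                # 50
sigma2 = n * p * (1 - p)  # 25
approx = math.exp(-(r - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```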

SLIDE 32

Example: approximation with Gaussian distribution

Good agreement only in the central region. For values r far away from the mean it is not a good approximation! Consider r = 90:
Binomial: P(r = 90) ≈ 10⁻¹⁷
Gauss: P(90) ≈ 10⁻¹⁵

SLIDE 33

Integrated Gaussian Distribution

The cumulative distribution of the Gaussian has its own symbol:

Φ(x) = 1/√(2πσ²) ∫_{−∞}^{x} e^(−(t−µ)²/(2σ²)) dt.

Math packages for programming usually provide only the error function erf(x) (and erfc(x) = 1 − erf(x)):

erf(x) = (2/√π) ∫_0^x e^(−t²) dt

so that

Φ(x) = ½ [1 + erf((x − µ)/√(2σ²))].

SLIDE 34

Properties of the Gaussian distribution

Symmetric around the mean value – unlike binomial or Poisson. Probabilities are given for symmetric intervals around µ:

interval          within     outside
|x − µ| ≤ 1σ      68.27 %    31.73 %
|x − µ| ≤ 2σ      95.45 %     4.55 %
|x − µ| ≤ 3σ      99.73 %     0.27 %
|x − µ| ≤ 4σ      99.99 %     6.3·10⁻⁵
|x − µ| ≤ 5σ      ∼ 1         5.7·10⁻⁷

SLIDE 35

The Gaussian distribution and its standard deviation

Take an (approximately) Gaussian distributed variable:

Example:

A radioactive decay with a known, large mean number µ of decays: with 68% probability we observe a value x within the interval µ − σ ≤ x ≤ µ + σ.

The (statistical) uncertainty (in physics!)

e.g. the uncertainty in a measurement result such as mW = 80.836 ± 0.015 GeV is meant as the standard deviation σ – as if it were a Gaussian-distributed random variable.

SLIDE 36

Errors/Uncertainties and the standard deviation

Important special case: Gaussian as a limit of binomial/Poisson

e.g. a radioactive decay: the expected standard deviation is the square root of the mean expected number of events

σ² = µ  ⟹  σ = √µ

For other distributions

the uncertainty is taken from the respective variance σ²  ⟹  see also: parameter estimation

SLIDE 37

χ2 Distribution

One of the most important distributions (at least in physics) for parameter estimation. For independent, Gaussian-distributed continuous random variables x₁, ..., xₙ, the sum of normalized squares

u = χ² = Σ_{i=1}^{n} (xᵢ − µᵢ)²/σᵢ²

follows the χ² distribution with n degrees of freedom, defined by

fₙ(u) = 1/(2 Γ(n/2)) · (u/2)^(n/2 − 1) · e^(−u/2).

The mean value is the number of degrees of freedom: µ = n
The variance is twice the number of degrees of freedom: σ² = 2n

SLIDE 38

Examples for χ2 distributions

Shown are the first five (n = 1, ..., 5) and five further (n = 10, 20, 30, 40, 50) χ² distributions (top), and the probabilities to find a value > u, given by

1 − Fₙ(u) = ∫_u^∞ fₙ(x) dx

SLIDE 39

Central Limit Theorem

Let x₁, x₂, ..., xₙ be independent, continuous random variables that follow arbitrary density functions with finite expectation values µ₁, µ₂, ..., µₙ and finite variances σ₁², σ₂², ..., σₙ². In the limit n → ∞ the probability density function of the sum

w = Σ_{i=1}^{n} xᵢ

becomes Gaussian, with mean value

w̄ = µ_w = Σ_{i=1}^{n} µᵢ

and variance

⟨(w − w̄)²⟩ = σ_w² = Σ_{i=1}^{n} σᵢ².
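A classic sketch of the theorem: sums of 12 uniform variables (each with mean 1/2 and variance 1/12) have mean 6 and variance 1, so subtracting 6 gives an approximately standard-normal variable (12 terms and the sample size are arbitrary choices).

```python
import random

random.seed(1)  # fixed seed for reproducibility
N = 20_000

# w = sum of 12 uniforms, shifted to zero mean; variance = 12 * (1/12) = 1
sums = [sum(random.random() for _ in range(12)) - 6.0 for _ in range(N)]

mean_w = sum(sums) / N
var_w = sum((w - mean_w) ** 2 for w in sums) / N
```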

SLIDE 40

Section 4

Multidimensional Probability Density Functions

Up to now only univariate distributions: a single random variable
Now multivariate: start with 2D, then generalize to n dimensions
Focus on continuous random variables (no discrete)

SLIDE 41

Two dimensions

Two random variables x, y are mapped to a single two-dimensional pdf f(x, y). Probability is defined as

P(a ≤ x ≤ b, c ≤ y ≤ d) = ∫_a^b ∫_c^d f(x, y) dx dy,

which is normalized for a = c = −∞ and b = d = ∞, so that P = 1. The two variables are called independent when the pdf can be written as a product:

f(x, y) = g(x) · h(y)

SLIDE 42

Mean values and variances in two dimensions

The mean values are defined as

E[x] = ∫∫ x · f(x, y) dx dy,  E[y] = ∫∫ y · f(x, y) dx dy

and the variances as

σₓ² = E[(x − x̄)²] = ∫∫ (x − x̄)² · f(x, y) dx dy
σᵧ² = E[(y − ȳ)²] = ∫∫ (y − ȳ)² · f(x, y) dx dy

SLIDE 43

Correlation and Covariance

A new concept in more than one dimension! The covariance is defined as

σₓᵧ = cov(x, y) = E[(x − x̄)(y − ȳ)] = E[xy] − x̄ȳ

The correlation coefficient is defined as

ρₓᵧ = cov(x, y)/(σₓσᵧ),  ρₓᵧ ∈ [−1, +1]

For independent variables the correlation coefficient is always zero (beware when turning this around: ρ = 0 does not imply independence).
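Both definitions can be evaluated on a small paired sample (the data values below are arbitrary, chosen roughly linear in x so that ρ comes out close to +1):

```python
import math

x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 3.1, 4.4, 6.0]  # roughly linear in x

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# cov(x, y) = E[(x - x_mean)(y - y_mean)]
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)

rho = cov / (sx * sy)  # correlation coefficient in [-1, +1]
```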

SLIDE 44

Meaning of covariance

Covariance is the expectation value of (x − x̄)(y − ȳ) – the product of the deviations from the mean values!
cov(x, y) > 0: for any value x > µₓ it is more probable to find a value y > µᵧ (and vice versa)
cov(x, y) < 0: for any value x > µₓ it is less probable to find a value y > µᵧ (and vice versa)

SLIDE 45

Example: 2D Gauss

The most important example of a 2D distribution is the two-dimensional Gaussian. Develop the general case in two steps; begin with two independent Gaussian distributions with mean values x̄ = ξ, ȳ = η and variances σₓ², σᵧ²:

f(x, y) = g(x) h(y)

g(x) = 1/√(2πσₓ²) · e^(−(x−ξ)²/(2σₓ²)),  h(y) = 1/√(2πσᵧ²) · e^(−(y−η)²/(2σᵧ²))

⟹ f(x, y) = 1/(2π √(σₓ²σᵧ²)) · e^(−½ [(x−ξ)²/σₓ² + (y−η)²/σᵧ²])
SLIDE 46

Example: The full 2D Gauss

The general 2D Gaussian has five parameters:
• mean values x̄ = ξ, ȳ = η
• variances σₓ², σᵧ²
• the correlation coefficient ρ
It describes an ellipse with principal axes along x, y when ρ = 0.

f(x, y) = 1/(2πσₓσᵧ√(1−ρ²)) · e^(−1/(2(1−ρ²)) · [((x−ξ)/σₓ)² − 2ρ((x−ξ)/σₓ)((y−η)/σᵧ) + ((y−η)/σᵧ)²])

SLIDE 47

n dimensions I

Define n random variables x1, ..., xn as a (column) vector

  • x =

    x1 . . xn     the pdf is then a (normalized!) vector function f ( x) = f (x1, ..., xn). The mean values xi = E[xi] =

  • ...
  • xi · f (

x)d x can also be written as vector

  • µ =

    x1 . . xn    

SLIDE 48

n dimensions II

The covariance becomes a symmetric matrix (as does the correlation matrix):

V = V(x) = E[(x − µ)(x − µ)ᵀ]

The ith diagonal element is the variance of the ith component of x; the off-diagonal elements are the covariances of xᵢ, xⱼ:

Vᵢᵢ = E[(xᵢ − x̄ᵢ)²] = σᵢ²
Vᵢⱼ = E[(xᵢ − x̄ᵢ)(xⱼ − x̄ⱼ)] = cov(xᵢ, xⱼ) = σᵢⱼ,  i ≠ j

(Special importance of the covariance matrix: remember linear algebra and principal axis transformations.)

SLIDE 49

Example: n-dimensional Gauss

The most important example is (again) the Gaussian distribution. With random variables x, mean values µ and covariance matrix V:

f(x) = 1/((2π)^(n/2) √|V|) · e^(−½ (x − µ)ᵀ V⁻¹ (x − µ))

(|V| is the determinant and V⁻¹ the inverse of the covariance matrix V).

SLIDE 50

Section 5

Parameter transformation and error propagation

Two main use cases: Very often it makes sense to change the variable, because another parametrization is much more straightforward or has beneficial properties. The inverted case is even more common: one (or several) quantities are observed or measured, but another one is of interest.

SLIDE 51

Linear transformations in one dimension

Linear: y = ax + b

The transformation of mean and variance is exactly calculable:

ȳ = E[y] = E[ax + b] = a x̄ + b
σᵧ² = E[(y − ȳ)²] = a² σₓ²

A straightforward scaling and shifting.

SLIDE 52

Non-linear one dimensional transformations: y(x)

Perform a Taylor expansion of y(x) around the mean value x̄:

y(x) = y(x̄) + (x − x̄) ∂y/∂x |_{x̄} + ½ (x − x̄)² ∂²y/∂x² |_{x̄} + ...

The mean value is (usually the term of O(σₓ²) is omitted)

E[y] = ȳ ≈ y(x̄) + ½ σₓ² ∂²y/∂x² |_{x̄}

For the variance only the linear term is kept:

σᵧ² = E[(y − ȳ)²] ≈ σₓ² (∂y/∂x)²

which is the well-known error propagation for one variable.
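A Monte Carlo cross-check of the one-variable formula for y = x² (the function, µ and σ are arbitrary example choices): the propagated uncertainty |∂y/∂x| σₓ = 2µσₓ should match the spread of y observed in simulated data.

```python
import math
import random

random.seed(7)  # fixed seed for reproducibility
mu, sigma = 10.0, 0.1
N = 50_000

# Simulate x ~ Gauss(mu, sigma) and transform: y = x^2
ys = [random.gauss(mu, sigma) ** 2 for _ in range(N)]
mean_y = sum(ys) / N
std_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / N)

# Linear error propagation: dy/dx = 2x, evaluated at the mean
propagated = abs(2 * mu) * sigma  # = 2.0
```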

SLIDE 53

Linear transformations in n dimensions I

y = B x: exact solution

with x = (x₁, ..., xₙ)ᵀ and y = (y₁, ..., yₘ)ᵀ (n = m is not required). Again the transformation of mean and covariance can be calculated analytically:

µᵧ = E[y] = E[Bx] = B E[x] = B µₓ

V′(y) = E[(y − ȳ)(y − ȳ)ᵀ] = E[B(x − x̄)(x − x̄)ᵀBᵀ] = B E[(x − x̄)(x − x̄)ᵀ] Bᵀ = B · V · Bᵀ

Since the transformation is linear, the matrix elements are

Bᵢⱼ = ∂yᵢ/∂xⱼ = ∂/∂xⱼ Σ_k Bᵢₖ xₖ

SLIDE 54

Linear transformations in n dimensions II

If n = m, the transformation matrix B is square and probabilities can be expressed directly:

∫ ... ∫ f(x) dⁿx = ∫ ... ∫ g(y(x)) |det J| dⁿy

where the functional (Jacobian) determinant det J enters, with Jᵢⱼ = ∂xᵢ/∂yⱼ. In the case of a linear transformation J = B⁻¹, so that:

g(y) = f(x(y)) · 1/|det B|

SLIDE 55

Non-linear transformations: yi = yi(x1, ..., xn) in n dimensions – special case

In the special case n = m the pdf is again directly calculable, as above:

g(y) = f(x) · 1/|det B|

with the transformation matrix

B = ( ∂yᵢ/∂xⱼ ) =
⎛ ∂y₁/∂x₁  ∂y₁/∂x₂  ...  ∂y₁/∂xₙ ⎞
⎜ ∂y₂/∂x₁  ∂y₂/∂x₂  ...  ∂y₂/∂xₙ ⎟
⎜   ...       ...    ...    ...   ⎟
⎝ ∂yₙ/∂x₁  ∂yₙ/∂x₂  ...  ∂yₙ/∂xₙ ⎠

SLIDE 56

Non-linear transformations: yi = yi(x1, ..., xn) in n dimensions – general case I

For the mean value of y expand y(x) in a Taylor series around the mean values:

yᵢ(x) = yᵢ(µₓ) + Σ_k (xₖ − µₓ,ₖ) ∂yᵢ/∂xₖ |_{x=µₓ} + ...

As approximation: yᵢ(x̄) ≈ yᵢ(µₓ)  ⟹  µᵧ = y(µₓ)

SLIDE 57

Non-linear transformations: yi = yi(x1, ..., xn) in n dimensions – general case II

For the covariance only the linear approximation is taken into account:

yᵢ(x) − µᵧ,ᵢ ≈ Σ_k (xₖ − µₓ,ₖ) ∂yᵢ/∂xₖ

which results in

V′ᵢₗ = E[(yᵢ − µᵧ,ᵢ)(yₗ − µᵧ,ₗ)]
     ≈ E[ Σ_k (xₖ − µₓ,ₖ) ∂yᵢ/∂xₖ · Σ_j (xⱼ − µₓ,ⱼ) ∂yₗ/∂xⱼ ]
     = Σ_k Σ_j (∂yᵢ/∂xₖ) E[(xₖ − µₓ,ₖ)(xⱼ − µₓ,ⱼ)] (∂yₗ/∂xⱼ)
     = Σ_k Σ_j Bᵢₖ Vₖⱼ Bₗⱼ

SLIDE 58

Non-linear transformations: yi = yi(x1, ..., xn) in n dimensions – general case III

The general formula of error propagation

V′(y) = B · V(x) · Bᵀ

For the recurring case of m = 1 (and uncorrelated xᵢ) this leads to

σᵧ² = Σ_{i=1}^{n} (∂y/∂xᵢ)² σᵢ²

SLIDE 59

Example of non-linear transformation with m = n

Cartesian coordinates x₁, x₂ are measured with uncertainties σ₁, σ₂. What are the values and uncertainties in polar coordinates?

x₁ = r cos φ,  x₂ = r sin φ  ⟹  y₁ = r = √(x₁² + x₂²),  y₂ = φ = arctan(x₂/x₁)

B = ⎛ ∂r/∂x₁  ∂r/∂x₂ ⎞ = ⎛  x₁/r    x₂/r  ⎞
    ⎝ ∂φ/∂x₁  ∂φ/∂x₂ ⎠   ⎝ −x₂/r²   x₁/r² ⎠

V′(r, φ) = B V(x₁, x₂) Bᵀ
         = ⎛  x₁/r    x₂/r  ⎞ ⎛ σ₁²  0  ⎞ ⎛ x₁/r  −x₂/r² ⎞
           ⎝ −x₂/r²   x₁/r² ⎠ ⎝  0   σ₂² ⎠ ⎝ x₂/r   x₁/r² ⎠
         = ⎛ (x₁²σ₁² + x₂²σ₂²)/r²     x₁x₂(σ₂² − σ₁²)/r³   ⎞
           ⎝ x₁x₂(σ₂² − σ₁²)/r³       (x₂²σ₁² + x₁²σ₂²)/r⁴ ⎠
         = ⎛ σ_r²        cov(r, φ) ⎞
           ⎝ cov(r, φ)   σ_φ²      ⎠
SLIDE 60

Example of non-linear transformation with m ≠ n

Consider the measurement of the earth's acceleration with a pendulum:

g = (2π/T)² l,  x₁ = T, x₂ = l  ⟹  y₁ = g = (2π/T)² l

Matrix B:

B = ( ∂y₁/∂x₁  ∂y₁/∂x₂ ) = ( −2g/T   g/l )

Covariance matrix:

V′(y₁) = B V(x₁, x₂) Bᵀ = ( ∂y₁/∂x₁  ∂y₁/∂x₂ ) ⎛ σ₁²  0  ⎞ ⎛ ∂y₁/∂x₁ ⎞
                                                ⎝  0   σ₂² ⎠ ⎝ ∂y₁/∂x₂ ⎠
       = (∂y₁/∂x₁)² σ₁² + (∂y₁/∂x₂)² σ₂² = (2g/T)² σ₁² + (g/l)² σ₂²

SLIDE 61

Final (!) example of non-linear transformation with m = 1

Consider many (n) independent measurements x₁, x₂, ..., xₙ of a single parameter y. Use the general formula:

σᵧ² = V′(y) = ( ∂y/∂x₁ ... ∂y/∂xₙ ) diag(σ₁², ..., σₙ²) ( ∂y/∂x₁ ... ∂y/∂xₙ )ᵀ = Σ_{i=1}^{n} (∂y/∂xᵢ)² σᵢ²

Frequent task: many independent measurements of a single parameter. The contributions to the total uncertainty
• are added in quadrature
• are weighted with the (partial) derivatives

SLIDE 62

Summary

Relax! In a nutshell, we have:
• presented the basic tools to perform operations on distributions
• presented the most frequently used distributions
• presented one of the most important concepts: error propagation/parameter transformation
We'll continue tomorrow at a much slower pace.
