o Constrained fits with non-normal distributions R. Frhwirth 1 , 2 - - PowerPoint PPT Presentation

o constrained fits with non normal distributions
SMART_READER_LITE
LIVE PREVIEW

o Constrained fits with non-normal distributions R. Frhwirth 1 , 2 - - PowerPoint PPT Presentation

Introduction A Simple Example Other Applications Summary and Outlook o Constrained fits with non-normal distributions R. Frhwirth 1 , 2 and O. Cencic 3 1 Institute of High Energy Physics Austrian Academy of Sciences 2 Institute of Statistics


slide-1
SLIDE 1

Introduction A Simple Example Other Applications Summary and Outlook

  • Constrained fits with non-normal distributions
  • R. Frühwirth1,2 and O. Cencic3

1Institute of High Energy Physics

Austrian Academy of Sciences

2Institute of Statistics and Mathematical Methods in Economics

Vienna University of Technology

3Institute of Water Quality, Resources and Waste Management

Vienna University of Technology

Connecting the Dots 2015 Berkeley, 9–11 February 2015

  • R. Frühwirth

1 CTD 2015

slide-2
SLIDE 2

Introduction A Simple Example Other Applications Summary and Outlook

Outline

1

Introduction

2

A Simple Example

3

Other Applications

4

Summary and Outlook

  • R. Frühwirth

2 CTD 2015

slide-3
SLIDE 3

Introduction A Simple Example Other Applications Summary and Outlook

Outline

1

Introduction

2

A Simple Example

3

Other Applications

4

Summary and Outlook

  • R. Frühwirth

3 CTD 2015

slide-4
SLIDE 4

Introduction A Simple Example Other Applications Summary and Outlook

Motivation

Material flow

Motivation for this work was data reconciliation in material flow Trace valuable materials in cycle of production, consumption, waste deposit, recycling,. . . Data are often just expert assessments and thus not normally distributed, but uniform, triangular, trapezoidal, etc. Improve quality by imposing constraints: conservation of mass

Example flow chart

  • R. Frühwirth

4 CTD 2015

slide-5
SLIDE 5

Introduction A Simple Example Other Applications Summary and Outlook

Motivation

Kinematic fit

Very similar problem Track parameters are not always normally distributed, e.g. for electrons fitted with the Gaussian-sum filter Constraints are given by conservation of momentum and energy

Example process

Scattering Process Particle 1 Particle 2 Particle 3 Particle 4

  • R. Frühwirth

5 CTD 2015

slide-6
SLIDE 6

Introduction A Simple Example Other Applications Summary and Outlook

Motivation

Vertex fit

Again, track parameters are not always normally distributed Constraints are given by common vertex Constraints may contain unobserved variables

Combination of experiments

Measurements are often not normally distributed, in particular if the measured parameter is non-negative Constraints are given by the fact that the same quantity is measured Feasible only if experiments publish full marginal likelihood (posterior density)

  • R. Frühwirth

6 CTD 2015

slide-7
SLIDE 7

Introduction A Simple Example Other Applications Summary and Outlook

Motivation

Vertex fit

Again, track parameters are not always normally distributed Constraints are given by common vertex Constraints may contain unobserved variables

Combination of experiments

Measurements are often not normally distributed, in particular if the measured parameter is non-negative Constraints are given by the fact that the same quantity is measured Feasible only if experiments publish full marginal likelihood (posterior density)

  • R. Frühwirth

7 CTD 2015

slide-8
SLIDE 8

Introduction A Simple Example Other Applications Summary and Outlook

Principle and details

Principle idea

Restrict joint prior density of all observed and unobserved variables to the constraint manifold Unobserved variables are included by an uninformative or weakly informative prior Renormalize restricted posterior density to 1

Details

A detailed description of the algorithm can be found in:

  • O. Cencic and R. Frühwirth, A general framework for data

reconciliation—Part I: Linear constraints Computers and Chemical Engineering, in press Available at: http://tinyurl.com/CencicFruhwirth

  • R. Frühwirth

8 CTD 2015

slide-9
SLIDE 9

Introduction A Simple Example Other Applications Summary and Outlook

Principle and details

Principle idea

Restrict joint prior density of all observed and unobserved variables to the constraint manifold Unobserved variables are included by an uninformative or weakly informative prior Renormalize restricted posterior density to 1

Details

A detailed description of the algorithm can be found in:

  • O. Cencic and R. Frühwirth, A general framework for data

reconciliation—Part I: Linear constraints Computers and Chemical Engineering, in press Available at: http://tinyurl.com/CencicFruhwirth

  • R. Frühwirth

9 CTD 2015

slide-10
SLIDE 10

Introduction A Simple Example Other Applications Summary and Outlook

Graphical illustration

A problem in 2D, x1 = x2

15 10 Prior density of (x1,x2) x1 5 5 10 0.05 0.04 0.03 0.02 0.01 15 x2 15 Prior density cut along x1=x2 10 x1 5 5 10 0.05 0.04 0.03 0.02 0.01 15 x2 x1 5 10 15 0.05 0.1 0.15 0.2 0.25 0.3 7=3.75 <=2.37 7=3.21 <=1.52 Marginal density of x1 prior posterior x2 5 10 15 0.05 0.1 0.15 0.2 0.25 0.3 7=4.00 <=4.00 7=3.21 <=1.52 Marginal density of x2 prior posterior

  • R. Frühwirth

10 CTD 2015

slide-11
SLIDE 11

Introduction A Simple Example Other Applications Summary and Outlook

Graphical illustration

A problem in 3D, x3 = x1 + x2

  • R. Frühwirth

11 CTD 2015

slide-12
SLIDE 12

Introduction A Simple Example Other Applications Summary and Outlook

Outline

1

Introduction

2

A Simple Example

3

Other Applications

4

Summary and Outlook

  • R. Frühwirth

12 CTD 2015

slide-13
SLIDE 13

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Observables

Three measurements X1, X2, X3 of a small cross section x Three experimental densities f1(x1), f2(x2), f3(x3), support restricted to the positive axis Considered as prior densities in this context Densities need not be given in closed form Must be possible to compute the densities and to draw random numbers from them In this simple example: X1 ∼Ex(1.1) Exponential X2 ∼Ga(2,0.5) Gamma X3 ∼TrNorm(0,1.2,0,∞) Half Normal Assume independence at the moment, will allow correlations later

  • R. Frühwirth

13 CTD 2015

slide-14
SLIDE 14

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Prior densities

x

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.10 <=1.10

X1

prior

x

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.00 <=0.71

X2

prior

x

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=0.96 <=0.72

X3

prior

  • R. Frühwirth

14 CTD 2015

slide-15
SLIDE 15

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Constraints

Want to combine the measurements by imposing X1 = X2 = X3 If the posteriors are normal densities, the combined measurement is the weighted mean If not, we compute the joint density of of (X1, X2, X3) under the constraints X1 = X2 and X1 = X3 The constraint manifold is the line x = λ(1, 1, 1)T There is one free variable, which we choose to be y = x1 The dependent variables z = (x2, x3)T are functions of y: z = −Dy − d

  • r

Iz + Dy + d = 0 with: z = x2 x3

  • ,

y = x1, D = −1 −1

  • ,

d =

  • R. Frühwirth

15 CTD 2015

slide-16
SLIDE 16

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Derivation of the posterior

The joint prior density of the measurements is given by: f(x) = f1(x1) · f2(x2) · f3(x3) = ff(y) · fd(z) We compute the posterior density of x conditional on the constraints The posterior is the prior restricted to the constraint manifold, renormalized to 1 It is easy to show that the posterior of y is given by: π(y) = fd(−Dy − d) · ff(y)

  • fd(−Dy − d) · ff(y) dy

π(y) is also called the target density

  • R. Frühwirth

16 CTD 2015

slide-17
SLIDE 17

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Normalization of the posterior

In this example the integral can be computed by numerical integration In more complex cases this may get rather tedious, in particular if the dimension of y is large Explicit calculation of the integral can be avoided by drawing a random sample from the posterior π(y) by Markov chain Monte Carlo (MCMC) We use the Metropolis-Hastings algorithm for sampling Need a proposal density p(y) to generate values of the free variables No need to draw from the dependent variables Independence sampler most suitable in this context

  • R. Frühwirth

17 CTD 2015

slide-18
SLIDE 18

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Generating the Markov chain

1

Set i = 1, choose the sample size L and draw the starting value y1 from p(y)

2

Draw a proposal value ˆ y from p(y)

3

Compute the acceptance probability α by α(yi, ˆ y) = min

  • 1, π(ˆ

y) p(yi) π(yi) p(ˆ y)

  • 4

Draw a uniform random number u ∈ [0, 1]

5

If u ≤ α, accept the proposal and set yi+1 = ˆ y, otherwise set yi+1 = yi

6

Increase i by 1. If i < L, go to 2, otherwise stop sampling

  • R. Frühwirth

18 CTD 2015

slide-19
SLIDE 19

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Advantages of the independence sampler

There is a natural proposal density: p(y) = ff(y) There is need for “burn-in” If the observations are independent, the acceptance probability has a very simple form: α(y, ˆ y) = min

  • 1, fd(−Dˆ

y − d) fd(−Dy − d)

  • Sampler can be SIMDized by precomputing proposal values and

their pdf values Sampler can be parallelized by generating several independent Markov chains on different cores and combining them afterwards

  • R. Frühwirth

19 CTD 2015

slide-20
SLIDE 20

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior analysis

The generated chain is a non-independent random sample Y from the posterior π(y) of the free variables The corresponding sample Z of the dependent variables z is calculated by zi = −Dyi − d, i = 1, . . . , L Posterior means, variances, correlations, quantiles and credible intervals are estimated from the complete sample X = (Z; Y) Marginal densities are smoothed before graphical representation A measure of goodness can be obtained by computing the discrepancy between the prior densities and the posterior marginals Examples:

Kolmogorov-Smirnov distance dKS Hellinger distance dH = √ 1 − BC (BC=Bhattacharya coefficient)

Small acceptance rate also indicates poor fit

  • R. Frühwirth

20 CTD 2015

slide-21
SLIDE 21

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior density

x

2 4

f(x)

0.2 0.4 0.6 0.8 1 1.2 1.4 7=1.10 <=1.10 7=0.57 <=0.38

X1

prior posterior

x

2 4

f(x)

0.2 0.4 0.6 0.8 1 1.2 1.4 7=1.00 <=0.71 7=0.57 <=0.38

X2

prior posterior

x

2 4

f(x)

0.2 0.4 0.6 0.8 1 1.2 1.4 7=0.96 <=0.72 7=0.57 <=0.38

X3

prior posterior

  • R. Frühwirth

21 CTD 2015

slide-22
SLIDE 22

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior discrepancies

X1 X2 X3 dKS 0.272 0.295 0.280 dH 0.316 0.285 0.287

Key properties of posterior

Mean STD Median 95%-Quantile 0.5721 0.3794 0.4912 1.3081

  • R. Frühwirth

22 CTD 2015

slide-23
SLIDE 23

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Gaussian approximation

Approximate each prior by a Gaussian with the same mean and variance Posterior is then Gaussian too, can be computed explicitely by weighted mean Useful cross-check of sampler Posterior Gaussian extends into negative axis

  • R. Frühwirth

23 CTD 2015

slide-24
SLIDE 24

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Approximating densities

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.10 <=1.10

X1

prior

  • appr. prior

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.00 <=0.71

X2

prior

  • appr. prior

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=0.96 <=0.72

X3

prior

  • appr. prior
  • R. Frühwirth

24 CTD 2015

slide-25
SLIDE 25

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior density

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.00 <=0.46

X1

prior

  • appr. prior

posterior

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.00 <=0.46

X2

prior

  • appr. prior

posterior

x

  • 2

2 4

f(x)

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 7=1.00 <=0.46

X3

prior

  • appr. prior

posterior

  • R. Frühwirth

25 CTD 2015

slide-26
SLIDE 26

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior discrepancies

X1 X2 X3 dKS 0.231 0.104 0.126 dH 0.396 0.209 0.220

Key properties of approximate posterior

Mean STD Median 95%-Quantile 1.0027 0.4604 1.0027 1.7577

Key properties of exact posterior

Mean STD Median 95%-Quantile 0.5721 0.3794 0.4912 1.3081

  • R. Frühwirth

26 CTD 2015

slide-27
SLIDE 27

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior discrepancies

X1 X2 X3 dKS 0.231 0.104 0.126 dH 0.396 0.209 0.220

Key properties of approximate posterior

Mean STD Median 95%-Quantile 1.0027 0.4604 1.0027 1.7577

Key properties of exact posterior

Mean STD Median 95%-Quantile 0.5721 0.3794 0.4912 1.3081

  • R. Frühwirth

27 CTD 2015

slide-28
SLIDE 28

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Correlated measurements

The sampler also works with non-independent measurements Construct a joint prior with correlations from the given marginals via a copula distribution Have chosen a Gaussian copula g(u1, u2, u3) with ρ = 0.25 Joint prior: h(x1, x2, x3) = g(F1(x1), F2(x2), F3(x3)) · f1(x1) · f2(x2) · f3(x3) Target density: π(x1) ∝ h(x1, x1, x1) Proposal density is still f1(x1), but computation of α now according to the general formula

  • R. Frühwirth

28 CTD 2015

slide-29
SLIDE 29

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Posterior densities, ρ = 0.25

x

2 4

f(x)

0.5 1 1.5 2 2.5 7=0.43 <=0.41

X1

prior posterior

x

2 4

f(x)

0.5 1 1.5 2 2.5 7=0.43 <=0.41

X2

prior posterior

x

2 4

f(x)

0.5 1 1.5 2 2.5 7=0.43 <=0.41

X3

prior posterior

  • R. Frühwirth

29 CTD 2015

slide-30
SLIDE 30

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Key properties of posterior with ρ = 0.25

Mean STD Median 95%-Quantile 0.4320 0.4148 0.3096 1.2759

Key properties of posterior with ρ = 0

Mean STD Median 95%-Quantile 0.5721 0.3794 0.4912 1.3081

  • R. Frühwirth

30 CTD 2015

slide-31
SLIDE 31

Introduction A Simple Example Other Applications Summary and Outlook

Combination of measurements

Key properties of posterior with ρ = 0.25

Mean STD Median 95%-Quantile 0.4320 0.4148 0.3096 1.2759

Key properties of posterior with ρ = 0

Mean STD Median 95%-Quantile 0.5721 0.3794 0.4912 1.3081

  • R. Frühwirth

31 CTD 2015

slide-32
SLIDE 32

Introduction A Simple Example Other Applications Summary and Outlook

Outline

1

Introduction

2

A Simple Example

3

Other Applications

4

Summary and Outlook

  • R. Frühwirth

32 CTD 2015

slide-33
SLIDE 33

Introduction A Simple Example Other Applications Summary and Outlook

Other applications

Extensions of the algorithm

Unobserved variables get an uninformative or weakly informative prior (see paper) Inequality constraints are reduced to equality constraints by introducing slack variables (see paper) Non-linear constraints are Taylor-expanded to linear ones (work in progress)

Vertex fit

Track parameters can have non-normal track errors, for instance electrons fitted with Gaussian-sum filter Vertex constraints are non-linear Vertex position enters with or without prior information Vertex can be additionally constrained to a line (e.g. beam line), a plane (e.g. target foil) or a volume (e.g. interaction region)

  • R. Frühwirth

33 CTD 2015

slide-34
SLIDE 34

Introduction A Simple Example Other Applications Summary and Outlook

Other applications

Extensions of the algorithm

Unobserved variables get an uninformative or weakly informative prior (see paper) Inequality constraints are reduced to equality constraints by introducing slack variables (see paper) Non-linear constraints are Taylor-expanded to linear ones (work in progress)

Vertex fit

Track parameters can have non-normal track errors, for instance electrons fitted with Gaussian-sum filter Vertex constraints are non-linear Vertex position enters with or without prior information Vertex can be additionally constrained to a line (e.g. beam line), a plane (e.g. target foil) or a volume (e.g. interaction region)

  • R. Frühwirth

34 CTD 2015

slide-35
SLIDE 35

Introduction A Simple Example Other Applications Summary and Outlook

Generalizations

Kinematic fit

Momentum conservation gives linear constraints, if momentum suitably parameterized Energy conservation gives non-linear constraints Missing energy can be given an uninformative or an informative prior, and can be restricted to positive values

  • R. Frühwirth

35 CTD 2015

slide-36
SLIDE 36

Introduction A Simple Example Other Applications Summary and Outlook

Outline

1

Introduction

2

A Simple Example

3

Other Applications

4

Summary and Outlook

  • R. Frühwirth

36 CTD 2015

slide-37
SLIDE 37

Introduction A Simple Example Other Applications Summary and Outlook

Summary and Outlook

Summary

Have developed method to impose linear or non-linear constraints on non-normal observations Constraints may include unobserved variables Inequality constraints can be dealt with Independence sampler easy to vectorize and to parallelize

Outlook

Will study non-linear constraints in more detail, in particular performance penalty Will study gross error detection and robustification in case

  • f poor fit or zero acceptance rate

Intend to apply the method to electrons fitted with GSF: vertex fit, kinematic fit

  • R. Frühwirth

37 CTD 2015

slide-38
SLIDE 38

Introduction A Simple Example Other Applications Summary and Outlook

Summary and Outlook

Summary

Have developed method to impose linear or non-linear constraints on non-normal observations Constraints may include unobserved variables Inequality constraints can be dealt with Independence sampler easy to vectorize and to parallelize

Outlook

Will study non-linear constraints in more detail, in particular performance penalty Will study gross error detection and robustification in case

  • f poor fit or zero acceptance rate

Intend to apply the method to electrons fitted with GSF: vertex fit, kinematic fit

  • R. Frühwirth

38 CTD 2015

slide-39
SLIDE 39

Introduction A Simple Example Other Applications Summary and Outlook

Summary and Outlook

Thank you!

  • R. Frühwirth

39 CTD 2015