SLIDE 1

1 / 17

Bayesian perspective on QCD global analysis

Nobuo Sato
University of Connecticut / JLab
DIS18, Kobe, Japan, April 16-20, 2018

In collaboration with:
  • A. Accardi
  • E. Nocera
  • W. Melnitchouk
SLIDE 2

Bayesian methodology in a nutshell

2 / 17

In QCD global analysis, PDFs are parametrized at some scale Q0, e.g.

  f(x) = N x^a (1-x)^b (1 + c\sqrt{x} + d x + \dots)
  f(x) = N x^a (1-x)^b \, \mathrm{NN}(x; \{\theta, w_i\})

"Fitting" is essentially the estimation of

  E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a)
  V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2

The probability density P is given by Bayes' theorem,

  P(a|\mathrm{data}) = \frac{1}{Z} L(\mathrm{data}|a) \, \pi(a),   with  a = (N, a, b, c, d, \dots)
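As a minimal numerical illustration of what these expectations mean, here is a toy sketch in Python; the parametrization, grid and one-parameter posterior below are invented for illustration and are not from the talk:

```python
import numpy as np

# Toy PDF parametrization at the input scale Q0 (illustrative choice)
def f(x, N, a, b):
    return N * x**a * (1.0 - x)**b

# Pretend the posterior P(a|data) is known on a grid for a single parameter 'a'
# (N and b held fixed).  E[f] and V[f] at fixed x are then weighted integrals.
a_grid = np.linspace(0.3, 0.9, 400)
da = a_grid[1] - a_grid[0]
posterior = np.exp(-0.5 * ((a_grid - 0.6) / 0.05) ** 2)  # toy P(a|data)
posterior /= np.sum(posterior) * da                      # normalize

x = 0.3
fa = f(x, N=1.0, a=a_grid, b=3.0)
E_f = np.sum(posterior * fa) * da                        # E[f](x)
V_f = np.sum(posterior * (fa - E_f) ** 2) * da           # V[f](x)
print(E_f, np.sqrt(V_f))
```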

SLIDE 3

Bayesian methodology in a nutshell

3 / 17

The likelihood function is not unique. A standard choice is the Gaussian likelihood

  L(d|a) = \exp\left[ -\frac{1}{2} \sum_i \left( \frac{d_i - \mathrm{thy}_i(a)}{\delta d_i} \right)^2 \right]

Priors are designed to veto unphysical regions in parameter space, e.g.

  \pi(a) = \prod_i \theta(a_i - a_i^{\min}) \, \theta(a_i^{\max} - a_i)

How do we compute E[f], V[f]?

+ Maximum likelihood
+ Monte Carlo methods
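A short code sketch of this Gaussian likelihood and box prior; the `theory` callable and the data arrays are placeholders, not part of the talk:

```python
import numpy as np

def log_likelihood(params, data, sigma, theory):
    """Gaussian log-likelihood: -0.5 * sum_i ((d_i - thy_i(a)) / delta_d_i)^2."""
    residuals = (data - theory(params)) / sigma
    return -0.5 * np.sum(residuals ** 2)

def log_prior(params, a_min, a_max):
    """Flat box prior: prod_i theta(a_i - a_i^min) * theta(a_i^max - a_i)."""
    inside = np.all((params >= a_min) & (params <= a_max))
    return 0.0 if inside else -np.inf

def log_posterior(params, data, sigma, theory, a_min, a_max):
    """log P(a|data) up to the normalization Z."""
    lp = log_prior(params, a_min, a_max)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(params, data, sigma, theory)
```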

SLIDE 4

Maximum Likelihood

4 / 17

Estimation of the expectation value:

  E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a) \simeq f(a_0)

a_0 is estimated from an optimization algorithm,

  \max\,[P(a|\mathrm{data})] = P(a_0|\mathrm{data})
  \max\,[L(\mathrm{data}|a)\,\pi(a)] = L(\mathrm{data}|a_0)\,\pi(a_0)

or equivalently from chi-squared minimization,

  \min\,[-2\log(L(\mathrm{data}|a)\,\pi(a))] = -2\log(L(\mathrm{data}|a_0)\,\pi(a_0))
    = \sum_i \left( \frac{d_i - \mathrm{thy}_i(a_0)}{\delta d_i} \right)^2 - 2\log\pi(a_0)
    = \chi^2(a_0) - 2\log\pi(a_0)
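A sketch of this minimization step with an off-the-shelf optimizer; the flat box prior contributes 0 inside the box, and the data, theory, bounds and starting point are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def neg2_log_posterior(params, data, sigma, theory, a_min, a_max):
    """-2 log(L * pi) = chi^2(a) - 2 log pi(a); for a flat box prior the last term is 0 or +inf."""
    if not np.all((params >= a_min) & (params <= a_max)):
        return np.inf                           # outside the prior box
    return np.sum(((data - theory(params)) / sigma) ** 2)

# Illustrative usage (all arguments are placeholders):
# result = minimize(neg2_log_posterior, x0=initial_guess,
#                   args=(data, sigma, theory, a_min, a_max), method="Nelder-Mead")
# a0 = result.x     # posterior mode; E[f] is then approximated by f(a0)
```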

SLIDE 5

Maximum Likelihood

5 / 17

Estimation of the variance (Hessian method):

  V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2
       \simeq \sum_k \left[ \frac{f(t_k = 1) - f(t_k = -1)}{2} \right]^2

It relies on the factorization of P(a|data) along eigen-directions,

  P(a|\mathrm{data}) \propto \prod_k \exp\left( -\frac{1}{2} t_k^2 \right) + O(\Delta a^3)

and on a linear approximation of f(a),

  (f(a) - E[f])^2 = \left[ \sum_k \frac{\partial f}{\partial t_k} t_k \right]^2 + O(\Delta a^3)
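A minimal sketch of this eigen-direction error formula for a toy chi-squared, using a finite-difference Hessian; this is not the implementation used in any of the fits discussed here:

```python
import numpy as np

def hessian_errors(f_of_a, chi2, a0, step=1e-4):
    """Approximate V[f] ~ sum_k [(f(t_k=+1) - f(t_k=-1)) / 2]^2 for a scalar f."""
    a0 = np.asarray(a0, dtype=float)
    n = len(a0)
    # Finite-difference Hessian of chi^2/2 (= -log L for a Gaussian likelihood)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * step, np.eye(n)[j] * step
            H[i, j] = (chi2(a0 + ei + ej) - chi2(a0 + ei - ej)
                       - chi2(a0 - ei + ej) + chi2(a0 - ei - ej)) / (8.0 * step**2)
    # Eigen-directions t_k; flat directions (near-zero eigenvalues) make this unstable
    vals, vecs = np.linalg.eigh(H)
    var = 0.0
    for k in range(n):
        dk = vecs[:, k] / np.sqrt(vals[k])        # displacement corresponding to t_k = +/- 1
        var += ((f_of_a(a0 + dk) - f_of_a(a0 - dk)) / 2.0) ** 2
    return var
```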

SLIDE 6

Maximum Likelihood

6 / 17

pros
+ Very practical; most PDF groups use this method
+ It is computationally inexpensive
+ f and its eigen-directions can be precalculated/tabulated

cons
+ Assumes a local Gaussian approximation of the likelihood
+ Assumes a linear approximation of the observables O around a_0
+ The assumptions are strictly valid only for linear models
+ Computation of the Hessian matrix is numerically unstable if flat directions are present

examples
→ if f(x) = a + b x + c x^2, then E[f(x)] = E[a] + E[b] x + E[c] x^2
→ but if f(x) = N x^a (1-x)^b, then E[f(x)] ≠ E[N] x^{E[a]} (1-x)^{E[b]}
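The last point is easy to check numerically; a quick sketch with toy Gaussian posteriors for (N, a, b), values invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy posterior samples for the parameters of f(x) = N x^a (1-x)^b
N = rng.normal(1.0, 0.1, 100_000)
a = rng.normal(-0.5, 0.05, 100_000)
b = rng.normal(3.0, 0.3, 100_000)

x = 0.3
f_samples = N * x**a * (1.0 - x)**b
print(np.mean(f_samples))                                  # E[f(x)]
print(np.mean(N) * x**np.mean(a) * (1 - x)**np.mean(b))    # f at the mean parameters: not the same
```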

SLIDE 7

Monte Carlo Methods

7 / 17

Recall that we are interested in computing

  E[f] = \int d^n a \, P(a|\mathrm{data}) \, f(a)
  V[f] = \int d^n a \, P(a|\mathrm{data}) \, (f(a) - E[f])^2

Any MC method attempts to do this using MC sampling,

  E[f] \simeq \sum_k w_k \, f(a_k)
  V[f] \simeq \sum_k w_k \, (f(a_k) - E[f])^2

i.e. to construct the sample distribution {w_k, a_k} of the parent distribution P(a|data).
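In code, these weighted-sample estimators are just a few lines; a generic sketch, where the samples and weights would come from whichever sampler is used:

```python
import numpy as np

def mc_expectation_and_variance(weights, param_samples, f):
    """E[f] ~ sum_k w_k f(a_k),  V[f] ~ sum_k w_k (f(a_k) - E[f])^2."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # make sure the weights are normalized
    fk = np.array([f(a) for a in param_samples])  # f evaluated on each sample a_k
    E = np.sum(w * fk)
    V = np.sum(w * (fk - E) ** 2)
    return E, V
```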

SLIDE 8

Monte Carlo Methods

8 / 17

+ Resampling + cross validation
+ Nested Sampling (NS)
+ Hybrid Markov chain (HMC); Gabin Gbedo, Mangin-Brinet (2017)

[Figure: polarized PDFs x∆u+, x∆d+, x(∆ū+∆d̄), x(∆ū−∆d̄), x∆s+ (JAM17 + SU(3)), x∆s− vs x, comparing JAM17, JAM15 and DSSV09 — resampling + CV, Ethier et al (2017)]

[Figure: transversity h_1^u, h_1^d vs x, Collins functions z H_1^{⊥(1)} (fav/unf) vs z, and tensor charges δu, δd, gT from SIDIS and SIDIS+lattice — Nested Sampling, Lin et al (2018)]

SLIDE 9

Resampling+cross validation (R+CV)

9 / 17

Resample the data points within quoted uncertainties using Gaussian statistics,

  d^{\mathrm{(pseudo)}}_{k,i} = d^{\mathrm{(exp)}}_i + \sigma^{\mathrm{(exp)}}_i \, R_{k,i}

Fit each pseudo-data sample k = 1, ..., N to obtain parameter vectors a_k:

  P(a|\mathrm{data}) \to \{w_k = 1/N, \, a_k\}

For a large number of parameters, split the data into training and validation sets and find the a_k that best describes the validation sample.

[Diagram: R+CV workflow — original data → sampler → pseudo data, split into training and validation data; priors → fits → posteriors, with fit parameters taken from the minimization steps that best describe the validation data, and the posterior used as initial guess/prior]
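A sketch of the resampling step and a training/validation split; the array shapes, seed and the 50/50 split are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_pseudo_data(d_exp, sigma_exp, n_replicas):
    """d^(pseudo)_{k,i} = d^(exp)_i + sigma^(exp)_i * R_{k,i}, with R ~ N(0,1)."""
    d_exp = np.asarray(d_exp, dtype=float)
    sigma_exp = np.asarray(sigma_exp, dtype=float)
    R = rng.standard_normal((n_replicas, len(d_exp)))
    return d_exp[None, :] + sigma_exp[None, :] * R

def split_training_validation(n_points, training_fraction=0.5):
    """Random partition of the data-point indices into training and validation sets."""
    idx = rng.permutation(n_points)
    n_train = int(training_fraction * n_points)
    return idx[:n_train], idx[n_train:]

# Each replica k is then fitted on its training set, and the parameter vector a_k that
# best describes the validation set is kept: P(a|data) -> {w_k = 1/N, a_k}.
```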

SLIDE 10

Nested Sampling (NS)

10 / 17

The basic idea: compute the evidence

  Z = \int L(\mathrm{data}|a) \, \pi(a) \, d^n a = \int_0^1 L(X) \, dX

+ The procedure collects samples from iso-likelihood contours, weighted by their likelihood values
+ Insensitive to local minima → faithful conversion of P(a|data) → {w_k, a_k}
+ Multiple runs can be combined into one single run → the procedure can be parallelized

[Figure: L(data|a) in a space vs L(X) in X space]

  • arXiv:astro-ph/0508461v2
  • arXiv:astro-ph/0701867v2
  • arxiv.org/abs/1703.09701
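For illustration, a bare-bones nested-sampling loop; this is a naive sketch, not the samplers in the references above, and the constrained replacement draw is done by simple rejection from the prior, which is inefficient in practice:

```python
import numpy as np

rng = np.random.default_rng(2)

def nested_sampling(log_like, prior_sample, n_live=100, n_iter=1000):
    """Return weighted samples {w_k, a_k} approximating P(a|data).

    log_like(a): log-likelihood of one parameter vector.
    prior_sample(n): array of n parameter vectors drawn from the prior.
    """
    live = prior_sample(n_live)
    live_logL = np.array([log_like(a) for a in live])
    samples, log_weights = [], []
    log_X = 0.0                                       # log of the enclosed prior volume
    for _ in range(n_iter):
        worst = np.argmin(live_logL)                  # point on the current iso-likelihood contour
        log_X_new = log_X - 1.0 / n_live              # volume shrinks by ~exp(-1/n_live) per step
        dX = np.exp(log_X) - np.exp(log_X_new)        # weight ~ L * dX
        log_weights.append(live_logL[worst] + np.log(dX))
        samples.append(np.array(live[worst], copy=True))
        log_X = log_X_new
        # Replace the worst live point by a prior draw with higher likelihood
        while True:
            new = prior_sample(1)[0]
            if log_like(new) > live_logL[worst]:
                live[worst], live_logL[worst] = new, log_like(new)
                break
    w = np.exp(np.array(log_weights) - np.max(log_weights))
    return np.array(samples), w / w.sum()
```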
SLIDE 11

Comparison between the methods

11 / 17

Given a likelihood, does the evaluation of E[f] and V[f] depend on the method? → stress test with a numerical example.

Setup:
+ Simulate synthetic data via rejection sampling
+ Estimate E[f] and V[f] using the different methods

[Figure: synthetic data and the fitted f(x) vs x, two panels]
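A generic rejection-sampling sketch, of the kind one could use to generate such a synthetic sample; the target and proposal densities are placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, M, n):
    """Draw n samples from target_pdf, assuming target_pdf(a) <= M * proposal_pdf(a) everywhere."""
    out = []
    while len(out) < n:
        a = proposal_sample()                 # candidate from the proposal density
        u = rng.uniform()
        if u * M * proposal_pdf(a) <= target_pdf(a):
            out.append(a)                     # accept; otherwise try again
    return np.array(out)
```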

SLIDE 12

Comparison between the methods

12 / 17

[Figure: central f(x) vs x from NS; relative uncertainty δf(x) vs x for HESS, NS, R and R+CV(50/50); ratio (δf/f)/(δf/f)_NS vs training fraction tf at x = 0.1, 0.3, 0.5, 0.7]
+ HESS, NS and R provide the same uncertainty
+ R+CV overestimates the uncertainty by roughly a factor of 2
+ The uncertainties also depend on the training fraction (tf)
+ The results are also confirmed within a neural-net parametrization

SLIDE 13

Beyond Gaussian likelihood

13 / 17

Gaussian likelihoods are not adequate to describe uncertainties in the presence of incompatible data sets.

Example:
+ Two measurements of a quantity m: (m_1, \delta m_1), (m_2, \delta m_2)
+ The expectation value and variance can be computed exactly:

    E[m] = \frac{m_1 \, \delta m_2^2 + m_2 \, \delta m_1^2}{\delta m_2^2 + \delta m_1^2}
    V[m] = \frac{\delta m_2^2 \, \delta m_1^2}{\delta m_2^2 + \delta m_1^2}

+ Note: V[m] is independent of |m_1 - m_2|

To obtain more realistic uncertainties, the likelihood function needs to be modified (e.g. tolerance criterion).
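A quick numerical check of this point (the measurement values are invented):

```python
def combine_joint(m1, dm1, m2, dm2):
    """Exact E[m], V[m] for the joint Gaussian likelihood of two measurements."""
    E = (m1 * dm2**2 + m2 * dm1**2) / (dm1**2 + dm2**2)
    V = (dm1**2 * dm2**2) / (dm1**2 + dm2**2)
    return E, V

# V[m] stays the same no matter how far apart the two measurements are:
print(combine_joint(1.0, 0.1, 1.05, 0.1))   # compatible measurements
print(combine_joint(1.0, 0.1, 2.00, 0.1))   # wildly incompatible, identical V[m]
```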

SLIDE 14

Likelihood profile in CJ15

14 / 17

[Figure: likelihood and ∆χ² profiles along an eigen-direction of the CJ15 fit, shown for the total χ² and for each of the 33 individual data sets, together with the projection of the eigen-direction onto the 24 fit parameters (valence, sea, gluon, offshell and higher-twist parameters)]

24 parameters, 33 data sets
Eigen direction without incompatibilities

SLIDE 15

Likelihood profile in CJ15

15 / 17

[Figure: the same likelihood and ∆χ² profiles and parameter projections, for an eigen-direction that exhibits data-set incompatibilities]

24 parameters, 33 data sets
Eigen direction with incompatibilities
Modified likelihood function is needed

SLIDE 16

Beyond Gaussian likelihood

16 / 17

+ Tolerance criterion (standard choice)
+ Disjoint likelihood function, e.g.

  joint:    L(m_1, m_2 | m; \delta m_1, \delta m_2) = L(m_1|m; \delta m_1) \, L(m_2|m; \delta m_2)

            E[m] = \frac{m_1 \delta m_2^2 + m_2 \delta m_1^2}{\delta m_2^2 + \delta m_1^2}, \qquad
            V[m] = \frac{\delta m_2^2 \delta m_1^2}{\delta m_2^2 + \delta m_1^2}

  disjoint: L(m_1, m_2 | m; \delta m_1, \delta m_2) = \frac{1}{2}\left[ L(m_1|m; \delta m_1) + L(m_2|m; \delta m_2) \right]

            E[m] = \frac{1}{2}(m_1 + m_2), \qquad
            V[m] = \frac{1}{2}(\delta m_1^2 + \delta m_2^2) + \left( \frac{m_1 - m_2}{2} \right)^2

+ Empirical Bayes, hierarchical Bayes, ...

Many alternatives still to be explored.
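A numeric sketch contrasting the disjoint combination with the joint one checked earlier (toy numbers again):

```python
def combine_disjoint(m1, dm1, m2, dm2):
    """E[m], V[m] for the equal-weight mixture ('disjoint') likelihood of two measurements."""
    E = 0.5 * (m1 + m2)
    V = 0.5 * (dm1**2 + dm2**2) + ((m1 - m2) / 2.0) ** 2
    return E, V

# Unlike the joint case, the disjoint variance grows with the discrepancy |m1 - m2|:
print(combine_disjoint(1.0, 0.1, 1.05, 0.1))
print(combine_disjoint(1.0, 0.1, 2.00, 0.1))
```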

SLIDE 17

Summary and outlook

17 / 17

+ The Bayesian formulation of global analysis provides a more general perspective on global fits than traditional chi-squared minimization
+ MC approaches are useful to explore new likelihood functions and priors
+ Uncertainties on PDFs depend on the parametrization as well as on the assumptions about the likelihood function and the priors
+ Given the likelihood function and priors, uncertainties on PDFs should be independent of the parametrization in the region where the PDFs can be constrained
+ The results should also be independent of the MC sampling method