

SLIDE 1

MaCh3 and Bayesian Analysis

Patrick Dunne

SLIDE 2

Outline

  • Introduce the T2K method for analysis
  • How to interpret Bayesian results
  • Describe the MaCh3 framework
  • How do you put a new detector in?
  • How long does it take to run?

11/12/2019 · P. Dunne

SLIDE 3

T2K oscillation parameter fit


[Diagram: Flux Model, Cross-section Model, Near Detector Model, and Far Detector Model feed an Event Rate and Distribution Model, together with the Oscillation Parameters]

  • Apply oscillation effects to the Monte Carlo as a function of true Eν
  • Construct a model to predict event rates and distributions at the near and far detectors
  • Need to ensure the experiment can constrain the non-oscillation elements of the model
  • The cross-section model is highly dependent on nuclear effects
  • Do this by fitting, rather than by extrapolating the ND distribution directly to the FD distribution

[Diagram: unoscillated Monte Carlo, oscillation probability, interaction rates, and detector uncertainties combine into the Near Detector and Far Detector Predictions]
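The oscillation step above can be sketched as an event-by-event reweight of the unoscillated Monte Carlo. This is a toy illustration, not MaCh3 code: the two-flavour νμ survival formula, the parameter values, and the event records are all assumptions for the example.

```python
import math

def survival_prob(e_nu, sin2_2theta23=1.0, dm2_32=2.5e-3, baseline_km=295.0):
    """Two-flavour nu_mu survival probability (illustrative values only).
    e_nu in GeV, dm2_32 in eV^2, baseline in km."""
    arg = 1.267 * dm2_32 * baseline_km / e_nu
    return 1.0 - sin2_2theta23 * math.sin(arg) ** 2

# Apply oscillations to the unoscillated Monte Carlo event by event,
# as a function of each event's true neutrino energy
mc_events = [{"true_enu": 0.6, "weight": 1.0}, {"true_enu": 1.2, "weight": 1.0}]
for ev in mc_events:
    ev["weight"] *= survival_prob(ev["true_enu"])
```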

SLIDE 4

Reweighting

  • MaCh3 has full access to event-by-event kinematic information
  • This enables reweights with functional forms on any event variable
  • It also enables shifts of variables: e.g. a removal-energy shift can actually change an event's momentum and move it to a different bin
  • We also have standard bin-by-bin (separated by mode, flavour) response functions and linear normalisations implemented
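A minimal sketch of the two kinds of systematic above: a functional-form reweight on an event variable, and a shift systematic that moves the variable itself, so the event can migrate between bins. The event records, the linear response, and the bin edges are all hypothetical.

```python
import bisect

# Toy event-by-event records (illustrative, not the real MaCh3 classes)
events = [{"p_mu": 0.45, "weight": 1.0}, {"p_mu": 0.75, "weight": 1.0}]
bin_edges = [0.0, 0.5, 1.0, 1.5]

def functional_weight(x, a=1.0, b=0.2):
    """Functional-form reweight on any event variable (here a linear response)."""
    return a + b * x

def shift_variable(ev, delta):
    """A shift systematic (e.g. removal energy) changes the variable itself,
    so the event can land in a different bin."""
    ev["p_mu"] += delta

for ev in events:
    ev["weight"] *= functional_weight(ev["p_mu"])

shift_variable(events[0], 0.1)  # 0.45 -> 0.55: crosses a bin edge
new_bin = bisect.bisect_right(bin_edges, events[0]["p_mu"]) - 1
```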


SLIDE 5

Fitting to data


[Diagram: Hadron Production Data and INGRID/Beam Monitor Data constrain the Flux Model; External Cross-section Data constrain the Cross-section Model; Super-K Atmospheric Data constrain the Super-K Detector Model; ND280 Data enter a near detector fit, and the Super-K Data Samples enter a far detector fit for the Oscillation Parameters]

Two approaches are used by T2K for fitting:

1. Use a near detector data fit to constrain the flux and cross-section models first, then fit the far detector

  • Computationally easier
  • Makes more assumptions

2. Perform simultaneous fit of both detectors

  • Computationally more demanding
  • Makes fewer assumptions
SLIDE 6

Fitting to data


[Diagram: the same model inputs as the previous slide, but with a single simultaneous fit of the ND280 Data and Super-K Data Samples for the Oscillation Parameters]

Two approaches are used for fitting:

1. Use a near detector data fit to constrain the flux and cross-section models first, then fit the far detector

  • Computationally easier
  • Makes more assumptions

2. Perform simultaneous fit of both detectors (MaCh3 does this)

  • Computationally more demanding
  • Makes fewer assumptions
SLIDE 7

T2K analyses

  • T2K has three separate analysis frameworks: two fit the near detector first and propagate, one does a joint fit
  • The joint-fit analysis is Bayesian; of the two separate fitters, one is frequentist and the other is a mix
  • All three are able to construct frequentist confidence intervals for comparisons
  • Very good agreement is seen, and the cross-validation is highly useful for debugging


A Bayesian analysis shows the posterior probability density (high values mean a parameter value is more likely to be the "correct" one). Frequentist analyses show Δχ2 (low values mean better agreement with the data at that parameter value).

SLIDE 8

Dealing with nuisance parameters

  • The likelihood has >750 parameters, but we want plots in ≤2 of them at once
  • Two main options:
  • Profiling: pick the values of the nuisance parameters that maximise the likelihood for each set of values of the parameters of interest
  • Marginalisation: integrate over the nuisance parameters (Bayesian, so MaCh3 does this)
  • Also, finding the maximum-likelihood point for given oscillation parameter values is hard in 750 dimensions
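The two options can be contrasted on a toy two-parameter likelihood; the Gaussian form and the grids below are illustrative only.

```python
import math

# Toy 2D likelihood in (theta, nuisance): a correlated Gaussian
def likelihood(theta, nu):
    return math.exp(-0.5 * (theta ** 2 + nu ** 2 + theta * nu))

thetas = [0.1 * i - 2.0 for i in range(41)]
nus = [0.1 * i - 2.0 for i in range(41)]

# Profiling: maximise over the nuisance at each value of the parameter of interest
profiled = [max(likelihood(t, n) for n in nus) for t in thetas]

# Marginalisation: integrate (here, sum over a grid) over the nuisance instead
marginalised = [sum(likelihood(t, n) for n in nus) for t in thetas]
```

For this symmetric toy both curves peak at the same place; in general (e.g. non-Gaussian likelihoods) the two can disagree.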


SLIDE 9

MCMC vs grid search

  • Other T2K analyses use random throws of the nuisance parameters from covariance matrices to marginalise
  • They then do a grid search in 1D/2D, calculating the average Δχ2 across the ensemble of marginalisation throws
  • They use Feldman-Cousins to find the critical Δχ2 values for δCP
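The throw-based marginalisation can be sketched as follows, assuming a toy χ2 with one nuisance parameter and a 1D Gaussian "covariance"; the real analyses throw many correlated parameters at once.

```python
import random

random.seed(42)

def chi2(theta, nu):
    """Toy chi^2: parameter of interest plus one nuisance, data at zero."""
    return (theta + nu) ** 2

# Throw the nuisance parameter from its (here 1D) covariance ...
throws = [random.gauss(0.0, 0.3) for _ in range(500)]

# ... then grid-search the parameter of interest, averaging chi^2 over throws
grid = [0.25 * i - 2.0 for i in range(17)]
avg_chi2 = [sum(chi2(theta, nu) for nu in throws) / len(throws) for theta in grid]
best = min(avg_chi2)
avg_dchi2 = [c - best for c in avg_chi2]  # average delta chi^2 per grid point
```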

SLIDE 10

MCMC vs grid search

  • MaCh3 samples the likelihood space with Markov Chain MC
  • The rule for stepping in parameter space ensures that the distribution of parameter values is proportional to the marginalised posterior probability
  • This targets likelihood evaluations in regions of space where the likelihood is high
  • There are several algorithms to choose from, e.g. Metropolis-Hastings or Hamiltonian
  • MaCh3 currently uses Metropolis-Hastings and is upgrading to Hamiltonian
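A minimal Metropolis-Hastings sketch on a toy 1D posterior, showing the stepping rule that makes the chain's density proportional to the posterior. The target distribution, proposal width, and chain length are arbitrary choices for the example.

```python
import math
import random

random.seed(1)

def log_posterior(x):
    """Toy 1D log-posterior: a standard normal."""
    return -0.5 * x * x

def metropolis_hastings(n_steps, step_size=1.0):
    chain = []
    x, logp = 0.0, log_posterior(0.0)
    for _ in range(n_steps):
        x_new = x + random.gauss(0.0, step_size)  # propose a step
        logp_new = log_posterior(x_new)
        # Accept with probability min(1, posterior_new / posterior_old)
        if math.log(random.random()) < logp_new - logp:
            x, logp = x_new, logp_new
        chain.append(x)  # a rejected proposal repeats the current point
    return chain

chain = metropolis_hastings(20000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

The step values then cluster where the posterior is high, which is exactly what targets the likelihood evaluations efficiently.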


SLIDE 11

MCMC vs grid search

  • The output of the MCMC is a large number of 'steps'
  • Each step is a vector of the values of all parameters at that step
  • Creating 1D/2D histograms of any combination of parameters gives the posterior probability for those parameters
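The histogramming step can be sketched directly: binning the chain in one parameter automatically marginalises over all the others. The chain values and parameter names below are invented for illustration.

```python
import collections

# Each step is a vector of all parameter values; a toy chain with two
# hypothetical parameters
steps = [
    {"dcp": -1.6, "th23": 0.55}, {"dcp": -1.4, "th23": 0.50},
    {"dcp": -1.5, "th23": 0.55}, {"dcp": 0.2, "th23": 0.45},
]

def marginal_hist(steps, par, bin_width):
    """1D marginal posterior for one parameter: histogram the steps; every
    other parameter is marginalised over automatically."""
    counts = collections.Counter(int(s[par] // bin_width) for s in steps)
    return {b: n / len(steps) for b, n in counts.items()}

post = marginal_hist(steps, "dcp", 0.5)
```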


SLIDE 12

Appearance parameter constraints

[Figure: posterior probability density vs δCP (rad.), with 1σ/2σ/3σ credible intervals; T2K Run 1-9d preliminary]

[Figure: 68% and 90% credible intervals and the MaCh3 best fit in the δCP vs sin²θ13 plane; T2K Run 1-9d preliminary]

  • Make contours by taking the bins with the most steps (i.e. highest probability) until 68/95/99.73…% of the probability is inside your contour
  • You don't get a multidimensional best fit, but you do get the highest-probability bin per parameter
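The contour construction described above, sketched on a toy 1D histogram (the bin probabilities are made up):

```python
def credible_interval_bins(bin_probs, level=0.68):
    """Take bins in order of decreasing posterior probability until the
    requested fraction of the total probability is enclosed."""
    total = sum(bin_probs.values())
    chosen, enclosed = [], 0.0
    for b, p in sorted(bin_probs.items(), key=lambda kv: -kv[1]):
        chosen.append(b)
        enclosed += p
        if enclosed >= level * total:
            break
    return chosen

# Toy 1D posterior histogram: bin index -> probability
hist = {0: 0.05, 1: 0.10, 2: 0.40, 3: 0.30, 4: 0.15}
interval_68 = credible_interval_bins(hist, 0.68)
```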
SLIDE 13

Other variable combinations

  • Markov Chain samples all variables simultaneously
  • Can compare other combinations with no extra computing

[Figure: credible intervals, Data T2K+Reactor]

SLIDE 14

[Figure: 1σ/2σ/3σ credible intervals and the MaCh3 best fit in the δCP vs sin²θ23 plane; T2K Run 1-9d preliminary]

Other variable combinations

  • Markov Chain samples all variables simultaneously
  • Can compare other combinations with no extra computing

[Figure: credible intervals in the δCP vs sin²θ23 plane for Asimov B T2K+Reactor and Data T2K+Reactor]

SLIDE 15

Mass Hierarchy results

  • On T2K, for each step we assign a 50% probability that the proposal will be in the other hierarchy
  • With enough data this 50% is overcome by the data preference
  • The preference for each hierarchy is given by the fraction of steps that lie in each
  • Both hierarchies are run in a single fit
  • We also don't have to choose an octant
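A toy sketch of the hierarchy-flip proposal and the preference-by-step-fraction idea. The step rule is heavily simplified (only the sign of Δm²32 varies) and the posterior is invented for illustration.

```python
import random

random.seed(7)

def posterior(dm2):
    """Toy posterior over Delta m^2_32: the data mildly prefer dm2 > 0."""
    return 1.5 if dm2 > 0 else 1.0

dm2 = 2.5e-3
chain = []
for _ in range(20000):
    # 50% of proposals jump to the other hierarchy (flip the sign of dm2)
    proposal = -dm2 if random.random() < 0.5 else dm2
    # Metropolis accept/reject against the posterior ratio
    if random.random() < posterior(proposal) / posterior(dm2):
        dm2 = proposal
    chain.append(dm2)

# Hierarchy preference: the fraction of steps that lie in each hierarchy
p_normal = sum(1 for x in chain if x > 0) / len(chain)
```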

[Figure: posterior probability density vs Δm²32 (eV²), spanning both signs, with 1σ/2σ/3σ credible intervals; T2K Run 1-9d preliminary]

SLIDE 16

Priors

  • A Bayesian analysis requires a choice of prior (quite a few frequentist ones do too)
  • As long as the prior doesn't strongly favour a region with no steps, you can reweight an existing chain to change the prior
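Changing the prior by reweighting an existing chain can be sketched as follows, e.g. going from a prior flat in δCP to one flat in sin(δCP). The chain values are invented and the overall normalisation is ignored.

```python
import math

# An existing chain of delta_CP steps generated with a prior flat in
# delta_CP (toy values)
chain = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0]

def reweight_to_sin_prior(steps):
    """Reweight from a prior flat in delta_CP to one flat in sin(delta_CP):
    each step gets weight |d sin(delta)/d delta| = |cos(delta)|,
    up to an overall normalisation."""
    return [abs(math.cos(x)) for x in steps]

weights = reweight_to_sin_prior(chain)
```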

[Figure: posterior probability density vs sin(δCP), with 68%/90%/95.5%/99.7% credible intervals; T2K Run 1-9d preliminary]

SLIDE 17

Code Structure

  • The code is modular: any number of samples/parameters can be added to the fit (defined in the executable)
  • Sample spectrum generator: code that gives the expected distribution of a sample as a function of the parameters
  • Parameter tracker: calculates the systematic penalty terms and tells the spectrum generators what the parameter values are
  • We'd need to write DUNE sample spectrum generators: ~a few thousand lines of code per detector

[Diagram: the MaCh3 Markov Chain Fitter and other diagnostic executables call a Likelihood Calculator, which uses the Sample Spectrum Generators and the Parameter Value and Prior Penalty Tracker]
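A hypothetical Python sketch of this modular layout. MaCh3 itself is not this code: the class names, the linear sample response, and the Poisson likelihood form here are assumptions made purely for illustration.

```python
import math

class ParameterTracker:
    """Holds current parameter values and computes the prior penalty."""
    def __init__(self, values, priors):
        self.values = values   # current parameter values
        self.priors = priors   # (mean, sigma) per parameter

    def penalty(self):
        # Gaussian prior penalty (-2 log prior, up to a constant)
        return sum(((v - m) / s) ** 2
                   for v, (m, s) in zip(self.values, self.priors))

class SampleSpectrumGenerator:
    """Expected distribution of one sample as a function of the parameters."""
    def __init__(self, nominal):
        self.nominal = nominal

    def predict(self, values):
        # Toy response: the first parameter is a linear normalisation
        return [n * values[0] for n in self.nominal]

def neg_log_likelihood(data, generators, tracker):
    """-2 log L: a Poisson statistical term per bin plus the prior penalty."""
    nll = tracker.penalty()
    for gen in generators:
        for d, mc in zip(data, gen.predict(tracker.values)):
            nll += 2.0 * (mc - d + (d * math.log(d / mc) if d > 0 else 0.0))
    return nll
```

Adding a new detector would then amount to writing another spectrum-generator class and registering it with the fit, which matches the "~few thousand lines per detector" estimate above.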

SLIDE 18

Computing time needed

  • Determined by three things:
  • Time to perform a likelihood evaluation
  • Autocorrelation between parameters
  • Desired number of steps in the excluded region
  • Chains also take some time, called "burn-in", to start sampling properly from the likelihood if started at a random point


SLIDE 19

Computing time

  • Time per step for T2K is ~0.05 s for the ND and ~0.5 s for SK
  • We have ~750 parameters and 19 samples
  • Depends heavily on how the LLH evaluation is implemented
  • Autocorrelation is the number of steps before you have an uncorrelated sample from the likelihood
  • ~10,000 steps for T2K
  • Depends on the number of parameters and on the tuning of the step proposal function
  • Number of uncorrelated steps in the excluded region: for a result at X% significance, the statistical errors on the number of steps outside the interval must be small enough
  • Total time for all MaCh3 fits for the Nature 3σ results: 30,000 CPU hours
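The autocorrelation in question can be estimated directly from a chain; a sketch on a toy AR(1)-style chain whose lag-1 correlation is 0.9 by construction:

```python
import random

def autocorrelation(chain, lag):
    """Normalised autocorrelation of a 1D chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

# A correlated toy chain: x_{t+1} = 0.9 * x_t + noise, so successive
# steps are far from independent (like a poorly mixing MCMC)
random.seed(3)
chain = [0.0]
for _ in range(5000):
    chain.append(0.9 * chain[-1] + random.gauss(0.0, 1.0))

rho1 = autocorrelation(chain, 1)    # large: adjacent steps are correlated
rho50 = autocorrelation(chain, 50)  # small: ~uncorrelated at this lag
```

The lag at which the autocorrelation dies away sets how many steps separate effectively independent samples, hence the ~10,000-step figure quoted above for T2K.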


SLIDE 20

Scaling with parameters

  • Time to fit does increase with the number of parameters
  • Hamiltonian scaling is approximately linear, so not too bad, and Metropolis-Hastings is not much worse
  • Both increase less quickly than profile likelihood


SLIDE 21

Summary

  • Markov Chains provide significant efficiency improvements by targeting where throws are carried out (usually used for Bayesian results)
  • MaCh3 is a flexible analysis framework for Bayesian oscillation analyses
  • All current MaCh3 institutes are on DUNE
  • The three most experienced developers are all DUNE collaborators, and several existing group members have expressed interest in DUNE-MaCh3
  • T2K has used it for inclusion of a high-dimensional, many-sample ND likelihood


SLIDE 22

Backup

SLIDE 23

T2K+reactor constraint (IH)

SLIDE 24

δCP split by hierarchy, T2K+reactor

[Figure panels: Inverted hierarchy, Normal hierarchy]

SLIDE 25

T2K data only disappearance parameters


SLIDE 26

Simulated data method

  • Check robustness of results to the neutrino interaction model by using our model to fit "simulated data"
  • Simulated data are generated in two ways:
  1. 'Data-driven': inflate one interaction mode to account for differences between the current model prediction and existing data
  2. Model choices: generate data using other models that are implemented in the generator but not used in the oscillation analysis, and refit
  • Fit the simulated ND data and propagate the constraint to SK
  • Fit the SK simulated data using the ND-constrained cross-section model
  • Compare the fit to simulated data with the nominal-model Asimov
  • If getting the interaction model wrong leads to significantly different constraints: further investigation
