  1. MaCh3 and Bayesian Analysis Patrick Dunne

  2. Outline
  • Introduce T2K method for analysis
  • How to interpret Bayesian results
  • Describe MaCh3 framework
  • How do you put a new detector in?
  • How long does it take to run?

  3. T2K oscillation parameter fit
  • Apply oscillation effects to Monte Carlo as a function of true Eν (see the sketch below)
  • Construct model to predict event rates and distributions at near and far detectors
  • Need to ensure experiment can constrain non-oscillation elements of model
  • Cross-section model highly dependent on nuclear effects
  • Do this by fitting, rather than by directly extrapolating the ND distribution to the FD distribution
  [Diagram: the flux model, cross-section model, and interaction rates feed the unoscillated Monte Carlo event rate and distribution; combined with the near/far detector models, detector uncertainties, and the oscillation probability model (oscillation parameters), these produce the near detector and far detector predictions.]
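
  A minimal sketch of the first bullet, under a two-flavour approximation chosen here purely for illustration (the actual analysis uses the full three-flavour probability): each unoscillated MC event gets a weight given by the oscillation probability at its true neutrino energy.

  ```python
  # Sketch (assumption: two-flavour nu_mu survival for illustration only).
  import numpy as np

  def survival_prob(e_nu_gev, sin2_2theta23=1.0, dm2_32_ev2=2.5e-3,
                    length_km=295.0):
      """Two-flavour nu_mu survival probability at baseline L (T2K: 295 km)."""
      return 1.0 - sin2_2theta23 * np.sin(
          1.267 * dm2_32_ev2 * length_km / e_nu_gev) ** 2

  rng = np.random.default_rng(0)
  e_nu = rng.uniform(0.2, 3.0, 100_000)   # toy unoscillated MC (true E_nu, GeV)
  weights = survival_prob(e_nu)           # per-event oscillation weight
  oscillated, edges = np.histogram(e_nu, bins=50, weights=weights)
  ```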

  4. Reweighting
  • MaCh3 has full access to event-by-event kinematic information
  • Enables reweights with functional forms on any event variable (see the sketch below)
  • Also enables shifts of variables: e.g. a removal-energy shift can actually change the event momentum and put the event in a different bin
  • We also have standard bin-by-bin (separated by mode, flavor) response functions and linear normalisations implemented
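
  A toy sketch (not MaCh3 code; all names are illustrative) of the two reweighting styles described above: a functional-form weight evaluated on an event variable, and a variable shift that can move an event into a different bin.

  ```python
  # Sketch (assumption): functional-form weight vs. shift-type systematic.
  import numpy as np

  rng = np.random.default_rng(42)

  # Toy events: true neutrino energy and reconstructed muon momentum (GeV).
  n_events = 10_000
  e_nu = rng.uniform(0.2, 3.0, n_events)
  p_mu = e_nu * rng.uniform(0.5, 0.9, n_events)

  def functional_weight(e_nu, strength):
      """Functional-form reweight on any event variable (here true E_nu)."""
      return 1.0 + strength * np.exp(-((e_nu - 0.6) / 0.3) ** 2)

  def shift_removal_energy(p_mu, delta_eb):
      """Shift-type systematic: changing the removal energy moves the event's
      momentum, so the event can land in a different analysis bin."""
      return np.clip(p_mu - delta_eb, 0.0, None)

  bins = np.linspace(0.0, 3.0, 31)
  nominal, _ = np.histogram(p_mu, bins=bins)
  weighted, _ = np.histogram(p_mu, bins=bins,
                             weights=functional_weight(e_nu, 0.2))
  shifted, _ = np.histogram(shift_removal_energy(p_mu, 0.025), bins=bins)
  ```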

  5. Fitting to data
  [Diagram: hadron production data and INGRID/beam monitor data feed the flux model; ND280 data and external cross-section data feed the cross-section model; these, with the ND280 detector model, enter the near detector fit. Super-K data samples, the Super-K detector model, and Super-K atmospheric data enter the far detector fit, which constrains the oscillation parameters.]
  Two approaches used by T2K for fitting:
  1. Use the ND data fit to constrain the flux and cross-section models first, then fit the far detector
    • Computationally easier
    • Makes more assumptions
  2. Perform a simultaneous fit of both detectors
    • Computationally more demanding
    • Makes fewer assumptions

  6. Fitting to data
  [Diagram: as on the previous slide, but with a single simultaneous fit spanning the flux, cross-section, detector, and oscillation models of both detectors.]
  Two approaches used for fitting:
  1. Use the ND data fit to constrain the flux and cross-section models first, then fit the far detector
    • Computationally easier
    • Makes more assumptions
  2. Perform a simultaneous fit of both detectors (MaCh3 does this)
    • Computationally more demanding
    • Makes fewer assumptions

  7. T2K analyses
  • T2K has three separate analysis frameworks: two fit the near detector first and propagate, one does a joint fit
  • The joint fit analysis is Bayesian; of the separate fitters, one is frequentist and the other is a mix
  • All three are able to construct frequentist confidence intervals for comparisons
  • Very good agreement is seen; cross-validation is highly useful for debugging
  • The Bayesian analysis shows posterior probability density (high values mean this is more likely to be the "correct" parameter value); the frequentist analyses show Δχ² (low values mean better agreement with the data for this parameter value)

  8. Dealing with nuisance parameters
  • Likelihood has >750 parameters, but we want plots in ≤2 of them at once
  • Two main options:
    • Profiling: pick the values of the nuisance parameters that maximise the likelihood for each set of values of the parameters of interest
    • Marginalisation: integrate over the nuisance parameters (Bayesian, so MaCh3 does this); see the sketch below
  • Also, finding the maximum likelihood point for given oscillation parameter values is hard in 750 dimensions
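
  A toy illustration (an assumption, not MaCh3 code) of the difference between the two options, on a 2D likelihood surface with one parameter of interest and one correlated nuisance parameter.

  ```python
  # Sketch: profiling takes the max over the nuisance axis; marginalising
  # integrates over it instead.
  import numpy as np

  theta = np.linspace(-3, 3, 200)   # parameter of interest
  nu = np.linspace(-3, 3, 200)      # nuisance parameter
  T, N = np.meshgrid(theta, nu, indexing="ij")

  # Toy likelihood with correlation between theta and nu.
  log_l = -0.5 * (T**2 + N**2 + 1.2 * T * N)
  likelihood = np.exp(log_l)

  # Profiling: for each theta, the maximum over the nuisance axis.
  profiled = likelihood.max(axis=1)

  # Marginalising: integrate (sum) over the nuisance axis instead.
  marginalised = likelihood.sum(axis=1)
  marginalised /= marginalised.sum()  # normalise to a posterior (flat prior)
  ```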

  9. MCMC vs grid search
  • Other T2K analyses use random throws of nuisance parameters from covariance matrices to marginalise (sketched below)
  • Then do a grid search in 1D/2D, calculating the average Δχ² across the ensemble of marginalisation throws
  • Use Feldman-Cousins to find critical Δχ² values for δCP
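
  A sketch (an assumption, using the same toy likelihood shape as above) of the throw-based approach: at each grid point of the parameter of interest, average the likelihood over an ensemble of nuisance throws drawn from a covariance matrix.

  ```python
  # Sketch: grid search with throw-based marginalisation.
  import numpy as np

  rng = np.random.default_rng(4)

  def log_likelihood(theta, nu):
      return -0.5 * (theta**2 + nu**2 + 1.2 * theta * nu)  # toy correlated LLH

  # Throws of the (here 1D) nuisance parameter from its covariance matrix.
  nu_throws = rng.multivariate_normal([0.0], [[1.0]], size=2_000)[:, 0]

  theta_grid = np.linspace(-3, 3, 121)
  # Average likelihood over the throw ensemble at each grid point:
  marginal = np.array([np.mean(np.exp(log_likelihood(t, nu_throws)))
                       for t in theta_grid])
  delta_chi2 = -2.0 * (np.log(marginal) - np.log(marginal.max()))
  ```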

  10. MCMC vs grid search
  • MaCh3 samples the likelihood space with Markov Chain MC (a minimal sketch follows)
  • The rule for stepping in parameter space ensures the distribution of parameter values is proportional to the marginalised posterior probability
  • Targets likelihood evaluations in regions of space where the likelihood is high
  • Several algorithms to choose from, e.g. Metropolis-Hastings or Hamiltonian
  • MaCh3 currently uses Metropolis-Hastings and is upgrading to Hamiltonian
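
  A minimal Metropolis-Hastings sketch (illustrative, not the MaCh3 implementation): the accept/reject rule is what makes the density of steps proportional to the posterior, so histogramming the steps marginalises for free.

  ```python
  # Sketch: textbook Metropolis-Hastings with a Gaussian proposal.
  import numpy as np

  def metropolis_hastings(log_posterior, x0, step_cov, n_steps, seed=0):
      rng = np.random.default_rng(seed)
      x = np.asarray(x0, dtype=float)
      log_p = log_posterior(x)
      chain = np.empty((n_steps, x.size))
      for i in range(n_steps):
          # Propose a step from a multivariate Gaussian around the current point.
          proposal = rng.multivariate_normal(x, step_cov)
          log_p_new = log_posterior(proposal)
          # Accept with probability min(1, p_new / p_current).
          if np.log(rng.uniform()) < log_p_new - log_p:
              x, log_p = proposal, log_p_new
          chain[i] = x  # on rejection the current point is repeated
      return chain

  # Example: sample a correlated 2D Gaussian posterior.
  cov = np.array([[1.0, 0.6], [0.6, 1.0]])
  inv_cov = np.linalg.inv(cov)
  log_post = lambda x: -0.5 * x @ inv_cov @ x
  chain = metropolis_hastings(log_post, [0.0, 0.0], 0.5 * np.eye(2), 50_000)
  ```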

  11. MCMC vs grid search
  • Output of the MCMC is a large number of 'steps'
  • Each step is a vector of the values of all parameters at that step
  • Creating 1D/2D histograms of any combination of parameters gives the posterior probability for those parameters (see below)
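
  A sketch continuing from the toy chain in the Metropolis-Hastings example above: posterior estimates are simply histograms of the chain in whichever parameters you care about.

  ```python
  # Sketch: turn chain steps into 1D/2D posterior histograms.
  import numpy as np

  burn_in = 5_000  # discard steps from before the chain has settled
  samples = chain[burn_in:]

  # 1D marginal posterior of parameter 0, and 2D joint of parameters (0, 1):
  post_1d, edges = np.histogram(samples[:, 0], bins=60, density=True)
  post_2d, xe, ye = np.histogram2d(samples[:, 0], samples[:, 1], bins=60,
                                   density=True)
  ```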

  12. Appearance parameter constraints
  • Make contours by taking the bins with the most steps (i.e. highest probability) until 68/95/99.73…% of the probability is inside your contour (see the sketch below)
  • Don't get a multidimensional best fit, but do get the highest-probability bin per parameter
  [Plots: posterior probability density with 1σ/2σ/3σ (68%/90%) credible intervals and the MaCh3 best fit in δCP (rad.) and sin²θ13; T2K Run 1-9d preliminary.]
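
  A sketch (an assumption, not MaCh3 code) of the highest-posterior-density contour rule described above: rank histogram bins by probability and keep the most probable bins until the requested fraction of the posterior is enclosed.

  ```python
  # Sketch: highest-posterior-density credible region from a chain histogram.
  import numpy as np

  def hpd_mask(hist, credibility=0.68):
      """Return a boolean mask of histogram bins inside the credible region."""
      p = hist / hist.sum()
      order = np.argsort(p, axis=None)[::-1]        # bins, most probable first
      cumulative = np.cumsum(p.ravel()[order])
      n_keep = np.searchsorted(cumulative, credibility) + 1
      mask = np.zeros(p.size, dtype=bool)
      mask[order[:n_keep]] = True
      return mask.reshape(hist.shape)

  # Example with the 2D chain from the Metropolis-Hastings sketch above:
  counts, xe, ye = np.histogram2d(chain[:, 0], chain[:, 1], bins=60)
  inside_68 = hpd_mask(counts, 0.68)  # True for bins inside the 68% contour
  ```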

  13. Other variable combinations
  • Markov Chain samples all variables simultaneously
  • Can compare other combinations with no extra computing
  [Plot: data, T2K+reactor.]

  14. Other variable combinations
  • Markov Chain samples all variables simultaneously
  • Can compare other combinations with no extra computing
  [Plots: Asimov B and data, T2K+reactor; δCP vs sin²θ23 with 1σ/2σ/3σ credible intervals and the MaCh3 best fit; T2K Run 1-9d preliminary.]

  15. Mass Hierarchy results
  • On T2K, for each step we assign a 50% probability that the proposal will be in the other hierarchy
  • With enough data this 50% is overcome by the data's preference
  • Preference for each hierarchy is given by the fraction of steps that lie in each (see below)
  • Both hierarchies run in a single fit
  • Also don't have to choose the octant
  [Plot: posterior probability density in Δm²₃₂ (eV²) with 1σ/2σ/3σ credible intervals, spanning both hierarchies; T2K Run 1-9d preliminary.]
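
  A sketch (an assumption, on a toy chain) of the counting described above: the hierarchy preference is just the fraction of post-burn-in steps with each sign of Δm²₃₂.

  ```python
  # Sketch: hierarchy preference as a fraction of chain steps.
  import numpy as np

  rng = np.random.default_rng(1)
  # Toy stand-in for the chain's dm2_32 values (eV^2), mixing both hierarchies.
  dm2_32 = np.where(rng.uniform(size=100_000) < 0.8,
                    rng.normal(2.5e-3, 1e-4, 100_000),    # normal hierarchy
                    rng.normal(-2.5e-3, 1e-4, 100_000))   # inverted hierarchy

  p_normal = np.mean(dm2_32 > 0)
  p_inverted = 1.0 - p_normal
  bayes_factor = p_normal / p_inverted  # relative posterior odds (equal priors)
  ```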

  16. Priors
  • Bayesian analysis requires a choice of prior (quite a few frequentist analyses do too)
  • As long as the prior doesn't strongly favour a region with no steps, you can reweight an existing chain to change the prior (sketched below)
  [Plot: posterior probability density in sin(δCP) with 68%/90%/95.5%/99.7% credible intervals; T2K Run 1-9d preliminary.]
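
  A sketch (an assumption; the choice of prior transformation here is only an example) of reweighting an existing chain to a new prior: weight each step by the ratio of new to old prior density, then histogram with those weights. Here: a chain made flat in δCP is reweighted to be flat in sin(δCP).

  ```python
  # Sketch: change of prior by reweighting chain steps.
  import numpy as np

  def reweight_to_sin_prior(delta_cp):
      """Weights converting a chain made with a flat prior in delta_CP into
      one with a flat prior in sin(delta_CP) (ratio of prior densities)."""
      # Flat in sin(delta) has density |d sin(delta)/d delta| = |cos(delta)|.
      return np.abs(np.cos(delta_cp))

  rng = np.random.default_rng(2)
  delta_cp = rng.uniform(-np.pi, np.pi, 100_000)  # toy stand-in for chain steps
  weights = reweight_to_sin_prior(delta_cp)
  posterior, edges = np.histogram(np.sin(delta_cp), bins=40, weights=weights,
                                  density=True)
  ```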

  17. Code Structure
  • Code is modular; can add any number of samples/parameters to the fit (defined in the executable); a toy sketch follows
  • Sample spectrum generator: code that gives the expected distribution of a sample as a function of the parameters
  • Parameter tracker: calculates the systematic penalty terms and tells the spectrum generators what the parameter values are
  • We'd need to make DUNE sample spectrum generators: ~few thousand lines of code per detector
  [Diagram: the MaCh3 Markov chain fitter and other diagnostic executables call a likelihood calculator, which is fed by the parameter value and prior penalty tracker and by the sample spectrum generators.]
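
  A structural sketch (the class and function names are illustrative assumptions, not MaCh3's actual classes) of the modularity described above: the fitter only sees a likelihood built from interchangeable spectrum generators plus a parameter tracker, so adding a detector means adding one more generator.

  ```python
  # Sketch: modular likelihood = sample spectrum generators + parameter tracker.
  import numpy as np

  class ParameterTracker:
      """Holds current parameter values and computes the prior penalty."""
      def __init__(self, central, cov):
          self.values = np.array(central, dtype=float)
          self.central = np.array(central, dtype=float)
          self.inv_cov = np.linalg.inv(cov)

      def penalty(self):
          d = self.values - self.central
          return 0.5 * d @ self.inv_cov @ d  # Gaussian systematic penalty

  class SampleSpectrumGenerator:
      """Expected distribution of one sample as a function of the parameters."""
      def __init__(self, nominal):
          self.nominal = np.array(nominal, dtype=float)

      def predict(self, params):
          return self.nominal * (1.0 + params[0])  # toy normalisation response

  def neg_log_likelihood(tracker, generators, data_hists):
      # Poisson likelihood over all samples plus the systematic penalty.
      nll = tracker.penalty()
      for gen, data in zip(generators, data_hists):
          mu = np.clip(gen.predict(tracker.values), 1e-10, None)
          nll += np.sum(mu - data * np.log(mu))
      return nll

  # Adding a new detector (e.g. a DUNE sample) is just one more generator:
  tracker = ParameterTracker([0.0], [[0.01]])
  generators = [SampleSpectrumGenerator([100.0, 50.0]),
                SampleSpectrumGenerator([30.0, 20.0])]
  data = [np.array([105.0, 48.0]), np.array([28.0, 22.0])]
  print(neg_log_likelihood(tracker, generators, data))
  ```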

  18. Computing time needed
  • Determined by three things:
    • Time to perform a likelihood evaluation
    • Autocorrelation between parameters
    • Desired number of steps in the excluded region
  • Chains also take some time to start properly sampling from the likelihood if started at a random point, called "burn-in"

  19. Computing time
  • Time per step for T2K is ~0.05 s for the ND and 0.5 s for SK
    • We have ~750 parameters, 19 samples
    • Depends heavily on how the LLH evaluation is implemented
  • Autocorrelation is the number of steps before you have an uncorrelated sample from the likelihood (sketched below)
    • ~10,000 steps for T2K
    • Depends on the number of parameters and the tuning of the step proposal function
  • Number of uncorrelated steps in the excluded region
    • For a result at X% significance, the statistical errors on the number of steps outside the interval must be small enough
  • Total time for all MaCh3 fits for the Nature 3σ results: 30,000 CPU hours
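
  A sketch (an assumption, not MaCh3's diagnostics) of estimating the integrated autocorrelation time: the number of steps between effectively independent samples, which sets how long the chain needs to be.

  ```python
  # Sketch: integrated autocorrelation time and effective sample size.
  import numpy as np

  def autocorrelation_time(x, max_lag=20_000):
      """Integrated autocorrelation time of a 1D chain of parameter values."""
      x = np.asarray(x, dtype=float) - np.mean(x)
      var = np.mean(x * x)
      tau = 1.0
      for lag in range(1, min(max_lag, len(x) - 1)):
          rho = np.mean(x[:-lag] * x[lag:]) / var
          if rho < 0.05:  # truncate once the correlation has died away
              break
          tau += 2.0 * rho
      return tau

  # Toy AR(1) chain as a stand-in for one parameter's MCMC trace:
  rng = np.random.default_rng(3)
  x = np.empty(100_000)
  x[0] = 0.0
  for i in range(1, len(x)):
      x[i] = 0.99 * x[i - 1] + rng.normal()

  tau = autocorrelation_time(x)
  n_effective = len(x) / tau  # independent samples in the chain
  ```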

  20. Scaling with parameters
  • Time to fit does increase with the number of parameters
  • Hamiltonian scaling is approximately linear, so not too bad, and Metropolis-Hastings is not much worse
  • Both increase less quickly than profile likelihood
