87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS - PowerPoint PPT Presentation

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1

RECAP • We talk about – Data Modelling (a scientific narrative…) – Conceptual building blocks for modeling in High En. Phys. •  Documentation: – K. Cranmer, «Practical Statistic for the LHC», https://arxiv.org/abs/1503.07622 • “Introduction” • “Conceptual blocks for modeling” • Topics discussed during past examinations (just examples and not exhaustive): – Terminology: Ensamble, Experiment, Channel, Observable, Parameters of Interest, Nuisance, Distribution, ... – The Marked Poisson Model – Extended Likelihood, .. 2

RECAP • We talk about – Modeling with RooFit – RooFit Workspace •  Documentation: http://root.cern.ch/drupal/content/roofit – Quick start guide (20 pages) (Includes Workspace & Factory – Users Manual – RooFit macros • root.cern.ch → documentation → tutorials → roofit • There are over 80 macros illustrating many aspects of RooFit functionality • Topics discussed during past examinations (just examples and not exhaustive): – difference between variables, observables, parameters – difference between binned and unbinned datasets, … – sometimes we asked something like which Roofit class do you use for variables, observables, parameters, datasets 3

Tip of the week http://www.isocpp.org Teach yourself Modern C++ • Stack-based scope instead of heap or static global scope. • Auto type inference instead of explicit type names. • Smart pointers instead of raw pointers. • std::string instead of raw char[] arrays. • Standard template library (STL) containers like vector, list, and map instead of raw arrays or custom containers. See <vector>, <list>, and <map>. • STL algorithms instead of manually coded ones. • Inline lambda functions instead of small functions implemented separately. • Range-based for loops to write more robust loops that work with arrays, STL containers in the form for ( for-range-declaration : expression ). • Exceptions, to report and handle error conditions. • Lock-free inter-thread communication using STL std::atomic<> (see <atomic>) instead of other inter-thread communication mechanisms. 4

Further about C++ Programming • Force yourself coding with a style: ex: Google C++ Style Guide https://google.github.io/styleguide/cppguide.html – install clang-format or astyle • Set up a repository. Always. Also for small project. – – Simple, powerful Git GUI: SourceTree • boost C++ libraries: http://www.boost.org/ – Lot of work already done for you !!! 5

 RECAP… Tutorial [fit gaus model to unbinned dataset] Edit the macro roofit_empty.cpp and, following the comments inside, create a Gaussian p.d.f. with mean = 0, and sigma = 1. Change the sigma to 3. Visualize the p.d.f. . Generate an unbinned dataset of 10000 events. Make a Fit with Maximum Likelihood. Visualize the results. Tips: Use information from the slides shown during the lecture or from RooFit Manual at par 2 (Submit ex11.cxx and the image of the canvas in png format) Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 6

ROOT RooFit

 ROOT vs RooFit (2) type of dataset ROOT fit after booking and filling an histogram ( binned data ). ROOT ROOFIT fit using the whole dataset ( unbinned data ) . Fit method (default) ROOT Least Squares, 𝑦 2 ROOFIT maximum likelihood, − log ℒ can be done also with a binned dataset Error bars ROOT by default gaussian approximation √𝑜 ROOFIT ROOFIT exact limits based on Poisson distribution (asymmetric bars when 𝑜 is small !!) Axis Label left to user  …. ROOT ROOFIT X taken from the variable definition Y is automatically: ex Events / BIN SIZE !!!! Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 8

void rooofit_ex1 () ROOT ROOFIT void recap_root() { { // Setup model // define a gaussian p.d.f (mean 1 and sigma 1) // ============ // gaus(0) is a substitute for [0]*exp(-0.5*((x-[1])/[2])**2) // Declare variables x,mean,sigma with associated name, title, initial // and (0) means start numbering parameters at 0. value and allowed range auto g1 = new TF1{"gaus_1", "gaus(0)", -10, 10 }; RooRealVar x( "x" , "x" ,-10,10) ; double mean = 1 ; RooRealVar mean( "mean" , "mean of gaussian" ,1,-10,10) ; double sigma = 1 ; RooRealVar sigma( "sigma" , "width of gaussian" ,1,0.1,10) ; g1->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Build gaussian p.d.f in terms of x,mean and sigma g1->SetParameter( 1, mean ) ; // set mean RooGaussian gauss( "gauss" , "gaussian PDF" ,x,mean,sigma) ; g1->SetParameter( 2, sigma) ; // set sigma // Construct plot frame in 'x' g1->SetLineColor( kBlue ) ; RooPlot* xframe = x.frame(Title( "Gaussian p.d.f." )) ; // change sigma to 3 // Plot model and change parameter values TF1* g2 = new TF1{"gaus_2", "gaus(0)", -10, 10 }; // ======================================= mean = 1 ; // Plot gauss in frame (i.e. in x) sigma = 3 ; gauss.plotOn(xframe) ; g2->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Change the value of sigma to 3 g2->SetParameter( 1, mean ) ; // set mean sigma.setVal(3) ; g2->SetParameter( 2, sigma) ; // set sigma // Plot gauss in frame (i.e. in x) and draw frame on canvas g2->SetLineColor( kRed ) ; gauss.plotOn(xframe,LineColor(kRed)) ; // Book histograms // Generate events auto h_data = // =============== new TH1D{"h_data", "gaussian random numbers", 100, -10, 10}; // Generate a dataset of 1000 events in x from gauss RooDataSet* data = gauss.generate(x,1000) ; // Create a TRandom3 object to generate random numbers // Make a second plot frame in x and draw both the auto seed = 12345; // data and the p.d.f in the frame TRandom3 ran{seed}; RooPlot* xframe2 = x.frame(Title( "Gaussian p.d.f. with data" )) ; data->plotOn(xframe2) ; // Generate some random numbers and fill histograms gauss.plotOn(xframe2) ; auto const nvalues = 1000; // Fit model to data for (int i=0; i<nvalues; i++){ // ================== auto x = ran->Gaus(1,3); // gaussian in mean = 1 sigma = 3 // Fit pdf to data h_data->Fill(x); gauss.fitTo(*data) ; } // Print values of mean and sigma (that now reflect fitted values and errors) h_data->SetXTitle("x"); mean.Print() ; h_data->SetYTitle("f(x)"); sigma.Print() ; h_data->SetMarkerStyle(20); // Draw all frames on a canvas h_data->Fit("gaus") ; TCanvas* c = new TCanvas( "rf101_basics" , "rf101_basics" ,800,400) ; c->Divide(2) ; // Draw the canvas c->cd(1) ; c = new TCanvas("c1","c1",800,400) ; gPad->SetLeftMargin(0.15) ; c->Divide(2,1) ; xframe->GetYaxis()->SetTitleOffset(1.6) ; c->cd(1); xframe->Draw() ; g1->Draw() ; c->cd(2) ; g2->Draw("SAME") ; gPad->SetLeftMargin(0.15) ; c->cd(2); xframe2->GetYaxis()->SetTitleOffset(1.6) ; h_data->Draw("E1"); xframe2->Draw() ; } }

RECAP • We talk about – Composite Models – Common Fitting Errors: • Understanding MINUIT output • Instabilities and correlation coefficients •  Documentation: http://root.cern.ch/drupal/content/roofit – Composite Models • RooFit User Manual pp17-28 – Common Fitting Errors • Slides • F. James, Interpretation of the errors on parameters as given by MINUIT http://lmu.web.psi.ch/docu/manuals/software_manuals/minuit2/mnerror.pdf • Topics discussed during past examinations (just examples and not exhaustive): – The p.d.f. of a generic extended (signal+background) composite model – The 4 MINUIT algorithms – The difference MINOS vs MIGRAD/HESSE – The theory behind the error computation by HESSE – MINOS and Profile Likelihood Interval – How to recognize/Which strategy to mitigate fit stabilities problems 10

Tip of the week http://root.cern ROOT has a modern tool to manipulate and analyze datasets RDataFrame offers a high level interface for analyses of data stored in TTrees, CSV files and other data formats. In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently. In a nutshell: ROOT::EnableImplicitMT(); // Tell ROOT you want to go parallel ROOT::RDataFrame d("myTree", "file_*.root"); // Interface to TTree and CSV auto d2 = d.Filter( “x>0" ) // only accept events for which x>0 .Define(“r2”,”x* x+y *y”) ; // define a custom variable r2 auto myHisto = d2.Histo1D( “r2" ); // This happens in parallel! myHisto->Draw(); Lazy execution guarantees that all operations are performed in one event loop less typing, better expressivity, … in place of explicity for-loops on TTree events (SetBranchAddress(..), GetEntry (..), …) Good for histograms, profiles, data reductions, user-defined operations on events. 11

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS - PowerPoint PPT Presentation

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1 RECAP We talk about Data Modelling (a scientific narrative) Conceptual building blocks for

Data and Analysis Part V Statistical Analysis of Data Alex Simpson Part V: Statistical Analysis

NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare

Nuclear Magnetic Nuclear Magnetic Nuclear Magnetic Nuclear Magnetic Resonance of Proteins

Chapter 17 Chapter 17 Nuclear Chemistry uclear Chemistry Nuclear Decay Nuclear

Data and Analysis Note 12 Statistical Analysis of Data I Alex Simpson Note 12 Statistical

NEEDS AND APPLICATION OF COVARIANCE DATA Go CHIBA Hokkaido university 1 Evaluated Nuclear Data

Nuclear Imaging Medical Imaging Medical Imaging Nuclear Imaging Nuclear Imaging Nuclear

Pakistan Nuclear Regulatory Authority 1 Contents Nuclear Regulatory Framework Nuclear

Nuclear Leadership World Nuclear University Summer Radiation Short Course Nuclear Nuclear

NUCLEAR WASTE IN FRANCE Nuclear Waste Disposal Storage of nuclear waste in France Two

Nuclear Weapons 101 Nuclear Smuggling p. Nuclear Weapons 101 Fissile materials ( 235 U , 233

Nuclear Weapons Nuclear Weapons They produce incredible temperatures They produce incredible

Nuclear data required for measurements of reactivity and nuclear material composition Nuclear

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

MPI fr Physik, Mnchen 2015 Nuclear Ground State Properties and their Importance for Nuclear

STA 214: Probability & Statistical Models STA 214: Analysis of Statistical Models

Four-dimensional homogeneous Lorentzian manifolds Giovanni Calvaruso 1 1 Universit del Salento,

Cleantech&Building 2017 Sharing experiences to shape the future Opening Seminar November

Russells Paradox and free zig zag solutions Ludovica Conti FINO- Northwestern Philosophy

Small footprint inspection techniques for Android Damien Cauquil, Pierre Jaury 29C3 December 29,

Introduction to the analysis of Cherenkov Telescope data From raw data to shower images Marcos

Virtualizzazione Corso di Sistemi Distribuiti e Cloud Computing A.A. 2019/20 Valeria Cardellini

Lo Strumento PMI ed il Fast Track to Innovation Bologna, 25 febbraio 2015 Struttura H2020

PROGRAMMAZIONE PROCEDURALE A.A. 2020/2021 TYPE CONVERSIONS CONVERSIONS Operands of different

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS - PowerPoint PPT Presentation

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1 RECAP We talk about Data Modelling (a scientific narrative) Conceptual building blocks for

Data and Analysis Part V Statistical Analysis of Data Alex Simpson Part V: Statistical Analysis

NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare

Nuclear Magnetic Nuclear Magnetic Nuclear Magnetic Nuclear Magnetic Resonance of Proteins

Chapter 17 Chapter 17 Nuclear Chemistry uclear Chemistry Nuclear Decay Nuclear

Data and Analysis Note 12 Statistical Analysis of Data I Alex Simpson Note 12 Statistical

NEEDS AND APPLICATION OF COVARIANCE DATA Go CHIBA Hokkaido university 1 Evaluated Nuclear Data

Nuclear Imaging Medical Imaging Medical Imaging Nuclear Imaging Nuclear Imaging Nuclear

Pakistan Nuclear Regulatory Authority 1 Contents Nuclear Regulatory Framework Nuclear

Nuclear Leadership World Nuclear University Summer Radiation Short Course Nuclear Nuclear

NUCLEAR WASTE IN FRANCE Nuclear Waste Disposal Storage of nuclear waste in France Two

Nuclear Weapons 101 Nuclear Smuggling p. Nuclear Weapons 101 Fissile materials ( 235 U , 233

Nuclear Weapons Nuclear Weapons They produce incredible temperatures They produce incredible

Nuclear data required for measurements of reactivity and nuclear material composition Nuclear

Statistical Statistical Statistical Model Statistical Model Model Checking Model Checking

MPI fr Physik, Mnchen 2015 Nuclear Ground State Properties and their Importance for Nuclear

STA 214: Probability &amp; Statistical Models STA 214: Analysis of Statistical Models

Four-dimensional homogeneous Lorentzian manifolds Giovanni Calvaruso 1 1 Universit del Salento,

Cleantech&amp;Building 2017 Sharing experiences to shape the future Opening Seminar November

Russells Paradox and free zig zag solutions Ludovica Conti FINO- Northwestern Philosophy

Small footprint inspection techniques for Android Damien Cauquil, Pierre Jaury 29C3 December 29,

Introduction to the analysis of Cherenkov Telescope data From raw data to shower images Marcos

Virtualizzazione Corso di Sistemi Distribuiti e Cloud Computing A.A. 2019/20 Valeria Cardellini

Lo Strumento PMI ed il Fast Track to Innovation Bologna, 25 febbraio 2015 Struttura H2020

PROGRAMMAZIONE PROCEDURALE A.A. 2020/2021 TYPE CONVERSIONS CONVERSIONS Operands of different

STA 214: Probability & Statistical Models STA 214: Analysis of Statistical Models

Cleantech&Building 2017 Sharing experiences to shape the future Opening Seminar November