87944 statistical data analysis for nuclear and
play

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS - PowerPoint PPT Presentation

87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1 RECAP We talk about Data Modelling (a scientific narrative) Conceptual building blocks for


  1. 87944 - STATISTICAL DATA ANALYSIS FOR NUCLEAR AND SUBNUCLEAR PHYSICS [Module 3] Gabriele Sirri Istituto Nazionale di Fisica Nucleare 1

  2. RECAP • We talk about – Data Modelling (a scientific narrative…) – Conceptual building blocks for modeling in High En. Phys. •  Documentation: – K. Cranmer, «Practical Statistic for the LHC», https://arxiv.org/abs/1503.07622 • “Introduction” • “Conceptual blocks for modeling” • Topics discussed during past examinations (just examples and not exhaustive): – Terminology: Ensamble, Experiment, Channel, Observable, Parameters of Interest, Nuisance, Distribution, ... – The Marked Poisson Model – Extended Likelihood, .. 2

  3. RECAP • We talk about – Modeling with RooFit – RooFit Workspace •  Documentation: http://root.cern.ch/drupal/content/roofit – Quick start guide (20 pages) (Includes Workspace & Factory – Users Manual – RooFit macros • root.cern.ch → documentation → tutorials → roofit • There are over 80 macros illustrating many aspects of RooFit functionality • Topics discussed during past examinations (just examples and not exhaustive): – difference between variables, observables, parameters – difference between binned and unbinned datasets, … – sometimes we asked something like which Roofit class do you use for variables, observables, parameters, datasets 3

  4. Tip of the week http://www.isocpp.org Teach yourself Modern C++ • Stack-based scope instead of heap or static global scope. • Auto type inference instead of explicit type names. • Smart pointers instead of raw pointers. • std::string instead of raw char[] arrays. • Standard template library (STL) containers like vector, list, and map instead of raw arrays or custom containers. See <vector>, <list>, and <map>. • STL algorithms instead of manually coded ones. • Inline lambda functions instead of small functions implemented separately. • Range-based for loops to write more robust loops that work with arrays, STL containers in the form for ( for-range-declaration : expression ). • Exceptions, to report and handle error conditions. • Lock-free inter-thread communication using STL std::atomic<> (see <atomic>) instead of other inter-thread communication mechanisms. 4

  5. Further about C++ Programming • Force yourself coding with a style: ex: Google C++ Style Guide https://google.github.io/styleguide/cppguide.html – install clang-format or astyle • Set up a repository. Always. Also for small project. – – Simple, powerful Git GUI: SourceTree • boost C++ libraries: http://www.boost.org/ – Lot of work already done for you !!! 5

  6.  RECAP… Tutorial [fit gaus model to unbinned dataset] Edit the macro roofit_empty.cpp and, following the comments inside, create a Gaussian p.d.f. with mean = 0, and sigma = 1. Change the sigma to 3. Visualize the p.d.f. . Generate an unbinned dataset of 10000 events. Make a Fit with Maximum Likelihood. Visualize the results. Tips: Use information from the slides shown during the lecture or from RooFit Manual at par 2 (Submit ex11.cxx and the image of the canvas in png format) Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 6

  7. ROOT RooFit

  8.  ROOT vs RooFit (2) type of dataset ROOT fit after booking and filling an histogram ( binned data ). ROOT ROOFIT fit using the whole dataset ( unbinned data ) . Fit method (default) ROOT Least Squares, 𝑦 2 ROOFIT maximum likelihood, − log ℒ can be done also with a binned dataset Error bars ROOT by default gaussian approximation √𝑜 ROOFIT ROOFIT exact limits based on Poisson distribution (asymmetric bars when 𝑜 is small !!) Axis Label left to user  …. ROOT ROOFIT X taken from the variable definition Y is automatically: ex Events / BIN SIZE !!!! Laboratorio Analisi Statistica dei Dati per HEP - G.Sirri 8

  9. void rooofit_ex1 () ROOT ROOFIT void recap_root() { { // Setup model // define a gaussian p.d.f (mean 1 and sigma 1) // ============ // gaus(0) is a substitute for [0]*exp(-0.5*((x-[1])/[2])**2) // Declare variables x,mean,sigma with associated name, title, initial // and (0) means start numbering parameters at 0. value and allowed range auto g1 = new TF1{"gaus_1", "gaus(0)", -10, 10 }; RooRealVar x( "x" , "x" ,-10,10) ; double mean = 1 ; RooRealVar mean( "mean" , "mean of gaussian" ,1,-10,10) ; double sigma = 1 ; RooRealVar sigma( "sigma" , "width of gaussian" ,1,0.1,10) ; g1->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Build gaussian p.d.f in terms of x,mean and sigma g1->SetParameter( 1, mean ) ; // set mean RooGaussian gauss( "gauss" , "gaussian PDF" ,x,mean,sigma) ; g1->SetParameter( 2, sigma) ; // set sigma // Construct plot frame in 'x' g1->SetLineColor( kBlue ) ; RooPlot* xframe = x.frame(Title( "Gaussian p.d.f." )) ; // change sigma to 3 // Plot model and change parameter values TF1* g2 = new TF1{"gaus_2", "gaus(0)", -10, 10 }; // ======================================= mean = 1 ; // Plot gauss in frame (i.e. in x) sigma = 3 ; gauss.plotOn(xframe) ; g2->SetParameter( 0, 1./( sigma * sqrt(2* 3.1415926 ) ) ); // Change the value of sigma to 3 g2->SetParameter( 1, mean ) ; // set mean sigma.setVal(3) ; g2->SetParameter( 2, sigma) ; // set sigma // Plot gauss in frame (i.e. in x) and draw frame on canvas g2->SetLineColor( kRed ) ; gauss.plotOn(xframe,LineColor(kRed)) ; // Book histograms // Generate events auto h_data = // =============== new TH1D{"h_data", "gaussian random numbers", 100, -10, 10}; // Generate a dataset of 1000 events in x from gauss RooDataSet* data = gauss.generate(x,1000) ; // Create a TRandom3 object to generate random numbers // Make a second plot frame in x and draw both the auto seed = 12345; // data and the p.d.f in the frame TRandom3 ran{seed}; RooPlot* xframe2 = x.frame(Title( "Gaussian p.d.f. with data" )) ; data->plotOn(xframe2) ; // Generate some random numbers and fill histograms gauss.plotOn(xframe2) ; auto const nvalues = 1000; // Fit model to data for (int i=0; i<nvalues; i++){ // ================== auto x = ran->Gaus(1,3); // gaussian in mean = 1 sigma = 3 // Fit pdf to data h_data->Fill(x); gauss.fitTo(*data) ; } // Print values of mean and sigma (that now reflect fitted values and errors) h_data->SetXTitle("x"); mean.Print() ; h_data->SetYTitle("f(x)"); sigma.Print() ; h_data->SetMarkerStyle(20); // Draw all frames on a canvas h_data->Fit("gaus") ; TCanvas* c = new TCanvas( "rf101_basics" , "rf101_basics" ,800,400) ; c->Divide(2) ; // Draw the canvas c->cd(1) ; c = new TCanvas("c1","c1",800,400) ; gPad->SetLeftMargin(0.15) ; c->Divide(2,1) ; xframe->GetYaxis()->SetTitleOffset(1.6) ; c->cd(1); xframe->Draw() ; g1->Draw() ; c->cd(2) ; g2->Draw("SAME") ; gPad->SetLeftMargin(0.15) ; c->cd(2); xframe2->GetYaxis()->SetTitleOffset(1.6) ; h_data->Draw("E1"); xframe2->Draw() ; } }

  10. RECAP • We talk about – Composite Models – Common Fitting Errors: • Understanding MINUIT output • Instabilities and correlation coefficients •  Documentation: http://root.cern.ch/drupal/content/roofit – Composite Models • RooFit User Manual pp17-28 – Common Fitting Errors • Slides • F. James, Interpretation of the errors on parameters as given by MINUIT http://lmu.web.psi.ch/docu/manuals/software_manuals/minuit2/mnerror.pdf • Topics discussed during past examinations (just examples and not exhaustive): – The p.d.f. of a generic extended (signal+background) composite model – The 4 MINUIT algorithms – The difference MINOS vs MIGRAD/HESSE – The theory behind the error computation by HESSE – MINOS and Profile Likelihood Interval – How to recognize/Which strategy to mitigate fit stabilities problems 10

  11. Tip of the week http://root.cern ROOT has a modern tool to manipulate and analyze datasets RDataFrame offers a high level interface for analyses of data stored in TTrees, CSV files and other data formats. In addition, multi-threading and other low-level optimisations allow users to exploit all the resources available on their machines transparently. In a nutshell: ROOT::EnableImplicitMT(); // Tell ROOT you want to go parallel ROOT::RDataFrame d("myTree", "file_*.root"); // Interface to TTree and CSV auto d2 = d.Filter( “x>0" ) // only accept events for which x>0 .Define(“r2”,”x* x+y *y”) ; // define a custom variable r2 auto myHisto = d2.Histo1D( “r2" ); // This happens in parallel! myHisto->Draw(); Lazy execution guarantees that all operations are performed in one event loop less typing, better expressivity, … in place of explicity for-loops on TTree events (SetBranchAddress(..), GetEntry (..), …) Good for histograms, profiles, data reductions, user-defined operations on events. 11

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend