Measuring PDFs by QCD fitting
Jon Pumplin, PDF School (DESY, 20–23 October 2009)

Hadrons interact at large momentum transfer (= short distance) through their quark and gluon constituents. Owing to the asymptotic freedom of QCD, $\alpha_s(\mu)$ is small at large $\mu$, so most hard pp collisions at the LHC will be described by the interaction of a single quark or gluon from one proton with a single quark or gluon from the other. Hence the subject of this school: we study the PDFs $f_a(x,\mu)$, which describe the “1-body” probability densities for $a = u, \bar u, d, \bar d, s, \bar s, c, \bar c, b, \bar b, g$ (or $\gamma$), with the spin structure and correlations integrated out.
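To make the asymptotic-freedom statement concrete, here is a minimal sketch of one-loop running of $\alpha_s$. It assumes $\alpha_s(M_Z) = 0.118$ and a fixed number of active flavours $n_f = 5$ (no flavour thresholds, no higher-order terms), so it is only indicative, especially at scales of a few GeV.

```python
import math

# Assumptions for illustration: alpha_s(M_Z) = 0.118, one-loop running, fixed n_f = 5.
M_Z = 91.1876        # GeV
ALPHA_S_MZ = 0.118


def alpha_s_one_loop(mu: float, n_f: int = 5) -> float:
    """One-loop solution of d alpha_s / d ln(mu^2) = -b0 alpha_s^2, normalized at M_Z."""
    b0 = (33 - 2 * n_f) / (12 * math.pi)   # one-loop beta-function coefficient
    return ALPHA_S_MZ / (1 + ALPHA_S_MZ * b0 * math.log(mu**2 / M_Z**2))


for mu in (10.0, 91.1876, 1000.0, 10000.0):
    print(f"mu = {mu:8.1f} GeV   alpha_s = {alpha_s_one_loop(mu):.4f}")
```

The coupling shrinks logarithmically with the scale, which is what justifies treating the hard subprocess perturbatively.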

The PDFs $f_a(x,\mu)$ for each flavor $a$ are functions of two variables:
• $x$ = light-cone momentum fraction
• $\mu$ = QCD factorization scale ($\approx$ 1/distance), typically $Q$ for DIS, or $E_T$ or $E_T/2$ for inclusive jet production.
The $x$-dependence is nonperturbative and cannot be computed in perturbation theory; however, the evolution in $\mu$ is computable at NLO or NNLO by the QCD renormalization-group (DGLAP) equations. Hence the problem of determining the PDFs reduces to determining the $x$-dependence of each flavor at a chosen low scale $\mu_0$ (e.g. $\mu_0 \approx 1.4$ GeV). The PDFs are extracted from experiment by requiring that they agree with a large body of data that depend on them. They are then available for predicting production rates and backgrounds for new measurements.
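The statement that only the $x$-dependence at $\mu_0$ has to be fitted can be illustrated in the simplest possible case. The sketch below uses a textbook shortcut, not the NLO/NNLO $x$-space evolution used in real fits: at leading order, Mellin moments $M_n = \int_0^1 x^{n-1} q_{NS}(x)\,dx$ of a flavour non-singlet combination (such as $u_v$) evolve multiplicatively. The coupling values in the usage lines are hypothetical placeholders, and $n_f = 4$ is held fixed.

```python
import math

C_F = 4.0 / 3.0


def gamma_ns(n: int) -> float:
    """LO non-singlet anomalous dimension, normalized so that
    dM_n/d ln(mu^2) = (alpha_s / 2 pi) * gamma_ns(n) * M_n."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return C_F * (1.5 + 1.0 / (n * (n + 1)) - 2.0 * harmonic)


def evolve_moment(m_n0: float, n: int, alpha_s0: float, alpha_s: float, n_f: int = 4) -> float:
    """LO evolution of the n-th moment from a scale with coupling alpha_s0 to one with alpha_s.
    With one-loop running, M_n = M_n0 * (alpha_s / alpha_s0)^(-gamma_n / (2 pi b0))."""
    two_pi_b0 = (33 - 2 * n_f) / 6.0          # 2*pi*b0, b0 = (33 - 2 n_f) / (12 pi)
    exponent = -gamma_ns(n) / two_pi_b0
    return m_n0 * (alpha_s / alpha_s0) ** exponent


# n = 1 (quark number) is conserved; n = 2 (momentum fraction) decreases with the scale.
m1_high = evolve_moment(2.0, 1, alpha_s0=0.35, alpha_s=0.115)
m2_high = evolve_moment(0.30, 2, alpha_s0=0.35, alpha_s=0.115)
print(m1_high, m2_high)
```

The vanishing of `gamma_ns(1)` is the LO statement that evolution preserves the valence number sum rule.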

Two points of view
The PDFs are a Necessary Evil: essential phenomenological tools for making perturbative calculations of signals and backgrounds at hadron colliders. Measuring the PDFs is of great practical importance for making use of data from the Tevatron and LHC. Along with this comes the difficult task of assessing the uncertainty range of the results.
The PDFs are a Fundamental Measurement: an opportunity to connect with knowledge from the nonperturbative arenas of QCD, e.g.,
• Regge theory
• Light-cone physics
• Lattice gauge theory
These connections have been neglected too much, in my opinion. Even the assumption of independent flavor distributions might be improved upon.

The QCD fitting programme (brief version)
1. Parametrize the PDFs $f_a(x,\mu_0)$ at a small $\mu_0$ by smooth functions with many free parameters.
2. Calculate $f_a(x,\mu)$ at all $\mu > \mu_0$ by DGLAP evolution.
3. Calculate $\chi^2 = \sum_i [(\mathrm{data}_i - \mathrm{theory}_i)/\mathrm{error}_i]^2$ to measure the quality of the fit to a large variety of experiments.
4. Obtain the best estimate of the true PDFs by varying the free parameters to minimize $\chi^2$.
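Steps 3 and 4 can be sketched in a toy setting. Real PDF fits depend nonlinearly on their parameters (through DGLAP evolution and the cross-section calculation) and use iterative minimizers; this sketch instead assumes a hypothetical model that is linear in its parameters, so the $\chi^2$ minimum is found in closed form by weighted least squares. The basis functions and data are invented for illustration.

```python
import numpy as np


def chi2(data, theory, errors):
    """chi^2 = sum_i [(data_i - theory_i) / error_i]^2, uncorrelated errors only."""
    data, theory, errors = map(np.asarray, (data, theory, errors))
    return float(np.sum(((data - theory) / errors) ** 2))


def fit_linear_model(basis, data, errors):
    """Minimize chi^2 for theory_i = sum_j A_j * basis[i, j] by weighted least squares."""
    basis = np.asarray(basis, dtype=float)
    data = np.asarray(data, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Dividing each row by error_i turns chi^2 into an ordinary sum of squares.
    params, *_ = np.linalg.lstsq(basis / errors[:, None], data / errors, rcond=None)
    return params, chi2(data, basis @ params, errors)


x = np.linspace(0.1, 0.9, 9)
basis = np.column_stack([np.ones_like(x), x, x**2])   # hypothetical "shape" functions
true_params = np.array([1.0, -0.5, 0.2])
errors = np.full_like(x, 0.05)
data = basis @ true_params                             # noise-free pseudo-data
params, chi2_min = fit_linear_model(basis, data, errors)
print(params, chi2_min)
```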

Theoretical basis for PDF fitting
• Factorization theorem: short distance and long distance are separable, and PDFs are “universal,” i.e., process independent.
• Asymptotic freedom: hard scattering is weak at short distance, and hence perturbatively calculable.
• DGLAP evolution: evolution in $\mu$ is perturbatively calculable, so the functions to be determined depend only on $x$.
Factorization theorem:
$$F^\lambda_A\left(x, \frac{m}{Q}, \frac{M_H}{Q}\right) = \sum_a f^a_A\left(x, \frac{m}{\mu}\right) \otimes \hat F^\lambda_a\left(x, \frac{Q}{\mu}, \frac{M_H}{Q}\right) + O\left(\left(\frac{\Lambda}{Q}\right)^2\right)$$
[Figure: experimental $F_A$ = $\sum_a$ parton distributions $f^a_A$ ⊗ hard scattering $\hat F_a$ (perturbatively calculable); the parton distributions come from a nonperturbative input parametrization at $Q_0$ followed by DGLAP evolution to $Q$.]
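The symbol $\otimes$ in the factorization formula is the Mellin convolution in momentum fraction, $(f \otimes g)(x) = \int_x^1 \frac{dz}{z}\, f(z)\, g(x/z)$. A minimal numerical sketch, using plain Simpson quadrature in $t = \ln z$ (a choice made here for illustration, not the method of any particular fitting code). It handles only ordinary functions; the $\delta(1-z)$ and plus-distribution terms of real coefficient functions need separate treatment.

```python
import math


def mellin_convolution(f, g, x, n_steps=2000):
    """(f ⊗ g)(x) = int_x^1 (dz/z) f(z) g(x/z), via Simpson's rule in t = ln z (dz/z = dt)."""
    if n_steps % 2:
        n_steps += 1                      # Simpson's rule needs an even number of intervals
    a, b = math.log(x), 0.0
    h = (b - a) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = math.exp(a + k * h)
        weight = 1 if k in (0, n_steps) else (4 if k % 2 else 2)
        total += weight * f(z) * g(x / z)
    return total * h / 3.0


# Check against analytic results: (1 ⊗ 1)(x) = -ln x and (z ⊗ 1)(x) = 1 - x.
print(mellin_convolution(lambda z: 1.0, lambda y: 1.0, 0.01), -math.log(0.01))
print(mellin_convolution(lambda z: z, lambda y: 1.0, 0.2), 1 - 0.2)
```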

The PDF fitting Paradigm
1. Parametrize the $x$-dependence of each flavor at fixed $\mu_0$ (= 1.4 GeV). Thus $f_a(x,\mu_0)$ depend on “shape parameters” $A_1,\dots,A_N$ ($N \sim$ 25–30). Example: the current CTEQ gluon form
$$x\,g(x,\mu_0) = a_0\, x^{a_1} (1-x)^{a_2} \exp\left(a_3\sqrt{x} + a_4 x + a_5 x^2\right),$$
subject to number sum rule and momentum sum rule constraints.
2. Compute the PDFs $f_a(x,\mu)$ at $\mu > \mu_0$ by NLO or NNLO DGLAP evolution.
3. Compute cross sections for DIS ($e$, $\mu$, $\nu$), Drell-Yan, inclusive jets, W production, ... at NLO or NNLO.
4. Compute the $\chi^2$ measure of agreement between predictions and measurements,
$$\chi^2 = \sum_i \left(\frac{\mathrm{data}_i - \mathrm{theory}_i}{\mathrm{error}_i}\right)^2,$$
with appropriate generalizations to include published correlated systematic errors in the experiments, and theoretical “penalties”.
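As an illustration of step 1 and the momentum sum rule, the sketch below codes the gluon form quoted above and fixes $a_0$ so that the gluon carries a prescribed momentum fraction. The parameter values and the target fraction 0.45 are hypothetical; in an actual fit the target is 1 minus the momentum carried by all quarks and antiquarks. The quadrature (midpoint rule in $u = \sqrt{x}$) is a simple choice made here.

```python
import math


def xg_shape(x, a):
    """Gluon form x*g(x, mu0) = a0 x^a1 (1-x)^a2 exp(a3 sqrt(x) + a4 x + a5 x^2)."""
    a0, a1, a2, a3, a4, a5 = a
    return a0 * x**a1 * (1 - x) ** a2 * math.exp(a3 * math.sqrt(x) + a4 * x + a5 * x**2)


def momentum_fraction(xg, n_steps=4000):
    """int_0^1 x g(x) dx, midpoint rule in u = sqrt(x) (dx = 2u du) to soften small-x behaviour."""
    h = 1.0 / n_steps
    return sum(2 * u * xg(u * u) * h for u in ((k + 0.5) * h for k in range(n_steps)))


def normalize_gluon(a, gluon_momentum):
    """Return the parameters with a0 fixed so that int_0^1 x g dx = gluon_momentum."""
    unit = (1.0,) + tuple(a[1:])
    integral = momentum_fraction(lambda x: xg_shape(x, unit))
    return (gluon_momentum / integral,) + tuple(a[1:])


shape = (1.0, 0.3, 5.0, 0.0, 1.0, 0.0)          # hypothetical a0..a5
params = normalize_gluon(shape, gluon_momentum=0.45)
print(params[0], momentum_fraction(lambda x: xg_shape(x, params)))
```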

PDF fitting Paradigm (continued)
5. Minimize $\chi^2$ with respect to the shape parameters $\{A_i\}$ to obtain the Best Fit PDFs.
6. The PDF uncertainty range is assumed to be the region in $\{A_i\}$ space where $\chi^2$ is sufficiently close to the minimum: $\chi^2 < \chi^2_{\min} + \Delta\chi^2$. The proper choice for the “tolerance condition” $\Delta\chi^2$ is a perennial hot topic for discussion. Some recent progress on it will be described later, and at PDF4LHC. Using the Hessian method, the uncertainty range can be represented by $2N$ alternative PDF sets, obtained by displacements from the minimum point in $\{A_i\}$ space along each of the directions defined by the eigenvectors of the Hessian matrix, with the size of each displacement determined by $\Delta\chi^2$.
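The construction of the $2N$ eigenvector sets can be sketched for a purely quadratic $\chi^2$. The Hessian below is a hypothetical $2\times 2$ example and the tolerance $\Delta\chi^2 = 1$ is chosen only for illustration; production codes also rescale parameters and iterate to cope with non-quadratic behaviour.

```python
import numpy as np


def hessian_eigenvector_sets(a_min, hessian, delta_chi2):
    """Return N pairs (S_k^+, S_k^-) of displaced parameter sets.

    Assumes chi^2(A) ~ chi^2_min + (A - a_min)^T H (A - a_min) near the minimum.
    Along a unit eigenvector v_k with eigenvalue lam_k, a step t v_k raises chi^2
    by lam_k t^2, so t = sqrt(delta_chi2 / lam_k) gives exactly delta_chi2.
    """
    a_min = np.asarray(a_min, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(np.asarray(hessian, dtype=float))
    sets = []
    for lam, v in zip(eigvals, eigvecs.T):
        step = np.sqrt(delta_chi2 / lam) * v
        sets.append((a_min + step, a_min - step))
    return sets


H = np.array([[4.0, 1.0], [1.0, 2.0]])   # hypothetical Hessian (symmetric, positive definite)
A_MIN = np.array([0.5, -1.0])             # hypothetical best-fit parameters
CHI2_MIN = 10.0


def toy_chi2(a):
    d = np.asarray(a) - A_MIN
    return CHI2_MIN + d @ H @ d


for plus, minus in hessian_eigenvector_sets(A_MIN, H, delta_chi2=1.0):
    print(toy_chi2(plus) - CHI2_MIN, toy_chi2(minus) - CHI2_MIN)  # each ≈ delta_chi2
```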

PDF fitting Paradigm (continued)
7. When large values of $\Delta\chi^2$ are assumed, additional conditions are imposed by adding weights or penalties to $\chi^2$ (CTEQ) or by adjusting the lengths of the eigenvector displacements (MSTW), to force acceptable fits to each of the data sets over the entire uncertainty range.
8. The Best Fit, and the Uncertainty Eigenvector Sets which map out the uncertainty range, are made available for applications at http://projects.hepforge.org/lhapdf/
9. The predicted central value for a cross section of interest is obtained by calculating it with the Best Fit. The uncertainty range of the prediction is obtained by combining in quadrature the predictions made with the uncertainty sets.
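Step 9 can be sketched as follows. The symmetric expression is the “master formula” commonly used with Hessian eigenvector pairs, $\Delta X = \tfrac12\sqrt{\sum_k (X_k^+ - X_k^-)^2}$; the asymmetric up/down variant is a common refinement. The function name and numbers are hypothetical; in practice each $X_k^\pm$ comes from rerunning the cross-section calculation with the corresponding eigenvector set.

```python
import math


def pdf_uncertainty(x0, pairs):
    """Combine predictions from eigenvector pairs (X_k^+, X_k^-) in quadrature.

    Returns (symmetric, up, down):
      symmetric = (1/2) sqrt(sum_k (X_k^+ - X_k^-)^2)
      up        = sqrt(sum_k max(X_k^+ - x0, X_k^- - x0, 0)^2)
      down      = sqrt(sum_k max(x0 - X_k^+, x0 - X_k^-, 0)^2)
    """
    symmetric = 0.5 * math.sqrt(sum((xp - xm) ** 2 for xp, xm in pairs))
    up = math.sqrt(sum(max(xp - x0, xm - x0, 0.0) ** 2 for xp, xm in pairs))
    down = math.sqrt(sum(max(x0 - xp, x0 - xm, 0.0) ** 2 for xp, xm in pairs))
    return symmetric, up, down


central = 10.0                                     # hypothetical Best Fit prediction
pairs = [(10.3, 9.8), (9.9, 10.1), (10.0, 10.0)]   # hypothetical (X_k^+, X_k^-) values
print(pdf_uncertainty(central, pairs))
```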

Handling systematic errors
The simplest definition
$$\chi^2_0 = \sum_{i=1}^{N} \frac{(D_i - T_i)^2}{\sigma_i^2},$$
where $D_i$ = data, $T_i$ = theory, and $\sigma_i$ = “expt. error”, is optimal for random Gaussian errors: $D_i = T_i + \sigma_i r_i$ with $P(r) = e^{-r^2/2}/\sqrt{2\pi}$.
With systematic errors,
$$D_i = T_i(\mathbf{A}) + \alpha_i\, r_{\mathrm{stat},i} + \sum_{k=1}^{K} r_k \beta_{ki}.$$
The fitting parameters are $\mathbf{A} = \{A_\lambda\}$ (theoretical model) and $\{r_k\}$ (corrections for systematic errors). Published experimental errors:
• $\alpha_i$ is the ‘standard deviation’ of the random uncorrelated error.
• $\beta_{ki}$ is the ‘standard deviation’ of the $k$th (completely correlated!) systematic error on $D_i$.
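The role of the correlated terms can be seen by generating pseudo-data from this error model. The sketch below, with hypothetical numbers and one correlated source, draws many pseudo-experiments and compares their covariance with $\alpha_i^2\delta_{ij} + \sum_k \beta_{ki}\beta_{kj}$: a single $r_k$ moves all points of an experiment together.

```python
import numpy as np


def pseudo_data(theory, alpha, beta, rng):
    """One pseudo-experiment D_i = T_i + alpha_i r_stat,i + sum_k r_k beta_ki,
    with all r's unit Gaussian. theory, alpha: shape (N,); beta: shape (K, N)."""
    theory = np.asarray(theory, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    r_stat = rng.standard_normal(theory.shape)    # independent for each data point
    r_sys = rng.standard_normal(beta.shape[0])    # one per source, shared by all points
    return theory + alpha * r_stat + r_sys @ beta


rng = np.random.default_rng(1)
theory = np.array([100.0, 80.0, 50.0])   # hypothetical predictions
alpha = np.array([2.0, 2.0, 1.5])        # hypothetical uncorrelated errors
beta = np.array([[3.0, 2.4, 1.5]])       # hypothetical correlated source (K = 1)
replicas = np.array([pseudo_data(theory, alpha, beta, rng) for _ in range(20000)])
expected_cov = np.diag(alpha**2) + beta.T @ beta
print(np.cov(replicas, rowvar=False))
print(expected_cov)
```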

To take into account the systematic errors, we define
$$\chi'^2(\mathbf{A}, r_k) = \sum_{i=1}^{N} \frac{\left(D_i - \sum_k r_k \beta_{ki} - T_i\right)^2}{\alpha_i^2} + \sum_k r_k^2,$$
and minimize with respect to $\{r_k\}$. The result is
$$\hat r_k = \sum_{k'} \left(a^{-1}\right)_{kk'} b_{k'} \qquad \text{(systematic shift)},$$
where
$$a_{kk'} = \delta_{kk'} + \sum_{i=1}^{N} \frac{\beta_{ki}\,\beta_{k'i}}{\alpha_i^2}, \qquad b_k = \sum_{i=1}^{N} \frac{\beta_{ki}\,(D_i - T_i)}{\alpha_i^2}.$$
The $\hat r_k$ depend on the PDF model parameters $\mathbf{A}$. We can solve for them explicitly, since $\chi'^2$ is quadratic in the $r_k$. We then minimize the remaining $\chi^2(\mathbf{A})$ with respect to the model parameters $\mathbf{A} = \{A_\lambda\}$.
• $\{A_\lambda\}$ determine $f_i(x, Q_0^2)$.
• $\{\hat r_k\}$ are the optimal “corrections” for systematic errors, i.e., systematic shifts to be applied to the data points to bring the data from different experiments into compatibility, within the framework of the theoretical model.
• A similar treatment could be used for parametrized systematic errors in the theory, e.g. from scale choices.
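Because $\chi'^2$ is quadratic in the $r_k$, the optimal shifts follow from one linear solve. A sketch with hypothetical numbers; the last printed pair shows the known equivalence between the minimized $\chi'^2$ and the covariance-matrix form $(D-T)^T C^{-1}(D-T)$ with $C_{ij} = \alpha_i^2\delta_{ij} + \sum_k \beta_{ki}\beta_{kj}$.

```python
import numpy as np


def systematic_shifts(data, theory, alpha, beta):
    """Optimal shifts r_hat = a^{-1} b for chi'^2; beta has shape (K, N).
    a_kk' = delta_kk' + sum_i beta_ki beta_k'i / alpha_i^2
    b_k   = sum_i beta_ki (D_i - T_i) / alpha_i^2
    Returns (r_hat, chi'^2 evaluated at r_hat)."""
    data, theory, alpha, beta = (np.asarray(v, dtype=float) for v in (data, theory, alpha, beta))
    w = 1.0 / alpha**2
    a = np.eye(beta.shape[0]) + (beta * w) @ beta.T
    b = (beta * w) @ (data - theory)
    r_hat = np.linalg.solve(a, b)
    resid = data - r_hat @ beta - theory          # data after systematic shifts, minus theory
    chi2_prime = float(np.sum(resid**2 * w) + np.sum(r_hat**2))
    return r_hat, chi2_prime


data = np.array([104.0, 83.0, 52.5])     # hypothetical measurements
theory = np.array([100.0, 80.0, 50.0])   # hypothetical predictions T_i(A)
alpha = np.array([2.0, 2.0, 1.5])
beta = np.array([[3.0, 2.4, 1.5]])       # one correlated source, K = 1

r_hat, chi2_min = systematic_shifts(data, theory, alpha, beta)
cov = np.diag(alpha**2) + beta.T @ beta
delta = data - theory
print(r_hat, chi2_min, delta @ np.linalg.solve(cov, delta))  # last two agree
```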

Kinematic region of ep and µp data
[Figure: kinematic coverage of ep → eX (H1 = ∆, ZEUS = ∇) and µp → µX (BCDMS = □, NMC = ◦) data.]
Drell-Yan data, neutrino DIS, and Tevatron W and Z data are also very important for differentiating among flavors. Tevatron inclusive jet data are very important for constraining the gluon distribution. HERA II (not yet included in CTEQ fits), more Tevatron Run II data, and eventually the LHC will dramatically extend the range and accuracy.

Kinematic Map for LHC
[Figure: $Q^2$ (GeV², from 1 to $10^9$) versus $x$ (from $10^{-7}$ to 1) for the LHC, with $x_{1,2} = (M/14\ \text{TeV})\exp(\pm y)$ and $Q = M$; lines for M = 10 GeV, 100 GeV, 1 TeV, 10 TeV and rapidities y = 6, 4, 2, 0, 2, 4, 6; regions covered by fixed-target and HERA data.]
The LHC will explore new territory in $x$ and $\mu$ (= $Q$). DGLAP evolution at large $\mu$ should be very reliable, so the PDFs needed to calculate the production of new heavy objects are in pretty good shape. Significant new territory will come into play at small $x$ when forward $Z^0$ or lower-mass $\ell^+\ell^-$ pairs are measured. Large $x$ is important: the difference between central collisions at $x = 0.20$ vs. $x = 0.28$ is the same as the difference between running the LHC at $\sqrt{s} = 7$ vs. $\sqrt{s} = 14$ TeV!
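The map rests on leading-order kinematics: a state of mass $M$ produced at rapidity $y$ requires $x_{1,2} = (M/\sqrt{s})\,e^{\pm y}$, so that $M^2 = x_1 x_2 s$. A minimal sketch (the function name is hypothetical) that reproduces two corners of the map at $\sqrt{s} = 14$ TeV:

```python
import math


def parton_momentum_fractions(mass, rapidity, sqrt_s):
    """LO kinematics: x_{1,2} = (M / sqrt(s)) * exp(+/- y)."""
    tau = mass / sqrt_s
    return tau * math.exp(rapidity), tau * math.exp(-rapidity)


# Central production of a 1 TeV state at sqrt(s) = 14 TeV.
x1, x2 = parton_momentum_fractions(1000.0, 0.0, 14000.0)
print(x1, x2)

# A 10 GeV pair at y = 6 probes x2 of order 1e-6; x1 * x2 * s recovers M^2.
x1, x2 = parton_momentum_fractions(10.0, 6.0, 14000.0)
print(x1, x2, x1 * x2 * 14000.0**2)
```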

At the same time, one of the delights of the LHC is that it will allow the study of PDFs at very small $x$, where interesting effects like BFKL are predicted, since the large $s$ allows $x_1$ or $x_2$ to be very small while $M$ is large enough for a perturbative calculation to be reliable, in accord with $M^2 = x_1 x_2 s$.

Evolutionary influences of quarks
Regions of PDF change > 0.2% (solid) or > 0.05% (dotted) produced by a 1% change at $Q_0 = 1.3$ GeV in a narrow band of $x$:
[Figure panels: $\bar d + \bar u$; $u_v$]
• Valence quarks are unimportant at small $x$, as expected.
• Quark evolution is mostly at constant $x$, with a bit of feed-down toward smaller $x$.

Evolutionary influences of gluon
Regions of PDF change > 0.2% (solid) or > 0.05% (dotted) caused by a 1% change in the gluon at $Q_0 = 1.3$ GeV in a narrow band of $x$:
• Influence of input $g(x)$ spreads in $x$ much more than that of quarks.
• Small-$x$ gluon at $Q_0 = 1.3$ GeV has little direct influence ⇒ gluons at moderate and high $Q$ are radiatively generated.
