 
A Test Statistic for Weighted Runs
Frederik Beaujean, Allen Caldwell
http://arxiv.org/abs/1005.3233v2
COMPSTAT 2010, Paris, 23.8.2010
Motivating example
Suppose:
● Measurements $y_i$ with Gaussian uncertainty
● Standard Model (SM) background is quadratic
● New physics (NP) predicts signal peak
23.08.2010 Frederik Beaujean #2
Goodness of Fit: standard approach
● Test statistic: any scalar function of the data, $T(D)$
● Interpret: large $T(D)$ = poor model
● Example: probability density of the data
  $P(D \mid \theta) \propto \prod_i \exp\left\{ -\frac{(y_i - f(x_i \mid \theta))^2}{2\sigma_i^2} \right\} = \exp\left\{ -\frac{\chi^2}{2} \right\}$
● Familiar choice: $T(D) \equiv \chi^2(D)$
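As a small illustration of the familiar choice above, the following sketch computes $\chi^2$ as the sum of squared standardized residuals. The function name `chi_square` and the three toy data points are assumptions made for this example only; they do not come from the talk.

```python
def chi_square(y, f, sigma):
    """Sum of squared standardized residuals (y_i - f_i) / sigma_i."""
    return sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))


# Hypothetical toy data: three measurements, model prediction zero, unit uncertainty.
y = [1.2, -0.4, 0.3]
f = [0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0]

t_obs = chi_square(y, f, sigma)  # 1.44 + 0.16 + 0.09
print(t_obs)
```

A larger value of `t_obs` means the data sit further from the model prediction, which is the "large $T(D)$ = poor model" reading.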
p-value
● Definition: $p \equiv P(T > T(D))$
● Assuming the model and before data is taken: $p$ is uniform in $[0, 1]$
● Critical values: $p < 0.05, 0.01 \Rightarrow$ reject model
● Warning: the p-value is not the probability that the model is true
● Example: $p_{\mathrm{SM}} = 10\%$, $p_{\mathrm{NP}} = 37\%$ $\Rightarrow$ both OK
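The definition of the p-value can be sketched with a simple Monte Carlo estimate: simulate data under the model, compute the statistic each time, and count how often it reaches the observed value. This is only an illustration, assuming standard normal residuals, no fitted parameters, and $\chi^2$ as the statistic; the name `mc_p_value`, the seed, and the number of simulations are hypothetical choices.

```python
import random


def chi_square_standard(z):
    """Chi-square of already standardized residuals."""
    return sum(v * v for v in z)


def mc_p_value(t_obs, n_points, n_sim=20000, seed=1):
    """Fraction of simulated data sets whose statistic is at least t_obs."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if chi_square_standard(z) >= t_obs:
            exceed += 1
    return exceed / n_sim


# 37.65 is close to the tabulated 95% quantile of chi-square with 25 degrees
# of freedom, so the estimate should land near 0.05 (up to Monte Carlo noise).
p = mc_p_value(t_obs=37.65, n_points=25)
print(p)
```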
Runs
● Most statistics disregard the order of the data, so information is wasted
● Human brain good for simple problems
● Example: N = 25 data points, each Gaussian with mean = 0 and variance = 1
Can we combine information about order and magnitude of deviation?
Runs statistic
Proposal:
● Split data into runs
● Each run has a weight (Gaussian case: [equation not recovered])
● Test statistic $T$: largest weight of any run
● p-value becomes [equation not recovered]
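The proposal can be sketched in a few lines. Since the formulas on this slide are not available here, the sketch rests on two loudly stated assumptions: a run is taken to be a maximal sequence of consecutive observations above the model expectation, and its weight is taken to be the sum of squared standardized residuals within the run. The names `run_weights` and `runs_statistic` are hypothetical.

```python
def run_weights(residuals):
    """Weights of maximal runs of consecutive positive standardized residuals.

    Assumption: a run consists of points above expectation, and its weight
    is the sum of squared standardized residuals inside the run.
    """
    weights = []
    current = 0.0
    in_run = False
    for r in residuals:
        if r > 0:
            current += r * r
            in_run = True
        else:
            if in_run:
                weights.append(current)
            current = 0.0
            in_run = False
    if in_run:  # close a run that extends to the last data point
        weights.append(current)
    return weights


def runs_statistic(residuals):
    """Largest weight of any run; 0.0 if no point lies above expectation."""
    return max(run_weights(residuals), default=0.0)


# Hypothetical standardized residuals with two runs above expectation.
z = [0.5, 1.0, -0.3, 2.0, 1.0, 1.0, -1.5]
weights = run_weights(z)   # [0.25 + 1.0, 4.0 + 1.0 + 1.0]
t = runs_statistic(z)
print(weights, t)
```

Unlike $\chi^2$, shuffling the entries of `z` generally changes `t`, which is how the order of the data enters the statistic.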
Runs distribution
Gaussian case:
● Distribution of $T$ calculated exactly for any $N$ (non-parametric)
● Requires a sum over integer partitions
[Figure: N = 25]
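The exact calculation is not reproduced here. As a plainly different technique, a Monte Carlo estimate can serve as a rough cross-check of the distribution of $T$ for N = 25. The sketch assumes standard normal residuals and the same hypothetical definition of a run as above (consecutive points above expectation, weight equal to the sum of squared residuals); `mc_runs_p_value` is an illustrative name.

```python
import random


def runs_statistic(z):
    """Largest sum of squared residuals over any run of positive residuals."""
    best = current = 0.0
    for v in z:
        # Extend the current run on a positive residual, otherwise reset it.
        current = current + v * v if v > 0 else 0.0
        best = max(best, current)
    return best


def mc_runs_p_value(t_obs, n_points=25, n_sim=20000, seed=7):
    """Monte Carlo estimate of P(T >= t_obs) under the model."""
    rng = random.Random(seed)
    hits = sum(
        runs_statistic([rng.gauss(0.0, 1.0) for _ in range(n_points)]) >= t_obs
        for _ in range(n_sim)
    )
    return hits / n_sim


print(mc_runs_p_value(10.0))
```

The estimate carries a statistical error of order $1/\sqrt{n_{\mathrm{sim}}}$, which is the price paid for avoiding the sum over integer partitions.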
Power
● New physics contribution $y(x)$: Lorentz peak with amplitude $A$
● $T$ up to 35% more powerful than the classic $\chi^2$ in detecting departures of this type (5% level)
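A power comparison of this kind can be sketched by simulation: find the 5% critical value of each statistic under the model, then count how often data containing a peak exceed it. Everything specific below is an assumption for illustration: the peak position `x0`, the width `gamma`, the amplitude in units of the uncertainty, the run definition, and the number of simulations. The sketch is not meant to reproduce the 35% figure quoted above.

```python
import random


def chi2_stat(z):
    return sum(v * v for v in z)


def runs_stat(z):
    """Largest sum of squared residuals over any run of positive residuals."""
    best = current = 0.0
    for v in z:
        current = current + v * v if v > 0 else 0.0
        best = max(best, current)
    return best


def lorentz(x, amplitude, x0=12.0, gamma=2.0):
    """Hypothetical Lorentz peak centered at x0 with half-width gamma."""
    return amplitude / (1.0 + ((x - x0) / gamma) ** 2)


def simulate(stat, amplitude, n_points, n_sim, rng):
    """Statistic values for data = standard normal noise + Lorentz peak."""
    values = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) + lorentz(i, amplitude) for i in range(n_points)]
        values.append(stat(z))
    return values


def power(stat, amplitude, n_points=25, n_sim=2000, level=0.05, seed=3):
    """Fraction of peak-containing data sets rejected at the given level."""
    rng = random.Random(seed)
    null = sorted(simulate(stat, 0.0, n_points, n_sim, rng))
    critical = null[min(n_sim - 1, int((1.0 - level) * n_sim))]
    alt = simulate(stat, amplitude, n_points, n_sim, rng)
    return sum(t > critical for t in alt) / n_sim


for amp in (0.0, 1.5, 3.0):
    print(amp, power(chi2_stat, amp), power(runs_stat, amp))
```

At amplitude zero both estimates should sit near the 5% level, which is a useful sanity check before reading anything into the larger amplitudes.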
Conclusions
● Choose a statistic with specific alternative models in mind
● Runs statistic $T$ excellent for “bump hunting”
FINIS
Backup
Exact runs distribution I
Exact runs distribution II
Exact runs distribution III
Computational complexity: Integer partitions
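To give a feeling for the cost of a sum over integer partitions, the number of partitions $p(N)$ can be counted with a standard dynamic-programming recurrence. This is a generic counting sketch, not the algorithm used for the exact runs distribution; `count_partitions` is a hypothetical name.

```python
def count_partitions(n):
    """Number of ways to write n as a sum of positive integers, order ignored."""
    ways = [1] + [0] * n  # ways[0] = 1: the empty partition
    for part in range(1, n + 1):
        # Allow parts of size `part` in addition to all smaller parts.
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


for n in (10, 25, 50, 100):
    print(n, count_partitions(n))
```

For N = 25 this gives 1958 partitions, and for N = 100 already about 1.9e8, so the number of terms grows quickly with the number of data points.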
Goodness of Fit: Bayesian approach
● Model selection: need explicit alternatives $M_1$, $M_2$
● Posterior odds:
  $\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)}$
● Bayes factor: (very) sensitive to parameter range
  $P(D \mid M_1) = \int p(D \mid \theta)\, p_0(\theta)\, d\theta$
● Occam's razor built in
● Example: $\frac{P(\mathrm{SM} \mid D)}{P(\mathrm{NP} \mid D)} = \frac{P(\mathrm{SM})}{P(\mathrm{NP})} \times 61.7$
● Six (NP) vs. three (SM) parameters
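The arithmetic behind the example can be made explicit. The sketch below converts prior odds and a Bayes factor into a posterior probability, assuming that SM and NP are the only two models under consideration. The prior odds of 1 (equal prior belief in both models) are an assumption for illustration and are not stated in the talk; `posterior_probability` is a hypothetical name.

```python
def posterior_probability(prior_odds, bayes_factor):
    """Posterior probability of model 1 when only two models are considered."""
    odds = prior_odds * bayes_factor  # posterior odds = prior odds x Bayes factor
    return odds / (1.0 + odds)


# Assumed equal priors, Bayes factor 61.7 in favor of the SM.
p_sm = posterior_probability(prior_odds=1.0, bayes_factor=61.7)
print(round(p_sm, 3))
```

With equal priors this gives a posterior probability of about 0.984 for the SM; different prior odds shift the result proportionally in the odds.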