
Approximating likelihood ratios with calibrated classifiers Gilles Louppe April 13, 2016

Joint work with Kyle Cranmer (New York University) and Juan Pavez (Federico Santa María University).
See paper (Cranmer et al., 2015) for full details.

Studying the constituents of the universe (c) Jorge Cham

Collecting data (c) Jorge Cham

Testing for new physics
$$\frac{p(\text{data} \mid \text{theory} + X)}{p(\text{data} \mid \text{theory})}$$
(c) Jorge Cham

Likelihood-free setup
• Complex simulator $p$ parameterized by $\theta$;
• Samples $x \sim p$ can be generated on-demand;
• ... but the likelihood $p(x|\theta)$ cannot be evaluated!

Simple hypothesis testing
• Assume some observed data $\mathcal{D} = \{x_1, \ldots, x_n\}$;
• Test a null $\theta = \theta_0$ against an alternative $\theta = \theta_1$;
• The Neyman-Pearson lemma states that the most powerful test statistic is
$$\lambda(\mathcal{D}; \theta_0, \theta_1) = \prod_{x \in \mathcal{D}} \frac{p_X(x|\theta_0)}{p_X(x|\theta_1)}.$$
• ... but neither $p_X(x|\theta_0)$ nor $p_X(x|\theta_1)$ can be evaluated!
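For intuition only, here is a minimal sketch of the statistic in a toy case where the densities can be evaluated, which is exactly what the likelihood-free setting rules out. The Gaussian model, the function names and the sample size are assumptions made for illustration. The product is accumulated in log space to avoid underflow.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    # log N(x; mu, sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def log_lambda(data, theta0, theta1, sigma=1.0):
    # log of prod_x p(x|theta0) / p(x|theta1) for the toy model x ~ N(theta, sigma)
    return np.sum(gaussian_logpdf(data, theta0, sigma)
                  - gaussian_logpdf(data, theta1, sigma))

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # observed data, drawn under theta0 = 0
llr = log_lambda(data, theta0=0.0, theta1=1.0)   # positive values favour theta0
```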

Straight approximation
1. Approximate $p_X(x|\theta_0)$ and $p_X(x|\theta_1)$ individually, using density estimation algorithms;
2. Evaluate their ratio $r(x; \theta_0, \theta_1)$.
Works fine for low-dimensional data, but because of the curse of dimensionality, density estimation is in general a difficult problem. Moreover, it is not even necessary!
$$\frac{p_X(x|\theta_0)}{p_X(x|\theta_1)} = r(x; \theta_0, \theta_1)$$
"When solving a problem of interest, do not solve a more general problem as an intermediate step." (Vladimir Vapnik)

Likelihood ratio invariance under change of variable
Theorem. The likelihood ratio is invariant under the change of variable $U = s(X)$, provided $s(x)$ is monotonic with $r(x)$.
$$r(x) = \frac{p_X(x|\theta_0)}{p_X(x|\theta_1)} = \frac{p_U(s(x)|\theta_0)}{p_U(s(x)|\theta_1)}$$
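A short sketch of why the theorem holds, assuming $s$ is strictly monotonic with $r$, so that $r$ takes a single value $\tilde{r}(u)$ on each level set $\{x : s(x) = u\}$ (the symbol $\tilde{r}$ is introduced here for illustration; see Cranmer et al., 2015 for the full argument):

$$\begin{aligned}
p_U(u|\theta_0) &= \int \delta(u - s(x))\, p_X(x|\theta_0)\, dx \\
&= \int \delta(u - s(x))\, r(x)\, p_X(x|\theta_1)\, dx \\
&= \tilde{r}(u) \int \delta(u - s(x))\, p_X(x|\theta_1)\, dx \\
&= \tilde{r}(u)\, p_U(u|\theta_1).
\end{aligned}$$

Evaluating at $u = s(x)$ gives $p_U(s(x)|\theta_0) / p_U(s(x)|\theta_1) = \tilde{r}(s(x)) = r(x)$.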

Approximating likelihood ratios with classifiers
• A classifier trained to distinguish $x \sim p_0$ from $x \sim p_1$ approximates
$$s^*(x) = \frac{p_X(x|\theta_1)}{p_X(x|\theta_0) + p_X(x|\theta_1)},$$
which is monotonic with $r(x)$.
• Estimating $p(s(x)|\theta)$ is now easy, since the change of variable $s(x)$ projects $x$ into a 1D space, where only the informative content of the ratio is preserved. This can be carried out using density estimation or calibration algorithms (histograms, KDE, isotonic regression, etc.).
• Disentangle training from calibration.
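The two stages above can be sketched on a toy problem. Everything specific here is an assumption for illustration: the 1D Gaussian simulator, the hand-written logistic regression standing in for the classifier, and the 30-bin histogram calibration. The exact ratio is known in this toy case, which lets the approximation be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

def sample(theta, size):
    # Hypothetical simulator: x ~ N(theta, 1); theta0 = 0 and theta1 = 1 below
    return rng.normal(theta, 1.0, size)

# 1. Training: logistic regression s(x) = sigmoid(w*x + b), label y=1 for theta1
X = np.concatenate([sample(0.0, n), sample(1.0, n)])
y = np.concatenate([np.zeros(n), np.ones(n)])
w, b = 0.0, 0.0
for _ in range(2000):  # plain gradient descent on the mean log-loss
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.5 * np.mean((p - y) * X)
    b -= 0.5 * np.mean(p - y)

def score(x):
    return 1.0 / (1.0 + np.exp(-(w * np.asarray(x) + b)))

# 2. Calibration on fresh samples: histograms of s under theta0 and theta1
bins = np.linspace(0.0, 1.0, 31)
h0, _ = np.histogram(score(sample(0.0, n)), bins=bins)
h1, _ = np.histogram(score(sample(1.0, n)), bins=bins)

def ratio(x):
    # r_hat(x) = p(s(x)|theta0) / p(s(x)|theta1); +1 pseudo-count guards empty bins
    idx = np.clip(np.digitize(score(x), bins) - 1, 0, len(h0) - 1)
    return (h0[idx] + 1.0) / (h1[idx] + 1.0)

xs = np.array([-1.0, 0.0, 1.0, 2.0])
approx = ratio(xs)
exact = np.exp(0.5 - xs)  # N(0,1) / N(1,1) evaluated at xs
```

Because both histograms use the same number of calibration samples, the ratio of bin counts equals the ratio of estimated densities. The calibration samples are drawn separately from the training samples, in line with keeping training and calibration disentangled.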

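Inference and composite hypothesis testing
Approximated likelihood ratios can be used for inference, since
$$\begin{aligned}
\hat{\theta} &= \arg\max_{\theta}\, p(\mathcal{D}|\theta) \\
&= \arg\max_{\theta} \prod_{x \in \mathcal{D}} \frac{p(x|\theta)}{p(x|\theta_1)} \\
&= \arg\max_{\theta} \prod_{x \in \mathcal{D}} \frac{p(s(x; \theta, \theta_1)|\theta)}{p(s(x; \theta, \theta_1)|\theta_1)} \qquad (1)
\end{aligned}$$
where $\theta_1$ is fixed and $s(x; \theta, \theta_1)$ is a family of classifiers parameterized by $(\theta, \theta_1)$. The second equality holds because the denominator does not depend on $\theta$.
Accordingly, generalized (or profile) likelihood ratio tests can be evaluated in the same way.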

Parameterized learning
For inference, we need to build a family $s(x; \theta, \theta_1)$ of classifiers.
• One could build a classifier $s$ independently for all $\theta, \theta_1$. But this is computationally expensive and would not guarantee a smooth evolution of $s(x; \theta, \theta_1)$ as $\theta$ varies.
• Solution: build a single parameterized classifier instead, where parameters are additional input features (Cranmer et al., 2015; Baldi et al., 2016).
    T := {};
    while size(T) < N do
        Draw θ0 ∼ π_Θ0;
        Draw x ∼ p(x | θ0);
        T := T ∪ {((x, θ0, θ1), y = 0)};
        Draw θ1 ∼ π_Θ1;
        Draw x ∼ p(x | θ1);
        T := T ∪ {((x, θ0, θ1), y = 1)};
    end while
    Learn a single classifier s(x; θ0, θ1) from T.
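A minimal sketch of the training-set construction, under stated assumptions: `simulate` is a hypothetical 1D stand-in for the simulator, both priors are taken uniform on [-2, 2], and both parameters are drawn at the start of each iteration so that every row has a defined pair (θ0, θ1). The classifier itself is left out; any learner accepting the three input features could be fitted on the result.

```python
import numpy as np

def simulate(theta, rng):
    # Hypothetical stand-in simulator: one draw x ~ N(theta, 1)
    return rng.normal(loc=theta, scale=1.0)

def build_training_set(n_samples, rng, low=-2.0, high=2.0):
    features, labels = [], []
    while len(labels) < n_samples:
        theta0 = rng.uniform(low, high)  # draw from the (assumed uniform) prior on theta0
        theta1 = rng.uniform(low, high)  # draw from the (assumed uniform) prior on theta1
        # x ~ p(x|theta0), labelled y = 0; the parameters are extra input features
        features.append([simulate(theta0, rng), theta0, theta1])
        labels.append(0)
        # x ~ p(x|theta1), labelled y = 1, with the same parameter pair
        features.append([simulate(theta1, rng), theta0, theta1])
        labels.append(1)
    return np.array(features), np.array(labels)

rng = np.random.default_rng(0)
X, y = build_training_set(1000, rng)  # columns of X: x, theta0, theta1
```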

Example: Inference from multidimensional data
Let us assume 5D data $x$ generated from the following process $p_0$:
1. $z := (z_0, z_1, z_2, z_3, z_4)$, such that
   $z_0 \sim \mathcal{N}(\mu = \alpha, \sigma = 1)$,
   $z_1 \sim \mathcal{N}(\mu = \beta, \sigma = 3)$,
   $z_2 \sim \text{Mixture}(\tfrac{1}{2}\mathcal{N}(\mu = -2, \sigma = 1), \tfrac{1}{2}\mathcal{N}(\mu = 2, \sigma = 0.5))$,
   $z_3 \sim \text{Exponential}(\lambda = 3)$, and
   $z_4 \sim \text{Exponential}(\lambda = 0.5)$;
2. $x := Rz$, where $R$ is a fixed semi-positive definite $5 \times 5$ matrix defining a fixed projection of $z$ into the observed space.
[Figure: scatter-plot matrix of the observed data $\mathcal{D}$ over X0, ..., X4]
Our goal is to infer the values $\alpha$ and $\beta$ based on $\mathcal{D}$.
Check out (Louppe et al., 2016) to reproduce this example.
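The generative process can be sketched with NumPy as follows. Two points are assumptions: λ is read as a rate, so NumPy's `scale` argument is 1/λ, and the slide does not give the matrix R, so `make_projection` builds a hypothetical positive semi-definite matrix from a fixed seed. The carl repository referenced above remains the place to reproduce the actual example.

```python
import numpy as np

def sample_z(alpha, beta, n, rng):
    z0 = rng.normal(alpha, 1.0, n)
    z1 = rng.normal(beta, 3.0, n)
    # Equal-weight mixture of N(-2, 1) and N(2, 0.5)
    first = rng.random(n) < 0.5
    z2 = np.where(first, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 0.5, n))
    z3 = rng.exponential(1.0 / 3.0, n)  # rate lambda = 3   -> scale = 1/3
    z4 = rng.exponential(1.0 / 0.5, n)  # rate lambda = 0.5 -> scale = 2
    return np.column_stack([z0, z1, z2, z3, z4])

def make_projection(seed=42):
    # Hypothetical R: A @ A.T is symmetric positive semi-definite by construction
    A = np.random.default_rng(seed).normal(size=(5, 5))
    return A @ A.T

def generate(alpha, beta, n, rng, R):
    # x = R z for every sample (rows of Z), i.e. X = Z R^T
    return sample_z(alpha, beta, n, rng) @ R.T

rng = np.random.default_rng(0)
R = make_projection()
D = generate(alpha=1.0, beta=-1.0, n=500, rng=rng, R=R)  # observed data
```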

Example: Inference from multidimensional data
Recipe:
1. Build a single parameterized classifier $s(x; \theta_0, \theta_1)$, in this case a 2-layer NN trained on 5+2 features, with the alternative fixed to $\theta_1 = (\alpha = 0, \beta = 0)$.
2. Find the approximated MLE $\hat{\alpha}, \hat{\beta}$ by solving Eqn. 1. Solve Eqn. 1 using likelihood scans or through optimization. Since the generator is inexpensive, $p(s(x; \theta_0, \theta_1)|\theta)$ can be calibrated on-the-fly, for every candidate $(\alpha, \beta)$, e.g. using histograms.
3. Construct the log-likelihood ratio (LLR) statistic
$$-2 \log \Lambda(\alpha, \beta) = -2 \log \frac{p(\mathcal{D}|\alpha, \beta)}{p(\mathcal{D}|\hat{\alpha}, \hat{\beta})}$$
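A likelihood scan with on-the-fly histogram calibration can be sketched on a 1D toy problem. This is not the 5D example: the Gaussian simulator, the grid of candidates and the sample sizes are assumptions, and the function `s` uses the known optimal score of the toy model as a stand-in for a trained parameterized classifier.

```python
import numpy as np

THETA1 = 0.0  # fixed alternative used in the denominator of the ratio

def simulate(theta, n, rng):
    # Hypothetical stand-in simulator: x ~ N(theta, 1)
    return rng.normal(theta, 1.0, n)

def s(x, theta):
    # Stand-in for a trained classifier s(x; theta, theta1): the optimal score
    # p(x|theta1) / (p(x|theta) + p(x|theta1)) of this toy model
    log_r = -0.5 * (x - theta) ** 2 + 0.5 * (x - THETA1) ** 2  # log p(x|theta)/p(x|theta1)
    return 1.0 / (1.0 + np.exp(log_r))

def scan_point(data, theta, rng, n_cal=200_000, n_bins=50):
    # Calibrate p(s|theta) and p(s|theta1) with histograms for this candidate theta
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    num, _ = np.histogram(s(simulate(theta, n_cal, rng), theta), bins=bins)
    den, _ = np.histogram(s(simulate(THETA1, n_cal, rng), theta), bins=bins)
    idx = np.clip(np.digitize(s(data, theta), bins) - 1, 0, n_bins - 1)
    # +1 pseudo-count keeps empty bins from producing infinite ratios
    log_ratio = np.log(num[idx] + 1.0) - np.log(den[idx] + 1.0)
    return -2.0 * np.sum(log_ratio)  # -2 log of the product in Eqn. 1

rng = np.random.default_rng(0)
data = simulate(1.0, 1000, rng)       # observed data, true theta = 1
thetas = np.linspace(0.5, 1.5, 11)    # candidate values for the scan
scan = np.array([scan_point(data, t, rng) for t in thetas])
theta_hat = thetas[np.argmin(scan)]   # approximate MLE on the grid
llr = scan - scan.min()               # approximate -2 log Lambda(theta)
```

The scan is noisy from one candidate to the next because each point uses its own calibration sample, which is one reason to smooth the resulting curve.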

[Figure: two panels over the (α, β) plane. Left: exact −2 log Λ(α, β). Right: approx. LLR (smoothed by a Gaussian Process). Markers: α = 1, β = −1; exact MLE; approx. MLE.]

Diagnostics
In practice $\hat{r}(\hat{s}(x; \theta_0, \theta_1))$ will not be exact. Diagnostic procedures are needed to assess the quality of this approximation.
1. For inference, the value of the MLE $\hat{\theta}$ should be independent of the value of $\theta_1$ used in the denominator of the ratio.
2. Train a classifier to distinguish between unweighted samples from $p(x|\theta_0)$ and samples from $p(x|\theta_1)$ weighted by $\hat{r}(\hat{s}(x; \theta_0, \theta_1))$.
[Figure, left: −2 log Λ(θ) versus α, exact and approximated with θ1 = (α = 0, β = 1), θ1 = (α = 1, β = −1) and θ1 = (α = 0, β = −1), with a ±1σ band for θ1 = (α = 0, β = −1). Right: ROC curves (true positive rate versus false positive rate) for p(x|θ1) r(x|θ0, θ1) exact, p(x|θ1) no weights, and p(x|θ1) r(x|θ0, θ1) approx.]

Density ratio estimation
Approximating likelihood ratios relates to many other fundamental statistical inference problems, including
• transfer learning,
• outlier detection,
• divergence estimation,
• ...

Transfer learning: $p_\text{train} \neq p_\text{test}$
As training data increases, i.e. as $N \to \infty$,
$$\frac{1}{N} \sum_{x_i} L(\varphi(x_i)) \to \int L(\varphi(x))\, p_\text{train}(x)\, dx.$$
We want to be good on test data, i.e., minimize
$$\int L(\varphi(x))\, p_\text{test}(x)\, dx.$$
Solution: importance weighting.
$$\varphi^* = \arg\min_{\varphi} \frac{1}{N} \sum_{x_i} \frac{p_\text{test}(x_i)}{p_\text{train}(x_i)} L(\varphi(x_i))$$
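The effect of the weights can be checked numerically on a toy case. The two Gaussian densities and the per-sample loss `x**2` are assumptions chosen so that the expected losses are known in closed form (1.0 under the training density, 1.25 under the test density). In practice the density ratio would itself have to be estimated, for instance with a calibrated classifier.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    # log N(x; mu, sigma)
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, 200_000)  # samples from p_train = N(0, 1)

# Importance weights p_test(x) / p_train(x), with p_test = N(0.5, 1)
weights = np.exp(gaussian_logpdf(x_train, 0.5, 1.0)
                 - gaussian_logpdf(x_train, 0.0, 1.0))

loss = x_train ** 2                      # placeholder per-sample loss L(phi(x))
train_risk = np.mean(loss)               # estimates the risk under p_train
test_risk = np.mean(weights * loss)      # estimates the risk under p_test
```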

Summary
• We proposed an approach for approximating likelihood ratios in the likelihood-free setup.
• Evaluating likelihood ratios reduces to supervised learning. Both problems are deeply connected.
• Alternative to Approximate Bayesian Computation, without the need to define a prior over parameters.

References
Baldi, P., Cranmer, K., Faucett, T., Sadowski, P., and Whiteson, D. (2016). Parameterized Machine Learning for High-Energy Physics. arXiv preprint arXiv:1601.07913.
Cranmer, K., Pavez, J., and Louppe, G. (2015). Approximating likelihood ratios with calibrated discriminative classifiers. arXiv preprint arXiv:1506.02169.
Louppe, G., Cranmer, K., and Pavez, J. (2016). carl: a likelihood-free inference toolbox. http://dx.doi.org/10.5281/zenodo.47798, https://github.com/diana-hep/carl.
