

SLIDE 1

Likelihood-free gravitational-wave parameter estimation with neural networks

Stephen R. Green, Albert Einstein Institute Potsdam
Based on arXiv:2002.07656, with C. Simpson and J. Gair
Gravity Seminar, University of Southampton, February 27, 2020

1

SLIDE 2

Outline

  • 1. Introduction to Bayesian inference for compact binaries
  • 2. Likelihood-free inference with neural networks
      (a) Basic approach
      (b) Normalizing flows
      (c) Variational autoencoders
  • 3. Results

2

SLIDE 3

Introduction to parameter estimation

  • Bayesian inference for compact binaries:



 Sample the posterior distribution p(θ|s) for the system parameters θ (masses, spins, sky position, etc.) given detector strain data s.

  • Once the likelihood and prior are defined, the right-hand side can be evaluated (up to normalization).

3

p(θ|s) = p(s|θ) p(θ) / p(s)

(p(s|θ): likelihood, p(θ): prior, p(s): evidence, i.e., the normalizing factor)

SLIDE 4

Introduction to parameter estimation

  • Likelihood based on the assumption that if the gravitational-wave signal were subtracted from the strain data s, then what remains must be noise n.

  • Noise is assumed to follow a stationary Gaussian distribution, i.e.,

    n ∼ p(n) ∝ exp(−½ (n|n)),

    where the noise-weighted inner product is

    (a|b) = 2 ∫ df [â(f) b̂(f)* + â(f)* b̂(f)] / S_n(f),

    with S_n(f) the detector noise power spectral density (PSD).

  • Summed over detectors I, this gives the likelihood

    p(s|θ) ∝ exp(−½ Σ_I (s_I − h_I(θ) | s_I − h_I(θ))).

4
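To make these formulas concrete, here is a minimal numpy sketch (an editor's illustration, not code from the talk) of the noise-weighted inner product and the resulting single-detector log-likelihood. It assumes one-sided frequency-domain arrays a_f, b_f, a PSD array psd, and frequency resolution delta_f, all hypothetical names:

```python
import numpy as np

def inner_product(a_f, b_f, psd, delta_f):
    """Noise-weighted inner product (a|b) for one-sided frequency-domain arrays."""
    # 2 * integral df [a b* + a* b] / S_n  =  4 * Re integral df a b* / S_n
    return 4.0 * delta_f * np.sum((a_f * np.conj(b_f)).real / psd)

def log_likelihood(strain_f, waveform_f, psd, delta_f):
    """Gaussian log-likelihood (up to an additive constant) for a single detector."""
    residual = strain_f - waveform_f
    return -0.5 * inner_product(residual, residual, psd, delta_f)
```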

SLIDE 5

Introduction to parameter estimation

  • Prior p(θ) based on beliefs about the system before looking at the data, e.g., uniform in the component masses m1, m2 over some range, uniform in spatial volume, etc.

  • With prior and likelihood defined, the posterior can be evaluated up to normalization.

  • A method such as Markov chain Monte Carlo (MCMC) is used to obtain posterior samples: move around parameter space, and compare the strain data s against the waveform model h(θ).

5

Image: Abbott et al (2016)

SLIDE 6

Need for new methods

  • Standard method expensive:
  • Many likelihood evaluations required for each independent sample
  • Likelihood evaluation slow, requires a waveform to be generated
  • Various waveform models (EOBNR, Phenom, …) created as faster alternatives to numerical relativity; reduced-order surrogate models for even faster evaluation.
  • Days to months for parameter estimation of a single event, depending on type of event and waveform model.

6

Goal of this work:

Develop deep learning methods to do parameter estimation much faster: model the posterior distribution p(θ|s) with a neural network.

SLIDE 7

Main result: very fast posterior sampling

Rest of this talk:
 How did we do this?

7

[Corner plot: one- and two-dimensional marginal posteriors over m1, m2, φ0, tc, dL, χ1z, χ2z, θJN for an example event.]

SLIDE 8

Two key ideas

  • 1. A conditional probability distribution can be described by a neural network.

  • 2. The network can be trained to model a gravitational-wave posterior distribution without ever evaluating a likelihood. Instead, it only requires samples (θ, s) from the data generating process.

8

SLIDE 9

Introduction to neural networks

  • Nonlinear functions constructed as composition of mappings:

Input layer: x ∈ ℝ^N

First hidden layer: h1 = σ1(W1 x + b1), with h1 ∈ ℝ^N1

Each layer consists of:

  • 1. A linear transformation, W1 x + b1
  • 2. A simple element-wise nonlinear mapping, e.g., the ReLU

    σ1(x) = x for x ≥ 0, and 0 for x < 0

9

SLIDE 10

Introduction to neural networks

  • Training/test data consist of (x, y) pairs.
  • Train the network by tuning the weights W and biases b to minimize a loss function L(y, y_out).
  • Stochastic gradient descent combined with the chain rule ("backpropagation") is used to adjust the weights and biases.

Full network: input layer x ∈ ℝ^N → first hidden layer h1 = σ1(W1 x + b1) → second hidden layer h2 = σ2(W2 h1 + b2) → … → final hidden layer h_p → output layer y = σ_out(W_out h_p + b_out), with y ∈ ℝ^N_out.

10
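For illustration, a minimal PyTorch sketch of such a feed-forward network together with one training step; the layer sizes, Adam optimizer, and mean-squared-error loss are arbitrary choices for this example, not the configuration used in the talk:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),    # first hidden layer: sigma1(W1 x + b1)
    nn.Linear(64, 64), nn.ReLU(),   # second hidden layer
    nn.Linear(64, 2),               # output layer
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()              # loss function L(y, y_out)

x = torch.randn(32, 8)              # a minibatch of inputs
y = torch.randn(32, 2)              # corresponding targets

loss = loss_fn(net(x), y)
optimizer.zero_grad()
loss.backward()                     # backpropagation (chain rule)
optimizer.step()                    # stochastic gradient step
```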

SLIDE 11

Neural networks as probability distributions

  • Since conditional probability distributions can be parametrized by functions, and neural networks are functions, conditional probability distributions can be described by neural networks.

    E.g., a multivariate normal distribution

    p(x|y) = 𝒩(μ(y), Σ(y))(x) = 1/√((2π)^n |det Σ(y)|) · exp(−½ Σ_{i,j=1}^n (x_i − μ_i(y)) Σ⁻¹_ij(y) (x_j − μ_j(y))),

    where μ(y), Σ(y) = NN(y).

  • For this example, it is trivial to draw samples and evaluate the density.
  • More complex distributions may also be described by neural networks (later in talk).

11
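A small PyTorch sketch of a conditional Gaussian whose mean and (log-)variance are network outputs. For simplicity it uses a diagonal covariance rather than the full Σ(y) on the slide; sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class ConditionalGaussian(nn.Module):
    """Diagonal-covariance Gaussian p(x|y) with mean and log-variance from a network."""
    def __init__(self, y_dim, x_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(y_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * x_dim))

    def forward(self, y):
        mu, log_var = self.net(y).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, torch.exp(0.5 * log_var))

model = ConditionalGaussian(y_dim=4, x_dim=2)
y = torch.randn(10, 4)
dist = model(y)
x = dist.sample()                  # draw samples from p(x|y)
log_p = dist.log_prob(x).sum(-1)   # evaluate the density
```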
SLIDE 12

Likelihood-free inference with neural networks

[First applied to GW by Chua and Vallisneri (2020), Gabbard et al (2019)]

  • Goal is to train the network to model the true posterior, as given by the prior and likelihood that we specify, i.e.,

    p(θ|s) → p_true(θ|s)

  • Minimize the expectation value (over s ∼ p_true(s)) of the cross-entropy between the distributions:

    L = − ∫ ds p_true(s) ∫ dθ p_true(θ|s) log p(θ|s)

    Intractable without knowing the posterior for each s!

  • Bayes' theorem: p_true(s) p_true(θ|s) = p_true(θ) p_true(s|θ), therefore

    L = − ∫ dθ p_true(θ) ∫ ds p_true(s|θ) log p(θ|s)

    Only requires samples from the likelihood, not the posterior!

12

SLIDE 13

Likelihood-free inference with neural networks

  • Loss function:

    L = − ∫ dθ p_true(θ) ∫ ds p_true(s|θ) log p(θ|s)
      ≈ − (1/N) Σ_{i=1}^N log p(θ^(i)|s^(i)),  where θ^(i) ∼ p_true(θ), s^(i) ∼ p_true(s|θ^(i))

    (Estimate on a minibatch of size N; log p(θ|s) is easy to evaluate from the neural network; parameters θ^(i) are sampled from the prior, and strain data s^(i) from the generative process, i.e., the likelihood.)

  • Choose network parameters that minimize L: compute the gradient of L with respect to the network parameters (weights and biases) and use stochastic gradient descent.

  • Never evaluate a likelihood and no need for posterior samples!

13
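A schematic training step implementing this Monte Carlo estimate of the loss. The helpers sample_prior and simulate and the model's log_prob(theta, s) method are hypothetical interfaces standing in for the prior, the data-generating process s = h(θ) + n, and a conditional density network:

```python
import torch

def training_step(model, optimizer, sample_prior, simulate, batch_size=512):
    """One SGD step on L ≈ -(1/N) sum_i log p(theta_i | s_i).

    sample_prior(N) -> theta batch from p(theta); simulate(theta) -> strain s;
    model.log_prob(theta, s) -> log p(theta|s). All three are hypothetical interfaces.
    """
    theta = sample_prior(batch_size)           # theta^(i) ~ p_true(theta)
    s = simulate(theta)                        # s^(i) ~ p_true(s | theta^(i))
    loss = -model.log_prob(theta, s).mean()    # Monte Carlo estimate of the cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```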

SLIDE 14

Gravitational-wave parameter estimation

  • Chua and Vallisneri (2019) applied this method (with a Gaussian posterior model) to gravitational waves.

  • A Gaussian may be adequate for very high signal-to-noise, but more generally distributions can have higher moments and multimodality.

14

SLIDE 15

Normalizing flows

Rezende and Mohamed (2015)

  • Our approach to make the gravitational-wave posterior more flexible: use a normalizing flow.

  • Change of variables rule for probability distributions: if π(u) is a probability distribution, and f : u ↦ x is a mapping on the sample space, then in the new coordinates the distribution is

    p(x) = π(f⁻¹(x)) |det ∂(f⁻¹_1, …, f⁻¹_n)/∂(x_1, …, x_n)|

  • A normalizing flow is an invertible mapping f with a simple Jacobian determinant.

  • If π(u) can be easily sampled and its density evaluated, and f is a normalizing flow, then the same holds for p(x).

    Typically, take π(u) to be a simple base distribution, e.g., a multivariate standard normal.

15
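A toy illustration of the change-of-variables rule: an invertible affine map acting on a standard-normal base distribution, with the density evaluated through the inverse map and its log-Jacobian. Purely illustrative; an actual flow uses a far more expressive map:

```python
import torch

base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
a = torch.tensor([0.5, -0.3])   # log-scales, so the map f(u) = exp(a)*u + b is invertible
b = torch.tensor([1.0, 2.0])    # shifts

def sample(n):
    u = base.sample((n,))
    return torch.exp(a) * u + b                       # forward map f(u)

def log_prob(x):
    u = (x - b) * torch.exp(-a)                       # inverse map f^{-1}(x)
    log_det_inv = -a.sum()                            # log |det d f^{-1} / dx|
    return base.log_prob(u).sum(-1) + log_det_inv     # pi(f^{-1}(x)) |det J^{-1}|

x = sample(5)
print(log_prob(x))
```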
SLIDE 16
Normalizing flows for gravitational waves

  • To model a gravitational-wave posterior, take x → θ, and condition the flow f on the strain data s.

    Base samples: u ∼ 𝒩(0, 1)^n

    Flow: θ = f(u, s)

    Then (hopefully) θ ∼ p(θ|s) = 𝒩(0, 1)^n(f⁻¹(θ)) |det J_f⁻¹|

16

[Figure: example strain time series s = h + n, plotted against t/s.]

SLIDE 17

Masked autoregressive flow

Papamakarios et al (2017)

  • By the product rule, an arbitrary probability distribution p(x) may be decomposed as

    p(x) = ∏_{i=1}^n p(x_i | x_{1:i−1})

  • Define an autoregressive model by restricting the form of each factor,

    p(x_i | x_{1:i−1}) = 𝒩(μ_i(x_{1:i−1}), exp(2α_i(x_{1:i−1})))

    i.e., if u ∼ 𝒩(0, 1)^n, and we set x_i = μ_i(x_{1:i−1}) + u_i exp α_i(x_{1:i−1}), then x ∼ p(x).

  • The mapping f : u ↦ x defines a normalizing flow.

17
SLIDE 18

Masked autoregressive flow

Papamakarios et al (2017)

  • f satisfies the properties of a normalizing flow:

    1. Forward map f : u ↦ x, with x_i = μ_i(x_{1:i−1}) + u_i exp α_i(x_{1:i−1})  (recursive)

    2. Inverse map f⁻¹ : x ↦ u, with u_i = [x_i − μ_i(x_{1:i−1})] exp(−α_i(x_{1:i−1}))  (nonrecursive)

    3. Simple Jacobian determinant:

    det ∂(f⁻¹_1, …, f⁻¹_n)/∂(x_1, …, x_n) = exp(− Σ_{i=1}^n α_i(x_{1:i−1}))

18
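A toy version of this transform with hand-picked (not learned) conditioners, just to show the recursive forward pass, the non-recursive inverse, and the log-Jacobian; a real MAF learns μ_i and α_i with a masked network, as on the following slides:

```python
import torch

def mu_alpha(x_prev):
    # Toy conditioners: mu_i and alpha_i as fixed functions of x_{1:i-1} (shape (batch, i)).
    s = x_prev.sum(dim=-1)
    return 0.5 * s, 0.1 * s

def forward(u):
    """x = f(u): recursive, each x_i needs the already-computed x_{1:i-1}."""
    xs = []
    for i in range(u.shape[-1]):
        x_prev = torch.stack(xs, dim=-1) if xs else u.new_zeros((u.shape[0], 0))
        mu, alpha = mu_alpha(x_prev)
        xs.append(mu + u[:, i] * torch.exp(alpha))
    return torch.stack(xs, dim=-1)

def inverse(x):
    """u = f^{-1}(x): nonrecursive in u, plus log |det J_{f^{-1}}| = -sum_i alpha_i."""
    us, log_det = [], torch.zeros(x.shape[0])
    for i in range(x.shape[-1]):
        mu, alpha = mu_alpha(x[:, :i])
        us.append((x[:, i] - mu) * torch.exp(-alpha))
        log_det = log_det - alpha
    return torch.stack(us, dim=-1), log_det

u = torch.randn(4, 3)
x = forward(u)
u_rec, log_det = inverse(x)
print(torch.allclose(u, u_rec, atol=1e-6))  # True: the map is invertible
```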

SLIDE 19

Masked autoregressive flow

Papamakarios et al (2017)

  • Can be implemented with a neural network by masking connections that violate the autoregressive property [MADE network, Germain et al (2015)].

  • Forward flow requires n passes.

19

[Diagram: MADE network with inputs x1, x2, x3 and outputs parametrizing p(x1), p(x2|x1), p(x3|x1, x2).]

SLIDE 20

Masked autoregressive flow

Papamakarios et al (2017)

  • To achieve further generality, stack several MADE blocks, permuting components in between.

20

[Diagram: the flow f maps u → θ through a stack of alternating permute and MADE blocks, each conditioned on s.]
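A structural sketch of such a stack, assuming each MADE-style block exposes a hypothetical inverse(x, s) method returning (u, log_det). Permutations contribute nothing to the log-determinant, so only the blocks' terms accumulate in the conditional density log p(θ|s):

```python
import torch
import torch.nn as nn

class StackedFlow(nn.Module):
    """Stack of autoregressive blocks with fixed permutations in between (sketch)."""
    def __init__(self, blocks, dim):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.perms = [torch.randperm(dim) for _ in blocks]  # fixed permutations

    def log_prob(self, theta, s):
        x, total = theta, 0.0
        for perm, block in zip(self.perms, self.blocks):
            x = x[:, perm]                     # permute components
            x, log_det = block.inverse(x, s)   # map toward the base space
            total = total + log_det
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(x).sum(-1) + total
```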

SLIDE 21

Training

[same approach as Gabbard et al (2019)]

  • Train on (θ, s) pairs:

  • θ ∼ p(θ): 10^6 parameter samples drawn from the prior, with 35 M⊙ ≤ m1, m2 ≤ 80 M⊙, 1000 Mpc ≤ dL ≤ 3000 Mpc, 0.65 s ≤ tc ≤ 0.85 s, 0 ≤ φ0 ≤ 2π

  • s ∼ p(s|θ): strain realizations s = h(θ) + n

  • h(θ): 1-second-long whitened (fixed PSD) inspiral-merger-ringdown waveforms sampled at 1024 Hz, stored in the training set

  • n: stationary Gaussian noise sampled at train time

  • Training time ~ 6 hours

21

[Figure: example whitened waveform h and strain y = h + n, plotted against t/s.]
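A sketch of how a single (θ, s) training pair might be generated, assuming hypothetical helpers prior_sampler() for θ and whitened_waveform(theta) returning a 1 s series at 1024 Hz; the whitened stationary Gaussian noise is taken here to be unit-variance white noise for simplicity:

```python
import numpy as np

def sample_training_pair(prior_sampler, whitened_waveform, fs=1024, duration=1.0):
    theta = prior_sampler()                          # theta ~ p(theta)
    h = whitened_waveform(theta)                     # whitened h(theta), length fs * duration
    n = np.random.normal(size=int(fs * duration))    # whitened stationary Gaussian noise
    return theta, h + n                              # s = h(theta) + n
```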

SLIDE 22

Sample posterior: MAF

  • Time to draw 10,000 independent samples < 1 second.

  • Posterior pretty good, but does not properly model φ0.

22

[Corner plot: MAF posterior over m1, m2, φ0, tc, dL; inset: strain y = h + n vs. t/s.]

SLIDE 23

Variational autoencoder

Kingma and Welling (2013)

  • To increase flexibility further, introduce latent variables z. These must be marginalized over to obtain the posterior:

    p(θ|s) = ∫ p(θ|z, s) p(z|s) dz

    Both p(θ|z, s) and p(z|s) are described by neural networks.

  • This mixture of distributions is more general. To sample: (i) draw a latent variable z ∼ p(z|s) from the variational prior; (ii) draw parameters θ ∼ p(θ|z, s).

23

SLIDE 24

Variational autoencoder

Kingma and Welling (2013)

  • To train, we would like to evaluate the posterior, but the integral p(θ|s) = ∫ p(θ|z, s) p(z|s) dz is intractable.

  • The variational autoencoder introduces a third model, the recognition model q(z|θ, s), which is an approximation to the variational posterior p(z|θ, s).

  • Training maximizes the variational lower bound on log p(θ|s), namely

    L = E_{q(z|θ,s)}[log p(θ|z, s)] − D_KL(q(z|θ, s) ‖ p(z|s))

    (first term: reconstruction loss; second term: KL loss).

  • Applied by Gabbard et al (2019) to gravitational waves: with all 3 networks Gaussian, obtained similar performance to MAF.

24

[Diagram: encoder q(z|θ, s) maps θ to the latent variable z; decoder p(θ|z, s) maps z back to θ.]
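A schematic evaluation of this lower bound using the reparametrization trick, assuming three hypothetical modules recognition(theta, s), var_prior(s), and decoder(z, s), each returning a torch.distributions.Normal with independent components:

```python
import torch.distributions as dist

def elbo(theta, s, recognition, var_prior, decoder):
    q_z = recognition(theta, s)
    z = q_z.rsample()                                    # reparametrized sample, keeps gradients
    recon = decoder(z, s).log_prob(theta).sum(-1)        # reconstruction term
    kl = dist.kl_divergence(q_z, var_prior(s)).sum(-1)   # KL term (closed form for Gaussians)
    return recon - kl                                    # maximize this (minimize its negative)
```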

SLIDE 25

Variational autoencoder with normalizing flows

  • p(θ|z, s), p(z|s), and q(z|θ, s) all taken to be MAFs.

  • Training time ~ 15 hours

  • Posterior comparable to MCMC.

25

[Corner plot: neural network vs. MCMC posteriors over m1, m2, φ0, tc, dL.]

SLIDE 26

P—P plot

  • For each one-dimensional marginalized posterior, study the distribution of percentile values of the true parameters.

  • 1000 different waveforms + noise realizations.

26

[P-P plot: CDF(p) vs. p for m1 (0.55), m2 (0.59), φ0 (0.37), tc (0.46), dL (0.56).]
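A sketch of how such a P-P curve can be computed, assuming for each injection an array of one-dimensional marginal posterior samples and the corresponding true parameter value (array names are illustrative):

```python
import numpy as np

def pp_curve(posterior_samples, true_values, grid=np.linspace(0, 1, 101)):
    """posterior_samples: (n_events, n_samples); true_values: (n_events,)."""
    # percentile of the true value within each event's marginal posterior
    percentiles = np.mean(posterior_samples < true_values[:, None], axis=1)
    cdf = np.array([np.mean(percentiles <= g) for g in grid])
    return grid, cdf   # for a well-calibrated posterior, cdf ≈ grid (the diagonal)
```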

SLIDE 27

Adding aligned spins and inclination

  • Prior ranges:

    35 M⊙ ≤ m1, m2 ≤ 80 M⊙, 1000 Mpc ≤ dL ≤ 3000 Mpc, 0.65 s ≤ tc ≤ 0.85 s, 0 ≤ φ0 ≤ 2π, −1 ≤ χ1z, χ2z ≤ 1, 0 ≤ θJN ≤ π.

  • Slightly larger network

  • Sampling time now ~ 2 seconds for 10,000 samples.

27

[Corner plot: posterior over m1, m2, φ0, tc, dL, χ1z, χ2z, θJN.]

SLIDE 28

P—P plot

[P-P plot: CDF(p) vs. p for m1 (0.60), m2 (0.95), φ0 (0.80), tc (0.21), dL (0.68), χ1z (0.82), χ2z (0.60), θJN (0.55).]

28

~ 30 minutes to generate all samples

SLIDE 29

Next steps

  • Expand to full 15D parameter space: multiple detectors, sky position, non-aligned spins.

  • Allow the noise PSD to vary from event to event.

  • Waveform "compression" to allow lower-mass BBH and BNS events. These involve longer waveforms and higher sampling frequency.

  • Try to reduce the size of the training set.

29

SLIDE 30

Conclusions

  • For single-detector, aligned-spin binaries, neural networks are capable of modeling the multimodal posterior p(θ|s).
  • Training is likelihood-free, requiring only (θ, s) pairs from the data generative process.
  • After training, < 2 seconds to produce 10,000 independent samples. Compares to days for standard methods.
  • Model with CVAE and MAF has best performance:
  • Successfully models all parameters, including degeneracies.
  • Posterior comparable to MCMC.
  • Passes P-P plot statistical tests.
  • Ongoing work to develop into a complete parameter estimation tool.

30

THANK YOU