Likelihood-free gravitational-wave parameter estimation with neural networks


  1. Likelihood-free gravitational-wave parameter estimation with neural networks
     Stephen R. Green, Albert Einstein Institute Potsdam
     Based on arXiv:2002.07656 with C. Simpson and J. Gair
     Gravity Seminar, University of Southampton, February 27, 2020

  2. Outline
     1. Introduction to Bayesian inference for compact binaries
     2. Likelihood-free inference with neural networks
        (a) Basic approach
        (b) Normalizing flows
        (c) Variational autoencoders
     3. Results

  3. Introduction to parameter estimation
     • Bayesian inference for compact binaries: sample the posterior distribution for the system parameters θ (masses, spins, sky position, etc.) given detector strain data s:

       p(θ|s) = p(s|θ) p(θ) / p(s),

       where p(s|θ) is the likelihood, p(θ) is the prior, and p(s) is the evidence (a normalizing factor).
     • Once the likelihood and prior are defined, the right-hand side can be evaluated (up to normalization).
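A minimal sketch of evaluating that right-hand side in practice, assuming user-supplied `log_likelihood` and `log_prior` functions (the names and signatures are illustrative, not from the talk):

```python
import numpy as np

# Unnormalized log-posterior from Bayes' theorem:
# log p(theta|s) = log p(s|theta) + log p(theta) - log p(s),
# where the evidence term log p(s) is a constant and can be dropped for sampling.
def log_posterior_unnormalized(theta, s, log_likelihood, log_prior):
    lp = log_prior(theta)
    if not np.isfinite(lp):            # theta outside the prior support
        return -np.inf
    return log_likelihood(s, theta) + lp
```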

  4. Introduction to parameter estimation
     • The likelihood is based on the assumption that if the gravitational-wave signal were subtracted from s, then what remains must be noise.
     • The noise n is assumed to follow a stationary Gaussian distribution, i.e.,

       n ~ p(n) ∝ exp( −(1/2) (n|n) ),

       where the noise-weighted inner product is

       (a|b) = 2 ∫_0^∞ [ â(f) b̂*(f) + â*(f) b̂(f) ] / S_n(f) df,

       with S_n(f) the detector noise power spectral density (PSD).
     • Summed over detectors, this gives the likelihood

       p(s|θ) ∝ exp( −(1/2) Σ_I ( s_I − h_I(θ) | s_I − h_I(θ) ) ).
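A minimal numerical sketch of this inner product and the single-detector log-likelihood, assuming one-sided frequency-domain arrays on a uniform grid with spacing `df` (the array conventions are assumptions, not the authors' code):

```python
import numpy as np

def inner_product(a, b, psd, df):
    """(a|b) = 2 ∫ [a(f) b*(f) + a*(f) b(f)] / S_n(f) df = 4 Re ∫ a b* / S_n df."""
    return 4.0 * df * np.real(np.sum(a * np.conj(b) / psd))

def log_likelihood_single_detector(s, h, psd, df):
    """log p(s|θ) = -(1/2) (s - h | s - h) + const, for one detector.
    The full likelihood sums this exponent over detectors I."""
    r = s - h
    return -0.5 * inner_product(r, r, psd, df)
```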

  5. Introduction to parameter estimation
     • The prior p(θ) is based on beliefs about the system before looking at the data, e.g., uniform in m_1, m_2 over some range, uniform in spatial volume, etc.
     • With the prior and likelihood defined, the posterior can be evaluated up to normalization.
     • A method such as Markov chain Monte Carlo (MCMC) is used to obtain posterior samples: move around parameter space, and compare the strain data s against the waveform model h(θ). [Image: Abbott et al (2016)]
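As an illustration of "move around parameter space and compare the data against the waveform model", here is a minimal random-walk Metropolis sketch; production gravitational-wave analyses use more sophisticated samplers, so this is only schematic:

```python
import numpy as np

def metropolis_sampler(log_posterior, theta0, n_steps, step_size, seed=0):
    """Random-walk Metropolis: propose a Gaussian step, accept or reject."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = np.empty((n_steps, theta.size))
    for i in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.shape)
        logp_prop = log_posterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        samples[i] = theta
    return samples
```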

  6. Need for new methods
     • The standard method is expensive:
       • Many likelihood evaluations are required for each independent sample.
       • Each likelihood evaluation is slow, since it requires generating a waveform.
     • Various waveform models (EOBNR, Phenom, …) were created as faster alternatives to numerical relativity; reduced-order surrogate models allow even faster evaluation.
     • Parameter estimation takes days to months for a single event, depending on the type of event and the waveform model.
     Goal of this work: develop deep learning methods to do parameter estimation much faster. Model the posterior distribution p(θ|s) with a neural network.

  7. Main result: very fast posterior sampling
     [Corner plot of posterior samples over m_1/M_⊙, m_2/M_⊙, φ_0, t_c/s, d_L/Mpc, χ_1z, χ_2z, θ_JN.]
     Rest of this talk: How did we do this?

  8. Two key ideas
     1. A conditional probability distribution can be described by a neural network.
     2. The network can be trained to model a gravitational-wave posterior distribution without ever evaluating a likelihood. Instead, it only requires samples (θ, s) from the data-generating process.

  9. Introduction to neural networks
     • Nonlinear functions constructed as a composition of mappings: the input layer x ∈ ℝ^N is mapped to the first hidden layer h_1 ∈ ℝ^{N_1} via h_1 = σ_1(W_1 x + b_1).
     Each mapping consists of:
     1. A linear transformation, W_1 x + b_1.
     2. A simple element-wise nonlinear mapping σ_1, e.g. the ReLU: σ_1(x) = x for x ≥ 0, and 0 for x < 0.
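A minimal NumPy sketch of this single hidden-layer mapping (the dimensions are chosen arbitrarily for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)          # sigma(x) = x if x >= 0, else 0

def hidden_layer(x, W, b):
    return relu(W @ x + b)             # h = sigma(W x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)             # input, x in R^N with N = 8
W1 = rng.standard_normal((16, 8))      # weights mapping R^8 -> R^16
b1 = rng.standard_normal(16)           # biases
h1 = hidden_layer(x, W1, b1)           # first hidden layer, h1 in R^16
```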

  10. Introduction to neural networks
     • Layers are chained: input layer x ∈ ℝ^N → first hidden layer h_1 = σ_1(W_1 x + b_1) → second hidden layer h_2 = σ_2(W_2 h_1 + b_2) → … → final hidden layer h_p → output layer y = σ_out(W_out h_p + b_out), with y ∈ ℝ^{N_out}.
     • Training/test data consist of (x, y) pairs.
     • Train the network by tuning the weights W and biases b to minimize a loss function L(y, y_out).
     • Stochastic gradient descent combined with the chain rule ("backpropagation") is used to adjust the weights and biases.
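A minimal training-loop sketch of this procedure; the framework (PyTorch), the layer sizes, and the mean-squared-error loss are illustrative assumptions:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(),       # first hidden layer
    nn.Linear(32, 32), nn.ReLU(),      # second hidden layer
    nn.Linear(32, 2),                  # output layer
)
loss_fn = nn.MSELoss()                 # example choice of L(y, y_out)
optimizer = torch.optim.SGD(net.parameters(), lr=1e-2)

x = torch.randn(128, 8)                # toy (x, y) training pairs
y = torch.randn(128, 2)

for epoch in range(100):
    optimizer.zero_grad()
    y_out = net(x)
    loss = loss_fn(y_out, y)
    loss.backward()                    # backpropagation: gradients via the chain rule
    optimizer.step()                   # adjust weights and biases
```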

  11. Neural networks as probability distributions
     • Since conditional probability distributions can be parametrized by functions, and neural networks are functions, conditional probability distributions can be described by neural networks.
     E.g., a multivariate normal distribution:

       p(x|y) = N(μ(y), Σ(y))(x)
              = 1 / √((2π)^n |det Σ(y)|) · exp( −(1/2) Σ_{i,j=1}^n (x_i − μ_i(y)) Σ^{-1}_{ij}(y) (x_j − μ_j(y)) ),

       where μ(y), Σ(y) = NN(y).
     • For this example, it is trivial to draw samples and evaluate the density.
     • More complex distributions may also be described by neural networks (later in talk).
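A minimal sketch of a network that outputs the parameters of such a conditional Gaussian; for simplicity it restricts Σ(y) to be diagonal, which is an assumption beyond the slide:

```python
import torch
import torch.nn as nn

class ConditionalGaussian(nn.Module):
    """p(x|y) = N(mu(y), diag(sigma(y)^2)), with mu, sigma given by a network."""

    def __init__(self, y_dim, x_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * x_dim),      # outputs [mu(y), log sigma(y)]
        )

    def distribution(self, y):
        mu, log_sigma = self.net(y).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_sigma.exp())

    def log_prob(self, x, y):
        return self.distribution(y).log_prob(x).sum(-1)   # density is easy to evaluate

    def sample(self, y):
        return self.distribution(y).sample()               # sampling is easy too
```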

  12. Likelihood-free inference with neural networks
     [First applied to GW by Chua and Vallisneri (2020), Gabbard et al (2019)]
     • Goal is to train the network to model the true posterior, as given by the prior and likelihood that we specify, i.e., p(θ|s) → p_true(θ|s).
     • Minimize the expectation value (over s) of the cross-entropy between the distributions:

       L = − ∫ ds p_true(s) ∫ dθ p_true(θ|s) log p(θ|s)

       Intractable without knowing the posterior for each s!
     • Bayes' theorem ⟹ p_true(s) p_true(θ|s) = p_true(θ) p_true(s|θ), therefore

       L = − ∫ dθ p_true(θ) ∫ ds p_true(s|θ) log p(θ|s)

       This only requires samples from the likelihood, not the posterior!
