June 2, 2018 @ Osaka
Deep learning to diagnose the neutron star
Kenji Fukushima
The University of Tokyo
Based on work with Yuki Fujimoto, Koichi Murase — Deep Learning and Physics 2018 —
Disclaimer: I am a “user” of deep learning…
From the point of view of physics users…
Sounding fancy is not enough…
[Diagram] Theory (MODEL): Input Data (very uncertain) → Nonlinear Mapping → Output Data. Exp. (EXPERIMENT): limited information.
The forward direction is easy; the inverse is hard. Compare the two… Exclude models?
Inverse Problem
[Diagram] Output Data (Exp., limited information) → Nonlinear Mapping → Input Data: easy in the forward direction, hard in the inverse.
The inverse is not unique… What is the “most likely” input?
[Figure: NS mass (0.0–2.5 M_⊙) vs. radius (7–15 km) for candidate EoSs (AP3, AP4, ENG, MPA1, GM3, GS1, PAL1, PAL6, FSU, MS0, MS1, MS2, SQM1, SQM3), with GR, causality, rotation, and P < ∞ bounds, and double neutron star systems such as J1903+0327, J1909-3744, J1614-2230.]
Demorest et al. (2010): precise determination of a NS mass using the Shapiro delay, 1.928(17) M_⊙ (J1614-2230; slightly changed in 2016).
Antoniadis et al. (2013): 2.01(4) M_⊙ (PSR J0348+0432).
Equation of State ↔ M-R Relation
Pressure: p, mass density: ρ (energy density: ε = ρc²)
EoS: p = p(ρ)
NS mass: M, NS radius: R
The Tolman-Oppenheimer-Volkoff (TOV) equation balances gravity against the pressure gradient; solving it for each central density gives M = M(ρ_max) and R = R(ρ_max).
Mathematically a one-to-one correspondence: Input Data (EoS) → Nonlinear Mapping (TOV) → Output Data (M-R)
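A minimal numerical sketch of this forward map (my illustration, not the authors' code), assuming geometrized units G = c = 1 and a hypothetical helper `eps_of_p` that inverts the EoS to give energy density from pressure:

```python
# Minimal TOV integration sketch (geometrized units G = c = 1).
# `eps_of_p` is a hypothetical user-supplied inverse EoS; it should behave
# gracefully for very small (or slightly negative) trial pressures.
import numpy as np
from scipy.integrate import solve_ivp

def tov_rhs(r, y, eps_of_p):
    """Right-hand side of the TOV equations for y = (p, m)."""
    p, m = y
    eps = eps_of_p(p)  # energy density from the EoS
    dp_dr = -(eps + p) * (m + 4.0 * np.pi * r**3 * p) / (r * (r - 2.0 * m))
    dm_dr = 4.0 * np.pi * r**2 * eps
    return [dp_dr, dm_dr]

def mass_radius(p_central, eps_of_p, r_max=100.0):
    """One point on the M-R curve: integrate from the center out to p(R) = 0."""
    surface = lambda r, y, eps_of_p: y[0] - 1e-12  # stop where pressure vanishes
    surface.terminal = True
    sol = solve_ivp(tov_rhs, (1e-6, r_max), [p_central, 0.0],
                    args=(eps_of_p,), events=surface, rtol=1e-8)
    return sol.t[-1], sol.y[1][-1]  # (R, M); scanning p_central traces the M-R curve
```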
Lindblom (1992)
Brute-force solution of the inverse problem (thanks to Y. Fujimoto):
[Figure: test EoS put in by hand → solve TOV → reconstructed EoS; panels show pressure vs. density and mass vs. radius.]
The answer exists!
No magic box… Only a “solvable” problem can be solved…
R is fixed by TOV with p(R) = 0 (the “surface” condition):
dp/dr(r = R) = 0, d²p/dr²(r = R) ∝ M²/R²
The pressure flattens toward the surface, so the determination of R is very uncertain “by definition”; on top of that, R itself is anyway very uncertain observationally…
People do not care, assuming that the NS mass > 1.2 M_⊙.
Bayesian Analysis
Bayes’ theorem: $\Pr(A|B) = \Pr(B|A)\,\Pr(A)/\Pr(B)$
B: M-R observation, A: EoS parameters
Pr(A|B): what we want to know. Pr(B|A): the likelihood, calculable by TOV. Pr(A): the prior (model). Pr(B): the normalization.
A model must be assumed: an EoS parametrization must be introduced, and the integration measure in parameter space must be defined.
With infinitely many observations, the prior dependence should be gone.
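To make these ingredients concrete, here is a toy one-parameter sketch (entirely hypothetical; the real likelihood comes from solving TOV, not from this Gaussian stand-in):

```python
# Toy one-parameter Bayesian update: posterior = prior * likelihood / norm.
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)   # grid of EoS parameters A
prior = np.ones_like(theta)           # flat prior Pr(A)
prior /= prior.sum()

def likelihood(theta, m_obs=1.4, dm=0.1):
    """Pr(B|A): hypothetical Gaussian; in reality computed via TOV."""
    m_pred = 1.0 + theta              # toy model: mass predicted from theta
    return np.exp(-0.5 * ((m_obs - m_pred) / dm) ** 2)

post = prior * likelihood(theta)
post /= post.sum()                    # dividing by Pr(B), the normalization
```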
Raithel-Ozel-Psaltis (2017): prior dependence, tested with mock data (SLy + noise).
[Figure: black curve = true EoS, magenta curve = guessed EoS, gray band = 68% credibility.]
Several M-R points with errors → several parameters that characterize the EoS
Nonlinear mapping: {M_i, R_i} → {P_i}, i.e., {P_i} = F({M_i, R_i})
~15 points (observations) in, ~5 points (EoS parameters) out
Too precise a parametrization of the EoS is useless (beyond the uncertainty from the observations)
Bayesian analysis / supervised learning
[Diagram: fully connected feed-forward network mapping {M_i, R_i} to {P_i}; input layer x_i = x_i^{(0)} (i = 1, …, N), hidden layers x_i^{(k)}, output layer x_i^{(L)} = y_i (i = 1, …, M).]

$x_i^{(k+1)} = \sigma^{(k+1)}\Big( \sum_{j=1}^{N_k} W_{ij}^{(k+1)} x_j^{(k)} + a_i^{(k+1)} \Big)$

The weights W and offsets a are the parameters to be tuned (by backpropagation).
Activation functions include the sigmoid function σ(x) = 1/(e^{−x} + 1), the ReLU σ(x) = max{0, x}, and the tanh σ(x) = tanh(x).
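The layer recursion above, written out in plain numpy (a sketch with illustrative shapes; tuning W and a by backpropagation would be delegated to a library):

```python
# Feed-forward pass x^{(k+1)} = sigma(W^{(k+1)} x^{(k)} + a^{(k+1)}).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases, activations):
    """Apply each layer in turn; `activations` holds one sigma per layer."""
    for W, a, sigma in zip(weights, biases, activations):
        x = sigma(W @ x + a)
    return x

# Illustrative shapes: 30 inputs -> 60 hidden -> 5 outputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(60, 30)), rng.normal(size=(5, 60))]
biases = [np.zeros(60), np.zeros(5)]
y = forward(rng.normal(size=30), weights, biases, [relu, np.tanh])
```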
Our Neural Network Design

Layer  Nodes  Activation
1      30     N/A
2      60     ReLU
3      40     ReLU
4      40     ReLU
5      5      tanh

Probably we don’t need this many hidden layers and this many nodes… anyway, this is one working example…
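A minimal sketch of this design in Keras (an assumption on my part; the slides do not name the framework, optimizer, or training hyperparameters):

```python
# Hypothetical Keras version of the tabulated design:
# 30 inputs (15 (M, R) points) -> 60 -> 40 -> 40 -> 5 outputs (EoS parameters).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(30,)),                 # layer 1: 30 input nodes, no activation
    layers.Dense(60, activation="relu"),      # layer 2
    layers.Dense(40, activation="relu"),      # layer 3
    layers.Dense(40, activation="relu"),      # layer 4
    layers.Dense(5, activation="tanh"),       # layer 5: 5 outputs
])
model.compile(optimizer="adam",
              loss="mean_squared_logarithmic_error")  # the "msle" of a later slide
```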
For good learning, the “textbook” choice of training data is important…
Training data (200000 sets in total):
Randomly generate 5 sound velocities → EoS (× 2000 sets)
Solve TOV to identify the corresponding M-R curve
Randomly pick 15 observation points (with ∆M = 0.1 M_⊙, ∆R = 0.5 km) × (n_s = 100) sets
The machine learns that the M-R data have error fluctuations
Validation data (200 sets): generated independently of the training data
(A sketch of the sampling step follows below.)
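A sketch of the observation sampling just described, where `m_r_curve` is a hypothetical (N, 2) array of (M, R) points from the TOV solution:

```python
# Turn one M-R curve into ns noisy "observations" of 15 (M, R) points each.
import numpy as np

def make_observations(m_r_curve, ns=100, n_pts=15, dm=0.1, dr=0.5, rng=None):
    rng = rng or np.random.default_rng()
    obs = []
    for _ in range(ns):
        idx = rng.choice(len(m_r_curve), size=n_pts, replace=False)
        pts = m_r_curve[idx].copy()
        pts[:, 0] += rng.normal(0.0, dm, n_pts)   # Delta M = 0.1 M_sun
        pts[:, 1] += rng.normal(0.0, dr, n_pts)   # Delta R = 0.5 km
        obs.append(pts.ravel())                   # 30 numbers -> network input
    return np.array(obs)
```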
With fluctuations in the training data, the learning goes quickly.
“Loss function” = deviation from the true answers (msle). It decreases monotonically for the training data, but not necessarily so for the validation data.
Once over-fitting occurs, the model becomes more stupid…
[Figure: loss function (msle, 0.02–0.18) vs. epochs (1–10000) for n_s = 100 and n_s = 1.]
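For reference, the usual definition of the msle loss (the standard convention; not spelled out on the slide):

```latex
\mathrm{msle} = \frac{1}{n} \sum_{i=1}^{n}
  \left[ \log\bigl(1 + y_i^{\mathrm{pred}}\bigr) - \log\bigl(1 + y_i^{\mathrm{true}}\bigr) \right]^2
```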
Test with the validation data (parameters not optimized to fit the validation data)
Fujimoto-Fukushima-Murase (2017)
Two typical examples (not a biased choice): the randomly generated original EoS vs. the reconstructed EoS and the associated M-R curves.
[Figure: p/ρc² vs. ρc² [GeV/fm³], and M/M_⊙ vs. R [km].]
Overall performance test
Mass (M_⊙)  0.6   0.8   1.0   1.2    1.4   1.6   1.8
RMS (km)    0.16  0.12  0.10  0.099  0.11  0.11  0.12
(with ∆M = 0.1 M_⊙, ∆R = 0.5 km)
Very promising!
Credibility estimate has not been done for simplicity, but it can be included in the learning process.
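A possible way to produce such a table (a guess at the bookkeeping, not the authors' script; `R_true` and `R_pred` are hypothetical radii of the original and reconstructed M-R curves evaluated at fixed masses):

```python
# RMS radius deviation of the reconstruction, binned in mass.
import numpy as np

def rms_by_mass(masses, R_true, R_pred, bins):
    rms = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        sel = (masses >= lo) & (masses < hi)
        rms.append(np.sqrt(np.mean((R_pred[sel] - R_true[sel]) ** 2)))
    return np.array(rms)
```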
Bayesian or NN, which to choose?
Bayesian vs. NN
EoS parametrized by θ := {c²_{s,i}} (sound velocities); Pr(D|θ) is calculable for each EoS.
Bayesian: maximum a posteriori estimation of the posterior Pr(θ|D),
$f_{\mathrm{MAP}}(D) = \arg\max_{\theta} \left[ \Pr(\theta)\,\Pr(D|\theta) \right]$
NN: minimizes the expected loss
$\langle \ell[f] \rangle = \int d\theta\, dD\, \Pr(\theta)\, \Pr(D|\theta)\, \ell(\theta, f(D))$
so the trained f approximately estimates the Bayesian answer.
The NN allows for a more general choice of loss functions; the Bayesian approach assumes parametrized likelihood functions.
Developing a toolkit for real data: not discrete data points with errors, but regions of credibility.
Error analysis (credibility estimate) on the output side.