Machine learning for bounce calculation
Based on 1805.12153 2018/12/10 @ ICTP workshop on machine learning landscape
01 / 29
Ryusuke Jinno (IBS-CTPU)
02 / 29
SELF INTRODUCTION
Ryusuke (隆介) Jinno (神野)
- 2016/3 : Ph.D. @ Univ. of Tokyo
Research interests & recent works
Hillclimbing Higgs inflation (new realization of Higgs inflation with hillclimbing scheme)
03 / 29
[ https://www.slideshare.net/awahid/big-data-and-machine-learning-for-businesses ]
Can I apply this technique to problems in high-energy physics?
04 / 29
Terminology?
Note : the speaker's major is not machine learning !!
Artificial intelligence (AI) : machines that can perform tasks that are characteristic of human intelligence [ J. McCarthy ]
Machine learning (ML) : a way of achieving AI, i.e. learning without being explicitly programmed
Neural network (NN) : machine learning with artificial neurons (→ later)
Deep neural network / deep learning : neural network with deep (= many) layers of neurons
[ https://en.wikipedia.org/wiki/Artificial_intelligence ]
[ https://medium.com/iotforall/the-difference-between-artificial-intelligence-machine-learning-and-deep-learning-3aa67bff5991 ]
(figure : artificial neuron with inputs x1, x2, ..., xn and weights w1, w2, ..., wn)
05 / 29
Linear problem
x1 : # of "discount" in the email
x2 : # of "luxury" in the email
Question : your data consist of spam and non-spam points; find a, b such that x2 = a x1 + b is the boundary
Answer : "Linear problem" → good solution found easily
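A minimal sketch of the linear case (the spam/word-count data and all numbers here are hypothetical, not from the talk): a perceptron finds a separating line x2 = a x1 + b when one exists.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x1 = # of "discount", x2 = # of "luxury"; label +1 (spam)
# above the line x2 = 0.5 x1 + 1, label -1 (not spam) below it.
X = rng.uniform(0, 10, size=(300, 2))
gap = X[:, 1] - (0.5 * X[:, 0] + 1.0)
X = X[np.abs(gap) > 0.5]                      # keep a margin so a line exists
y = np.where(X[:, 1] > 0.5 * X[:, 0] + 1.0, 1.0, -1.0)

# Perceptron: learn (w1, w2, c) so that w1*x1 + w2*x2 + c = 0 is the boundary.
w, c = np.zeros(2), 0.0
for _ in range(500):                          # passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + c) <= 0:            # misclassified -> nudge the line
            w += yi * xi
            c += yi

a, b = -w[0] / w[1], -c / w[1]                # rewrite as x2 = a*x1 + b
accuracy = np.mean(np.sign(X @ w + c) == y)
print(f"x2 = {a:.2f} x1 + {b:.2f}, accuracy {accuracy:.2f}")
```

For linearly separable data with a margin, this update rule is guaranteed to converge, which is what "good solution found easily" means here.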
06 / 29
Nonlinear problem
x1 : # of "discount" in the email
x2 : # of "luxury" in the email
Question : your data consist of spam and non-spam points; find a, b such that x2 = a x1 + b is the boundary
Answer : "Nonlinear problem" → no good solution
07 / 29
Nonlinear problem
With some effort, you may find r & θ useful :
r = √( x1² + x2² )
θ = arctan( x2 / x1 )
"Feature engineering" (i.e. to capture nonlinearity)
Good solution found, but...
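A sketch of this feature-engineering step (the circular toy data is an assumption for illustration): in (x1, x2) no straight line separates the classes, while in the engineered coordinates the single feature r does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical nonlinear data: one class inside a circle of radius 3, one outside.
X = rng.uniform(-10, 10, size=(400, 2))
inside = np.hypot(X[:, 0], X[:, 1]) < 3.0

# Engineered features capturing the nonlinearity:
r = np.hypot(X[:, 0], X[:, 1])          # r = sqrt(x1^2 + x2^2)
theta = np.arctan2(X[:, 1], X[:, 0])    # theta = arctan(x2/x1), quadrant-aware

# In the (r, theta) plane the boundary is the straight line r = 3.
assert np.array_equal(r < 3.0, inside)
```

The catch, as the slide says, is that someone has to invent r and θ by hand for each new problem.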
08 / 29
Biological neuron
[ https://medium.com/autonomous-agents/mathematical-foundation-for-activation-functions-in-artificial-neural-networks-a51c9dd7c089 ]
A neuron receives electric signals through synapses; the electric signal is sent to the next neuron through the axon.
09 / 29
Artificial neuron mimics biological neuron
Diagrammatic notation : inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed the sum Σ xi wi into a nonlinear function f, giving the output z
Equation : z = f( Σ xi wi + b )
f : ReLU (rectified linear unit) / wi : weight / b : bias
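The neuron equation above can be sketched directly (the input values and weights are arbitrary example numbers):

```python
import numpy as np

def relu(y):
    """f(y) = max(0, y), the rectified linear unit."""
    return np.maximum(0.0, y)

def neuron(x, w, b):
    """Artificial neuron from the slide: z = f( sum_i x_i w_i + b )."""
    return relu(np.dot(x, w) + b)

x = np.array([1.0, 2.0, 3.0])    # inputs x1, x2, x3
w = np.array([0.5, -1.0, 0.5])   # weights w1, w2, w3
b = 0.2                          # bias
print(neuron(x, w, b))           # sum = 0.5 - 2.0 + 1.5 + 0.2 = 0.2 -> relu -> 0.2
```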
10 / 29
Neural network = network of artificial neurons
(figure : a single neuron vs. a network of neurons)
11 / 29
Neural network = network of artificial neurons
x1 = f( W1 xin + b1 )
xn = f( Wn xn−1 + bn )   (2 ≤ n ≤ N)
xout = Wout xN + bout
f acts componentwise : f(y) = ( f(y1), f(y2), ..., f(yn) )
Note1 : Wn = matrix / xn = vector / bn = vector
Note2 : Wn xn−1 + bn means (Wn)ij (xn−1)j + (bn)i
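The layer equations above can be sketched as a forward pass (layer widths and the random weights are arbitrary choices for illustration):

```python
import numpy as np

def relu(y):
    return np.maximum(0.0, y)

def forward(x_in, Ws, bs):
    """Forward pass of the slide's network:
    x_n = f(W_n x_{n-1} + b_n) for the hidden layers (f applied componentwise),
    x_out = W_out x_N + b_out with no activation on the last layer."""
    x = x_in
    for W, b in zip(Ws[:-1], bs[:-1]):
        x = relu(W @ x + b)
    return Ws[-1] @ x + bs[-1]

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]                     # input, two hidden layers, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

out = forward(np.array([0.1, 0.2, 0.3]), Ws, bs)
print(out.shape)   # (1,)
```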
12 / 29
How to train the neural network with "supervised learning"
Error function, e.g. E = Σ_data Σ_{i : component} ( xout − x(true) )i²
Update : W → W − α ∂E/∂W , b → b − α ∂E/∂b ( α : constant )
Note : there are more sophisticated algorithms, e.g. AdaGrad, Adam, ...
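A minimal sketch of the update rule W → W − α ∂E/∂W, b → b − α ∂E/∂b, for a single linear neuron xout = W x + b fit to toy data y = 2x + 1 (the data and learning rate are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y_true = 2.0 * x + 1.0

W, b, alpha = 0.0, 0.0, 0.1
for _ in range(1000):
    y = W * x + b
    # E = mean of (y - y_true)^2; the gradients follow from the chain rule.
    dE_dW = np.mean(2.0 * (y - y_true) * x)
    dE_db = np.mean(2.0 * (y - y_true))
    W -= alpha * dE_dW                 # W -> W - alpha dE/dW
    b -= alpha * dE_db                 # b -> b - alpha dE/db
print(W, b)                            # approaches W = 2, b = 1
```

Adam and AdaGrad refine this same loop by adapting the step size α per parameter.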
13 / 29
Ability of neural network to capture nonlinearity
Training data : ( xin , x(true) ) = ( xin , xin (xin − 0.3)(xin − 0.6)(xin − 0.9) )
(figure : network prediction "Pred" vs. true answer "Ans", both of order 0.01-0.03 over 0 ≤ x ≤ 1)
Neural network is extremely useful in capturing nonlinearity
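A from-scratch sketch of such a fit (the 1-16-1 architecture, learning rate, and iteration count are my choices, not the talk's): a small tanh network trained by gradient descent learns the quartic without any hand-made features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training pairs (x_in, x_true) from the slide's quartic.
x = np.linspace(0.0, 1.0, 64)[:, None]
y = x * (x - 0.3) * (x - 0.6) * (x - 0.9)

# 1-16-1 network, tanh hidden layer, plain gradient descent with backprop.
W1 = rng.normal(0, 1.0, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)
lr = 0.02

for _ in range(30000):
    h = np.tanh(x @ W1 + b1)              # hidden activations
    out = h @ W2 + b2                     # network prediction x_out
    d_out = 2.0 * (out - y) / len(x)      # dE/d(out) for mean squared error E
    d_h = (d_out @ W2.T) * (1.0 - h**2)   # chain rule through tanh
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * x.T @ d_h;   b1 -= lr * d_h.sum(0)

mse = float(np.mean((np.tanh(x @ W1 + b1) @ W2 + b2 - y) ** 2))
print(mse)                                # final training error
```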
14 / 29
Image classifier : nonlinear relation between input (= image) and output (= label)
The machine maps a cat image to the label for cat, and a dog image to the label for dog
(figure : input layer → output layer, with example output values 0.8 / 0.1 / 0.7)
Note : precisely, the output layer is the log-odds log [ P(cat)/P(dog) ]
Note : actual image recognition is not this simple, e.g. CNN
15 / 29
Gravitational waves (GWs) from BH & NS binaries have been detected
Tunneling & bubble dynamics in the early Universe also produce GWs
Such GWs may be detected in the (near) future
Detectors : pulsar timing arrays / space (0.01-1 Hz) / ground (~10² Hz)
[ http://rhcole.com/apps/GWplotter/ ]
16 / 29
Quantum tunneling in vacuum in 1+3 dim. [ Coleman '77 ]
Tunneling rate : Γ ∝ e^(−SE[φ̄]) , with SE[φ̄] the Euclidean action of the "bounce configuration"
SE[φ̄] = ∫ dtE ∫ d³x [ (1/2)(∂E φ̄)² + V(φ̄) ]
Bounce equation : d²φ̄/dr² + (3/r) dφ̄/dr − dV/dφ̄ = 0
with boundary conditions dφ̄/dr (r = 0) = 0 , φ̄(r = ∞) = 0
(figure : potential V(φ) and inverted potential −V ; the bounce interpolates from r = 0 to r = ∞)
17 / 29
(figure : inverted potential −V ; undershoot / overshoot in the shooting method)
Calculation of φ̄ requires many iterations
Every time we have a new particle physics setup, we re-calculate the EOM
Note : there are many approaches, e.g.
[ Duncan et al. '92, Dutta et al. '12, Guada et al. '18 : Piecewise linear bounce ]
[ Kusenko '95, Moreno et al. '98 : Improved action ]
[ Konstandin et al. '06 : Damping injection ]
[ Cline et al. '99, Wainwright '11 : Path deformation ]
[ Espinosa '18 : Auxiliary potential ]
[ Masoumi et al. '16 : Multiple shooting ]
Isn't it nonsense? Can we use machine learning techniques?
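The undershoot/overshoot iteration can be sketched as a shooting method (the cubic toy potential, step sizes, and bracketing values are illustrative assumptions, not the paper's setup):

```python
# Overshoot/undershoot shooting for the O(4) bounce equation
#   phi'' + (3/r) phi' - dV/dphi = 0,  phi'(0) = 0,  phi(r -> inf) = 0,
# with the toy potential V = phi^2/2 - phi^3/2 (assumed for illustration;
# false vacuum at phi = 0, escape point at phi = 1).

def dV(phi):
    return phi - 1.5 * phi**2

def shoot(phi0, r_max=30.0, dr=0.01):
    """Classify the release point phi(0) = phi0: +1 overshoot (phi crosses 0),
    -1 undershoot (phi turns back), 0 undecided by r_max."""
    def f(r, phi, dphi):
        return dphi, dV(phi) - 3.0 / r * dphi
    r = dr
    phi = phi0 + dV(phi0) * dr**2 / 8.0     # series solution near r = 0
    dphi = dV(phi0) * dr / 4.0
    while r < r_max:
        # One classical RK4 step for (phi, dphi).
        k1p, k1v = f(r, phi, dphi)
        k2p, k2v = f(r + dr/2, phi + dr/2*k1p, dphi + dr/2*k1v)
        k3p, k3v = f(r + dr/2, phi + dr/2*k2p, dphi + dr/2*k2v)
        k4p, k4v = f(r + dr, phi + dr*k3p, dphi + dr*k3v)
        phi += dr/6 * (k1p + 2*k2p + 2*k3p + k4p)
        dphi += dr/6 * (k1v + 2*k2v + 2*k3v + k4v)
        r += dr
        if phi < 0.0:
            return +1            # rolled past the false vacuum: overshoot
        if dphi > 0.0:
            return -1            # turned back toward the barrier: undershoot
    return 0

# Bracket the release point, then bisect between undershoot and overshoot.
lo, hi = 1.0, 2.0
while shoot(hi) <= 0:
    hi *= 2.0                    # expand until we find an overshooting guess
for _ in range(30):
    mid = 0.5 * (lo + hi)
    if shoot(mid) > 0:
        hi = mid
    else:
        lo = mid
phi_release = 0.5 * (lo + hi)
print("phi(0) ~", phi_release)
```

Each bisection step re-integrates the EOM, which is exactly the "many iterations" cost the talk wants to replace with a trained network.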
18 / 29
Potential is an image
(figure : discretized potential V(φ) displayed as a 1-dim. array of pixels)
Then, the problem becomes image recognition
19 / 29
Machine-learning approach
The cat-dog classifier does not have to recognize them as humans do
(figure : the machine maps the potential V directly to the bounce action SE)
20 / 29
We use 3 classes of potentials C1-C3 :
Class 1 (C1) : V(φ) = Σ_{n=1}^{7} a(1)n φ^(n+1)
Class 2 (C2) : V(φ) = Σ_{n=1}^{7} a(2)n φ^(2n)
Class 3 (C3) : V(φ) = a(3)1 φ² + Σ_{n=2}^{7} a(3)n φ^(2n−1)
Coefficients a(i)n : (→ backup)
(table : potential shape & bounce action for each class)
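A sketch of drawing one random class-1 potential and sampling it for the network input (the uniform coefficient range and the 16 sampling points are placeholders; the paper fixes a(1)n by the conditions in the backup slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_c1(n_terms=7):
    """Random class-1 potential V(phi) = sum_{n=1}^{7} a_n phi^(n+1)."""
    a = rng.uniform(-1.0, 1.0, n_terms)      # placeholder coefficient range
    def V(phi):
        return sum(a[n - 1] * phi**(n + 1) for n in range(1, n_terms + 1))
    return V

V = sample_c1()
phis = np.linspace(0.0, 1.0, 16)
x_in = V(phis)            # sampled potential values -> part of the input xin
print(x_in.shape)
```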
21 / 29
We construct training & test & application datasets (Step 1 : training & test, Step 2 : application), e.g. :
- Training 8,000 / test 2,000 / application 10,000 data, all from a single class (C1, C2 or C3)
- Training 24,000 / test 6,000 / application 30,000 data from C1+C2+C3
- Training 16,000 / test 4,000 data from two classes, application 10,000 data from the remaining class (C2+C3 → C1, C3+C1 → C2, C1+C2 → C3)
22 / 29
We try a simple machine with N = 2 (hidden layers)
23 / 29
Input xin : sampled values of the potential and its derivatives,
xin = { V(φsample) } ∪ { V′(φsample) } ∪ { V″(φsample) } , φsample = multiples of 1/16
Output xout : predicted value of the logarithmic bounce action ln S4
Note : implicit rescaling of input & output,
(xin)i → ( (xin)i − ⟨(xin)i⟩ ) / σ(xin)i , xout → ( xout − ⟨xout⟩ ) / σxout
⟨ ⟩ & σ : mean & variance calculated over training & test dataset
(figure : potential V(φ) sampled at discrete points)
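The implicit rescaling can be sketched as standard input standardization (the dataset shape, including the 48 sampled values per potential, is an assumption for illustration):

```python
import numpy as np

def standardize(data):
    """Shift by the mean and divide by the spread, per input component,
    as in (x_in)_i -> ((x_in)_i - <(x_in)_i>) / sigma_i."""
    mean = data.mean(axis=0)
    sigma = data.std(axis=0)
    return (data - mean) / sigma, mean, sigma

rng = np.random.default_rng(0)
X = rng.normal(5.0, 2.0, size=(1000, 48))   # assumed: 48 sampled V, V', V'' values
X_std, mean, sigma = standardize(X)
print(X_std.mean(), X_std.std())            # ~0 and ~1 by construction
```

The same (mean, sigma) computed on the training & test data would then be reused on the application data.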
24 / 29
Error function = how poorly the machine predicts :
E = [ 1 / (# of data passed to the machine) ] Σ_data ( xout − x(true) )²
xout = ln S4(pred) : predicted value of the logarithmic bounce action
x(true) = ln S4(true) : true value of the logarithmic bounce action
Training = update of weights and biases using the error function :
W → W − α ∂E/∂W , b → b − α ∂E/∂b
Note : in the actual training we use a slightly more sophisticated algorithm, Adam
25 / 29
Implementation : mini-batch training
The training dataset is split into mini-batches, and each mini-batch gives one update of the weights and biases
e.g. with 10 mini-batches : 10 updates = 1 epoch
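The mini-batch bookkeeping can be sketched as follows (dataset size, input dimension, and batch size are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 48))        # toy inputs (dimensions assumed)
y = rng.normal(size=1000)

batch_size = 100                       # 1000 / 100 = 10 mini-batches
n_updates = 0
for epoch in range(3):
    order = rng.permutation(len(X))    # reshuffle the training set each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        # ... compute the gradient of E on (xb, yb) and update W, b here ...
        n_updates += 1

print(n_updates)   # 10 updates per epoch x 3 epochs = 30
```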
26 / 29
Results : scatter plots of the machine's performance, averaged over 10 trials
Case A : 1 class for training & test & application (training 8,000 / test 2,000 / application 10,000, all from the same class C1, C2 or C3)
Case B : mixture of 3 classes (training 24,000 / test 6,000 / application 30,000 from C1+C2+C3)
Case C : training & test over 2 classes, application to the other 1 class (training 16,000 / test 4,000 / application 10,000 ; C2+C3 → C1, C3+C1 → C2, C1+C2 → C3)
27 / 29
How much precision can we expect in practical use? How much is the speedup?
Potential shapes in particle physics are not that many
→ If we train with such potentials, the resulting precision will be C1+C2+C3 or better
[ Guada et al. '18, "Polygonal bounces" (private communication) ]
while after training it takes O(10 ) sec to calculate the bounce
28 / 29
Generalizations?
1) 2 dim. : convolutional neural network (CNN) may help
2) ML may be used for the 1 dim. part in existing multi-dimensional public codes
3) ML may also be used as an "initial position suggestor" in such public codes, by identifying the output as the initial position
(figure : input & hidden layers for the generalized setups)
29 / 29
Calculation of quantities from a scalar potential can be regarded as an image recognition problem
We proposed using machine learning techniques for such calculations, and demonstrated their usefulness for one-dim. transitions
We explained possible ideas for generalization to multi-dimensional transitions
Backup : generation of the coefficients a(i)n
Random seeds (Vmax, φ0, φ1−, φ1+, φ2) are generated in [0, 1], with 10^(−2) ≤ Vmax ≤ 10^(−0.5)
and ordering φ1+ < φ0 < φ2 < φ1− or φ1+ < φ2 < φ0 < φ1−
The coefficients a(i)n are determined so that V has a maximum of height Vmax at φ = φ0
and a minimum (depth −1) at φ = 0 or φ = 1, with conditions on V, V′, V″ at φ = φ1+, φ1−, φ2
A potential is added to the data if there is no local maximum/minimum other than φ = φ0, 0, 1