  1. Dualities in and from Machine Learning
 Sven Krippendorf
 Deep Learning and Physics 2019
 Yukawa Institute for Theoretical Physics, Kyoto, October 31st 2019

  2. Spend 2 more slides on current ML applications in high energy

  3. Improving sensitivity (Day, SK 1907.07642)
 • ML techniques are heavily used in experimental bounds.
 • Brief example: improving sensitivity for ultra-light axion-like particles, compared to previous bounds.
 • ML algorithms are good at classification, and detecting particles is a classification problem. Our classifiers map a spectrum to 0 (no axions) or 1 (axions):
 $|\gamma(E)\rangle \to \alpha\,|\gamma(E)\rangle + \beta\,|a(E)\rangle$
 • Training: simulate data with and without axions for appropriate X-ray sources.
 • Bounds: compare performance on fake & real data.
 • Algorithms (sklearn): decision trees, boosted decision trees, random forests, Gaussian Naive Bayes, Gaussian Process classifier, SVM, … (see the sketch below)
 [Figure: X-ray photons from an AGN behind cluster B undergo axion conversion and galactic absorption before reaching the X-ray telescope.]
 Previous bounds: NGC1275 (1605.01043), other sources (1704.05256), Athena (1707.00176); with Conlon, Day, Jennings; Berg, Muia, Powell, Rummel
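The actual analysis trains on simulated Chandra spectra; as a rough illustration of the sklearn comparison described above, here is a minimal sketch on synthetic stand-in spectra (the data-generation details and the oscillatory "axion" modulation below are invented purely for illustration):

```python
# Minimal sketch of the classifier comparison (hypothetical stand-in data;
# the real analysis uses simulated X-ray spectra with and without axions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_bins = 2000, 200
energies = np.linspace(1.0, 10.0, n_bins)   # keV
base = energies ** -1.8                     # smooth power-law spectrum

# Label 0: noisy smooth spectrum. Label 1: same spectrum with an
# oscillatory modulation mimicking photon-axion interconversion.
X, y = [], []
for i in range(n_samples):
    label = i % 2
    spec = base * (1 + 0.05 * rng.standard_normal(n_bins))
    if label:
        spec *= 1 - 0.1 * np.sin(30.0 / energies + rng.uniform(0, 2 * np.pi)) ** 2
    X.append(spec)
    y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in [DecisionTreeClassifier(), RandomForestClassifier(),
            AdaBoostClassifier(), GaussianNB()]:   # AdaBoost = boosted trees
    print(type(clf).__name__, clf.fit(X_tr, y_tr).score(X_te, y_te))
```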

  4. Constraining ALPs
 Previous bounds: NGC1275 (1605.01043), other sources (1704.05256), Athena (1707.00176); with Conlon, Day, Jennings, Rummel; Berg, Muia, Powell
 • Photon-axion interconversion in background magnetic fields:
 $\mathcal{L} \supset -\frac{g_{a\gamma\gamma}}{4}\, a\, F\tilde{F} = g_{a\gamma\gamma}\, a\, \mathbf{E}\cdot\mathbf{B}$
 • One interesting parameter region can be obtained for photons from sources in and behind galaxy cluster magnetic fields:
 $\Theta = 0.28 \left(\frac{B_\perp}{1\,\mu\mathrm{G}}\right) \left(\frac{10^{-3}\,\mathrm{cm}^{-3}}{n_e}\right) \left(\frac{10^{11}\,\mathrm{GeV}}{M}\right) \left(\frac{\omega}{1\,\mathrm{keV}}\right)$
 $P_{\gamma\to a} = \frac{\Theta^2}{1+\Theta^2}\, \sin^2\!\left(\Delta\sqrt{1+\Theta^2}\right)$
 $\Delta = 0.54 \left(\frac{n_e}{10^{-3}\,\mathrm{cm}^{-3}}\right) \left(\frac{L}{10\,\mathrm{kpc}}\right) \left(\frac{1\,\mathrm{keV}}{\omega}\right)$
 [Figure: $P_{\gamma\to a}$ as a function of ω from 0.01 to 100 keV, with regions labelled: too rapid oscillations, suppressed conversion, sweet spot ($P_{\gamma\to a}$ up to ≈ 0.25), no oscillations.]
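A small numerical sketch that evaluates the formulas above as a function of ω, with the reference values from the parentheses ($B_\perp = 1\,\mu$G, $n_e = 10^{-3}\,$cm$^{-3}$, $M = 10^{11}\,$GeV, $L = 10\,$kpc) as defaults:

```python
# Numerical sketch of the conversion probability P(gamma -> a) above.
# Inputs are in units of the reference values on the slide.
import numpy as np

def P_gamma_to_a(omega_keV, B_perp_muG=1.0, n_e3=1.0, M11=1.0, L10=1.0):
    # Theta and Delta with the dimensionless prefactors 0.28 and 0.54.
    Theta = 0.28 * B_perp_muG / n_e3 / M11 * omega_keV
    Delta = 0.54 * n_e3 * L10 / omega_keV
    return Theta**2 / (1 + Theta**2) * np.sin(Delta * np.sqrt(1 + Theta**2))**2

for omega in [0.01, 0.1, 1.0, 10.0, 100.0]:
    print(f"omega = {omega:7.2f} keV  ->  P = {P_gamma_to_a(omega):.4f}")
```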

  5. Improving sensitivity
 [Speaker notes: picture of spectral distortion; picture of bounds overview]
 • Data: Chandra X-ray observations of bright point sources (AGN, quasars) in or behind galaxy clusters.
 • Bounds for ALPs with m < 10⁻¹² eV due to the absence of characteristic spectral modulations caused by interconversion between photons and axions in the cluster background magnetic field.
 Table (our results), bounds in units of 10⁻¹² GeV⁻¹:
 Source | DTC | GaussianNB | QDA | RFC | AB | Previous
 A1367 (resid.) | 1.9 | – | – | – | – | 2.4
 A1367 (up-resid.) | 2.0 | – | 1.9 | – | – | 2.4
 A1795 Quasar (resid.) | – | – | 1.7 | – | 1.4 | >10.0
 A1795 Quasar (up-resid.) | – | – | – | – | – | >10.0
 A1795 Sy1 (resid.) | 1.0 | 0.8 | 1.2 | 1.1 | 0.7 | 1.5
 A1795 Sy1 (up-resid.) | 1.1 | 1.1 | 1.1 | 1.0 | 0.8 | 1.5
 [Figure: overview of bounds, axion coupling |g_Aγγ| (GeV⁻¹) vs axion mass m_A (eV), including LSW (OSQAR), VMB (PVLAS), helioscopes (CAST), horizontal branch stars, SN 1987A, HESS, Fermi-LAT, NGC1275 (Chandra and Athena), haloscopes (ADMX), and the KSVZ/DFSZ lines.]

  6. ML for the string landscape? Many talks: Halverson, Ruehle, Shiu
 [Speaker note: remove slides]

  7. Other avenues?

  8. “Don’t ask what ML can do for you, ask what you can do for ML.” – Gary Shiu

  9. Physics ⋂ ML

  10. Dualities (Betzler, SK: 191x.xxxxx)

  11. The problem
 • Obtain correlators $\langle f(\phi_i) \rangle$ at high(er) accuracy (think of these as properties of your data).
 • In physics (incl. condensed matter, holography, string/field theory), we often use clever data representations to evaluate correlators.
 • Multiple representations can be useful ➔ dualities.
 • Dual representations are very good representations for evaluating certain correlation functions (mapping strongly coupled data products to weakly coupled data products).
 • Examples …

  12. How are dualities useful in practice?
 aka connecting physics questions to data questions

  13. Examples: dual representations
 • Discrete Fourier transformation:
 $x_k = \frac{1}{n} \sum_{j=1}^{n} p_j\, e^{2\pi i jk/n}, \qquad p_k = \sum_{j=1}^{n} x_j\, e^{-2\pi i jk/n}$
 • Is there a signal under the noise? (see the sketch below)
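A minimal sketch of the "signal under the noise" question in these variables: a weak sinusoid that is invisible among position-space noise stands out as a single dominant mode after the DFT (the signal frequency and amplitude below are arbitrary choices; numpy's fft uses the same sign convention as $p_k$ above):

```python
# Weak sinusoid buried in unit-variance noise: undetectable by eye in
# position space, a clear single peak in the dual (momentum) space.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n) + 0.3 * np.sin(2 * np.pi * 50 * np.arange(n) / n)

p = np.fft.fft(x) / np.sqrt(n)           # dual representation
k_peak = np.argmax(np.abs(p[1:n // 2])) + 1
print("dominant mode:", k_peak)          # -> 50, despite the noise dominating x
```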

  14. Examples: dual representations
 • 2D Ising model: high-low temperature self-duality (Kramers, Wannier 1941; Onsager 1943; review: Savit 1980)
 Original: $H = -J \sum_{\langle i,j \rangle} \sigma_i \sigma_j$, $\; Z = \sum e^{-\beta H(\sigma)}$, $\; \beta = \frac{1}{k_B T}$
 Dual: $H = -J \sum_{\langle i,j \rangle} s_i s_j$, $\; \tilde{Z} = \sum e^{-\tilde{\beta} H(s)}$, $\; \tilde{\beta} = -\frac{1}{2} \log \tanh \beta$
 Ordered rep. ↔ Disordered rep. Position space? Momentum space?
 [Figure: sample configurations in the original and dual variables around $T_\text{critical}$.]
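A quick numerical illustration of the dual coupling $\tilde{\beta} = -\frac{1}{2}\log\tanh\beta$ (units $J = k_B = 1$): nearby low temperatures map to well-separated dual temperatures, which is the effect exploited on the following slides:

```python
# Kramers-Wannier map between couplings: a narrow band of low temperatures
# is stretched over a wide band of dual temperatures.
import numpy as np

def dual_beta(beta):
    return -0.5 * np.log(np.tanh(beta))

for beta in [1.0, 1.1, 1.2]:
    bt = dual_beta(beta)
    print(f"T = {1/beta:.3f}  ->  dual T = {1/bt:.3f}")
# T: 1.000, 0.909, 0.833 (spread ~0.17)  ->  dual T: 7.34, 8.99, 10.99 (spread ~3.7)
# Self-dual point: beta_c = 0.5*log(1+sqrt(2)) ~ 0.4407, i.e. T_c ~ 2.269.
```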

  15. Which data problem?
 • Some correlation functions are more easily evaluated on dual variables:
 $\langle \sigma_i \sigma_j \rangle, \; \langle E(\sigma) \rangle, \; \langle M(\sigma) \rangle$
 • Can we classify the temperature of low-temperature configurations? Which temperature is a sample drawn from (at low temperatures)? They look rather similar. How about in the dual representation?
 [Speaker note: replace images]

  16. Data question on Ising
 • But at the dual temperatures, our data takes a different shape:
 [Figure: configurations at nearby low temperatures and their duals, related by the duality.]
 • It is easier to classify the temperature of a low-temperature configuration in the dual representation …
 • How come?
 $P(\sigma) = \frac{e^{-E/T}}{Z}, \qquad \tilde{P}(s) = \frac{e^{-\tilde{E}/\tilde{T}}}{\tilde{Z}}$
 $\Delta T \ll \Delta \tilde{T}, \qquad \langle \Delta E \rangle \ll \langle \Delta \tilde{E} \rangle$

  17. Data question on Ising
 • Let's look at the overlap of energy distributions in finite-size samples.
 [Figure: energy distributions at nearby temperatures, in the original variables and in the dual variables.]
 Let's check the performance.

  18. Ising: simple network
 [Speaker note: change figures]
 • Let's confirm this with simple networks.
 • Side remark: way outperforming standard sklearn classifiers.
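The slide does not spell out the architecture, so the following Keras model is only a plausible minimal setup for the binary question "which of two nearby low temperatures was this configuration drawn from?"; the lattice size, layer width, and activations are assumptions:

```python
# Hypothetical single-hidden-layer classifier for Ising configurations
# (works on either the original or the dual representation, flattened).
import numpy as np
from tensorflow import keras

L = 16  # assumed linear lattice size; inputs are flattened L*L spin arrays

model = keras.Sequential([
    keras.layers.Input(shape=(L * L,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # label: temperature T1 vs T2
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, ...)  # X_train: Monte Carlo samples at T1 and T2
```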

  19. Example: dual representations
 • 1D Ising with multiple spin interactions:
 normal: $H(s) = -J \sum_{k=1}^{N-n+1} \prod_{l=0}^{n-1} s_{k+l} - B \sum_{k=1}^{N} s_k$ (here: $B = 0$)
 dual: $H(\sigma) = -J \sum_{k=1}^{N-n+1} \sigma_k$ where $\sigma_k = \prod_{l=0}^{n-1} s_{k+l}$
 [Figure: spin chain with N = 10, n = 3; ghost spins (fixed value) at the boundary; s mapped to σ.]
 • Two data questions: 1) energy of a spin configuration, 2) metastable configuration (see the sketch below). cf. 1605.05199
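A small sketch of this duality map and of the first data question, the energy evaluation (boundaries handled implicitly by summing only complete n-spin products, rather than via explicit ghost spins):

```python
# sigma_k is the product of n consecutive spins; in the dual variables the
# energy is a plain sum, as on slide 19.
import numpy as np

def to_dual(s, n):
    # sigma_k = prod_{l=0}^{n-1} s_{k+l},  k = 1 .. N-n+1
    N = len(s)
    return np.array([np.prod(s[k:k + n]) for k in range(N - n + 1)])

def energy(s, n, J=1.0, B=0.0):
    return -J * to_dual(s, n).sum() - B * s.sum()

s = np.array([+1, +1, -1, +1, +1, +1, +1, -1, +1, +1])  # example, N = 10
print(to_dual(s, 3), energy(s, 3))
```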

  20. Example: dual representations
 • Two data questions: 1) energy of a spin configuration, 2) metastable configuration. cf. 1605.05199
 [Figure: spin chains with N = 10, n = 3 and ghost spins (fixed value): example configurations s with their duals σ, their energies, and one metastable vs one stable configuration.]
 [Speaker note: add evaluation of metastability on dual and normal variables]
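For the second data question, one plausible reading (cf. 1605.05199; the precise definition used in the talk is not shown) is that a configuration is metastable when no single spin flip lowers its energy while it is not a ground state. A brute-force sketch of that check:

```python
# Hedged sketch of a metastability check: s is a local energy minimum if
# no single spin flip lowers H(s); "metastable" then means a local minimum
# that is not the ground state.
import numpy as np

def energy(s, n, J=1.0):
    # H(s) = -J sum_k prod_l s_{k+l}  (B = 0), as on slide 19
    return -J * sum(np.prod(s[k:k + n]) for k in range(len(s) - n + 1))

def is_local_minimum(s, n):
    E0 = energy(s, n)
    for i in range(len(s)):
        t = s.copy()
        t[i] *= -1
        if energy(t, n) < E0:
            return False
    return True

s = np.array([+1] * 10)
print(energy(s, 3), is_local_minimum(s, 3))  # all-up: stable ground state
```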

  21. Example: dual representations
 • Performance on metastability classification (single hidden layer); rows: number of training samples, columns: interaction range n.
 Normal variables:
 samples | n=4 | n=5 | n=8 | n=9 | n=12
 6·10² | 0.9113 | 0.8688 | 0.8788 | 0.8813 | 0.8803
 3·10³ | 0.9243 | 0.9215 | 0.9223 | 0.9295 | –
 9.5·10³ | 0.9424 | 0.9475 | 0.9739 | – | –
 Dual variables:
 samples | n=4 | n=5 | n=8 | n=9 | n=12
 6·10² | 0.9911 | 0.9783 | 0.9819 | 0.9855 | 0.9909
 3·10³ | 0.9958 | 0.9977 | 0.9994 | 1.0000 | –
 9.5·10³ | 1.0000 | 1.0000 | 1.0000 | – | –
 • Deeper networks or CNNs perform better to a certain degree, but at large N or n they show the same feature.

  22. Upshot: dual representations simplify answering certain data questions
 (i.e. simple networks are sufficient)

  23. Why interesting for ML?

  24. Why interesting for ML?
 • Finding good representations which allow one to answer the data question is hard (if not impossible).
 [Diagram: a neural network acting on input data in the normal frame fails to answer the data question (✘); a neural network mapping the normal frame to the dual representation, followed by a neural network acting on the dual frame, outputs the answer to the data question (✔).]
 • In this talk we use physics examples. A generalisation to other data products/questions would be interesting. Here, the data question in the dual frame can be addressed with very simple networks.

  25. DFT: simple network
 • Supervised learning task (binary classification) on N = 2000 discrete values:
 position space: {((x_R, x_I), y)}, momentum space: {((p_R, p_I), y)}
 y = 1: noise + signal, y = 0: noise
 • Network (a sketch follows below):
 Layer | Shape | Parameters
 Conv1D | (2000, 2) | 4
 Activation | (2000, 2) | –
 Dense | 1 | 4001
 Activation | 1 | –
 • For this network, classification works in momentum space, but not in position space.
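A Keras sketch that reproduces the parameter counts in the table (4 for the Conv1D, 4001 for the final Dense); the kernel size, missing bias, implicit Flatten, and activation choices are assumptions made to match those counts:

```python
# Minimal model consistent with the layer table: a kernel-size-1, bias-free
# Conv1D (1*2*2 = 4 parameters) mixes the two channels pointwise, then a
# single Dense unit (2000*2 + 1 = 4001 parameters) classifies.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(2000, 2)),                    # (x_R, x_I) or (p_R, p_I)
    keras.layers.Conv1D(2, kernel_size=1, use_bias=False),  # 4 parameters
    keras.layers.Activation("relu"),
    keras.layers.Flatten(),                                 # implicit in the table
    keras.layers.Dense(1),                                  # 4001 parameters
    keras.layers.Activation("sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```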

  26. Utilising dual representation
 • Goal: improve performance in position space.
 • Deeper network? Can do the job in principle [the DFT can be implemented with a single dense layer; see the sketch below].
 • However, finding it dynamically is `impossible' with standard optimisers, initialisations, and regularisers.
 Layer | Shape | Parameters
 Dense | (2000, 2) | 16000000  (starting point: random or DFT)
 Conv1D | (2000, 2) | 4
 Activation | (2000, 2) | –
 Dense | 1 | 4001
 Activation | 1 | –
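A sketch of the bracketed remark that the DFT is a single dense layer: acting on the concatenated real and imaginary parts, the transformation is one 4000 × 4000 real matrix (16,000,000 weights), which could serve as the "DFT starting point" for the first Dense layer instead of a random initialisation:

```python
# The DFT as one linear layer: build its real 4000x4000 weight matrix and
# verify it against numpy's FFT.
import numpy as np

n = 2000
F = np.fft.fft(np.eye(n))            # complex DFT matrix, p = F @ x
# Real representation acting on concatenated (x_R, x_I):
W = np.block([[F.real, -F.imag],
              [F.imag,  F.real]])    # shape (4000, 4000): 16,000,000 weights

x = np.random.default_rng(2).standard_normal(n)
xc = np.concatenate([x, np.zeros(n)])   # real input, zero imaginary part
p = W @ xc
assert np.allclose(p[:n] + 1j * p[n:], np.fft.fft(x))
```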

  27. Can we improve the situation by favouring dual representations?

  28. How to utilise dualities?
 • Learn dual transformations when explicitly known (trivial) and use them as an intermediate step in the architecture.
 • Enforce dual representations via feature separation.
 • When not explicitly known, we can match features of distributions (example: 2D Ising high-low-temperature duality).
 • Re-obtain “dualities” (1D Ising) by demanding good performance on medium-hard correlations in an intermediate layer where no information is lost (beyond known dualities).
