  1. Branes with Brains - Reinforcement learning in the landscape of intersecting brane worlds. Fabian Ruehle (University of Oxford). String_Data 2017, Boston, 11/30/2017. Based on [work in progress] with Brent Nelson and Jim Halverson

  2. Motivation - ML
  ‣ Three approaches to machine learning:
    • Supervised Learning: train the machine by telling it what to do
    • Unsupervised Learning: let the machine train without telling it what to do
    • Reinforcement Learning [Sutton, Barto '98 '17]: based on behavioral psychology; do not tell the machine exactly what to do, but reward "good" and/or punish "bad" actions
  ‣ AI = reinforcement learning + deep learning (neural networks) [Silver '16]

  3. Motivation - RL
  ‣ Agents interact with an environment (e.g. the string landscape)
  ‣ Each interaction changes the state of the agent, e.g. the degrees of freedom parameterizing the string vacuum
  ‣ Each step is either rewarded (the action led to a more realistic vacuum) or punished (the action led to a less realistic vacuum)
  ‣ The agent acts with the aim of maximizing its long-term reward
  ‣ The agent repeats actions until it is told to stop (a realistic vacuum is found, or it gives up)
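This interaction loop can be made concrete in a few lines of Python. The following is a minimal sketch, not the talk's actual code; `env`, `agent`, and their methods (`reset`, `step`, `act`, `observe`) are hypothetical placeholders for the landscape environment and the RL agent discussed later.

```python
# Schematic agent-environment loop; all objects here are placeholders.
def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                  # pick an action based on the current state
        state, reward, done, _ = env.step(action)  # the environment rewards or punishes the step
        agent.observe(reward, state)               # the agent updates its behaviour
        total_reward += reward
        if done:                                   # realistic vacuum found, or the agent gives up
            break
    return total_reward
```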

  4. Outline
  ‣ String theory setup:
    • Intersecting D6-branes on orbifolds of toroidal orientifolds
  ‣ Implementation in Reinforcement Learning (RL):
    • Basic overview
    • Implementing the RL code
    • Modelling the environment
  ‣ Preliminary results:
    • Finding consistent solutions
  ‣ Conclusion

  5. String Theory 101 - Intersecting D6-branes on orbifolds of toroidal orientifolds

  6. String Theory 101
  ‣ Have: IIA string theory in 9D + time with 32 supercharges
  ‣ Want: a theory in 3D + time with 4 supercharges
  ‣ Idea: make the extra 6 dimensions so small that we do not see them
  ‣ How do we do that?
    1. Make them compact
    2. Make their diameter so small that our experiments cannot detect them
  ‣ Reduce supercharges from 32 to 4:
    • Identify some points with their mirror image

  7. String Theory 101 - Setup
  ‣ Why this setup?
    • Well studied [Blumenhagen, Gmeiner, Honecker, Lust, Weigand '04 '05; Douglas, Taylor '07, ...]
    • Comparatively simple [Ibanez, Uranga '12]
    • Number of (well-defined) solutions known to be finite [Douglas, Taylor '07]:
      ✦ Use symmetries to relate different vacua
      ✦ Combine consistency conditions to rule out combinations
    • BUT: the number of possibilities is so large that not a single "interesting" solution could be found despite enormous random scans (estimated at 1 : 10^9)
    • Seems Taylor-made for big data / AI methods

  8.-11. String Theory 101 - Compactification
  ‣ How to make a dimension compact? Pac-Man: walk off one edge and reappear on the opposite edge, i.e. identify opposite edges, so the dimension becomes a circle ⇒

  12. String Theory 101 - Compactification
  ‣ Now six compact dimensions with coordinates (x_1, y_1), (x_2, y_2), (x_3, y_3), but the idea is too simple
  ‣ The resulting space is too simple (but just a little bit)
  ‣ Make it a bit more complicated

  13. String Theory 101 - Orbifolds
  ‣ (figure: the torus T^2 and its quotient T^2/Z_2 in the (x_1, y_1) plane)
  ‣ Mathematically: (x_1, y_1) → (−x_1, −y_1)
  ‣ The resulting object is called an orbifold
  ‣ Need to also orientifold: (x_1, y_1) → (x_1, −y_1) (plus something similar for the string itself)

  14. String Theory 101 - Winding numbers
  ‣ Winding numbers (n, m): how often a brane wraps around the two directions of a torus
    • Examples: (n, m) = (1, 0), (0, 1), (1, 2)
  ‣ Note: due to the orientifold, also include the image (n, −m)

  15. String Theory 101 - D6 branes
  ‣ (figure: our 3D space times three tori T^2 with coordinates (x_i, y_i))
  ‣ D6-brane: our 3D + a line on each torus
  ‣ Can stack multiple D6-branes on top of each other
  ‣ Brane stack ⇔ tuple (N, n_1, m_1, n_2, m_2, n_3, m_3)

  16. String Theory 101 - Gauge group and particles
  ‣ Observed gauge group: SU(3) × SU(2) × U(1)_Y
  ‣ N D6-branes on top of each other: U(N). Special cases:
    • N D6-branes parallel to the O6-plane: SO(2N)
    • N D6-branes orthogonal to the O6-plane: Sp(N)
  ‣ Intersection of an N-brane stack and an M-brane stack: particles in the representation (N, M)_{(1, −1)}
  ‣ Observed particles in the universe:
    • Quarks: 3 × (3, 2)_1 + 3 × (3, 1)_{−4} + 3 × (3, 1)_2
    • Leptons + Higgs: 4 × (1, 2)_{−3} + 1 × (1, 2)_3 + 3 × (1, 1)_6

  17. String Theory 101 - MSSM
  ‣ (figure: two brane stacks, green and yellow, drawn on the three tori)
  ‣ Green and yellow intersect in 3 · 1 · 1 = 3 points: the total number of intersections is the product of the intersection numbers on the three tori
  ‣ Note: counting intersections on the orbifold is a bit more subtle
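To make the counting concrete, here is a small sketch of the naive intersection number of two stacks, using the standard covering-torus formula I_ab = ∏_i (n_a^i m_b^i − m_a^i n_b^i); as noted above, the count on the orbifold itself is more subtle, so this is only indicative.

```python
# Sketch: naive intersection number of two brane stacks on T^2 x T^2 x T^2,
# with stacks stored as tuples (N, n1, m1, n2, m2, n3, m3) as on slide 15.
def intersection_number(a, b):
    _, n1a, m1a, n2a, m2a, n3a, m3a = a
    _, n1b, m1b, n2b, m2b, n3b, m3b = b
    return ((n1a * m1b - m1a * n1b)
            * (n2a * m2b - m2a * n2b)
            * (n3a * m3b - m3a * n3b))

# Example: stacks intersecting 3 times on the first torus and once on the others.
print(intersection_number((3, 1, 0, 1, 0, 1, 0), (2, 1, 3, 1, 1, 0, 1)))  # -> 3
```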

  18. String Theory 101 - Consistency
  ‣ Tadpole cancellation (balance the energy of the D6-branes against the O6-planes), summing over the stacks a = 1, ..., #stacks:
      ∑_a  N_a n_a^1 n_a^2 n_a^3 = 8
      ∑_a −N_a n_a^1 m_a^2 m_a^3 = 4
      ∑_a −N_a m_a^1 n_a^2 m_a^3 = 4
      ∑_a −N_a m_a^1 m_a^2 n_a^3 = 8
  ‣ K-theory (global consistency):
      ∑_a  2 N_a m_a^1 m_a^2 m_a^3 ≡ 0 mod 2
      ∑_a   −N_a m_a^1 n_a^2 n_a^3 ≡ 0 mod 2
      ∑_a   −N_a n_a^1 m_a^2 n_a^3 ≡ 0 mod 2
      ∑_a −2 N_a n_a^1 n_a^2 m_a^3 ≡ 0 mod 2
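As an illustration, the tadpole condition is a simple linear check once the stacks are stored as the tuples from slide 15. This is a minimal sketch, not the talk's implementation; the signs and the right-hand side (8, 4, 4, 8) are taken from the condition as displayed above.

```python
# Sketch of the tadpole check for a list of stacks (N, n1, m1, n2, m2, n3, m3).
def tadpoles_cancelled(stacks, rhs=(8, 4, 4, 8)):
    totals = [0, 0, 0, 0]
    for (N, n1, m1, n2, m2, n3, m3) in stacks:
        totals[0] += N * n1 * n2 * n3
        totals[1] -= N * n1 * m2 * m3
        totals[2] -= N * m1 * n2 * m3
        totals[3] -= N * m1 * m2 * n3
    return tuple(totals) == tuple(rhs)
```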

  19. String Theory 101 - Consistency
  ‣ SUSY (computational control), for all a = 1, ..., #stacks:
      m_a^1 m_a^2 m_a^3 − j m_a^1 n_a^2 n_a^3 − k n_a^1 m_a^2 n_a^3 − ℓ n_a^1 n_a^2 m_a^3 = 0
      n_a^1 n_a^2 n_a^3 − j n_a^1 m_a^2 m_a^3 − k m_a^1 n_a^2 m_a^3 − ℓ m_a^1 m_a^2 n_a^3 > 0
    with parameters j, k, ℓ common to all stacks
  ‣ Pheno: SU(3) × SU(2) × U(1) + particles
  ‣ The U(1) is massless iff there is a T = (T_1, T_2, ..., T_k), k = #U(N) stacks, with
      ∑_{a=1}^k 2 N_a m_a^i T_a = 0   for i = 1, 2, 3
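The SUSY conditions are easy to test once candidate values of the common parameters j, k, ℓ are given; the sketch below assumes such values are supplied (in practice one has to search for admissible j, k, ℓ). It is an illustration, not the talk's code.

```python
# Sketch: check the two SUSY conditions above for all stacks, for given
# common parameters j, k, l (SUSY requires that such values exist).
def susy_conditions_hold(stacks, j, k, l, tol=1e-9):
    for (N, n1, m1, n2, m2, n3, m3) in stacks:
        eq = m1*m2*m3 - j*m1*n2*n3 - k*n1*m2*n3 - l*n1*n2*m3
        ineq = n1*n2*n3 - j*n1*m2*m3 - k*m1*n2*m3 - l*m1*m2*n3
        if abs(eq) > tol or ineq <= 0:
            return False
    return True
```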

  20. String Theory 101 - IIA state space
  ‣ State space gigantic:
    • Choose a maximal value w_max for the winding numbers
    • Let N_B be the number of possible winding-number combinations (up to w_max) after symmetry reduction
    • Let N_S be the maximal number of stacks
    • This allows for (N_B choose N_S) combinations
    • Note: each stack can have N = 1, 2, 3, ... branes
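To get a feeling for the size, here is a toy count; the values of N_B and N_S below are made up purely for illustration and are not the numbers of the actual setup.

```python
import math

# Purely illustrative numbers: N_B winding-number combinations surviving the
# symmetry reduction, at most N_S stacks. The true N_B depends on w_max.
N_B, N_S = 100_000, 8
print(math.comb(N_B, N_S))   # on the order of 10^35 choices of winding numbers
# On top of this, each stack can still carry N = 1, 2, 3, ... branes.
```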

  21. Reinforcement learning

  22. Reinforcement learning - Overview
  ‣ At time t, the agent is in state s_t ∈ S_total
  ‣ Select action a_t from action space A based on policy π, π : S_total → A
  ‣ Receive reward r_t ∈ ℝ for action a_t based on reward function R, R : S_total × A → ℝ
  ‣ Transition to the next state s_{t+1}
  ‣ Try to maximize the long-term return G_t = ∑_{k=1}^∞ γ^k r_{t+k}, γ ∈ (0, 1]
  ‣ Keep track of the state value v(s) ("how good is the state")
  ‣ Compute the advantage estimate Adv = r − v ("how much better than expected has the action turned out to be")
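A tiny numerical sketch of the return and advantage defined above, following the slide's convention in which the first reward r_{t+1} is already discounted by γ; the example numbers are arbitrary.

```python
# Sketch: G_t = sum_{k>=1} gamma^k * r_{t+k}, with rewards[k-1] playing the
# role of r_{t+k}, and the advantage estimate Adv = r - v.
def discounted_return(rewards, gamma=0.99):
    return sum(gamma**k * r for k, r in enumerate(rewards, start=1))

def advantage(reward, state_value):
    return reward - state_value   # how much better than expected the action turned out

print(discounted_return([1.0, 0.0, 5.0]))       # 0.99*1 + 0.99**3*5 ≈ 5.84
print(advantage(reward=1.0, state_value=0.3))   # 0.7
```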

  23. Reinforcement Learning - Overview
  ‣ How to maximize the future return? Depends on the policy π
  ‣ Several approaches:
    • Tabular (small state/action spaces): [Sutton, Barto '98]
      ✦ Temporal difference learning ⇒ my breakout group on Friday
      ✦ SARSA
      ✦ Q-learning (a minimal sketch follows after this list)
    • Deep RL (large/infinite state/action spaces):
      ✦ Deep Q-Network (DQN) [Mnih et al '15]
      ✦ Asynchronous advantage actor-critic (A3C) [Mnih et al '16]
      ✦ Variations/extensions: Wolpertinger [Dulac-Arnold et al '16], Rainbow [Hessel et al '17]
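For concreteness, here is a minimal tabular Q-learning sketch (one of the tabular methods listed above, not the deep A3C agent used later). It assumes an environment with the step/reset interface described on slide 26, a discrete action space, and hashable states.

```python
import random
from collections import defaultdict

# Sketch of tabular Q-learning: epsilon-greedy action choice plus the
# one-step update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
def q_learning(env, episodes=1000, alpha=0.1, gamma=0.99, eps=0.1):
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    actions = list(range(env.action_space.n))    # assumes a discrete action space
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:
                action = random.choice(actions)                       # explore
            else:
                action = max(actions, key=lambda a: Q[(state, a)])    # exploit
            next_state, reward, done, _ = env.step(action)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```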

  24. Reinforcement Learning - A3C
  ‣ (architecture diagram) A global instance holds the policy/value network; workers 1, 2, ..., n each have their own copy of the policy/value network, their own input, and their own environment

  25. Reinforcement Learning - A3C
  ‣ Asynchronous: have n workers explore the environment simultaneously and asynchronously
    • improves training stability (the workers' experiences are separated)
    • improves exploration
  ‣ Advantage: use the advantage estimate to update the policy
  ‣ Actor-critic: to maximize the return, one needs to know state or action values and to optimize the policy
    • Methods like Q-learning focus on the value function
    • Methods like policy gradient focus on the policy
    • AC: use the value estimate ("critic") to update the policy ("actor"); a schematic update step is sketched below
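Schematically, the actor and the critic contribute two loss terms per step. The following is a single-step, single-worker sketch on plain numbers (no neural-network framework), not the full asynchronous A3C; `log_prob_action` and the value estimates are placeholders for network outputs.

```python
# Sketch: one actor-critic update step on plain numbers. The one-step TD error
# plays the role of the advantage estimate; in a real implementation both
# losses are backpropagated through the policy and value networks.
def actor_critic_losses(log_prob_action, reward, value_s, value_s_next, gamma=0.99):
    advantage = reward + gamma * value_s_next - value_s   # critic's error ~ advantage
    actor_loss = -log_prob_action * advantage             # raise probability of good actions
    critic_loss = advantage ** 2                          # fit the state-value estimate
    return actor_loss, critic_loss
```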

  26. Reinforcement Learning - Implementation
  ‣ OpenAI Gym: interface between the agent (RL) and the environment (string landscape) [Brockman et al '16]
    • We provide the environment
    • We use ChainerRL's implementation of A3C for the agent
  ‣ Environment:
    ✦ step method
    ✦ action space
    ✦ observation (state) space
  ‣ ChainerRL:
    ✦ make (A3C, DQN, ...)
    ✦ NN architecture (FF, LSTM, ...)
  ‣ step:
    • go to the new state
    • return (new_state, reward, done, comment)
  ‣ reset:
    • reset the episode
    • return start_state
  ‣ Workflow: make the environment, specify the RL method (A3C), specify the policy NN (FF, LSTM); a minimal environment skeleton is sketched below
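A minimal sketch of what such a custom Gym environment looks like, using the classic step/reset interface described above. The spaces, the toy reward, and the stopping rule here are placeholders, not the actual landscape environment from the talk.

```python
import gym
import numpy as np
from gym import spaces

# Toy Gym environment exposing the step/reset interface described above.
# Everything physics-related is a placeholder.
class ToyLandscapeEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(7)                      # e.g. which entry of a stack tuple to change
        self.observation_space = spaces.Box(-5.0, 5.0, shape=(7,))  # e.g. one stack (N, n1, m1, ..., m3)
        self.state = None

    def reset(self):
        self.state = np.zeros(7, dtype=np.float32)
        return self.state                                           # start_state

    def step(self, action):
        self.state = self.state.copy()
        self.state[action] += 1.0                                   # toy "move" through the state space
        reward = -float(np.abs(self.state).sum())                   # toy reward (would be a consistency/pheno score)
        done = bool(np.abs(self.state).max() >= 5.0)                # toy stopping condition
        return self.state, reward, done, {}                         # (new_state, reward, done, comment)
```

An agent (e.g. ChainerRL's A3C, or the tabular sketch after slide 23) is then trained against the environment through exactly these step and reset calls.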
