Machine learning, incomputably large data sets, and the string landscape 2017 Workshop on Data Science and String Theory Northeastern University December 1, 2017 Washington (Wati) Taylor, MIT Based in part on arXiv: 1510.04978, 1511.03209,


slide-1
SLIDE 1

Machine learning, incomputably large data sets, and the string landscape

2017 Workshop on Data Science and String Theory Northeastern University December 1, 2017 Washington (Wati) Taylor, MIT

Based in part on arXiv: 1510.04978, 1511.03209, 1710.11235, written in collaboration with Y. Wang

  • W. Taylor

Machine learning and the string landscape 1 / 15

slide-2
SLIDE 2

Outline

  • 1. Comments on problems for machine learning and in the landscape
  • 2. The “skeleton” of the F-theory landscape: a very large graph

slide-3
SLIDE 3

Some comments on problems for machine learning and in the landscape

Some personal reflections

∼ 30 years ago I worked for a company named “Thinking Machines”.
Goal: “Build a machine that would be proud of us” (D. Hillis)
Company built a machine with 64k parallel processes; 1997 → ∼ 100 Gflops
Richard Feynman designed the communication network/routing system


slide-4
SLIDE 4

As a side project, I worked on “evolving” neural-network-like systems; ultimate goal: play go, learning from scratch.
Difficult problem, not enough computer power; I went to grad school to learn string theory. Stopped following AI in any detail.
AlphaGo Zero astonished me!

  • 1. Computers are much faster, clearly

But also, I suspect:

  • 2. Humans are even less effective at playing go than I thought.

A bit like physics research: we have some vague idea of our long term strategy, and some technical and analytic tools that we apply in a fairly limited human fashion . . .


slide-7
SLIDE 7

What kinds of problems is machine learning currently best at?

Classification problems: image/face recognition, speech recognition, . . .
Generally large data field, categorization into finite classes

Optimization problems: “pretty good” solutions in high-dimensional spaces with reasonably smooth local structure


slide-8
SLIDE 8

Problems relevant for the string landscape

Types of problems with issues that go beyond current machine learning

  • 1. We don’t know what we’re doing, don’t have a framework
      • Fundamental background-independent formulation of ST/QG
      • Nonperturbative definition of F-theory

  • 2. We lack mathematical frameworks, even for semi-understood problems
      • Classify non-geometric flux vacua
      • Describe G2 manifolds with nonabelian symmetries

  • 3. Many things we don’t know how to compute
      • Classify Calabi-Yau threefolds (can’t even prove finite number)
      • Compute superpotential and low-energy EFT for F-theory (nonpert.)

  • 4. We don’t know what physics we are looking for
      • SUSY/SUSY breaking?
      • GUT SU(5)? SO(10)? E6, E8? non-Higgsable SU(3) × SU(2) × U(1)?

I believe machine learning will not solve any of these problems anytime soon.


slide-13
SLIDE 13

Nonetheless, many possible applications of large scale computation/ML

Notwithstanding the preceding issues, we are getting a good enough handle on the landscape that we are beginning to have large datasets. Some (see below) are too large to enumerate (e.g. more elements than particles in the observable universe)

  • Many difficult computational problems
  • Need methods for dealing with large, poorly understood datasets (ML?)
      • Identify patterns, suggest hypotheses for theoretical advances
      • Look for elements combining desired features (e.g. SM gauge group, matter content, Yukawas, etc.; cf. Ruehle talk)
      • Statistics and structure of BIG datasets


slide-15
SLIDE 15

Some types of computationally challenging problems in the landscape: (examples of current projects)

Large computations: 473,800,776 reflexive 4D polytopes ⇒ Calabi-Yau 3-folds (KS database)
e.g. Analyze elliptic fibration structure (w/ Y. Huang)

Complex algebra: Many computations require solving large systems of algebraic equations; easy to go beyond capacity of existing computational AG (e.g. Groebner basis)
e.g. Analyze Weierstrass tunings (w/ N. Raghuram)

Diophantine equations: Solving equations over integers analytically, computationally difficult
e.g. Solve abelian 6D anomaly equations (w/ A. Turner)

$-6\, a \cdot b = \sum_q n_q q^2; \qquad 3\, b \cdot b = \sum_q n_q q^4$
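To make the Diophantine character of these two anomaly equations concrete, here is a minimal brute-force sketch. It assumes that the n_q are non-negative integer multiplicities of fields with charge q, that only charges 1 through a chosen `max_charge` appear, and that the integers a · b and b · b are given as inputs. The function name `anomaly_solutions` and the sample values (a = −3, b = 8 on a P² base, so a · b = −24 and b · b = 64) are illustrative assumptions, not taken from the talk, and none of the other anomaly conditions are imposed.

```python
from itertools import product

def anomaly_solutions(a_dot_b, b_dot_b, max_charge):
    """Yield (n_1, ..., n_Q) with n_q >= 0 solving
    -6 a.b = sum_q n_q q^2  and  3 b.b = sum_q n_q q^4."""
    target2 = -6 * a_dot_b
    target4 = 3 * b_dot_b
    charges = range(1, max_charge + 1)
    # Every term in both sums is non-negative, so each n_q is bounded.
    ranges = [range(min(target2 // q**2, target4 // q**4) + 1) for q in charges]
    for ns in product(*ranges):
        sum2 = sum(n * q**2 for n, q in zip(ns, charges))
        sum4 = sum(n * q**4 for n, q in zip(ns, charges))
        if sum2 == target2 and sum4 == target4:
            yield ns

# Hypothetical input: a = -3, b = 8 (so a.b = -24, b.b = 64), charges 1 and 2.
print(list(anomaly_solutions(-24, 64, 2)))  # [(128, 4)]
```

Exhaustive search of this kind scales badly as the allowed charges and the size of b grow, which is one way to see why such equations are described as computationally difficult.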


slide-19
SLIDE 19

An example of a BIG dataset: The skeleton of the F-theory landscape

This is a well defined graph, containing nodes and edges

Contains $\gtrsim 10^{3000}$ nodes

Nodes = smooth toric threefold bases that support an elliptic CY4
Edges = transitions from blowing up toric points or curves

Describes a class of geometries for 4D $\mathcal{N} = 1$ F-theory vacua hypothesized to describe core/outline of F-theory landscape

WT/Wang: Monte Carlo on set connected to $\mathbb{P}^3$: $\sim 10^{50}$ geometries (w/o codim. 2 (4, 6) curves)
Halverson/Long/Sung: Systematic blow-ups of weak Fano 3-folds: $\sim 10^{750}$ (w/c2-46)
WT/Wang: One-way MC on set w/c2-46: $\gtrsim 10^{3000}$ geometries
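A set this large cannot be enumerated, so its size has to be estimated by sampling. The sketch below illustrates one standard idea of this kind, a Knuth-style one-way random descent that estimates the number of leaves of a tree. It is only a toy and not the procedure used in the cited work: `toy_children` is a hypothetical stand-in for "the blow-ups allowed from this node", and in the real graph one geometry can be reached by many blow-up sequences, which a real analysis has to correct for.

```python
import random

def estimate_leaves(root, children, samples, rng):
    """Average the weights of one-way random descents from root.

    Each descent multiplies the branching factors along its path, which is
    the inverse probability of that path, so the mean weight is an unbiased
    estimate of the number of leaves."""
    total = 0
    for _ in range(samples):
        node, weight = root, 1
        while True:
            kids = children(node)
            if not kids:
                break
            weight *= len(kids)
            node = rng.choice(kids)
        total += weight
    return total / samples

def count_leaves(node, children):
    """Exact leaf count by full enumeration (only feasible for small trees)."""
    kids = children(node)
    return 1 if not kids else sum(count_leaves(k, children) for k in kids)

def toy_children(node):
    """Hypothetical toy tree: 2 or 3 children per node, depth at most 6."""
    if len(node) >= 6:
        return []
    return [node + (c,) for c in range(2 + sum(node) % 2)]

rng = random.Random(0)
print("exact leaves:   ", count_leaves((), toy_children))
print("sampled estimate:", estimate_leaves((), toy_children, 2000, rng))
```

The estimate needs only a few thousand short walks, while exact counting visits every node; that gap is what makes sampling usable when enumeration is out of reach.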


slide-22
SLIDE 22

Definition of F-theory skeleton graph: nodes

Toric threefold defined by rays and cone structure in $N = \mathbb{Z}^3$

Rays $v_i \in \mathbb{Z}^3$, Edges $e_{ij} = (v_i, v_j)$, Cones $\sigma_{ijk} = (v_i, v_j, v_k)$

Edges and cones define a triangulation of projection on sphere $S^2$

Defines a complex threefold, $(\mathbb{C}^*)^3$ + toric divisors $D_i$ (from $v_i$), curves (from $e_{ij}$), points (from $\sigma_{ijk}$)

Smooth if each $\sigma_{ijk}$ has unit volume

Example: complex projective space $\mathbb{CP}^3$
$\{v_i\} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)\}$; $D_i$ are vanishing loci of homogeneous coordinates $[x, y, z, w]$
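The node data above is small enough to write down directly. The following sketch stores the CP^3 fan as a list of rays plus index triples for the cones, and tests the unit-volume condition as |det(v_i, v_j, v_k)| = 1. The helper names `det3` and `is_smooth` are illustrative choices, not notation from the talk.

```python
from itertools import combinations

def det3(u, v, w):
    """Determinant of the 3x3 integer matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def is_smooth(rays, cones):
    """True if every 3D cone has unit volume, i.e. |det| = 1."""
    return all(abs(det3(*(rays[i] for i in cone))) == 1 for cone in cones)

# Fan of CP^3: four rays, and every triple of rays spans a cone.
rays = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)]
cones = list(combinations(range(4), 3))

# Edges e_ij are the pairs of rays that share a cone.
edges = sorted({pair for cone in cones for pair in combinations(cone, 2)})

print(is_smooth(rays, cones))              # True
print(len(rays), len(edges), len(cones))   # 4 6 4
```

The counts 4 − 6 + 4 = 2 match the Euler characteristic of a triangulated sphere, consistent with the statement that edges and cones triangulate S^2.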


slide-25
SLIDE 25

Definition of F-theory skeleton graph: edges

Edges connect the nodes (toric threefolds) by blowing up points, curves.

Blowing up a point $\sigma_{ijk}$: add new vertex $v = v_i + v_j + v_k$
Blowing up a curve $e_{ij}$: add new vertex $v = v_i + v_j$

So far, defines an infinite connected graph, starting with e.g. $\mathbb{P}^3$
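The two blow-up moves can be sketched as operations on a list of rays and a list of cones (index triples). The subdivision rule used here is the standard star subdivision: a point blow-up replaces one cone by three, and a curve blow-up splits every cone containing the edge in two. The function names are illustrative assumptions, and nothing here checks whether the new base still supports an elliptic Calabi-Yau fourfold.

```python
from itertools import combinations

def det3(u, v, w):
    """Determinant of the 3x3 integer matrix with rows u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def is_smooth(rays, cones):
    """True if every 3D cone has unit volume, i.e. |det| = 1."""
    return all(abs(det3(*(rays[i] for i in c))) == 1 for c in cones)

def blow_up_point(rays, cones, cone):
    """Blow up the toric point of cone (i, j, k): add v = v_i + v_j + v_k."""
    i, j, k = cone
    new = len(rays)
    v = tuple(a + b + c for a, b, c in zip(rays[i], rays[j], rays[k]))
    # The chosen cone is replaced by three cones around the new ray.
    kept = [c for c in cones if set(c) != set(cone)]
    return rays + [v], kept + [(new, j, k), (i, new, k), (i, j, new)]

def blow_up_curve(rays, cones, edge):
    """Blow up the toric curve of edge (i, j): add v = v_i + v_j."""
    i, j = edge
    new = len(rays)
    v = tuple(a + b for a, b in zip(rays[i], rays[j]))
    out = []
    for c in cones:
        if i in c and j in c:
            # Each cone containing the edge is split in two.
            out.append(tuple(new if x == i else x for x in c))
            out.append(tuple(new if x == j else x for x in c))
        else:
            out.append(c)
    return rays + [v], out

# Start from P^3 and apply each move once.
rays = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)]
cones = list(combinations(range(4), 3))

p_rays, p_cones = blow_up_point(rays, cones, (0, 1, 2))
c_rays, c_cones = blow_up_curve(rays, cones, (0, 1))
print(p_rays[-1], len(p_cones), is_smooth(p_rays, p_cones))  # (1, 1, 1) 6 True
print(c_rays[-1], len(c_cones), is_smooth(c_rays, c_cones))  # (1, 1, 0) 6 True
```

Both moves preserve smoothness, because replacing v_i by v_i + v_j (or v_i + v_j + v_k) in a cone leaves its determinant unchanged. Repeating them from P^3 generates the infinite graph described above.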


slide-29
SLIDE 29

Gauge groups and upper bounds on threefold geometries

Elliptic Calabi-Yau fourfold over a base B defined by Weierstrass model

$y^2 = x^3 + f x + g$

f, g sections of line bundles $\mathcal{O}(-4K)$, $\mathcal{O}(-6K)$

On toric threefold base, f, g combinations of monomials in $M_4$, $M_6$

$M_k = \{ m \in N^* : \langle m, v_i \rangle \geq -k, \ \forall i \}$

Orders of vanishing of f, g on divisor $D_i$:

$\mathrm{ord}_{D_i} f = \min_{m \in M_4} (4 + \langle m, v_i \rangle), \qquad \mathrm{ord}_{D_i} g = \min_{m \in M_6} (6 + \langle m, v_i \rangle)$

ord f, g, $\Delta = 4f^3 + 27g^2$ ⇒ generic (non-Higgsable) gauge groups.
ord (f, g) > (4, 6) on codimension one (divisor) ⇒ no CY resolution

So analyzing $M_4$, $M_6$ gives gauge group on generic elliptic fibration, limiting bound on set of nodes

Number of nodes is finite, since number of elliptic CY4’s finite (Di Cerbo, Svaldi ’17)
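As a worked instance of the definitions of M_k and the orders of vanishing, the sketch below enumerates the monomial sets for the P^3 fan by scanning a finite box of lattice points. The box size 3k is an assumption that happens to be sufficient for P^3 (there m_i ≥ −k and m_1 + m_2 + m_3 ≤ k force m_i ≤ 3k); other bases would need their own bound. The function names are illustrative.

```python
from itertools import product

def dot(m, v):
    """Pairing <m, v> between a point of the dual lattice and a ray."""
    return sum(a * b for a, b in zip(m, v))

def monomials(rays, k, box):
    """Lattice points m in [-box, box]^3 with <m, v_i> >= -k for every ray."""
    return [m for m in product(range(-box, box + 1), repeat=3)
            if all(dot(m, v) >= -k for v in rays)]

def vanishing_orders(rays, k, box):
    """Order of vanishing on each D_i: the minimum over M_k of k + <m, v_i>."""
    M = monomials(rays, k, box)
    return [min(k + dot(m, v) for m in M) for v in rays]

rays = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, -1, -1)]  # fan of P^3

M4 = monomials(rays, 4, box=12)
M6 = monomials(rays, 6, box=18)
print(len(M4), len(M6))                     # 969 2925
print(vanishing_orders(rays, 4, box=12))    # [0, 0, 0, 0]
print(vanishing_orders(rays, 6, box=18))    # [0, 0, 0, 0]
```

The counts agree with the number of degree-16 and degree-24 monomials in four homogeneous coordinates, as expected for sections of O(−4K) and O(−6K) on P^3. Since f and g vanish to order zero on every divisor, the generic fibration over P^3 has no non-Higgsable gauge group; nonzero orders only appear after enough blow-ups.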


slide-31
SLIDE 31

The skeleton graph and the landscape: 6D analogue

Similar story in 6D, but bases are toric surfaces.
61,539 smooth toric base surfaces (w/o c2-46’s) [Morrison/WT]

[Figure: Hodge number plot with axes $h^{1,1}$ and $h^{2,1}$, each with ticks from 100 to 500]
(blue = generic eCY3 over toric base, gray = full KS database)

Fill range of (known) Calabi-Yaus. For all elliptic CY’s need:

  • non-toric (WT/Wang ’15, $h^{2,1} > 150$)
  • all tunings of Weierstrass model over each base (Johnson/WT, large $h^{2,1}$)

6D story fairly well under control


slide-32
SLIDE 32

The skeleton graph and the landscape: 4D

Expect that the graph of smooth toric threefold bases captures rough global structure of the landscape.

For a complete understanding, need:

  • Non-toric bases
  • Tunings of Weierstrass
  • Superpotential, fluxes, brane DOF, etc.
  • Clarify physics of codimension two (4, 6) curves, other singularities

First step: understand scope and features of 4D skeleton graph

  • Enumerate nodes with rare features? (e.g. no c2-46, minimal, . . . )
  • Find nodes with specific combinations of features?

Exploring this graph: see Long, Wang talks

Physics: Standard model? Cosmology? Need more detailed analysis of fluxes, superpotential etc.


slide-35
SLIDE 35

Conclusions

  • 1. While there are many hard physics problems in the landscape that go beyond current machine learning methods, we have reached the point where significant large datasets may be amenable to big data/ML methods, and may give physically significant results.

  • 2. The “skeleton graph” of toric threefold bases for elliptic Calabi-Yau fourfolds forms a prototype of a model of the 4D $\mathcal{N} = 1$ landscape that offers many interesting questions for investigation.
