SLIDE 1
The IMO Grand Challenge

Daniel Selsam

Microsoft Research

September 16th, 2020

SLIDE 2

Outline

1. The Great Myth
2. The Grand Challenge
3. High-Level Strategy
4. Preliminary Roadmap (The Search Transformer; The Universal Oracle)
5. Beyond the IMO


SLIDE 4

The Great Myth

Third paragraph of a standard deep learning textbook:

"In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers—problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally—problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images."

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.


SLIDE 9

The Sad Truth

Nowhere near human-level even on formally specified problems. Computers are only superhuman in certain niches:

• problems with simple algorithms (e.g. differentiation)
• problems with limited structure (e.g. SAT)
• problems in certain first-order theories (e.g. EUF, LRA)
• (others)

Such feats have masked the lack of progress on the general problem. It can still be hard to produce machine-checkable proofs at all:

• even for relatively obvious steps
• after building libraries of abstractions/tactics
• with real-time interaction and feedback
• after decades of tool-building in both mathematics and software verification


SLIDE 14

Working Forwards

The norm in AR is to work forwards from existing methods:

• better heuristics for existing search spaces
• more efficient data structures for existing algorithms
• procedures for new theories that can slot into SMT solvers

Obvious pros:

• can often push technologies much further than expected
• again and again, cross thresholds that unlock important applications

But in the long run it is too easy to ignore the ultimate brick walls: existing paradigms will never get us to human-level reasoning.


SLIDE 18

Working Backwards

Working backwards from a goal is not a panacea:

• it may be wholly unclear how to make any progress at all
• or: it may be impossible to even measure progress
• or worse: it may just require a lot of one-off engineering

But the right goal at the right time can be powerful:

• it can trigger the right questions
• suggest promising new approaches
• bring together siloed subfields
• at best: lead to revolutionary advances!

Claim: the IMO is the right goal at the right time.



SLIDE 25

The International Mathematical Olympiad (IMO)

The most celebrated intellectual competition in the world.

Logistics:

• every year, >100 countries train, filter, and send 6 high-school students
• two-day test, with 4.5 hours / 3 problems each day
• medals are percentile-based: the top ≈8% win Gold

All material is elementary:

• only high-school-level mathematics is required
• algebra, number theory, combinatorics, geometry
• solutions tend to be short and sweet
• but: they are designed to require tremendous ingenuity

Extremely elite.

SLIDE 26

Example Problems (Algebra)

Problem (IMO 2005 #3). Let x, y, z be three positive reals such that xyz ≥ 1. Prove that

  (x^5 − x^2)/(x^5 + y^2 + z^2) + (y^5 − y^2)/(x^2 + y^5 + z^2) + (z^5 − z^2)/(x^2 + y^2 + z^5) ≥ 0.
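A quick numeric sanity check of the statement (my own Python sketch, not part of the talk and of course not a proof): sample random triples satisfying the hypothesis xyz ≥ 1 and confirm the left-hand side is nonnegative.

```python
import random

def lhs(x, y, z):
    # Left-hand side of IMO 2005 #3.
    return ((x**5 - x**2) / (x**5 + y**2 + z**2)
            + (y**5 - y**2) / (x**2 + y**5 + z**2)
            + (z**5 - z**2) / (x**2 + y**2 + z**5))

random.seed(0)
for _ in range(10_000):
    x, y, z = (random.uniform(0.1, 3.0) for _ in range(3))
    if x * y * z >= 1:                # only test points satisfying the hypothesis
        assert lhs(x, y, z) >= -1e-9  # small tolerance for float error

print(lhs(1, 1, 1))  # the equality case x = y = z = 1 gives 0
```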

SLIDE 27

Example Problems (Number Theory)

Problem (IMO 2003 #6). Show that for each prime p, there exists a prime q such that n^p − p is not divisible by q for any positive integer n.
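A brute-force illustration (my own sketch; `witness` is a hypothetical helper, not from the talk): since divisibility of n^p − p by q depends only on n mod q, a witness q can be found for small p by checking all residues.

```python
def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, int(m**0.5) + 1))

def witness(p, limit=1000):
    """Smallest prime q such that q never divides n^p - p.

    Divisibility depends only on n mod q, so checking residues 0..q-1 suffices."""
    for q in range(2, limit):
        if is_prime(q) and all(pow(n, p, q) != p % q for n in range(q)):
            return q
    return None

for p in [2, 3, 5, 7]:
    print(p, witness(p))
```

For instance, for p = 3 this finds q = 7: the cubes mod 7 are {0, 1, 6}, which never hit 3.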

SLIDE 28

Example Problems (Combinatorics)

Problem (IMO 1995 #6). Let p be an odd prime number. How many p-element subsets A of {1, 2, . . . , 2p} are there, the sum of whose elements is divisible by p?
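The problem can be sanity-checked by brute force for small primes (my own sketch); the closed form below is the well-known answer, 2 + (C(2p, p) − 2)/p.

```python
from itertools import combinations
from math import comb

def count_subsets(p):
    """Count p-element subsets of {1, ..., 2p} whose element sum is divisible by p."""
    return sum(1 for A in combinations(range(1, 2 * p + 1), p)
               if sum(A) % p == 0)

for p in [3, 5]:
    brute = count_subsets(p)
    closed_form = 2 + (comb(2 * p, p) - 2) // p  # the known answer
    print(p, brute, closed_form)
```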

SLIDE 29

Example Problems (Geometry)

Problem (IMO 2006 #6). Assign to each side b of a convex polygon P the maximum area of a triangle that has b as a side and is contained in P. Show that the sum of the areas assigned to the sides of P is at least twice the area of P.


SLIDE 35

The IMO Grand Challenge

The challenge: build an AI that can win a gold medal.

A formal-to-formal (F2F) variant of the IMO:

• the AI receives formal statements of the problems
• it must produce machine-checkable proofs
• (caveat: "determine" problems)

Other details:

• the system must be checksummed before the problems are released
• no access to the Internet
• regular wall-clock time, but no other computational limitations
• proofs must be checkable in (say) 10 minutes (roughly what it takes to check a human proof)

Committee:

• Leonardo de Moura (MSR)
• Kevin Buzzard (Imperial College London)
• Reid Barton (University of Pittsburgh)
• Percy Liang (Stanford University)
• Sarah Loos (Apple)
• Freek Wiedijk (University of Nijmegen)


SLIDE 44

Why the IMO?

Extremely simple setting:

• problems are formally specified (no vagueness or ambiguity)
• solutions can be machine-checked (no need to imitate humans)
• closed-world (limited background knowledge required)

Yet broad consensus:

• it is incredibly hard, maybe even AI-complete
• it would be among the all-time great achievements of CS
• the winning tech would revolutionize AI, AR, PL, and mathematics

Ongoing supply of new problems:

• a long-standing, global, decentralized process

Well-defined notion of success: winning a gold medal.

Most importantly: we think we have a real chance!

• but we need to work together as a community
• and we need to play the long game



SLIDE 50

High-Level Strategy

1. Formalize historical problems in Lean.
   • a grassroots effort in the Mathlib community even before IMO-GC
   • many former winners are involved
   • most of the background math is already there

2. Compress proofs using very-high-level (VHL) tactics.
   • the kinds of strategies that humans are taught
   • e.g. small-n, symmetry, extremes, invariants, pigeonhole
   • challenge: how to manifest these in software?

3. Train neural networks to guide search.
   • VHL tactics will be riddled with choice points
   • no way to hand-engineer all the low-level heuristics
   • challenge: how to learn heuristics from few examples?

4. Finish the job with an armada of search.


SLIDE 62

Outline

Standard advice for talks: stick to the past. Contrary to that advice, the rest of this talk is a preliminary roadmap:

• potential solutions to the two main challenges
• warning: the ideas are reasonably fleshed out but far from battle-tested

Two interrelated works in progress:

• representing strategies with the search transformer
• guiding search with the universal oracle

The real War Machine that makes these projects possible: Lean4.

• a similar logic to the one battle-tested by Mathlib
• new in Lean4: a real programming language with ridiculous performance
• (no need to drop down to C++ for perf-critical tactics)
• built by Leonardo de Moura (MSR) and Sebastian Ullrich (KIT)



SLIDE 68

Tactics, Not Agents

Standard agent/environment model for ITP:

  (Theorems, Goal, Action) → [Goal]

Loop:

• look at the theorems, the current goal, and the possible actions
• select an action and apply it
• add the resulting subgoals to the goal stack

Appealing, but it has limitations:

• a binary distinction between choices and black-box tactics
• in much of formal math, the line is very blurred

Tactics are computer programs, not atomic actions:

• they keep their own kind of state (not necessarily just a list of goals)
• they may make internal heuristic decisions
• they may call other tactics recursively
• compositionality is where their power comes from!

Roadmap I: a new agent/environment model. Write nondeterministic tactics with explicit choice points; the agent's job is to execute these tactics, choosing which branch to go down at each choice point.
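The standard loop above can be sketched in a few lines of Python (illustrative names only; real ITP goals and actions are structured objects, not strings):

```python
from typing import Callable, List

Goal = str  # stand-in; a real ITP has structured goals

def agent_loop(initial: Goal,
               select: Callable[[Goal], Callable[[Goal], List[Goal]]]) -> List[Goal]:
    """The standard loop: pop a goal, let the policy select an action,
    apply it, and push the resulting subgoals."""
    stack, visited = [initial], []
    while stack:
        goal = stack.pop()
        visited.append(goal)
        action = select(goal)       # the heuristic/learned policy
        stack.extend(action(goal))  # action: goal -> subgoals
    return visited

# Toy instance: an "action" that splits a conjunction "A&B" into subgoals.
def split_conj(goal: Goal) -> List[Goal]:
    return goal.split("&") if "&" in goal else []

visited = agent_loop("A&B&C", lambda g: split_conj)
print(visited)  # the root goal, then each atomic subgoal
```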

SLIDE 74

Nondeterministic Tactics

Status quo: regular tactics hardcode the choice-point ordering.

• f <|> g means "try f; if it fails, try g"
• the search space and the search decisions are intertwined

Our approach: reify the choice points.

• factor the heuristics out of the search space
• allow multiple, modular ways of guiding tactics

Silly example (more details to come):

  blindRewrite : NondeterministicTactic := do
    h <- choose env.theorems
    execute (rewrite h)

  breadthFirstSearch blindRewrite
  depthFirstSearch blindRewrite

Open question: how best to encode IMO strategies?

• extreme 1: detailed proof scripts (no search)
• extreme 2: choose the bits of the proof directly (insane search)
• obviously: we want something in the middle
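A minimal Python sketch of the idea (all names are mine, and integers stand in for proof states): the tactic returns its candidate successors instead of committing to an order, and interchangeable drivers decide how the branches are explored.

```python
from collections import deque

# A nondeterministic "tactic" with its choice point reified: instead of
# hardcoding an order (f <|> g), it returns all candidate successor states.
def blind_rewrite(state):
    return [state + 3, state * 2]  # two toy rewrites on integer "goals"

def depth_first_search(tactic, start, target, max_depth=12):
    stack = [(start, 0)]
    while stack:
        state, depth = stack.pop()
        if state == target:
            return True
        if depth < max_depth:
            stack.extend((s, depth + 1) for s in tactic(state))
    return False

def breadth_first_search(tactic, start, target, max_nodes=10_000):
    queue, seen = deque([start]), {start}
    while queue and len(seen) < max_nodes:
        state = queue.popleft()
        if state == target:
            return True
        for s in tactic(state):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False

# The same search space, driven by two different, interchangeable strategies:
print(depth_first_search(blind_rewrite, 1, 20))
print(breadth_first_search(blind_rewrite, 1, 20))
```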


SLIDE 78

Example: Olympiad Inequalities

Problem (JBMO 2002). Let a, b, c > 0 and prove that:

  2 (∑_cyc a)^2 (∑_cyc 1/(a(a+b))) ≥ 27

Calculational proof:

  2 (∑_cyc a)^2 (∑_cyc 1/(a(a+b)))
    = (∑_cyc a) (∑_cyc 2a) (∑_cyc 1/(a(a+b)))       (group)
    = (∑_cyc a) (∑_cyc (a+b)) (∑_cyc 1/(a(a+b)))    (cycle)
    ≥ (∑_cyc (a · (a+b) · 1/(a(a+b)))^(1/3))^3      (Hölder)
    = (∑_cyc 1)^3                                   (cancel)
    = 27                                            (eval)

High-level proof: make the LHS look like the LHS of Hölder's inequality, then apply it.
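A numeric sanity check of the inequality (my own sketch): the bound 27 is tight, with equality exactly when a = b = c.

```python
import random

def jbmo_lhs(a, b, c):
    # 2 * (sum a)^2 * (cyclic sum of 1/(a*(a+b)))
    s = a + b + c
    cyc = 1/(a*(a + b)) + 1/(b*(b + c)) + 1/(c*(c + a))
    return 2 * s * s * cyc

random.seed(1)
for _ in range(10_000):
    a, b, c = (random.uniform(0.01, 10.0) for _ in range(3))
    assert jbmo_lhs(a, b, c) > 26.999999  # tolerance for float error

print(jbmo_lhs(1, 1, 1))  # equality case a = b = c: exactly 27
```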


SLIDE 83

Example: Olympiad Inequalities

It is easy to implement a nondeterministic strategy that can prove it:

  abstractProveJBMO2002 := do
    thm <- choose standardDozen
    makeLookLike (getLHS goal) (getLHS thm)
    apply thm
    finish

It may be hard to specify:

• which theorem to try next?
• how to makeLookLike one term into another?

But the simple script is already extremely useful!

• makeLookLike gets a specification/goal
• it can use the target to prune the search space dramatically

It is easy to relax the proof further:

• getLHS goal → choose (subterms goal)
• apply → rewrite
• finish → simplify, recurse
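A toy Python model of the makeLookLike idea (my own rules and names, far simpler than real term rewriting): breadth-first search over rewrites, where the explicit target both defines success and keeps the search from wandering.

```python
from collections import deque

# Tiny term language: ("add", x, y), ("mul", 2, x), or an atom like "a".
def neighbors(t):
    """All terms reachable by one rewrite (two toy rules of my own)."""
    out = []
    if isinstance(t, tuple):
        op, *args = t
        if op == "mul" and args[0] == 2:
            out.append(("add", args[1], args[1]))  # 2*x -> x + x
        if op == "add":
            out.append(("add", args[1], args[0]))  # commutativity
        for i, a in enumerate(args):               # rewrite inside subterms
            for a2 in neighbors(a):
                out.append((op, *args[:i], a2, *args[i+1:]))
    return out

def make_look_like(source, target, max_nodes=10_000):
    """BFS for a rewrite path from `source` to `target`; returns its length or None.

    The target acts as a hard filter: success is syntactic equality with it."""
    queue, seen = deque([(source, 0)]), {source}
    while queue and len(seen) < max_nodes:
        t, d = queue.popleft()
        if t == target:
            return d
        for n in neighbors(t):
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None

# 2*(a+b)  ~~>  (b+a) + (a+b), found in two rewrite steps.
src = ("mul", 2, ("add", "a", "b"))
tgt = ("add", ("add", "b", "a"), ("add", "a", "b"))
print(make_look_like(src, tgt))
```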


SLIDE 88

Example: Geometry

IMO 2018 Problem 1: most geometry proofs require introducing auxiliary constructions.

• e.g. midpoints, feet, intersections, reflections, completions, etc.
• a large (indeed, infinite) set of possibilities

(Start of a human proof) Let M and N be the arc-midpoints of AB and AC respectively. It suffices to show that FG ∥ MN and DE ∥ MN.

Ho, what magic?

• how do you know to try M and N?
• what is the abstract strategy?


SLIDE 93

Example: Geometry

Answer: look at the diagram!

A simple nondeterministic strategy:

  abstractProveGeo := do
    thm <- choose geoTheorems
    apply thm
    when (hasVariables goal) (do
      points <- chooseFromModel
      instantiate points)
    abstractProveGeo

We have no idea how to specify:

• which theorem to try next?
• which of the promising constructions to try next?

But the simple script is extremely useful!

• candidate constructions are pruned by several orders of magnitude
• no loss of power (as long as the model is correct)
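The "look at the diagram" trick can be sketched numerically (my own Python; a crude stand-in for a real geometry model): instantiate one random coordinate model of the configuration and discard any candidate fact that fails in it, since a fact false in one model cannot be a theorem.

```python
import random

# One numeric "diagram": random coordinates for a triangle A, B, C.
random.seed(2)
A, B, C = [(random.random(), random.random()) for _ in range(3)]

def midpoint(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

def dist(P, Q):
    return ((P[0] - Q[0])**2 + (P[1] - Q[1])**2) ** 0.5

# Candidate facts about the construction M = midpoint(A, B).
M = midpoint(A, B)
candidates = {
    "MA = MB": abs(dist(M, A) - dist(M, B)) < 1e-9,  # true in every diagram
    "MA = MC": abs(dist(M, A) - dist(M, C)) < 1e-9,  # false generically
}

# Pruning: only facts that hold in the model are worth trying to prove.
survivors = [name for name, holds in candidates.items() if holds]
print(survivors)
```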

slide-94
SLIDE 94

Decisions, Decisions

25 / 38

slide-95
SLIDE 95

Decisions, Decisions

The best tactics will still induce intractable search spaces.

we can only introspect so much we can only provide so much structure before we dull the system

25 / 38

slide-96
SLIDE 96

Decisions, Decisions

The best tactics will still induce intractable search spaces.

we can only introspect so much we can only provide so much structure before we dull the system

Can we leverage learning to navigate these spaces?

25 / 38

slide-97
SLIDE 97

Decisions, Decisions

The best tactics will still induce intractable search spaces.

we can only introspect so much
we can only provide so much structure before we dull the system

Can we leverage learning to navigate these spaces? Hypothesis: deep learning has failed to advance AR because:

search spaces too low-level
wrong agent models
and obviously: not enough data

25 / 38

slide-98
SLIDE 98

Decisions, Decisions

The best tactics will still induce intractable search spaces.

we can only introspect so much
we can only provide so much structure before we dull the system

Can we leverage learning to navigate these spaces? Hypothesis: deep learning has failed to advance AR because:

search spaces too low-level
wrong agent models
and obviously: not enough data

Roadmap II: Extreme Genericity

Embed search problems generically so that a single neural network can pool data across all conceivable search problems and provide zero-shot guidance.

25 / 38

slide-99
SLIDE 99

Pooling Data

26 / 38

slide-100
SLIDE 100

Pooling Data

Want to pool training data across many domains:

26 / 38

slide-101
SLIDE 101

Pooling Data

Want to pool training data across many domains:

IMO problems

26 / 38

slide-102
SLIDE 102

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper

26 / 38

slide-103
SLIDE 103

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)

26 / 38

slide-104
SLIDE 104

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)

26 / 38

slide-105
SLIDE 105

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)

26 / 38

slide-106
SLIDE 106

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)

26 / 38

slide-107
SLIDE 107

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?

26 / 38

slide-108
SLIDE 108

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?

26 / 38

slide-109
SLIDE 109

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?

26 / 38

slide-110
SLIDE 110

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?
board games?

26 / 38

slide-111
SLIDE 111

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?
board games?
...

26 / 38

slide-112
SLIDE 112

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?
board games?
... (endless possibilities)

26 / 38

slide-113
SLIDE 113

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?
board games?
... (endless possibilities)
(empirical/economic question where to draw the line)

26 / 38

slide-114
SLIDE 114

Pooling Data

Want to pool training data across many domains:

IMO problems
Mathlib proper
other formal math libraries (e.g. Metamath)
computer algebra (e.g. integrals, sums)
synthesis problems (e.g. ARC)
puzzles (e.g. Sudoku)
verification problems?
code optimizers?
query planners?
board games?
... (endless possibilities)
(empirical/economic question where to draw the line)

To pool: search problems must be made commensurable.

26 / 38

slide-115
SLIDE 115

Generic Search Problems

27 / 38

slide-116
SLIDE 116

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems.

27 / 38

slide-117
SLIDE 117

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

27 / 38

slide-118
SLIDE 118

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails

27 / 38

slide-119
SLIDE 119

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails
successfully returns a value

27 / 38

slide-120
SLIDE 120

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails
successfully returns a value
returns “choice point”

27 / 38

slide-121
SLIDE 121

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails
successfully returns a value
returns “choice point”

Choice point:

user-specified data deemed relevant for the decision
list of possible choices

27 / 38

slide-122
SLIDE 122

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails
successfully returns a value
returns “choice point”

Choice point:

user-specified data deemed relevant for the decision
list of possible choices

Choice:

some data summarizing the choice
another arbitrary search problem, i.e. a continuation

27 / 38

slide-123
SLIDE 123

Generic Search Problems

New abstraction SearchT for representing arbitrary search problems. Basic idea: a search problem is an arbitrary program that either:

fails
successfully returns a value
returns “choice point”

Choice point:

user-specified data deemed relevant for the decision
list of possible choices

Choice:

some data summarizing the choice
another arbitrary search problem, i.e. a continuation

Can “run” a SearchT program in a variety of generic ways.

depth-first search
breadth-first search
later: heuristic search

27 / 38
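The SearchT idea described above can be sketched in a few lines of Python. This is only a toy illustration under my own naming (`Fail`, `Done`, `ChoicePoint` are stand-ins, not the talk's Lean API): a search program either fails, returns a value, or surfaces a choice point whose choices carry continuations, and a generic runner such as depth-first search can execute any such program.

```python
# Toy SearchT sketch: a search program is Fail, Done(value), or a
# ChoicePoint with (summary, continuation) choices; dfs is one generic
# way to run it.  Names are illustrative, not from the talk.
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass
class Fail:
    pass

@dataclass
class Done:
    value: Any

@dataclass
class ChoicePoint:
    data: Any                                    # context deemed relevant
    choices: List[Tuple[Any, Callable[[], Any]]]  # (summary, continuation)

def dfs(prog):
    # Generic depth-first runner; returns None on failure (toy-level
    # convention, so None cannot itself be a solution value here).
    if isinstance(prog, Fail):
        return None
    if isinstance(prog, Done):
        return prog.value
    for _summary, cont in prog.choices:
        result = dfs(cont())
        if result is not None:
            return result
    return None

# Toy search problem: choose two digits whose sum is 7.
def pick(total, remaining):
    if remaining == 0:
        return Done(total) if total == 7 else Fail()
    return ChoicePoint(
        data={"total": total},
        choices=[(d, lambda d=d: pick(total + d, remaining - 1))
                 for d in range(10)])

print(dfs(pick(0, 2)))  # → 7
```

Because the runner never inspects the domain, the same `dfs` (or a breadth-first or heuristic variant) executes geometry tactics, ARC solvers, and puzzle search alike.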

slide-124
SLIDE 124

Example: ARC

28 / 38

slide-125
SLIDE 125

Example: ARC

28 / 38

slide-126
SLIDE 126

Example: ARC

28 / 38

slide-127
SLIDE 127

Example: ARC

28 / 38

slide-128
SLIDE 128

Example: ARC

High-level solution:

28 / 38

slide-129
SLIDE 129

Example: ARC

High-level solution: split input into shapes by color and connectivity

28 / 38

slide-130
SLIDE 130

Example: ARC

High-level solution:
split input into shapes by color and connectivity
find the special shape that touches a grey cell

28 / 38

slide-131
SLIDE 131

Example: ARC

High-level solution:
split input into shapes by color and connectivity
find the special shape that touches a grey cell
guess the smallest square containing the special shape

28 / 38
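The first step above, splitting a grid into shapes by color and connectivity, is a standard flood fill. A minimal Python sketch (my own code, not the talk's Lean implementation; `split_into_shapes` and the blank-cell convention are assumptions):

```python
# Hypothetical sketch: split a grid into "shapes", i.e. connected
# components of same-colored, non-blank cells (4-connectivity).
from collections import deque

def split_into_shapes(grid, blank=0):
    rows, cols = len(grid), len(grid[0])
    seen, shapes = set(), []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] == blank or (r0, c0) in seen:
                continue
            color, cells, q = grid[r0][c0], [], deque([(r0, c0)])
            seen.add((r0, c0))
            while q:  # BFS flood fill over the 4 neighbors
                r, c = q.popleft()
                cells.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and grid[nr][nc] == color):
                        seen.add((nr, nc))
                        q.append((nr, nc))
            shapes.append((color, sorted(cells)))
    return shapes

grid = [[1, 1, 0],
        [0, 1, 2],
        [2, 0, 2]]
print(split_into_shapes(grid))
```

On this toy grid, the fill finds one 3-cell shape of color 1 and two separate shapes of color 2; later steps ("find the special shape", "guess the output") then operate on this list.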

slide-132
SLIDE 132

Example: ARC

29 / 38

slide-133
SLIDE 133

Example: ARC

29 / 38

slide-134
SLIDE 134

Example: ARC

High-level solution:

29 / 38

slide-135
SLIDE 135

Example: ARC

High-level solution: split input into subgrids by stripping the partitions

29 / 38

slide-136
SLIDE 136

Example: ARC

High-level solution:
split input into subgrids by stripping the partitions
find the special subgrid that has only four non-blank cells

29 / 38

slide-137
SLIDE 137

Example: ARC

High-level solution:
split input into subgrids by stripping the partitions
find the special subgrid that has only four non-blank cells
guess an upscaled, grey-separated version of the special subgrid

29 / 38

slide-138
SLIDE 138

Example: ARC

SearchT program that can solve both:

30 / 38

slide-139
SLIDE 139

Example: ARC

SearchT program that can solve both:

def abstractSolveSpecial := do
  inThings <- splitInputIntoThings
  (specialInput, in2outFn) <- synthAlignSpecialThingFn inThings
  pickSpecialFn <- synthPickSpecialFn specialInput inThings
  guess (in2outFn (pickSpecialFn inThings.test))

30 / 38

slide-140
SLIDE 140

Example: ARC

SearchT program that can solve both:

def abstractSolveSpecial := do
  inThings <- splitInputIntoThings
  (specialInput, in2outFn) <- synthAlignSpecialThingFn inThings
  pickSpecialFn <- synthPickSpecialFn specialInput inThings
  guess (in2outFn (pickSpecialFn inThings.test))

A handful of SearchT tactics like this placed us in the top 2%.

in the ARC Kaggle competition
with no heuristics, only blind iterative deepening
joint with: Ryan Krueger (UMich) and Jesse Michael Han (UPitt)

30 / 38

slide-141
SLIDE 141

Example: ARC

SearchT program that can solve both:

def abstractSolveSpecial := do
  inThings <- splitInputIntoThings
  (specialInput, in2outFn) <- synthAlignSpecialThingFn inThings
  pickSpecialFn <- synthPickSpecialFn specialInput inThings
  guess (in2outFn (pickSpecialFn inThings.test))

A handful of SearchT tactics like this placed us in the top 2%.

in the ARC Kaggle competition
with no heuristics, only blind iterative deepening
joint with: Ryan Krueger (UMich) and Jesse Michael Han (UPitt)

Takeaway: programs + nondeterminism let you write:

convenient, abstract, compositional strategies that solve superficially diverse problems

30 / 38

slide-142
SLIDE 142

Example: ARC

SearchT program that can solve both:

def abstractSolveSpecial := do
  inThings <- splitInputIntoThings
  (specialInput, in2outFn) <- synthAlignSpecialThingFn inThings
  pickSpecialFn <- synthPickSpecialFn specialInput inThings
  guess (in2outFn (pickSpecialFn inThings.test))

A handful of SearchT tactics like this placed us in the top 2%.

in the ARC Kaggle competition
with no heuristics, only blind iterative deepening
joint with: Ryan Krueger (UMich) and Jesse Michael Han (UPitt)

Takeaway: programs + nondeterminism let you write:

convenient, abstract, compositional strategies that solve superficially diverse problems

Note: we needed to be conservative to keep search tractable.

could have written much more flexible tactics with good heuristics

30 / 38

slide-143
SLIDE 143

Generic Heuristics

31 / 38

slide-144
SLIDE 144

Generic Heuristics

Recall a choice point consists of:

user-specified data deemed relevant for the decision
list of possible choices, each with summary data and a continuation

31 / 38

slide-145
SLIDE 145

Generic Heuristics

Recall a choice point consists of:

user-specified data deemed relevant for the decision
list of possible choices, each with summary data and a continuation

But: the datatypes involved may be arbitrary.

inequalities: regular tactic state
Geometry: E-graph, sets for lines/circles, diagram
ARC: input and output grids

31 / 38

slide-146
SLIDE 146

Generic Heuristics

Recall a choice point consists of:

user-specified data deemed relevant for the decision
list of possible choices, each with summary data and a continuation

But: the datatypes involved may be arbitrary.

inequalities: regular tactic state
Geometry: E-graph, sets for lines/circles, diagram
ARC: input and output grids

In all three examples, subproblems see different data.

inequalities: makeLookLike sees the target pattern
Geometry: chooseFromModel sees the desired property
ARC: synthPickSpecialFn sees labels, i.e. the special things

31 / 38

slide-147
SLIDE 147

Generic Heuristics

Recall a choice point consists of:

user-specified data deemed relevant for the decision
list of possible choices, each with summary data and a continuation

But: the datatypes involved may be arbitrary.

inequalities: regular tactic state
Geometry: E-graph, sets for lines/circles, diagram
ARC: input and output grids

In all three examples, subproblems see different data.

inequalities: makeLookLike sees the target pattern
Geometry: chooseFromModel sees the desired property
ARC: synthPickSpecialFn sees labels, i.e. the special things

Q: how to share statistical strength across all problems?

31 / 38

slide-148
SLIDE 148

Compositional Embeddings

32 / 38

slide-149
SLIDE 149

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

32 / 38

slide-150
SLIDE 150

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

32 / 38

slide-151
SLIDE 151

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs

32 / 38

slide-152
SLIDE 152

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs

32 / 38

slide-153
SLIDE 153

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs

32 / 38

slide-154
SLIDE 154

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs

32 / 38

slide-155
SLIDE 155

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs
custom Term type as GNN

32 / 38

slide-156
SLIDE 156

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs
custom Term type as GNN
share e.g. Vector parameters across all vectors

32 / 38

slide-157
SLIDE 157

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs
custom Term type as GNN
share e.g. Vector parameters across all vectors
(call this Phase I of the embedding)

32 / 38

slide-158
SLIDE 158

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs
custom Term type as GNN
share e.g. Vector parameters across all vectors
(call this Phase I of the embedding)

Embed the continuations as the program Exprs.

32 / 38

slide-159
SLIDE 159

Compositional Embeddings

First thought: emit tokens and use single transformer.

appealingly simple!
but: naïve, bad asymptotics, limited inductive bias

Proposal: compositional embeddings.

List<α> as (say) LSTM on embeddings of αs
Vector<α> as (say) transformer of αs
Set<α> as AC-invariant repr of αs
Grid<α> as CNN over αs
custom Term type as GNN
share e.g. Vector parameters across all vectors
(call this Phase I of the embedding)

Embed the continuations as the program Exprs. Phase II: after embedding all datatypes to same space:

run single generic model (e.g. transformer)
then at end, output floats giving scores to choices

32 / 38
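A toy version of the Phase I idea, type-directed compositional embedding, can be sketched with plain numpy. Everything here is illustrative (the dimension, the parameter names, and using a simple recurrent fold in place of a real LSTM are my assumptions); the point is that each type former gets one shared, parameterized embedder, so every `List<α>` in every domain reuses the same fold weights, and all values land in one embedding space.

```python
# Toy type-directed ("Phase I") compositional embedding: dispatch on
# the runtime type, recurse on components, share parameters per type
# former.  Shapes and names are illustrative, not from the talk.
import numpy as np

D = 8
rng = np.random.default_rng(0)
W_list = rng.normal(size=(D, 2 * D))     # shared across ALL List<α>
token_table = rng.normal(size=(256, D))  # leaf embeddings

def embed(x):
    if isinstance(x, int):                  # leaf: table lookup
        return token_table[x % 256]
    if isinstance(x, list):                 # sequential fold (LSTM stand-in)
        h = np.zeros(D)
        for item in x:
            h = np.tanh(W_list @ np.concatenate([h, embed(item)]))
        return h
    if isinstance(x, (set, frozenset)):     # AC-invariant: sum-pool
        return sum((embed(item) for item in x), np.zeros(D))
    raise TypeError(f"no embedder for {type(x)}")

# Order matters for lists, but not for sets:
assert not np.allclose(embed([1, 2, 3]), embed([3, 2, 1]))
assert np.allclose(embed({1, 2, 3}), embed({3, 2, 1}))
```

Grids, terms, and continuations would get their own embedders (CNN, GNN, program `Expr` embedding) mapping into the same `D`-dimensional space, after which a single Phase II model can score the choices.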

slide-160
SLIDE 160

Universal Oracle

Now: we can embed arbitrary types into one space.

language features (e.g. typeclasses) make most plumbing transparent

33 / 38

slide-161
SLIDE 161

Universal Oracle

Now: we can embed arbitrary types into one space.

language features (e.g. typeclasses) make most plumbing transparent

This lets us build:

Universal Oracle: a trainable procedure that can map any choice point with n choices encountered by any SearchT program into a vector of n floats, representing heuristic preferences among the choices.

33 / 38

slide-162
SLIDE 162

Universal Oracle

Now: we can embed arbitrary types into one space.

language features (e.g. typeclasses) make most plumbing transparent

This lets us build:

Universal Oracle: a trainable procedure that can map any choice point with n choices encountered by any SearchT program into a vector of n floats, representing heuristic preferences among the choices.

And finally we can implement:

Self-Improving Universal Search: a generic way of executing a SearchT program that queries the universal oracle at every choice point and trains the oracle based on new data arising from the search.

33 / 38
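The self-improvement loop can be sketched end to end in a toy form. All names here are mine, and a count-based preference table stands in for the neural oracle: the search queries the oracle for scores at every choice point, and each solution found is fed back as training data.

```python
# Toy self-improving universal search: an oracle scores choices at
# every choice point, a greedy depth-limited search follows the scores,
# and successful paths are fed back to the oracle.  The count-based
# "oracle" is a stand-in for a trainable neural scorer.
class Oracle:
    def __init__(self):
        self.pref = {}                        # (context, choice) -> score

    def scores(self, context, choices):
        return [self.pref.get((context, c), 0.0) for c in choices]

    def reinforce(self, path):
        # Bump every (context, choice) pair that led to a solution.
        for context, choice in path:
            key = (context, choice)
            self.pref[key] = self.pref.get(key, 0.0) + 1.0

def guided_search(step, state, oracle, depth, path=()):
    # `step(state)` returns ("done", value) or ("choices", ctx, opts).
    kind, *rest = step(state)
    if kind == "done":
        return rest[0], list(path)
    if depth == 0:
        return None, []
    ctx, opts = rest
    scores = oracle.scores(ctx, opts)
    ranked = [c for _, c in sorted(zip(scores, opts),
                                   key=lambda sc: -sc[0])]
    for c in ranked:                          # best-scored choice first
        result, p = guided_search(step, state + [c], oracle,
                                  depth - 1, path + ((ctx, c),))
        if result is not None:
            return result, p
    return None, []

# Toy domain: build the string "ab" one letter at a time.
def step(state):
    s = "".join(state)
    if s == "ab":
        return ("done", s)
    return ("choices", len(s), ["a", "b"])

oracle = Oracle()
for _ in range(3):                            # search, then learn from it
    result, path = guided_search(step, [], oracle, depth=2)
    if result is not None:
        oracle.reinforce(path)
print(result, oracle.pref)
```

After the first round the oracle already prefers "a" at position 0 and "b" at position 1, so later searches take the solving branch first; a real instance would replace the table with a model over the Phase I/II embeddings.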

slide-163
SLIDE 163

Outline

1

The Great Myth

2

The Grand Challenge

3

High-Level Strategy

4

Preliminary Roadmap The Search Transformer The Universal Oracle

5

Beyond the IMO

34 / 38

slide-164
SLIDE 164

Beyond the IMO

35 / 38

slide-165
SLIDE 165

Beyond the IMO

Hypothesis: If we can win the IMO, we could use a similar methodology to automate any class of problems that

are formally specified, and that very smart humans can be trained to solve reliably.

35 / 38

slide-166
SLIDE 166

Beyond the IMO

Hypothesis: If we can win the IMO, we could use a similar methodology to automate any class of problems that

are formally specified, and that very smart humans can be trained to solve reliably.

Does not include:

personal assistants (no formal spec)
solving Clay Millennium Problems (not trainable)

35 / 38

slide-167
SLIDE 167

Beyond the IMO

Hypothesis: If we can win the IMO, we could use a similar methodology to automate any class of problems that

are formally specified, and that very smart humans can be trained to solve reliably.

Does not include:

personal assistants (no formal spec)
solving Clay Millennium Problems (not trainable)

But it is still a huge class of important problems.

35 / 38

slide-168
SLIDE 168

Beyond the IMO

Includes most proofs in CS and Stats research.

convergence rates
regret/generalization bounds
asymptotic time/space arguments

36 / 38

slide-169
SLIDE 169

Beyond the IMO

Includes most proofs in CS and Stats research.

convergence rates
regret/generalization bounds
asymptotic time/space arguments

Includes many subproblems that appear in “real” mathematics.

(many IMO problems arise this way)

36 / 38

slide-170
SLIDE 170

Beyond the IMO

Includes most proofs in CS and Stats research.

convergence rates
regret/generalization bounds
asymptotic time/space arguments

Includes many subproblems that appear in “real” mathematics.

(many IMO problems arise this way)

Includes big chunk of software verification!

36 / 38

slide-171
SLIDE 171

Long-Term Aspiration

37 / 38

slide-172
SLIDE 172

Long-Term Aspiration

High-performance synthesis from extremely high-level code.

37 / 38

slide-173
SLIDE 173

Long-Term Aspiration

High-performance synthesis from extremely high-level code. Universal finding: can be very difficult to write “good” specs.

37 / 38

slide-174
SLIDE 174

Long-Term Aspiration

High-performance synthesis from extremely high-level code. Universal finding: can be very difficult to write “good” specs. But: it is always easier to write slow code than fast code!

37 / 38

slide-175
SLIDE 175

Long-Term Aspiration

High-performance synthesis from extremely high-level code. Universal finding: can be very difficult to write “good” specs. But: it is always easier to write slow code than fast code! (Mostly) Well-defined and teachable to smart people:

start with: extremely high-level, naïve code
perform all sorts of high-, medium-, and low-level optimizations
end with: correct, super-high-performance code (+ additional desiderata)

37 / 38

slide-176
SLIDE 176

Long-Term Aspiration

High-performance synthesis from extremely high-level code. Universal finding: can be very difficult to write “good” specs. But: it is always easier to write slow code than fast code! (Mostly) Well-defined and teachable to smart people:

start with: extremely high-level, naïve code
perform all sorts of high-, medium-, and low-level optimizations
end with: correct, super-high-performance code (+ additional desiderata)

If we can win the IMO, perhaps we can automate this too.

37 / 38

slide-177
SLIDE 177

Thank You

Lean4: https://github.com/leanprover/lean4
IMO Grand Challenge website: https://imo-grand-challenge.github.io/
Zulip channel: https://leanprover.zulipchat.com/

38 / 38