

SLIDE 1

It’s Not What Machines Can Learn, It’s What We Cannot Teach

ICML 2020

Gal Yehuda, Moshe Gabel, Assaf Schuster

SLIDE 2

Applications of machine learning


  • G. Yehuda, M. Gabel, A. Schuster. It's Not What Machines Can Learn, It's What We Cannot Teach.


SLIDE 3

Example: TSP

Given a graph, we feed it to a model, which outputs whether a route with cost < C exists

[Diagram: input graph → GNN → YES / NO]
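To see why exact labels are expensive here, consider what producing the ground truth for this decision problem entails. A minimal brute-force labeler (our illustrative sketch, not the GNN or labeling code from the cited work) might look like:

```python
from itertools import permutations

def tsp_route_below_cost(dist, C):
    """Decision TSP label: does a tour visiting every city have cost < C?

    dist: square matrix of pairwise distances; C: cost threshold.
    Exhaustive O(n!) search -- exact labeling is expensive, which is
    precisely why generating training data for such models is slow.
    """
    n = len(dist)
    for perm in permutations(range(1, n)):  # fix city 0 as the start
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < C:
            return True   # "YES"
    return False          # "NO"
```

A learned model would try to approximate this YES/NO answer directly from the graph, without the factorial search.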

Prates, Avelar, Lemos, Lamb, Vardi. Learning to Solve NP-Complete Problems: A Graph Neural Network for Decision TSP. AAAI 2019.

SLIDE 4

The machine learning process


Generate data → propose architecture, features, embedding → train model → evaluate → SUCCESS

SLIDE 5

Current Data Generation

State-of-the-art ML methods are data-hungry

  • Need many labeled examples

Labeling training data is slow

  • Need to solve TSP, check 3-SAT, etc.

Instead, data augmentation:

  • Start with small labeled training set
  • Apply label-preserving transformation
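As a concrete label-preserving transformation (our illustrative assumption, not necessarily the transformation used in the cited works), relabeling the cities of a TSP instance permutes the distance matrix without changing whether a tour of cost < C exists:

```python
import random

def permute_cities(dist, seed=0):
    """Label-preserving augmentation sketch: renaming cities yields a
    different-looking instance with provably the same YES/NO label,
    because every tour of the original maps to a tour of equal cost."""
    rng = random.Random(seed)
    n = len(dist)
    perm = list(range(n))
    rng.shuffle(perm)
    return [[dist[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
```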


SLIDE 6

Our Main Result

When starting with an NP-hard problem, any efficient data generation or augmentation provably results in an easier subproblem. This creates a catch-22:

  • Slow data generation → dataset too small
  • Fast data generation → easier subproblem

[Diagram: slow (non-poly-time) data generation keeps the problem NP-hard; fast (poly-time) data generation or augmentation yields a problem in NP ∩ coNP]

SLIDE 7

Case Study: Conjunctive Query Containment

We ran an experiment on a case study, CQC, using a common data sampling + augmentation approach. The model appears to learn well, but results on the “real” instance space are much lower.

  • Up to 30% drop

[Bar chart: accuracy on the augmented distribution vs. the sampled distribution]

SLIDE 8

Takeaways

Efficient data generation results in an easier subproblem during training, and can cause overestimation of accuracy during testing. This results in a catch-22:

  • small amounts of training data from right problem?
  • or large amounts of training data from easier subproblem?


SLIDE 9


Let’s dive deeper

SLIDE 10

What exactly did we show?

Let L be an NP-hard language.
The binary classification problem: is x ∈ L or not?
Sampler for L: a probabilistic algorithm that generates labeled instances.
Efficient sampler for L: a sampler that runs in poly-time.

[Diagram: Sampler → (instance, YES/NO) pairs]
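To make the definition concrete, here is a hypothetical efficient sampler for 3-SAT built with the common “planted assignment” trick (an assumption for illustration, not the paper's construction). Note it can only ever emit YES instances, already hinting at the coverage problem:

```python
import random

def planted_3sat_sampler(seed, n_vars=5, n_clauses=15):
    """Poly-time probabilistic sampler: outputs (instance, label).

    Picks a random assignment, then emits only clauses that assignment
    satisfies -- so every sample is labeled YES by construction,
    with no NP-hard solving required."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars)]
    clauses = []
    while len(clauses) < n_clauses:
        clause = [(rng.randrange(n_vars), rng.random() < 0.5)
                  for _ in range(3)]  # (variable, required sign)
        if any(assign[var] == sign for var, sign in clause):
            clauses.append(clause)  # keep only satisfied clauses
    return clauses, "YES"
```

The sampler is a deterministic function of its randomness (the seed), which is exactly the property the proof below exploits.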

SLIDE 11

Result 1: All polynomial time samplers are incomplete

  • There are infinitely many instances it cannot generate!

[Diagram: the original problem space vs. the smaller space of instances a poly-time sampler can reach]

SLIDE 12

Result 2: Poly-time sampler yields easier subproblem

If T is a polynomial-time sampler for a language L, then the classification task over the instances T generates is in NP ∩ coNP.

[Diagram: the original problem L is NP-hard; the classification problem over the poly-time sampler's (instance, YES/NO) outputs is in NP ∩ coNP]

SLIDE 13

Meaning: efficient sampling does not preserve hardness

[Diagram: complexity scale from P (easier) through NP ∩ coNP, NP, coNP, and NP-complete, to NP-hard (harder)]

Even if we started with an NP-hard problem, what’s left after efficient sampling is an easier subproblem.

SLIDE 14

Proof

NP = easy to verify that x ∈ L: there is a poly-time verifier M such that x ∈ L ⟺ ∃u with M(x, u) = 1.

coNP = easy to verify that x ∉ L: there is a poly-time verifier M such that x ∉ L ⟺ ∃u with M(x, u) = 1.

SLIDE 15

Proof

If x was generated by an efficient sampler T, we can use the randomness u consumed by the sampler both as a membership certificate and as a non-membership certificate:

  • To show that x ∈ L, check whether T(u) outputs (x, YES) ⇒ the sampled problem is in NP.
  • To show that x ∉ L, check whether T(u) outputs (x, NO) ⇒ the sampled problem is in coNP.
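The proof idea can be sketched in code (with a toy deterministic sampler standing in for T; the names and encoding are ours): a poly-time verifier simply re-runs the sampler on u and checks the claimed label:

```python
def toy_sampler(u):
    """Stand-in for a poly-time sampler T run on random string u
    (here u is just an int): returns a labeled instance."""
    return "inst-%d" % u, ("YES" if u % 2 == 0 else "NO")

def verify(sampler, x, u, claimed_label):
    """u doubles as both a membership and a non-membership certificate:
    re-run the sampler on u and check it reproduces (x, claimed_label).
    This runs in poly time because the sampler does."""
    instance, label = sampler(u)
    return instance == x and label == claimed_label
```

The same verifier certifies YES and NO instances alike, which is what places the sampled problem in NP ∩ coNP.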

SLIDE 16

Result 3: It can get really bad…

We show a language L such that:

  • 1. The original L is NP-hard.
  • 2. The output of any polynomial-time sampler for L is trivial to classify: the first bit of x is the label, with high probability.

[Diagram: a poly-time sampler for L emits samples x; the first bit of x reveals the YES/NO label]
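A sketch of this pathological situation (our toy construction, mirroring the statement of the result rather than reproducing its proof): the sampler leaks the label in the first bit, so a constant-time “classifier” scores 100% on generated data while learning nothing about L:

```python
import random

def leaky_sampler(rng):
    """Toy sampler whose output's first bit equals the label."""
    label = rng.random() < 0.5
    payload = format(rng.getrandbits(16), "016b")
    x = ("1" if label else "0") + payload
    return x, ("YES" if label else "NO")

def constant_time_classifier(x):
    """Perfect on sampled data -- and useless on the real problem."""
    return "YES" if x[0] == "1" else "NO"
```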

SLIDE 17

It can get really bad…

Meaning: any learning algorithm trained on efficiently generated data ”thinks” it has 100% accuracy, when in fact it learns nothing about the original problem.

[Diagram: complexity scale; the sampled subproblem drops all the way down to constant time]

SLIDE 18

Case study: Conjunctive Query Containment

  • A conjunctive query q over a database is a first-order predicate of the form:
  • The task: given two queries q and p, are the results of q contained in the results of p, regardless of the database they run on?

  • This is an NP-complete problem.
  • Implications on query optimization, cache management, and more.
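The classic decision procedure (Chandra and Merlin, 1977) reduces containment to finding a homomorphism: for Boolean conjunctive queries, q ⊆ p iff p's atoms map homomorphically into q's. A brute-force sketch (the query encoding is our assumption; all terms here are variables):

```python
from itertools import product

def contained(q, p):
    """Is q ⊆ p for Boolean conjunctive queries?

    Queries are lists of atoms (relation_name, (term, ...)). q ⊆ p iff
    some mapping of p's variables to q's terms sends every atom of p
    to an atom of q. Exponential search -- the problem is NP-complete
    in general.
    """
    p_vars = sorted({t for _, args in p for t in args})
    q_terms = sorted({t for _, args in q for t in args})
    q_atoms = {(rel, tuple(args)) for rel, args in q}
    for image in product(q_terms, repeat=len(p_vars)):
        h = dict(zip(p_vars, image))
        if all((rel, tuple(h.get(t, t) for t in args)) in q_atoms
               for rel, args in p):
            return True
    return False
```

For example, a query asking for a path of two edges is contained in one asking for a single edge, but not vice versa.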


SLIDE 19

Case study: CQC


[Diagram: sample instances near the phase transition, label them with a solver (the Vampire theorem prover), then expand the labeled set with label-preserving transformations (data augmentation)]

SLIDE 20

Case study: CQC

Proposed an architecture and trained it to high validation accuracy

[Plot: accuracy and loss over ~15 million training samples; accuracy climbs toward the mid-90s (%) while loss falls]

SLIDE 21

Case study: CQC

Evaluate

Test-set accuracy:

  • aug: 0.942
  • all-cqc: 0.804
  • μ(10, 8): 0.647


30% accuracy drop

SLIDE 22

In Summary

  • Can we use machine learning to approximately solve NP-hard problems?
  • It is not enough to worry about the representation power of the network; we must also worry about the procedure used to generate the data.
  • All poly-time data generators result in easier subproblems.
  • And the resulting subproblem may be very easy.
  • We must be careful when we evaluate our models.

SLIDE 23

THANK YOU!

We will be happy to discuss the work and answer questions.
ygal@cs.technion.ac.il
mgabel@cs.toronto.edu