Constraint solving meets machine learning and data mining: Algorithm portfolios
Kustaa Kangas, University of Helsinki, Finland
November 8, 2012


SLIDE 1

Constraint solving meets machine learning and data mining Algorithm portfolios

Kustaa Kangas

University of Helsinki, Finland

November 8, 2012

  • K. Kangas (U. Helsinki), Constraint solving meets machine learning and data mining: Algorithm portfolios, November 8, 2012, 1 / 29

SLIDES 2–7

Constraint problems

  • Boolean satisfiability, integer linear programming, etc.
  • NP-hard
  • Many instances solvable with heuristic algorithms
  • High variance in performance, from milliseconds to weeks
  • Different algorithms are fast on different instances
  • Typically no single best algorithm

SLIDE 8

Algorithm selection

Algorithm selection problem: given a problem instance, which algorithm should we run?

  • Ideally: run the algorithm that’s fastest on the instance
  • Problem: we cannot know without running the algorithms
  • Traditional solution: run the average-case best algorithm
    ◮ might be reasonably good
    ◮ can be very bad on some instances
    ◮ ignores algorithms that are good on some instances
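The selection rule above can be sketched in a few lines. The solver names and predictor functions below are hypothetical stand-ins for learned hardness models, not part of any real system:

```python
# Per-instance algorithm selection: run the algorithm whose model
# predicts the lowest running time. The "models" here are toy
# stand-ins for real empirical hardness models.

def select_algorithm(models, features):
    """Return the name of the algorithm with the lowest predicted running time."""
    return min(models, key=lambda name: models[name](features))

# Toy predictors: each maps a feature vector to a predicted running time.
models = {
    "solver_a": lambda x: 2.0 * x[0] + 0.5 * x[1],
    "solver_b": lambda x: 0.5 * x[0] + 3.0 * x[1],
}

print(select_algorithm(models, [1.0, 10.0]))  # solver_a (7.0 vs 30.5)
```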

SLIDES 9–12

Algorithm portfolios

Idea: use several algorithms to improve expected performance.

Algorithm portfolio:
  1. a collection of algorithms
  2. a strategy for running them

A variety of strategies:
  • run all algorithms (sequentially / in parallel)
  • select one algorithm based on the instance
  • anything in between
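The "run all algorithms in parallel" strategy can be illustrated as follows. The solvers are toy stand-ins that just sleep for a fixed time; real solvers would work on the instance:

```python
# Parallel portfolio strategy: launch every solver on the same instance
# and take whichever answer arrives first.
import concurrent.futures
import time

def make_solver(name, seconds):
    def solve(instance):
        time.sleep(seconds)  # pretend to work on the instance
        return name, instance
    return solve

def run_portfolio(solvers, instance):
    """Run all solvers concurrently; return the first result to arrive."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(s, instance) for s in solvers]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # best effort; running threads cannot be killed
        return next(iter(done)).result()

solvers = [make_solver("fast", 0.01), make_solver("slow", 0.5)]
winner, _ = run_portfolio(solvers, "some CNF instance")
print(winner)  # fast
```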

SLIDES 13–19

SATzilla

  • A highly successful SAT portfolio solver
  • Uses state-of-the-art SAT solvers
  • Trains an empirical hardness model for each algorithm
    ◮ explains how hard instances are and why
    ◮ an approximate predictor of running time
    ◮ predicts hardness based on instance features
  • Selects the algorithm predicted to be fastest
  • Performed well in the 2007 SAT Competition
    ◮ 1st place in 3 categories, one 2nd place and one 3rd place

SLIDES 20–22

Empirical hardness models

Where do empirical hardness models come from? They must be learned from data:

  1. a set of algorithms
  2. a set of training instances
  3. a set of instance features

We use machine learning to exploit correlations between features and algorithm performance.

SLIDES 23–28

Empirical hardness models

problem instance   x1     x2     x3    x4  x5  x6   running time
instance 1         18    101    1222    6   1  16          15698
instance 2         23   8124     241    2   1   8            129
instance 3         57     32     683    3   5  42          14680
instance 4         17    435     153    4   1  10             14
instance 5         46     76     346    3   4  30            493
instance 6         57     32     327    2  11  12           7332
instance 7         26  62149    2408    3   2  15          31709
instance 8         70    226     498    3   4  30            214
instance 9         30    194   20060    5   2  25            131
instance 10        13    108     614    8   4   3              1
instance 11        36    307     556    2   5  11           2026
instance 12        56    100     728    4  13   7             60
...
new instance       62   3190    1716    3  14   5              ?

SLIDES 29–30

Linear regression

[Figure: running time plotted against a single feature]

SLIDE 31

Linear regression

  • Not limited to just one variable
  • For features x1, x2, ..., xm we fit a hyperplane f_w of the form
      f_w(x) = w_1 x_1 + w_2 x_2 + \cdots + w_m x_m
  • We fit w_1, w_2, ..., w_m to minimize the prediction error, e.g.
      \sum_{i=1}^{n} (f_w(x_i) - y_i)^2
    where y_i is the running time on instance i.

SLIDE 32

Linear regression

  • Easily minimized by setting w = (Φ^T Φ)^{-1} Φ^T y, where Φ is the feature matrix
  • Dominated by the matrix inversion, which is O(n^3)
  • Used by SATzilla
  • Simple and works well in practice
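A minimal sketch of this fit on toy data. The running times are generated from known weights so the recovered model can be checked; `np.linalg.lstsq` is used instead of an explicit inverse because it is numerically more stable:

```python
# Least-squares fit of the hardness model: w minimizes ||Phi w - y||^2,
# equivalently w = (Phi^T Phi)^{-1} Phi^T y. Toy data only.
import numpy as np

# Feature matrix Phi: one row per training instance, one column per feature.
Phi = np.array([[1.0, 2.0],
                [2.0, 1.0],
                [3.0, 4.0],
                [4.0, 3.0]])

# Running times generated from known weights so we can check the fit.
true_w = np.array([2.0, 0.5])
y = Phi @ true_w

# Solve the least-squares problem.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # close to [2.0, 0.5]

# Predict the running time of an instance from its features.
print(Phi[0] @ w)  # close to y[0]
```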

SLIDE 33

Identifying features

Success of the model depends crucially on the features. Features must be
  • correlated with running time
  • cheap to compute
    ◮ feature computation is part of the portfolio’s running time!

How do we find such features?
  • Features are problem-specific
  • No automatic way to find them
  • Requires domain expertise

SLIDE 34

Linear regression

[Figure: running time plotted against a feature]

SLIDE 35

SAT features

SATzilla uses 84 features related to, e.g.,
  • instance size
    ◮ number of variables, number of clauses
    ◮ ratio between these two
  • balance
    ◮ ratio of positive and negative literals
    ◮ fraction of binary and ternary clauses
  • variable–clause graph
    ◮ variable degrees: average, min, max
    ◮ clause degrees: average, min, max
  • local search probe statistics
    ◮ number of steps to a local optimum
    ◮ average improvement per step
  • proximity to Horn formula
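A few of the cheap size and balance features can be computed directly from a clause list. This is an illustration only; SATzilla's actual 84 features are more involved. The CNF encoding (clauses as lists of nonzero integers, negative meaning negated) follows the common DIMACS convention:

```python
# Compute simple SATzilla-style size/balance features of a CNF formula.
def cnf_features(clauses):
    variables = {abs(lit) for clause in clauses for lit in clause}
    literals = [lit for clause in clauses for lit in clause]
    n_vars, n_clauses = len(variables), len(clauses)
    return {
        "n_vars": n_vars,
        "n_clauses": n_clauses,
        "clause_var_ratio": n_clauses / n_vars,
        "pos_literal_fraction": sum(l > 0 for l in literals) / len(literals),
        "binary_clause_fraction": sum(len(c) == 2 for c in clauses) / n_clauses,
        "ternary_clause_fraction": sum(len(c) == 3 for c in clauses) / n_clauses,
    }

# (x1 or not x2) and (x2 or x3 or not x1) and (not x3 or x1)
f = cnf_features([[1, -2], [2, 3, -1], [-3, 1]])
print(f["n_vars"], f["n_clauses"])  # 3 3
```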

SLIDES 36–39

Feature selection

Features can be
  • uninformative: no correlation with running time
  • redundant: highly correlated with other features

Problematic:
  • unnecessary feature computation
  • learned models are harder to interpret
  • regression becomes unstable

SLIDES 40–45

Feature selection

  • Useless features can be pruned automatically
  • A subset of features can be evaluated by cross-validation
  • Exhaustive search of all subsets
    ◮ infeasible for many features
  • Greedy heuristic search
    ◮ forward selection: start with no features, add greedily
    ◮ backward elimination: start with all features, remove greedily
    ◮ sequential replacement: add and replace greedily
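Forward selection can be sketched as below. The `score` callable stands in for any subset-evaluation procedure, e.g. cross-validated prediction error; the toy score used here is made up for illustration:

```python
# Greedy forward selection: start from no features and repeatedly add the
# single feature that most improves the score (an error to be minimized),
# stopping when no candidate helps.

def forward_selection(all_features, score):
    selected = []
    best = score(selected)
    while True:
        candidates = [f for f in all_features if f not in selected]
        if not candidates:
            return selected
        scored = [(score(selected + [f]), f) for f in candidates]
        new_best, new_feat = min(scored)
        if new_best >= best:  # no candidate improves the score
            return selected
        best = new_best
        selected.append(new_feat)

# Toy score: only x1 and x3 reduce the error; every other feature adds a
# small penalty, so selection should stop after picking those two.
useful = {"x1": 5.0, "x3": 3.0}
def score(subset):
    return 10.0 - sum(useful.get(f, -0.1) for f in subset)

print(forward_selection(["x1", "x2", "x3", "x4"], score))  # ['x1', 'x3']
```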

SLIDES 46–49

Basis function expansion

  • Linear regression is limited to linear correlations
  • Problematic: not all useful correlations are linear
  • Generalizing regression gets complicated
  • Quadratically dependent on x ⟺ linearly dependent on x²
  • Solution: add functions of the original features
  • Known as basis function expansion

SLIDES 50–54

Basis function expansion

SATzilla uses quadratic expansion:
  • add all pairwise products of features
  • the number of features can explode: 84² = 7056
  • regression becomes slow
  • can lead to overfitting
  • many new features are useless: feature selection before and after expansion!
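One way to sketch the expansion step: append every pairwise product (including squares) to the feature vector, so that plain linear regression on the expanded vector can capture quadratic dependencies. Counting unordered pairs this adds m(m+1)/2 products; counting ordered pairs as on the slide gives m²:

```python
# Quadratic basis function expansion: augment the feature vector with all
# pairwise products x_i * x_j for i <= j (so squares are included).

def quadratic_expansion(x):
    expanded = list(x)
    for i in range(len(x)):
        for j in range(i, len(x)):
            expanded.append(x[i] * x[j])
    return expanded

print(quadratic_expansion([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```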

SLIDE 55

Terminated runs

  • What if gathering running time data takes too long?
  • Algorithms can run literally for weeks
  • Such runs must be terminated prematurely
  • How to use these runs to build models?

SLIDE 56

Terminated runs

  • Solution 1: discard all such runs
    ◮ not very sensible
    ◮ we want to learn that such instances are hard
  • Solution 2: pretend they stopped at the cutoff limit
    ◮ better: takes hardness into account
    ◮ still systematically underestimates hardness

SLIDE 57

Terminated runs

  • Solution 3: treat cutoff times as lower bounds
    ◮ known as censoring in statistics
    ◮ makes use of all information available
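As data preparation, the three treatments look like this. The run records below are made-up numbers; each run is a pair (observed time, finished flag):

```python
# Three ways to prepare right-censored running time data for model fitting.
CUTOFF = 100.0
runs = [(12.0, True), (100.0, False), (57.0, True), (100.0, False)]

# Solution 1: discard unfinished runs (loses exactly the hard instances).
discarded = [t for t, finished in runs if finished]

# Solution 2: pretend unfinished runs took exactly the cutoff time
# (systematically underestimates hardness).
clipped = [t for t, _ in runs]

# Solution 3: keep a censoring flag so the model can treat the cutoff as
# a lower bound on the true running time, as in censored regression.
censored = [(t, not finished) for t, finished in runs]

print(discarded)    # [12.0, 57.0]
print(censored[1])  # (100.0, True): the true time is at least 100.0
```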


SLIDE 59

Algorithm portfolios

Have been applied to various constraint problems:
  • Boolean satisfiability (SAT)
  • MaxSAT
  • Mixed integer programming
  • Constraint satisfaction (CSP)
  • Combinatorial auctions
  • Answer set programming
  • Zero-one integer programming

SLIDE 60

Conclusions

  • Algorithm portfolios can improve expected performance when no single algorithm is dominant.
  • They are particularly useful for NP-hard constraint problems, where running times exhibit high variance.
  • In addition to predicting running time, empirical hardness models are valuable tools for understanding the hardness of problems.