SLIDE 1

Shape-constrained regression and sum of squares polynomials

Georgina Hall INSEAD, Decision Sciences Joint work with Mihaela Curmei (Berkeley, EECS)

1

SLIDE 2

Shape-constrained regression (1/2)

Data: $(X_i, Y_i)_{i=1,\dots,m}$, where $X_i \in B \subset \mathbb{R}^n$ ($B$ is a box) and $Y_i \in \mathbb{R}$.

Goal: Fit a polynomial $\hat{g}_{m,d}$ of degree $d$ to the data that minimizes $\sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$ and that has certain constraints on its shape.

2
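The least-squares objective above can be sketched in plain Python: a minimal, dependency-free fit of a degree-$d$ polynomial in one variable by solving the normal equations. The data and the choice $d = 2$ are hypothetical, for illustration only; no shape constraint is imposed yet.

```python
def polyfit_ls(xs, ys, d):
    """Least-squares fit of a degree-d polynomial: minimizes sum_i (y_i - g(x_i))^2.
    Solves the normal equations (A^T A) c = A^T y by Gaussian elimination."""
    m = len(xs)
    A = [[x ** k for k in range(d + 1)] for x in xs]   # Vandermonde matrix
    n = d + 1
    M = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    b = [sum(A[i][p] * ys[i] for i in range(m)) for p in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        coeffs[r] = (b[r] - sum(M[r][k] * coeffs[k] for k in range(r + 1, n))) / M[r][r]
    return coeffs  # c_0, ..., c_d

# Noiseless data from a convex function on the box B = [-1, 1]:
xs = [-1 + 2 * i / 20 for i in range(21)]
ys = [3 * x ** 2 + x + 1 for x in xs]
coeffs = polyfit_ls(xs, ys, 2)   # recovers (1, 1, 3) up to rounding
```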

SLIDE 3

Shape-constrained regression (2/2)

3

Convexity over $B$: for a full-dimensional box $B$ and a twice continuously differentiable function $g$,
$g$ is convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$.
Example:
  • Price of a car as a function of age.

Monotonicity over $B$: for a continuously differentiable function $g$,
$g$ is increasing (resp. decreasing) in component $x_j$ $\Leftrightarrow$ $\frac{\partial g(x)}{\partial x_j} \ge 0$ (resp. $\le 0$), $\forall x \in B$.
Example:
  • Demand as a function of price.

Lipschitz with constant $K$: for any function $g$ and a fixed scalar $K > 0$,
$g$ is Lipschitz with constant $K$ $\Leftrightarrow$ $|g(x) - g(y)| \le K \|x - y\|$, $\forall x, y \in B$.
Use as a regularizer: stops $g$ from growing too steeply.

Focus on convex regression here.
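A minimal numeric illustration of the convexity criterion, using a hypothetical quadratic (its Hessian is constant, so a single PSD check settles convexity over any box; for higher degrees the Hessian varies with $x$, which is what makes the problem hard later):

```python
def is_psd_2x2(H):
    """PSD test for a symmetric 2x2 matrix: all principal minors nonnegative,
    i.e. both diagonal entries >= 0 and det(H) >= 0."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] >= 0 and H[1][1] >= 0 and det >= 0

# g(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]]: PSD, so convex.
H_convex = [[2, 1], [1, 2]]
# g(x1, x2) = x1^2 + 3*x1*x2 + x2^2 has Hessian [[2, 3], [3, 2]]: indefinite, not convex.
H_not = [[2, 3], [3, 2]]
```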

SLIDE 4

Convex regression – possible candidate

4

A candidate for our regressor:
$$\hat{g}_{m,d}(x) := \arg\min_{g} \sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$$
s.t. $g$ is a polynomial of degree $d$, and $\nabla^2 g(x) \succeq 0$, $\forall x \in B$.

But... Theorem [Ahmadi, H.]: It is (strongly) NP-hard to test whether a polynomial $p$ of degree $\ge 3$ is convex over a box $B$. (Reduction from the problem of testing whether a matrix whose entries are affine polynomials in $x$ is positive semidefinite for all $x$ in $B$.)
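One way to feel the difficulty behind the hardness result above: sampling the Hessian over the box can refute convexity, but no finite sample can certify it. A small illustration with a hypothetical univariate quartic:

```python
def g(x):
    """g(x) = x^4 - 3x^2, a degree-4 polynomial."""
    return x ** 4 - 3 * x ** 2

def g_second(x):
    """g''(x) = 12x^2 - 6."""
    return 12 * x ** 2 - 6

# Scan the box B = [-1, 1]: g'' < 0 near 0, so g is not convex over B.
# A finite scan can find such a violation, but a scan that finds none
# would prove nothing about the uncountably many unchecked points.
samples = [-1 + 2 * i / 100 for i in range(101)]
violations = [x for x in samples if g_second(x) < 0]

# Convexity also fails the midpoint inequality: g(0) > (g(-1) + g(1)) / 2.
midpoint_violated = g(0.0) > 0.5 * (g(-1.0) + g(1.0))
```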

SLIDE 5

A detour via sum of squares (1/5)

5

  • If we can find a way of imposing that a polynomial be nonnegative, then we are in business!
  • Unfortunately, it is hard to test whether a polynomial $p$ is nonnegative once the degree of $p$ is $\ge 4$.
  • What to do?

$g(x)$ convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$ $\Leftrightarrow$ $y^T \nabla^2 g(x)\, y \ge 0$, $\forall x \in B$, $\forall y \in \mathbb{R}^n$: a polynomial in $x$ and $y$.

SLIDE 6

A detour via sum of squares (2/5)

6

Idea: find a property that implies nonnegativity (①) but that is easy to test (②). That property: being a sum of squares (sos).

Definition: A polynomial $p$ is sos if it can be written as $p(x) = \sum_j q_j(x)^2$ for some polynomials $q_j$.

Does ① hold, i.e., is every sos polynomial nonnegative? Yes! The sets of sos and nonnegative polynomials are even equal sometimes: $n = 1$, $d = 2$, or $(n, d) = (2, 4)$ [Hilbert]. What about ②? Also yes! Let's see why.

SLIDE 7

A detour via sum of squares (3/5)

A polynomial $p(x)$ of degree $2d$ is sos if and only if $\exists Q \succeq 0$ such that $p(x) = z(x)^T Q\, z(x)$, where $z(x) = (1, x_1, \dots, x_n, x_1 x_2, \dots, x_n^d)^T$ is the vector of monomials of degree up to $d$.

Ex:
$$p(x) = x_1^4 - 6x_1^3 x_2 + 2x_1^3 x_3 + 6x_1^2 x_3^2 + 9x_1^2 x_2^2 - 6x_1^2 x_2 x_3 - 14 x_1 x_2 x_3^2 + 4 x_1 x_3^3 + 5x_3^4 - 7x_2^2 x_3^2 + 16 x_2^4$$
$$= \left(x_1^2 - 3x_1 x_2 + x_1 x_3 + 2x_3^2\right)^2 + \left(x_1 x_3 - x_2 x_3\right)^2 + \left(4x_2^2 - x_3^2\right)^2$$
$$= \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1 x_3 \\ x_2 x_3 \\ x_3^2 \end{pmatrix}^{T} \begin{pmatrix} 1 & -3 & 0 & 1 & 0 & 2 \\ -3 & 9 & 0 & -3 & 0 & -6 \\ 0 & 0 & 16 & 0 & 0 & -4 \\ 1 & -3 & 0 & 2 & -1 & 2 \\ 0 & 0 & 0 & -1 & 1 & 0 \\ 2 & -6 & -4 & 2 & 0 & 5 \end{pmatrix} \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1 x_3 \\ x_2 x_3 \\ x_3^2 \end{pmatrix} = z(x)^T Q\, z(x)$$

7
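The identity above can be sanity-checked numerically by comparing the expanded polynomial with its sos decomposition at random points:

```python
import random

def p(x1, x2, x3):
    """The degree-4 polynomial from the slide, in expanded form."""
    return (x1**4 - 6*x1**3*x2 + 2*x1**3*x3 + 6*x1**2*x3**2 + 9*x1**2*x2**2
            - 6*x1**2*x2*x3 - 14*x1*x2*x3**2 + 4*x1*x3**3 + 5*x3**4
            - 7*x2**2*x3**2 + 16*x2**4)

def p_sos(x1, x2, x3):
    """Its sum-of-squares decomposition."""
    return ((x1**2 - 3*x1*x2 + x1*x3 + 2*x3**2)**2
            + (x1*x3 - x2*x3)**2
            + (4*x2**2 - x3**2)**2)

random.seed(0)
pts = [tuple(random.uniform(-2, 2) for _ in range(3)) for _ in range(1000)]
max_gap = max(abs(p(*pt) - p_sos(*pt)) for pt in pts)  # ~0 (rounding only)
```

Being a sum of squares, $p$ is in particular nonnegative at every sampled point.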

SLIDE 8

A detour via sum of squares (4/5)

  • Testing if a polynomial is sos is a semidefinite program (SDP).
  • In fact, even optimizing over the set of sos polynomials (of fixed degree) is an SDP.

8

$$\min_{Q} \; 0 \quad \text{s.t.} \quad p(x) = z(x)^T Q\, z(x) \;\; \forall x, \quad Q \succeq 0$$

The constraint $p(x) = z(x)^T Q\, z(x)$ for all $x$ amounts to linear equations involving the coefficients of $p$ and the entries of $Q$.

Ex:
$$\min_{c_1, c_2} \; c_1 + c_2 \quad \text{s.t.} \quad c_1 - 3c_2 = 4, \quad c_1 x_1^2 - 2c_2 x_1 x_2 + 5x_2^4 \text{ sos}$$
is equivalent to the SDP
$$\min_{c_1, c_2, Q} \; c_1 + c_2 \quad \text{s.t.} \quad c_1 - 3c_2 = 4, \quad c_1 x_1^2 - 2c_2 x_1 x_2 + 5x_2^4 = z(x)^T Q\, z(x), \quad Q \succeq 0.$$
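The link between $Q \succeq 0$ and an explicit sos decomposition can be illustrated in a few lines: factor $Q = L L^T$ (Cholesky) and read off the squares $p(x) = \|L^T z(x)\|^2$. The 2×2 example below is hypothetical, not from the slides:

```python
import math, random

# A (strictly) positive definite Gram matrix and monomial vector z(x) = (x1, x2),
# so p(x) = z^T Q z = 2*x1^2 + 2*x1*x2 + 2*x2^2.
Q = [[2.0, 1.0], [1.0, 2.0]]

# Manual 2x2 Cholesky factorization Q = L L^T, L lower triangular.
l11 = math.sqrt(Q[0][0])
l21 = Q[1][0] / l11
l22 = math.sqrt(Q[1][1] - l21 ** 2)

def p(x1, x2):
    z = (x1, x2)
    return sum(Q[i][j] * z[i] * z[j] for i in range(2) for j in range(2))

def p_sos(x1, x2):
    # (L^T z)_1 = l11*x1 + l21*x2 and (L^T z)_2 = l22*x2, so p is the sum
    # of the squares of these two linear forms.
    return (l11 * x1 + l21 * x2) ** 2 + (l22 * x2) ** 2

random.seed(1)
pts = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(200)]
max_gap = max(abs(p(a, b) - p_sos(a, b)) for a, b in pts)
```

In an actual SDP solver the same extraction is done on the optimal $Q$ (with an LDLᵀ or eigenvalue factorization when $Q$ is only positive semidefinite).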

SLIDE 9

A detour via sum of squares (5/5)

  • Slight subtlety here: how to impose nonnegativity over a set?

9

$g(x)$ convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$ $\Leftrightarrow$ $y^T \nabla^2 g(x)\, y \ge 0$, $\forall x \in B$, $\forall y \in \mathbb{R}^n$.

Theorem [Putinar '93]: For a box $B = \{(x_1, \dots, x_n) \mid l_1 \le x_1 \le u_1, \dots, l_n \le x_n \le u_n\}$, we write instead:
$$y^T \nabla^2 g(x)\, y = \sigma_0(x, y) + \sigma_1(x, y)(u_1 - x_1)(x_1 - l_1) + \dots + \sigma_n(x, y)(u_n - x_n)(x_n - l_n)$$
where $\sigma_0(x, y), \sigma_1(x, y), \dots, \sigma_n(x, y)$ are sos polynomials in $x$ and $y$.

SLIDE 10

Convex regression – a new candidate

10

A new candidate for the regressor:
$$\bar{g}_{m,d,r}(x) := \arg\min_{g, \sigma_0, \dots, \sigma_n} \sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$$
s.t. $g$ is a polynomial of degree $d$,
$$y^T \nabla^2 g(x)\, y = \sigma_0(x, y) + \dots + \sigma_n(x, y)(u_n - x_n)(x_n - l_n),$$
where $\sigma_0(x, y), \sigma_1(x, y), \dots, \sigma_n(x, y)$ are sos of degree $r$ in $x$ (and 2 in $y$).

  • When $r$ is fixed, this is a semidefinite program to solve.
  • As $r \to \infty$, we recover $\hat{g}_{m,d}$.

SLIDE 11

Comparison with existing methods

11

Our method:
  • Semidefinite program to obtain estimator
  • Number of datapoints does not impact size of semidefinite program
  • Size of the semidefinite program scales polynomially in number of features
  • Obtaining a prediction: evaluation of our polynomial estimator
  • Smooth estimator
  • Can be combined with monotonicity constraints and Lipschitz constraints

Existing method [Lim & Glynn, Seijo & Sen]:
  • Quadratic program to obtain estimator
  • Number of variables (resp. constraints) scales linearly (resp. quadratically) with number of datapoints
  • Obtaining a prediction: requires solving a linear program
  • Piecewise affine estimator (can be smoothed, see [Mazumder et al.])
  • Can be combined with monotonicity constraints (see [Lim & Glynn]) and Lipschitz constraints (see [Mazumder et al.])

SLIDE 12

Consistency of $\bar{g}_{m,d,r}$ (1/4)

  • The estimator of [Lim & Glynn, Seijo & Sen] is shown to be consistent. What about ours?

12

Theorem [Curmei, H.]: The regressor $\bar{g}_{m,d,r}$ is a consistent estimator of $f$ over any compact $C \subset B$, i.e.,
$$\sup_{x \in C} \left| \bar{g}_{m,d,r}(x) - f(x) \right| \to 0 \text{ a.s., when } d, m, r \to \infty.$$

Assumptions on the data:
  • For $X_i$: the $X_i$ are iid, with support $B$, and $E\|X_i\|^2 < \infty$.
  • For $Y_i$: $Y_i = f(X_i) + \epsilon_i$ for $i = 1, \dots, m$, with $E[\epsilon_i \mid X_i] = 0$ a.s. and $E[\epsilon_i^2] < \infty$.
  • For $f$: $f$ is twice continuously differentiable and convex over $B$.

SLIDE 13

Consistency of $\hat{g}_{m,d}$ (2/4)

Proof ideas: inspired by [Lim and Glynn, OR '12].

  • 1. Write
$$\left| f(x) - \bar{g}_{m,d,r}(x) \right| \le \left| f(x) - \hat{g}_{m,d}(x) \right| + \left| \hat{g}_{m,d}(x) - \bar{g}_{m,d,r}(x) \right|.$$
Can show $\sup_{x \in C} \left| \hat{g}_{m,d}(x) - \bar{g}_{m,d,r}(x) \right| \to 0$ when $r \to \infty$.

  • 2. Introduce a polynomial approximation of $f$: for any $\epsilon > 0$, $\exists d$ and a convex polynomial $g_d$ of degree $d$ such that $\sup_{x \in C} \left| f(x) - g_d(x) \right| < \epsilon$.

13

SLIDE 14

Consistency of $\hat{g}_{m,d}$ (3/4)

  • 3. For $x \in C$ and "$X_i$ close to $x$":
$$\left| f(x) - \hat{g}_{m,d}(x) \right| \le \left| f(x) - g_d(x) \right| + \left| g_d(x) - g_d(X_i) \right| + \left| g_d(X_i) - \hat{g}_{m,d}(X_i) \right| + \left| \hat{g}_{m,d}(X_i) - \hat{g}_{m,d}(x) \right|$$
  • First term: upper bound with $\epsilon$.
  • Second term: show that $g_d$ is Lipschitz (use convexity of $g_d$ over $B$).
  • Fourth term: show that $\hat{g}_{m,d}$ is Lipschitz (uniformly in $m$): bound $|\hat{g}_{m,d}|$ over $C$ uniformly in $m$ and use convexity.
  • Third term: upper bound it (algebra) by $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2$.

Remains to show that $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \to 0$ a.s. when $m \to \infty$.

14

SLIDE 15

Consistency of $\hat{g}_{m,d}$ (4/4)

  • 3. Show that $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \to 0$ a.s. when $m \to \infty$.
  • Use the fact that $\hat{g}_{m,d}$ is a minimizer of $\sum_i \left( Y_i - g(X_i) \right)^2$ to obtain
$$\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \le \frac{2}{m} \sum_i \left( Y_i - g_d(X_i) \right) \cdot \left( \hat{g}_{m,d}(X_i) - g_d(X_i) \right).$$
  • Can't use the SLLN directly because $\hat{g}_{m,d}$ is a polynomial that depends on the $X_i$ and $Y_i$.
  • Idea: approximate $\hat{g}_{m,d}$ by a deterministic function which is bounded over $C$:
  • Show that $\hat{g}_{m,d}$ belongs (for large enough $m$) to a compact set whose elements are bounded over $C$.
  • Construct an $\epsilon$-net of this set.
  • Replace $\hat{g}_{m,d}$ by an element of this set which is $\epsilon$-close and bounded over $C$.
  • Use the SLLN now with $Y_i - g_d(X_i) \approx \epsilon_i$.

15

SLIDE 16

Experiments: synthetic data (1/2)

16

$$Y_i = f(X_i) + \epsilon \cdot \sigma(\bar{Y}) \cdot \nu_i$$

  • $f$ is a convex and monotone function
  • $\nu_i \sim N(0, 1)$
  • $\sigma(\bar{Y})$ is the (empirical) standard deviation of $(f(X_1), \dots, f(X_m))$
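The generation scheme above can be sketched as follows. The specific $f$ below is a hypothetical choice (the slide does not fix one), picked to be convex and increasing on $[0, 1]$:

```python
import random, statistics

def make_synthetic(m, eps, seed=0):
    """Generate (X_i, Y_i) with Y_i = f(X_i) + eps * sigma * nu_i, where sigma
    is the empirical std of (f(X_1), ..., f(X_m)) and nu_i ~ N(0, 1)."""
    rng = random.Random(seed)
    f = lambda x: x ** 2 + x          # convex (f'' = 2 > 0) and increasing on [0, 1]
    X = [rng.uniform(0.0, 1.0) for _ in range(m)]
    fX = [f(x) for x in X]
    sigma = statistics.pstdev(fX)     # empirical standard deviation of the f values
    Y = [fx + eps * sigma * rng.gauss(0.0, 1.0) for fx in fX]
    return X, Y, fX

X, Y, fX = make_synthetic(200, eps=0.1)
X0, Y0, fX0 = make_synthetic(200, eps=0.0)   # eps = 0 gives noiseless data
```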

SLIDE 17

Experiments: synthetic data (2/2)

17

SLIDE 18

Experiments: production functions (1/3)

  • In economics, production output (Out) is a function of labor (L), capital (K) and intermediate goods (I).
  • Out is assumed to be concave in L, K, I (diminishing returns).
  • Out is assumed to be monotone in L, K, I.
  • K, L, I and Out are available in the KLEMS dataset for 65 different industries.

18

SLIDE 19

Experiments: production functions (2/3)

  • How are production functions fitted to data in economics generally?

19

Cobb-Douglas functions: $\mathrm{Out} = a \cdot K^b L^c I^d$ with $a > 0$, $b, c, d > 0$ and $b + c + d \le 1$ $\Rightarrow$ concave + monotone. Fit in log-space: linear regression.
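The log-space fit can be sketched in plain Python: take logs of both sides, $\log \mathrm{Out} = \log a + b \log K + c \log L + d \log I$, then run ordinary least squares. The synthetic data below is hypothetical; with noiseless data the exponents are recovered exactly (up to rounding).

```python
import math, random

def ols(rows, y):
    """Ordinary least squares via normal equations + Gaussian elimination."""
    n = len(rows[0])
    M = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    b = [sum(rows[i][p] * y[i] for i in range(len(rows))) for p in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= fac * M[col][c]
            b[r] -= fac * b[col]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (b[r] - sum(M[r][k] * beta[k] for k in range(r + 1, n))) / M[r][r]
    return beta

# Hypothetical noiseless data from Out = a * K^b * L^c * I^d:
a_true, b_true, c_true, d_true = 2.0, 0.3, 0.4, 0.2   # b + c + d <= 1: concave + monotone
rng = random.Random(3)
rows, ys = [], []
for _ in range(100):
    K, L, I = (rng.uniform(1, 10) for _ in range(3))
    out = a_true * K ** b_true * L ** c_true * I ** d_true
    rows.append([1.0, math.log(K), math.log(L), math.log(I)])  # log-space design row
    ys.append(math.log(out))
log_a, b_hat, c_hat, d_hat = ols(rows, ys)
```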

SLIDE 20

Experiments: production functions (3/3)

20

Outperforms Cobb-Douglas on 50/65 industries.

SLIDE 21

Main messages

  • Shape-constrained regression: convexity, monotonicity, Lipschitz
  • Discussed convex regression (results also hold for other shape constraints)
  • Proposed estimator: polynomial, obtained via SDP, consistent
  • Numerical experiments: synthetic + production functions in economics

21

SLIDE 22

Thank you for listening

Questions? Want to know more? https://sites.google.com/view/georgina-hall

22