SLIDE 1

Shape-constrained regression and sum of squares polynomials

Georgina Hall INSEAD, Decision Sciences Joint work with Mihaela Curmei (Berkeley, EECS)

1

SLIDE 2

Shape-constrained regression (1/2)

Data: $(X_i, Y_i)_{i=1,\dots,m}$, where $X_i \in B \subset \mathbb{R}^n$ ($B$ is a box) and $Y_i \in \mathbb{R}$.

Goal: Fit a polynomial $\hat{g}_{m,d}$ of degree $d$ to the data that minimizes $\sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$ and that has certain constraints on its shape.

2
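The least-squares objective above can be sketched in plain Python: a minimal, dependency-free fit of a degree-$d$ polynomial in one variable by solving the normal equations. The data and the choice $d = 2$ are hypothetical, for illustration only; no shape constraint is imposed yet.

```python
def polyfit_ls(xs, ys, d):
    """Least-squares fit of a degree-d polynomial: minimizes sum_i (y_i - g(x_i))^2.
    Solves the normal equations (A^T A) c = A^T y by Gaussian elimination."""
    m = len(xs)
    A = [[x ** k for k in range(d + 1)] for x in xs]   # Vandermonde matrix
    n = d + 1
    M = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)] for p in range(n)]
    b = [sum(A[i][p] * ys[i] for i in range(m)) for p in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        coeffs[r] = (b[r] - sum(M[r][k] * coeffs[k] for k in range(r + 1, n))) / M[r][r]
    return coeffs  # c_0, ..., c_d

# Noiseless data from a convex function on the box B = [-1, 1]:
xs = [-1 + 2 * i / 20 for i in range(21)]
ys = [3 * x ** 2 + x + 1 for x in xs]
coeffs = polyfit_ls(xs, ys, 2)   # recovers (1, 1, 3) up to rounding
```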

SLIDE 3

Shape-constrained regression (2/2)

3

Convexity over $B$: for a full-dimensional box $B$ and a twice continuously differentiable function $g$,
$g$ is convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$.
Example:
  • Price of a car as a function of age.

Monotonicity over $B$: for a continuously differentiable function $g$,
$g$ is increasing (resp. decreasing) in component $x_j$ $\Leftrightarrow$ $\frac{\partial g(x)}{\partial x_j} \ge 0$ (resp. $\le 0$), $\forall x \in B$.
Example:
  • Demand as a function of price.

Lipschitz with constant $K$: for any function $g$ and a fixed scalar $K > 0$,
$g$ is Lipschitz with constant $K$ $\Leftrightarrow$ $|g(x) - g(y)| \le K \|x - y\|$, $\forall x, y \in B$.
Use as a regularizer: stops $g$ from growing too steeply.

Focus on convex regression here.
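A minimal numeric illustration of the convexity criterion, using a hypothetical quadratic (its Hessian is constant, so a single PSD check settles convexity over any box; for higher degrees the Hessian varies with $x$, which is what makes the problem hard later):

```python
def is_psd_2x2(H):
    """PSD test for a symmetric 2x2 matrix: all principal minors nonnegative,
    i.e. both diagonal entries >= 0 and det(H) >= 0."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] >= 0 and H[1][1] >= 0 and det >= 0

# g(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]]: PSD, so convex.
H_convex = [[2, 1], [1, 2]]
# g(x1, x2) = x1^2 + 3*x1*x2 + x2^2 has Hessian [[2, 3], [3, 2]]: indefinite, not convex.
H_not = [[2, 3], [3, 2]]
```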

SLIDE 4

Convex regression – possible candidate

4

A candidate for our regressor:
$$\hat{g}_{m,d}(x) := \arg\min_{g} \sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$$
s.t. $g$ is a polynomial of degree $d$, and $\nabla^2 g(x) \succeq 0$, $\forall x \in B$.

But... Theorem [Ahmadi, H.]: It is (strongly) NP-hard to test whether a polynomial $p$ of degree $\ge 3$ is convex over a box $B$. (Reduction from the problem of testing whether a matrix whose entries are affine polynomials in $x$ is positive semidefinite for all $x$ in $B$.)
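One way to feel the difficulty behind the hardness result above: sampling the Hessian over the box can refute convexity, but no finite sample can certify it. A small illustration with a hypothetical univariate quartic:

```python
def g(x):
    """g(x) = x^4 - 3x^2, a degree-4 polynomial."""
    return x ** 4 - 3 * x ** 2

def g_second(x):
    """g''(x) = 12x^2 - 6."""
    return 12 * x ** 2 - 6

# Scan the box B = [-1, 1]: g'' < 0 near 0, so g is not convex over B.
# A finite scan can find such a violation, but a scan that finds none
# would prove nothing about the uncountably many unchecked points.
samples = [-1 + 2 * i / 100 for i in range(101)]
violations = [x for x in samples if g_second(x) < 0]

# Convexity also fails the midpoint inequality: g(0) > (g(-1) + g(1)) / 2.
midpoint_violated = g(0.0) > 0.5 * (g(-1.0) + g(1.0))
```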

SLIDE 5

A detour via sum of squares (1/5)

5

  • If we can find a way of imposing that a polynomial be nonnegative, then we are in business!
  • Unfortunately, it is hard to test whether a polynomial $p$ is nonnegative once the degree of $p$ is $\ge 4$.
  • What to do?

$g(x)$ convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$ $\Leftrightarrow$ $y^T \nabla^2 g(x)\, y \ge 0$, $\forall x \in B$, $\forall y \in \mathbb{R}^n$: a polynomial in $x$ and $y$.

SLIDE 6

A detour via sum of squares (2/5)

6

Idea: find a property that implies nonnegativity (①) but that is easy to test (②). That property: being a sum of squares (sos).

Definition: A polynomial $p$ is sos if it can be written as $p(x) = \sum_j q_j(x)^2$ for some polynomials $q_j$.

Does ① hold, i.e., is every sos polynomial nonnegative? Yes! The sets of sos and nonnegative polynomials are even equal sometimes: $n = 1$, $d = 2$, or $(n, d) = (2, 4)$ [Hilbert]. What about ②? Also yes! Let's see why.

SLIDE 7

A detour via sum of squares (3/5)

A polynomial $p(x)$ of degree $2d$ is sos if and only if $\exists Q \succeq 0$ such that $p(x) = z(x)^T Q\, z(x)$, where $z(x) = (1, x_1, \dots, x_n, x_1 x_2, \dots, x_n^d)^T$ is the vector of monomials of degree up to $d$.

Ex:
$$p(x) = x_1^4 - 6x_1^3 x_2 + 2x_1^3 x_3 + 6x_1^2 x_3^2 + 9x_1^2 x_2^2 - 6x_1^2 x_2 x_3 - 14 x_1 x_2 x_3^2 + 4 x_1 x_3^3 + 5x_3^4 - 7x_2^2 x_3^2 + 16 x_2^4$$
$$= \left(x_1^2 - 3x_1 x_2 + x_1 x_3 + 2x_3^2\right)^2 + \left(x_1 x_3 - x_2 x_3\right)^2 + \left(4x_2^2 - x_3^2\right)^2$$
$$= \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1 x_3 \\ x_2 x_3 \\ x_3^2 \end{pmatrix}^{T} \begin{pmatrix} 1 & -3 & 0 & 1 & 0 & 2 \\ -3 & 9 & 0 & -3 & 0 & -6 \\ 0 & 0 & 16 & 0 & 0 & -4 \\ 1 & -3 & 0 & 2 & -1 & 2 \\ 0 & 0 & 0 & -1 & 1 & 0 \\ 2 & -6 & -4 & 2 & 0 & 5 \end{pmatrix} \begin{pmatrix} x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1 x_3 \\ x_2 x_3 \\ x_3^2 \end{pmatrix} = z(x)^T Q\, z(x)$$

7
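The identity above can be sanity-checked numerically by comparing the expanded polynomial with its sos decomposition at random points:

```python
import random

def p(x1, x2, x3):
    """The degree-4 polynomial from the slide, in expanded form."""
    return (x1**4 - 6*x1**3*x2 + 2*x1**3*x3 + 6*x1**2*x3**2 + 9*x1**2*x2**2
            - 6*x1**2*x2*x3 - 14*x1*x2*x3**2 + 4*x1*x3**3 + 5*x3**4
            - 7*x2**2*x3**2 + 16*x2**4)

def p_sos(x1, x2, x3):
    """Its sum-of-squares decomposition."""
    return ((x1**2 - 3*x1*x2 + x1*x3 + 2*x3**2)**2
            + (x1*x3 - x2*x3)**2
            + (4*x2**2 - x3**2)**2)

random.seed(0)
pts = [tuple(random.uniform(-2, 2) for _ in range(3)) for _ in range(1000)]
max_gap = max(abs(p(*pt) - p_sos(*pt)) for pt in pts)  # ~0 (rounding only)
```

Being a sum of squares, $p$ is in particular nonnegative at every sampled point.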

SLIDE 8

A detour via sum of squares (4/5)

  • Testing if a polynomial is sos is a semidefinite program (SDP).
  • In fact, even optimizing over the set of sos polynomials (of fixed degree) is an SDP.

8

$$\min_{Q} \; 0 \quad \text{s.t.} \quad p(x) = z(x)^T Q\, z(x) \;\; \forall x, \quad Q \succeq 0$$

The constraint $p(x) = z(x)^T Q\, z(x)$ for all $x$ amounts to linear equations involving the coefficients of $p$ and the entries of $Q$.

Ex:
$$\min_{c_1, c_2} \; c_1 + c_2 \quad \text{s.t.} \quad c_1 - 3c_2 = 4, \quad c_1 x_1^2 - 2c_2 x_1 x_2 + 5x_2^4 \text{ sos}$$
is equivalent to the SDP
$$\min_{c_1, c_2, Q} \; c_1 + c_2 \quad \text{s.t.} \quad c_1 - 3c_2 = 4, \quad c_1 x_1^2 - 2c_2 x_1 x_2 + 5x_2^4 = z(x)^T Q\, z(x), \quad Q \succeq 0.$$
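The link between $Q \succeq 0$ and an explicit sos decomposition can be illustrated in a few lines: factor $Q = L L^T$ (Cholesky) and read off the squares $p(x) = \|L^T z(x)\|^2$. The 2×2 example below is hypothetical, not from the slides:

```python
import math, random

# A (strictly) positive definite Gram matrix and monomial vector z(x) = (x1, x2),
# so p(x) = z^T Q z = 2*x1^2 + 2*x1*x2 + 2*x2^2.
Q = [[2.0, 1.0], [1.0, 2.0]]

# Manual 2x2 Cholesky factorization Q = L L^T, L lower triangular.
l11 = math.sqrt(Q[0][0])
l21 = Q[1][0] / l11
l22 = math.sqrt(Q[1][1] - l21 ** 2)

def p(x1, x2):
    z = (x1, x2)
    return sum(Q[i][j] * z[i] * z[j] for i in range(2) for j in range(2))

def p_sos(x1, x2):
    # (L^T z)_1 = l11*x1 + l21*x2 and (L^T z)_2 = l22*x2, so p is the sum
    # of the squares of these two linear forms.
    return (l11 * x1 + l21 * x2) ** 2 + (l22 * x2) ** 2

random.seed(1)
pts = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(200)]
max_gap = max(abs(p(a, b) - p_sos(a, b)) for a, b in pts)
```

In an actual SDP solver the same extraction is done on the optimal $Q$ (with an LDLᵀ or eigenvalue factorization when $Q$ is only positive semidefinite).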

SLIDE 9

A detour via sum of squares (5/5)

  • Slight subtlety here: how to impose nonnegativity over a set?

9

$g(x)$ convex over $B$ $\Leftrightarrow$ $\nabla^2 g(x) \succeq 0$, $\forall x \in B$ $\Leftrightarrow$ $y^T \nabla^2 g(x)\, y \ge 0$, $\forall x \in B$, $\forall y \in \mathbb{R}^n$.

Theorem [Putinar '93]: For a box $B = \{(x_1, \dots, x_n) \mid l_1 \le x_1 \le u_1, \dots, l_n \le x_n \le u_n\}$, we write instead:
$$y^T \nabla^2 g(x)\, y = \sigma_0(x, y) + \sigma_1(x, y)(u_1 - x_1)(x_1 - l_1) + \dots + \sigma_n(x, y)(u_n - x_n)(x_n - l_n)$$
where $\sigma_0(x, y), \sigma_1(x, y), \dots, \sigma_n(x, y)$ are sos polynomials in $x$ and $y$.

SLIDE 10

Convex regression – a new candidate

10

A new candidate for the regressor:
$$\bar{g}_{m,d,r}(x) := \arg\min_{g, \sigma_0, \dots, \sigma_n} \sum_{i=1}^{m} \left(Y_i - g(X_i)\right)^2$$
s.t. $g$ is a polynomial of degree $d$,
$$y^T \nabla^2 g(x)\, y = \sigma_0(x, y) + \dots + \sigma_n(x, y)(u_n - x_n)(x_n - l_n),$$
where $\sigma_0(x, y), \sigma_1(x, y), \dots, \sigma_n(x, y)$ are sos of degree $r$ in $x$ (and 2 in $y$).

  • When $r$ is fixed, this is a semidefinite program to solve.
  • As $r \to \infty$, we recover $\hat{g}_{m,d}$.

SLIDE 11

Comparison with existing methods

11

Our method:
  • Semidefinite program to obtain estimator
  • Number of datapoints does not impact size of semidefinite program
  • Size of the semidefinite program scales polynomially in number of features
  • Obtaining a prediction: evaluation of our polynomial estimator
  • Smooth estimator
  • Can be combined with monotonicity constraints and Lipschitz constraints

Existing method [Lim & Glynn, Seijo & Sen]:
  • Quadratic program to obtain estimator
  • Number of variables (resp. constraints) scales linearly (resp. quadratically) with number of datapoints
  • Obtaining a prediction: requires solving a linear program
  • Piecewise affine estimator (can be smoothed, see [Mazumder et al.])
  • Can be combined with monotonicity constraints (see [Lim & Glynn]) and Lipschitz constraints (see [Mazumder et al.])

SLIDE 12

Consistency of $\bar{g}_{m,d,r}$ (1/4)

  • The estimator of [Lim & Glynn, Seijo & Sen] is shown to be consistent. What about ours?

12

Theorem [Curmei, H.]: The regressor $\bar{g}_{m,d,r}$ is a consistent estimator of $f$ over any compact $C \subset B$, i.e.,
$$\sup_{x \in C} \left| \bar{g}_{m,d,r}(x) - f(x) \right| \to 0 \text{ a.s., when } d, m, r \to \infty.$$

Assumptions on the data:
  • For $X_i$: the $X_i$ are iid, with support $B$, and $E\|X_i\|^2 < \infty$.
  • For $Y_i$: $Y_i = f(X_i) + \epsilon_i$ for $i = 1, \dots, m$, with $E[\epsilon_i \mid X_i] = 0$ a.s. and $E[\epsilon_i^2] < \infty$.
  • For $f$: $f$ is twice continuously differentiable and convex over $B$.

SLIDE 13

Consistency of $\hat{g}_{m,d}$ (2/4)

Proof ideas: inspired by [Lim and Glynn, OR '12].

  • 1. Write
$$\left| f(x) - \bar{g}_{m,d,r}(x) \right| \le \left| f(x) - \hat{g}_{m,d}(x) \right| + \left| \hat{g}_{m,d}(x) - \bar{g}_{m,d,r}(x) \right|.$$
Can show $\sup_{x \in C} \left| \hat{g}_{m,d}(x) - \bar{g}_{m,d,r}(x) \right| \to 0$ when $r \to \infty$.

  • 2. Introduce a polynomial approximation of $f$: for any $\epsilon > 0$, $\exists d$ and a convex polynomial $g_d$ of degree $d$ such that $\sup_{x \in C} \left| f(x) - g_d(x) \right| < \epsilon$.

13

SLIDE 14

Consistency of $\hat{g}_{m,d}$ (3/4)

  • 3. For $x \in C$ and "$X_i$ close to $x$":
$$\left| f(x) - \hat{g}_{m,d}(x) \right| \le \left| f(x) - g_d(x) \right| + \left| g_d(x) - g_d(X_i) \right| + \left| g_d(X_i) - \hat{g}_{m,d}(X_i) \right| + \left| \hat{g}_{m,d}(X_i) - \hat{g}_{m,d}(x) \right|$$
  • First term: upper bound with $\epsilon$.
  • Second term: show that $g_d$ is Lipschitz (use convexity of $g_d$ over $B$).
  • Fourth term: show that $\hat{g}_{m,d}$ is Lipschitz (uniformly in $m$): bound $|\hat{g}_{m,d}|$ over $C$ uniformly in $m$ and use convexity.
  • Third term: upper bound it (algebra) by $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2$.

Remains to show that $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \to 0$ a.s. when $m \to \infty$.

14

SLIDE 15

Consistency of $\hat{g}_{m,d}$ (4/4)

  • 3. Show that $\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \to 0$ a.s. when $m \to \infty$.
  • Use the fact that $\hat{g}_{m,d}$ is a minimizer of $\sum_i \left( Y_i - g(X_i) \right)^2$ to obtain
$$\frac{1}{m} \sum_{i=1}^{m} \left( g_d(X_i) - \hat{g}_{m,d}(X_i) \right)^2 \le \frac{2}{m} \sum_i \left( Y_i - g_d(X_i) \right) \cdot \left( \hat{g}_{m,d}(X_i) - g_d(X_i) \right).$$
  • Can't use the SLLN directly because $\hat{g}_{m,d}$ is a polynomial that depends on the $X_i$ and $Y_i$.
  • Idea: approximate $\hat{g}_{m,d}$ by a deterministic function which is bounded over $C$:
  • Show that $\hat{g}_{m,d}$ belongs (for large enough $m$) to a compact set whose elements are bounded over $C$.
  • Construct an $\epsilon$-net of this set.
  • Replace $\hat{g}_{m,d}$ by an element of this set which is $\epsilon$-close and bounded over $C$.
  • Use the SLLN now with $Y_i - g_d(X_i) \approx \epsilon_i$.

15

SLIDE 16

Experiments: synthetic data (1/2)

16

$$Y_i = f(X_i) + \epsilon \cdot \sigma(\bar{Y}) \cdot \nu_i$$

  • $f$ is a convex and monotone function
  • $\nu_i \sim N(0, 1)$
  • $\sigma(\bar{Y})$ is the (empirical) standard deviation of $(f(X_1), \dots, f(X_m))$
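The generation scheme above can be sketched as follows. The specific $f$ below is a hypothetical choice (the slide does not fix one), picked to be convex and increasing on $[0, 1]$:

```python
import random, statistics

def make_synthetic(m, eps, seed=0):
    """Generate (X_i, Y_i) with Y_i = f(X_i) + eps * sigma * nu_i, where sigma
    is the empirical std of (f(X_1), ..., f(X_m)) and nu_i ~ N(0, 1)."""
    rng = random.Random(seed)
    f = lambda x: x ** 2 + x          # convex (f'' = 2 > 0) and increasing on [0, 1]
    X = [rng.uniform(0.0, 1.0) for _ in range(m)]
    fX = [f(x) for x in X]
    sigma = statistics.pstdev(fX)     # empirical standard deviation of the f values
    Y = [fx + eps * sigma * rng.gauss(0.0, 1.0) for fx in fX]
    return X, Y, fX

X, Y, fX = make_synthetic(200, eps=0.1)
X0, Y0, fX0 = make_synthetic(200, eps=0.0)   # eps = 0 gives noiseless data
```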

SLIDE 17

Experiments: synthetic data (2/2)

17

SLIDE 18

Experiments: production functions (1/3)

  • In economics, production output (Out) is a function of labor (L), capital (K) and intermediate goods (I).
  • Out is assumed to be concave in L, K, I (diminishing returns).
  • Out is assumed to be monotone in L, K, I.
  • K, L, I and Out are available in the KLEMS dataset for 65 different industries.

18

SLIDE 19

Experiments: production functions (2/3)

  • How are production functions fitted to data in economics generally?

19

Cobb-Douglas functions: $\mathrm{Out} = a \cdot K^b L^c I^d$ with $a > 0$, $b, c, d > 0$ and $b + c + d \le 1$ $\Rightarrow$ concave + monotone. Fit in log-space: linear regression.
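The log-space fit can be sketched in plain Python: take logs of both sides, $\log \mathrm{Out} = \log a + b \log K + c \log L + d \log I$, then run ordinary least squares. The synthetic data below is hypothetical; with noiseless data the exponents are recovered exactly (up to rounding).

```python
import math, random

def ols(rows, y):
    """Ordinary least squares via normal equations + Gaussian elimination."""
    n = len(rows[0])
    M = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    b = [sum(rows[i][p] * y[i] for i in range(len(rows))) for p in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= fac * M[col][c]
            b[r] -= fac * b[col]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (b[r] - sum(M[r][k] * beta[k] for k in range(r + 1, n))) / M[r][r]
    return beta

# Hypothetical noiseless data from Out = a * K^b * L^c * I^d:
a_true, b_true, c_true, d_true = 2.0, 0.3, 0.4, 0.2   # b + c + d <= 1: concave + monotone
rng = random.Random(3)
rows, ys = [], []
for _ in range(100):
    K, L, I = (rng.uniform(1, 10) for _ in range(3))
    out = a_true * K ** b_true * L ** c_true * I ** d_true
    rows.append([1.0, math.log(K), math.log(L), math.log(I)])  # log-space design row
    ys.append(math.log(out))
log_a, b_hat, c_hat, d_hat = ols(rows, ys)
```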

SLIDE 20

Experiments: production functions (3/3)

20

Outperforms Cobb-Douglas on 50/65 industries.

SLIDE 21

Main messages

  • Shape-constrained regression: convexity, monotonicity, Lipschitz
  • Discussed convex regression (results also hold for other shape constraints)
  • Proposed estimator: polynomial, obtained via SDP, consistent
  • Numerical experiments: synthetic + production functions in economics

21

SLIDE 22

Thank you for listening

Questions? Want to know more? https://sites.google.com/view/georgina-hall

22