SLIDE 1

Inverse Optimization with Online Data and Multiobjectives: Models, Insights and Algorithms

Bo Zeng, Department of Industrial Engineering, University of Pittsburgh

Joint work with Chaosheng Dong, currently at Amazon

November 13, 2020, Texas A&M Institute of Data Science

SLIDES 2-6

Introduction

◮ We (humans, enterprises, and organizations) are decision makers, making various decisions every moment, everywhere.

◮ Decision makers are driven by their interests, desires, preferences, or utility in general, subject to different restrictions.

◮ Decisions, represented in the form of choices, behaviors, operations, etc., are generally observable ⇒ stored as data.

◮ For a service provider/manufacturer/supplier, developing a sound understanding of decision makers (i.e., their interests, desires, preferences, and/or restrictions) is critical and fundamental ⇒ how do we convert data into information or knowledge?

◮ Example: you notice from your office that people are using umbrellas. Using an umbrella indicates people are protecting themselves from rain. So an inference is that it is raining now and people do not like wet clothes.

SLIDES 7-12

Introduction

◮ We believe that decision makers are rational, i.e., they acquire and carry out optimal decisions in their decision making problems.

◮ Decision makers are concerned with
  – a single objective, e.g., the shortest distance
  – multiple objectives, e.g., risk and returns

◮ To understand those decision makers, the fundamental issue is how to recover the decision making problem (DMP) from observed decisions: e.g., utility functions, restrictions, and the overall decision making scheme.

◮ Inverse optimization: a data-driven approach for learning from observed decisions.

SLIDES 13-14

What is the inverse optimization problem (IOP)?

◮ Given a set of observations that are (possibly noisy or suboptimal) optimal solutions collected from the decision maker under different external signals, the inverse optimization model infers the parameter θ of a DMP with a single objective.

◮ Consider the consumer's behavior problem in a market with n products. The prices of the products are denoted by p_t, which vary over time t ∈ [T]. The consumer's decision making problem can be stated as the following utility maximization problem [mas1995microeconomic]:

    max_{x ∈ R^n_+} u(x)  s.t.  p_t^T x ≤ b     (UMP)

where p_t^T x ≤ b is the budget constraint at time t.

SLIDES 15-21

Motivation of our research

◮ Tools of traditional IOP theory (typically for the batch setting) have not proven fully applicable to support recent attempts in AI to automate the elicitation of a human decision maker's preferences.

  – Recommender systems utilized by online retailers to increase product sales: they elicit a user's preferences or restrictions from a sequence of historical records of her purchasing behaviors, and then make predictions about future shopping decisions.

  – Access to large data sets (online/sequential data).

◮ However, using traditional IOP to extract users' preferences or restrictions is time consuming, since it is NP-hard (computationally intractable) [aswani2016inverse].

SLIDES 22-24

Motivation of our research

◮ To fully unlock the potential of inverse optimization: elicit the decision maker's preferences or restrictions through online learning.

◮ We formulate an IOP that accounts for noisy data and develop an online learning algorithm to derive the unknown parameters in the objective function and/or constraints.

◮ One key feature: the algorithm should incorporate sequentially arriving observations into the model, without keeping them in memory, to realize incremental elicitation, revision, and reuse of old inferences.

SLIDES 25-28

Decision making problem

◮ We consider a family of parameterized optimization problems, in which x ∈ R^n is the decision variable, u ∈ R^m is the external signal, and θ ∈ Θ is the parameter:

    min_{x ∈ R^n} f(x, u, θ)  s.t.  g(x, u, θ) ≤ 0     (DMP)

where f : R^n × R^m × R^p → R is a real-valued function and g : R^n × R^m × R^p → R^q is a vector-valued function.

◮ X(u, θ) = {x ∈ R^n : g(x, u, θ) ≤ 0} is the feasible region of DMP.

◮ Key concept: S(u, θ) = arg min {f(x, u, θ) : x ∈ X(u, θ)} is the optimal solution set of DMP.

◮ Assumption: Θ is a convex compact set; there exists D > 0 such that ‖θ‖₂ ≤ D for all θ ∈ Θ; and for each u ∈ U and θ ∈ Θ, both f(x, u, θ) and g(x, u, θ) are convex in x.
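The forward problem (DMP) can be made concrete with a small numerical sketch. Everything below is an illustrative assumption rather than an instance from the talk: f is a squared distance to θ (so θ plays the role of an ideal point), g encodes a single budget-type constraint u·x ≤ 1 with x ≥ 0, and the problem is solved with SciPy's SLSQP.

```python
import numpy as np
from scipy.optimize import minimize

def solve_dmp(u, theta):
    """Solve min_x f(x, u, theta) s.t. g(x, u, theta) <= 0 for a toy DMP.

    Here f(x, u, theta) = ||x - theta||^2 and g encodes u.x <= 1, x >= 0.
    Both modeling choices are illustrative, not from the talk.
    """
    n = len(theta)
    cons = [{"type": "ineq", "fun": lambda x: 1.0 - u @ x}]   # u.x <= 1
    bounds = [(0, None)] * n                                   # x >= 0
    res = minimize(lambda x: np.sum((x - theta) ** 2),
                   x0=np.zeros(n), bounds=bounds, constraints=cons,
                   method="SLSQP")
    return res.x

u = np.array([1.0, 1.0])
theta = np.array([2.0, 2.0])
x_star = solve_dmp(u, theta)
# The unconstrained minimizer (2, 2) violates u.x <= 1, so the solution is
# the projection onto the feasible set, which by symmetry is (0.5, 0.5).
print(x_star)   # close to [0.5, 0.5]
```

Both f and g are convex in x, so the assumption block above is satisfied and SLSQP finds the global optimum.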

SLIDE 29

Inverse optimization problem in the batch setting

As in conventional studies [bertsimas2015data; aswani2016inverse; keshavarz2011imputing; esfahani2017data], we consider a situation where a decision y_i (with respect to an external signal u_i), for i ∈ [N], is observed and recorded. The IOP model infers the parameter θ in DMP by minimizing an empirical loss:

    min_{θ ∈ Θ} (1/N) Σ_{i=1}^N l(y_i, u_i, θ)     (Batch-IOP)

where l(y_i, u_i, θ) is a loss function that captures the discrepancy between the model inferred from data and the actual one.

SLIDES 30-33

Online setting

◮ In our online learning setting, the (signal, noisy decision) pairs become available to the learner one by one.

◮ The learning algorithm produces a sequence of hypotheses (θ_1, . . . , θ_T).

◮ Let l(y_t, u_t, θ_t) denote the loss the learning algorithm suffers when it tries to predict the t-th decision given u_t based on {(u_1, y_1), . . . , (u_{t−1}, y_{t−1})}.

◮ The goal of the learner is to minimize the regret: the cumulative loss Σ_{t∈[T]} l(y_t, u_t, θ_t) against the best possible loss when the whole batch of decisions is available,

    R_T = Σ_{t∈[T]} l(y_t, u_t, θ_t) − min_{θ∈Θ} Σ_{t∈[T]} l(y_t, u_t, θ).
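The regret definition can be computed directly once the per-round losses are known. As a minimal sketch, assume a 1-D illustrative loss l(y_t, θ) = (y_t − θ)² (a stand-in for the set-distance loss used later in the talk); its batch benchmark has a closed form, since min_θ Σ_t (y_t − θ)² is attained at the sample mean.

```python
import numpy as np

def regret(y, theta_seq):
    """Cumulative regret R_T for the illustrative loss l(y_t, theta) = (y_t - theta)^2.

    The batch benchmark min_theta sum_t (y_t - theta)^2 is attained at the
    sample mean of y, so no numerical optimization is needed here.
    """
    y, theta_seq = np.asarray(y, float), np.asarray(theta_seq, float)
    online_loss = np.sum((y - theta_seq) ** 2)     # what the learner suffered
    batch_loss = np.sum((y - y.mean()) ** 2)       # best fixed theta in hindsight
    return online_loss - batch_loss

# Running-mean predictions theta_t = mean(y_1..y_{t-1}), with theta_1 = 0:
y = np.array([1.0, 2.0, 3.0, 4.0])
theta_seq = np.array([0.0, 1.0, 1.5, 2.0])
print(regret(y, theta_seq))   # -> 3.25
```

Predicting with the batch optimum θ = 2.5 in every round would give zero regret, which is the benchmark the online learner is compared against.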

SLIDES 34-37

Loss function and implicit update rule

◮ Given a (signal, noisy decision) pair (u, y) and a hypothesis θ, we set the loss function to be the minimum squared distance between y and the optimal solution set S(u, θ):

    l(y, u, θ) = min_{x ∈ S(u,θ)} ‖y − x‖₂².     (1)

◮ Upon receiving the t-th (signal, noisy decision) pair (u_t, y_t), θ_{t+1} can be obtained by solving the following optimization problem:

    θ_{t+1} = arg min_{θ ∈ Θ} (1/2)‖θ − θ_t‖₂² + η_t l(y_t, u_t, θ),     (2)

where η_t is the learning rate in round t and l(y_t, u_t, θ) is defined in (1).

◮ We seek to balance the tradeoff between "conservativeness" and "correctiveness".

◮ As there is no closed form for θ_{t+1} in general, we call (2) an implicit update rule.

SLIDE 38

Algorithm

Algorithm 1: Implicit Online Learning for Generalized Inverse Optimization

 1: Input: (signal, noisy decision) pairs {(u_t, y_t)}_{t∈[T]}
 2: Initialization: θ_1 can be an arbitrary hypothesis of the parameter.
 3: for t = 1 to T do
 4:   receive (u_t, y_t)
 5:   suffer loss l(y_t, u_t, θ_t)
 6:   if l(y_t, u_t, θ_t) = 0 then
 7:     θ_{t+1} ← θ_t
 8:   else
 9:     set learning rate η_t ∝ 1/√t
10:     update θ_{t+1} = arg min_{θ∈Θ} (1/2)‖θ − θ_t‖₂² + η_t l(y_t, u_t, θ) (i.e., solve (2))
11:   end if
12: end for
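A minimal sketch of Algorithm 1 on an assumed toy DMP (not from the talk): min_x (x − θu)² over x ∈ R, so that S(u, θ) = {θu} and the loss (1) becomes l(y, u, θ) = (y − θu)². For this loss the implicit update (2) has a closed form; in general, step 10 requires a numerical solve. The constant eta0 and the noise model are also illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def implicit_online_learning(pairs, eta0=1.0, theta1=0.0):
    """Algorithm 1 on the toy 1-D DMP min_x (x - theta*u)^2, where
    S(u, theta) = {theta*u} and l(y, u, theta) = (y - theta*u)^2."""
    theta = theta1
    for t, (u, y) in enumerate(pairs, start=1):
        loss = (y - theta * u) ** 2
        if loss == 0.0:
            continue                      # current hypothesis already explains y_t
        eta = eta0 / np.sqrt(t)           # eta_t proportional to 1/sqrt(t)
        # Closed form of argmin_theta 0.5*(theta - theta_t)^2 + eta*(y - theta*u)^2:
        theta = (theta + 2 * eta * u * y) / (1 + 2 * eta * u ** 2)
    return theta

theta_true = 1.5
us = rng.uniform(0.5, 2.0, size=500)                    # external signals
ys = theta_true * us + rng.normal(0, 0.05, size=500)    # noisy optimal decisions
theta_hat = implicit_online_learning(zip(us, ys))
print(theta_hat)   # close to theta_true = 1.5
```

The division by (1 + 2ηu²) is what makes the update "implicit": the new hypothesis appears inside the loss being minimized, which keeps the step conservative even for large η.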

SLIDE 39

Theoretical analysis

Theorem (Regret bound)
Suppose some technical assumptions hold. Then, choosing η_t = (Dλ / (2√2 (B + R) κ)) · (1/√t),

    R_T ≤ (4√2 (B + R) D κ / λ) √T,

where λ and κ are related to the smoothness of the objective functions.

Theorem (Risk consistency)
Suppose some technical assumptions hold. Then, choosing η_t = (Dλ / (2√2 (B + R) κ)) · (1/√t),

    (1/T) Σ_{t∈[T]} l(y_t, u_t, θ_t) →_p E[l(y, u, θ*)]

as T approaches infinity. Here, θ* minimizes the true risk E[l(y, u, θ)].

SLIDE 40

Learning consumer behavior

Consider the consumer's behavior problem in a market with n products. The prices of the products are denoted by p_t ∈ R^n_+, which vary over time t ∈ [T].

◮ The consumer's decision making problem can be stated as the following utility maximization problem [mas1995microeconomic]:

    max_{x ∈ R^n_+} u(x)  s.t.  p_t^T x ≤ b     (UMP)

where p_t^T x ≤ b is the budget constraint at time t.

◮ For this application, we consider a concave quadratic representation of u(x). That is, u(x) = (1/2) x^T Q x + r^T x, where Q ∈ S^n_− (the set of symmetric negative semidefinite matrices) and r ∈ R^n.
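A sketch of solving (UMP) with the concave quadratic utility, using SciPy's SLSQP. The specific instance (Q = −I, r = (1.5, 1), unit prices, budget b = 1) is an illustrative assumption, chosen so the optimum can be checked by hand via the KKT conditions.

```python
import numpy as np
from scipy.optimize import minimize

def solve_ump(Q, r, p, b):
    """Solve max_{x >= 0} 0.5*x'Qx + r'x  s.t.  p'x <= b  (UMP).

    Q is symmetric negative semidefinite, so the utility is concave and
    minimizing the negative utility is a convex problem.
    """
    n = len(r)
    obj = lambda x: -(0.5 * x @ Q @ x + r @ x)             # negative utility
    cons = [{"type": "ineq", "fun": lambda x: b - p @ x}]  # budget: p'x <= b
    res = minimize(obj, x0=np.zeros(n), bounds=[(0, None)] * n,
                   constraints=cons, method="SLSQP")
    return res.x

# Illustrative instance, not from the talk:
Q = -np.eye(2)
r = np.array([1.5, 1.0])
p = np.array([1.0, 1.0])
x_star = solve_ump(Q, r, p, b=1.0)
print(x_star)   # the budget binds; KKT gives x* = (0.75, 0.25)
```

In the online-learning experiment on this slide, the learner would observe noisy versions of such x* across rounds t and invert them to recover Q and r.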

SLIDE 41

Learning utility function

[Figure: Learning the Utility Function. (a) Estimation error per round and average estimation error; (b) loss per round and average cumulative loss, over 1000 rounds.]

SLIDE 42

Learning budget

[Figure: Learning the Budget. (a) Estimation error per round and average estimation error; (b) loss per round and average cumulative loss, over 1000 rounds.]

SLIDE 43

Learning transportation cost

We now consider the transshipment network G = (V_s ∪ V_d, E), where nodes V_s are producers and the remaining nodes V_d are consumers. Variables x_e and y_v represent the transportation quantity and the production quantity, respectively. The transshipment problem is

    min  Σ_{v∈V_s} C_v(y_v) + Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈δ+(v)} x_e − Σ_{e∈δ−(v)} x_e = y_v      ∀v ∈ V_s
         Σ_{e∈δ+(v)} x_e − Σ_{e∈δ−(v)} x_e = d_v^t    ∀v ∈ V_d
         0 ≤ x_e ≤ u_e,  0 ≤ y_v ≤ w_v               ∀e ∈ E, ∀v ∈ V_s     (TP)

where we want to learn the transportation cost c_e for e ∈ E.
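The forward problem (TP) is a linear program when the production costs C_v are linear, so it can be sketched with SciPy's linprog. The 3-node network below (producer A, consumers B and C, edges A→B, A→C, B→C, with linear production cost 0.5·y) is an illustrative assumption, not the 5-node instance from the slides.

```python
import numpy as np
from scipy.optimize import linprog

# Variables z = [x_AB, x_AC, x_BC, y_A].
c = np.array([1.0, 3.0, 1.0, 0.5])      # [c_AB, c_AC, c_BC, production cost]

A_eq = np.array([
    [1, 1, 0, -1],    # node A (producer): outflow equals production y_A
    [1, 0, -1, 0],    # node B (consumer): inflow minus outflow equals d_B
    [0, 1, 1, 0],     # node C (consumer): inflow equals d_C
])
b_eq = np.array([0.0, 2.0, 3.0])        # demands d_B = 2, d_C = 3

bounds = [(0, 10)] * 4                  # 0 <= x_e <= u_e, 0 <= y_A <= w_A
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

# Routing A->B->C (unit cost 1+1) beats A->C (unit cost 3), so all of C's
# demand is transshipped through B: x_AB = 5, x_BC = 3, x_AC = 0, y_A = 5.
print(res.x, res.fun)   # optimal cost 10.5
```

Inverting this model means recovering the vector c from observed flows z across rounds, with demands d_v^t playing the role of the external signal.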

SLIDE 44

Learning transportation cost

[Figure: Learning the Transportation Cost. (a) a 5-node transshipment network; (b) estimation error per round and average estimation error; (c) loss per round and average cumulative loss, over 1000 rounds.]

SLIDES 45-48

What is the inverse multiobjective optimization problem (IMOP)?

◮ A decision is often a trade-off among multiple criteria.

◮ Decision makers generally share similar evaluation criteria but have different priorities/preferences.

◮ Given various responses to the same input, we are interested in (1) the common ground (i.e., the multiple evaluation criteria) shared by the decision makers, and (2) how much they vary within this population of decision makers.

◮ Given a set of observations that are noisy efficient (Pareto optimal) solutions collected from a population of decision makers, the inverse multiobjective optimization model infers the parameter θ of a multiobjective DMP.

SLIDE 49

An example of multiobjective optimization

Consider a portfolio selection problem, where investors need to determine the fraction of their wealth to invest in each security in order to maximize the total return and minimize the total risk. The classical Markowitz mean-variance portfolio selection [markowitz1952portfolio] is

    min_x  { f1(x) = −r^T x,  f2(x) = x^T Q x }
    s.t.   0 ≤ x_i ≤ b_i, ∀i ∈ [n],
           Σ_{i=1}^n x_i = 1.

SLIDE 50

Pareto optimality

Example. Consider the following multiobjective quadratic programming problem:

    min_{x ∈ R^2_+}  { f1(x) = (1/2) x^T Q1 x + c1^T x,  f2(x) = (1/2) x^T Q2 x + c2^T x }
    s.t.  Ax ≥ b,

where the parameters of the objective functions and the constraints include c1 = (3, 1)^T, c2 = (−6, −5)^T, and b = (−6, −3)^T.

[Figure: the Pareto optimal set of this instance.]

SLIDE 51

Pareto optimal solution

A common way to derive a Pareto optimal solution is to solve the following weighted-sum problem [Saul et al. 1955]:

    min  w^T f(x, θ)  s.t.  x ∈ X(θ)     (WP)

where w = (w1, . . . , wp)^T is a nonnegative weight vector in the (p − 1)-simplex W_p ≡ {w ∈ R^p_+ : 1^T w = 1}.
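Sweeping w over the simplex traces out Pareto optimal solutions. A minimal sketch, assuming an illustrative bi-objective problem f1(x) = (x − a)², f2(x) = (x − b)² over x ∈ R (not from the talk): the weighted sum is a strictly convex quadratic, so (WP) has the closed form x* = w1·a + w2·b, and the Pareto set is the interval [a, b].

```python
def weighted_sum_solution(w, a=0.0, b=1.0):
    """Solve WP: min_x w1*(x - a)^2 + w2*(x - b)^2 over x in R.

    Setting the derivative 2*w1*(x - a) + 2*w2*(x - b) to zero and using
    w1 + w2 = 1 gives the closed form x* = w1*a + w2*b.
    """
    w1, w2 = w
    return w1 * a + w2 * b

# Sweep K evenly spaced weights on the 1-simplex W_2 = {w >= 0 : w1 + w2 = 1}
# to trace out the Pareto optimal set, here the interval [a, b] = [0, 1].
K = 5
weights = [(1 - k / (K - 1), k / (K - 1)) for k in range(K)]
pareto_points = [weighted_sum_solution(w) for w in weights]
print(pareto_points)   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

This finite sample {S(w_k, θ)}_{k∈[K]} of the efficient set is exactly the approximation the surrogate loss on the later slides is built on.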

SLIDES 52-57

Motivation of our research

◮ Can we use inverse optimization as a surrogate for inverse multiobjective optimization?

Consider the bi-objective LP problem

    min  { x1, x2 }
    s.t. a x1 + b x2 ≥ 0,
         b x1 + a x2 ≥ 0,
         x1 + x2 ≤ c,

where a > b > 0 and c > 0. The figure displays the feasible region of an instance with a = 6, b = 1, c = 1, i.e., the triangle AOB.

[Figure: OA and OB are the efficient (solution) set for the bi-objective linear programming problem.]

◮ The answer is NO! Using IOP we would recover the objective −x1 − x2, which reflects the opposite of the decision makers' intentions.

SLIDE 58

IMOP is an unsupervised learning task

The only data we have for IMOP are the noisy decisions {y_i}_{i∈[N]}.

[Diagram: generative view in which the weight w and the parameter θ jointly produce the observed decision y.]

SLIDES 59-63

Loss function of unsupervised learning type

    l(y, θ) = min_{x ∈ X_P(θ)} ‖y − x‖₂²,

[Figure: weights {w_k} evenly sampled from the 2-simplex {w ∈ R^3_+ : w1 + w2 + w3 = 1}.]

    l_K(y, θ) = min_{x_k, z_k ∈ {0,1}}  ‖y − Σ_{k∈[K]} z_k x_k‖₂²
                s.t.  Σ_{k∈[K]} z_k = 1,  x_k ∈ S(w_k, θ)     (surrogate loss function)
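Because the binary z picks exactly one k, the surrogate loss l_K reduces to the squared distance from y to the nearest of the K weighted-sum solutions. A sketch on the assumed toy bi-objective problem from before (f1 = (x − a)², f2 = (x − b)² with θ = (a, b), so S(w, θ) = {w1·a + w2·b} and the efficient set X_P(θ) is the interval [a, b]; all illustrative choices, not from the talk):

```python
import numpy as np

def surrogate_loss(y, theta, K=11):
    """Surrogate loss l_K(y, theta): squared distance from y to the nearest
    of K sampled weighted-sum solutions S(w_k, theta)."""
    a, b = theta
    w1 = np.linspace(0.0, 1.0, K)            # evenly sampled weights on W_2
    x_candidates = w1 * a + (1 - w1) * b     # S(w_k, theta) for each k
    return np.min((y - x_candidates) ** 2)

theta = (0.0, 1.0)
print(surrogate_loss(1.3, theta))    # nearest efficient point is 1.0, so about 0.09
# For y = 0.52 inside [a, b] the true loss l(y, theta) is 0, but the nearest
# weight sample gives x = 0.5, so l_K is small yet positive; the gap shrinks
# as K grows, matching the convergence-rate theorem on the next slide.
print(surrogate_loss(0.52, theta))
```

The gap l_K − l here is at most (grid spacing / 2)² = (1/(2(K−1)))², consistent with the O(1/K) rate for p = 2.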

SLIDE 64

Convergence rate

Theorem
Under certain assumptions, we have that ∀y ∈ Y, ∀θ ∈ Θ,

    0 ≤ l_K(y, θ) − l(y, θ) ≤ (16e(B + R)ζ / λ) · (1/K)^{1/(p−1)}.

Example
When p = 2, i.e., a bi-objective decision making problem, the theorem shows that l_K(y, θ) − l(y, θ) is of order O(1/K).

SLIDES 65-67

Model for IMOP

◮ Given a set of observations that are noisy efficient solutions {y_i}_{i∈[N]}, construct an optimization model to infer the parameter θ of a multiobjective DMP.

◮ Loss function. We adopt a sampling approach that generates weights w_k ∈ W_p for each k ∈ [K] and approximates X_E(θ) as the union of the corresponding sets S(w_k, θ). Then, using binary variables, the loss function is

    l_K(y, θ) = min_{x_k, z_k ∈ {0,1}}  ‖y − Σ_{k∈[K]} z_k x_k‖₂²
                s.t.  Σ_{k∈[K]} z_k = 1,  x_k ∈ S(w_k, θ).

◮ Model for IMOP [dong2018inferring]:

    min_{θ∈Θ}  M_K^N(θ) ≡ (1/N) Σ_{i∈[N]} ‖y_i − Σ_{k∈[K]} z_ik x_k‖₂²
    s.t.  x_k ∈ S(w_k, θ), ∀k ∈ [K],
          Σ_{k∈[K]} z_ik = 1, ∀i ∈ [N],
          z_ik ∈ {0, 1}, ∀i ∈ [N], k ∈ [K].     (IMOP)

SLIDE 68

Statistical properties of IMOP

Theorem (Consistency of IMOP)

[Figure: uniform convergence diagram for the empirical risks, relating M_K^N(θ), M^N(θ), M_K(θ), and the true risk M(θ).]

The diagram involves three modes of convergence: convergence in probability, convergence of a sequence of numbers, and convergence in probability for a double-index random variable.

SLIDE 69

K-means Clustering

K-means clustering aims to partition the observations into K clusters (or groups) such that the average squared distance between each observation and its closest cluster centroid is minimized. Given observations {y_i}_{i∈[N]} [bagirov2008modified; aloise2009branch],

    min_{x_k, z_ik}  (1/N) Σ_{i∈[N]} ‖y_i − Σ_{k∈[K]} z_ik x_k‖₂²
    s.t.  Σ_{k∈[K]} z_ik = 1, ∀i ∈ [N],
          x_k ∈ R^n,  z_ik ∈ {0, 1}, ∀i ∈ [N], k ∈ [K],     (K-means clustering)

where K is the number of clusters and {x_k}_{k∈[K]} are the centroids of the clusters.
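The formulation above is typically attacked with Lloyd's algorithm, which alternates the two blocks of variables: fix the centroids x_k and optimize the assignments z_ik, then fix the assignments and optimize the centroids. A minimal numpy sketch (the deterministic initialization and the toy data are illustrative assumptions; this is a heuristic, since exact K-means is NP-hard, as the next slides note):

```python
import numpy as np

def kmeans(Y, K, iters=100):
    """Lloyd's algorithm for the K-means formulation on the slide: alternate
    assigning each y_i to its nearest centroid x_k (the z_ik variables) with
    recomputing each centroid as its cluster mean (the x_k variables)."""
    X = Y[:: max(1, len(Y) // K)][:K].astype(float)    # simple deterministic init
    for _ in range(iters):
        d = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=2)
        z = d.argmin(axis=1)                           # assignment step
        X_new = np.array([Y[z == k].mean(axis=0) if np.any(z == k) else X[k]
                          for k in range(K)])          # update step
        if np.allclose(X_new, X):                      # converged
            break
        X = X_new
    return X, z

# Two well-separated blobs (illustrative data):
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centroids, labels = kmeans(Y, K=2)
print(np.sort(centroids[:, 0]))   # roughly [0, 5]
```

The same alternating structure reappears in Algorithm 2 below, with the centroid update replaced by the constrained fit x_k ∈ S(w_k, θ).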

SLIDE 70

Connection between IMOP and Clustering

Theorem
Given any K-means clustering problem, we can construct an instance of IMOP such that solving the K-means clustering problem is equivalent to solving the IMOP.

The key to proving the theorem is comparing the two formulations:

    min_{θ∈Θ}  M_K^N(θ) ≡ (1/N) Σ_{i∈[N]} ‖y_i − Σ_{k∈[K]} z_ik x_k‖₂²
    s.t.  x_k ∈ S(w_k, θ),  Σ_{k∈[K]} z_ik = 1,  z_ik ∈ {0, 1};

    min_{x_k, z_ik}  (1/N) Σ_{i∈[N]} ‖y_i − Σ_{k∈[K]} z_ik x_k‖₂²
    s.t.  x_k ∈ R^n,  Σ_{k∈[K]} z_ik = 1,  z_ik ∈ {0, 1},

where K is the number of clusters and {x_k}_{k∈[K]} are the centroids.

SLIDE 71

Complexity of IMOP

Lemma (NP-hardness of K-means clustering)
K-means clustering is NP-hard to solve even for instances in the plane [Meena et al. 2012], or with two clusters in general dimension [Daniel et al. 2009].

Theorem (NP-hardness of IMOP)
IMOP is NP-hard to solve.

SLIDE 72

Connection between IMOP and Clustering

◮ Clearly, in both IMOP and K-means clustering, one needs to assign {y_i}_{i∈[N]} to clusters in such a way that the average squared distance between y_i and its closest x_k is minimized.

◮ The difference is whether x_k is restricted. In IMOP, each x_k is restricted to belong to S(w_k, θ), while there is no restriction on x_k in K-means clustering.

◮ We partition {y_i}_{i∈[N]} into K clusters {C_k}_{k∈[K]}. Let ȳ_k = (1/|C_k|) Σ_{y_i∈C_k} y_i be the centroid of cluster C_k:

    min_{θ, x_k′}  (1/N) Σ_{k∈[K]} |C_k| ‖ȳ_k − Σ_{k′∈[K]} z_kk′ x_k′‖₂²
    s.t.  x_k′ ∈ S(w_k′, θ), ∀k′ ∈ [K],
          Σ_{k′∈[K]} z_kk′ = 1, ∀k ∈ [K],
          z_kk′ ∈ {0, 1}, ∀k, k′ ∈ [K].     (Kmeans-IMOP)

SLIDE 73

Algorithm

Algorithm 2: Solving IMOP through a Clustering-based Approach

1: Input: noisy decisions {y_i}_{i∈[N]}, weight samples {w_k}_{k∈[K]}.
2: Initialize: partition {y_i}_{i∈[N]} into K clusters using K-means clustering. Calculate {ȳ_k}_{k∈[K]}. Solve Kmeans-IMOP to get an initial estimate of θ and {x_k}_{k∈[K]}.
3: while the stopping criterion is not satisfied do
4:   Assignment step: assign each y_i to the closest x_k to form new clusters. Calculate their centroids {ȳ_k}_{k∈[K]}.
5:   Update step: update θ and {x_k}_{k∈[K]} by solving Kmeans-IMOP.
6: end while
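The alternation in Algorithm 2 can be sketched on the assumed toy bi-objective DMP used earlier (f1 = (x − a)², f2 = (x − b)², θ = (a, b), so S(w_k, θ) = {w_k·a + (1 − w_k)·b}; all illustrative choices, not from the talk). Because each x_k is linear in θ here, the update step is an unconstrained least-squares fit instead of a full Kmeans-IMOP solve.

```python
import numpy as np

def solve_imop_clustering(y, K=11, iters=20):
    """Clustering-based alternation in the spirit of Algorithm 2.

    Assignment step: assign each noisy decision y_i to the nearest candidate
    efficient point x_k. Update step: re-fit theta = (a, b) by least squares,
    exploiting that x_k = w_k*a + (1 - w_k)*b is linear in theta."""
    w = np.linspace(0.0, 1.0, K)
    a, b = y.min(), y.max()                   # crude initial estimate of theta
    for _ in range(iters):
        x = w * a + (1 - w) * b               # candidate efficient points
        assign = np.abs(y[:, None] - x[None, :]).argmin(axis=1)   # assignment
        A = np.column_stack([w[assign], 1 - w[assign]])
        (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)            # update
    return a, b

rng = np.random.default_rng(2)
true_a, true_b = -1.0, 2.0
w_true = rng.uniform(0, 1, 300)
y = w_true * true_a + (1 - w_true) * true_b + rng.normal(0, 0.01, 300)
a_hat, b_hat = solve_imop_clustering(y)
print(a_hat, b_hat)   # close to (-1, 2)
```

In the general IMOP setting the update step stays a constrained problem (x_k ∈ S(w_k, θ)), which is what keeps the overall method NP-hard rather than a plain clustering heuristic.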

SLIDE 74

Manifold learning

[Figure: (a) data on a manifold in a high-dimensional space; (b) its embedding in a low-dimensional space.]

Formally, given a set of high-dimensional data points {y_i}_{i∈[N]} in R^n, we are required to find a mapping f : R^d → R^n and an embedding set {x_i}_{i∈[N]} in a low-dimensional space R^d (d < n) such that y_i = f(x_i) + ε_i, i ∈ [N], and the local manifold structure formed by {y_i}_{i∈[N]} is preserved in the embedded space [Joshua et al. 2000; Sam et al. 2000]. Here, ε_i represents random noise.
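The slide cites nonlinear methods (Isomap, LLE). As a minimal linear stand-in for the same idea, here is a PCA sketch via numpy's SVD: it recovers an embedding X in R^d and a map f back to R^n such that y_i ≈ f(x_i), which is all the initialization in Algorithm 3 needs when the Pareto manifold is close to flat. The toy data (a noisy line in R^3) is an illustrative assumption.

```python
import numpy as np

def pca_embed(Y, d):
    """Embed high-dimensional points into R^d with PCA, a linear stand-in
    for the nonlinear manifold learning cited on the slide. Returns the
    embedding X and the map f with y_i ≈ f(x_i) = mean + x_i @ Vd."""
    mean = Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    Vd = Vt[:d]                       # top-d principal directions
    X = (Y - mean) @ Vd.T             # d-dimensional embedding
    f = lambda X: mean + X @ Vd       # approximate inverse map f : R^d -> R^n
    return X, f

# Points on a line (a 1-D manifold) in R^3, plus small noise:
rng = np.random.default_rng(3)
t = rng.uniform(-1, 1, 200)
Y = np.outer(t, [1.0, 2.0, -1.0]) + rng.normal(0, 0.01, (200, 3))
X, f = pca_embed(Y, d=1)
print(np.abs(Y - f(X)).max())   # small reconstruction error
```

For a genuinely curved Pareto manifold, one would swap in a nonlinear embedding at this step; the rest of Algorithm 3 is unchanged.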

slide-75
SLIDE 75

Pareto manifold Theorem (Pareto manifold)

Suppose certain regularity assumptions hold. For each θ ∈ Θ, the Pareto

  • ptimal set is a (p − 1)-dimensional piecewise continuous manifold.

Corollary

Suppose that both f(x, θ) and g(x, θ) are linear functions in x for all θ ∈ Θ. Then, XP (θ) is a (p − 1)-dimensional piecewise linear manifold for all θ ∈ Θ.

33

slide-76
SLIDE 76

Pareto manifold Theorem (Pareto manifold)

Suppose certain regularity assumptions hold. For each θ ∈ Θ, the Pareto

  • ptimal set is a (p − 1)-dimensional piecewise continuous manifold.

Corollary

Suppose that both f(x, θ) and g(x, θ) are linear functions in x for all θ ∈ Θ. Then, XP (θ) is a (p − 1)-dimensional piecewise linear manifold for all θ ∈ Θ. min

x∈R2

+

f1(x) = 1

2xT Q1x + cT 1 x

f2(x) = 1

2xT Q2x + cT 2 x

  • s.t.

Ax ≥ b,

1 2 3 4 1 2 3 4

Pareto optimal set

33

slide-77
SLIDE 77

Pareto manifold Theorem (Pareto manifold)

Suppose certain regularity assumptions hold. For each θ ∈ Θ, the Pareto

  • ptimal set is a (p − 1)-dimensional piecewise continuous manifold.

Corollary

Suppose that both f(x, θ) and g(x, θ) are linear functions in x for all θ ∈ Θ. Then, XP (θ) is a (p − 1)-dimensional piecewise linear manifold for all θ ∈ Θ. min {−x1, −x2, −x3} s.t. x1 + x2 + x3 ≤ 5, x1 + x2 + 3x3 ≤ 9, x1, x2, x3 ≥ 0. 2 4 1 3 5

33
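For this linear example, the Pareto optimal vertices can be recovered exactly by enumerating the vertices of the feasible polytope and discarding dominated ones. A small sketch (the polytope data are taken from the slide's example):

```python
import numpy as np
from itertools import combinations

# Feasible region of the slide's example, written as A x <= b:
#   x1 + x2 + x3 <= 5,  x1 + x2 + 3*x3 <= 9,  x >= 0
A = np.array([[1.0, 1, 1], [1, 1, 3], [-1, 0, 0], [0, -1, 0], [0, 0, -1]])
b = np.array([5.0, 9, 0, 0, 0])

def vertices():
    """All vertices of {x : A x <= b}: intersect triples of active
    constraints and keep the feasible intersection points."""
    V = []
    for idx in combinations(range(len(A)), 3):
        M, rhs = A[list(idx)], b[list(idx)]
        if abs(np.linalg.det(M)) < 1e-9:
            continue  # constraints do not meet in a single point
        x = np.linalg.solve(M, rhs)
        if np.all(A @ x <= b + 1e-9):
            V.append(np.round(x, 9))
    return np.unique(np.array(V), axis=0)

def pareto(V):
    """Keep vertices not dominated under max(x1), max(x2), max(x3)
    (the slide minimizes -x1, -x2, -x3)."""
    keep = []
    for v in V:
        dominated = any(np.all(u >= v) and np.any(u > v) for u in V)
        if not dominated:
            keep.append(v)
    return np.array(keep)
```

For this instance the polytope has 6 vertices, of which 5 are Pareto optimal; the Pareto optimal set is the piecewise linear surface spanned by them.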

slide-78
SLIDE 78

An enhanced algorithm with manifold learning

Algorithm 3 Initialization with manifold learning

1: Input: Noisy decisions {yi}i∈[N], evenly sampled weights {wk}k∈[K].

2: Apply any nonlinear manifold learning algorithm: yi ∈ Rn → xi ∈ Rp−1, ∀i ∈ [N].

3: Group {xi}i∈[N] into K clusters by K-means clustering. Denote IK the set of labels of {xi}i∈[N]. Find the clusters {Ck}k∈[K] and centroids {yk}k∈[K] of {yi}i∈[N] according to IK.

4: Solve (Kmeans-IMOP) and get θ̂ and {xk}k∈[K].

5: Run Steps 3-6 in Algorithm 2.

[Figure: estimated vs. real Pareto optimal sets. Legend: Estimated Pareto optimal set, Observations, Real Pareto optimal set, Clustering Centers]

34
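The initialization in Algorithm 3 can be sketched as a short pipeline. In the sketch, `embed` and `kmeans` are caller-supplied stand-ins for the manifold learner of Step 2 and the K-means clustering of Step 3.

```python
import numpy as np

def init_with_embedding(Y, K, embed, kmeans):
    """Sketch of Algorithm 3's initialization: embed noisy decisions
    into R^(p-1), cluster there, then carry the labels back to form
    centroids of the original observations, which warm-start the
    (Kmeans-IMOP) solve of Step 4."""
    X = embed(Y)                            # Step 2: y_i -> x_i in R^(p-1)
    labels = kmeans(X, K)                   # Step 3: cluster the embeddings
    centroids = np.array([Y[labels == k].mean(axis=0) for k in range(K)])
    return labels, centroids
```

Clustering in the embedded space is the point of the algorithm: near the Pareto manifold, distances in R^(p−1) reflect position along the front better than distances in the ambient R^n.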

slide-79
SLIDE 79

Learning the expected returns

The classical Markowitz mean-variance portfolio selection below is often used by analysts.

min  f1(x) = −rᵀx,  f2(x) = xᵀQx

s.t. 0 ≤ xi ≤ bi, ∀i ∈ [n],

∑i∈[n] xi = 1.
[Figure: efficient frontiers; x-axis: Standard Deviation of Portfolio Returns, y-axis: Mean of Portfolio Returns. Legend: True efficient frontier; Estimated efficient frontier for K = 3, 6, 11, 21]

35
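To illustrate how weighted-sum scalarization traces such a frontier, here is a toy two-asset instance. The returns r and covariance Q are made up for illustration (and the box bounds bi = 1 are inactive); with two assets the simplex constraint reduces the search to a line over x1, so a grid search suffices.

```python
import numpy as np

# Hypothetical two-asset Markowitz instance.
r = np.array([0.12, 0.06])                  # assumed expected returns
Q = np.array([[0.10, 0.01], [0.01, 0.03]])  # assumed covariance matrix

def frontier(n_weights=11, n_grid=1001):
    """Trace the efficient frontier by weighted-sum scalarization:
    min_x  w*(-r@x) + (1-w)*(x@Q@x)  over the 2-asset simplex."""
    x1 = np.linspace(0.0, 1.0, n_grid)
    X = np.stack([x1, 1.0 - x1], axis=1)           # feasible portfolios
    rets = X @ r                                    # -f1: portfolio mean
    vars_ = np.einsum('ij,jk,ik->i', X, Q, X)       # f2: portfolio variance
    pts = []
    for w in np.linspace(0.0, 1.0, n_weights):
        i = np.argmin(-w * rets + (1.0 - w) * vars_)
        pts.append((np.sqrt(vars_[i]), rets[i]))    # (std dev, mean) pair
    return pts
```

Sweeping w from 0 to 1 moves the selected portfolio from minimum variance to maximum return, tracing the frontier in the (standard deviation, mean) plane shown in the figure.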

slide-80
SLIDE 80

Inverse optimization

◮ Propose the first general framework for eliciting a decision maker’s

preferences and restrictions using inverse optimization through online learning.

36

slide-81
SLIDE 81

Inverse optimization

◮ Propose the first general framework for eliciting a decision maker’s

preferences and restrictions using inverse optimization through online learning.

◮ Learn general convex utility functions and restrictions with observed noisy

signal-decision pairs.

36

slide-82
SLIDE 82

Inverse optimization

◮ Propose the first general framework for eliciting a decision maker’s

preferences and restrictions using inverse optimization through online learning.

◮ Learn general convex utility functions and restrictions with observed noisy

signal-decision pairs.

◮ Prove that the online learning algorithm achieves O(√T) regret under

certain regularity conditions. Hence, this method has a fast convergence rate.

36

slide-83
SLIDE 83

Inverse optimization

◮ Propose the first general framework for eliciting a decision maker’s

preferences and restrictions using inverse optimization through online learning.

◮ Learn general convex utility functions and restrictions with observed noisy

signal-decision pairs.

◮ Prove that the online learning algorithm achieves O(√T) regret under

certain regularity conditions. Hence, this method has a fast convergence rate.

◮ The algorithm learns the parameters with great accuracy, is very robust

to noise, and achieves a drastic improvement in computational efficiency over the batch learning approach.

36

slide-84
SLIDE 84

Inverse optimization

◮ Propose the first general framework for eliciting a decision maker’s

preferences and restrictions using inverse optimization through online learning.

◮ Learn general convex utility functions and restrictions with observed noisy

signal-decision pairs.

◮ Prove that the online learning algorithm achieves O(√T) regret under

certain regularity conditions. Hence, this method has a fast convergence rate.

◮ The algorithm learns the parameters with great accuracy, is very robust

to noise, and achieves a drastic improvement in computational efficiency over the batch learning approach.

◮ Future work for the inverse optimization will mainly focus on the

application of the online learning methods, e.g., in designing recommender systems.

36
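For intuition on the O(√T) rate: generic online projected gradient descent with step sizes ηt = 1/√t attains O(√T) regret for convex losses over a bounded set. The sketch below is that generic textbook scheme, not the talk's learner, which updates an estimate of the decision maker's parameters instead of an abstract θ.

```python
import numpy as np

def ogd(grad_fn, T, dim, radius=1.0):
    """Online projected gradient descent over a Euclidean ball.
    grad_fn(theta, t) returns the gradient of the round-t convex loss;
    with eta_t = 1/sqrt(t), cumulative regret grows as O(sqrt(T))."""
    theta = np.zeros(dim)
    history = []
    for t in range(1, T + 1):
        history.append(theta.copy())
        theta = theta - grad_fn(theta, t) / np.sqrt(t)  # eta_t = 1/sqrt(t)
        nrm = np.linalg.norm(theta)
        if nrm > radius:                                # project onto ball
            theta *= radius / nrm
    return history
```

Since the per-round step shrinks like 1/√t, the iterates stabilize even though losses arrive one at a time, which is why the average regret O(√T)/T → 0.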

slide-85
SLIDE 85

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

37

slide-86
SLIDE 86

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

37

slide-87
SLIDE 87

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

◮ Reveal a hidden connection between our IMOP and the K-means

clustering problem, and leverage this connection and its manifold structure in designing powerful algorithms that handle large amounts of noisy data.

37

slide-88
SLIDE 88

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

◮ Reveal a hidden connection between our IMOP and the K-means

clustering problem, and leverage this connection and its manifold structure in designing powerful algorithms that handle large amounts of noisy data.

◮ On-going work

37

slide-89
SLIDE 89

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

◮ Reveal a hidden connection between our IMOP and the K-means

clustering problem, and leverage this connection and its manifold structure in designing powerful algorithms that handle large amounts of noisy data.

◮ On-going work

– Developing online learning algorithms for IMOP

37

slide-90
SLIDE 90

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

◮ Reveal a hidden connection between our IMOP and the K-means

clustering problem, and leverage this connection and its manifold structure in designing powerful algorithms that handle large amounts of noisy data.

◮ On-going work

– Developing online learning algorithms for IMOP

– Investigating the robustness of IMOP

37

slide-91
SLIDE 91

Inverse multiobjective optimization

◮ Develop a new inverse multiobjective optimization problem (IMOP) that is

able to infer multiple criteria (or constraints) over which the Pareto optimal decisions are made.

◮ Provide a solid analysis to ensure the statistical significance of the

inference results from our IMOP model.

◮ Reveal a hidden connection between our IMOP and the K-means

clustering problem, and leverage this connection and its manifold structure in designing powerful algorithms that handle large amounts of noisy data.

◮ On-going work

– Developing online learning algorithms for IMOP

– Investigating the robustness of IMOP

– Applications to real problems: Dong and Vanguard’s work on learning time-varying risk preferences from investment portfolios

37

slide-92
SLIDE 92

Thank you! Any questions? Contact: bzeng@pitt.edu

38