SLIDE 1

Liege University: Francqui Chair 2011-2012 Lecture 5: Algorithmic models of human behavior

Yurii Nesterov, CORE/INMA (UCL) March 23, 2012

SLIDE 2

Main problem with the Rational Choice

The rational choice assumption is introduced to better understand and predict human behavior.

  • It forms the basis of Neoclassical Economics (1900).
  • The player (Homo Economicus ≡ HE) wants to maximize his utility function by an appropriate adjustment of his consumption pattern.
  • As a consequence, we can speak about equilibrium in economic systems.

The existing literature is immense. It also concentrates on the ethical, moral, religious, social, and other consequences of rationality. (HE = super-powerful, aggressively selfish, immoral individualist.)

NB: The only missing topic is the Algorithmic Aspects of rationality.

SLIDE 3

What do we know now?

Starting from 1977 (Complexity Theory, Nemirovski & Yudin), we know that optimization problems in general are unsolvable. They are very difficult (and will always be difficult) for computers, independently of their speed.

How can they be solved by us, taking into account our natural weakness in arithmetic?

NB: Mathematical consequences of unreasonable assumptions can be disastrous.

Perron paradox: The maximal integer is equal to one.
Proof: Denote by $N$ the maximal integer. Then $1 \le N \le N^2 \le N$. Hence, $N = 1$.

SLIDE 4

What we do not know

  • In what sense can human beings solve optimization problems?
  • What is the accuracy of the solution?
  • What is the convergence rate?

Main question: What are the optimization methods?

NB: Forget about the Simplex Algorithm and Interior Point Methods! Be careful with gradients (dimension, non-smoothness).

SLIDE 5

Outline

  1. Intuitive optimization (Random Search)
  2. Rational activity in stochastic environment (Stochastic Optimization)
  3. Models and algorithms of rational behavior

SLIDE 6

Intuitive Optimization

Problem: $\min_{x \in R^n} f(x)$, where $x$ is the consumption pattern.

Main difficulties:
  • High dimension of $x$ (difficult to evaluate/observe).
  • Possible non-smoothness of $f(x)$.

Theoretical advice: apply the gradient method $x_{k+1} = x_k - h f'(x_k)$. (In the space of all available products!)

Hint: we live in an uncertain world.

SLIDE 7

Gaussian smoothing

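Let $f : E \to R$ be differentiable along any direction at any $x \in E$. Let us form its Gaussian approximation

$$f_\mu(x) = \frac{1}{\kappa} \int_E f(x + \mu u)\, e^{-\frac{1}{2}\|u\|^2}\, du, \qquad \kappa \stackrel{\mathrm{def}}{=} \int_E e^{-\frac{1}{2}\|u\|^2}\, du = (2\pi)^{n/2}.$$

In this definition, $\mu \ge 0$ plays the role of the smoothing parameter.

Why is this interesting? Define $y = x + \mu u$. Then

$$f_\mu(x) = \frac{1}{\mu^n \kappa} \int_E f(y)\, e^{-\frac{1}{2\mu^2}\|y - x\|^2}\, dy.$$

Hence,

$$\nabla f_\mu(x) = \frac{1}{\mu^{n+2} \kappa} \int_E f(y)\, e^{-\frac{1}{2\mu^2}\|y - x\|^2} (y - x)\, dy = \frac{1}{\mu \kappa} \int_E f(x + \mu u)\, e^{-\frac{1}{2}\|u\|^2} u\, du \overset{(!)}{=} \frac{1}{\kappa} \int_E \frac{f(x + \mu u) - f(x)}{\mu}\, e^{-\frac{1}{2}\|u\|^2} u\, du.$$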
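The step marked (!) is stated without justification. A short check, offered as a sketch and assuming $\mu > 0$: the Gaussian weight is symmetric in $u$, so its first moment vanishes, and subtracting the constant $f(x)$ under the integral does not change its value.

```latex
\frac{1}{\kappa}\int_E u\, e^{-\frac{1}{2}\|u\|^2}\, du = 0
\;\Longrightarrow\;
\frac{1}{\mu\kappa}\int_E f(x+\mu u)\, e^{-\frac{1}{2}\|u\|^2} u\, du
= \frac{1}{\kappa}\int_E \frac{f(x+\mu u)-f(x)}{\mu}\, e^{-\frac{1}{2}\|u\|^2} u\, du .
```

The right-hand form is the useful one: the integrand stays bounded as $\mu \to 0$ for Lipschitz $f$, which is what makes a finite-difference oracle possible.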

SLIDE 8

Properties of Gaussian smoothing

  • If $f$ is convex, then $f_\mu$ is convex and $f_\mu(x) \ge f(x)$.
  • If $f \in C^{0,0}$, then $f_\mu \in C^{0,0}$ and $L_0(f_\mu) \le L_0(f)$.
  • If $f \in C^{0,0}(E)$, then $|f_\mu(x) - f(x)| \le \mu L_0(f)\, n^{1/2}$.

Random gradient-free oracle: Generate random $u \in E$. Return

$$g_\mu(x) = \frac{f(x + \mu u) - f(x)}{\mu} \cdot u.$$

If $f \in C^{0,0}(E)$, then $E_u\left(\|g_\mu(x)\|_*^2\right) \le L_0^2(f)(n + 4)^2$.
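The oracle needs only two function values per call. The following Python sketch illustrates it under one assumption that the slide leaves implicit: the direction $u$ is drawn with independent standard normal components, matching the Gaussian weight used in the smoothing. The names `gradient_free_oracle`, `sample_direction` and `average_oracle` are hypothetical, chosen for this illustration.

```python
import random


def gradient_free_oracle(f, x, mu, u):
    """Return g_mu(x) = ((f(x + mu*u) - f(x)) / mu) * u for a given direction u."""
    shifted = [xi + mu * ui for xi, ui in zip(x, u)]
    delta = (f(shifted) - f(x)) / mu  # finite-difference slope along u
    return [delta * ui for ui in u]


def sample_direction(n, rng):
    """Draw u with independent standard normal components (assumed distribution)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def average_oracle(f, x, mu, samples, rng):
    """Average many oracle calls; this estimates the gradient of the smoothed f_mu."""
    total = [0.0] * len(x)
    for _ in range(samples):
        g = gradient_free_oracle(f, x, mu, sample_direction(len(x), rng))
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


if __name__ == "__main__":
    linear = lambda x: x[0] - 2.0 * x[1]
    # For a linear function the average should approach its gradient (1, -2).
    print(average_oracle(linear, [0.0, 0.0], 0.1, 20000, random.Random(0)))
```

A single call is a very noisy estimate; only the average over many directions resembles a gradient, which is why the bound on $E_u(\|g_\mu(x)\|_*^2)$ grows with the dimension.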

SLIDE 9

Random intuitive optimization

Problem: $f^* \stackrel{\mathrm{def}}{=} \min_{x \in Q} f(x)$, where $Q \subseteq E$ is a closed convex set, and $f$ is a nonsmooth convex function.

Let us choose a sequence of positive steps $\{h_k\}_{k \ge 0}$.

Method $RS_\mu$: Choose $x_0 \in Q$. For $k \ge 0$:
  a) Generate $u_k$.
  b) Compute $\Delta_k = \frac{1}{\mu}[f(x_k + \mu u_k) - f(x_k)]$.
  c) Compute $x_{k+1} = \pi_Q(x_k - h_k \Delta_k u_k)$.

NB: $\mu$ can be arbitrarily small.
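Steps a) to c) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: $Q$ is taken to be a box so that $\pi_Q$ is a coordinate-wise clip, $u_k$ is standard Gaussian, and the function returns the best point visited. The names `project_box` and `random_search` are hypothetical.

```python
import math
import random


def project_box(x, lo, hi):
    """Euclidean projection onto the box Q = [lo, hi]^n (an assumed example of Q)."""
    return [min(max(v, lo), hi) for v in x]


def random_search(f, x0, mu, steps, rng, lo=-2.0, hi=2.0):
    """Run x_{k+1} = pi_Q(x_k - h_k * Delta_k * u_k); return the best point visited."""
    x = list(x0)
    best_x, best_val = list(x), f(x)
    for h in steps:
        u = [rng.gauss(0.0, 1.0) for _ in x]              # a) random direction u_k
        fx = f(x)
        if fx < best_val:
            best_x, best_val = list(x), fx
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        delta = (f(shifted) - fx) / mu                     # b) Delta_k
        moved = [xi - h * delta * ui for xi, ui in zip(x, u)]
        x = project_box(moved, lo, hi)                     # c) projected step
    fx = f(x)
    if fx < best_val:
        best_x, best_val = list(x), fx
    return best_x, best_val


if __name__ == "__main__":
    f = lambda x: abs(x[0] - 1.0) + abs(x[1] + 0.5)        # nonsmooth convex test function
    n_steps = 5000
    h = math.sqrt(1.25) / (6 * math.sqrt(n_steps + 1) * math.sqrt(2.0))
    print(random_search(f, [0.0, 0.0], 1e-6, [h] * (n_steps + 1), random.Random(1)))
```

Note that the method never forms a gradient: each iteration compares two function values and then moves along, or against, the direction it happened to try.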

SLIDE 10

Convergence results

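This method generates random $\{x_k\}_{k \ge 0}$. Denote $S_N = \sum_{k=0}^{N} h_k$, $U_k = (u_0, \dots, u_k)$, $\phi_0 = f(x_0)$, and $\phi_k \stackrel{\mathrm{def}}{=} E_{U_{k-1}}(f(x_k))$, $k \ge 1$.

Theorem: Let $\{x_k\}_{k \ge 0}$ be generated by $RS_\mu$ with $\mu > 0$. Then,

$$\sum_{k=0}^{N} \frac{h_k}{S_N} (\phi_k - f^*) \le \mu L_0(f)\, n^{1/2} + \frac{1}{2 S_N} \|x_0 - x^*\|^2 + \frac{(n+4)^2}{2 S_N} L_0^2(f) \sum_{k=0}^{N} h_k^2.$$

In order to guarantee $E_{U_{N-1}}(f(\hat{x}_N)) - f^* \le \epsilon$, we choose

$$\mu = \frac{\epsilon}{2 L_0(f)\, n^{1/2}}, \qquad h_k = \frac{R}{(n+4)(N+1)^{1/2} L_0(f)}, \qquad N = \frac{4(n+4)^2}{\epsilon^2} L_0^2(f) R^2.$$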
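The parameter choice can be checked by substituting a constant step into the bound of the theorem. The derivation below is a sketch that assumes $R$ is an upper bound on $\|x_0 - x^*\|$, which the slide uses but does not define.

```latex
\begin{aligned}
S_N &= (N+1)h, \qquad \sum_{k=0}^{N} h_k^2 = (N+1)h^2, \qquad h = \frac{R}{(n+4)(N+1)^{1/2} L_0(f)},\\
\frac{R^2}{2 S_N} + \frac{(n+4)^2}{2 S_N} L_0^2(f) \sum_{k=0}^{N} h_k^2
 &= \frac{(n+4) L_0(f) R}{2 (N+1)^{1/2}} + \frac{(n+4) L_0(f) R}{2 (N+1)^{1/2}}
 = \frac{(n+4) L_0(f) R}{(N+1)^{1/2}},\\
\mu L_0(f)\, n^{1/2} &= \frac{\epsilon}{2}, \qquad
\frac{(n+4) L_0(f) R}{(N+1)^{1/2}} \le \frac{\epsilon}{2}
 \iff N + 1 \ge \frac{4 (n+4)^2}{\epsilon^2} L_0^2(f) R^2 .
\end{aligned}
```

The smoothing error and the optimization error each contribute $\epsilon/2$, and the iteration count grows quadratically in both the dimension and $1/\epsilon$.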

SLIDE 11

Interpretation

  • The disturbance $\mu u_k$ may be caused by external random factors.
  • For small $\mu$, the sign and the value of $\Delta_k$ can be treated as an intuition.
  • We use a random experience accumulated by a very small shift along a random direction.
  • The reaction steps $h_k$ are big. (Emotions?)
  • The dimension of $x$ slows down the convergence.
  • Main ability: to carry out the action completely opposite to the proposed one. (Needs training.)

NB: The optimization method has the form of an emotional reaction. It is efficient in the absence of a stable coordinate system.

SLIDE 12

Optimization in Stochastic Environment

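Problem:

$$\min_{x \in Q} \left[ \phi(x) = E(f(x, \xi)) \equiv \int_\Omega f(x, \xi)\, p(\xi)\, d\xi \right],$$

where
  • $f(x, \xi)$ is convex in $x$ for any $\xi \in \Omega \subseteq R^m$,
  • $Q$ is a closed convex set in $R^n$,
  • $p(\xi)$ is the density of the random variable $\xi \in \Omega$.

Assumption: We can generate a sequence of random events $\{\xi_i\}$:

$$\frac{1}{N} \sum_{i=1}^{N} f(x, \xi_i) \to E(f(x, \xi)) \quad \text{as } N \to \infty, \quad x \in Q.$$

Goal: For $\epsilon > 0$ and $\phi^* = \min_{x \in Q} \phi(x)$, find $\bar{x} \in Q$: $\phi(\bar{x}) - \phi^* \le \epsilon$.

Main trouble: For finding a $\delta$-approximation to $\phi(x)$, we need $O\left(\left(\frac{1}{\delta}\right)^m\right)$ computations of $f(x, \xi)$.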

SLIDE 13

Stochastic subgradients (Ermoliev, Wets, 70's)

Method: Fix some $x_0 \in Q$ and $h > 0$. For $k \ge 0$, repeat: generate $\xi_k$ and update

$$x_{k+1} = \pi_Q(x_k - h \cdot f'(x_k, \xi_k)).$$

Output: $\bar{x} = \frac{1}{N+1} \sum_{k=0}^{N} x_k$.

Interpretation: Learning process in stochastic environment.

Theorem: For $h = \frac{R}{L\sqrt{N+1}}$ we get

$$E(\phi(\bar{x})) - \phi^* \le \frac{LR}{\sqrt{N+1}}.$$

NB: This is an estimate for the average performance.

Hint: For us, it is enough to ensure a Confidence Level $\beta \in (0, 1)$:

$$\mathrm{Prob}\left[ \phi(\bar{x}) \ge \phi^* + \epsilon V_\phi \right] \le 1 - \beta, \qquad \text{where } V_\phi = \max_{x \in Q} \phi(x) - \phi^*.$$

In the real world we always apply solutions with $\beta < 1$.
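A minimal one-dimensional sketch of the method, written in Python. The test problem is an assumption made for illustration: $f(x, \xi) = |x - \xi|$ with $\xi$ uniform on $[0, 1]$, so that $\phi$ is minimized at the median $0.5$, and $Q = [-1, 2]$. The function name `stochastic_subgradient` and its parameters are hypothetical.

```python
import math
import random


def stochastic_subgradient(subgrad, sample, project, x0, h, n_steps, rng):
    """Iterate x_{k+1} = pi_Q(x_k - h * f'(x_k, xi_k)); return the average of x_0..x_N."""
    x = x0
    total = 0.0
    for _ in range(n_steps + 1):           # k = 0, ..., N
        total += x                          # accumulate x_k for the average
        xi = sample(rng)                    # random event xi_k
        x = project(x - h * subgrad(x, xi))
    return total / (n_steps + 1)


if __name__ == "__main__":
    n_steps = 4000
    step = 1.5 / math.sqrt(n_steps + 1)     # h = R / (L sqrt(N+1)) with R = 1.5, L = 1
    x_bar = stochastic_subgradient(
        subgrad=lambda x, xi: (x > xi) - (x < xi),      # sign(x - xi)
        sample=lambda rng: rng.random(),                # xi uniform on [0, 1)
        project=lambda v: min(max(v, -1.0), 2.0),       # projection onto Q = [-1, 2]
        x0=2.0, h=step, n_steps=n_steps, rng=random.Random(0),
    )
    print(x_bar)
```

Each update uses one observed event only; the expectation $\phi$ is never evaluated, which is how the method avoids the cost of multidimensional integration.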

SLIDE 14

What do we have now?

After $N$ steps we observe a single realization of the random variable $\bar{x}$ with

$$E(\phi(\bar{x})) - \phi^* \le \frac{LR}{\sqrt{N+1}}.$$

What about the level of confidence?

  1. For random $\psi \ge 0$ and $T > 0$ we have

$$E(\psi) = \int \psi = \int_{\psi \ge T} \psi + \int_{\psi < T} \psi \ge T \cdot \mathrm{Prob}[\psi \ge T].$$

  2. With $\psi = \phi(\bar{x}) - \phi^*$ and $T = \epsilon V_\phi$ we need

$$\frac{1}{\epsilon V_\phi} [E(\phi(\bar{x})) - \phi^*] \le \frac{LR}{\epsilon V_\phi \sqrt{N+1}} \le 1 - \beta.$$

Thus, we can take $N + 1 = \frac{1}{\epsilon^2 (1-\beta)^2} \left( \frac{LR}{V_\phi} \right)^2$.

NB:
  1. For personal needs, this may be OK. What about $\beta \to 1$?
  2. How do we increase the confidence level in our life? Ask as many persons as we can for advice!
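To see why $\beta \to 1$ is a problem for a single expert, here is a worked instance of the bound on $N + 1$. The values are hypothetical, chosen only for illustration: $\frac{LR}{V_\phi} = 1$ and $\epsilon = 0.1$.

```latex
\beta = 0.9 \;\Rightarrow\; N + 1 = \frac{1}{(0.1)^2 (0.1)^2} = 10^4, \qquad
\beta = 0.99 \;\Rightarrow\; N + 1 = \frac{1}{(0.1)^2 (0.01)^2} = 10^6, \qquad
\beta = 0.999 \;\Rightarrow\; N + 1 = \frac{1}{(0.1)^2 (0.001)^2} = 10^8 .
```

Each additional 9 in $\beta$ multiplies the required length of the learning process by 100.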

SLIDE 15

Pooling the experience

Individual learning process (forms the opinion of one expert):

Choose $x_0 \in Q$ and $h > 0$. For $k = 0, \dots, N$ repeat: generate $\xi_k$, and set $x_{k+1} = \pi_Q(x_k - h f'(x_k, \xi_k))$. Compute $\bar{x} = \frac{1}{N+1} \sum_{k=0}^{N} x_k$.

Pool the experience:

For $j = 1, \dots, K$ compute $\bar{x}_j$. Generate the output $\hat{x} = \frac{1}{K} \sum_{j=1}^{K} \bar{x}_j$.

Note: All learning processes start from the same $x_0$.
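The two stages can be sketched as follows in Python. The test problem is an assumption made for this illustration: $f(x, \xi) = \sum_i |x_i - \xi_i|$ with each $\xi_i$ uniform on an interval of half-width 1 around a hypothetical `center`, and $Q$ a box $[-3, 3]^n$. The names `expert_opinion` and `pool_experience` are likewise hypothetical.

```python
import math
import random


def expert_opinion(x0, h, n_steps, center, rng):
    """One learning process for f(x, xi) = sum_i |x_i - xi_i| (an assumed test problem)."""
    x = list(x0)
    total = [0.0] * len(x)
    for _ in range(n_steps + 1):
        total = [t + v for t, v in zip(total, x)]
        xi = [c + rng.uniform(-1.0, 1.0) for c in center]    # random event xi_k
        g = [(v > s) - (v < s) for v, s in zip(x, xi)]       # subgradient of f(., xi_k)
        x = [min(max(v - h * gi, -3.0), 3.0) for v, gi in zip(x, g)]  # projection onto the box
    return [t / (n_steps + 1) for t in total]


def pool_experience(num_experts, x0, h, n_steps, center, rng):
    """Average the opinions of independent experts that all start from the same x0."""
    opinions = [expert_opinion(x0, h, n_steps, center, rng) for _ in range(num_experts)]
    return [sum(col) / num_experts for col in zip(*opinions)]


if __name__ == "__main__":
    rng = random.Random(42)
    # 20 experts, each living for 501 steps; the minimizer of phi is the center (1, -1).
    print(pool_experience(20, [0.0, 0.0], 1.0 / math.sqrt(501), 500, [1.0, -1.0], rng))
```

The experts share only the starting point; their random events are independent, which is the property the probabilistic analysis relies on.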

SLIDE 16

Probabilistic analysis

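Theorem. Let $Z_j \in [0, V]$, $j = 1, \dots, K$, be independent random variables with the same average $\mu$. Then for $\hat{Z}_K = \frac{1}{K} \sum_{j=1}^{K} Z_j$,

$$\mathrm{Prob}\left[ \hat{Z}_K \ge \mu + \hat{\epsilon} \right] \le \exp\left( -\frac{2 \hat{\epsilon}^2 K}{V^2} \right).$$

Corollary. Let us choose $K = \frac{2}{\epsilon^2} \ln \frac{1}{1-\beta}$, $N = \frac{4}{\epsilon^2} \left( \frac{LR}{V_\phi} \right)^2$, and $h = \frac{R}{L\sqrt{N+1}}$. Then the pooling process implements an $(\epsilon, \beta)$-solution.

Note: Each 9 in $\beta = 0.9 \cdots 9$ costs $\frac{4.6}{\epsilon^2}$ experts.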
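The figure 4.6 comes from $2 \ln 10 \approx 4.605$: each additional 9 divides $1 - \beta$ by 10 and therefore adds $\ln 10$ to $\ln \frac{1}{1-\beta}$. A small Python check of the formula for $K$ follows; the function names `experts_needed` and `cost_per_nine` are hypothetical, and rounding $K$ up to an integer is an assumption of this sketch.

```python
import math


def experts_needed(eps, beta):
    """K = (2 / eps^2) * ln(1 / (1 - beta)), rounded up to a whole number of experts."""
    return math.ceil(2.0 / eps ** 2 * math.log(1.0 / (1.0 - beta)))


def cost_per_nine(eps):
    """Extra experts needed for each additional 9 in beta: (2 ln 10) / eps^2."""
    return 2.0 * math.log(10.0) / eps ** 2


if __name__ == "__main__":
    for beta in (0.9, 0.99, 0.999):
        print(beta, experts_needed(0.5, beta))
    print(cost_per_nine(0.5))
```

The number of experts grows only logarithmically in $\frac{1}{1-\beta}$, in contrast to the quadratic growth of $N$ for a single expert.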

SLIDE 17

Comparison ($\epsilon$ is not too small ≡ $Q$ is reasonable)

Denote $\rho = \frac{LR}{V_\phi}$.

                         Single Expert (SE)                          Pooling Experience (PE)
Number of experts        $1$                                         $\frac{2}{\epsilon^2} \ln \frac{1}{1-\beta}$
Length of life           $\frac{\rho^2}{\epsilon^2 (1-\beta)^2}$     $\frac{4 \rho^2}{\epsilon^2}$
Computational efforts    $\frac{\rho^2}{\epsilon^2 (1-\beta)^2}$     $\frac{8 \rho^2}{\epsilon^4} \ln \frac{1}{1-\beta}$

  • Reasonable computational expenses (for Multi-D Integrals).
  • Number of experts does not depend on dimension.

Differences:
  • For low level of confidence, SE may be enough.
  • High level of confidence needs independent expertise.
  • Average experience of a young population has a much higher level of confidence than the experience of a long-life wizard.
  • In PE, the confidence level of “experts” is only $\frac{1}{2}$ (!).

SLIDE 18

Why can this be useful?

  • Understanding of the actual role of existing social and political phenomena (education, media, books, movies, theater, elections, etc.)
  • Future changes (Internet, telecommunications)
  • Development of new averaging instruments (Theory of expertise: mixing opinions of different experts, competitions, etc.)

SLIDE 19

Conscious versus subconscious

NB: Conscious behavior can be irrational. Subconscious behavior is often rational.

  • Animals.
  • Children's education: the first level of knowledge is subconscious.
  • Training in sport (optimal technique ⇒ subconscious level).

Examples of subconscious estimates:
  • Mental “image processing”.
  • Tracking the position of your body in space.
  • Regular checking of your status in society (?)

Our model: Conscious behavior based on dynamically updated subconscious estimates.

SLIDE 20

Model of consumer: What is easy for us?

Question 1: 123 ∗ 456 = ?
Question 2: How often does it rain in Belgium?

Easy questions: average salary, average gas consumption of your car, average consumption of different foods, average commuting time, and many other (survey-type) questions.

Main abilities of anybody:
  1. Remember past experience (often by averages).
  2. Estimate probabilities of some future events, taking into account their frequencies in the past.

Guess: We are Statistical Homo Economicus? (SHE)

SLIDE 21

Main features of SHE

Main passion: Observations.

Main abilities:
  • Can select the best variant from several possibilities.
  • Can compute average characteristics for some actions.
  • Can compute frequencies of some events in the past.
  • Can estimate the “fair” prices for products.

As compared with HE: A huge step back in computational power and informational support.

Theorem: SHE can be rational. (The proof is constructive.)

SLIDE 22

Consumption model

Market

  • There are $n$ products with unit prices $p_j$.
  • Each product is described by the vector of qualities $a_j \in R^m$. Thus, $a_j^{(i)}$ is the volume of quality $i$ in a unit of product $j$.

Consumer SHE

  • Forms and updates the personal prices $y \in R^m$ for qualities.
  • Can estimate the personal quality/price ratio for product $j$: $\pi_j(y) = \frac{1}{p_j} \langle a_j, y \rangle$.
  • Has a standard $\sigma_i$ for consumption of quality $i$, with $\sum_{i=1}^{m} \sigma_i y_i = 1$.

Denote $A = (a_1, \dots, a_n)$, $\sigma = (\sigma_1, \dots, \sigma_m)^T$, $\pi(y) = \max_{1 \le j \le n} \pi_j(y)$.

SLIDE 23

Consumption algorithm (CA) for kth weekend

For Friday night, SHE has personal prices $y_k$, budget $\lambda_k$, and cumulative consumption vector of qualities $s_k \in R^m$, $s_0 = 0$.

  1. Define the set $J_k = \{j : \pi_j(y_k) = \pi(y_k)\}$, containing the products with the best quality/price ratio.
  2. Form a partition $x_k \ge 0$: $\sum_{j=1}^{n} x_k^{(j)} = 1$, and $x_k^{(j)} = 0$ for $j \notin J_k$.
  3. Buy all products in volumes $X_k^{(j)} = \lambda_k \cdot x_k^{(j)} / p_j$, $j = 1, \dots, n$.
  4. Consume the bought products: $s_{k+1} = s_k + A X_k$.
  5. During the next week, SHE watches the results and forms the personal prices for the next shopping.

NB: Only Item 5 is not defined.
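Items 1 to 4 of the algorithm can be sketched as one Python function. Two choices here are assumptions of the illustration, not part of the slide: the budget is split equally among the best products (the slide allows any partition supported on $J_k$), and ties are detected with a small tolerance `tol`. The name `consumption_step` is hypothetical.

```python
def consumption_step(A, p, y, budget, s, tol=1e-12):
    """One weekend of CA. A[j] is the quality vector a_j of product j."""
    n = len(p)
    # Quality/price ratios pi_j(y) = <a_j, y> / p_j.
    ratios = [sum(a * yi for a, yi in zip(A[j], y)) / p[j] for j in range(n)]
    best = max(ratios)
    J = [j for j in range(n) if best - ratios[j] <= tol]       # item 1: best products
    x = [1.0 / len(J) if j in J else 0.0 for j in range(n)]    # item 2: assumed equal split
    X = [budget * x[j] / p[j] for j in range(n)]               # item 3: purchased volumes
    # Item 4: s_{k+1} = s_k + A X_k.
    s_next = [s[i] + sum(A[j][i] * X[j] for j in range(n)) for i in range(len(s))]
    return J, X, s_next


if __name__ == "__main__":
    A = [[1.0, 0.0], [2.0, 2.0], [0.0, 1.0]]   # three products, two qualities
    p = [1.0, 2.0, 1.0]                        # unit prices
    y = [0.5, 0.5]                             # personal prices for the qualities
    print(consumption_step(A, p, y, 10.0, [0.0, 0.0]))
```

The only operations are comparing ratios, picking the best ones and adding up what was bought, which matches the abilities attributed to SHE.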

SLIDE 24

Updating the personal prices for qualities

Define $\xi_i = \sigma_i y_k^{(i)}$, the relative importance of quality $i$, $\sum_{i=1}^{m} \xi_i = 1$. Denote by $\hat{s}_k = \frac{1}{k} s_k$ the average consumption.

Assumption.
  1. During the week, SHE performs regular detections of the most deficient quality by computing $\psi_k = \min_{1 \le i \le m} \hat{s}_k^{(i)} / \sigma_i$.
  2. This detection is done with random additive errors. Hence, we observe

$$\min_{1 \le i \le m} \left\{ \frac{\hat{s}_k^{(i)}}{\sigma_i} + \epsilon_i \right\}.$$

     Thus, any quality has a chance to be detected as the worst one.
  3. We define $\xi_i$ as the frequency of detecting the quality $i$ as the most deficient one with respect to $\hat{s}_k$.

This is it. Where is Optimization? Objective Function, etc.?

SLIDE 25

Algorithmic aspects

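  1. If $\epsilon_i$ are i.i.d. doubly-exponential random variables with variance $\mu$, then

$$y_k^{(i)} = \frac{1}{\sigma_i} \exp\left\{ -\frac{s_k^{(i)}}{k \sigma_i \mu} \right\} \Big/ \sum_{j=1}^{m} \exp\left\{ -\frac{s_k^{(j)}}{k \sigma_j \mu} \right\}.$$

     Therefore,

$$y_k = \arg\min_{\langle \sigma, y \rangle = 1} \left\{ \langle s_k, y \rangle + \gamma d(y) \right\},$$

     where $\gamma = k\mu$ and $d(y) = \sum_{i=1}^{m} \sigma_i y^{(i)} \ln(\sigma_i y^{(i)})$ (prox-function).
  2. $A X_k = \lambda_k A \left[ \frac{x_k}{p} \right] \equiv \lambda_k g_k$, where $g_k \in \partial \pi(y_k)$ (subgradient).
  3. Hence, $s_k$ is an accumulated model of the function $\pi(y)$.

Hence, CA is a primal-dual method for solving the (dual) problem

$$\min_{y \ge 0} \left\{ \pi(y) \equiv \max_{1 \le j \le n} \frac{1}{p_j} \langle a_j, y \rangle \; : \; \langle \sigma, y \rangle = 1 \right\}.$$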
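The closed form for the personal prices is a softmax over the scaled average consumptions. A minimal Python sketch follows; subtracting the largest exponent before exponentiating is a numerical-stability choice of this illustration that leaves the ratio unchanged, and the name `personal_prices` is hypothetical.

```python
import math


def personal_prices(s, sigma, k, mu):
    """y_k^(i) = exp(-s_i/(k sigma_i mu)) / (sigma_i * sum_j exp(-s_j/(k sigma_j mu)))."""
    exponents = [-si / (k * sg * mu) for si, sg in zip(s, sigma)]
    shift = max(exponents)                        # subtract the max for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    xi = [w / total for w in weights]             # relative importances xi_i, summing to 1
    return [x / sg for x, sg in zip(xi, sigma)]   # y_i = xi_i / sigma_i


if __name__ == "__main__":
    # Quality 2 is consumed far below its standard, so its price dominates.
    print(personal_prices([4.0, 1.0], [1.0, 2.0], 1, 1.0))
```

The output always satisfies $\langle \sigma, y \rangle = 1$, and the quality with the smallest $s_k^{(i)} / \sigma_i$ receives the largest relative importance $\sigma_i y_k^{(i)}$.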
SLIDE 26

Comments

  1. The primal problem is

$$\max_{u, \tau} \{ \tau : A u \ge \tau \sigma, \; u \ge 0, \; \langle p, u \rangle = 1 \}.$$

     We set $u_k = [x_k / p]$; $u^*$ can be approximated by averaging $\{u_k\}$.
  2. No “computation” of subgradients (we just buy). The model is updated implicitly (we just eat).
  3. CA is an example of unintentional optimization. (Other examples in nature: Fermat principle, etc.)
  4. SHE does not recognize the objective. However, it exists. SHE is rational by behavior, not by the goal (which is absent?).
  5. Function $\pi(y)$ measures the positive appreciation of the market. By minimizing it, we develop a pessimistic vision of the world. (With time, everything becomes expensive.)
  6. For a better life, allow a bit of irrationality. (Smooth objective, faster convergence.)

SLIDE 27

Conclusion

  1. Optimization patterns are widely present in social life. Examples:
       • Forming the traditions (Inaugural Lecture).
       • Efficient collaboration between industry, science and government (Lecture 1).
       • Local actions in problems of unlimited size (Lecture 3).
  2. The winning social systems give better possibilities for rational behavior of people. (Forget about ants and bees!)
  3. Our role could be discovering such patterns and helping to improve them by an appropriate mathematical analysis.

SLIDE 28

References

Lecture 1: Intrinsic complexity of Black-Box Optimization
  • Yu. Nesterov. Introductory Lectures on Convex Optimization. Chapters 2, 3. Kluwer, Boston, 2004.
  • Yu. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence $O(\frac{1}{k^2})$. Doklady AN SSSR (translated as Soviet Math. Dokl.), 1983, v.269, No. 3, 543-547.

Lecture 2: Looking into the Black Box
  • Yu. Nesterov. “Smooth minimization of non-smooth functions”. Mathematical Programming (A), 103 (1), 127-152 (2005).
  • Yu. Nesterov. “Excessive gap technique in nonsmooth convex minimization”. SIAM J. Optim. 16 (1), 235-249 (2005).
  • Yu. Nesterov. Gradient methods for minimizing composite functions. Accepted by Mathematical Programming.

SLIDE 29

References

Lecture 3: Huge-scale optimization problems
  • Yu. Nesterov. Efficiency of coordinate descent methods on large scale optimization problems. Accepted by SIAM.
  • Yu. Nesterov. Subgradient methods for huge-scale optimization problems. CORE DP 2012/02.

Lecture 4: Nonlinear analysis of combinatorial problems
  • Yu. Nesterov. Semidefinite Relaxation and Nonconvex Quadratic Optimization. Optimization Methods and Software, vol.9, 1998, pp.141–160.
  • Yu. Nesterov. Simple bounds for boolean quadratic problems. EUROPT Newsletters, 18, 19-23 (December 2009).

SLIDE 30

References

Lecture 5:
  • Yu. Nesterov, J.-Ph. Vial. Confidence level solutions for stochastic programming. Automatica, 44(6), 1559-1568 (2008).
  • Yu. Nesterov. Algorithmic justification of intuitive rationality in consumer behavior. CORE DP.

SLIDE 31

Thank you for your attention!
