System Analysis and Optimizations of Human Actuated Dynamical Systems - PowerPoint PPT Presentation



SLIDE 1

System Analysis and Optimizations of Human Actuated Dynamical Systems

Sangjae Bae, Sang Min Han, Scott J. Moura July 5, 2018

  • S. Bae et al (UC Berkeley)

2018 TBSI ene2XX July 5, 2018 1 / 35

SLIDE 2

Human Actuated Dynamical Systems (HADS)? Dynamical systems where the system inputs are induced by human behaviours.

In such systems, we cannot directly control human behaviours, and therefore cannot directly control the system inputs either. Still, human behaviours can be “encouraged” with (price) incentives.

SLIDE 3

Human Actuated Dynamical Systems (HADS)?

SLIDE 4

Problem Statement

With the emerging importance of smart cities, we need an effective system modeling framework to address human behaviors that can be encouraged by incentives. Specifically, we are trying to answer:

1. How can we model human behaviours with dynamical systems?
2. How can we incentivize human actuators toward desired behaviors for a system-wide benefit?

SLIDE 5

Overview

1. Literature

2. System Modeling and Optimization of HADS
  • System modeling with Discrete Choice Model
  • Convex Optimization Framework
  • Application: Demand Response

3. Appendix: Proofs of Theorem and Propositions



SLIDE 10

Literature

In the existing literature:

  • Human behaviors are treated as noise/disturbances [Arnold, 2013, Gray, 2004, Maruyama, 1955].
  • Human behaviors are used to improve system performance [Leeper, 2012].
  • Optimal behaviors are found for an individual human actuator [Mnih, 2015].

We add a new perspective: desired human behaviors are encouraged by incentives [Bae, 2018].

SLIDE 11

Overview

1. Literature

2. System Modeling and Optimization of HADS
  • System modeling with Discrete Choice Model
  • Convex Optimization Framework
  • Application: Demand Response

3. Appendix: Proofs of Theorem and Propositions

SLIDE 12

Discrete Choice Model (DCM)

Figure: Example of discrete choices: choosing a laptop

SLIDE 13

Discrete Choice Model (DCM)

U_j := β_j^⊤ z_j + γ_j^⊤ w_j + β_{0j} + ε_j,   (1)

where
  • U_j: utility of the j-th alternative, j ∈ {1, ..., J}
  • β_j: parameters of the controlled attributes
  • z_j: controlled attributes
  • γ_j: parameters of the uncontrolled attributes
  • w_j: uncontrolled attributes
  • β_{0j}: alternative-specific constant
  • ε_j: unobserved error

SLIDE 14

Probability of Alternatives in DCM

Logit model

Assuming the unobserved errors ε_j are i.i.d. extreme-value distributed, the probability of choosing the j-th alternative is [Train, 2009]:

Pr( u(k) = u_j ) = Pr( ∩_{i≠j} (U_j > U_i) ) = e^{V_j} / Σ_{i=1}^{J} e^{V_i},   (2)

where V_j := β_j^⊤ z_k + γ_j^⊤ w_k + β_{0j}.
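The logit probability (2) is simply a softmax over the representative utilities V_j. A minimal numerical sketch; all attribute values and parameters below are made-up placeholders, not values from the slides:

```python
import numpy as np

def choice_probabilities(beta, z, gamma, w, beta0):
    """Logit model (2): P(j) = exp(V_j) / sum_i exp(V_i),
    with V_j = beta_j^T z + gamma_j^T w + beta0_j."""
    V = np.array([b @ z + g @ w + b0 for b, g, b0 in zip(beta, gamma, beta0)])
    V -= V.max()                 # shift for numerical stability (softmax is shift-invariant)
    expV = np.exp(V)
    return expV / expV.sum()

# Two alternatives (J = 2) with scalar attributes -- illustrative numbers only.
beta  = [np.array([0.5]), np.array([1.2])]   # weights on controlled attributes z
gamma = [np.array([0.1]), np.array([-0.3])]  # weights on uncontrolled attributes w
beta0 = [0.0, 0.2]                           # alternative-specific constants
p = choice_probabilities(beta, np.array([1.0]), gamma, np.array([2.0]), beta0)
assert np.isclose(p.sum(), 1.0) and (p > 0).all()
```

The max-shift before exponentiating avoids overflow for large utilities without changing the probabilities.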

SLIDE 15

Probability of Alternatives in DCM

Figure: Binary Logit model example

SLIDE 17

Dynamical Systems with DCM

Discrete-time linear system:

x(k + 1) = A x(k) + B(k) u(k),   (3)

Figure: Block diagram of Dynamical Systems with DCM
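To make (3) concrete, here is a small simulation sketch in which the input u(k) is sampled from a binary logit model; the scalar system and every parameter value are illustrative assumptions, not the slides' case study:

```python
import numpy as np

rng = np.random.default_rng(0)

a, T = 0.9, 50                    # scalar A = a and horizon, chosen for illustration
b = np.ones(T)                    # B(k) = 1 for every k
u_vals = np.array([0.0, 1.0])     # the two alternatives' input values
V = np.array([0.0, 1.0])          # representative utilities V_0, V_1 (assumed constant)
p = np.exp(V - V.max()); p /= p.sum()   # logit choice probabilities, as in (2)

x = np.zeros(T + 1)
for k in range(T):
    u = rng.choice(u_vals, p=p)   # the human actuator picks an alternative
    x[k + 1] = a * x[k] + b[k] * u

# With |a| < 1 and u in [0, 1], the state stays below sum_i a^i < 1/(1-a).
assert (x <= 1.0 / (1 - a)).all()
```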

SLIDE 20

Objective Function

Consider the mean dynamics (with one human actuator):

x̄(k + 1) = A x̄(k) + B(k) Σ_{j=1}^{J} u_j e^{V_j} / Σ_{i=1}^{J} e^{V_i},   (4)

The objective function:

f(Z) = Σ_{k=1}^{T} x̄(k),   (5)

Assume A = a ∈ ℝ and B(k) = b(k) ∈ ℝ ∀k:

f(Z) = Σ_{k=1}^{T} Σ_{m=0}^{k−1} a^{k−m−1} b(m) [ Σ_{i=1}^{J} e^{V_i} u_i(m) / Σ_{j=1}^{J} e^{V_j} ],   (6)

where V_j := β_j^⊤ z_m + γ_j^⊤ w_m + β_{0j}.
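As a sanity check, (6) can be evaluated directly for J = 2. The lumped constants `const` below stand in for γ_j^⊤ w_m + β_{0j}, and every number is a placeholder:

```python
import numpy as np

def f_of_Z(Z, a, b, u_vals, beta, const):
    """Objective (6): f(Z) = sum_k sum_{m<k} a^{k-m-1} b(m) E[u(m)],
    with the expectation taken under logit choice probabilities."""
    T = len(Z)
    V = np.outer(Z, beta) + const                    # V[m, j] = beta_j z_m + c_j
    P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)
    ubar = P @ u_vals                                # expected input at each step m
    return sum(a ** (k - m - 1) * b[m] * ubar[m]
               for k in range(1, T + 1) for m in range(k))

# Illustrative values: two alternatives with u in {0, 1}.
T = 5
val = f_of_Z(np.ones(T), a=0.9, b=np.ones(T), u_vals=np.array([0.0, 1.0]),
             beta=np.array([1.0, -1.0]), const=np.array([0.0, 0.5]))
assert val >= 0.0    # expected inputs lie in [0, 1], so f is nonnegative here
```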

SLIDE 21

Convexity Constraint

Consider the simpler case where
  • the number of alternatives is J = 2, i.e. u_i(m) ∈ {0, 1}, and
  • the decision variables z_m are scalars, i.e., z_m ∈ ℝ.

Theorem (Constraint enforcing convexity)

Minimizing the objective function f(Z) in (6) with respect to z_m is a convex optimization problem if

z_m (β_{m0} − β_{m1}) ≥ γ_{m1} − γ_{m0},   u_0(m) = 0,   u_1(m) = 1,

where β_{m0} and β_{m1} are the parameters of z_m in U_{m0} and U_{m1}, respectively, and γ_{m0} and γ_{m1} are the parameters of w_m in U_{m0} and U_{m1}, respectively. (Proof in Appendix.)

SLIDE 22

Projected Gradient Descent Algorithm

Proposition (Gradient of expected sum)

For Z ∈ ℝ^{T×L}, the gradient of f(Z) is

∂f(Z)/∂Z = [ ∂f(Z)/∂z_0, ∂f(Z)/∂z_1, · · · , ∂f(Z)/∂z_{T−1} ]^⊤,   (7)

where

∂f(Z)/∂z_m = − ( Σ_{i=0}^{T−m−1} a^i ) b(m) β̃_m e^{z_m β̃_m + γ̃_m} / ( 1 + e^{z_m β̃_m + γ̃_m} )²,   (8)

with β̃_m := β_{m0} − β_{m1} and γ̃_m := γ_{m0} − γ_{m1} for every m = 0, . . . , T − 1. (Proof in Appendix.)

The projected gradient descent finds the optima.
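A sketch of projected gradient descent using (8) for J = 2 under the convexity constraint. To keep the iterates from growing without bound, the sketch also adds the gradient of a quadratic incentive penalty λ‖Z‖², mirroring the regularizer used later in (10); all parameter values are invented for illustration:

```python
import numpy as np

T, a, lam = 10, 0.9, 1.0
b = np.ones(T)
beta_t = np.full(T, 1.5)       # beta_tilde_m = beta_m0 - beta_m1 (assumed > 0)
gamma_t = np.full(T, -0.5)     # gamma_tilde_m = gamma_m0 - gamma_m1
z_lo = -gamma_t / beta_t       # convexity constraint: z_m * beta_t >= -gamma_t

def grad(Z):
    """Gradient (8) of f, plus the gradient of lam * ||Z||^2."""
    w = (1 - a ** (T - np.arange(T))) / (1 - a)   # sum_{i=0}^{T-m-1} a^i
    e = np.exp(Z * beta_t + gamma_t)
    return -w * b * beta_t * e / (1 + e) ** 2 + 2 * lam * Z

Z = np.maximum(np.ones(T), z_lo)                  # feasible starting point
for _ in range(1000):
    Z = np.maximum(Z - 0.05 * grad(Z), z_lo)      # gradient step, then project

assert (Z * beta_t >= -gamma_t - 1e-9).all()      # iterate stays feasible
```

The projection onto the constraint set is just a clip to the per-step lower bound, since each constraint is affine in z_m with β̃_m > 0.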

SLIDE 23

Application: Demand Response

Problem: the System Operator (SO) cannot enforce DR participation (participate: u_n(k) = 0; do not participate: u_n(k) = 1). Still, the SO offers price compensations to encourage participation. We find: optimal price compensations z_k that minimize power loads.

Figure: Smart home example

SLIDE 25

System Dynamics

Consider the mean dynamics,

x̄(k + 1) = x̄(k) + B(k)^⊤ ū(k),   (9)

where
  • x̄(k) ∈ ℝ: expected cumulative power consumption,
  • B(k) ∈ ℝ^N: vector of power loads,
  • ū(k) ∈ ℝ^N: vector of expected choices.

The choice of each participant is binary: participate (u_n(k) = 0) or do not participate (u_n(k) = 1). We assume knowledge of each participant's power load and discrete choice model parameters.

SLIDE 26

Formulating Optimization Problem

min_{Z ∈ ℝ^T}   Σ_{k=1}^{T} Σ_{m=0}^{k−1} B(m)^⊤ ū(m) + λ ‖Z‖²   (10)

s.t.  x(0) = x_0 = 0,

      B(m) = [ b_1(m)  b_2(m)  · · ·  b_N(m) ]^⊤ ∈ ℝ^{N×1},   (11)

      ū(m) = [ ū_1(m), ū_2(m), · · · , ū_N(m) ]^⊤ ∈ ℝ^{N×1},   (12)

      ū_n(m) = e^{β^{(n)}_{m1} z_m + γ^{(n)}_{m1}} / ( e^{β^{(n)}_{m0} z_m + γ^{(n)}_{m0}} + e^{β^{(n)}_{m1} z_m + γ^{(n)}_{m1}} ),   (13)

      z_m ( β^{(n)}_{m0} − β^{(n)}_{m1} ) ≥ γ^{(n)}_{m1} − γ^{(n)}_{m0},   (14)

where Z = [z_0, z_1, · · · , z_{T−1}]^⊤, b_n(m) is the power load of the n-th participant at time-step m, and λ is the regularization parameter penalizing price compensations, i.e. control effort.
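A runnable sketch of (10)-(14) with synthetic parameters (the slides sample the exogenous attributes from a Gaussian but report no numbers, so everything below is invented). The gradient is approximated by central finite differences rather than the closed form, purely to keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(1)

T, N, lam = 10, 5, 10.0
b = rng.uniform(5.0, 30.0, size=(T, N))      # power loads b_n(m), kW
beta0 = rng.normal(1.0, 0.1, size=(T, N))    # beta^(n)_{m0}
beta1 = rng.normal(-1.0, 0.1, size=(T, N))   # beta^(n)_{m1}, so beta0 - beta1 > 0
gamma0 = rng.normal(0.0, 0.5, size=(T, N))   # gamma^(n)_{m0}
gamma1 = rng.normal(0.0, 0.5, size=(T, N))   # gamma^(n)_{m1}

def ubar(Z):
    """Expected choice (13): probability of NOT participating (u_n = 1)."""
    e0 = np.exp(beta0 * Z[:, None] + gamma0)
    e1 = np.exp(beta1 * Z[:, None] + gamma1)
    return e1 / (e0 + e1)

def objective(Z):
    """Cost (10): cumulative expected non-complied load plus incentive penalty."""
    load = (b * ubar(Z)).sum(axis=1)         # B(m)^T ubar(m) for each m
    return np.cumsum(load).sum() + lam * (Z @ Z)

z_lo = ((gamma1 - gamma0) / (beta0 - beta1)).max(axis=1)   # constraint (14), all n
Z = np.maximum(np.ones(T), z_lo)             # feasible starting point
start = objective(Z)
h = 1e-5
for _ in range(300):
    g = np.array([(objective(Z + h * np.eye(T)[m]) -
                   objective(Z - h * np.eye(T)[m])) / (2 * h) for m in range(T)])
    Z = np.maximum(Z - 1e-3 * g, z_lo)       # projected gradient step

assert (Z >= z_lo - 1e-9).all()
assert objective(Z) <= start
```

With a small enough step size the projected iteration decreases the cost monotonically on the convex feasible region.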

SLIDE 27

Simulation Result

Figure: Optimal price compensation with total operating hours T = 10 and number of participants N = 5. Exogenous attributes w_{mi}, i ∈ {0, 1}, were sampled randomly from a Gaussian. (Plot: average reducible power (kW) and price compensation ($/kW) versus operating hour, 100 trials.)

SLIDE 28

Simulation Result

Figure: Trade-off between the sum of expected power load and price compensation (100 trials). The compensation penalty parameter was set to λ = 10. (Plot: sum of expected non-complied power loads versus price compensation.)
SLIDE 31

Conclusion

Contributions:
  • A system modeling framework for dynamical systems with a discrete choice model.
  • Formulation of a convex optimization problem for controlling the system.

Limitations:
  • The convexity constraints confine the optimal solutions to the constrained domain.
  • Collecting a reasonable amount of data to model human behavior can be quite difficult.

Related study: S. Bae, S. M. Han, S. J. Moura, “Modeling and Control of Human Actuated Systems”, submitted to the IFAC Conference on Cyber-Physical and Human Systems, 2018.

Contact: sangjae.bae@berkeley.edu

SLIDE 32

Overview

1. Literature

2. System Modeling and Optimization of HADS
  • System modeling with Discrete Choice Model
  • Convex Optimization Framework
  • Application: Demand Response

3. Appendix: Proofs of Theorem and Propositions

SLIDE 33

Proof of Theorem: Convexity Constraints

In the binomial logit model, the expected value of the input random variable ū(m) is

ū(m) = e^{β_{m1} z_m + γ_{m1}} / ( e^{β_{m0} z_m + γ_{m0}} + e^{β_{m1} z_m + γ_{m1}} ).   (15)

Define β̃_m := β_{m0} − β_{m1} and γ̃_m := γ_{m0} − γ_{m1}. Then the first derivative of ū(m) can be written

∂ū(m)/∂z_m = − β̃_m e^{z_m β̃_m + γ̃_m} / ( 1 + e^{z_m β̃_m + γ̃_m} )²,   (16)

and the second derivative of ū(m) can be written

∂²ū(m)/∂z_m² = [ β̃_m² e^{z_m β̃_m + γ̃_m} / ( 1 + e^{z_m β̃_m + γ̃_m} )² ] · [ 2 e^{z_m β̃_m + γ̃_m} / ( 1 + e^{z_m β̃_m + γ̃_m} ) − 1 ].   (17)

If ∂²ū(m)/∂z_m² ≥ 0, then ū(m) is convex in z_m.

SLIDE 34

Proof of Theorem (cont’d)

Observe that

∂²ū(m)/∂z_m² ≥ 0  ⟺  2 e^{z_m β̃_m + γ̃_m} / ( 1 + e^{z_m β̃_m + γ̃_m} ) − 1 ≥ 0   (18)

if and only if

e^{z_m β̃_m + γ̃_m} ≥ 1  ⟺  z_m β̃_m ≥ −γ̃_m.   (19)

Therefore, the necessary and sufficient condition for convexity is

z_m ( β_{m0} − β_{m1} ) ≥ γ_{m1} − γ_{m0}.   (20)

The objective function f(Z) in (6) is convex with respect to z_m under the above constraint because a non-negative weighted sum of convex functions is convex. Moreover, the set of constraints (20) forms a convex set since it is affine in z_m. Consequently, we have a convex program and the proof is complete.
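A quick numerical spot-check of (15)-(20): on the feasible region z β̃ ≥ −γ̃ the second derivative (17) is nonnegative. The parameter values are arbitrary test points:

```python
import numpy as np

beta_t, gamma_t = 1.5, -0.5              # beta_tilde, gamma_tilde (assumed values)

def ubar(z):
    """Expected input (15), rewritten as 1 / (1 + exp(z*beta_t + gamma_t))."""
    return 1.0 / (1.0 + np.exp(z * beta_t + gamma_t))

def d2_ubar(z):
    """Second derivative (17)."""
    e = np.exp(z * beta_t + gamma_t)
    return beta_t**2 * e / (1 + e)**2 * (2 * e / (1 + e) - 1)

z = np.linspace(-gamma_t / beta_t, 5.0, 200)   # feasible set: z*beta_t >= -gamma_t
assert (d2_ubar(z) >= -1e-12).all()            # convex on the feasible region

# Cross-check (17) against a central finite difference of (15).
h = 1e-4
fd = (ubar(z + h) - 2 * ubar(z) + ubar(z - h)) / h**2
assert np.allclose(fd, d2_ubar(z), atol=1e-5)
```

At the boundary z β̃ = −γ̃ the exponential equals 1 and the second derivative vanishes, exactly as (18)-(19) predict.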

SLIDE 35

Proposition 1: Wide-sense stationarity

Proposition (Wide-sense stationarity)

Assume the inputs u(k) are i.i.d. In the limit k → ∞, the states x(k) are wide-sense stationary if |λmax(A)| < 1, where λmax(A) denotes the eigenvalue of A with the largest magnitude.

SLIDE 36

Proof of Proposition 1: Wide-sense stationarity

Without loss of generality, let A = a and B = b, with a, b ∈ ℝ. The system is said to be wide-sense stationary if the first moment of the states does not vary with respect to time and the autocovariance between states at two time-steps can be written as a function of only the time difference. The closed-form, analytic solution to the discrete time-invariant dynamical system with random variable input is

x(k) = a^k x(0) + Σ_{i=0}^{k−1} a^{k−i−1} b u(i).   (21)

In the limit k → ∞, the expected value of the state (21) is

lim_{k→∞} E[x(k)] = lim_{k→∞} E[ Σ_{i=0}^{k−1} a^{k−i−1} b u(i) ] = b μ lim_{k→∞} Σ_{i=0}^{k−1} a^{k−i−1} = b μ / (1 − a),

SLIDE 37

Proof of Proposition 1 (cont’d)

where μ is the mean of u(k) over all time-steps k. This indicates that the mean is constant over time. The covariance of the analytical solution with respect to the two time-steps k_1 and k_2, k_1 ≤ k_2, is

C_xx(k_1, k_2) = Cov( b Σ_{i=0}^{k_1−1} a^{k_1−i−1} u(i),  b Σ_{j=0}^{k_2−1} a^{k_2−j−1} u(j) )
             = b² Σ_{i=0}^{k_1−1} a^{k_1+k_2−2i−2} σ² = b² a^τ σ² / (1 − a²),   (22)

where σ² is the variance of u(k) over all time-steps k, and τ is the time delay, i.e. τ := k_2 − k_1.
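A Monte Carlo sketch of Proposition 1, checking the limiting mean bμ/(1−a) and the autocovariance (22); the Bernoulli input and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

a, b = 0.8, 1.0
mu, sigma2 = 0.5, 0.25                  # Bernoulli(0.5): mean 0.5, variance 0.25
K, trials = 200, 20000
u = rng.binomial(1, mu, size=(trials, K)).astype(float)

x = np.zeros((trials, K + 1))           # simulate x(k+1) = a x(k) + b u(k)
for k in range(K):
    x[:, k + 1] = a * x[:, k] + b * u[:, k]

tau = 3                                 # time delay for the autocovariance
emp_mean = x[:, K].mean()
emp_cov = np.cov(x[:, K], x[:, K - tau])[0, 1]
assert abs(emp_mean - b * mu / (1 - a)) < 0.05              # b*mu/(1-a) = 2.5
assert abs(emp_cov - b**2 * a**tau * sigma2 / (1 - a**2)) < 0.05
```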

SLIDE 38

Proposition 2: Mean and variance bounds

Proposition (Maximum mean and variance bounds)

In the limit k → ∞, the absolute value of the expectation and the variance of the above system are upper-bounded by |b μ_max / (1 − a)| and b² σ²_max / (1 − a²), respectively, where μ_max is the maximum mean and σ²_max is the maximum variance of the input random variables over all time-steps k.

SLIDE 39

Proof of Proposition 2: Mean and variance bounds

In the limit k → ∞, the absolute value of the expected value of the analytic solution to the state (21) is

lim_{k→∞} |E[x(k)]| = lim_{k→∞} | E[ Σ_{i=0}^{k−1} a^{k−i−1} b u(i) ] | ≤ |b μ_max| lim_{k→∞} Σ_{i=0}^{k−1} a^{k−i−1} = | b μ_max / (1 − a) |,   (23)

where μ_max is the maximum mean of u(k) over all time-steps k.

SLIDE 40

Proof of Proposition 2 (cont’d)

In the limit k → ∞, the variance of (21) is

lim_{k→∞} Var(x(k)) = lim_{k→∞} Var( Σ_{i=0}^{k−1} a^{k−i−1} b u(i) ) = lim_{k→∞} Σ_{i=0}^{k−1} a^{2(k−i−1)} b² Var(u(i)) ≤ b² σ²_max / (1 − a²),   (24)

where the second equality follows from the independence of the random variable inputs, and σ²_max is the maximum variance of u(k) over all time-steps k.
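A Monte Carlo sketch of Proposition 2 with time-varying input statistics, so the bounds (23)-(24) are genuine inequalities; the per-step Bernoulli probabilities are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

a, b = 0.8, 1.0
K, trials = 200, 20000
p = rng.uniform(0.2, 0.7, size=K)       # time-varying Bernoulli success probabilities
mu_max = p.max()                        # maximum input mean over time
sigma2_max = (p * (1 - p)).max()        # maximum input variance over time

u = rng.binomial(1, p, size=(trials, K)).astype(float)
x = np.zeros(trials)
for k in range(K):
    x = a * x + b * u[:, k]             # x(k+1) = a x(k) + b u(k)

assert abs(x.mean()) <= abs(b * mu_max / (1 - a)) + 0.05    # bound (23)
assert x.var() <= b**2 * sigma2_max / (1 - a**2) + 0.05     # bound (24)
```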

SLIDE 41

Proof of Proposition 3: Gradient of Objective function

We show by induction that the objective function can be written

f(Z) = Σ_{m=0}^{T−1} ( Σ_{i=0}^{T−m−1} a^i ) b(m) ū(m).   (25)

For the base cases, let T = 1 and T = 2. Note that for T = 1,

f(Z) = b(0) ū(0) = Σ_{m=0}^{0} ( Σ_{i=0}^{0} a^i ) b(m) ū(m),   (26)

and for T = 2,

f(Z) = b(0) ū(0) + a b(0) ū(0) + b(1) ū(1) = (a⁰ + a¹)( b(0) ū(0) ) + a⁰ ( b(1) ū(1) ) = Σ_{m=0}^{1} ( Σ_{i=0}^{1−m} a^i ) b(m) ū(m),   (27)

where ū(m) is defined as above for all m = 0, . . . , T − 1.

SLIDE 42

Proof of Proposition 3 (cont’d)

As the induction hypothesis, assume that for T = M,

f(Z) = Σ_{m=0}^{M−1} ( Σ_{i=0}^{M−m−1} a^i ) b(m) ū(m)   (28)

holds true. Then, for T = M + 1, the horizon gains one more stage, x̄(M + 1) = Σ_{m=0}^{M} a^{M−m} b(m) ū(m); adding it to (28) and regrouping the powers of a by m gives

f(Z) = Σ_{m=0}^{M} ( Σ_{i=0}^{M−m} a^i ) b(m) ū(m),   (29)

and we are done. Because f(Z) depends on z_m only through ū(m), we can ignore all other terms that do not contain z_m and substitute the expression obtained for ∂ū(m)/∂z_m in (16) into our expression for f(Z), which yields ∂f(Z)/∂z_m for every m = 0, . . . , T − 1. This concludes the proof.
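The regrouping in (25)-(29) can be spot-checked numerically by comparing against the original double sum in (6); the ū values below are random placeholders for the logit expectations:

```python
import numpy as np

rng = np.random.default_rng(4)

T, a = 8, 0.9
b = rng.uniform(0.5, 2.0, size=T)       # b(m), arbitrary positive loads
ubar = rng.uniform(0.0, 1.0, size=T)    # stand-ins for the expected inputs

# Original double sum over k and m < k, as in (6).
direct = sum(a ** (k - m - 1) * b[m] * ubar[m]
             for k in range(1, T + 1) for m in range(k))

# Regrouped single sum over m, as in (25).
regrouped = sum(sum(a ** i for i in range(T - m)) * b[m] * ubar[m]
                for m in range(T))

assert np.isclose(direct, regrouped)
```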

SLIDE 43

References

  • L. Arnold (2013). Random Dynamical Systems. Springer Science & Business Media.

  • R. Gray, L. D. Davisson (2004). An Introduction to Statistical Signal Processing. Cambridge University Press.

  • G. Maruyama (1955). Continuous Markov processes and stochastic equations. Rendiconti del Circolo Matematico di Palermo.

  • A. E. Leeper et al. (2012). Strategies for human-in-the-loop robotic grasping. 7th ACM/IEEE International Conference on Human-Robot Interaction.

  • V. Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature.

SLIDE 44

References

  • K. E. Train (2009). Discrete Choice Methods with Simulation. Cambridge University Press.

  • S. Bae, S. Han, S. J. Moura (2018). System Analysis and Optimization of Human Actuated Dynamical Systems. 2018 American Control Conference.
