DATA EXPLORATION FOR AIRCRAFT TRAJECTORY OPTIMIZATION

Cédric Rommel

Thesis advisors: Frédéric Bonnans, Pierre Martinon
Safety Line supervisor: Baptiste Gregorutti

PhD defense, 26 October 2018

CONTEXT

MOTIVATION

FIGURE: World air traffic - source: www.flightradar24.com

20 000 airplanes in the air — 80 000 flights per day,
Expected to double by 2033.

MOTIVATION

Most polluting means of transportation,
Responsible for 3% of CO2 emissions,
Fuel ≃ 30% of an airline's operating costs.

MOTIVATION

How to tackle this problem?

1 New hardware?
2 Better use of the existing fleet.

Climb is the most fuel-consuming flight phase...
Mostly rectilinear trajectories at full thrust,
Thousands of variables recorded every second.

OPTICLIMB

TRAJECTORY OPTIMIZATION

Dynamics:

$$\dot{x}(t) = g(u(t), x(t)) + \varepsilon(t)$$

Optimization objective:

$$\int_0^{t_f} C(u(t), x(t))\,dt$$

Flight constraints:

$$\begin{cases} \Phi(x(0), x(t_f)) \in K_\Phi & \text{initial and final conditions} \\ u(t) \in U_{ad},\ x(t) \in X_{ad}, & \text{flight domain} \\ c(u(t), x(t)) \leq 0, & \text{operational path constraints} \end{cases}$$

TRAJECTORY OPTIMIZATION

Optimal Control Problem:

$$\min_{(x,u) \in \mathcal{X} \times \mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt, \quad \text{s.t.}\ \begin{cases} \dot{x}(t) = g(u(t), x(t)) + \varepsilon(t), & \text{a.e. } t \in [0, t_f], \\ \Phi(x(0), x(t_f)) \in K_\Phi, \\ u(t) \in U_{ad},\ x(t) \in X_{ad}, & \text{a.e. } t \in [0, t_f], \\ c(u(t), x(t)) \leq 0, & \text{a.e. } t \in [0, t_f]. \end{cases} \tag{OCP}$$

The black-box dynamics $g$ is identified from QAR data (System Identification), yielding an estimate $\hat{g}$ and the Approximate Optimal Control Problem:

$$\min_{(x,u) \in \mathcal{X} \times \mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt, \quad \text{s.t.}\ \begin{cases} \dot{x}(t) = \hat{g}(u(t), x(t)), & \text{a.e. } t \in [0, t_f], \\ \Phi(x(0), x(t_f)) \in K_\Phi, \\ u(t) \in U_{ad},\ x(t) \in X_{ad}, & \text{a.e. } t \in [0, t_f], \\ c(u(t), x(t)) \leq 0, & \text{a.e. } t \in [0, t_f]. \end{cases} \tag{AOCP}$$
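To make the (AOCP) formulation concrete, here is a minimal direct-transcription sketch: forward-Euler discretization with the controls at the grid nodes as decision variables. The dynamics `g_hat`, cost `C`, horizon, bounds and terminal condition are all illustrative placeholders, not the thesis models.

```python
import numpy as np
from scipy.optimize import minimize

n, tf = 50, 10.0                  # grid size and horizon (hypothetical)
dt = tf / n

def g_hat(x, u):
    # Placeholder identified dynamics: scalar state driven by the control.
    return -0.5 * x + u

def C(x, u):
    # Placeholder running cost (e.g. a fuel-flow proxy).
    return u**2 + 0.1 * x**2

def rollout(us, x0=1.0):
    # Integrate the state forward with explicit Euler for a control sequence.
    xs = np.empty(n + 1)
    xs[0] = x0
    for k in range(n):
        xs[k + 1] = xs[k] + dt * g_hat(xs[k], us[k])
    return xs

def objective(us):
    xs = rollout(us)
    return dt * sum(C(xs[k], us[k]) for k in range(n))

res = minimize(
    objective,
    x0=np.zeros(n),                        # initial control guess
    bounds=[(-1.0, 1.0)] * n,              # u(t) in U_ad
    constraints=[{"type": "eq",            # terminal condition on x(tf)
                  "fun": lambda us: rollout(us)[-1]}],
    method="SLSQP",
)
print("optimal cost:", res.fun)
```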

1 Context - Chapter 1
2 System Identification - Chapter 4
3 Trajectory Acceptability - Chapters 5 and 6

SYSTEM IDENTIFICATION

FLIGHT DYNAMICS

States: $x = (h, V, \gamma, m)$
Controls: $u = (\alpha, N_1)$
$T$, $D$, $L$, $I_{sp}$: unknown functions of $x$, $u$

$$\begin{cases} \dot{h} = V\sin\gamma + \dot{W}_z \\[4pt] \dot{V} = \dfrac{T(u,x)\cos\alpha - D(u,x) - mg\sin\gamma - m\dot{W}_{x_v}}{m} \\[4pt] \dot{\gamma} = \dfrac{(T(u,x)\sin\alpha + L(u,x))\cos\mu - mg\cos\gamma - m\dot{W}_{z_v}}{mV} \\[4pt] \dot{m} = -\dfrac{T(u,x)}{I_{sp}(u,x)} \end{cases}$$
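As a concrete reading of this ODE system, a minimal sketch of its right-hand side (wind terms set to zero for brevity; the callables T, D, L, Isp stand in for the unknown maps to be identified):

```python
import numpy as np

G0 = 9.81  # gravitational acceleration [m/s^2]

def climb_dynamics(x, u, T, D, L, Isp, mu=0.0):
    """RHS of the climb dynamics for x = (h, V, gamma, m), u = (alpha, N1)."""
    h, V, gamma, m = x
    alpha, N1 = u
    thrust, drag, lift, isp = T(x, u), D(x, u), L(x, u), Isp(x, u)
    h_dot = V * np.sin(gamma)
    V_dot = (thrust * np.cos(alpha) - drag - m * G0 * np.sin(gamma)) / m
    gamma_dot = ((thrust * np.sin(alpha) + lift) * np.cos(mu)
                 - m * G0 * np.cos(gamma)) / (m * V)
    m_dot = -thrust / isp
    return np.array([h_dot, V_dot, gamma_dot, m_dot])
```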

PHYSICAL MODELS OF NESTED FUNCTIONS

$T$ is a function of $(N_1, M, \rho)$, $D$ and $L$ of $(q, M, \alpha)$, and $I_{sp}$ of $(SAT, M, h)$. Parametrized as scaled polynomials, each becomes linear in its parameter vector:

$$\begin{cases} T(x, u, \theta_T) = N_1 \times P_T(\rho, M) = X_T \cdot \theta_T \\ D(x, u, \theta_D) = q \times P_D(\alpha, M) = X_D \cdot \theta_D \\ L(x, u, \theta_L) = q \times P_L(\alpha, M) = X_L \cdot \theta_L \\ I_{sp}(x, u, \theta_{Isp}) = SAT \times P_{Isp}(h, M) = X_{Isp} \cdot \theta_{Isp} \end{cases}$$

with feature vectors

$$X_T = N_1 \begin{pmatrix} 1 \\ \rho \\ M \\ \rho^2 \\ \rho M \\ M^2 \\ \vdots \end{pmatrix}, \quad X_D = X_L = q \begin{pmatrix} 1 \\ \alpha \\ M \\ \alpha^2 \\ \alpha M \\ M^2 \\ \vdots \end{pmatrix}, \quad X_{Isp} = SAT \begin{pmatrix} 1 \\ h \\ M \\ h^2 \\ hM \\ M^2 \\ \vdots \end{pmatrix}.$$
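A minimal sketch of how such a scaled bivariate feature vector can be built (names and the degree-4 cutoff are illustrative; degree 4 matches the 15-term thrust model shown later in the deck):

```python
import numpy as np

def poly_features(a, b, scale, degree=4):
    """Scaled bivariate monomials: scale * (1, a, b, a^2, ab, b^2, ...)."""
    feats = [scale * (a**i) * (b**j)
             for d in range(degree + 1)
             for i, j in [(d - j, j) for j in range(d + 1)]]
    return np.array(feats)

# Example: thrust features X_T from (rho, M), scaled by fan speed N1.
X_T = poly_features(a=1.2, b=0.4, scale=0.9)   # rho=1.2, M=0.4, N1=0.9
print(X_T.shape)                                # (15,) for degree 4
```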

slide-49
SLIDE 49

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

slide-50
SLIDE 50

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories
slide-51
SLIDE 51

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories

Equation-Error Method ˙ ①(t) = g(✉(t), ①(t), θ) + ε(t), t ∈ [0, tf ]

slide-52
SLIDE 52

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories

Equation-Error Method ˙ ①i = g(✉i, ①i, θ) + εi, i = 1, . . . , N

slide-53
SLIDE 53

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories

Equation-Error Method min

θ N

i=1

  • ˙

①i, g(✉i, ①i, θ)

slide-54
SLIDE 54

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories

Equation-Error Method Ex: (Nonlinear) Least-Squares min

θ N

i=1

  • ˙

①i − g(✉i, ①i, θ)

  • 2

2

slide-55
SLIDE 55

14/49

STATE-OF-THE-ART - [JATEGAONKAR, 2006]

Output-Error Method Filter-Error Method

  • Less scalable to many trajectories

Equation-Error Method Ex: (Nonlinear) Least-Squares min

θ N

i=1

  • Y (✉i, ①i, ˙

①i) − G(✉i, ①i, ˙ ①i, θ)

  • 2

2

LEVERAGING THE DYNAMICS STRUCTURE

The dynamics are nonlinear in the states and controls, and nonlinear in the parameters. Moving the mass terms to the left-hand side:

$$\begin{cases} \dot{h} = V \sin\gamma \\ m\dot{V} + mg\sin\gamma = T(u, x, \theta_T)\cos\alpha - D(u, x, \theta_D) \\ mV\dot{\gamma} + mg\cos\gamma = T(u, x, \theta_T)\sin\alpha + L(u, x, \theta_L) \\ 0 = T(u, x, \theta_T) + \dot{m}\, I_{sp}(u, x, \theta_{Isp}) \end{cases}$$

With the polynomial models, each equation takes the form $Y(u, x, \dot{x}) = G(u, x, \dot{x}, \theta) + \varepsilon$, linear in the parameters:

$$\begin{cases} \dot{h} = V \sin\gamma \\ Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ Y_3 = X_T \cdot \theta_T + X_{Isp_m} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

Nonlinear in states and controls; nonlinear in parameters → linear in parameters, with a structured coupling ($\theta_T$ appears in all three equations) → Multi-task Learning.

MULTI-TASK REGRESSION

Aircraft:

$$\begin{cases} Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ Y_3 = X_T \cdot \theta_T + X_{Isp_m} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

General:

$$\begin{cases} Y_1 = X_{c,1} \cdot \theta_c + X_1 \cdot \theta_1 + \varepsilon_1 \\ Y_2 = X_{c,2} \cdot \theta_c + X_2 \cdot \theta_2 + \varepsilon_2 \\ \quad\vdots \\ Y_K = X_{c,K} \cdot \theta_c + X_K \cdot \theta_K + \varepsilon_K \end{cases}$$

$\theta_c$: coupling parameters; $\theta_k$: task-specific parameters.

Many other examples: giant squid neurons [FitzHugh, 1961, Nagumo et al., 1962], susceptible-infectious-recovered models [Anderson and May, 1992], mechanical systems, ...

Multi-task Linear Least-Squares:

$$\min_\theta \sum_{k=1}^K \sum_{i=1}^N (Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k)^2$$

The block-sparse coupling structure lets us stack the tasks into a single problem (see the sketch below):

$$\min_\theta \sum_{i=1}^N \left\| \begin{pmatrix} Y_{1,i} \\ \vdots \\ Y_{K,i} \end{pmatrix} - \begin{pmatrix} X_{c,1,i}^\top & X_{1,i}^\top & & \\ X_{c,2,i}^\top & & X_{2,i}^\top & \\ \vdots & & & \ddots \\ X_{c,K,i}^\top & & & X_{K,i}^\top \end{pmatrix} \begin{pmatrix} \theta_c \\ \theta_1 \\ \vdots \\ \theta_K \end{pmatrix} \right\|_2^2 = \min_\theta \sum_{i=1}^N \|Y_i - X_i \theta\|_2^2,$$

with $\theta = (\theta_c, \theta_1, \dots, \theta_K) \in \mathbb{R}^p$, $p = d_c + \sum_{k=1}^K d_k$, $Y_i \in \mathbb{R}^K$ and $X_i \in \mathbb{R}^{K \times p}$.
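A minimal sketch of the stacked design and its least-squares solve, on synthetic placeholder data (all dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, d_c, d_k = 200, 3, 4, 2          # observations, tasks, coupled/task dims
p = d_c + K * d_k

Xc = rng.normal(size=(N, K, d_c))      # coupled features X_{c,k,i}
Xk = rng.normal(size=(N, K, d_k))      # task-specific features X_{k,i}
theta_true = rng.normal(size=p)

def stack_design(Xc, Xk):
    """Block design: row (i,k) = [X_{c,k,i}, 0.., X_{k,i}, ..0]."""
    N, K, d_c = Xc.shape
    d_k = Xk.shape[2]
    A = np.zeros((N * K, d_c + K * d_k))
    for k in range(K):
        rows = slice(k, N * K, K)                  # rows of task k
        A[rows, :d_c] = Xc[:, k, :]                # coupled block
        col = d_c + k * d_k
        A[rows, col:col + d_k] = Xk[:, k, :]       # task-k block
    return A

A = stack_design(Xc, Xk)
Y = A @ theta_true + 0.01 * rng.normal(size=N * K)
theta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(np.allclose(theta_hat, theta_true, atol=0.05))
```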

FEATURE SELECTION

Our model:

$$T = N_1(\theta_{T,1} + \theta_{T,2}\rho + \theta_{T,3}M + \theta_{T,4}\rho^2 + \theta_{T,5}\rho M + \theta_{T,6}M^2 + \theta_{T,7}\rho^3 + \theta_{T,8}\rho^2 M + \theta_{T,9}\rho M^2 + \theta_{T,10}M^3 + \theta_{T,11}\rho^4 + \theta_{T,12}\rho^3 M + \theta_{T,13}\rho^2 M^2 + \theta_{T,14}\rho M^3 + \theta_{T,15}M^4).$$

Mattingly's model [Mattingly et al., 1992]:

$$T = N_1(\theta_{T,1}\rho + \theta_{T,2}\rho M^3).$$

⇒ High risk of overfitting. Sparse models are:
- less susceptible to overfitting,
- more compliant with physical models,
- more interpretable,
- lighter/faster.

BLOCK-SPARSE LASSO

Lasso [Tibshirani, 1994]: given an i.i.d. sample $\{(X_i, Y_i)\}_{i=1}^N \subset \mathbb{R}^{d+1}$,

$$\min_\theta \sum_{i=1}^N (Y_i - X_i \cdot \theta)^2 + \lambda \|\theta\|_1.$$

FIGURE: Sparsity induced by the L1 norm in the Lasso. Source: Wikipedia, Lasso (statistics).

Block-sparse version, preserving the coupling structure:

$$\min_\theta \sum_{k=1}^K \sum_{i=1}^N (Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k)^2 + \lambda_c \|\theta_c\|_1 + \sum_{k=1}^K \lambda_k \|\theta_k\|_1.$$

Equivalent to a Lasso problem:

$$\min_\beta \sum_{i=1}^N \|Y_i - B_i \beta\|_2^2 + \lambda_c \|\beta\|_1,$$

with $\beta = \left(\theta_c, \frac{\lambda_1}{\lambda_c}\theta_1, \dots, \frac{\lambda_K}{\lambda_c}\theta_K\right) \in \mathbb{R}^p$, $p = d_c + \sum_{k=1}^K d_k$, $Y_i \in \mathbb{R}^K$ and $B_i \in \mathbb{R}^{K \times p}$.

In practice, we choose $\lambda_k = \lambda_c$ for all $k = 1, 2, 3$, so that $B_i = X_i$ with

$$X_i = \begin{pmatrix} X_{T_1,i}^\top & -X_{D,i}^\top & & \\ X_{T_2,i}^\top & & X_{L,i}^\top & \\ X_{T,i}^\top & & & X_{Isp_m,i}^\top \end{pmatrix}, \quad Y_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ Y_{3,i} \end{pmatrix}.$$

BOOTSTRAP IMPLEMENTATION

High correlations between features... ⇒ Inconsistent selections via the Lasso!

Bolasso - Bach [2008]
Require: training data $T = \{(X_i, Y_i)\}_{i=1}^N \subset \mathbb{R}^{K \times p} \times \mathbb{R}^K$, number of bootstrap replicates $b$, L1 penalty parameter $\lambda_c$.
1: for k = 1 to b do
2:   Generate bootstrap sample $T_k$,
3:   Compute block-sparse Lasso estimate $\hat{\theta}^k$ from $T_k$,
4:   Compute support $J_k = \{j : \hat{\theta}^k_j \neq 0\}$,
5: end for
6: Compute intersection $J = \bigcap_{k=1}^b J_k$,
7: Compute $\hat{\theta}_J$ from the selected features using Least-Squares.

Consistency even under high correlations is proved in Bach [2008]; efficient implementations exist: LARS [Efron et al., 2004].
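A minimal sketch of the Bolasso loop on synthetic placeholder data, using scikit-learn's Lasso for step 3 and an ordinary least-squares refit for step 7 (hyperparameters are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)

def bolasso(X, y, b=32, lam=0.05):
    n = len(y)
    support = np.ones(X.shape[1], dtype=bool)
    for _ in range(b):
        idx = rng.integers(0, n, size=n)            # bootstrap resample
        coef = Lasso(alpha=lam).fit(X[idx], y[idx]).coef_
        support &= coef != 0                        # intersect supports
    theta = np.zeros(X.shape[1])
    if support.any():                               # OLS refit on kept features
        theta[support] = LinearRegression(fit_intercept=False) \
            .fit(X[:, support], y).coef_
    return support, theta

support, theta = bolasso(X, y)
print(np.flatnonzero(support), theta[support])
```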

PROBLEM WITH INTRA-GROUP CORRELATIONS

$$\min_\theta \sum_{i=1}^N \|Y_i - X_i \theta\|_2^2 + \lambda_c \|\theta\|_1 \ \Rightarrow\ \hat{\theta}_T = \hat{\theta}_{Isp} = 0\,!$$

The culprit is the third equation, which has no response term:

$$\begin{cases} Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ 0 = X_T \cdot \theta_T + X_{Isp_m} \cdot \theta_{Isp} + \varepsilon_3 \end{cases}$$

FIGURE: Feature correlations higher than 0.9 in absolute value, shown in white.

⇒ $\theta \mapsto \sum_{i=1}^N \|Y_i - X_i\theta\|_2^2$ is not injective... Ill-posed problem!

Remedy: a prior model $\tilde{I}_{sp}$ from Roux [2005], $\tilde{I}_{sp,i} = \tilde{I}_{sp}(u_i, x_i)$, $i = 1, \dots, N$, is added as a fourth pseudo-observation:

$$\begin{cases} Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1 \\ Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2 \\ 0 = X_T \cdot \theta_T + X_{Isp_m} \cdot \theta_{Isp} + \varepsilon_3 \\ \sqrt{\lambda_t}\, \tilde{I}_{sp} = \sqrt{\lambda_t}\, X_{Isp} \cdot \theta_{Isp} + \varepsilon_4 \end{cases}$$

which turns the criterion into

$$\min_\theta \sum_{i=1}^N \left( \|Y_i - X_i\theta\|_2^2 + \lambda_t \|\tilde{I}_{sp,i} - X_{Isp,i} \cdot \theta_{Isp}\|_2^2 \right) + \lambda_c \|\theta\|_1 = \min_\theta \sum_{i=1}^N \|\tilde{Y}_i - \tilde{X}_i\theta\|_2^2 + \lambda_c \|\theta\|_1,$$

$$\tilde{Y}_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ 0 \\ \sqrt{\lambda_t}\, \tilde{I}_{sp,i} \end{pmatrix}, \quad \tilde{X}_i = \begin{pmatrix} X_{T_1,i}^\top & -X_{D,i}^\top & & \\ X_{T_2,i}^\top & & X_{L,i}^\top & \\ X_{T,i}^\top & & & X_{Isp_m,i}^\top \\ & & & \sqrt{\lambda_t}\, X_{Isp,i}^\top \end{pmatrix}.$$
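A minimal sketch of this prior-augmentation (generalized Tikhonov) trick on synthetic placeholder data that reproduces the non-injectivity issue: the Isp block is made perfectly collinear with the thrust block, so the pseudo-observation rows are what pins it down.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 300, 4                          # observations, size of each block
X_T = rng.normal(size=(N, d))          # thrust features
X_isp = X_T.copy()                     # Isp features perfectly correlated with X_T
A = np.hstack([X_T, X_isp])            # design: only theta_T + theta_Isp is seen
y = A @ np.ones(2 * d)                 # -> infinitely many exact solutions

isp_prior = X_isp @ np.full(d, 1.0)    # prior predictions Isp_tilde_i (placeholder)

def augmented_lstsq(A, y, X_isp, isp_prior, lam_t=1.0):
    s = np.sqrt(lam_t)
    # Pseudo-observation rows: sqrt(lam_t) * X_Isp acting on the Isp block only.
    A_prior = np.hstack([np.zeros_like(X_isp), s * X_isp])
    theta, *_ = np.linalg.lstsq(np.vstack([A, A_prior]),
                                np.concatenate([y, s * isp_prior]), rcond=None)
    return theta

theta = augmented_lstsq(A, y, X_isp, isp_prior)
print(theta[d:])                        # Isp block pinned near the prior (ones)
```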

FEATURE SELECTION RESULTS

25 different B737-800s, 10 471 flights = 8 261 619 observations.
Block-sparse Bolasso used for T, D, L and Isp; we expect similar model structures.

FIGURE: Feature selection results for the thrust, drag, lift and specific impulse models.

ACCURACY OF DYNAMICS PREDICTIONS

REALISM OF HIDDEN ELEMENTS

FLIGHT RESIMULATION

The last assessment criterion is static:
- it does not incorporate the fact that the observations are time dependent;
- it does not take into account the goal of optimally controlling the aircraft system.

Another possible, dynamic criterion:

$$\min_{(x,u)} \int_{t_0}^{t_n} \left( \|u(t) - u_{test}(t)\|_u^2 + \|x(t) - x_{test}(t)\|_x^2 \right) dt \quad \text{s.t.}\ \dot{x}(t) = g(x(t), u(t), \hat{\theta}),$$

where $\|\cdot\|_u$, $\|\cdot\|_x$ denote scaling norms. For practical applications: $t \leftrightarrow h$.
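A minimal resimulation sketch: integrate the identified dynamics under a recorded test control signal and measure a scaled distance to the recorded test states. The toy dynamics, signals and scaling are illustrative placeholders.

```python
import numpy as np
from scipy.integrate import solve_ivp

t_grid = np.linspace(0.0, 10.0, 201)
u_test = np.sin(t_grid)                          # recorded controls (placeholder)
x_test = np.cos(t_grid)                          # recorded states (placeholder)

def g_hat(t, x):
    u = np.interp(t, t_grid, u_test)             # look up the recorded control
    return -0.3 * x + u                          # identified dynamics (placeholder)

sol = solve_ivp(g_hat, (t_grid[0], t_grid[-1]), y0=[x_test[0]], t_eval=t_grid)
x_sim = sol.y[0]

# Scaled L2 resimulation error, a discrete stand-in for the dynamic criterion.
scale = np.std(x_test)
err = np.trapz(((x_sim - x_test) / scale) ** 2, t_grid) / (t_grid[-1] - t_grid[0])
print(f"resimulation error: {err:.4f}")
```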

FLIGHT RESIMULATION

FIGURE: Distribution of the off-sample simulation error and boxplot of the optimization number of iterations and CPU time.

SYSTEM IDENTIFICATION CONCLUSION

1 Proposed Equation-Error Method approaches which extend well-known supervised learning techniques (Lasso, Ridge, bootstrap, ...) to the System Identification framework,
2 Applicable to large amounts of data,
3 Block-sparse estimators are proved to lead to consistent structured feature selection,
4 Can be efficiently trained using the LARS algorithm, as they are equivalent to successive Lasso problems,
5 Compared to regular Nonlinear Least-Squares:
  - similar performance in accuracy and training time,
  - no initialization required,
  - light, interpretable and compact data-dependent models (more than 50% compression),
  - faster convergence when applied to control problems.

TRAJECTORY ACCEPTABILITY

$$\min_{(x,u) \in \mathcal{X} \times \mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt, \quad \text{s.t.}\ \dot{x}(t) = \hat{g}(u(t), x(t)),\ \text{a.e. } t \in [0, t_f],\ \text{other constraints...} \tag{AOCP}$$

⇒ $\hat{z} = (\hat{x}, \hat{u})$ solution of (AOCP).

- Is $\hat{z}$ inside the validity region of the dynamics model $\hat{g}$?
- Does it look like a real trajectory? (Pilots' acceptance, Air Traffic Control.)

How can we quantify the closeness of the optimized trajectory to the set of real flights?

OPTIMIZED TRAJECTORY LIKELIHOOD

Assumption: we suppose that the real flights are observations of the same functional random variable $Z = (Z_t)$, valued in $C(T, E)$, with $E$ a compact subset of $\mathbb{R}^d$ and $T = [0, t_f]$.

How likely is it to draw the optimized trajectory from the law of $Z$?
HOW TO APPLY THIS TO FUNCTIONAL DATA?

Problem: computation of probability densities in an infinite-dimensional space.

Standard approach in Functional Data Analysis: use Functional Principal Component Analysis to decompose the data into a small number of coefficients.

Or: we can use the marginal densities.

HOW DO WE AGGREGATE THE MARGINAL LIKELIHOODS?

- $f_t$: marginal density of $Z$, i.e. the probability density function of $Z_t$,
- $y$: new trajectory,
- $f_t(y(t))$: marginal likelihood of $y$ at $t$, i.e. the likelihood of observing $Z_t = y(t)$.

MEAN MARGINAL LIKELIHOOD

$$MML(Z, y) = \frac{1}{t_f} \int_0^{t_f} \psi[f_t, y(t)]\,dt,$$

where $\psi : L^1(E, \mathbb{R}_+) \times \mathbb{R} \to [0, 1]$ is a continuous scaling map, needed because the marginal densities may have very different shapes. Possible scalings are the normalized density

$$\psi[f_t, y(t)] := \frac{f_t(y(t))}{\max_{z \in E} f_t(z)},$$

or the confidence level

$$\psi[f_t, y(t)] := P\left( f_t(Z_t) \leq f_t(y(t)) \right).$$

HOW DO WE DEAL WITH SAMPLED CURVES?

In practice, the $m$ trajectories are sampled at variable discrete times:

$$T^D := \{(t^r_j, z^r_j)\}_{1 \leq j \leq n,\ 1 \leq r \leq m} \subset T \times E, \quad z^r_j := z^r(t^r_j),$$
$$Y := \{(\tilde{t}_j, y_j)\}_{j=1}^{\tilde{n}} \subset T \times E, \quad y_j := y(\tilde{t}_j).$$

Hence, we approximate the MML using a Riemann sum which aggregates consistent estimators $\hat{f}^m_{\tilde{t}_j}$ of the marginal densities $f_{\tilde{t}_j}$:

$$EMML_m(T^D, Y) := \frac{1}{t_f} \sum_{j=1}^{\tilde{n}} \psi[\hat{f}^m_{\tilde{t}_j}, y_j]\, \Delta\tilde{t}_j.$$
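A minimal sketch of this estimator on synthetic placeholder data: a Gaussian KDE per time bin (a simple stand-in for the partition-based estimators of the next slides), the normalized-density scaling, and the Riemann sum. Bin count, grid and data are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
tf, n_bins = 1.0, 20
# Training sample: m curves, each observed at its own random times.
t_train = rng.uniform(0, tf, size=(424, 50)).ravel()
z_train = np.sin(2 * np.pi * t_train) + 0.1 * rng.normal(size=t_train.size)

edges = np.linspace(0, tf, n_bins + 1)
kdes = [gaussian_kde(z_train[(t_train >= lo) & (t_train < hi)])
        for lo, hi in zip(edges[:-1], edges[1:])]

def emml(t_new, y_new, z_grid=np.linspace(-2, 2, 400)):
    """Riemann sum of normalized marginal likelihoods psi = f_t(y) / max f_t."""
    total = 0.0
    dt = np.diff(t_new, append=tf)                 # Delta t_j
    for t, y, d in zip(t_new, y_new, dt):
        f = kdes[min(int(t / tf * n_bins), n_bins - 1)]
        total += float(f(y)) / f(z_grid).max() * d
    return total / tf

t_new = np.linspace(0, tf, 30, endpoint=False)
print(emml(t_new, np.sin(2 * np.pi * t_new)))      # near 1: realistic curve
print(emml(t_new, np.full(30, 2.0)))               # near 0: implausible curve
```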

HOW CAN WE ESTIMATE MARGINAL DENSITIES?

- In practice, the altitude plays the role of time, so we can't assume the same sampling for each trajectory;
- Assume the sampling times $\{t^r_j : j = 1, \dots, n;\ r = 1, \dots, m\}$ to be i.i.d. observations of a r.v. $T$, independent of $Z$;
- Our problem can then be seen as a conditional probability density learning problem with $(X, Y) = (T, Z_T)$, where $f_t$ is the density of $Z_t = (Z_T | T = t) = (Y | X)$.

1 We can apply state-of-the-art conditional density estimation techniques, such as LS-CDE [Sugiyama et al., 2010],
2 We can use a fine partitioning of the time domain.

PARTITION-BASED MARGINAL DENSITY ESTIMATION

Idea: average the marginal densities in time over small bins, applying classical multivariate density estimation techniques to each subset.

CONSISTENCY

We denote by:
- $\Theta : S \to L^1(E, \mathbb{R}_+)$ a multivariate density estimation statistic, where $S = \{(z_k)_{k=1}^N \in E^N : N \in \mathbb{N}^*\}$ is the set of finite sequences;
- $m$ the number of random curves;
- $T^m_t$ the subset of data points whose sampling times fall in the bin containing $t$;
- $\hat{f}^m_t := \Theta[T^m_t]$ the estimator trained using $T^m_t$.

CONSISTENCY

ASSUMPTION 1 - POSITIVE TIME DENSITY
$\nu \in L^\infty(T, \mathbb{R}_+)$ is the density function of $T$, s.t.
$$\nu_+ := \operatorname*{ess\,sup}_{t \in T} \nu(t) < \infty, \qquad \nu_- := \operatorname*{ess\,inf}_{t \in T} \nu(t) > 0.$$

ASSUMPTION 2 - LIPSCHITZ IN TIME
The function $(t, z) \in T \times E \mapsto f_t(z)$ is continuous and
$$|f_{t_1}(z) - f_{t_2}(z)| \leq L |t_1 - t_2|, \quad L > 0.$$

ASSUMPTION 3 - SHRINKING BINS
The homogeneous partition $\{B^m_\ell\}_{\ell=1}^{q_m}$ of $[0, t_f]$, with bin size $b_m$, is s.t.
$$\lim_{m \to \infty} b_m = 0, \qquad \lim_{m \to \infty} m b_m = \infty.$$

CONSISTENCY

ASSUMPTION 4 - I.I.D. CONSISTENCY
Let $G$ be an arbitrary family of probability density functions on $E$, $\rho \in G$, and $S^N_\rho$ an i.i.d. sample of size $N$ drawn from $\rho$, valued in $S$. The estimator obtained by applying $\Theta$ to $S^N_\rho$, denoted by
$$\hat{\rho}_N := \Theta[S^N_\rho] \in L^1(E, \mathbb{R}_+),$$
is a (pointwise) consistent density estimator, uniformly in $\rho$: for all $z \in E$, $\varepsilon > 0$, $\alpha_1 > 0$, there is $N_{\varepsilon, \alpha_1} > 0$ such that, for any $\rho \in G$,
$$N \geq N_{\varepsilon, \alpha_1} \ \Rightarrow\ P\left( |\hat{\rho}_N(z) - \rho(z)| < \varepsilon \right) > 1 - \alpha_1.$$
CONSISTENCY

THEOREM 1
Under assumptions 1 to 4, for any $z \in E$ and $t \in T$, $\hat{f}^m_{\ell_m(t)}(z)$ consistently approximates the marginal density $f_t(z)$ as the number of curves $m$ grows:
$$\forall \varepsilon > 0, \quad \lim_{m \to \infty} P\left( |\hat{f}^m_t(z) - f_t(z)| < \varepsilon \right) = 1.$$

Note that:
- $m \to \infty$ is not the usual $N \to \infty$ setting,
- the number of samples is random,
- the training data are not i.i.d.

MARGINAL DENSITY ESTIMATION RESULTS

HOW GOOD IS IT COMPARED TO OTHER METHODS?

Training set of m = 424 flights ≃ 334 531 point observations; test set of 150 flights. Discrimination power comparison with (gmm-)FPCA and (integrated) LS-CDE — estimated likelihoods of real and optimized trajectories, and training time:

| Method | Real | Opt1 | Opt2 | Tr. Time |
|---|---|---|---|---|
| MML | 0.63 ± 0.07 | 0.43 ± 0.08 | 0.13 ± 0.02 | 5 s |
| FPCA | 0.16 ± 0.12 | 6.4e-03 ± 3.8e-03 | 3.6e-03 ± 5.4e-03 | 20 s |
| LS-CDE | 0.77 ± 0.05 | 0.68 ± 0.04 | 0.49 ± 0.06 | 14 h |

MML PENALTY

The MML can be used not only to assess the optimization solutions, but also to penalize the optimization itself:

$$\min_{(x,u) \in \mathcal{X} \times \mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt - \lambda\, MML(Z, x), \quad \text{s.t.}\ \dot{x}(t) = \hat{g}(u(t), x(t)),\ \text{a.e. } t \in [0, t_f],\ \text{other constraints...} \tag{MML-AOCP}$$

$\lambda$ sets the trade-off between fuel minimization and likelihood maximization.

PENALTY EFFECT

TRAJECTORY ACCEPTABILITY CONCLUSION

1 General probabilistic criterion using marginal densities to quantify the closeness between a curve and a set of random trajectories,
2 Class of consistent plug-in estimators, based on "histograms" of multivariate density estimators,
3 Applicable to the case of aircraft climb trajectories; competitive with other well-established state-of-the-art approaches,
4 Particular adaptive kernel and Gaussian mixture implementation; showed that it can be used in optimal control problems to obtain solutions close to optimal, and still realistic.
THANK YOU FOR YOUR ATTENTION

REFERENCES

Anderson, R. M. and May, R. M. (1992). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Bach, F. (2008). Bolasso: model consistent Lasso estimation through the bootstrap. In ICML, pages 33-40.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. JRSS-B, pages 1-38.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32:407-499.
FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445-466.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). A note on the Group Lasso and a Sparse Group Lasso. arXiv:1001.0736.
Jategaonkar, R. V. (2006). Flight Vehicle System Identification: A Time Domain Methodology. AIAA.
Mattingly, J. D., Heiser, W. H., and Daley, D. H. (1992). Aircraft Engine Design. University Press.
Nagumo, J., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10):2061-2070.
Obozinski, G., Taskar, B., and Jordan, M. I. (2006). Multi-task feature selection. In ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning.
Roux, E. (2005). Pour une approche analytique de la dynamique du vol. PhD thesis, Supaero.
Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H., and Okanohara, D. (2010). Conditional density estimation via least-squares density ratio estimation. In AISTATS, pages 781-788.
Tibshirani, R. (1994). Regression shrinkage and selection via the Lasso. JRSS-B, 58:267-288.
Tikhonov, A. N. (1943). On the stability of inverse problems. Doklady Akademii Nauk SSSR, 39:195-198.
Yuan, M. and Lin, Y. (2005). Model selection and estimation in regression with grouped variables. JRSS-B, 68:49-67.

ACCURACY OF DYNAMICS PREDICTIONS

FIGURE: Leave-one-out off-sample error distributions for nonlinear least-squares (NLLS) and block-sparse Bolasso (BSBL). Median errors are annotated and marked by dashed vertical lines.

STRUCTURED FEATURE SELECTION STATE-OF-THE-ART

| Other methods | Difference with Block-sparse Lasso |
|---|---|
| Group Lasso [Yuan and Lin, 2005] | Group sparsity is fixed a priori |
| Sparse Group Lasso [Friedman et al., 2010] | Sparsity induced only within groups |
| Multi-task Lasso [Obozinski et al., 2006] | Not the same pattern for every task |

THEOREM (BOLASSO CONSISTENCY - BACH [2008])
For $\lambda = \lambda_0 N^{-1/2}$ and $\lambda_0 > 0$, assume that
(H1) the cumulant generating functions $E\left[\exp(s\|X\|_2^2)\right]$ and $E\left[\exp(sY^2)\right]$ are finite for some $s > 0$;
(H2) the joint matrix of second-order moments $Q = E\left[XX^\top\right] \in \mathbb{R}^{p \times p}$ is invertible;
(H3) $E[Y|X] = X \cdot \theta$ and $\mathrm{Var}[Y|X] = \sigma^2$ a.s. for some $\theta \in \mathbb{R}^p$ and $\sigma \in \mathbb{R}^*_+$.
Then, for any $b > 0$, the probability that the Bolasso algorithm does not exactly select the correct model has the following upper bound:
$$P[J \neq J^*] \leq b A_1 e^{-A_2 N} + A_3 \frac{\log N}{N^{1/2}} + A_4 \frac{\log b}{b},$$
where $A_1, A_2, A_3, A_4 > 0$.

GENERALIZED TIKHONOV REGULARIZATION OF ISP

The prior term is equivalent to $\|\Gamma(\theta - \tilde{\theta})\|_2^2$ with
$$\Gamma_i = (\underbrace{0, \dots, 0}_{d_T + d_D + d_L}, X_{Isp}^\top) \quad \text{and} \quad \Gamma_i \tilde{\theta} = \tilde{I}_{sp,i}.$$

MML CONSISTENCY FOR STANDARD KERNEL ESTIMATOR

ASSUMPTION 5
The function $(t, z) \in T \times E \mapsto f_t(z)$ is $C^4(E)$ in $z$ and $C^1(T)$ in $t$; the Lipschitz constant of the function $t \mapsto \frac{d^2 f_t}{dz^2}(z) := f''_t(z)$ is denoted by $L'' > 0$: for any $z \in E$ and $t_1, t_2 \in T$,
$$|f''_{t_1}(z) - f''_{t_2}(z)| \leq L'' |t_1 - t_2|.$$

MML CONSISTENCY FOR STANDARD KERNEL ESTIMATOR

For a kernel $K$ rescaled to bandwidth $\sigma$, $K_\sigma(w) = \frac{1}{\sigma} K\left(\frac{w}{\sigma}\right)$:

$$\sigma^2_{K_\sigma} = \int w^2 K_\sigma(w)\,dw = \sigma^2 \int w^2 K(w)\,dw = \sigma^2 \sigma^2_K,$$
$$\sigma^2_{K^2_\sigma} = \int w^2 K_\sigma(w)^2\,dw = \sigma \int w^2 K(w)^2\,dw = \sigma\, \sigma^2_{K^2},$$
$$R(K_\sigma) = \int K_\sigma(w)^2\,dw = \frac{1}{\sigma} \int K(w)^2\,dw = \frac{1}{\sigma} R(K).$$

THEOREM 2
Under assumptions 1, 3 and 5, if $\hat{f}^m_{\ell_m(t)}$ is a kernel density estimator where the kernel $K$ and the bandwidth $\sigma := \sigma_m$ are deterministic, such that $\sigma_K < \infty$, $\sigma_{K^2} < \infty$, $R(K) < \infty$, and if
$$\lim_{m \to \infty} \sigma_m = 0, \qquad \lim_{m \to \infty} m b_m \sigma_m = +\infty,$$
then
$$\lim_{m \to \infty} E\left[ \left(\hat{f}^m_{\ell_m(t)}(z) - f_t(z)\right)^2 \right] = 0.$$

THEOREM 1 PROOF SKETCH

The binned density converges to the marginal density:
$$\lim_{m \to \infty} |f_t(z) - f^m_{\ell_m(t)}(z)| = 0.$$

Each curve eventually contributes at most one point per bin, while the bin counts diverge:
$$\lim_{m \to \infty} P(N^m_{r,\ell_m(t)} \leq 1) = 1, \quad r = 1, \dots, m, \qquad \forall M > 0,\ \lim_{m \to \infty} P\left(N^m_{\ell_m(t)} > M\right) = 1.$$

Hence, with
$$C_M := \{N^m_{\ell_m(t)} > M\} \cap \bigcap_{r=1}^m \{N^m_{r,\ell_m(t)} \leq 1\}, \qquad \forall M > 0,\ \lim_{m \to \infty} P(C_M) = 1,$$

and on these events the estimator concentrates on the binned density:
$$\forall \varepsilon > 0, \quad \lim_{m \to \infty} P\left( |\hat{f}^m_{\ell_m(t)}(z) - f^m_{\ell_m(t)}(z)| < \varepsilon \right) = 1.$$
slide-175
SLIDE 175

57/49

FLIGHT MECHANICS MODELS

ρ =

P RsSAT

SAT(h) = T0 + αTh, SAT(TAT, M) =

TAT 1+ λ−1

2 M2

M =

V Vsound = V (λRsSAT)

1 2
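A direct transcription of these relations, as a sketch; the constants are standard ISA/air values, stated here as assumptions rather than taken from the thesis:

```python
import numpy as np

R_S = 287.05       # specific gas constant of air [J/(kg K)] (assumed)
LAMBDA = 1.4       # heat capacity ratio of air (assumed)
T0 = 288.15        # sea-level standard temperature [K] (assumed)
ALPHA_T = -6.5e-3  # standard temperature lapse rate [K/m] (assumed)

def sat_from_altitude(h):
    """Static air temperature from altitude (linear model)."""
    return T0 + ALPHA_T * h

def sat_from_tat(tat, mach):
    """Static air temperature recovered from total air temperature."""
    return tat / (1.0 + (LAMBDA - 1.0) / 2.0 * mach**2)

def air_density(p, sat):
    """Ideal-gas air density from static pressure and temperature."""
    return p / (R_S * sat)

def mach_number(v, sat):
    """Mach number from true airspeed and static air temperature."""
    return v / np.sqrt(LAMBDA * R_S * sat)

sat = sat_from_altitude(3000.0)
print(air_density(70_000.0, sat), mach_number(150.0, sat))
```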

CONSUMPTION × ACCEPTABILITY TRADE-OFF

FIGURE: Average over 20 flights of the fuel consumption and MML score (called acceptability here) of optimized trajectories, with varying MML-penalty weight λ.

GAUSSIAN MIXTURE MODEL FOR MARGINAL DENSITIES

$$f_t(z) = \sum_{k=1}^K w_{t,k}\, \varphi(z, \mu_{t,k}, \Sigma_{t,k}), \qquad \sum_{k=1}^K w_{t,k} = 1,\quad w_{t,k} \geq 0,$$
$$\varphi(z, \mu, \Sigma) := \frac{1}{\sqrt{(2\pi)^d \det \Sigma}}\, e^{-\frac{1}{2}(z - \mu)^\top \Sigma^{-1} (z - \mu)}.$$

Assuming that the number of components is known, the weights $w_{t,k}$, means $\mu_{t,k}$ and covariance matrices $\Sigma_{t,k}$ need to be estimated.

MAXIMUM LIKELIHOOD PARAMETERS ESTIMATION

For $K = 1$, the maximum likelihood estimates have a closed form:

$$L(\mu_{t,1}, \Sigma_{t,1} | z_1, \dots, z_N) = \prod_{i=1}^N \frac{1}{\sqrt{(2\pi)^d \det \Sigma_{t,1}}}\, e^{-\frac{1}{2}(z_i - \mu_{t,1})^\top \Sigma_{t,1}^{-1} (z_i - \mu_{t,1})},$$
$$\hat{\theta} := (\hat{\mu}_{t,1}, \hat{\Sigma}_{t,1}) = \arg\min_{(\mu_{t,1}, \Sigma_{t,1})} \sum_{i=1}^N \left[ \log \det \Sigma_{t,1} + (z_i - \mu_{t,1})^\top \Sigma_{t,1}^{-1} (z_i - \mu_{t,1}) \right],$$
$$\hat{\mu}_{t,1} = \frac{1}{N} \sum_{i=1}^N z_i, \qquad \hat{\Sigma}_{t,1} = \frac{1}{N} \sum_{i=1}^N (z_i - \hat{\mu}_{t,1})(z_i - \hat{\mu}_{t,1})^\top.$$

EM ALGORITHM

Hidden random variable $J$ valued in $\{1, \dots, K\}$: if $J_i = k$, then $z_i$ was drawn from the $k$-th component. Grouping observations by component, $(\hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})$ follow from the $K = 1$ maximum likelihood formulas.

EXPECTATION-MAXIMIZATION - [DEMPSTER ET AL., 1977]

Initialization: $\hat{\theta} = (\hat{w}_{t,k}, \hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})_{k=1}^K = (w^0_{t,k}, \mu^0_{t,k}, \Sigma^0_{t,k})_{k=1}^K$.

Expectation: for $k = 1, \dots, K$ and $i = 1, \dots, N$,
$$\hat{w}_{t,k} = \frac{1}{N} \sum_{i=1}^N \hat{\pi}_{k,i}, \qquad \hat{\pi}_{k,i} := P(J_i = k | \hat{\theta}, z_i) = \frac{\hat{w}_{t,k}\, \varphi(z_i, \hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})}{\sum_{l=1}^K \hat{w}_{t,l}\, \varphi(z_i, \hat{\mu}_{t,l}, \hat{\Sigma}_{t,l})}.$$

Maximization:
$$\hat{\mu}_{t,k} = \frac{\sum_{i=1}^N \hat{\pi}_{k,i}\, z_i}{\sum_{i=1}^N \hat{\pi}_{k,i}}, \qquad \hat{\Sigma}_{t,k} = \frac{\sum_{i=1}^N \hat{\pi}_{k,i}\, (z_i - \hat{\mu}_{t,k})(z_i - \hat{\mu}_{t,k})^\top}{\sum_{i=1}^N \hat{\pi}_{k,i}}.$$
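A minimal one-dimensional EM sketch mirroring these E and M steps; the data, component count and initialization are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(1, 1.0, 700)])

def phi(z, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm(z, K=2, n_iter=100):
    w = np.full(K, 1.0 / K)
    mu = rng.choice(z, size=K)            # crude initialization
    var = np.full(K, z.var())
    for _ in range(n_iter):
        # E-step: responsibilities pi[k, i] = P(J_i = k | theta, z_i).
        pi = w[:, None] * phi(z[None, :], mu[:, None], var[:, None])
        pi /= pi.sum(axis=0, keepdims=True)
        # M-step: reweighted closed-form Gaussian estimates.
        nk = pi.sum(axis=1)
        w = nk / len(z)
        mu = (pi @ z) / nk
        var = (pi * (z[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk
    return w, mu, var

print(em_gmm(z))
```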