1/49
EXPLORATION DE DONNÉES POUR L'OPTIMISATION DE TRAJECTOIRES AÉRIENNES (Data Mining for Aircraft Trajectory Optimization)
Cédric Rommel
Thesis advisors: Frédéric Bonnans, Pierre Martinon
Safety Line supervisor: Baptiste Gregorutti
PhD defense, 26 October
2/49
CONTEXT
3/49
MOTIVATION
FIGURE: World air traffic (source: www.flightradar24.com)
20,000 airplanes and 80,000 flights per day,
Traffic expected to double by 2033.
4/49
MOTIVATION
Most polluting means of transportation,
Responsible for 3% of CO2 emissions,
Fuel ≃ 30% of an airline's operating costs.
5/49
MOTIVATION
How to tackle this problem?
1. New hardware?
2. Better use of the existing fleet.
Climb is the most fuel-consuming flight phase...
Mostly rectilinear trajectories at full thrust,
Thousands of variables recorded every second.
6/49
OPTICLIMB
7/49
TRAJECTORY OPTIMIZATION
Dynamics: $\dot{x}(t) = g(u(t), x(t)) + \varepsilon(t)$
Optimization objective: $\int_0^{t_f} C(u(t), x(t))\,dt$
Flight constraints:
$\Phi(x(0), x(t_f)) \in K_\Phi$ (initial and final conditions),
$u(t) \in U_{ad}$, $x(t) \in X_{ad}$ (flight domain),
$c(u(t), x(t)) \leq 0$ (operational path constraints).
8/49
TRAJECTORY OPTIMIZATION
Optimal Control Problem:
$$\begin{aligned}
\min_{(x,u)\in\mathcal{X}\times\mathcal{U}}\ & \int_0^{t_f} C(u(t), x(t))\,dt, \\
\text{s.t. }\ & \dot{x}(t) = g(u(t), x(t)) + \varepsilon(t), \quad \text{a.e. } t \in [0, t_f], \\
& \Phi(x(0), x(t_f)) \in K_\Phi, \\
& u(t) \in U_{ad},\ x(t) \in X_{ad}, \quad \text{a.e. } t \in [0, t_f], \\
& c(u(t), x(t)) \leq 0, \quad \text{a.e. } t \in [0, t_f]. \qquad \text{(OCP)}
\end{aligned}$$
System Identification (black box, from QAR data) replaces the true dynamics $g + \varepsilon$ with a learned model $\hat{g}$, yielding the Approximate Optimal Control Problem (AOCP): identical to (OCP) except that the dynamics constraint becomes $\dot{x}(t) = \hat{g}(u(t), x(t))$, a.e. $t \in [0, t_f]$.
9/49
SYSTEM IDENTIFICATION
10/49
1. Context (Chapter 1)
2. System Identification (Chapter 4)
3. Trajectory Acceptability (Chapters 5 and 6)
11/49
SYSTEM IDENTIFICATION
12/49
FLIGHT DYNAMICS
States: $x = (h, V, \gamma, m)$
Controls: $u = (\alpha, N_1)$
$T$, $D$, $L$ and $I_{sp}$ are unknown functions of $x$, $u$:
$$\begin{aligned}
\dot{h} &= V \sin\gamma + \dot{W}_z, \\
\dot{V} &= \frac{T(u,x)\cos\alpha - D(u,x) - mg\sin\gamma - m\dot{W}_{x_v}}{m}, \\
\dot{\gamma} &= \frac{(T(u,x)\sin\alpha + L(u,x))\cos\mu - mg\cos\gamma - m\dot{W}_{z_v}}{mV}, \\
\dot{m} &= -\frac{T(u,x)}{I_{sp}(u,x)}.
\end{aligned}$$
13/49
PHYSICAL MODELS OF NESTED FUNCTIONS
$T$ is a function of $(N_1, M, \rho)$: $\varphi_T(x, u)$; $D$ of $(q, M, \alpha)$: $\varphi_D(x, u)$; $L$ of $(q, M, \alpha)$: $\varphi_L(x, u)$; $I_{sp}$ of $(SAT, M, h)$: $\varphi_{I_{sp}}(x, u)$.
Polynomial parametric models:
$$\begin{aligned}
T(x, u, \theta_T) &= N_1 \times P_T(\rho, M) = X_T \cdot \theta_T, \\
D(x, u, \theta_D) &= q \times P_D(\alpha, M) = X_D \cdot \theta_D, \\
L(x, u, \theta_L) &= q \times P_L(\alpha, M) = X_L \cdot \theta_L, \\
I_{sp}(x, u, \theta_{I_{sp}}) &= SAT \times P_{I_{sp}}(h, M) = X_{I_{sp}} \cdot \theta_{I_{sp}},
\end{aligned}$$
with feature vectors
$$X_T = N_1\,(1, \rho, M, \rho^2, \rho M, M^2, \dots), \quad X_D = X_L = q\,(1, \alpha, M, \alpha^2, \alpha M, M^2, \dots), \quad X_{I_{sp}} = SAT\,(1, h, M, h^2, hM, M^2, \dots).$$
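To make the feature construction concrete, here is a minimal NumPy sketch of how such monomial bases could be assembled; the `poly_features` helper and the random `N1`, `rho`, `M` arrays are illustrative stand-ins for QAR measurements, not the thesis code:

```python
import numpy as np

def poly_features(a, b, degree=4):
    """Bivariate monomial basis (1, a, b, a^2, ab, b^2, ...) up to `degree`."""
    cols = [a**i * b**j for d in range(degree + 1)
            for i, j in ((d - j, j) for j in range(d + 1))]
    return np.column_stack(cols)

# Thrust features X_T = N1 * (1, rho, M, rho^2, rho*M, M^2, ...):
N1, rho, M = np.random.rand(3, 1000)          # hypothetical measurements
X_T = N1[:, None] * poly_features(rho, M, degree=4)   # shape (1000, 15)
```

Degree 4 yields 15 monomials, matching the 15 thrust coefficients shown on the feature-selection slide.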
14/49
STATE-OF-THE-ART [Jategaonkar, 2006]
Output-Error Method, Filter-Error Method: less scalable to many trajectories.
Equation-Error Method: $\dot{x}(t) = g(u(t), x(t), \theta) + \varepsilon(t)$, $t \in [0, t_f]$, discretized as $\dot{x}_i = g(u_i, x_i, \theta) + \varepsilon_i$, $i = 1, \dots, N$, and solved by
$$\min_\theta \sum_{i=1}^N \ell\big(\dot{x}_i,\, g(u_i, x_i, \theta)\big),$$
e.g. (nonlinear) least squares
$$\min_\theta \sum_{i=1}^N \big\| \dot{x}_i - g(u_i, x_i, \theta) \big\|_2^2,$$
or, more generally,
$$\min_\theta \sum_{i=1}^N \big\| Y(u_i, x_i, \dot{x}_i) - G(u_i, x_i, \dot{x}_i, \theta) \big\|_2^2.$$
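A minimal sketch of the equation-error method with SciPy, assuming derivative estimates `xdot` are available (e.g. from finite differences) and using a toy scalar model `g`; all names and data are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def g(theta, u, x):
    """Toy parametric dynamics; the real aircraft model is nonlinear in theta."""
    return theta[0] * u + theta[1] * x + theta[2] * x**2

rng = np.random.default_rng(0)
u, x = rng.normal(size=(2, 500))              # hypothetical inputs and states
xdot = g(np.array([1.0, -0.5, 0.2]), u, x) + 0.01 * rng.normal(size=500)

# Equation-error fit: minimize sum_i ||xdot_i - g(u_i, x_i, theta)||^2.
res = least_squares(lambda th: xdot - g(th, u, x), x0=np.zeros(3))
print(res.x)
```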
15/49
LEVERAGING THE DYNAMICS STRUCTURE
Raw dynamics, nonlinear in states and controls, and nonlinear in the parameters:
$$\begin{aligned}
\dot{h} &= V \sin\gamma, \\
\dot{V} &= \frac{T(u, x, \theta_T)\cos\alpha - D(u, x, \theta_D) - mg\sin\gamma}{m}, \\
\dot{\gamma} &= \frac{T(u, x, \theta_T)\sin\alpha + L(u, x, \theta_L) - mg\cos\gamma}{mV}, \\
\dot{m} &= -\frac{T(u, x, \theta_T)}{I_{sp}(u, x, \theta_{I_{sp}})}.
\end{aligned}$$
Rearranging and substituting the linear-in-parameters models:
$$\begin{aligned}
m\dot{V} + mg\sin\gamma &= (X_T \cdot \theta_T)\cos\alpha - X_D \cdot \theta_D + \varepsilon_1, \\
mV\dot{\gamma} + mg\cos\gamma &= (X_T \cdot \theta_T)\sin\alpha + X_L \cdot \theta_L + \varepsilon_2, \\
0 &= X_T \cdot \theta_T + \dot{m}\,(X_{I_{sp}} \cdot \theta_{I_{sp}}) + \varepsilon_3.
\end{aligned}$$
Each left-hand side is an observable $Y(u, x, \dot{x})$ and each right-hand side a model $G(u, x, \dot{x}, \theta)$ which is now linear in the parameters. Absorbing the trigonometric factors and $\dot{m}$ into the features ($X_{T_1} = X_T \cos\alpha$, $X_{T_2} = X_T \sin\alpha$, $X_{I_{sp}m} = \dot{m}\,X_{I_{sp}}$) gives
$$Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1, \quad Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2, \quad Y_3 = X_T \cdot \theta_T + X_{I_{sp}m} \cdot \theta_{I_{sp}} + \varepsilon_3.$$
Structured coupling through the shared $\theta_T$ → Multi-task Learning.
16/49
MULTI-TASK REGRESSION
Aircraft:
$$Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1, \quad Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2, \quad Y_3 = X_T \cdot \theta_T + X_{I_{sp}m} \cdot \theta_{I_{sp}} + \varepsilon_3.$$
General:
$$Y_k = X_{c,k} \cdot \theta_c + X_k \cdot \theta_k + \varepsilon_k, \quad k = 1, \dots, K,$$
with coupling parameters $\theta_c$ and task-specific parameters $\theta_k$.
Many other examples: giant squid neurons [FitzHugh, 1961, Nagumo et al., 1962], susceptible-infectious-recovered models [Anderson and May, 1992], mechanical systems, ...
Multi-task Linear Least-Squares:
$$\min_\theta \sum_{k=1}^K \sum_{i=1}^N (Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k)^2$$
Block-sparse coupling structure:
$$\min_\theta \sum_{i=1}^N \left\| \begin{pmatrix} Y_{1,i} \\ \vdots \\ Y_{K,i} \end{pmatrix} - \begin{pmatrix} X_{c,1,i}^\top & X_{1,i}^\top & & \\ X_{c,2,i}^\top & & X_{2,i}^\top & \\ \vdots & & & \ddots \\ X_{c,K,i}^\top & & & X_{K,i}^\top \end{pmatrix} \begin{pmatrix} \theta_c \\ \theta_1 \\ \vdots \\ \theta_K \end{pmatrix} \right\|_2^2,$$
i.e., compactly,
$$\min_\theta \sum_{i=1}^N \| Y_i - X_i \theta \|_2^2, \quad \text{with } \theta = (\theta_c, \theta_1, \dots, \theta_K) \in \mathbb{R}^p,\ p = d_c + \sum_{k=1}^K d_k,\ Y_i \in \mathbb{R}^K,\ X_i \in \mathbb{R}^{K \times p}.$$
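A minimal sketch of this stacked formulation, assembling one block design matrix per observation and solving the plain multi-task least-squares with NumPy; dimensions and data are illustrative:

```python
import numpy as np

K, N, dc, dk = 3, 200, 4, 2          # tasks, observations, coupling/task dims
rng = np.random.default_rng(0)
Xc = rng.normal(size=(N, K, dc))     # coupling features X_{c,k,i}
Xt = rng.normal(size=(N, K, dk))     # task-specific features X_{k,i}
Y = rng.normal(size=(N, K))          # targets Y_{k,i}

p = dc + K * dk
blocks = []
for i in range(N):
    Xi = np.zeros((K, p))            # one K x p block per observation
    for k in range(K):
        Xi[k, :dc] = Xc[i, k]                        # shared theta_c columns
        Xi[k, dc + k*dk: dc + (k+1)*dk] = Xt[i, k]   # columns of theta_k
    blocks.append(Xi)
A = np.vstack(blocks)                # stacked (N*K) x p design
theta, *_ = np.linalg.lstsq(A, Y.ravel(), rcond=None)
```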
17/49
FEATURE SELECTION
Our model:
$$T = N_1(\theta_{T,1} + \theta_{T,2}\rho + \theta_{T,3}M + \theta_{T,4}\rho^2 + \theta_{T,5}\rho M + \theta_{T,6}M^2 + \theta_{T,7}\rho^3 + \theta_{T,8}\rho^2 M + \theta_{T,9}\rho M^2 + \theta_{T,10}M^3 + \theta_{T,11}\rho^4 + \theta_{T,12}\rho^3 M + \theta_{T,13}\rho^2 M^2 + \theta_{T,14}\rho M^3 + \theta_{T,15}M^4).$$
Mattingly's model [Mattingly et al., 1992]:
$$T = N_1(\theta_{T,1}\rho + \theta_{T,2}\rho M^3).$$
⇒ High risk of overfitting. Sparse models are:
less susceptible to overfitting,
more compliant with physical models,
more interpretable,
lighter/faster.
18/49
BLOCK-SPARSE LASSO
Lasso [Tibshirani, 1994]: given an i.i.d. sample $\{(X_i, Y_i)\}_{i=1}^N \subset \mathbb{R}^{d+1}$,
$$\min_\theta \sum_{i=1}^N (Y_i - X_i \cdot \theta)^2 + \lambda \|\theta\|_1.$$
FIGURE: Sparsity induced by the L1 norm in the Lasso (source: Wikipedia, Lasso (statistics)).
Block-sparse variant, preserving the coupling structure:
$$\min_\theta \sum_{k=1}^K \sum_{i=1}^N (Y_{k,i} - X_{c,k,i} \cdot \theta_c - X_{k,i} \cdot \theta_k)^2 + \lambda_c \|\theta_c\|_1 + \sum_{k=1}^K \lambda_k \|\theta_k\|_1.$$
Equivalent to a Lasso problem:
$$\min_\beta \sum_{i=1}^N \| Y_i - B_i \beta \|_2^2 + \lambda_c \|\beta\|_1, \quad \text{with } \beta = \Big(\theta_c, \tfrac{\lambda_1}{\lambda_c}\theta_1, \dots, \tfrac{\lambda_K}{\lambda_c}\theta_K\Big) \in \mathbb{R}^p,\ p = d_c + \sum_{k=1}^K d_k,\ Y_i \in \mathbb{R}^K,\ B_i \in \mathbb{R}^{K \times p}.$$
In practice, we choose $\lambda_k = \lambda_c$ for all $k = 1, \dots, 3$, so that $\beta = \theta$ and $B_i = X_i$, with
$$X_i = \begin{pmatrix} X_{T_1,i}^\top & -X_{D,i}^\top & & \\ X_{T_2,i}^\top & & X_{L,i}^\top & \\ X_{T,i}^\top & & & X_{I_{sp}m,i}^\top \end{pmatrix}, \quad Y_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ Y_{3,i} \end{pmatrix}.$$
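Since the block-sparse problem reduces to an ordinary Lasso on the stacked design, off-the-shelf solvers apply; a minimal scikit-learn sketch (the `block_sparse_lasso` wrapper and its default penalty are illustrative):

```python
from sklearn.linear_model import Lasso

def block_sparse_lasso(A, y, lam=0.01):
    """Ordinary Lasso on the stacked design A ((N*K) x p) and targets y (N*K).
    Note sklearn scales the quadratic term by 1/(2*n_samples), so `lam`
    matches lambda_c only up to that constant factor."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(A, y)
    return model.coef_, model.coef_ != 0   # estimate and selected support
```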
19/49
BOOTSTRAP IMPLEMENTATION
High correlations between features...
⇒ Inconsistent selections via the Lasso!
Bolasso [Bach, 2008]
Require: training data $T = \{(X_i, Y_i)\}_{i=1}^N \subset \mathbb{R}^{K \times p} \times \mathbb{R}^K$, number of bootstrap replicates $b$, L1 penalty parameter $\lambda_c$.
1: for k = 1 to b do
2:   Generate bootstrap sample $T_k$,
3:   Compute block-sparse Lasso estimate $\hat{\theta}^k$ from $T_k$,
4:   Compute support $J_k = \{j : \hat{\theta}^k_j \neq 0\}$,
5: end for
6: Compute intersection $J = \bigcap_{k=1}^b J_k$,
7: Compute $\hat{\theta}_J$ from the selected features using Least-Squares.
Consistency even under high correlations proved in Bach [2008],
Efficient implementations exist: LARS [Efron et al., 2004].
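A compact sketch of the Bolasso procedure on the stacked design: bootstrap over observations (blocks of K rows), run the Lasso on each replicate, intersect the supports, then refit by least squares; hyperparameters are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso(A, y, K, b=32, alpha=0.01, rng=None):
    """Bolasso [Bach, 2008]: intersect Lasso supports over bootstrap replicates."""
    rng = np.random.default_rng(rng)
    N = A.shape[0] // K                          # observations (K rows each)
    support = np.ones(A.shape[1], dtype=bool)
    for _ in range(b):
        obs = rng.integers(0, N, size=N)         # resample whole observations
        rows = (obs[:, None] * K + np.arange(K)).ravel()
        coef = Lasso(alpha=alpha, fit_intercept=False,
                     max_iter=10000).fit(A[rows], y[rows]).coef_
        support &= (coef != 0)                   # intersect supports
    theta = np.zeros(A.shape[1])
    if support.any():                            # least-squares refit on J
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        theta[support] = sol
    return support, theta

# J, theta = bolasso(A, Y.ravel(), K=3)   # reusing the stacked design above
```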
20/49
PROBLEM WITH INTRA-GROUP CORRELATIONS
$$\min_\theta \sum_{i=1}^N \| Y_i - X_i \theta \|_2^2 + \lambda_c \|\theta\|_1 \quad \Rightarrow \quad \hat{\theta}_T = \hat{\theta}_{I_{sp}} = 0\,!$$
$$Y_1 = X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1, \quad Y_2 = X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2, \quad 0 = X_T \cdot \theta_T + X_{I_{sp}m} \cdot \theta_{I_{sp}} + \varepsilon_3.$$
21/49
PROBLEM WITH INTRA-GROUP CORRELATIONS
FIGURE: Feature correlations higher than 0.9 in absolute value (shown in white).
⇒ $\theta \mapsto \sum_{i=1}^N \| Y_i - X_i \theta \|_2^2$ is not injective... Ill-posed problem!
22/49
PROBLEM WITH INTRA-GROUP CORRELATIONS
Remedy: add a prior model $\tilde{I}_{sp}$ from Roux [2005], $\tilde{I}_{sp,i} = \tilde{I}_{sp}(u_i, x_i)$, $i = 1, \dots, N$, as an extra pseudo-observation:
$$\begin{aligned}
Y_1 &= X_{T_1} \cdot \theta_T - X_D \cdot \theta_D + \varepsilon_1, \\
Y_2 &= X_{T_2} \cdot \theta_T + X_L \cdot \theta_L + \varepsilon_2, \\
0 &= X_T \cdot \theta_T + X_{I_{sp}m} \cdot \theta_{I_{sp}} + \varepsilon_3, \\
\sqrt{\lambda_t}\, \tilde{I}_{sp} &= \sqrt{\lambda_t}\, X_{I_{sp}} \cdot \theta_{I_{sp}} + \varepsilon_4,
\end{aligned}$$
leading to the penalized problem
$$\min_\theta \sum_{i=1}^N \Big( \| Y_i - X_i \theta \|_2^2 + \lambda_t \big\| \tilde{I}_{sp,i} - X_{I_{sp},i} \cdot \theta_{I_{sp}} \big\|_2^2 \Big) + \lambda_c \|\theta\|_1,$$
which is again of Lasso form $\min_\theta \sum_{i=1}^N \| \tilde{Y}_i - \tilde{X}_i \theta \|_2^2 + \lambda_c \|\theta\|_1$ with
$$\tilde{Y}_i = \begin{pmatrix} Y_{1,i} \\ Y_{2,i} \\ 0 \\ \sqrt{\lambda_t}\, \tilde{I}_{sp,i} \end{pmatrix}, \quad \tilde{X}_i = \begin{pmatrix} X_{T_1,i}^\top & -X_{D,i}^\top & & \\ X_{T_2,i}^\top & & X_{L,i}^\top & \\ X_{T,i}^\top & & & X_{I_{sp}m,i}^\top \\ & & & \sqrt{\lambda_t}\, X_{I_{sp},i}^\top \end{pmatrix}.$$
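In stacked form, this prior is just extra pseudo-rows appended to the design; a minimal sketch, assuming the columns of `A` end with the $\theta_{I_{sp}}$ block (the helper and its arguments are illustrative):

```python
import numpy as np

def add_isp_prior_rows(A, y, X_isp, isp_prior, lam_t):
    """Append Tikhonov pseudo-observations
    sqrt(lam_t)*Isp_tilde ~ sqrt(lam_t)*X_isp . theta_Isp,
    assuming the last X_isp.shape[1] columns of A carry theta_Isp."""
    d_isp = X_isp.shape[1]
    rows = np.zeros((X_isp.shape[0], A.shape[1]))
    rows[:, -d_isp:] = np.sqrt(lam_t) * X_isp
    A_aug = np.vstack([A, rows])
    y_aug = np.concatenate([y, np.sqrt(lam_t) * isp_prior])
    return A_aug, y_aug     # then solve the same Lasso / Bolasso on these
```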
23/49
FEATURE SELECTION RESULTS
25 different B737-800s, 10,471 flights = 8,261,619 observations,
Block-sparse Bolasso used for T, D, L and Isp,
We expect similar model structures.
FIGURE: Feature selection results for the thrust, drag, lift and specific impulse models.
24/49
ACCURACY OF DYNAMICS PREDICTIONS
25/49
REALISM OF HIDDEN ELEMENTS
26/49
FLIGHT RESIMULATION
Last assessment criterion = static;
Does not incorporate the fact that the observations are time dependent;
Does not take into account the goal of optimally controlling the aircraft system.
Another possible dynamic criterion:
$$\min_{(x,u)} \int_{t_0}^{t_n} \Big( \| u(t) - u_{test}(t) \|_u^2 + \| x(t) - x_{test}(t) \|_x^2 \Big)\,dt \quad \text{s.t. } \dot{x}(t) = g(x(t), u(t), \hat{\theta}),$$
where $\|\cdot\|_u$, $\|\cdot\|_x$ denote scaling norms. For practical applications: $t \leftrightarrow h$.
27/49
FLIGHT RESIMULATION
28/49
FLIGHT RESIMULATION
FIGURE: Distribution of the off-sample simulation error, and boxplots of the number of optimization iterations and of the CPU time.
29/49
SYSTEM IDENTIFICATION CONCLUSION
1. Proposed Equation-Error Method approaches which extend well-known supervised learning techniques (Lasso, Ridge, bootstrap, ...) to the System Identification framework,
2. Applicable to large amounts of data,
3. Block-sparse estimators are proved to lead to consistent structured feature selection,
4. Can be efficiently trained using the LARS algorithm, as they are equivalent to successive Lasso problems,
5. Compared to regular Nonlinear Least-Squares:
   Similar performance in accuracy and training time,
   No initialization required,
   Light, interpretable and compact data-dependent models (more than 50% compression),
   Faster convergence when applied to control problems.
30/49
TRAJECTORY ACCEPTABILITY
31/49
TRAJECTORY ACCEPTABILITY
$$\min_{(x,u)\in\mathcal{X}\times\mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt, \quad \text{s.t. } \dot{x}(t) = \hat{g}(u(t), x(t)),\ \text{a.e. } t \in [0, t_f],\ \text{other constraints...} \quad \text{(AOCP)}$$
⇒ $\hat{z} = (\hat{x}, \hat{u})$ solution of (AOCP).
Is $\hat{z}$ inside the validity region of the dynamics model $\hat{g}$?
Does it look like a real trajectory? (Pilots' acceptance, Air Traffic Control)
How can we quantify the closeness of the optimized trajectory to the set of real flights?
32/49
OPTIMIZED TRAJECTORY LIKELIHOOD
Assumption: we suppose that the real flights are observations of the same functional random variable $Z = (Z_t)$, valued in $C(T, E)$, with $E$ a compact subset of $\mathbb{R}^d$ and $T = [0, t_f]$.
How likely is it to draw the optimized trajectory from the law of $Z$?
33/49
HOW TO APPLY THIS TO FUNCTIONAL DATA?
Problem: computation of probability densities in an infinite-dimensional space.
Standard approach in Functional Data Analysis: use Functional Principal Component Analysis to decompose the data into a small number of coefficients,
Or: use the marginal densities.
34/49
HOW DO WE AGGREGATE THE MARGINAL LIKELIHOODS?
$f_t$ marginal density of $Z$, i.e. probability density function of $Z_t$,
$y$ new trajectory,
$f_t(y(t))$ marginal likelihood of $y$ at $t$, i.e. likelihood of observing $Z_t = y(t)$.
MEAN MARGINAL LIKELIHOOD
$$\text{MML}(Z, y) = \frac{1}{t_f} \int_0^{t_f} \psi[f_t, y(t)]\,dt,$$
where $\psi : L^1(E, \mathbb{R}_+) \times \mathbb{R} \to [0, 1]$ is a continuous scaling map, needed because the marginal densities may have very different shapes.
35/49
HOW DO WE AGGREGATE THE MARGINAL LIKELIHOODS?
Possible scalings are the normalized density
$$\psi[f_t, y(t)] := \frac{f_t(y(t))}{\max_{z \in E} f_t(z)},$$
or the confidence level
$$\psi[f_t, y(t)] := \mathbb{P}\big( f_t(Z_t) \leq f_t(y(t)) \big).$$
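A minimal sketch of both scaling maps for a one-dimensional marginal density fitted by Gaussian KDE; the grid over $E$ and the sample are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

def normalized_density(kde, y, grid):
    """psi[f_t, y] = f_t(y) / max_z f_t(z), max taken over a grid of E."""
    return kde(np.atleast_1d(y)) / kde(grid).max()

def confidence_level(kde, y, samples):
    """psi[f_t, y] = P(f_t(Z_t) <= f_t(y)), estimated over the sample."""
    return np.mean(kde(samples) <= kde(np.atleast_1d(y)))

samples = np.random.default_rng(0).normal(size=500)   # draws of Z_t
kde = gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 200)
print(normalized_density(kde, 0.1, grid), confidence_level(kde, 0.1, samples))
```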
36/49
HOW DO WE DEAL WITH SAMPLED CURVES?
In practice, the $m$ trajectories are sampled at variable discrete times:
$$\mathcal{TD} := \{(t_j^r, z_j^r)\}_{1 \leq j \leq n,\ 1 \leq r \leq m} \subset T \times E, \quad z_j^r := z^r(t_j^r),$$
$$\mathcal{Y} := \{(\tilde{t}_j, y_j)\}_{j=1}^{\tilde{n}} \subset T \times E, \quad y_j := y(\tilde{t}_j).$$
Hence, we approximate the MML using a Riemann sum which aggregates consistent estimators $\hat{f}^m_{\tilde{t}_j}$ of the marginal densities $f_{\tilde{t}_j}$:
$$\text{EMML}_m(\mathcal{TD}, \mathcal{Y}) := \frac{1}{t_f} \sum_{j=1}^{\tilde{n}} \psi[\hat{f}^m_{\tilde{t}_j}, y_j]\,\Delta\tilde{t}_j.$$
37/49
HOW CAN WE ESTIMATE MARGINAL DENSITIES?
In practice, the altitude plays the role of time, so we can't assume the same sampling for each trajectory;
Assume the sampling times $\{t_j^r : j = 1, \dots, n;\ r = 1, \dots, m\}$ to be i.i.d. observations of a r.v. $T$, independent of $Z$;
Our problem can be seen as a conditional probability density learning problem with $(X, Y) = (T, Z_T)$, where $f_t$ is the density of $Z_t = (Z_T | T = t) = (Y | X)$.
1. We can apply state-of-the-art conditional density estimation techniques, such as LS-CDE [Sugiyama et al., 2010],
2. We can use a fine partitioning of the time domain.
38/49
PARTITION BASED MARGINAL DENSITY ESTIMATION
Idea: average the marginal densities over small time bins, applying classical multivariate density estimation techniques to the data falling in each bin.
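A minimal sketch combining this partition-based estimator with the EMML Riemann sum, using one Gaussian KDE per time bin and the normalized-density scaling; the data layout (pooled 1-D samples), bin count and guards are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

def emml(t_data, z_data, t_new, y_new, n_bins=20, t_f=1.0):
    """Partition [0, t_f], fit one Gaussian KDE per bin, and aggregate the
    scaled marginal likelihoods of the new curve by a Riemann sum.
    t_data, z_data: pooled sampling times / values of the training curves;
    t_new, y_new: sampled times / values of the trajectory to score."""
    edges = np.linspace(0.0, t_f, n_bins + 1)
    kdes = []
    for a, b in zip(edges[:-1], edges[1:]):
        pts = z_data[(t_data >= a) & (t_data < b)]
        kdes.append(gaussian_kde(pts) if pts.size > 1 else None)
    grid = np.linspace(z_data.min(), z_data.max(), 200)   # proxy for E
    score = 0.0
    dts = np.diff(t_new, prepend=0.0)                     # Riemann weights
    for tj, yj, dtj in zip(t_new, y_new, dts):
        k = min(int(tj / t_f * n_bins), n_bins - 1)
        if kdes[k] is not None:                           # normalized density
            score += kdes[k](np.atleast_1d(yj))[0] / kdes[k](grid).max() * dtj
    return score / t_f
```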
39/49
CONSISTENCY
We denote by:
$\Theta : S \to L^1(E, \mathbb{R}_+)$ a multivariate density estimation statistic, where $S = \{(z_k)_{k=1}^N \in E^N : N \in \mathbb{N}^*\}$ is the set of finite sequences;
$m$ the number of random curves;
$\mathcal{T}^m_t$ the subset of data points whose sampling times fall in the bin containing $t$;
$\hat{f}^m_t := \Theta[\mathcal{T}^m_t]$ the estimator trained using $\mathcal{T}^m_t$.
40/49
CONSISTENCY
ASSUMPTION 1 - POSITIVE TIME DENSITY
$\nu \in L^\infty(T, \mathbb{R}_+)$ density function of $T$, s.t. $\nu_+ := \operatorname{ess\,sup}_{t \in T} \nu(t) < \infty$ and $\nu_- := \operatorname{ess\,inf}_{t \in T} \nu(t) > 0$.
ASSUMPTION 2 - LIPSCHITZ IN TIME
The function $(t, z) \in T \times E \mapsto f_t(z)$ is continuous and $|f_{t_1}(z) - f_{t_2}(z)| \leq L |t_1 - t_2|$, $L > 0$.
ASSUMPTION 3 - SHRINKING BINS
The homogeneous partition $\{B^m_\ell\}_{\ell=1}^{q_m}$ of $[0, t_f]$, with bin size $b_m$, is s.t. $\lim_{m\to\infty} b_m = 0$ and $\lim_{m\to\infty} m b_m = \infty$.
41/49
CONSISTENCY
ASSUMPTION 4 - I.I.D. CONSISTENCY
$\mathcal{G}$ an arbitrary family of probability density functions on $E$, $\rho \in \mathcal{G}$, $S^N_\rho$ an i.i.d. sample of size $N$ drawn from $\rho$, valued in $S$. The estimator obtained by applying $\Theta$ to $S^N_\rho$, denoted $\hat{\rho}_N := \Theta[S^N_\rho] \in L^1(E, \mathbb{R}_+)$, is a (pointwise) consistent density estimator, uniformly in $\rho$: for all $z \in E$, $\varepsilon > 0$, $\alpha_1 > 0$, there is $N_{\varepsilon,\alpha_1} > 0$ such that, for any $\rho \in \mathcal{G}$,
$$N \geq N_{\varepsilon,\alpha_1} \ \Rightarrow\ \mathbb{P}\big( |\hat{\rho}_N(z) - \rho(z)| < \varepsilon \big) > 1 - \alpha_1.$$
42/49
CONSISTENCY
THEOREM 1
Under Assumptions 1 to 4, for any $z \in E$ and $t \in T$, $\hat{f}^m_{\ell_m(t)}(z)$ consistently approximates the marginal density $f_t(z)$ as the number of curves $m$ grows:
$$\forall \varepsilon > 0, \quad \lim_{m\to\infty} \mathbb{P}\big( |\hat{f}^m_{\ell_m(t)}(z) - f_t(z)| < \varepsilon \big) = 1.$$
Note that:
$m \to \infty$ is not the classical $N \to \infty$ limit,
the number of samples per bin is random,
the training data are not i.i.d.
43/49
MARGINAL DENSITY ESTIMATION RESULTS
44/49
HOW GOOD IS IT COMPARED TO OTHER METHODS?
Training set of m = 424 flights ≃ 334,531 point observations,
Test set of 150 flights,
Discrimination power comparison with (gmm-)FPCA and (integrated) LS-CDE:

          Estimated likelihoods
Var.      Real            Opt1                  Opt2                  Tr. Time
MML       0.63 ± 0.07     0.43 ± 0.08           0.13 ± 0.02           5 s
FPCA      0.16 ± 0.12     6.4e-03 ± 3.8e-03     3.6e-03 ± 5.4e-03     20 s
LS-CDE    0.77 ± 0.05     0.68 ± 0.04           0.49 ± 0.06           14 h
45/49
MML PENALTY
The MML can be used not only to assess the optimization solutions, but also to penalize the optimization itself:
$$\min_{(x,u)\in\mathcal{X}\times\mathcal{U}} \int_0^{t_f} C(u(t), x(t))\,dt - \lambda\, \text{MML}(Z, x), \quad \text{s.t. } \dot{x}(t) = \hat{g}(u(t), x(t)),\ \text{a.e. } t \in [0, t_f],\ \text{other constraints...} \quad \text{(MML-AOCP)}$$
$\lambda$ sets the trade-off between fuel minimization and likelihood maximization.
46/49
PENALTY EFFECT
47/49
TRAJECTORY ACCEPTABILITY CONCLUSION
1. General probabilistic criterion using marginal densities to quantify the closeness between a curve and a set of random trajectories,
2. Class of consistent plug-in estimators, based on "histograms" of multivariate density estimators,
3. Applicable to the case of aircraft climb trajectories; competitive with other well-established state-of-the-art approaches,
4. Particular Adaptive Kernel and Gaussian mixture implementation; showed that it can be used in optimal control problems to obtain solutions close to optimal, yet still realistic.
48/49
THANK YOU FOR YOUR ATTENTION
49/49
REFERENCES
Anderson, R. M. and May, R. M. (1992). Infectious Diseases of Humans: Dynamics and Control. Oxford University Press.
Bach, F. (2008). Bolasso: model consistent Lasso estimation through the bootstrap. In ICML, pages 33-40.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. JRSS-B, pages 1-38.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32:407-499.
FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445-466.
Friedman, J., Hastie, T., and Tibshirani, R. (2010). A note on the Group Lasso and a Sparse Group Lasso. arXiv:1001.0736.
Jategaonkar, R. V. (2006). Flight Vehicle System Identification: A Time Domain Methodology. AIAA.
Mattingly, J. D., Heiser, W. H., and Daley, D. H. (1992). Aircraft Engine Design. University Press.
Nagumo, J., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10):2061-2070.
Obozinski, G., Taskar, B., and Jordan, M. I. (2006). Multi-task feature selection. In ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning.
Roux, E. (2005). Pour une approche analytique de la dynamique du vol. PhD thesis, Supaero.
Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H., and Okanohara, D. (2010). Conditional density estimation via least-squares density ratio estimation. In AISTATS, pages 781-788.
Tibshirani, R. (1994). Regression shrinkage and selection via the Lasso. JRSS-B, 58:267-288.
Tikhonov, A. N. (1943). On the stability of inverse problems. Doklady Akademii Nauk SSSR, 39:195-198.
Yuan, M. and Lin, Y. (2005). Model selection and estimation in regression with grouped variables. JRSS-B, 68:49-67.
50/49
ACCURACY OF DYNAMICS PREDICTIONS
FIGURE: Leave-one-out off-sample error distributions for nonlinear least-squares (NLLS) and block-sparse Bolasso (BSBL). Median errors are annotated and marked by dashed vertical lines.
51/49
STRUCTURED FEATURE SELECTION: STATE-OF-THE-ART

Other methods                                  Difference with Block-sparse Lasso
Group Lasso [Yuan and Lin, 2005]               Group sparsity is fixed a priori,
Sparse Group Lasso [Friedman et al., 2010]     Sparsity induced only within groups,
Multi-task Lasso [Obozinski et al., 2006]      Not the same pattern for every task.
52/49
THEOREM (BOLASSO CONSISTENCY - BACH [2008])
For $\lambda = \lambda_0 N^{-1/2}$ with $\lambda_0 > 0$, assume that
(H1) the cumulant generating functions $\mathbb{E}[\exp(s\|X\|_2^2)]$ and $\mathbb{E}[\exp(s\|Y\|_2^2)]$ are finite for some $s > 0$;
(H2) the joint matrix of second-order moments $Q = \mathbb{E}[XX^\top] \in \mathbb{R}^{p \times p}$ is invertible;
(H3) $\mathbb{E}[Y | X] = X \cdot \theta$ and $\operatorname{Var}[Y | X] = \sigma^2$ a.s. for some $\theta \in \mathbb{R}^p$ and $\sigma \in \mathbb{R}_+^*$.
Then, for any $b > 0$, the probability that the Bolasso algorithm does not exactly select the correct model has the following upper bound:
$$\mathbb{P}[J \neq J^*] \leq b A_1 e^{-A_2 N} + A_3 \frac{\log N}{N^{1/2}} + A_4 \frac{\log b}{b},$$
where $A_1, A_2, A_3, A_4 > 0$.
53/49
GENERALIZED TIKHONOV REGULARIZATION OF ISP
Equivalent to $\|\Gamma(\theta - \tilde{\theta})\|_2^2$ with $\Gamma_i = (\underbrace{0, \dots, 0}_{d_T + d_D + d_L}, X_{I_{sp}}^\top)$ and $\Gamma_i \tilde{\theta} = \tilde{I}_{sp,i}$.
54/49
MML CONSISTENCY FOR STANDARD KERNEL ESTIMATOR
ASSUMPTION 5
The function $(t, z) \in T \times E \mapsto f_t(z)$ is $C^4(E)$ in $z$ and $C^1(T)$ in $t$; the Lipschitz constant of $t \mapsto \frac{d^2 f_t}{dz^2}(z) =: f''_t(z)$ is denoted by $L'' > 0$: for any $z \in E$ and $t_1, t_2 \in T$, $|f''_{t_1}(z) - f''_{t_2}(z)| \leq L'' |t_1 - t_2|$.
55/49
MML CONSISTENCY FOR STANDARD KERNEL ESTIMATOR
Kernel moments:
$$\sigma^2_{K_\sigma} = \int w^2 K_\sigma(w)\,dw = \sigma^2 \int w^2 K(w)\,dw = \sigma^2 \sigma_K^2,$$
$$\sigma^2_{K_\sigma^2} = \int w^2 K_\sigma(w)^2\,dw = \sigma \int w^2 K(w)^2\,dw = \sigma\, \sigma^2_{K^2},$$
$$R(K_\sigma) = \int K_\sigma(w)^2\,dw = \frac{1}{\sigma} \int K(w)^2\,dw = \frac{1}{\sigma} R(K).$$
THEOREM 2
Under Assumptions 1, 3 and 5, if $\hat{f}^m_{\ell_m(t)}$ is a KDE whose kernel $K$ and bandwidth $\sigma := \sigma_m$ are deterministic, such that $\sigma_K < \infty$, $\sigma_{K^2} < \infty$, $R(K) < \infty$, and if $\lim_{m\to\infty} \sigma_m = 0$ and $\lim_{m\to\infty} m b_m \sigma_m = +\infty$, then
$$\lim_{m\to\infty} \mathbb{E}\Big[ \big(\hat{f}^m_{\ell_m(t)}(z) - f_t(z)\big)^2 \Big] = 0.$$
56/49
THEOREM 1 PROOF SKETCH
Main steps:
$$\lim_{m\to\infty} |f_t(z) - f^m_{\ell_m(t)}(z)| = 0,$$
$$\lim_{m\to\infty} \mathbb{P}(N^m_{r,\ell_m(t)} \leq 1) = 1, \quad r = 1, \dots, m, \qquad \forall M > 0,\ \lim_{m\to\infty} \mathbb{P}\big( N^m_{\ell_m(t)} > M \big) = 1,$$
$$C_M := \{ N^m_{\ell_m(t)} > M \} \cap \bigcap_{r=1}^m \{ N^m_{r,\ell_m(t)} \leq 1 \}, \qquad \forall M > 0,\ \lim_{m\to\infty} \mathbb{P}(C_M) = 1,$$
$$\forall \varepsilon > 0,\ \lim_{m\to\infty} \mathbb{P}\big( |\hat{f}^m_{\ell_m(t)}(z) - f^m_{\ell_m(t)}(z)| < \varepsilon \big) = 1.$$
57/49
FLIGHT MECHANICS MODELS
$$\rho = \frac{P}{R_s\, SAT}, \qquad SAT(h) = T_0 + \alpha_T h, \qquad SAT(TAT, M) = \frac{TAT}{1 + \frac{\lambda - 1}{2} M^2},$$
$$M = \frac{V}{V_{sound}} = \frac{V}{(\lambda R_s\, SAT)^{1/2}}.$$
58/49
CONSUMPTION X ACCEPTABILITY TRADE-OFF
FIGURE: Average over 20 flights of the fuel consumption and MML score (called acceptability here) of optimized trajectories with varying MML-penalty weight λ.
59/49
GAUSSIAN MIXTURE MODEL FOR MARGINAL DENSITIES
$$f_t(z) = \sum_{k=1}^K w_{t,k}\, \varphi(z, \mu_{t,k}, \Sigma_{t,k}), \qquad \sum_{k=1}^K w_{t,k} = 1,\ w_{t,k} \geq 0,$$
$$\varphi(z, \mu, \Sigma) := \frac{1}{\sqrt{(2\pi)^d \det \Sigma}}\, e^{-\frac{1}{2}(z - \mu)^\top \Sigma^{-1} (z - \mu)}.$$
Assuming that the number of components is known, the weights $w_{t,k}$, means $\mu_{t,k}$ and covariance matrices $\Sigma_{t,k}$ need to be estimated.
60/49
MAXIMUM LIKELIHOOD PARAMETER ESTIMATION
For $K = 1$, maximum likelihood estimates have closed form:
$$\mathcal{L}(\mu_{t,1}, \Sigma_{t,1} | z_1, \dots, z_N) = \prod_{i=1}^N \frac{1}{\sqrt{(2\pi)^d \det \Sigma_{t,1}}}\, e^{-\frac{1}{2}(z_i - \mu_{t,1})^\top \Sigma_{t,1}^{-1} (z_i - \mu_{t,1})},$$
$$\hat{\theta} := (\hat{\mu}_{t,1}, \hat{\Sigma}_{t,1}) = \arg\min_{(\mu_{t,1}, \Sigma_{t,1})} \sum_{i=1}^N \Big[ \log \det \Sigma_{t,1} + (z_i - \mu_{t,1})^\top \Sigma_{t,1}^{-1} (z_i - \mu_{t,1}) \Big],$$
$$\hat{\mu}_{t,1} = \frac{1}{N} \sum_{i=1}^N z_i, \qquad \hat{\Sigma}_{t,1} = \frac{1}{N} \sum_{i=1}^N (z_i - \hat{\mu}_{t,1})(z_i - \hat{\mu}_{t,1})^\top.$$
61/49
EM ALGORITHM
Hidden random variable $J$ valued in $\{1, \dots, K\}$: if $J_i = k$, then $z_i$ was drawn from the $k$-th component. Group observations by component and compute $(\hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})$ with the $K = 1$ maximum likelihood formulas.
EXPECTATION-MAXIMIZATION [Dempster et al., 1977]
Initialization: $\hat{\theta} = (\hat{w}_{t,k}, \hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})_{k=1}^K = (w^0_{t,k}, \mu^0_{t,k}, \Sigma^0_{t,k})_{k=1}^K$.
Expectation: for $k = 1, \dots, K$ and $i = 1, \dots, N$,
$$\hat{\pi}_{k,i} := \mathbb{P}(J_i = k | \hat{\theta}, z_i) = \frac{\hat{w}_{t,k}\, \varphi(z_i, \hat{\mu}_{t,k}, \hat{\Sigma}_{t,k})}{\sum_{k'=1}^K \hat{w}_{t,k'}\, \varphi(z_i, \hat{\mu}_{t,k'}, \hat{\Sigma}_{t,k'})}, \qquad \hat{w}_{t,k} = \frac{1}{N} \sum_{i=1}^N \hat{\pi}_{k,i}.$$
Maximization:
$$\hat{\mu}_{t,k} = \frac{\sum_{i=1}^N \hat{\pi}_{k,i}\, z_i}{\sum_{i=1}^N \hat{\pi}_{k,i}}, \qquad \hat{\Sigma}_{t,k} = \frac{\sum_{i=1}^N \hat{\pi}_{k,i}\, (z_i - \hat{\mu}_{t,k})(z_i - \hat{\mu}_{t,k})^\top}{\sum_{i=1}^N \hat{\pi}_{k,i}}.$$