SLIDE 1 A DYNAMIC BAYESIAN BELIEF NETWORK APPROACH FOR MODELING THE ATM NETWORK DELAYS
Yiğit Bekir Kaya Data Science Researcher at CAL, Aeronautics Graduate Student
Director of CAL Istanbul Technical University Controls and Avionics Laboratory Aeronautics Research Center
SLIDE 2
INTRODUCTION
SLIDE 3 Problem Statement
¨ Modeling of ATM Network Delays
¨ Identifying Patterns and Best Practices for Resilience against System Upsets (Resilience2050.eu)
¨ Creating a stochastic model that can be used as the basis for
¤ Dynamic Slot Management (SecureDataCloud SESAR WP-E)
¤ A-Collaborative Decision Making
SLIDE 4 Real Goals
¨ Giovanni Bisignani (CEO of IATA) claims:
“Shaving one minute off each commercial flight would save 5.0 million tons of CO2 emissions and $3.8 billion in fuel costs each year”
¨ For an airline with about 2% market share (e.g. THY), the saving is about $76 million per year (2% of $3.8 billion)
SLIDE 5 Causes of Delays
¨ Weather
¤ Capacity Decrease
n Runway Change
n Change in movements per hour
¨ ATC Capacity
¨ Aerodrome Capacity
¨ Environmental Issues
¤ Volcano eruption
¨ Special Events
¤ Airspace closure
n Military
¤ Airline strikes
¨ ATC Staffing
¨ Accident/Incident
¨ Airspace Management
SLIDE 6
Network Flow Model
SLIDE 7
Air Traffic Connectivity Graph
SLIDE 8 Network Delay Propagation and Flow Model
- Each node of the network is a sector at any desired level of detail and may include a set of aerodromes (airports)
- Flights that start and end in the same sector are represented with a loop
- Airports in the network are represented as sources and sinks in each sector block. In this regard, the whole sector block can be regarded as a delay (and traffic) generator/consumer that consists of mini generators.
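The graph described on this slide can be sketched in a few lines. This is only an illustration: the sector names (S1, S2), the assignment of airports to sectors, and the `build_sector_graph` helper are hypothetical and not taken from the model itself.

```python
from collections import defaultdict

# Hypothetical flights: (departure airport, departure sector,
#                        arrival sector, arrival airport)
flights = [
    ("LTBA", "S1", "S2", "LEMD"),
    ("LTBA", "S1", "S1", "LTAC"),  # starts and ends in S1 -> loop
    ("LEBL", "S2", "S2", "LEMD"),  # starts and ends in S2 -> loop
    ("LEMD", "S2", "S1", "LTBA"),
]

def build_sector_graph(flights):
    """Count flights per directed sector pair and collect the
    airports acting as sources/sinks inside each sector block."""
    edges = defaultdict(int)    # (from_sector, to_sector) -> flight count
    sources = defaultdict(set)  # sector -> departure airports
    sinks = defaultdict(set)    # sector -> arrival airports
    for dep_airport, dep_sector, arr_sector, arr_airport in flights:
        edges[(dep_sector, arr_sector)] += 1
        sources[dep_sector].add(dep_airport)
        sinks[arr_sector].add(arr_airport)
    return dict(edges), dict(sources), dict(sinks)

edges, sources, sinks = build_sector_graph(flights)
# Loops are edges whose two endpoints are the same sector.
loops = {a: n for (a, b), n in edges.items() if a == b}
print(edges)
print(loops)
```

Keeping airports as per-sector source and sink sets mirrors the idea that the sector block as a whole generates and consumes traffic.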
SLIDE 9
Delay Propagation
SLIDE 10
- By comparing the flown profile (CPF or CTFM) with the filed profile (FTFM), the delays generated by sector capacity, restrictions, or traffic overflow are obtained for each flight
- Delays are investigated in FIR segments
- Major delays are generated at aerodromes (or at TMAs)
Main Delay Focus
SLIDE 11 Delay time behavior [Pyrgiotis, Malone, Odoni]
¨ ρ is the utilization rate
¨ If ρ < 1 the system is at steady state and
¤ expected delay is proportional to 1/(1 − ρ)
¨ Otherwise the system is chaotic
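As a rough illustration of how delay grows as the utilization rate approaches 1, the sketch below assumes an M/M/1 queue, for which the expected time in queue is ρ / (μ(1 − ρ)). The slide does not name a specific queue model, so the M/M/1 choice, the function name, and the capacity and demand figures are all assumptions.

```python
def mm1_expected_wait(arrival_rate, service_rate):
    """Expected time in queue for an M/M/1 queue.

    Both rates are in movements per hour, so the result is in hours.
    A steady state exists only when rho = arrival_rate / service_rate < 1.
    """
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("no steady state: utilization rho >= 1")
    return rho / (service_rate * (1.0 - rho))

capacity = 40.0  # movements per hour (hypothetical)
for demand in (20.0, 30.0, 36.0, 39.0):
    wait_min = 60.0 * mm1_expected_wait(demand, capacity)
    print(f"rho = {demand / capacity:.3f}   expected wait = {wait_min:.1f} min")
```

The wait grows slowly at moderate utilization and very steeply once ρ gets close to 1, which is the behavior the capacity slides that follow illustrate.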
SLIDE 12
Effect of Demand on Expected Delay
SLIDE 13
Effect of Annual Operations on Delays
SLIDE 14
FRA Airport
SLIDE 15
EWR Airport
SLIDE 16
Capacity Effects on Delay – 50 move/hr
SLIDE 17
Capacity Effects on Delay – 40 move/hr
SLIDE 18
Capacity Effects on Delay – 30 move/hr
SLIDE 19
Capacity Envelopes (Pareto Optimality)
SLIDE 20
Weather Effect on Capacity Envelope
SLIDE 21
Weather Effect on Capacity Envelope (ATL) [FAA]
SLIDE 22
Weather Effect on Capacity Envelope (BOS) [FAA]
SLIDE 23
Queue Model
SLIDE 24
DELAY PERCEPTION
SLIDE 25
Phases of Flight
SLIDE 26
Delay Schema
SLIDE 27 Perception of Delays
¨ Initial Delay Perception
¤ AOBT – EOBT (delay based on estimation; EOBT = IOBT in 96% of the data)
¨ Strict Definition Delay (Pushback, Gate-out delay)
¤ AOBT – SOBT
¨ Passenger Perceived Delay
¤ ATOT – STOT
¨ Taxi-out Delay
¤ (ATOT - AOBT) – (STOT - SOBT)
¨ Taxi to TMA Exit Delay (ADTET = Actual Departure TMA Exit Time)
¤ (ADTET – ATOT) – (SDTET – STOT)
¨ Departure Delay
¤ Pushback Delay + Taxi-out Delay + Taxi to TMA Exit Delay
¤ ADTET – SDTET
SLIDE 28 Perception of Delays
¨ En-route Delay (AATET = Actual Arrival TMA Enter Time)
¤ (AATET – ADTET) – (SATET – SDTET)
¨ TMA Entry to Taxi Delay
¤ (ATOA – AATET) – (STOA – SATET)
¨ Taxi-in Delay
¤ (AIBT – ATOA) – (SIBT – STOA)
¨ Gate-in Delay
¤ AIBT – SIBT
¨ Arrival Delay
¤ TMA Entry to Taxi Delay + Taxi-in Delay + Gate-in Delay
¤ 2*(AIBT – SIBT) – (AATET – SATET)
¨ Passenger perception (*)
¤ STOT as departure time
¤ SIBT as arrival time
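The delay definitions on these two slides can be checked with a small worked example. The sketch below is only an illustration: the function names are hypothetical, times are given as minutes since midnight for simplicity, and the sample values are made up.

```python
def departure_delays(t):
    """Departure-side delay components; t maps time labels to minutes."""
    pushback = t["AOBT"] - t["SOBT"]
    taxi_out = (t["ATOT"] - t["AOBT"]) - (t["STOT"] - t["SOBT"])
    tma_exit = (t["ADTET"] - t["ATOT"]) - (t["SDTET"] - t["STOT"])
    return {
        "pushback": pushback,
        "taxi_out": taxi_out,
        "tma_exit": tma_exit,
        "departure": pushback + taxi_out + tma_exit,
    }

def arrival_delays(t):
    """En-route and arrival-side delay components, as defined on the slides."""
    en_route = (t["AATET"] - t["ADTET"]) - (t["SATET"] - t["SDTET"])
    tma_entry = (t["ATOA"] - t["AATET"]) - (t["STOA"] - t["SATET"])
    taxi_in = (t["AIBT"] - t["ATOA"]) - (t["SIBT"] - t["STOA"])
    gate_in = t["AIBT"] - t["SIBT"]
    return {
        "en_route": en_route,
        "tma_entry": tma_entry,
        "taxi_in": taxi_in,
        "gate_in": gate_in,
        "arrival": tma_entry + taxi_in + gate_in,
    }

# Made-up sample flight (minutes since midnight).
times = {
    "SOBT": 600, "AOBT": 610, "STOT": 615, "ATOT": 630,
    "SDTET": 625, "ADTET": 642, "SATET": 700, "AATET": 715,
    "STOA": 712, "ATOA": 730, "SIBT": 720, "AIBT": 740,
}
dep = departure_delays(times)
arr = arrival_delays(times)
print(dep)
print(arr)
```

The three departure components telescope to ADTET – SDTET, and the three arrival components sum to 2*(AIBT – SIBT) – (AATET – SATET), matching the closed forms on the slides.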
SLIDE 29 Data Source
¨ The ALLFT+ data set is managed by PRISME (Pan-European Repository of Information Supporting the Management of European Air Traffic Management Master Plan)
¨ Every entry holds the information for a single flight
¤ Flight Plan
¤ Tactical Flight Model
n FTFM
n RTFM
n CTFM
¤ Routes
n CPF-GEN
n CPF-REF
SLIDE 30 ALLFT+ Temporal Variables
¨ AOBT and EOBT as is
¨ IOBT = STOT
¨ SOBT = STOT – nominal time (e.g. 15 min, depending on the airport)
¨ SFP (Planned Flight Profile, FTFM), AFP (Actual Flight Profile, CPF/CTFM)
¨ ATOT = first radar point (AFP[0].entryTime)
¤ 0 stands for first entry and -1 stands for last entry (Circular array notation)
¨ ADTET = AFP[0].exitTime, SDTET = SFP[0].exitTime
¨ AATET = AFP[-1].entryTime, SATET = SFP[-1].entryTime
¨ ATOA = AFP[-1].exitTime, STOA = SFP[-1].exitTime
¨ AIBT, SIBT are not in ALLFT+ data
¤ Taking (AIBT – SIBT) as nominal (gate-in)
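The mapping on this slide can be sketched as follows. The `Segment` class, the `temporal_variables` function, and the use of minutes since midnight are assumptions made for illustration; only the `entryTime` / `exitTime` field names and the index convention come from the slide. STOT is passed in explicitly because the slide does not say how it is read from the data.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One entry of a flight profile (hypothetical container)."""
    entryTime: float  # minutes since midnight (assumed unit)
    exitTime: float

def temporal_variables(afp, sfp, stot, nominal_time=15.0):
    """Derive the temporal variables from the actual (afp) and
    planned (sfp) profiles; index 0 is the first entry, -1 the last."""
    return {
        "STOT": stot,
        "IOBT": stot,
        "SOBT": stot - nominal_time,
        "ATOT": afp[0].entryTime,
        "ADTET": afp[0].exitTime,
        "SDTET": sfp[0].exitTime,
        "AATET": afp[-1].entryTime,
        "SATET": sfp[-1].entryTime,
        "ATOA": afp[-1].exitTime,
        "STOA": sfp[-1].exitTime,
    }

# Made-up three-segment profiles.
sfp = [Segment(615, 625), Segment(625, 700), Segment(700, 712)]
afp = [Segment(630, 642), Segment(642, 715), Segment(715, 730)]
variables = temporal_variables(afp, sfp, stot=615)
print(variables)
```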
SLIDE 31
DELAY PREDICTION MODELS
SLIDE 32 Some Methods for Delay Prediction Modeling
¨ Linear/Nonlinear Regression
¨ Graphical Models
¤ (Dynamic) Bayesian Belief Network
¤ Hidden Markov Models
¤ Kalman Filter
¨ Time Series Model
¤ SARIMA, GARCH
¨ Nonparametric Methods
¤ Nonparametric Density Estimation
n Kernel estimator, Histogram, k-NN
¤ Smoothing models
n Mean, kernel, running line, moving median, smoothing splines
¤ Multilayer Perceptrons
¨ Decision Trees
¤ Random Forest
SLIDE 33
BAYESIAN BELIEF NETWORK
SLIDE 34
Bayesian Network Structure
SLIDE 35 Bayesian Network Model
¨ P(G,S,R) = P(G|S,R) P(S|R) P(R)
Belief Propagation:
¨ P(X|E) = α P(X|E+) P(E−|X) = α π(X) λ(X)
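The factorization above can be made concrete with a small example. The sketch below assumes that G, S and R stand for the familiar grass-wet / sprinkler / rain textbook network, and the probability tables are illustrative values, not figures from this work. It computes a posterior by plain enumeration rather than by message passing; the normalizing constant α plays the same role as in the belief propagation formula.

```python
# Illustrative conditional probability tables (assumed values).
P_R = {True: 0.2, False: 0.8}               # P(R)
P_S_GIVEN_R = {True: 0.01, False: 0.4}      # P(S=True | R)
P_G_GIVEN_SR = {                            # P(G=True | S, R)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def joint(g, s, r):
    """P(G,S,R) = P(G|S,R) P(S|R) P(R)."""
    p_r = P_R[r]
    p_s = P_S_GIVEN_R[r] if s else 1.0 - P_S_GIVEN_R[r]
    p_g = P_G_GIVEN_SR[(s, r)] if g else 1.0 - P_G_GIVEN_SR[(s, r)]
    return p_g * p_s * p_r

def posterior_rain_given_wet():
    """P(R=True | G=True), summing out S and normalizing with alpha."""
    unnormalized = {
        r: sum(joint(True, s, r) for s in (True, False))
        for r in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return alpha * unnormalized[True]

print(round(posterior_rain_given_wet(), 4))
```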
SLIDE 36
Bayesian Network Examples
SLIDE 37
Departure Delay DBBN (initial)
SLIDE 38
All Flight Model (initial)
SLIDE 39
Departure Delay DBBN (TAN optimized)
SLIDE 40 Previous Approaches
¨ Big Picture Approach
¤ No assumptions about inner models of airports
¤ OD pairs are analyzed independently (Eulerian Approach)
¨ Pure Bayesian Model
¤ No assumption about mathematical structure of delay propagation
¤ Observation (data evidence) based probabilistic model
¨ Time Behavior
¤ There is a stochastic relationship between lags
SLIDE 41
Previous Results
SLIDE 42 Previous Conclusions
¨ Departure delay prediction benefits from Belief Propagation more than other phases
¨ Departure delay prediction has a ±22.5 min margin of error at the 95% confidence level
¨ More data samples are needed to increase accuracy
¨ More information should be provided to the system in order to model the underlying system
¨ Weather and Capacity Data should be aggregated along with Delay Data
SLIDE 43 SARIMA, GARCH
Time Series
SLIDE 44
Time Behavior of Movements
SLIDE 45 Our Aggregate SARIMA Model
¨ Two-timescale approach
¤ Seasonal periodicity (s)
¤ Hourly periodicity (t)
¨ Delay = f(s, t) = Φ(s) + Θ(t) + w
¨ fbar(s) = daily mean of f(s, t)
¨ Φ(s) = SARIMA(fbar(s)) + WeatherModel(fbar(s))
¨ f’(t) = hourly mean of {f(s, t) - Φ(s)} (Making levels even)
¨ Θ(t) = SARIMA(f’(t)) + QueueModel(f’(t)) + WeatherModel(f’(t))
¨ w = f(s, t) - Φ(s) - Θ(t)
¨ w ~ N(0, σ)
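The two-level decomposition can be sketched on synthetic data. In this simplified illustration the fitted SARIMA, weather and queue models are replaced by the raw daily and hourly means, so Φ(s) is just fbar(s) and Θ(t) is just f’(t). That substitution, the `decompose` function, and the synthetic delay values are assumptions, not the aggregate model itself.

```python
import random
from statistics import mean

def decompose(f):
    """Split f[s][t] (day s, hour t) into a daily level, an hourly
    profile and a residual, with plain means as stand-ins for models."""
    n_days, n_hours = len(f), len(f[0])
    # fbar(s): daily mean, used here directly as Phi(s).
    phi = [mean(day) for day in f]
    # f'(t): hourly mean of the level-adjusted delays, used as Theta(t).
    theta = [mean(f[s][t] - phi[s] for s in range(n_days))
             for t in range(n_hours)]
    # Residual: w = f(s, t) - Phi(s) - Theta(t).
    w = [[f[s][t] - phi[s] - theta[t] for t in range(n_hours)]
         for s in range(n_days)]
    return phi, theta, w

# Synthetic week: a daily trend, a daytime bump, and Gaussian noise.
random.seed(0)
f = [[10 + 2 * s + 5 * (8 <= t <= 20) + random.gauss(0, 1)
      for t in range(24)] for s in range(7)]
phi, theta, w = decompose(f)
print([round(p, 2) for p in phi])
```

Subtracting the daily level first ("making levels even") keeps the hourly profile from being dominated by days with unusually high overall delay.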
SLIDE 46
Delay Prediction SARIMA (Barcelona-Madrid)
SLIDE 47
Special Day 1: May 6 2011 (Military)
SLIDE 48
Special Day 2: 30 May 2011 (Weather)
SLIDE 49 SARIMA
¨ SARIMA
¤ AR - Auto regressive
¤ MA - Moving Average
¤ I - Integrated
¤ S – Seasonal
¨ Conditions
¤ TS should be linear
¤ TS should be stationary
¤ TS should not have any trends (detrending)
¤ TS should be significantly different than white noise
¤ Residuals should be white noise
SLIDE 50 AR, MA, ARMA, ARIMA
¨ Condition Analysis
¤ Non-linearity: White Test
¤ Stationary/Explosive: Dickey-Fuller Test
¤ White Noise: Box-Ljung Test
¤ Seasonality: Auto Correlation Function
¤ Cross Correlation
¨ AR(p)
¤ x_t − µ = φ_1(x_{t−1} − µ) + φ_2(x_{t−2} − µ) + ··· + φ_p(x_{t−p} − µ) + w_t
¨ MA(q)
¤ x_t = w_t + θ_1 w_{t−1} + θ_2 w_{t−2} + ··· + θ_q w_{t−q}
¨ ARMA(p, q)
¤ x_t = α + φ_1 x_{t−1} + ··· + φ_p x_{t−p} + w_t + θ_1 w_{t−1} + ··· + θ_q w_{t−q}
SLIDE 51 SARIMA&GARCH
¨ A Sample Equation for ARMA(2,2) Model:
¤ x_t = 0.4 x_{t−1} + 0.45 x_{t−2} + w_t + w_{t−1} + 0.25 w_{t−2}
¤ Where x_t denotes the dependent time series and w_t denotes the white noise time series
¤ w_t ~ N(0, σ_w^2)
¨ ARIMA(0, 0, 0)×(0, 0, 1)_12
¨ GARCH: Similar to ARIMA
¤ Generalized Auto Regressive Conditional Heteroskedasticity
¤ Heteroskedasticity: No constant variance assumption
¤ The variance can be estimated
¤ Y_t = f(X_{1,t}, …, X_{p,t}) + σ(X_{1,t}, …, X_{p,t}) w_t
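The sample ARMA(2,2) equation on this slide can be simulated directly. The sketch below is a minimal illustration that uses only the standard library; the function name, the seed, and the choice of zero initial conditions for the terms before the start of the series are assumptions.

```python
import random

def simulate_arma22(n, phi=(0.4, 0.45), theta=(1.0, 0.25),
                    sigma_w=1.0, seed=0):
    """Simulate x_t = phi1*x_{t-1} + phi2*x_{t-2}
                      + w_t + theta1*w_{t-1} + theta2*w_{t-2},
    treating values before t = 0 as zero."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, sigma_w) for _ in range(n)]  # white noise
    x = []
    for t in range(n):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(2) if t - 1 - i >= 0)
        ma = sum(theta[j] * w[t - 1 - j] for j in range(2) if t - 1 - j >= 0)
        x.append(ar + w[t] + ma)
    return x, w

x, w = simulate_arma22(200)
print([round(v, 3) for v in x[:5]])
```

A simulated series like this is a convenient test input for the condition checks listed on the previous slide before they are applied to real delay data.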
SLIDE 52
COMPARISON OF MODELS
SLIDE 53 Comparison of Models
¨ Using only the Bayesian Network gives a higher margin of error than the SARIMA model at the same confidence level
¨ However, the Bayesian Network provides a probability distribution rather than only a mean and standard error
¨ Bayesian Networks can process missing values
¨ Belief propagation might decrease the variability of the result
¨ Random Forest provides feature importance information
¨ Nonparametric methods such as Multilayer Perceptrons are highly dependent on current data and do not provide a parametric inference
¨ Nonparametric methods are very sensitive to initial state
¨ Nonparametric methods can “over-fit” data
¨ Nonparametric methods benefit from online learning
SLIDE 54
DATA UNDERSTANDING
SLIDE 55 Data format in ALLFT+
¨ Unstructured
¤ Data has no structural information in it
n No labels for features (or fields)
n No hierarchy between fields
¨ Text formatted files
¤ BigData is stored in plain text files
¤ Fields are separated by symbols or white space characters
¨ Very large in size
¤ Flight information amounts to several gigabytes per day
¤ A simple aggregated database can reach terabytes
SLIDE 56 Data format in ALLFT+
¨ Data is generated at very high speed
¤ Data is collected far faster than it can be analyzed
¤ There is not enough bandwidth to transfer data to outside servers for processing BigData in reasonable time
¨ Diverse sources of information
¤ In aviation, every stakeholder has its own interest in collecting data
¤ Aligning and synchronizing data can be cumbersome
SLIDE 57 ALLFT+ Data Profiles - I
The ALL_FT+ data set includes eight different airspace profiles. Tactical Flight Models:
¨ FTFM - Filed Tactical Flight Model: the FTFM is the “initial” profile, as it reflects the status of the demand before activation of the regulation plan. It is computed with the latest flight plan version sent by each AO to the CFMU/IFPS
¨ RTFM - Regulated Tactical Flight Model: the RTFM is the “regulated” profile, as it reflects the status of the demand after activation of the regulation plan. It is computed with the latest ATFM slot (CTOT) issued to the AO by the ground regulation system
¨ CTFM - Current Tactical Flight Model: the CTFM is the “actual” profile, as it integrates the actual entry time of the flights in the regulated TV. It is computed with the radar data sent by ACCs to CFMU/ETFMS
SLIDE 58 ALLFT+ Data Profiles - II
¨ CPG_GEN - Profiles generated by the CFMU path generation tool
¤ SCR - Shortest Constrained Route
¤ SRR - Shortest RAD restriction applied Route
¤ SUR - Shortest Unconstrained Route
¤ DCT - Direct route
¨ CPF - Correlated Position reports for a Flight; CPRs (Correlated Position Reports) are surveillance data collected from the ACCs
SLIDE 59
BIGDATA MANAGEMENT TOOL
SLIDE 60
BMT Schema
SLIDE 61 BMT Overview
¨ BMT converts unstructured text files into a managed and efficient database
¨ BMT allows BigData to be consumed efficiently by the applications that process it
¨ BMT can process any BigData that is in text format separated by symbols
¤ Other extensions can be built for other possibilities
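The conversion step described on this slide, turning an unlabeled, symbol-separated text line into a labeled record, can be sketched as below. This is not BMT's actual code: the `parse_record` function, the `;` separator, the field names, and the sample line are all hypothetical, since the real ALLFT+ layout is not given here.

```python
def parse_record(line, field_names, sep=";"):
    """Turn one separator-delimited text line into a labeled record."""
    values = line.rstrip("\n").split(sep)
    if len(values) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(values)}"
        )
    return dict(zip(field_names, values))

# Hypothetical field layout and sample line.
FIELDS = ["flight_id", "adep", "ades", "eobt", "aobt"]
raw = "AB123;LEBL;LEMD;2011-05-06T08:00;2011-05-06T08:12\n"

record = parse_record(raw, FIELDS)
print(record)
```

A labeled record of this shape can be stored once in a document database, so that client applications no longer have to re-parse the raw text on every run.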
SLIDE 62 BMT adds value to BigData management
¨ Stores data in MongoDB
¤ Schemaless design
¤ Flexible, distributed
¤ Highly efficient and secure
¨ Reduces the size of data stored
¤ Up to 5-6 times (with extensions, the size of data can be reduced further, up to 60-140 times, with some trade-offs)
¤ The reduced size and the distributed structure of MongoDB make data processing much faster, depending on the scale of the data
SLIDE 63 What is the impact of BMT?
¨ BMT eliminates the client application’s parsing overhead at every run
¤ Structured information is stored in efficient-to-process data centers
¨ By using Python rather than MATLAB, the parsing time for a single day of ALLFT+ data dropped from 5 hours to 14 minutes on the same hardware
¤ It can be reduced further by distributing the database across shards and utilizing supercomputers
SLIDE 64
Applications of BMT
SLIDE 65 Calculation of greenhouse emissions in Istanbul
¨ Calculating and monitoring CO and other greenhouse gases in
Istanbul is an ongoing project implemented jointly with ITU Eurasia Institute of Earth Sciences
¤ Emissions of various gases are calculated efficiently by utilizing flight
mode and other information processed by BMT
SLIDE 66 Real-time traffic delay prediction
¨ Predicting delay propagation in an airport by combining various
machine learning techniques based on different approaches
¤ Machine learning algorithms are implemented for parallel and
distributed computation utilizing BMT
SLIDE 67
THANK YOU!
Any Questions? Any Comments?