SLIDE 1

Travel Time Estimation using Approximate Belief States on a Hidden Markov Model

Walid Krichene

SLIDE 2

Overview

◮ Context
◮ Inference on a HMM
◮ Modeling framework and exact inference
◮ Approximate Inference: the Boyen-Koller algorithm
◮ Graph Partitioning

SLIDES 4-7

Context

◮ Mobile Millennium project
◮ Travel time estimation on an arterial network
◮ Input data: probe vehicles that send their GPS locations periodically
◮ Processed using path inference
◮ Observation = (path, travel time along the path)

SLIDES 8-10

Objective

Improve the inference algorithm:

◮ Time complexity is exponential in the size of the network (number of links)
◮ Solution: assume links are independent
◮ But then we lose the structure of the network
◮ Need approximate inference to keep the structure

SLIDE 11

Overview

◮ Context
◮ Inference on a HMM
◮ Modeling framework and exact inference
◮ Approximate Inference: the Boyen-Koller algorithm
◮ Graph Partitioning

SLIDE 12

Graphical Model

◮ Nodes: random variables
◮ Conditional independence: x and y are independent conditionally on (n1, n2), but not conditionally on n1

SLIDES 13-15

Hidden Markov Model

◮ Hidden variables s_t ∈ {s^1, ..., s^N}
◮ Observed variables y_t
◮ (s_0, ..., s_t) is a Markov process
◮ Hidden variables are introduced to simplify the model
◮ Interesting because it provides efficient algorithms for inference and parameter estimation

SLIDES 16-19

Parametrization of a HMM

◮ Initial probability distribution: π_i = P(s_0^i)
◮ Transition matrix: T_{i,j} = P(s_{t+1}^j | s_t^i)
◮ Observation model: P(y_t | s_t)
◮ These completely characterize the HMM: we can compute the probability of any event.
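
These three objects map directly onto arrays. A minimal sketch in Python/NumPy; the sizes, values, and the Gaussian observation model are illustrative placeholders, not taken from the slides:

```python
import numpy as np

N = 3  # number of hidden states (illustrative)

# Initial distribution: pi[i] = P(s_0 = s^i)
pi = np.array([0.6, 0.3, 0.1])

# Transition matrix: T[i, j] = P(s_{t+1} = s^j | s_t = s^i)
T = np.array([[0.8, 0.15, 0.05],
              [0.2, 0.6,  0.2 ],
              [0.1, 0.3,  0.6 ]])

def obs_likelihood(y, mu=np.array([1.0, 2.0, 4.0]), sigma=0.5):
    """Observation model P(y_t | s_t): a Gaussian travel time per state,
    purely a placeholder for the model used later in the slides."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Sanity checks: pi and every row of T are probability distributions.
assert np.isclose(pi.sum(), 1.0) and np.allclose(T.sum(axis=1), 1.0)
```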

SLIDES 20-22

Inference

General inference problem: compute P(s_t | y_{0:T})

◮ Filtering if t = T
◮ Prediction if t > T
◮ Smoothing if t < T

Let y = y_{0:T}. Then

P(s_t | y) = P(s_t, y) / P(y) = α(s_t) β(s_t) / Σ_{s_t} α(s_t) β(s_t)

where

α(s_t) := P(y_0, ..., y_t, s_t)
β(s_t) := P(y_{t+1}, ..., y_T | s_t)

SLIDES 23-25

Message passing algorithms

Recursive algorithms to compute α(s_t) and β(s_t):

◮ α(s_{t+1}) = Σ_{s_t} α(s_t) T_{s_t, s_{t+1}} P(y_{t+1} | s_{t+1})
◮ β(s_t) = Σ_{s_{t+1}} β(s_{t+1}) P(y_{t+1} | s_{t+1}) T_{s_t, s_{t+1}}
◮ Complexity: O(N²T) operations
◮ α recursion: for every t there are N possible values of s_{t+1}, and each α(s_{t+1}) requires N multiplications
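
The two recursions transcribe almost literally into code. A sketch, left unnormalized for clarity (a practical implementation rescales α_t and β_t at each step to avoid underflow); the likelihood table B is an assumed precomputation with B[t, i] = P(y_t | s_t = s^i):

```python
import numpy as np

def forward_backward(pi, T, B):
    """Unnormalized alpha/beta recursions.
    pi: (N,) initial distribution; T: (N, N) transition matrix;
    B: (n_steps, N) likelihood table with B[t, i] = P(y_t | s_t = s^i)."""
    n_steps, N = B.shape
    alpha = np.zeros((n_steps, N))
    beta = np.ones((n_steps, N))                       # beta(s_T) = 1
    alpha[0] = pi * B[0]                               # alpha(s_0) = P(y_0, s_0)
    for t in range(n_steps - 1):
        alpha[t + 1] = (alpha[t] @ T) * B[t + 1]       # sum over s_t
    for t in range(n_steps - 2, -1, -1):
        beta[t] = T @ (B[t + 1] * beta[t + 1])         # sum over s_{t+1}
    return alpha, beta

# Smoothed posterior P(s_t | y_{0:T}): normalize alpha * beta per time step.
```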

SLIDE 26

Parameter estimation

Parameters of the HMM: θ = (π, T, η)

◮ T: transition matrix
◮ π: initial state probability distribution
◮ η: parameters of the observation model P(y_t | s_t, η)

Parameter estimation: maximize the log likelihood with respect to θ:

ln Σ_{s_0} Σ_{s_1} ··· Σ_{s_T} π_{s_0} Π_{t=0}^{T-1} T_{s_t, s_{t+1}} Π_{t=0}^{T} P(y_t | s_t, η)

SLIDES 27-28

Expectation Maximization algorithm

◮ E step: estimate the hidden (unobserved) variables given the observed variables and the current estimate of θ
◮ M step: maximize the likelihood function under the assumption that the latent variables are known (they are "filled in" with their expected values)

SLIDE 29

Expectation Maximization algorithm

In the case of HMMs:

◮ T̂_{ij} = Σ_{t=0}^{T-1} ξ(s_t^i, s_{t+1}^j) / Σ_{t=0}^{T-1} γ(s_t^i)
◮ η̂_{ij} = Σ_{t=0}^{T} γ(s_t^i) y_t^j / Σ_{t=0}^{T} γ(s_t^i)
◮ π̂_i = α(s_0^i) β(s_0^i) / Σ_{s_0} α(s_0) β(s_0)

where ξ and γ are simple functions of α and β.

Time complexity: O(N²T) operations.
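
For concreteness, a sketch of these updates as one M step, assuming the E step has already produced γ[t, i] = P(s_t = s^i | y) and ξ[t, i, j] = P(s_t = s^i, s_{t+1} = s^j | y) from α and β; the array shapes and the vector-valued observations Y are illustrative:

```python
import numpy as np

def m_step(gamma, xi, Y):
    """One M step. gamma: (T+1, N) with gamma[t, i] = P(s_t = s^i | y);
    xi: (T, N, N) with xi[t, i, j] = P(s_t = s^i, s_{t+1} = s^j | y);
    Y: (T+1, d) observation vectors."""
    T_hat = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]   # T̂_{ij}
    eta_hat = (gamma.T @ Y) / gamma.sum(axis=0)[:, None]       # η̂_{ij}
    pi_hat = gamma[0]        # π̂_i = alpha(s_0^i) beta(s_0^i) / Σ alpha beta
    return pi_hat, T_hat, eta_hat
```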

SLIDE 30

Overview

◮ Context
◮ Inference on a HMM
◮ Modeling framework and exact inference
◮ Approximate Inference: the Boyen-Koller algorithm
◮ Graph Partitioning

SLIDES 31-33

Modeling framework

◮ System modeled using a Hidden Markov Model
◮ L links

Hidden variables

◮ Link l has a discrete state S_t^l ∈ {1, ..., K}
◮ State of the entire system: S_t = (S_t^1, ..., S_t^L)
◮ N = K^L possible states
◮ Markov process: P(S_{t+1} | S_0, ..., S_t) = P(S_{t+1} | S_t)

Observed variables

We observe travel times: random variables whose distributions depend on the states of the links.

SLIDE 34

HMM

SLIDE 35

Parametrization of the HMM

Transition model

T_t(s^i → s^j) := P(s_{t+1}^j | s_t^i)

Transition matrix of size K^L × K^L.

SLIDES 36-37

Parametrization of the HMM

Observation model

Probability of observing a response y = (l, x_i, x_f, δ) given state s at time t:

O_t(s → y) := P(y_t | s_t) = g_t^{l,s}(δ) × ∫_{x_i}^{x_f} ρ_t^l(x) dx

◮ g_t^{l,s}: distribution of the total travel time on link l in state s
◮ ρ_t^l: probability distribution of vehicle locations (results from traffic assumptions)

Assumptions

Processes are time-invariant during 1-hour time slices.
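
A sketch of this likelihood under stated assumptions: g and rho are hypothetical stand-ins for g_t^{l,s} and ρ_t^l, and the integral is approximated on a uniform discretization of the link:

```python
import numpy as np

def observation_likelihood(delta, x_i, x_f, g, rho, x_grid):
    """O_t(s -> y) for a response y = (l, x_i, x_f, delta): travel-time
    density g (a stand-in for g_t^{l,s}) times the probability that the
    vehicle lies between x_i and x_f under the location density rho
    (a stand-in for rho_t^l), integrated on a uniform grid of the link."""
    dx = x_grid[1] - x_grid[0]
    mask = (x_grid >= x_i) & (x_grid <= x_f)
    return g(delta) * rho[mask].sum() * dx

# Illustrative usage: a 100 m link with a uniform location density and an
# exponential travel-time density (both hypothetical).
x = np.linspace(0.0, 100.0, 1001)
rho = np.full_like(x, 1.0 / 100.0)
g = lambda d: np.exp(-d / 30.0) / 30.0
print(observation_likelihood(25.0, 20.0, 60.0, g, rho, x))
```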

SLIDES 38-41

Travel time estimation

◮ Estimate the state of the system
◮ Estimate the parameters of the models (observation)
◮ Update the estimates when new responses are observed

Belief State

p_t(s) := P(s_t | y_{0:t}), a probability distribution over the possible states.

SLIDES 42-43

Travel time estimation

Bayesian tracking of the belief state: forward-backward propagation (O(N²T) time). Each update can be done in O(N²):

p_t --T[·]--> q_{t+1} --O_y[·]--> p_{t+1}

Parameter estimation of the model

◮ Update the parameters of the probability distribution of vehicle locations: solve

  max Σ_{x ∈ X_t^l} ln ρ_t^l(x)

  where X_t^l are the observed vehicle locations
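
The predict/correct chain p_t → q_{t+1} → p_{t+1} is a few lines of NumPy. A minimal sketch, where obs_lik is an assumed vector of observation likelihoods for the new response:

```python
import numpy as np

def filter_step(p_t, T, obs_lik):
    """One O(N^2) tracking step.
    p_t: (N,) current belief; T: (N, N) transition matrix;
    obs_lik: (N,) with obs_lik[j] = P(y_{t+1} | s^j)."""
    q = p_t @ T            # predict: q_{t+1}(s') = sum_s p_t(s) T(s -> s')
    p = q * obs_lik        # correct: multiply in the observation likelihood
    return p / p.sum()     # renormalize to obtain p_{t+1}
```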

SLIDES 44-45

Parameter estimation of the model

◮ Update the transition matrix: EM algorithm in O(N²T) operations
◮ Exact inference and parameter estimation are done in O(N²T) = O(K^{2L} T) time

SLIDES 46-48

Computational intractability

Exact inference and the EM algorithm are not tractable: the size of the belief state and of the transition matrix is exponential in the size of the network, and the EM algorithm takes time exponential in L.

◮ Assume independence of links?
◮ Use approximate tracking instead, and limit the size of the network?

SLIDE 49

Overview

◮ Context
◮ Inference on a HMM
◮ Modeling framework and exact inference
◮ Approximate Inference: the Boyen-Koller algorithm
◮ Graph Partitioning

SLIDES 50-51

Approximate Belief State

Choose a family of belief states that have a compact representation.

Factorized belief state

Decompose the process into subprocesses. Approximate the probability of state s by the product of the marginal probabilities of the substates s^c:

p_t(s) = P(s_t | y_{0:t}) ≈ Π_{c=1}^{C} P(s_t^c | y_{0:t}) = Π_{c=1}^{C} p̃_t^c(s^c) =: p̃_t(s)

SLIDE 52

Approximate Belief State

Example

A network with 3 links, S = (S^1, S^2, S^3); links 1 and 2 are in cluster 1 and link 3 is in cluster 2. Then

p_t((0, 1, 1)) = P(S_t = (0, 1, 1) | y_{0:t})
              ≈ P((S_t^1, S_t^2) = (0, 1) | y_{0:t}) P(S_t^3 = 1 | y_{0:t})
              = p̃_t^1((0, 1)) p̃_t^2(1) =: p̃_t((0, 1, 1))
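
A sketch of this factorization for the slide's 3-link example, assuming K = 2 states per link and illustrative marginal values; the point is that storage drops from K^L joint entries to a sum of per-cluster tables:

```python
import numpy as np

# Cluster 1 = links (1, 2), cluster 2 = link 3, K = 2 states per link.
# Storage: 2^2 + 2 = 6 numbers instead of 2^3 = 8 for the full joint;
# the gap grows exponentially with the number of links L.
p1 = np.array([[0.1, 0.4],       # p̃_t^1 over (S^1, S^2)
               [0.3, 0.2]])
p2 = np.array([0.4, 0.6])        # p̃_t^2 over S^3

def belief(s1, s2, s3):
    """Approximate joint belief p̃_t((s1, s2, s3))."""
    return p1[s1, s2] * p2[s3]

print(belief(0, 1, 1))           # p̃_t((0, 1, 1)) = p1[0, 1] * p2[1] = 0.24
```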

SLIDE 53

Approximate Belief State

SLIDE 54

Approximate Belief State

Perform Bayesian tracking and parameter estimation (the EM algorithm) on each p̃ separately.

Transition model

Assume the state of cluster c at time t+1 depends only on the state of N(c) at time t. The transition matrix has size K^{|N(c)|} × K^{|S^c|}.

Inference

Inference is done on the subprocesses separately:

p̃_t --T[·]--> q̂_{t+1} --O_y[·]--> p̂_{t+1} --Π[·]--> p̃_{t+1}

The new approximate belief state p̃_{t+1} is computed as the product of marginal distributions over the subprocesses. The observation model and the EM algorithm are the same.
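
A sketch of one such step on a toy system where exact joint propagation is still affordable, to make the projection step Π[·] concrete; a real implementation never builds the joint and instead propagates each cluster using only its neighborhood N(c). All names are illustrative:

```python
import numpy as np

def bk_step(marginals, cluster_axes, T_joint, obs_lik, K):
    """One approximate tracking step, illustrated by rebuilding the full
    joint: form the product belief, propagate and condition it exactly,
    then project back onto cluster marginals (the Pi[.] step). Clusters
    are assumed to cover disjoint, consecutive link indices."""
    L = sum(m.ndim for m in marginals)
    joint = np.ones((K,) * L)
    for m, axes in zip(marginals, cluster_axes):
        shape = [K if ax in axes else 1 for ax in range(L)]
        joint = joint * m.reshape(shape)       # product of cluster marginals
    q = joint.reshape(-1) @ T_joint            # predict with the transition
    p = q * obs_lik                            # condition on the observation
    p = (p / p.sum()).reshape((K,) * L)
    # Projection Pi[.]: marginalize the joint onto each cluster.
    return [p.sum(axis=tuple(ax for ax in range(L) if ax not in axes))
            for axes in cluster_axes]
```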

SLIDES 55-58

Time Complexity

Let L′ be the maximum size of the subprocesses, C the number of clusters, and M = K^{L′}.

◮ Time complexity of inference + parameter estimation (EM) is O(CM²T).
◮ If L′ is fixed, C increases linearly with L, and the time complexity becomes O(CT) = O(LT): linear in L, as opposed to the original algorithm's O(K^{2L} T).

SLIDES 59-61

Approximation error

Use the Kullback-Leibler divergence (relative entropy):

D[p_t, p̃_t] = Σ_i p_t(s^i) ln ( p_t(s^i) / p̃_t(s^i) )

The error can be shown to be bounded, D[p_t, p̃_t] ≤ ε / (γ/r)^q for some ε, where each subprocess T^c depends on at most r others and affects at most q others.

◮ Smaller error for more independent subprocesses.
◮ Trade-off between speed and approximation error.
◮ Define a partitioning of the network into subgraphs such that clusters depend on each other weakly.
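
A direct sketch of this divergence between an exact belief and its factorized approximation; the eps guard against log(0) is an implementation detail, not from the slides:

```python
import numpy as np

def kl_divergence(p, p_tilde, eps=1e-12):
    """D[p_t, p̃_t] = sum_i p(s^i) ln(p(s^i) / p̃(s^i)) over the joint
    state space; eps guards against division by zero and log(0)."""
    p, p_tilde = np.ravel(p), np.ravel(p_tilde)
    return float(np.sum(p * np.log((p + eps) / (p_tilde + eps))))
```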

SLIDE 62

Overview

◮ Context
◮ Inference on a HMM
◮ Modeling framework and exact inference
◮ Approximate Inference: the Boyen-Koller algorithm
◮ Graph Partitioning

SLIDES 63-67

Graph Partitioning

We need to cluster the network into subgraphs that interact weakly (to obtain a small approximation error). Use historical observations to define a weighted graph that describes the interaction.

Weighted graph

◮ Set P of observed paths
◮ Each path p ∈ P is a sequence of connected links: p = (l_{i_1}, ..., l_{i_k})
◮ Weight of edge (i, j):

  w_{i,j} = #{p ∈ P : l_i →_p l_j} / #{p ∈ P : l_i ∈ p}

◮ The weights are normalized: ∀i, Σ_j w_{i,j} = 1
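
A sketch of this weight computation from a set of observed paths, assuming each path is given as a tuple of link ids and that "l_i →_p l_j" means l_j immediately follows l_i in path p:

```python
from collections import defaultdict

def interaction_weights(paths):
    """Edge weights following the slides' definition:
    w[i, j] = #{p : link i is immediately followed by link j in p}
              / #{p : link i appears in p}."""
    followed = defaultdict(int)   # (i, j) -> number of paths with i -> j
    contains = defaultdict(int)   # i -> number of paths containing link i
    for p in paths:
        for i in set(p):
            contains[i] += 1
        for i, j in set(zip(p, p[1:])):   # count each path at most once
            followed[(i, j)] += 1
    return {(i, j): c / contains[i] for (i, j), c in followed.items()}

# Example with three observed paths over integer link ids.
w = interaction_weights([(1, 2, 3), (1, 2), (2, 3)])
print(w)   # {(1, 2): 1.0, (2, 3): 0.666...}
```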

SLIDES 68-69

Partitioning the weighted graph

Loss function

Minimize the loss function

L((G_c)_{1≤c≤C}) = Σ_{c,c′} cut(G_c, G_{c′})

where

cut(G_c, G_{c′}) = Σ_{l_i ∈ G_c, l_j ∈ G_{c′}} w_{i,j}

◮ Does it yield good results?

SLIDE 70

Partitioning the weighted graph

Minimizing the cut function yields unbalanced clusters

SLIDES 71-72

Partitioning the weighted graph

Appropriate loss function

Minimize

L((G_c)_{1≤c≤C}) = Σ_{c,c′} Ncut(G_c, G_{c′})

where

Ncut(G_c, G_{c′}) = cut(G_c, G_{c′}) / ( Σ_{l_i ∈ G_c, j} w_{i,j} + Σ_{l_i ∈ G_{c′}, j} w_{i,j} )
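
A sketch that evaluates this loss for a candidate partition, for instance one returned by a METIS binding such as pymetis; W is the weight matrix of the graph and labels is an assumed integer cluster assignment:

```python
import numpy as np

def ncut_loss(W, labels):
    """Sum of Ncut(Gc, Gc') over pairs of clusters.
    W: (n, n) weight matrix; labels: (n,) integer array, labels[i] = cluster
    of link i."""
    clusters = np.unique(labels)
    # Total weight leaving each cluster: sum over l_i in Gc, all j, of w_ij.
    volume = {c: W[labels == c].sum() for c in clusters}
    loss = 0.0
    for a in clusters:
        for b in clusters:
            if a < b:
                cut = (W[np.ix_(labels == a, labels == b)].sum()
                       + W[np.ix_(labels == b, labels == a)].sum())
                loss += cut / (volume[a] + volume[b])
    return loss
```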

SLIDES 73-76

Partitioning the weighted graph

◮ Normalizing the cut function favors balanced clusters
◮ Finding the exact solution is NP-hard
◮ Use the METIS algorithm for an approximate solution
◮ Post-process the output to obtain connected clusters

SLIDE 77

Results of Graph Partitioning

Tested on historical data aggregated over 1-hour time periods, for each day of the week, over 3 months.

SLIDES 78-80

Results of Graph Partitioning

◮ Geographically connected clusters
◮ Connected arteries appear in the same cluster
◮ Sections of highway 80 (Bay Bridge) and neighboring links all appear in the same cluster

SLIDE 81

Summary

Adapted the BK algorithm to our model and provided a study and description of its steps. BK is promising because:

◮ Trade-off between fast computation and approximation error: we can adjust the size of the network to choose speed over accuracy.
◮ If we limit the size of the subgraphs: polynomial time in the size of the network.
◮ The error remains bounded in time.
◮ Possibility of concurrent processing.
◮ Possibility of short-term prediction: use the transition matrix learned up to time t_0 and propagate the belief state p_{t_0} up to time t_0 + T:

  p_{t_0} --T--> p_{t_0+1} --T--> ··· --T--> p_{t_0+T}

We also addressed the graph partitioning problem. The BK algorithm should now be tested to evaluate its performance; I started its implementation, and it is being carried on by the arterial team.

SLIDE 82

Thank you.