Yanjie Fu
Toward Automated Pattern Discovery: Deep Representation Learning with Spatial-Temporal-Networked Data
—Collective, Dynamic, and Structured Analysis
Outline
5
¨ Background and Motivation
¨ Collective Representation Learning
¨ Dynamic Representation Learning
¨ Structured Representation Learning
¨ Conclusions and Future Work
6
¨ Spatial, Temporal, and Networked (STN) data can be collected from a variety of sources: IoT, GPS, wireless sensors, mobile apps, grid net load, tagged photos (Flickr), and check-ins (Foursquare, Yelp)
[Diagram: sensing bridges the Cyber World and the Physical World]
7
[Images: taxicab GPS traces, phone traces, mobile check-ins, bus traces]
These data represent the spatial, temporal, social, and semantic contexts of dynamic human/system behaviors within and across regions
8
¨ User Profiling & Recommendation Systems
¨ Intelligent Transportation Systems
¨ Personalized and Intelligent Education
¨ Smart Health Care
¨ City Governance and Emergency Management
¨ Solar Analytics for Energy Saving
¨ Spatiotemporally non-i.i.d.
n Spatial autocorrelation
n Spatial heterogeneity
n Sequential asymmetric patterns
n Temporal periodicity and dependency
9
[Figure: spatial autocorrelations, temporal periodic patterns, sequential asymmetric transitions, and spatial heterogeneity]
10
¨ Networked over time
¨ Collectively-related
¨ Heterogeneous
n Multi-source
n Multi-view
n Multi-modality
¨ Semantically-rich
n Trajectory semantics
n User semantics
n Event semantics
n Region semantics
¨ Feature identification and quantification
11
[Figure: classic machine learning pipeline: input → pattern/feature extraction → classification/clustering → output (car vs. not car)]
¨ Multi-source unbalanced data fusion: can a simple weighted combination handle unbalanced data?
12
¨ Field data/real-world systems usually lack labels (e.g., consumed electricity and solar-generated electricity are unknown)
13
14
Task-specific (end-to-end) deep learning: input → feature extraction + classification/clustering → output (car vs. not car)
Generic deep learning: input → unsupervised pattern (feature/representation) learning → classification/clustering → output
Generic deep learning addresses: automated feature learning, feature learning from multi-source data, and the lack of labels
¨ Classic algorithms are not directly applicable; they must be combined with networked data regularities
n Regression + spatial properties = spatial autoregression method
n Clustering + spatial properties = spatial co-location method
¨ How should representation learning be combined with the regularities of spatiotemporal networked data?
15
Data Regularity-aware Unsupervised Representation Learning
16
Human and system behaviors have spatiotemporal and social regularities → data regularity-aware representation learning
Regularities of spatiotemporal networked data
[Figure: generic deep learning pipeline (car vs. not car) extended with data regularities: automated feature learning, feature learning from multi-source data, lack of labels, data regularities]
17
¨ Collective representation learning with multi-view data
¨ Dynamic representation learning with stream data
¨ Structured representation learning with global and sub-structure preservation
Automated Feature Learning from Spatial-Temporal-Networked Data
18
¨ Background and Motivation
¨ Deep Collective Representation Learning
¨ Deep Dynamic Representation Learning
¨ Deep Structured Representation Learning
¨ Conclusion and Future Work
19
¨ Consumer City Theory: Edward L. Glaeser (2001), Harvard University
¨ More by Nathan Schiff (2014), University of British Columbia; Victor Couture (2014), UC Berkeley; Yan Song (2014), UNC Chapel Hill
¨ Spatial characters: walkable, dense, compact, diverse, accessible, connected, mixed-use, etc.
¨ Socio-economic characters: willingness to pay, intensive social interactions, attracting talented workers and cutting-edge firms, etc.
What are the underlying driving forces of a vibrant community?
Supported by NSF CISE pre-CAREER award (III-1755946)
20
¨ Mobile check-in data
¨ Frequency and diversity of mobile check-ins
Diversity is the entropy of the activity-type distribution:
$$\mathrm{Div} = -\sum_{type} \frac{\#(\mathrm{checkin},\, type)}{\#(\mathrm{checkin})} \log \frac{\#(\mathrm{checkin},\, type)}{\#(\mathrm{checkin})},$$
where type denotes the activity type of mobile users
¨ Fused scoring:
$$\mathrm{Vibrancy} = \frac{\mathrm{Fre} \cdot \mathrm{Div}}{\alpha \cdot \mathrm{Fre} + \mathrm{Div}}$$
[Table: community rankings by vibrancy score across activity categories: shopping, transport, dining, travel, lodging]
Urban vibrancy is reflected by the frequency and diversity of user activities.
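As a rough illustration of this scoring, here is a minimal Python sketch: the frequency is the check-in count, and the diversity is the entropy of the activity-type distribution from the formula above. The harmonic-style fusion and the `alpha` parameter are assumptions reconstructed from the slide, not necessarily the authors' exact formulation.

```python
from collections import Counter
from math import log

def vibrancy_score(checkin_types, alpha=1.0):
    """checkin_types: one activity-type label per check-in in a community."""
    frequency = len(checkin_types)                     # Fre: total check-ins
    if frequency == 0:
        return 0.0
    counts = Counter(checkin_types)
    # Div: entropy of the activity-type distribution, -sum p * log p
    diversity = -sum((c / frequency) * log(c / frequency) for c in counts.values())
    # Assumed fused form: Fre * Div / (alpha * Fre + Div)
    denom = alpha * frequency + diversity
    return frequency * diversity / denom if denom > 0 else 0.0

# Example: a community with check-ins spread across several activity types
print(vibrancy_score(["Shopping", "Transport", "Dining", "Shopping", "Travel"]))
```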
Spatial Unbalance of Urban Community Vibrancy
21
Motivation Application: How to Quantify Spatial Configurations and Social Interactions
22
Urban Community = Spatial Configuration (static element) + Social Interactions (dynamic element)
¨ POIs → nodes
¨ Human mobility connectivity between two POIs → edge weights
¨ Edge weights are asymmetric
23
¨ Different days/hours → different periodic mobility patterns → different graph structures
24
Collective Representation Learning with Multi-view Graphs
25
[Diagram: multiple graphs → feature vector representations of spatial objects (e.g., regions)]
Constraint: the multi-view graphs are collaboratively related
¨ The encoding-decoding representation learning paradigm: encode the input graphs into an embedding, then decode the embedding into reconstructed graphs
26
[Diagram: autoencoder over an input matrix D = (d1, d2, ..., dN)]
Solving Multi-graph Inputs: An Ensemble-Encoding Dissemble-Decoding Method
27
[Architecture: one NN as an input unit per graph view and one NN as an output unit of the decoder; signal ensemble (multi-perceptron summation) and signal dissemble (multi-perceptron filtering); minimize reconstruction loss]
28
Ensemble encoding:
$$
\begin{cases}
y^{(k),1}_{i,t} = \sigma(W^{(k),1}_{i,t}\, p^{(k)}_{i,t} + b^{(k),1}_{i,t}), & \forall t \in \{1, 2, \dots, 7\},\\
y^{(k),r}_{i,t} = \sigma(W^{(k),r}_{i,t}\, y^{(k),r-1}_{i,t} + b^{(k),r}_{i,t}), & \forall r \in \{2, 3, \dots, o\},\\
y^{(k),o+1}_{i} = \sigma\big(\sum_t W^{(k),o+1}_{t}\, y^{(k),o}_{i,t} + b^{(k),o+1}\big),\\
z^{(k)}_{i} = \sigma(W^{(k),o+2}\, y^{(k),o+1}_{i} + b^{(k),o+2}).
\end{cases}
$$

Dissemble decoding:
$$
\begin{cases}
\hat{y}^{(k),o+1}_{i} = \sigma(\hat{W}^{(k),o+2}\, z^{(k)}_{i} + \hat{b}^{(k),o+2}),\\
\hat{y}^{(k),o}_{i,t} = \sigma(\hat{W}^{(k),o+1}_{t}\, \hat{y}^{(k),o+1}_{i} + \hat{b}^{(k),o+1}_{t}),\\
\hat{y}^{(k),r-1}_{i,t} = \sigma(\hat{W}^{(k),r}_{i,t}\, \hat{y}^{(k),r}_{i,t} + \hat{b}^{(k),r}_{i,t}), & \forall r \in \{2, 3, \dots, o\},\\
\hat{p}^{(k)}_{i,t} = \sigma(\hat{W}^{(k),1}_{i,t}\, \hat{y}^{(k),1}_{i,t} + \hat{b}^{(k),1}_{i,t}).
\end{cases}
$$

Reconstruction loss:
$$
L^{(k)} = \sum_{t \in \{1, 2, \dots, 7\}} \sum_i \big\| (p^{(k)}_{i,t} - \hat{p}^{(k)}_{i,t}) \odot v^{(k)}_{i,t} \big\|_2^2
$$
¨ Ensemble encoding: ensemble the multi-graph inputs via multi-perceptron summation into one embedding
¨ Dissemble decoding: dissemble the embedding via multi-perceptron filtering into per-graph reconstructions
¨ Sparsity regularization via the weight vector v: if mobility connectivity = 0, weight = 1; if mobility connectivity > 0, weight > 1, to penalize reconstruction errors on observed connections
¨ Minimize the reconstruction loss
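A compact PyTorch sketch of this ensemble-encoding dissemble-decoding design, under assumed dimensions and a single perceptron layer per view; `EnsembleDissembleAE`, `beta`, and all sizes are illustrative names and values, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EnsembleDissembleAE(nn.Module):
    def __init__(self, n_views=7, in_dim=100, hid_dim=64, emb_dim=32):
        super().__init__()
        # One perceptron stack per view ("NN as an input/output unit")
        self.view_enc = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid()) for _ in range(n_views)])
        self.ensemble = nn.ModuleList(  # W_t^{(k),o+1}: summed across views
            [nn.Linear(hid_dim, hid_dim, bias=False) for _ in range(n_views)])
        self.to_emb = nn.Sequential(nn.Linear(hid_dim, emb_dim), nn.Sigmoid())
        self.from_emb = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.Sigmoid())
        self.dissemble = nn.ModuleList(  # \hat{W}_t^{(k),o+1}: per-view filtering
            [nn.Linear(hid_dim, hid_dim) for _ in range(n_views)])
        self.view_dec = nn.ModuleList(
            [nn.Sequential(nn.Linear(hid_dim, in_dim), nn.Sigmoid()) for _ in range(n_views)])

    def forward(self, views):  # views: list of (batch, in_dim) tensors, one per day
        hidden = [enc(p) for enc, p in zip(self.view_enc, views)]
        ensembled = torch.sigmoid(sum(w(y) for w, y in zip(self.ensemble, hidden)))
        z = self.to_emb(ensembled)                 # joint region embedding z^{(k)}
        shared = self.from_emb(z)
        recons = [dec(torch.sigmoid(f(shared)))    # per-view reconstructions \hat{p}
                  for f, dec in zip(self.dissemble, self.view_dec)]
        return z, recons

def weighted_recon_loss(views, recons, beta=5.0):
    # Sparsity weights v: 1 where connectivity is 0, 1 + beta where it is > 0
    loss = 0.0
    for p, p_hat in zip(views, recons):
        v = torch.where(p > 0, torch.full_like(p, 1.0 + beta), torch.ones_like(p))
        loss = loss + (((p - p_hat) * v) ** 2).sum()
    return loss
```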
Comparisons with Features Generated By Different Methods
29
¨ Data
¨ Ranking Models: including a boosting method that trains multiple weak rankers and combines their outputs as the final ranking, and a neural ranker that optimizes an underlying probabilistic cost function
¨ Feature Sets (variants): distance graphs instead of mobility graphs; averaged instead of collective; non-weighted instead of unsupervised weighted
¨ Evaluation Criteria
NDCG@N (N = 5, 10, 15, 20)
[Figure: NDCG@N comparisons of feature sets (ELF, LF, EF, V-1, V-2, V-3) under MART (-MART), RankNet (-RN), and RankBoost (-RB)]
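For reference, NDCG@N as used in this evaluation can be computed as follows (standard formulation; the variable names and toy data are illustrative):

```python
import numpy as np

def ndcg_at_n(predicted_scores, relevance, n):
    """NDCG@N: discounted cumulative gain of the model's top-N ranking,
    normalized by the ideal (ground-truth-sorted) ranking."""
    order = np.argsort(predicted_scores)[::-1][:n]          # model's top-N items
    discounts = np.log2(np.arange(2, len(order) + 2))
    dcg = ((2.0 ** relevance[order] - 1) / discounts).sum()
    ideal = np.sort(relevance)[::-1][:n]                    # best possible ordering
    idcg = ((2.0 ** ideal - 1) / np.log2(np.arange(2, len(ideal) + 2))).sum()
    return dcg / idcg if idcg > 0 else 0.0

scores = np.array([0.9, 0.2, 0.5, 0.7])   # model-predicted vibrancy scores
truth = np.array([3.0, 0.0, 1.0, 2.0])    # ground-truth relevance grades
print(ndcg_at_n(scores, truth, n=5))
```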
Comparison with Baseline Representation Learning Algorithms
30
[Figure: NDCG@N comparisons of Our Model vs. NMF, RBM, and Skip-gram over LambdaMART, ListNet, MART, and RankBoost]
¨ Ranking Models: LambdaMART, ListNet, MART, RankBoost
¨ Baseline Methods: nonnegative matrix factorization (NMF), restricted Boltzmann machine (RBM), Skip-gram
¨ Evaluation Criteria: ranking performance at top N (NDCG@N)
¨ Task: collective representation learning for urban communities from multi-view graphs
¨ Modeling: an ensemble-encoding dissemble-decoding approach
¨ Application: ranking urban community vibrancy
31
32
¨ Background and Motivation
¨ Collective Representation Learning
¨ Dynamic Representation Learning
¨ Structured Representation Learning
¨ Conclusion and Future Work
33
[Diagram: a GPS trace ⟨t1, lat1, lon1⟩, ⟨t2, lat2, lon2⟩, ⟨t3, lat3, lon3⟩, ⟨t4, lat4, lon4⟩, ⟨t5, lat5, lon5⟩ annotated with driving operations: turn left, turn right, accelerate]
Motivation Application: Machine-Learning Based Driving Behavior Analysis
34
Driving behavior analysis for insurance companies
35
¨ Driving Operations
n Speed operations: acceleration, deceleration, constant speed
n Direction operations: turning right, turning left, moving straight
¨ Driving States: speed operation + direction operation (e.g., turning right + acceleration)
Quantifying Driving Habits with Driving State Transition Graphs
36
[Diagram: driving states such as left turn + deceleration, right turn + deceleration, and straight + acceleration as nodes, with two graph views: a transition frequency view (edges carry transition probabilities) and a transition duration view (edges carry transition durations)]
Driving style & habit patterns
37
[Diagram: a stream of driving-state transition graphs at t = 1, 2, ..., T; an edge between two driving states carries, e.g., transition frequency 0.4 and transition duration 1 minute]
¨ Transition frequency: how often a driver transitions from one state to another (unusually high frequency: drunk driving?)
¨ Transition duration: how quickly a driver transitions from one state to another (unusually fast: uncomfortable driving habits)
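A small sketch of how the two graph views could be built from a driver's state sequence; the `(state, dwell_seconds)` data layout is an assumption for illustration:

```python
from collections import defaultdict

def build_transition_views(state_sequence):
    """state_sequence: chronological list of (driving_state, dwell_seconds),
    e.g. [("left_turn+decel", 4.0), ("straight+accel", 10.0), ...]."""
    counts = defaultdict(float)
    durations = defaultdict(list)
    for (s1, dwell), (s2, _) in zip(state_sequence, state_sequence[1:]):
        counts[(s1, s2)] += 1
        durations[(s1, s2)].append(dwell)   # time spent in s1 before moving to s2
    total = sum(counts.values()) or 1.0
    freq_view = {edge: c / total for edge, c in counts.items()}       # probabilities
    duration_view = {edge: sum(d) / len(d) for edge, d in durations.items()}  # mean seconds
    return freq_view, duration_view

seq = [("left_turn+decel", 4.0), ("straight+accel", 10.0), ("left_turn+decel", 3.0)]
freq_view, duration_view = build_transition_views(seq)
print(freq_view, duration_view)
```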
Dynamic Representation Learning with Graph Stream
38
Map the graph stream into a sequence of time-varying yet relational feature vectors
¨ Structural Reservation: the representations should preserve the graph structure, so that similar graphs get similar feature vectors
¨ Temporal Dependency: current representations depend on past representations
¨ Peer Dependency: similar drivers should have similar feature vectors
39
¨ Structural Reservation: Minimizing reconstruction loss
40
Encode:
$$
\begin{cases}
y^1_i = \sigma(W^1 x_i + b^1),\\
y^k_i = \sigma(W^k y^{k-1}_i + b^k), & \forall k \in \{2, 3, \dots, o\},\\
z_i = \sigma(W^{o+1} y^o_i + b^{o+1}).
\end{cases}
$$

Decode:
$$
\begin{cases}
\hat{y}^o_i = \sigma(\hat{W}^{o+1} z_i + \hat{b}^{o+1}),\\
\hat{y}^{k-1}_i = \sigma(\hat{W}^k \hat{y}^k_i + \hat{b}^k), & \forall k \in \{2, 3, \dots, o\},\\
\hat{x}_i = \sigma(\hat{W}^1 \hat{y}^1_i + \hat{b}^1).
\end{cases}
$$
[Diagram: autoencoder over an input matrix D = (d1, d2, ..., dN): input vector x → encoded vector y1 → embedding z → decoded vector ŷ1 → reconstructed input x̂]
The encoding phase encodes the input vector into an embedding; the decoding phase decodes the embedding to recover the input.
Learned Representation
¨ Temporal Dependency: current driving operations are related to previous driving operations
41
Sequential encode step:
$$
\begin{cases}
(y^1_i)^\tau = \sigma(W^1 x^\tau_i + b^1),\\
(y^k_i)^\tau = \sigma(W^k (y^{k-1}_i)^\tau + b^k), & \forall k \in \{2, 3, \dots, o\},\\
z^\tau_i = (1 - c^\tau)\, z^{\tau-1}_i + c^\tau\, \tilde{z}^\tau_i.
\end{cases}
$$

Sequential decode step:
$$
\begin{cases}
(\hat{y}^o_i)^\tau = \sigma(\hat{W}^{o+1} z^\tau_i + \hat{b}^{o+1}),\\
(\hat{y}^{k-1}_i)^\tau = \sigma(\hat{W}^k (\hat{y}^k_i)^\tau + \hat{b}^k), & \forall k \in \{2, 3, \dots, o\},\\
\hat{x}^\tau_i = \sigma(\hat{W}^1 (\hat{y}^1_i)^\tau + \hat{b}^1).
\end{cases}
$$
Feed the output of the autoencoder's hidden layer into a Gated Recurrent Unit (GRU): the current hidden state z^τ depends on the previous hidden state z^{τ-1} through the update gate c^τ.
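A minimal PyTorch sketch of this temporal update: the autoencoder embedding z̃^τ for each time step is blended with the previous embedding z^{τ-1} through a GRU-style update gate c^τ. The gate's parameterization here is an assumption; only the convex update rule comes from the equations above.

```python
import torch
import torch.nn as nn

class TemporalEmbedding(nn.Module):
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.Sigmoid())
        self.gate = nn.Linear(in_dim + emb_dim, emb_dim)   # produces the gate c^tau

    def forward(self, x_seq):  # x_seq: (T, batch, in_dim) stream of graph vectors
        z = torch.zeros(x_seq.size(1), self.gate.out_features)
        outputs = []
        for x in x_seq:
            z_tilde = self.encode(x)                        # candidate embedding z~^tau
            c = torch.sigmoid(self.gate(torch.cat([x, z], dim=-1)))
            z = (1 - c) * z + c * z_tilde                   # z^tau = (1-c) z^{tau-1} + c z~^tau
            outputs.append(z)
        return torch.stack(outputs)                         # embedding per time step
```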
¨ Peer Dependency: drivers with similar driving behaviors should share similar latent representations
42
[Diagram: similar transition graphs → similar learned representations]
$$
H_c(G^\tau) = \sum_{u_i \in U} \sum_{u_j \in U, u_i \neq u_j} s^\tau_{i,j} \cdot \| z^\tau_i - z^\tau_j \|_2^2
$$
The similarity of driving behavior between drivers $u_i$ and $u_j$ at time slot $\tau$ is $s^\tau_{i,j} = \cos(x^\tau_i, x^\tau_j)$, computed using descriptive statistics of various historical driving operations.
Graphical regularization: if drivers i and j are similar at time τ, the representations $z^\tau_i$ and $z^\tau_j$ are pushed to be similar; large embedding distances between similar drivers are penalized.
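A sketch of this peer regularizer in PyTorch, assuming the embeddings and per-driver behavior statistics are already computed (the function and argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def peer_regularizer(z, behavior_stats):
    """z: (n_drivers, emb_dim) embeddings at time tau;
    behavior_stats: (n_drivers, d) descriptive statistics of past operations."""
    # s[i, j]: cosine similarity of drivers' behavior statistics
    s = F.cosine_similarity(behavior_stats.unsqueeze(1),
                            behavior_stats.unsqueeze(0), dim=-1)
    dist = torch.cdist(z, z) ** 2            # ||z_i - z_j||_2^2 for all pairs
    mask = 1 - torch.eye(z.size(0))          # exclude i == j terms
    return (s * dist * mask).sum()
```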
43
[Diagram: a graph series G1, G2, G3, ..., Gm with peer dependence across drivers and temporal dependence across time, mapped to embedding vectors X2, X3, ..., Xm]
$$
\min \frac{1}{2} \sum_{\tau \in T} \Big\{ \sum_{u_i \in U^{(n)}} \| x^\tau_i - \hat{x}^\tau_i \|_2^2 + \alpha \cdot H_c(G^\tau) \Big\}
$$
¨ Structural reservation: the representation encoded from the input can be decoded to recover the input
¨ Temporal dependency: the current embedding is related to past embeddings
¨ Peer dependency: similar graph streams from two similar drivers share similar representations
Model Structure
44
Learn representations from drivers' transition graphs to score driving performance and detect risky areas
45
[Figure: comparisons of Auto-Encoder, DeepWalk, CNN, LINE, DSV, and PTARL on squared error, NDCG@N, R², and Kendall's Tau]
Apply the learned representations of driving behavior to predict driving scores
¨ Data: driving records collected from volunteer drivers
¨ Evaluation Metrics: squared error, R², Kendall's Tau, and ranking performance (NDCG@N)
¨ Baselines: Auto-Encoder; DeepWalk (random walks to learn latent representations); LINE (preserves network structures with an edge-sampling algorithm); CNN; DSV (a transportation approach)
[Figure: driving score distribution (frequency vs. driving score)]
46
[Figure: ablation comparisons of Auto-Encoder, PTARL-peer, PTARL-temporal, and PTARL on squared error, NDCG@N, Kendall's Tau, and R²]
¨ PTARL: our full model
¨ Two variants of our model: PTARL-peer, which considers the peer dependency, and PTARL-temporal, which considers the temporal dependency
¨ The temporal dependency contributes more than the peer dependency
47
[Figure: driving scores over time for a "Riskier Driver" and a "Safer Driver"]
A "Safer Driver" is not always safe, and a "Riskier Driver" is not always risky; still, the scores of the "Safer Driver" are relatively higher most of the time, while the scores of the "Riskier Driver" are relatively lower most of the time
48
Dynamic evolution of the driving score distribution
¨ Task: dynamic representation learning with graph streams
¨ Modeling: a representation learning approach that jointly preserves structural reservation, temporal dependency, and peer dependency
¨ Application: driving performance scoring and risk area detection
49
50
¨ Background and Motivation
¨ Deep Collective Representation Learning
¨ Deep Dynamic Representation Learning
¨ Deep Structured Representation Learning
¨ Conclusion and Future Work
Fewer Matches Between Humans and Technologies
51
Non-personalized news feeds and non-personalized education
Motivation Application: Precision User Profiling
52
Webpage = Contents + Structure
User = Explicit Activities + Latent Behavioral Structure?
53
User Activity Graph: spatial-temporal asymmetric transition patterns
[Diagram: a user activity graph over POIs: Office, Hospital, Costco, Gas Station, McDonald's, Walmart, PreSchool, Auto Service, Home, Zoo, Six Flags, Pizza Hut, Regal Cinemas, Lab, Library, IT Department]
Problem Reformulation: Representation Learning with Activity Graphs
54
Learn an embedding z that maps the user's activity graph to a profile vector
[Diagram: user activity graph → user profile vector]
¨ Global structures: how a user's activities globally interact with each other (strong link, weak link, no link)
55
56
¨ Substructures: the topology of subgraphs that capture the unique behavioral patterns of a user's activities
n Substructure 1: high-frequency discrete nodes
n Substructure 2: high-frequency circle
Representation Learning with Behavioral Global and Substructure Preservation
¨ Traditional solution: global structure (encoding-decoding) + substructure (loss regularization)
57
[Diagram: encode the user activity graph and reconstruct both the graph and its substructures]
58
[Diagram: two users' activity graphs over different POIs]
Substructures have different topology and contents, and are dynamically distributed in different locations of the graphs
¨ Translate substructure-aware representation learning into an adversarial substructured learning problem
59
[Architecture: encoder → decoder; a substructure detector applied to both the original and the reconstructed graph; a discriminator labels original substructures as real (1) and reconstructed substructures as fake]
¨ An encoder-decoder network learns the representation of a graph
¨ A substructure detector detects substructure patterns
¨ A discriminator classifies original vs. reconstructed substructures
¨ Adversarial training matches original substructures with reconstructed substructures
60
Problem: depth-first-search-based subgraph detection is not differentiable
How to Approximate the Substructure Detector?
61
[Diagram: replace the non-differentiable substructure detection algorithm with a differentiable CNN that maps the latent embedding to a substructure representation]
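A sketch of such a differentiable stand-in: a small 1-D CNN reads the latent embedding and emits a substructure feature vector. All layer shapes and sizes here are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CNNSubstructureDetector(nn.Module):
    """Differentiable approximation: latent embedding -> substructure features."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(8))                  # fixed-length feature map
        self.head = nn.Linear(8 * 8, feat_dim)        # substructure feature vector

    def forward(self, z):                              # z: (batch, emb_dim)
        h = self.conv(z.unsqueeze(1))                  # add a channel dimension
        return self.head(h.flatten(1))
```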
Approximated Adversarial Substructured Learning
62
The Mini-Max Game in Optimization
¨ Generator (encoder-decoder): match original and generated substructures to fool the discriminator
¨ Discriminator: correctly classify generated substructures as fake and original substructures as real
¨ CNN-based detector: detect and output a substructure feature vector
63
Objective Function
¨ Train the generator G to minimize the reconstruction loss and to confuse the discriminator, i.e., to minimize D's accuracy on generated substructures
¨ Train the discriminator D to maximize accuracy: classify ground-truth substructures as 1 (maximize likelihood) and generated substructures as 0 (minimize likelihood)
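Schematically, one training step of this mini-max game might look as follows in PyTorch. The helper names (`encoder_decoder`, `detector`, `discriminator`) are illustrative assumptions; only the real-as-1 / generated-as-0 targets and the reconstruction term come from the objective above.

```python
import torch
import torch.nn.functional as F

def train_step(encoder_decoder, detector, discriminator, opt_g, opt_d, graph):
    # Discriminator step: ground-truth substructures -> 1, generated -> 0
    # (discriminator is assumed to output probabilities via a sigmoid)
    real_pred = discriminator(detector(graph))
    fake_pred = discriminator(detector(encoder_decoder(graph).detach()))
    d_loss = F.binary_cross_entropy(real_pred, torch.ones_like(real_pred)) + \
             F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: reconstruction loss + confuse the discriminator
    recon = encoder_decoder(graph)
    gen_pred = discriminator(detector(recon))
    g_loss = F.mse_loss(recon, graph) + \
             F.binary_cross_entropy(gen_pred, torch.ones_like(gen_pred))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```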
64
65
[Pipeline: user → user activity graph → adversarial substructured learning → user representation → user profiling and next activity recommendation]
For each user, build the corresponding user activity graph and predict the next activity category.
¨ Data: activity check-in data from New York and Tokyo
¨ Evaluation Metrics: next activity category prediction and next activity recommendation
¨ Baselines: DeepWalk (random walks to learn latent representations); LINE (preserves global network structures with an edge-sampling algorithm)
Performance Comparisons on New York and Tokyo Activity Checkin Data
66
Learn user representations that capture behavior patterns for user profiling; apply the learned representations to predict the next activity type (next POI category)
67
¨ Substructures: nodes and circles
n One variant considers node-based substructures
n One variant considers circle-based substructures
¨ Task: representation learning with global structure and substructure preservation
¨ Modeling: adversarial substructured learning via a mini-max game
¨ Application: personalization and recommender systems
68