

SLIDE 1

Toward Automated Pattern Discovery: Deep Representation Learning with Spatial-Temporal-Networked Data

—Collective, Dynamic, and Structured Analysis

Yanjie Fu

SLIDE 2

Outline

¨ Background and Motivation

¨ Collective Representation Learning

¨ Dynamic Representation Learning

¨ Structured Representation Learning

¨ Conclusions and Future Work

SLIDE 3

Human-Social-Technological Systems

IoT, GPS, wireless sensors, mobile Apps

Cyber World Physical World

SLIDE 4

Human Activities in Human-Social-Technological Systems

¨ Spatial, Temporal, and Networked (STN) data can be

  • Spatial: points of interest (POIs), blocks, zones, regions
  • Spatiotemporal: taxi trajectories, bus trips, bike traces
  • Spatiotemporal-networked: geo-tagged Twitter posts, power grid netload

¨ from a variety of sources

  • Devices: phones, WiFi, network stations, RFID
  • Vehicles: bikes, taxicabs, buses, subways, light rail
  • Location-based services: geo-tweets (Facebook, Twitter), geo-tagged photos (Flickr), check-ins (Foursquare, Yelp)

[Figure: taxicab GPS traces, phone traces, mobile check-ins, bus traces]

These data represent the spatial, temporal, social, and semantic contexts of dynamic human/system behaviors within and across regions.

SLIDE 5

Important Applications

User Profiling & Recommendation Systems
Intelligent Transportation Systems
Personalized and Intelligent Education
Smart Health Care
City Governance and Emergency Management
Solar Analytics for Energy Saving

SLIDE 6

Unprecedented and Unique Complexity

¨ Spatiotemporally non-i.i.d.

  • Spatial autocorrelation
  • Spatial heterogeneity
  • Sequential asymmetric patterns
  • Temporal periodicity and dependency

[Figure: examples of spatial autocorrelation, temporal periodic patterns, sequential asymmetric transitions, and spatial heterogeneity]

SLIDE 7

Unprecedented and Unique Complexity

¨ Networked over time

¨ Collectively-related

¨ Heterogeneous

  • Multi-source
  • Multi-view
  • Multi-modality

¨ Semantically-rich

  • Trajectory semantics
  • User semantics
  • Event semantics
  • Region semantics

SLIDE 8

Technical Pains in Pattern Discovery (1)

¨ Feature identification and quantification

  • Traditional method: find domain experts to hand-craft features
  • Can we automate feature/pattern extraction?

[Figure: classic machine-learning pipeline — input → pattern/feature extraction → classification/clustering → output (“car” vs. “not car”)]

SLIDE 9

Technical Pains in Pattern Discovery (2)

¨ Multi-source unbalanced data fusion

  • Traditional method: extract features, weigh features, then take a weighted combination
  • Can we automatically extract features from multi-source unbalanced data?

[Figure: classic machine-learning pipeline — input → pattern/feature extraction → classification/clustering → output]

SLIDE 10

Technical Pains in Pattern Discovery (3)

¨ Field data/real-world systems usually lack benchmark labels (i.e., y, responses, targets)

  • Example: netload in power grids — behind-the-meter gas-generated electricity and solar-generated electricity are unknown
  • Can we learn features without labels (unsupervised)?

[Figure: classic machine-learning pipeline — input → pattern/feature extraction → classification/clustering → output]

SLIDE 11

Deep Learning Can Help

[Figure: two deep-learning pipelines. Task-specific (end-to-end) deep learning: input → feature extraction + classification/clustering → output. Generic deep learning: input → unsupervised pattern (feature/representation) learning → classification/clustering → output. Generic deep learning addresses automated feature learning, feature learning from multi-source data, and the lack of labels.]

SLIDE 12

Technical Pains in Pattern Discovery (4)

¨ Classic algorithms are not directly applicable to spatiotemporal networked data

  • Traditional method: revise classic algorithms with the regularities of spatiotemporal networked data
    – Regression + spatial properties = spatial autoregression methods
    – Clustering + spatial properties = spatial co-location methods
  • Can we learn features while maintaining the regularities of spatiotemporal networked data?

[Figure: classic machine-learning pipeline — input → pattern/feature extraction → classification/clustering → output]

SLIDE 13

Data Regularity-aware Unsupervised Representation Learning

Human and system behaviors have spatiotemporal and social regularities → data regularity-aware representation learning

  • Lack of labels (unsupervised)
  • Multi-source multi-view multi-modality
  • Spatial autocorrelation (peer)
  • Spatial heterogeneity (clustering)
  • Temporal dependencies (current-past)
  • Periodic patterns
  • Sequential asymmetric transitions
  • Spatial hierarchy (hierarchical clustering)
  • Hidden semantics
  • Spatial locality
  • Global and sub structural patterns in behavioral graphs

Regularities of spatiotemporal networked data

[Figure: generic deep learning addresses automated feature learning, feature learning from multi-source data, lack of labels, and data regularities]

SLIDE 14

Overview of the Talk

[Figure: three pillars — Collective Learning, Dynamic Learning, Structured Learning]

  • Collective representation learning with multi-view data
  • Dynamic representation learning with stream data
  • Structured representation learning with global and sub-structure preservation

Automated Feature Learning from Spatial-Temporal-Networked Data

SLIDE 15

Outline

¨ Background and Motivation

¨ Deep Collective Representation Learning

¨ Deep Dynamic Representation Learning

¨ Deep Structured Representation Learning

¨ Conclusion and Future Work

SLIDE 16

The Rise of Vibrant Communities

¨ Consumer City theory: Edward L. Glaeser (2001), Harvard University

¨ More by Nathan Schiff (2014), University of British Columbia; Victor Couture (2014), UC Berkeley; Yan Song (2014), UNC Chapel Hill

¨ Spatial characteristics: walkable, dense, compact, diverse, accessible, connected, mixed-use, etc.

¨ Socio-economic characteristics: willingness to pay, intensive social interactions, attracting talented workers and cutting-edge firms, etc.

What are the underlying driving forces of a vibrant community?

Supported by NSF CISE pre-CAREER award (III-1755946)

SLIDE 17

Measuring Community Vibrancy

¨ Mobile check-in data

¨ Frequency and diversity of mobile check-ins

  • Frequency: $\mathrm{fre} = \#(\text{check-ins})$
  • Diversity: $\mathrm{div} = -\sum_{type} \frac{\#(\text{check-ins},\ type)}{\#(\text{check-ins})} \log \frac{\#(\text{check-ins},\ type)}{\#(\text{check-ins})}$, where $type$ denotes the activity type of mobile users

¨ Fused scoring

  • $\mathrm{vibrancy} = (1 + \beta^2)\,\frac{\mathrm{fre} \cdot \mathrm{div}}{\beta^2 \cdot \mathrm{fre} + \mathrm{div}}$
  • $\beta$ controls the relative weights of fre and div
  • Vibrancy scores are power-law distributed: a few communities are highly vibrant while most are only somewhat vibrant

[Figure: community rankings by vibrancy score across activity categories — shopping, transport, dining, travel, lodging]

Urban vibrancy is reflected by the frequency and diversity of user activities.
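To make the scoring concrete, here is a minimal Python sketch of the fused vibrancy score under the definitions above; the function name and example data are illustrative, not from the talk.

```python
from collections import Counter
from math import log

def vibrancy_score(checkin_types, beta=1.0):
    """Minimal sketch of the fused vibrancy score.

    checkin_types: list of activity-type labels, one per check-in
    beta: controls the relative weight of frequency vs. diversity
    """
    fre = len(checkin_types)                      # frequency: number of check-ins
    counts = Counter(checkin_types)
    # diversity: entropy of the activity-type distribution
    div = -sum((c / fre) * log(c / fre) for c in counts.values())
    if beta**2 * fre + div == 0:
        return 0.0
    # F-beta-style fusion of frequency and diversity
    return (1 + beta**2) * fre * div / (beta**2 * fre + div)

# Example: a community with several check-ins spread over activity types
print(vibrancy_score(["dining", "shopping", "dining", "transport", "travel"]))
```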

SLIDE 18

Spatial Imbalance of Urban Community Vibrancy

SLIDE 19

Motivating Application: How to Quantify Spatial Configurations and Social Interactions

Urban Community = Spatial Configuration (static element) + Social Interactions (dynamic element)

SLIDE 20

From Regions to Graphs

¨ POIs → nodes

¨ Human mobility connectivity between two POIs → edge weights

¨ Edge weights are asymmetric

Spatial Regions as Human Mobility Graphs

SLIDE 21

Periodicity of Human Mobility

¨ Different days and hours → different periodic mobility patterns → different graph structures

SLIDE 22

Collective Representation Learning with Multi-view Graphs

[Figure: f(spatial objects (e.g., regions), multiple graphs) = feature vector representations. Constraint: the multi-view graphs are collectively related.]

slide-23
SLIDE 23

Solving Single-Graph Input

¨ The encoding-decoding representation learning paradigm

  • Encoder: compress a graph into a latent feature vector
  • Decoder: reconstruct the graph based on the latent feature vector
  • Objective: minimize the difference between the original and reconstructed graphs

  • Unsupervised (label-free): doesn’t require labels
  • Generic: not specific to a single application
  • Intuitive: a good representation can be used to reconstruct the original signals

[Figure: autoencoder schematic — input matrix D (d1, d2, …, dN) is encoded through hidden layers x and y into an embedding z, then decoded back to reconstruct the input matrix]
SLIDE 24

Solving Multi-graph Inputs: An Ensemble-Encoding Dissemble-Decoding Method

  • NN as an input unit of the encoder
  • NN as an output unit of the decoder
  • Signal ensemble: multi-perceptron summation
  • Signal dissemble: multi-perceptron filtering
  • Minimize the reconstruction loss

SLIDE 25

Solving the Optimization Problem

1. Multi-graph ensemble encoding:

$$
\begin{cases}
y^{(k),1}_{i,t} = \sigma\big(W^{(k),1}_{i,t}\, p^{(k)}_{i,t} + b^{(k),1}_{i,t}\big), & \forall t \in \{1, 2, \cdots, 7\},\\
y^{(k),r}_{i,t} = \sigma\big(W^{(k),r}_{i,t}\, y^{(k),r-1}_{i,t} + b^{(k),r}_{i,t}\big), & \forall r \in \{2, 3, \cdots, o\},\\
y^{(k),o+1}_{i} = \sigma\big(\textstyle\sum_{t} W^{(k),o+1}_{t}\, y^{(k),o}_{i,t} + b^{(k),o+1}\big), &\\
z^{(k)}_{i} = \sigma\big(W^{(k),o+2}\, y^{(k),o+1}_{i} + b^{(k),o+2}\big). &
\end{cases}
$$

2. Multi-graph dissemble decoding:

$$
\begin{cases}
\hat{y}^{(k),o+1}_{i} = \sigma\big(\hat{W}^{(k),o+2}\, z^{(k)}_{i} + \hat{b}^{(k),o+2}\big), &\\
\hat{y}^{(k),o}_{i,t} = \sigma\big(\hat{W}^{(k),o+1}_{t}\, \hat{y}^{(k),o+1}_{i} + \hat{b}^{(k),o+1}_{t}\big), &\\
\hat{y}^{(k),r-1}_{i,t} = \sigma\big(\hat{W}^{(k),r}_{i,t}\, \hat{y}^{(k),r}_{i,t} + \hat{b}^{(k),r}_{i,t}\big), & \forall r \in \{2, 3, \cdots, o\},\\
\hat{p}^{(k)}_{i,t} = \sigma\big(\hat{W}^{(k),1}_{i,t}\, \hat{y}^{(k),1}_{i,t} + \hat{b}^{(k),1}_{i,t}\big). &
\end{cases}
$$

3. Objective function (reconstruction loss):

$$
L^{(k)} = \sum_{t \in \{1,2,\ldots,7\}} \sum_{i} \big\| \big(p^{(k)}_{i,t} - \hat{p}^{(k)}_{i,t}\big) \odot v^{(k)}_{i,t} \big\|_2^2
$$

Sparsity regularization via the weights $v^{(k)}_{i,t}$: if mobility connectivity = 0, the weight is 1; if mobility connectivity > 0, the weight is > 1, so that reconstruction errors on observed connections are penalized more heavily.
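A minimal PyTorch sketch of the ensemble-encoding / dissemble-decoding idea, assuming each spatial object carries T = 7 per-period graph signal vectors; layer sizes, names, and the weight gamma are illustrative, not the authors’ implementation.

```python
import torch
import torch.nn as nn

class EnsembleDissembleAE(nn.Module):
    """Sketch of ensemble-encoding / dissemble-decoding over T graph views."""
    def __init__(self, d=64, hidden=32, latent=16, T=7):
        super().__init__()
        self.T = T
        self.in_units = nn.ModuleList([nn.Linear(d, hidden) for _ in range(T)])    # one NN per view
        self.ensemble = nn.ModuleList([nn.Linear(hidden, latent) for _ in range(T)])
        self.dissemble = nn.ModuleList([nn.Linear(latent, hidden) for _ in range(T)])
        self.out_units = nn.ModuleList([nn.Linear(hidden, d) for _ in range(T)])

    def forward(self, p):                       # p: (batch, T, d)
        y = [torch.sigmoid(self.in_units[t](p[:, t])) for t in range(self.T)]
        # ensemble: multi-perceptron summation into one latent vector z
        z = torch.sigmoid(sum(self.ensemble[t](y[t]) for t in range(self.T)))
        # dissemble: multi-perceptron filtering back to per-view signals
        p_hat = torch.stack(
            [torch.sigmoid(self.out_units[t](torch.sigmoid(self.dissemble[t](z))))
             for t in range(self.T)], dim=1)
        return z, p_hat

def weighted_loss(p, p_hat, gamma=5.0):
    """Sparsity-aware reconstruction loss: observed connections (>0) weighted more."""
    v = torch.where(p > 0, torch.full_like(p, gamma), torch.ones_like(p))
    return (((p - p_hat) * v) ** 2).sum()

model = EnsembleDissembleAE()
p = torch.rand(8, 7, 64)                        # 8 regions, 7 periodic views
z, p_hat = model(p)
loss = weighted_loss(p, p_hat)
loss.backward()
```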

SLIDE 26

Comparisons with Features Generated by Different Methods

¨ Data

  • Beijing check-in data

¨ Ranking Models

  • MART: a boosted-tree ranking model
  • RankBoost (RB): a boosted pairwise ranking method that trains multiple weak rankers and combines their outputs into a final ranking
  • RankNet (RN): uses a neural network to model the underlying probabilistic cost function

¨ Feature Sets

  • Explicit features (EF)
  • Latent features (LF)
  • Explicit & latent features (ELF)
  • Variation 1 of our method: distance graphs instead of mobility graphs
  • Variation 2 of our method: averaging instead of collective learning
  • Variation 3 of our method: non-weighted instead of unsupervised weighted

¨ Evaluation Criteria

  • NDCG: evaluates the ranking performance at top N

[Figure: NDCG@5/@10/@15/@20 comparisons of ELF, LF, EF, and the three variations under MART, RankNet, and RankBoost]
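Since NDCG@N is the evaluation criterion throughout, here is a minimal, generic sketch of its computation (the standard log-discount formula; not necessarily the exact implementation used in these experiments):

```python
from math import log2

def ndcg_at_n(relevances, n):
    """Minimal sketch of NDCG@N for one ranked list.

    relevances: graded relevance of items in predicted rank order.
    """
    dcg = sum(rel / log2(i + 2) for i, rel in enumerate(relevances[:n]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / log2(i + 2) for i, rel in enumerate(ideal[:n]))
    return dcg / idcg if idcg > 0 else 0.0

# Example: a predicted ranking of community vibrancy levels
print(ndcg_at_n([3, 2, 3, 0, 1, 2], n=5))
```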

SLIDE 27

Comparison with Baseline Representation Learning Algorithms

[Figure: NDCG@N comparisons of Our Model, NMF, RBM, and Skip-gram over LambdaMART, ListNet, MART, and RankBoost]

¨ Ranking Models

  • LambdaMART
  • ListNet
  • MART
  • RankBoost

¨ Baseline Methods

  • RBM: restricted Boltzmann machine
  • NMF: non-negative matrix factorization
  • Skip-gram

¨ Evaluation Criteria

  • NDCG: evaluates the ranking performance at top N

SLIDE 28

Summary

¨ Task

  • Collective representation learning with multi-view graphs

¨ Modeling

  • Develop an ensemble-dissemble encoding-decoding approach
  • Multi-graph ensemble encoding and multi-graph dissemble decoding

¨ Application

  • Quantifying urban communities for understanding urban vibrancy

SLIDE 29

Outline

¨ Background and Motivation

¨ Collective Representation Learning

¨ Dynamic Representation Learning

¨ Structured Representation Learning

¨ Conclusion and Future Work

SLIDE 30

Social Fairness in the Insurance Sector

What can we do to defend social fairness in insurance rates?

SLIDE 31

Motivating Application: Machine-Learning-Based Driving Behavior Analysis

[Figure: GPS trajectory points ⟨t1, lat1, lon1⟩, ⟨t2, lat2, lon2⟩, …, ⟨t5, lat5, lon5⟩ annotated with driving operations (turn left, turn right, accelerate); driving behavior analysis feeds insurance companies]

SLIDE 32

Defining Driving Operations & States

¨ Driving Operations

  • Speed-related: acceleration, deceleration, constant speed
  • Direction-related: turning right, turning left, moving straight

¨ Driving States

  • Definition: speed operation + direction operation

[Figure: driving state = speed operation (acceleration, deceleration, constant speed) + direction operation (turning right, turning left, moving straight)]

SLIDE 33

Quantifying Driving Habits with Driving State Transition Graphs

[Figure: driving state transition graphs with two views — a transition duration view and a transition frequency/probability view — capturing driving style and habit patterns (e.g., left turn + deceleration, right turn + deceleration, straight + acceleration)]

SLIDE 34

Driving State Transition Graph Sequence

[Figure: a sequence of driving state transition graphs at t = 1, 2, …, T, annotated with, e.g., transition frequency 0.4 and transition duration 1 minute]

  • Transition frequency: how frequently a driver changes his/her driving state from one to another (unusually high frequency: drunk driving?)
  • Transition duration: how quickly a driver changes his/her driving state from one to another (unusually fast: uncomfortable driving habits)
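A small Python sketch of how transition frequencies could be estimated from a driving-state sequence; the state encoding and helper names are hypothetical, not from the talk:

```python
from collections import defaultdict
from itertools import product

# Hypothetical driving states: speed operation + direction operation
SPEED = ["accelerate", "decelerate", "constant"]
DIRECTION = ["left", "right", "straight"]
STATES = [f"{s}+{d}" for s, d in product(SPEED, DIRECTION)]   # 9 states

def transition_frequencies(state_sequence):
    """Sketch: estimate transition frequencies between consecutive driving states."""
    counts = defaultdict(int)
    for a, b in zip(state_sequence, state_sequence[1:]):
        counts[(a, b)] += 1
    total = max(len(state_sequence) - 1, 1)
    return {edge: c / total for edge, c in counts.items()}

seq = ["constant+straight", "decelerate+left", "accelerate+straight",
       "constant+straight", "decelerate+left"]
print(transition_frequencies(seq))
```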

SLIDE 35

Dynamic Representation Learning with Graph Streams

[Figure: f(graph stream) = vector stream]

  • Map a sequence of time-varying yet relational graphs to a sequence of time-varying yet relational vectors
  • Subject to spatial and temporal dependencies
SLIDE 36

Three Modeling Constraints

¨ Structural Preservation

  • If two graphs’ structures are similar, their feature vectors are similar

¨ Temporal Dependency

  • Current driving operations are related to previous driving operations

¨ Peer Dependency

  • Drivers with similar driving behaviors should share similar feature vectors

SLIDE 37

Modeling Structural Preservation

¨ Structural Preservation: minimize the reconstruction loss

Encoding:

$$
\begin{cases}
y^1_i = \sigma(W^1 x_i + b^1),\\
y^k_i = \sigma(W^k y^{k-1}_i + b^k), \quad \forall k \in \{2, 3, \cdots, o\},\\
z_i = \sigma(W^{o+1} y^o_i + b^{o+1}).
\end{cases}
$$

Decoding:

$$
\begin{cases}
\hat{y}^o_i = \sigma(\hat{W}^{o+1} z_i + \hat{b}^{o+1}),\\
\hat{y}^{k-1}_i = \sigma(\hat{W}^k \hat{y}^k_i + \hat{b}^k), \quad \forall k \in \{2, 3, \cdots, o\},\\
\hat{x}_i = \sigma(\hat{W}^1 \hat{y}^1_i + \hat{b}^1).
\end{cases}
$$

[Figure: autoencoder — input vector x → encoded vectors → embedding z (the learned representation) → decoded vectors → reconstructed input x̂]

The encoding phase encodes the input vector into an embedding; the decoding phase decodes the embedding to recover the input.
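A minimal PyTorch sketch of the stacked autoencoder above; layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    """Sketch of the stacked autoencoder: input -> hidden layers -> embedding z."""
    def __init__(self, dims=(256, 128, 64, 16)):
        super().__init__()
        pairs = list(zip(dims, dims[1:]))
        self.enc = nn.ModuleList([nn.Linear(a, b) for a, b in pairs])
        self.dec = nn.ModuleList([nn.Linear(b, a) for a, b in reversed(pairs)])

    def forward(self, x):
        for layer in self.enc:                     # y^k = sigma(W^k y^{k-1} + b^k)
            x = torch.sigmoid(layer(x))
        z = x                                      # embedding (learned representation)
        for layer in self.dec:                     # mirror-image decoding
            x = torch.sigmoid(layer(x))
        return z, x                                # z and reconstruction x_hat

model = StackedAutoencoder()
x = torch.rand(32, 256)                            # e.g., flattened transition graphs
z, x_hat = model(x)
loss = ((x - x_hat) ** 2).sum()                    # reconstruction loss
loss.backward()
```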

SLIDE 38

Modeling Temporal Dependency

¨ Temporal Dependency: current driving operations are related to previous driving operations

Sequential encode step:

$$
\begin{cases}
(y^1_i)^\tau = \sigma(W^1 x^\tau_i + b^1),\\
(y^k_i)^\tau = \sigma\big(W^k (y^{k-1}_i)^\tau + b^k\big), \quad \forall k \in \{2, 3, \cdots, o\},\\
z^\tau_i = (1 - c^\tau)\, z^{\tau-1}_i + c^\tau\, \tilde{z}^\tau_i.
\end{cases}
$$

Sequential decode step:

$$
\begin{cases}
(\hat{y}^o_i)^\tau = \sigma(\hat{W}^{o+1} z^\tau_i + \hat{b}^{o+1}),\\
(\hat{y}^{k-1}_i)^\tau = \sigma\big(\hat{W}^k (\hat{y}^k_i)^\tau + \hat{b}^k\big), \quad \forall k \in \{2, 3, \cdots, o\},\\
\hat{x}^\tau_i = \sigma\big(\hat{W}^1 (\hat{y}^1_i)^\tau + \hat{b}^1\big).
\end{cases}
$$

Feed the output of the autoencoder’s hidden layer into a Gated Recurrent Unit: the current hidden state depends on the previous hidden state.
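A minimal sketch of the gated temporal update z^τ = (1 − c^τ) z^{τ−1} + c^τ z̃^τ, with a one-layer encoder standing in for the stacked encoder; all sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class TemporalEmbedding(nn.Module):
    """GRU-style gated update of the embedding across time slots."""
    def __init__(self, in_dim=64, z_dim=16):
        super().__init__()
        self.encode = nn.Linear(in_dim, z_dim)          # per-step encoder (one layer for brevity)
        self.gate = nn.Linear(in_dim + z_dim, z_dim)    # update gate c^tau

    def forward(self, x_seq):                           # x_seq: (T, batch, in_dim)
        z = torch.zeros(x_seq.size(1), self.encode.out_features)
        for x in x_seq:
            z_cand = torch.sigmoid(self.encode(x))                  # candidate z~^tau
            c = torch.sigmoid(self.gate(torch.cat([x, z], dim=1)))  # gate c^tau
            z = (1 - c) * z + c * z_cand                            # gated update
        return z                                        # embedding after the last step

z = TemporalEmbedding()(torch.rand(10, 4, 64))          # 10 time slots, 4 drivers
print(z.shape)                                          # torch.Size([4, 16])
```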

SLIDE 39

Modeling Peer Dependency

¨ Peer Dependency: drivers with similar driving behaviors should share similar latent representations

[Figure: transition graphs of two similar drivers map to similar learned representations]

$$
H_c(G^\tau) = \sum_{u_i \in U} \sum_{u_j \in U,\, u_i \neq u_j} \omega^\tau_{i,j} \cdot \big\| z^\tau_i - z^\tau_j \big\|_2^2
$$

where $\omega^\tau_{i,j} = \cos(x^\tau_i, x^\tau_j)$ is the similarity of driving behavior between drivers $u_i$ and $u_j$ at time slot $\tau$, computed using descriptive statistics of various historical driving operations.

Graphical regularization: if spatial items $i$ and $j$ are similar at time $\tau$, their representations $z_i$ and $z_j$ should be similar; dissimilarity is penalized.
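A small sketch of the peer-dependency regularizer H_c, computing similarity-weighted pairwise distances between driver embeddings; the inputs are illustrative:

```python
import torch
import torch.nn.functional as F

def peer_regularizer(z, x):
    """Sketch of H_c: pairwise similarity-weighted distances between embeddings.

    z: (n_drivers, z_dim) embeddings at time slot tau
    x: (n_drivers, f_dim) descriptive statistics of historical driving operations
    """
    w = F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=2)  # omega[i, j]
    d = ((z.unsqueeze(1) - z.unsqueeze(0)) ** 2).sum(dim=2)         # ||z_i - z_j||^2
    mask = 1.0 - torch.eye(z.size(0))                               # exclude i == j
    return (w * d * mask).sum()

print(peer_regularizer(torch.rand(5, 16), torch.rand(5, 32)))
```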

SLIDE 40

A Joint Optimization Objective

[Figure: model structure — a series of multi-spatial graphs G1, G2, G3, …, Gm is embedded into vectors x1, x2, x3, …, xm, with temporal dependence along the sequence and peer dependence across drivers]

$$
\min \; \frac{1}{2} \sum_{\tau \in T} \Big\{ \sum_{u_i \in U} \big\| x^\tau_i - \hat{x}^\tau_i \big\|_2^2 + \alpha \cdot H_c(G^\tau) \Big\}
$$

  • Structural preservation: the representation encoded from the input can be decoded to recover the input
  • Temporal dependency: the current embedding is related to past embeddings
  • Peer dependency: similar graph streams from two similar drivers share similar representations
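Putting the pieces together, a sketch of the joint objective that sums the reconstruction loss and the α-weighted peer regularizer over time slots; `encode_decode` is a placeholder for the sequential encode/decode steps, and `peer_regularizer` is the sketch above:

```python
import torch

def joint_loss(x_seq, encode_decode, stats_seq, alpha=0.1):
    """Sketch of the joint objective.

    x_seq: (T, n_drivers, f_dim) inputs per time slot
    encode_decode: callable returning (z, x_hat) for one time slot
    stats_seq: per-slot descriptive statistics for the peer term
    """
    total = 0.0
    for x, stats in zip(x_seq, stats_seq):
        z, x_hat = encode_decode(x)                 # structural preservation term
        total = total + 0.5 * (((x - x_hat) ** 2).sum()
                               + alpha * peer_regularizer(z, stats))
    return total
```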

SLIDE 41

Applications: Driving Performance Scoring and Risky Area Detection

  • 1. Learn driving behavior profiles from driving state transition graphs
  • 2. Use driving behavior profiles to automatically score driving performances and detect risky areas

SLIDE 42

Comparison with Baseline Methods

[Figure: Square Error, NDCG@N, R², and Kendall Tau comparisons of Auto-Encoder, DeepWalk, CNN, LINE, DSV, and PTARL]

  • Our model achieves the best performance
  • Peer and temporal dependencies are essential for representing driving behavior

Apply the learned representations to predict driving scores.

¨ Data

  • T-Drive (Beijing GPS trajectories of volunteer drivers)

¨ Evaluation Metrics

  • Square Error
  • Coefficient of Determination (R²)
  • Normalized Discounted Cumulative Gain (NDCG@N)
  • Kendall Tau Coefficient (Tau)

¨ Baselines

  • Autoencoder
  • DeepWalk: uses truncated random walks to learn latent representations
  • LINE: preserves both local and global network structures with an edge-sampling algorithm
  • CNN: convolutional neural network
  • Driving State Vector (DSV): a traditional transportation approach
  • PTARL: our model

[Figure: driving score distribution (frequency histogram)]

SLIDE 43

Study of Peer and Temporal Dependencies

[Figure: Square Error, NDCG@N, Tau, and R² comparisons of Auto-Encoder, PTARL-peer, PTARL-temporal, and PTARL]

¨ PTARL (our model) and two variants

  • PTARL-peer: only considers the peer dependency
  • PTARL-temporal: only considers the temporal dependency

  • The Autoencoder, which ignores both dependencies, performs the worst
  • The temporal dependency is more significant than the peer dependency in profiling driving behavior

SLIDE 44

Historical Assessment of Driving Scores

[Figure: driving scores over time for a “Safer Driver” and a “Riskier Driver”]

  • A “Safer Driver” is not always safe, and a “Riskier Driver” is not always risky
  • Scores of the “Safer Driver” are relatively higher most of the time, while scores of the “Riskier Driver” are relatively lower most of the time

SLIDE 45

Risky Area Detection

Dynamic evolution of the distribution of risky areas over 12 hours.
SLIDE 46

Summary

¨ Task

  • Dynamic representation learning with graph streams

¨ Modeling

  • Develop a temporal- and peer-aware dynamic representation learning approach
  • Robustness checks over structural preservation, temporal dependency, and peer dependency

¨ Application

  • Driving behavior analysis for inferring driving scores and detecting risky areas

SLIDE 47

Outline

¨ Background and Motivation

¨ Deep Collective Representation Learning

¨ Deep Dynamic Representation Learning

¨ Deep Structured Representation Learning

¨ Conclusion and Future Work

SLIDE 48

Mismatches Between Humans and Technologies

  • Non-personalized news feeds
  • Non-personalized education

What can we do to improve user performance and engagement in human-technological systems?

SLIDE 49

Motivating Application: Precision User Profiling

Webpage = Contents + Structure; analogously, User = Explicit Activities + Latent Behavioral Structure?

SLIDE 50

From Users to Activity Graphs

[Figure: a user activity graph with spatial-temporal asymmetric transition patterns over POI nodes such as Home, Office, Hospital, Costco, Gas Station, MacDonald, Walmart, PreSchool, Auto Service, Zoo, Sixflag, PizzaHut, Regal Cinemas, Lab, Library, IT Department]

SLIDE 51

Problem Reformulation: Representation Learning with Activity Graphs

[Figure: f(user, user activity graph) = user profile vector z]

  • Given a user and the corresponding user activity graph, we aim to map the user to a profile vector

SLIDE 52

Global Behavioral Patterns

¨ Global structures: how a user’s activities globally interact with each other (strongly linked, weakly linked, no link)

[Figure: user activity graph over POI nodes]

SLIDE 53

Substructure Behavioral Patterns

¨ Substructures: topology of subgraphs that features the unique behavioral patterns of a user’s activities

[Figure: user activity graph with two highlighted substructures — Substructure 1: high-frequency discrete nodes; Substructure 2: a high-frequency circle]

SLIDE 54

Representation Learning with Behavioral Global and Substructure Preservation

¨ Traditional solution: global structure (encoding-decoding) + substructure (loss regularization)

  • Global structure: minimize the loss between the input graph and the reconstructed graph
  • Substructure preservation: strongly penalize the loss if the model cannot accurately reconstruct substructures

[Figure: input user activity graph and its reconstruction]
SLIDE 55

Will the Traditional Solution Work?

[Figure: activity graphs of two different users with different POI nodes]

  • Different users show different substructure topologies and contents
  • Substructures are dynamically distributed in different locations of the graphs

SLIDE 56

Adversarial Substructured Learning

¨ Translate substructure-aware representation learning into an adversarial substructured learning problem

[Figure: architecture — encoder/decoder, substructure detector, and discriminator; the discriminator receives real and reconstructed (fake) substructures]

  • An encoder-decoder network: learns the representations of a graph
  • A substructure detector: detects substructure patterns
  • A discriminator: classifies original substructures vs. reconstructed substructures
  • Adversarial training: matches original substructures with reconstructed substructures
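A minimal PyTorch sketch of the three components, treating graphs as flattened adjacency vectors and using the CNN approximation of the detector introduced later in the talk; all sizes and names are illustrative, not the authors’ implementation:

```python
import torch
import torch.nn as nn

class Components(nn.Module):
    """Sketch: encoder-decoder, CNN substructure detector, discriminator."""
    def __init__(self, g_dim=256, z_dim=32, s_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(g_dim, z_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(z_dim, g_dim), nn.Sigmoid())
        self.detector = nn.Sequential(                    # CNN over the adjacency "image"
            nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(2), nn.Flatten(), nn.Linear(16, s_dim))
        self.discriminator = nn.Sequential(nn.Linear(s_dim, 1), nn.Sigmoid())

    def forward(self, g):                                 # g: (batch, g_dim)
        z = self.encoder(g)
        g_hat = self.decoder(z)                           # reconstructed graph
        side = int(g.size(1) ** 0.5)                      # view adjacency as side x side
        s_real = self.detector(g.view(-1, 1, side, side))
        s_fake = self.detector(g_hat.view(-1, 1, side, side))
        return z, g_hat, s_real, s_fake
```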

SLIDE 57

Will the New Formulation Work?

[Figure: the same encoder-decoder / substructure detector / discriminator architecture]

  • Traditional subgraph detection algorithms (e.g., depth-first-search-based subgraph detection) are usually not differentiable
  • It is therefore impossible to backpropagate gradients through them for optimization

SLIDE 58

How to Approximate the Substructure Detector?

[Figure: a subgraph (e.g., Home–Office–PreSchool) extracted from the user activity graph]

  • Use a CNN to replace substructure detection algorithms
  • Use an embedding vector to replace a subgraph

Non-differentiable substructure detection algorithm → differentiable CNN; substructure → latent embedding.

SLIDE 59

Approximated Adversarial Substructured Learning

[Figure: encoder/decoder with a CNN-based substructure detector and a discriminator]

The Mini-Max Game in Optimization

  • Discriminator: trained to maximize the accuracy of classifying detected vs. generated substructures
  • Generator: trained to minimize the probability that the discriminator correctly classifies generated substructures
  • CNN-based detector: detects and outputs a substructure feature vector

SLIDE 60

Solving the Min-Max Game

  • 1. Minimize the objective function
  • 2. Update the discriminator to maximize its accuracy: classify ground-truth substructures as 1 (maximize likelihood) and generated substructures as 0 (minimize likelihood)
  • 3. Update the generator to confuse the discriminator: train G to minimize D’s accuracy on generated substructures
  • 4. Minimize the reconstruction loss
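A sketch of the alternating min-max updates, reusing the `Components` sketch above; the BCE targets, learning rates, and the choice to leave the detector fixed are assumptions for illustration, not the authors’ exact procedure:

```python
import torch

# BCE targets: detected substructures -> 1 (real), generated -> 0 (fake).
model = Components()
bce = torch.nn.BCELoss()
opt_d = torch.optim.Adam(model.discriminator.parameters(), lr=1e-3)
gen_params = list(model.encoder.parameters()) + list(model.decoder.parameters())
opt_g = torch.optim.Adam(gen_params, lr=1e-3)

g = torch.rand(8, 256)                                   # a batch of flattened graphs
for step in range(100):
    z, g_hat, s_real, s_fake = model(g)
    # Step 2: update the discriminator to maximize classification accuracy
    d_loss = bce(model.discriminator(s_real.detach()), torch.ones(8, 1)) + \
             bce(model.discriminator(s_fake.detach()), torch.zeros(8, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Steps 3-4: update the generator to confuse the discriminator
    # and to minimize the reconstruction loss
    z, g_hat, s_real, s_fake = model(g)
    g_loss = bce(model.discriminator(s_fake), torch.ones(8, 1)) + \
             ((g - g_hat) ** 2).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```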

SLIDE 61

Recap: Training and Testing of Adversarial Substructured Learning

SLIDE 62

What To Do Next: Inferring Next Activity for POI Recommendations

[Figure: pipeline — user activity graph → adversarial substructured learning (encoder/decoder, substructure detector, discriminator) → user representation → user profiling → next-activity recommendation]

  • 1. Given a time period, learn a user’s profile from the corresponding user activity graph
  • 2. Exploit user profiles to forecast the next activity category

SLIDE 63

Performance Comparisons on New York and Tokyo Activity Check-in Data

¨ Data

  • Mobile activity check-in data of NYC and Tokyo

¨ Evaluation Metrics

  • Precision@N of activity category prediction
  • Precision@N of new activity recommendation

¨ Baselines

  • Autoencoder
  • DeepWalk: uses truncated random walks to learn latent representations
  • LINE: preserves both local and global network structures with an edge-sampling algorithm
  • CNN: convolutional neural network

  • Our model achieves the best performance on user profiling
  • Substructures in a graph are essential for capturing user behavior patterns

Apply the learned representations to predict the next activity type (next POI category).

SLIDE 64

Study of Node and Circle Substructures

  • StructRL: considers both nodes and circles
  • StructRL-Node: only considers node-based substructures
  • StructRL-Circle: only considers circle-based substructures

SLIDE 65

Summary

¨ Task

  • Structured representation learning with global and sub-structure preservation

¨ Modeling

  • Develop an adversarial substructured learning approach
  • Preserve global and sub-structures by solving the min-max game

¨ Application

  • Precision user profiling and quantification for personalization and recommender systems