CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors
Meng Jiang (UIUC), Christos Faloutsos (CMU), Jiawei Han (UIUC)
What is Tartan?
2
Visited CMU in 2012-13 and watched lots of Tartans’ games…
What is Behavior? Is it valuable?
- Tweeting behavior
- Publishing-paper behavior
3
Behavior: interactions made by individuals or organisms in conjunction with themselves or their environment. (Wikipedia)
20:03:09 @ebekahwsm this better be the best halftime show ever in the history of halftimes shows. ever. #SuperBowl
2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395…
Q: What can we discover from behavioral data?
- Ex.: Given every phone call/message among military leaders, scientists, and businesspersons, find …
Why Talk about Behavior Today?
4
[Figure: physical environment and online environment]
Human behaviors are now recorded broadly and deeply, at an unprecedented level.
For the first time, we can gain insights into human behavior and society from large-scale real data.
Representing and Summarizing Behavior
5
- Representing: raw data to math
- Summarizing: patterns such as trends, events, campaigns…
- Understanding: factors underlying the patterns, such as influence, intentions…
- Predicting: what will happen in the future?
- Intervening: recommendation, spam/fraud detection…
Given: behavioral data (e.g., DBLP data, tweets)
Return: behavioral summaries (e.g., research trends, events)
6
2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395…
Behaviors: Dynamic and Multicontextual
- Tweeting behavior
7
20:03:09 @ebekahwsm this better be the best halftime show ever in the history of halftimes shows. ever. #SuperBowl
[Figure: the tweet annotated with its contextual factors t, u, p, h (time, user, phrases, hashtag). Labels: dynamic; one-guaranteed value; set value; empty (set ∅) value.]
Behaviors: Dynamic and Multicontextual
- Publishing-paper behavior
8
2009 P. Melville, W. Gryc, R. Lawrence, “Sentiment analysis of blogs by combining lexical knowledge with text classification”, KDD’09. Refs: p81623, p84395…
[Figure: the paper record annotated with its contextual factors t, a, p, v, c (time, authors, phrases, venue, cited papers). Labels: dynamic; one-guaranteed value; set value.]
Set value
Summarizing Behaviors
- Dynamic: taking a set of consecutive time slices
- Multicontextual: taking a set of dimensions and a set of dimensional values in each dimension
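To make these two choices concrete, here is a minimal Python sketch of a Tartan as one run of consecutive time slices plus, for each chosen dimension, a set of chosen values. The class names, field layout, and the membership rule in `covers` (a behavior belongs if it falls inside the time window and shares at least one chosen value on every chosen dimension) are assumptions for illustration, not the paper's definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Behavior:
    time: int                          # index of the time slice the behavior falls in
    values: dict[str, frozenset[str]]  # dimension -> set of values (may be empty)


@dataclass
class Tartan:
    start: int                         # first time slice (inclusive)
    end: int                           # last time slice (inclusive); start..end are consecutive
    values: dict[str, frozenset[str]]  # chosen dimensions -> chosen values in each

    def covers(self, b: Behavior) -> bool:
        # Assumed rule: inside the time window, and at least one shared value
        # on every chosen dimension (an empty value set never matches).
        if not self.start <= b.time <= self.end:
            return False
        return all(b.values.get(dim, frozenset()) & vals for dim, vals in self.values.items())


tweet = Behavior(time=3, values={
    "user": frozenset({"u1"}),
    "phrase": frozenset({"halftime show"}),
    "hashtag": frozenset({"#SuperBowl"}),
})
tartan = Tartan(start=2, end=4, values={
    "hashtag": frozenset({"#SuperBowl"}),
    "phrase": frozenset({"halftime show", "touchdown"}),
})
print(tartan.covers(tweet))  # True
```

The dimensions `user`, `phrase`, and `hashtag` are hypothetical labels chosen to match the tweet example.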
9
Tensor Fails
- Tensor for modeling multiple dimensions: FEMA (KDD’14), CrossSpot (ICDM’15)
- Representation (multicontextual)
- Empty values?
10
[Figure: tensor example: a behavior by Bob with http://XXX.YYY, where several dimensions have empty values (∅).]
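The empty-value problem fits in a few lines. A tensor cell takes exactly one value per mode, so a behavior with set-valued contexts expands into the Cartesian product of its value sets; if any set is empty, the product is empty and the behavior vanishes. This Python sketch assumes hypothetical modes `user`, `url`, and `hashtag`; it is not code from FEMA or CrossSpot.

```python
from itertools import product


def tensor_cells(behavior: dict, dims: list) -> list:
    """Expand one behavior into tensor cells, one per combination of values.

    A tensor cell is indexed by exactly one value per mode, so set values
    become a Cartesian product; an empty set makes the product empty.
    """
    return list(product(*(sorted(behavior[d]) for d in dims)))


dims = ["user", "url", "hashtag"]  # hypothetical modes
with_tag = {"user": {"Bob"}, "url": {"http://XXX.YYY"}, "hashtag": {"#SuperBowl"}}
two_tags = {"user": {"Bob"}, "url": {"http://XXX.YYY"}, "hashtag": {"#SuperBowl", "#halftime"}}
no_tag = {"user": {"Bob"}, "url": {"http://XXX.YYY"}, "hashtag": set()}

print(len(tensor_cells(with_tag, dims)))  # 1
print(len(tensor_cells(two_tags, dims)))  # 2: one behavior counted twice
print(len(tensor_cells(no_tag, dims)))    # 0: the behavior disappears
```

Set values inflate the count and empty values erase the behavior, which is the mismatch the slide points at.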
Tensor Fails (cont.)
- Tensor for modeling multiple dimensions: FEMA (KDD’14), CrossSpot (ICDM’15)
- Summarization (dynamic)
- Temporal patterns?
11
[Figure: tensor example: behaviors by Bob with http://XXX.YYY at time slices t10, t11, t24, t25, t26, t31, t37.]
Our Representations for Behavior and Behavioral Summary
- Behavior: “Two-level matrix”
- Behavioral summary: “Tartan”
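One reading of the two-level matrix, based on the later figures (time slices t1, t2 by dimensions such as Author and Venue, with papers p1 to p4 inside): the first level is indexed by (time slice, dimension), and each cell holds a binary behavior-by-value matrix. The sketch below builds that structure; the function name `two_level_matrix` and the tuple layout are hypothetical. An empty value set simply becomes an all-zero row, so no behavior is lost.

```python
from collections import defaultdict


def two_level_matrix(behaviors, dims):
    """Build {time_slice: {dimension: (behavior_ids, values, 0/1 rows)}}.

    behaviors: iterable of (behavior_id, time_slice, {dimension: set_of_values}).
    First level: (time slice, dimension). Second level: a binary matrix whose
    rows are the behaviors in that slice and whose columns are the values
    seen in that dimension.
    """
    by_slice = defaultdict(list)
    for bid, t, vals in behaviors:
        by_slice[t].append((bid, vals))

    matrix = {}
    for t in sorted(by_slice):
        items = by_slice[t]
        ids = [bid for bid, _ in items]
        matrix[t] = {}
        for d in dims:
            cols = sorted(set().union(*(vals.get(d, set()) for _, vals in items)))
            rows = [[int(c in vals.get(d, set())) for c in cols] for _, vals in items]
            matrix[t][d] = (ids, cols, rows)
    return matrix


papers = [  # hypothetical records shaped like the later slide figures
    ("p1", 1, {"author": {"a1", "a2"}, "venue": {"v1"}}),
    ("p2", 1, {"author": {"a2"}, "venue": {"v2"}}),
    ("p3", 2, {"author": {"a1"}, "venue": {"v1"}}),
    ("p4", 2, {"author": {"a2"}, "venue": {"v2"}}),
]
m = two_level_matrix(papers, ["author", "venue"])
print(m[1]["author"])  # (['p1', 'p2'], ['a1', 'a2'], [[1, 1], [0, 1]])
```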
12
The Problem of Behavioral Summarization
13
CATCHTARTAN
- Employing a lossless encoding scheme
- The Minimum Description Length (MDL) principle
- Estimating the number of bits that encoding the Tartan saves by merging the meaningful pattern into the encoding of the data
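The MDL objective can be illustrated with a simplified binary version: encode a 0/1 matrix once as a whole, then again as a Tartan block plus the remaining cells plus the cost of describing the Tartan, and take the difference. The code length used here (the count of ones, then which combination of cells holds them, i.e. log2 C(cells, ones) bits) is a standard choice assumed for illustration, not necessarily the paper's exact scheme; `model_bits` is a hypothetical input standing for the Tartan's own description cost.

```python
import math


def log2_binom(n: int, k: int) -> float:
    """log2 of the binomial coefficient C(n, k), via log-gamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


def block_bits(ones: int, cells: int) -> float:
    """Bits for a 0/1 block: the number of ones, then which cells hold them."""
    return math.log2(cells + 1) + log2_binom(cells, ones)


def mdl_savings(total_ones, total_cells, tartan_ones, tartan_cells, model_bits):
    """Bits saved by encoding a Tartan block separately from the rest of the data.

    model_bits: cost of describing the Tartan itself (assumed computed elsewhere).
    """
    before = block_bits(total_ones, total_cells)
    after = (model_bits
             + block_bits(tartan_ones, tartan_cells)
             + block_bits(total_ones - tartan_ones, total_cells - tartan_cells))
    return before - after


# A dense 10,000-cell block (9,000 ones) hidden in a sparse 1,000,000-cell matrix
dense = mdl_savings(14_000, 1_000_000, 9_000, 10_000, model_bits=50.0)
# A block whose density matches the background (0.5%)
flat = mdl_savings(5_000, 1_000_000, 50, 10_000, model_bits=50.0)
print(dense > 0, flat < 0)  # True True
```

A dense block saves bits, while a block with background density costs more than it saves; the score itself decides, with no density threshold to tune.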
14
Objective Function to Maximize
15
[Figure: objective function terms: Tartan, Data, First-level matrix, Individual entries]
Objective Function to Maximize (cont.)
16
Encoding the Tartan: Dimensions
17
Encoding the Tartan: Dimensional Values
18
Encoding the Tartan: Time Slices
19
Encoding the Tartan: Behaviors
20
Encoding the Tartan: Entries
21
Greedy Search for the Local Minimum
22
Time complexity:
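A greedy search of this kind can be sketched as hill climbing: start from a seed, try toggling each row and each column in or out, take the best improving move, and stop at a local optimum. For brevity this Python sketch works on a single binary behavior-by-value matrix with a simplified MDL score; the seed rule, the model-cost term, and all names are assumptions, and the paper's search over dimensions, values, and time slices is richer.

```python
import math


def log2_binom(n: int, k: int) -> float:
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


def block_bits(ones: int, cells: int) -> float:
    """Bits for a 0/1 block: the number of ones, then which cells hold them."""
    return math.log2(cells + 1) + log2_binom(cells, ones)


def savings(M, rows, cols, total_ones):
    """Simplified MDL savings of treating rows x cols as one separate block."""
    R, C = len(M), len(M[0])
    n = len(rows) * len(cols)
    k = sum(M[i][j] for i in rows for j in cols)
    model = (math.log2(R + 1) + log2_binom(R, len(rows))      # which rows
             + math.log2(C + 1) + log2_binom(C, len(cols)))   # which columns
    before = block_bits(total_ones, R * C)
    after = model + block_bits(k, n) + block_bits(total_ones - k, R * C - n)
    return before - after


def greedy_block(M):
    """Hill-climb from the densest row until no single toggle improves the score."""
    R, C = len(M), len(M[0])
    total = sum(map(sum, M))
    seed = max(range(R), key=lambda i: sum(M[i]))
    rows, cols = {seed}, {j for j in range(C) if M[seed][j]}
    best = savings(M, rows, cols, total)
    while True:
        move = None
        for i in range(R):                 # try adding/removing each row
            s = savings(M, rows ^ {i}, cols, total)
            if s > best:
                best, move = s, (rows ^ {i}, cols)
        for j in range(C):                 # try adding/removing each column
            s = savings(M, rows, cols ^ {j}, total)
            if s > best:
                best, move = s, (rows, cols ^ {j})
        if move is None:                   # local optimum reached
            return rows, cols, best
        rows, cols = move


M = [[1 if i < 3 and j < 3 else 0 for j in range(8)] for i in range(8)]
print(greedy_block(M)[:2])  # ({0, 1, 2}, {0, 1, 2})
```

With a planted 3×3 block of ones in an otherwise empty 8×8 matrix, the search grows the seed row into exactly that block. Each pass here evaluates R + C candidates, each summing an r × c block; an incremental score update would avoid recomputing those sums.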
Qualitative Analysis: DBLP data
23
Qualitative Analysis: Super Bowl 2013
24
Quantitative Analysis: Accuracy and Efficiency in Synthetic Experiments
- Tartan distribution
- Data distribution
25
26
Summary
- Novel representations
  - Behavior: “two-level matrix” vs. tensor
  - Behavioral summary: “Tartan” vs. dense block
- A new summarization algorithm
  - Principled scoring and parameter-free: objective function based on Minimum Description Length
  - Scalable: greedy search for a local optimum
- Effectiveness, discovery, and efficiency
27
THANK YOU!
CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors www.meng-jiang.com
28
The Distributions in Real Data
29
Qualitative Analysis: Grammy 2013
30
Convergence
- Synthetic test
- Real-data test
31
Efficiency and Tartan Distributions
32
Things Related to the “Two-Level” Matrix
- Time-evolving heterogeneous networks
- Bipartite one-to-many graph
33
[Figure: two-level matrix with blocks labeled Author, Venue, Papers over time slices t1, t2 (papers p1 to p4, authors a1, a2, venues v1, v2), alongside the corresponding bipartite one-to-many graph.]
Things Related to the “Two-Level” Matrix (cont.)
- Time-evolving heterogeneous networks
- Relationships
34
[Figure: time-evolving heterogeneous network over time slices t1, t2 with Author, Paper, Venue, and Relationships blocks (authors a1, a2; papers p1 to p4; venues v1, v2).]
Things Related to the “Two-Level” Matrix (cont.)
- The meta-path similarity metric
35
[Figure: two-level matrix with blocks labeled Author, Venue, Papers over time slices t1, t2 (papers p1 to p4, authors a1, a2, venues v1, v2).]
Meta-paths: Author – Paper – Venue – Paper – Author; Author – Paper – Author
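Meta-path counts come from products of adjacency matrices: Author – Paper – Author counts are AP·APᵀ, and Author – Paper – Venue – Paper – Author counts are (AP·PV)(AP·PV)ᵀ. PathSim (Sun et al., VLDB 2011) normalizes such a symmetric count matrix M as 2·M[x][y] / (M[x][x] + M[y][y]); taking it as the metric meant on this slide is an assumption, and the toy adjacency data is hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def pathsim(M):
    """PathSim from a symmetric meta-path count matrix M."""
    n = len(M)
    return [[2 * M[x][y] / (M[x][x] + M[y][y]) if M[x][x] + M[y][y] else 0.0
             for y in range(n)] for x in range(n)]


AP = [[1, 0, 1, 0],   # a1 wrote p1, p3
      [1, 1, 0, 1]]   # a2 wrote p1, p2, p4
PV = [[1, 0], [0, 1], [1, 0], [0, 1]]  # p1, p3 in v1; p2, p4 in v2

APA = matmul(AP, transpose(AP))      # Author-Paper-Author path counts
AV = matmul(AP, PV)                  # Author-Paper-Venue path counts
APVPA = matmul(AV, transpose(AV))    # Author-Paper-Venue-Paper-Author path counts

print(pathsim(APA)[0][1])    # 0.4
print(pathsim(APVPA)[0][1])  # 4/9, about 0.444
```

The two authors share one paper but publish in overlapping venues, so the venue meta-path rates them slightly more similar than the co-authorship meta-path does.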