Tensor Decompositions for Big Multi-aspect Data Analytics - PowerPoint PPT Presentation



SLIDE 1

Tensor Decompositions for Big Multi-aspect Data Analytics

Evangelos (Vagelis) Papalexakis, UC Riverside

Second Workshop of Mission-Critical Big Data Analytics (MCBDA 2017)
SLIDE 2

Roadmap

  • E. Papalexakis @ MCBDA 17

SLIDE 3

Multi-Aspect Data??

SLIDE 4

Multi-View Social Networks

SLIDE 5

Social Network Matrix

SLIDE 6

Outer Product

a bᵀ = [1; 2] [3 4 5] = [3 4 5; 6 8 10]
(2×1) (1×3) (2×3)

By definition: a rank-one matrix. Matrix rank = min # of rank-one matrices that add up to that matrix.
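As a quick sanity check, the rank-one product above can be reproduced in a few lines of NumPy (illustrative, not from the slides):

```python
import numpy as np

a = np.array([[1], [2]])    # 2x1 column vector
b = np.array([[3, 4, 5]])   # 1x3 row vector
X = a @ b                   # outer product: a 2x3 rank-one matrix
# X == [[3, 4, 5], [6, 8, 10]], and its matrix rank is 1
```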

SLIDE 7

Matrix Decomposition


Decomposition into rank-1 matrices/components

SLIDE 8

Singular Value Decomposition

X = σ1 u1 v1ᵀ + … + σk uk vkᵀ

  • If k = rank(X), we have equality
  • If k < rank(X), we have the best rank-k approximation in terms of squared error (Eckart-Young theorem)
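The Eckart-Young property is easy to verify numerically; this sketch (mine, not from the talk) checks that the truncated SVD's squared error equals the sum of the squared discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
# best rank-k approximation: keep the k leading singular triplets
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: squared Frobenius error = sum of discarded singular values squared
err = np.linalg.norm(X - Xk, 'fro') ** 2
assert np.isclose(err, np.sum(s[k:] ** 2))
```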

SLIDE 9

What if we have more than 1 view of the network?

If we aggregate, we ignore important structure!!

SLIDE 10

Tensors

  • Multi-dimensional matrices
  • Long list of applications: Chemometrics,

Psychometrics, Signal Processing, Data Mining


SLIDE 11


What are we looking for?

Blocks within the data: subsets / co-clusters of 1) users (“senders”), 2) users (“receivers”), 3) means of communication

SLIDE 12

Three-way Outer Product

a ∘ b ∘ c with a = [1; 2] (2×1), b = [3; 4; 5] (3×1), c = [2; 3] (2×1) is a 2×3×2 rank-1 tensor; its frontal slices are c(1)·a bᵀ = [6 8 10; 12 16 20] and c(2)·a bᵀ = [9 12 15; 18 24 30].


Blocks are rank-1 tensors
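The 2×3×2 example above, in NumPy (illustrative sketch):

```python
import numpy as np

a = np.array([1, 2])      # mode-1 factor
b = np.array([3, 4, 5])   # mode-2 factor
c = np.array([2, 3])      # mode-3 factor

# rank-one 2x3x2 tensor: X[i, j, k] = a[i] * b[j] * c[k]
X = np.einsum('i,j,k->ijk', a, b, c)
# frontal slice k=0 is c[0] * outer(a, b) = [[6, 8, 10], [12, 16, 20]]
```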

SLIDE 13

CP/PARAFAC Decomposition

X ≈ a1 ∘ b1 ∘ c1 + … + aK ∘ bK ∘ cK

min over A, B, C of ‖X − Σk ak ∘ bk ∘ ck‖²F

  • Each component is an outer product
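A minimal CP via alternating least squares, as a sketch (my simplification, not the speaker's implementation; real solvers such as the Tensor Toolbox mentioned later are far more robust):

```python
import numpy as np

def cp_als(X, rank, n_iter=50, seed=0):
    """Minimal CP/PARAFAC via alternating least squares.
    X is a 3-way numpy array (I x J x K); returns factors A, B, C."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    def kr(U, V):
        # Khatri-Rao (column-wise Kronecker) product
        return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(kr(B, C).T)   # solve X1 ~ A kr(B,C)^T
        B = X2 @ np.linalg.pinv(kr(A, C).T)
        C = X3 @ np.linalg.pinv(kr(A, B).T)
    return A, B, C
```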
SLIDE 14

GRAPHFUSE [Papalexakis et al. Fusion 2013]

X ≈ a1 ∘ b1 ∘ c1 + … + aK ∘ bK ∘ cK

Step 1: PARAFAC with Sparse Latent Factors [Papalexakis et al. IEEE ICASSP 2011, IEEE TSP 2013]

Step 2: Assign users to communities (max component membership)

Step 3: For users with no assignment, create a (K+1)-th “umbrella” community

Output:
1. Assignment of users to communities
2. Influence of a view on each community
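Steps 2-3 can be sketched as follows (function name and the eps threshold are mine, for illustration only):

```python
import numpy as np

def assign_communities(A, eps=1e-9):
    """Each user goes to the component where their factor value is largest;
    users whose row of the sparse factor A is (near) zero fall into the
    extra (K+1)-th umbrella community."""
    labels = np.argmax(A, axis=1)
    labels[np.max(np.abs(A), axis=1) < eps] = A.shape[1]  # umbrella community
    return labels
```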
SLIDE 15

DBLP Multi-View Graph

(a) citation (b) co-auth. (c) co-term

  • Assignment of authors to research communities
  • Measure NMI (Normalized Mutual Information)
  • Baselines

² Spectral clustering on sum of matrices / views ² Linked Matrix Factorization [Tang et al. ICDM 2009]

  • GRAPHFUSE outperforms “2D” baselines

[Papalexakis et al. Fusion 2013]

SLIDE 16

Time-Evolving Graphs as Tensors

[Figure: graph snapshots at t1, t2, t3, …, tK stacked as frontal slices of a tensor.]

Detect anomalies / real-life events
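Building such a tensor from per-timestep edge lists is straightforward (illustrative sketch, names are mine):

```python
import numpy as np

def graph_tensor(edge_lists, n_nodes):
    """Stack K adjacency-matrix snapshots into an N x N x K tensor,
    one frontal slice per time step."""
    X = np.zeros((n_nodes, n_nodes, len(edge_lists)))
    for k, edges in enumerate(edge_lists):
        for i, j in edges:
            X[i, j, k] = 1.0
    return X
```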

SLIDE 17

Time-Evolving Graphs as Tensors

  • Tensor Decomposition will give us

² Communities in the Graph & ² Their evolution over time

[Plot: component activity per year, 1986-2006. Component 1: ICDE, SIGMOD, VLDB. Component 2: CIKM, ECIR, ICDE, ICDM, IJCAI, JCDL, KDD, SIGIR, WWW. A dashed line marks the point of change.]

Fig. 3. TENSORSPLAT performing change detection: two components in which a well-known professor appears as an author; the first mainly contains Databases conferences, whereas the second contains Data Mining conferences. The dashed red line indicates the point of change in research direction.

Koutra, Papalexakis, Faloutsos, “TENSORSPLAT: Spotting Latent Anomalies in Time”

SLIDE 18

Neurosemantics

[Figure: real and predicted MEG brain activity; stimulus questions such as “Does it fly?” (y/n) and “Does it bite?” (y/n); fMRI and MEG modalities.]

SLIDE 19

Neurosemantics
  • Semantically coherent brain regions?
  • Similarities/differences between subjects?
  • How is language processed in the brain?


SLIDE 20

Combining measurements for multiple subjects

[Figure: brain recordings from multiple subjects for stimuli Airplane, Dog, Puppy.]

SLIDE 21

Modeling Brain Data as Tensor

[Figure: tensor of stimuli (Airplane, Dog, Puppy) × fMRI voxels × subjects.]

SLIDE 22

CP/PARAFAC Decomposition

[Figure: the brain tensor for stimuli Airplane, Dog, Puppy written as a sum of rank-one components a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2.]

min over ar, br, cr of ‖X − Σ(r=1..R) ar ∘ br ∘ cr‖²F

SLIDE 23

Semantic Information

✔ Is it alive?
✔ Does it bite?
Does it fly?
✔ Can you buy it?
Is it smaller than a golf ball?
…

  • Human readable description of the noun
  • Useful information to guide the analysis
  • Can have different semantic features (corpus statistics, knowledge base features)

SLIDE 24

Tensor With Side Information

[Figure: the brain tensor X coupled with a semantic side-information matrix Y (nouns × questions).]

[Papalexakis et al. SDM 2014]

SLIDE 25

Proposed Modeling: Coupled Matrix-Tensor Factorization

[Figure: X ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 and Y ≈ a1 d1ᵀ + a2 d2ᵀ, sharing the noun factors ar between the brain tensor X and the semantic matrix Y.]

SLIDE 26

Proposed Modeling: Coupled Matrix-Tensor Factorization


min over ar, br, cr, dr of ‖X − Σ(r=1..R) ar ∘ br ∘ cr‖²F + ‖Y − Σ(r=1..R) ar drᵀ‖²F

(first term: tensor part; second term: matrix part)
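The coupled objective can be attacked with alternating least squares; this is a minimal sketch of the idea under the assumption of one shared mode, not the paper's algorithm:

```python
import numpy as np

def cmtf_als(X, Y, rank, n_iter=50, seed=0):
    """Coupled matrix-tensor factorization sketch via alternating least
    squares: X (I x J x K) ~ sum_r a_r o b_r o c_r and Y (I x M) ~ A D^T,
    with the factor A shared between the tensor and the matrix."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    D = rng.standard_normal((Y.shape[1], rank))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    def kr(U, V):
        # Khatri-Rao (column-wise Kronecker) product
        return np.einsum('ir,jr->ijr', U, V).reshape(-1, U.shape[1])
    for _ in range(n_iter):
        # A is fit jointly against the tensor unfolding AND the side matrix
        A = np.hstack([X1, Y]) @ np.linalg.pinv(np.vstack([kr(B, C), D]).T)
        B = X2 @ np.linalg.pinv(kr(A, C).T)
        C = X3 @ np.linalg.pinv(kr(A, B).T)
        D = (np.linalg.pinv(A) @ Y).T             # solve Y ~ A D^T for D
    return A, B, C, D
```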

SLIDE 27

Pre-motor Cortex

[Figure: premotor-cortex component. Top nouns: beetle, glass, tomato, bell. Top questions: can you pick it up? can you hold it in one hand? is it smaller than a golf ball?]

✔ Unsupervised ✔ Agrees with Neuroscience

SLIDE 28

Multi-Aspect Data Everywhere!

Social Networks & Urban Computing · Neurosemantics · Web · Knowledge (e.g., “is president” relations)

SLIDE 29

Roadmap

SLIDE 30

Tensors in Data Science

  • Naturally model multi-aspect data
  • Very powerful modeling tools
  • Big Challenges

² C1: Data Size & Scalability
² C2: Model Selection, Quality & Interpretability

SLIDE 31

Fast and Scalable Tensor Decompositions

  • Exploiting Sparsity

² Tensor Toolbox for Matlab [Kolda et al.]
² GigaTensor [Kang et al. 2012]
² FlexiFaCT [Beutel et al. 2014]
² DFacTo [Choi et al. 2014]
² SPLATT [Smith et al. 2015]

  • All above methods are exact

² Most of them focus on the “MTTKRP” operation (matricized tensor times Khatri-Rao product)

  • Can we do something by approximating?
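For intuition, a sparsity-exploiting MTTKRP only touches the nonzeros of the tensor; this is a sketch of the idea, not any of the systems above:

```python
import numpy as np

def mttkrp_sparse(coords, vals, B, C, I):
    """Sparsity-exploiting MTTKRP sketch for mode 1. coords is an (nnz, 3)
    array of integer indices, vals the (nnz,) values of the nonzeros.
    Equivalent to the dense X_(1) @ khatri_rao(B, C)."""
    M = np.zeros((I, B.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        M[i] += v * B[j] * C[k]   # accumulate v * (B[j] * C[k]) into row i
    return M
```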
SLIDE 32

Approximate “Sketching” Methods

Hashing

[Wang et al. 2015]

Compression

Tucker Compression

[Bro et al. 1998]

PARACOMP [Sidiropoulos et al. 2014]

Sampling

Tensor CUR [Mahoney et al. 2008]
ParCube [Papalexakis et al. 2012]
Walk’n’Merge [Erdos et al. 2013]
MACH [Tsourakakis 2010]
SPALS [Cheng et al. 2016]
CPRAND [Battaglino et al. 2017]

SLIDE 34

ParCube: Sampling-based Parallel Tensor Decomposition

[Figure: sample X into subtensors X1 … Xr, factor each in parallel, then merge the factors.]

Papalexakis et al. ECML-PKDD 2012
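The biased-sampling step can be sketched as follows (illustrative only; the actual ParCube algorithm adds repeated samples and a factor-merging step):

```python
import numpy as np

def biased_sample(X, s, seed=0):
    """ParCube-style sketch: sample s indices per mode with probability
    proportional to that mode's marginal mass, keep the induced subtensor."""
    rng = np.random.default_rng(seed)
    kept = []
    for mode in range(3):
        other = tuple(m for m in range(3) if m != mode)
        weight = np.abs(X).sum(axis=other)          # marginal mass per index
        keep = rng.choice(X.shape[mode], size=s, replace=False,
                          p=weight / weight.sum())
        kept.append(np.sort(keep))
    return X[np.ix_(*kept)], kept
```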

SLIDE 35

Does it work?

Achieves comparable accuracy to the exact algorithm

[Plot: relative error vs. number of repetitions; outputs are 90% sparser.]

SLIDE 36

Speedup

Baseline (ALS): ~1 day on 4 Intel Xeon E7-4850, 512 GB RAM, Fedora 14; data size ~0.5 GB.

[Plot: relative error vs. relative runtime for sample sizes 1/20, 1/10, 1/5, 1/2 with 8 workers; up to 100x faster.]

SLIDE 37

Roadmap

SLIDE 38

Model Selection & Quality

  • Problem 1: Rank Estimation

² Given a model (e.g. PARAFAC), choose the right number of components
² Do this without any ground truth

SLIDE 39

Rank Estimation for CP/PARAFAC

  • Maximize both #components and “quality” of decomposition
  • Quality is defined through Core Consistency [Bro et al. 2003]
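A sketch of the core consistency diagnostic (my implementation of the idea behind [Bro et al. 2003], not AutoTen itself):

```python
import numpy as np

def core_consistency(X, A, B, C):
    """CORCONDIA sketch: fit a Tucker core G for the given CP factors and
    measure how close G is to the superdiagonal identity tensor.
    100 means perfect CP structure; low or negative values signal misfit."""
    R = A.shape[1]
    # least-squares core: G = X x1 pinv(A) x2 pinv(B) x3 pinv(C)
    G = np.einsum('ri,sj,tk,ijk->rst',
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C), X)
    T = np.zeros((R, R, R))
    for r in range(R):
        T[r, r, r] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / R)
```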

[Plot: error vs. rank (2-5) for AutoTen and four baselines; lower is better.]

Papalexakis SDM’16 [Best Student Paper Award]

SLIDE 40

Model Selection & Quality

  • Problem 2: Choosing the right model

² Given an application/dataset, choose the most appropriate tensor decomposition(s)
² Again, assume no ground truth!

SLIDE 41

Choosing the right model

CP: X ≈ a1 ∘ b1 ∘ c1 + … + aR ∘ bR ∘ cR, with A (I × R), B (J × R). Easy to interpret; captures trilinear structure; used for community detection.

Tucker: X ≈ core G (R1 × R2 × R3) multiplied by factors U1 (I × R1), U2 (J × R2), …. Captures non-trilinear structure; used for role discovery.

A D R D Aᵀ (per slice): A (I × R), Aᵀ (R × I), D (R × R × K, diagonal slices), R (R × R). Works on undirected graphs; used for role discovery.

Quick and effective diagnostics?

Social Network Tensor

[Papalexakis et al. Fusion 13] [Gilpin et al. Arxiv.org 16] [Kolda et al. ICDM 07]

SLIDE 42

The End!

  • Thank you! Questions??
  • How to reach me:

web: http://www.cs.ucr.edu/~epapalex/  e-mail: epapalex@cs.ucr.edu
