(Some) Challenges in Tensor Mining



SLIDE 1

(Some) Challenges in Tensor Mining

Evrim Acar

Sandia National Labs., Livermore, CA

SLIDE 2

Tensor Mining

[Figure: a tensor X (dense or sparse) written as a sum of components plus a residual E.]

  • Unsupervised: Parafac, Tucker
  • Supervised: relate Xtrain to the labels y, then apply the model to Xtest

SLIDE 3

App I: Social Networks Analysis

  • In social networks, we are interested in modeling relationships (links) evolving over time.

  • Example:

– DBLP dataset: Authors x Conferences x Years (10K x 2K x 14; ~0.1% dense), with years 1991 to 2004

Q1: Can we use tensor decompositions to model the data and extract meaningful underlying factors?

Q2: Can we predict who is going to publish at which conferences in the future? (Link prediction in time)

Joint work with T. G. Kolda and D. M. Dunlavy

SIAM CS&E March 2-6, 2009

Tensor entry (i, j, k): # of papers by the ith author at the jth conf. in year k
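A minimal sketch of how such a count tensor could be assembled in sparse form. The record format (one `(author, conference, year)` triple per paper and author) and the function names `build_count_tensor` and `density` are assumptions made for illustration; they are not taken from the slides or from the Tensor toolbox.

```python
from collections import Counter

def build_count_tensor(records):
    """Build a sparse Authors x Conferences x Years count tensor.

    records: iterable of (author, conference, year) triples (assumed format).
    Returns the three index maps and a dict {(i, j, k): count} that stores
    only the nonzero entries.
    """
    records = list(records)
    # Years are indexed in chronological order so the third mode is a time axis.
    years = {y: k for k, y in enumerate(sorted({r[2] for r in records}))}
    authors, confs = {}, {}
    counts = Counter()
    for author, conf, year in records:
        i = authors.setdefault(author, len(authors))  # new author gets the next index
        j = confs.setdefault(conf, len(confs))        # new conference gets the next index
        counts[(i, j, years[year])] += 1
    return authors, confs, years, dict(counts)

def density(counts, shape):
    """Fraction of tensor entries that are nonzero."""
    return len(counts) / (shape[0] * shape[1] * shape[2])
```

Storing only the nonzeros matters here because, at roughly 0.1% density, a dense 10K x 2K x 14 array would be almost entirely zeros.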

SLIDE 4

Modeling DBLP using PARAFAC

X (authors x conferences x years) ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + … + aR ∘ bR ∘ cR

  • Solve using a gradient-based optimization approach
  • Initialization: first two modes using SVD, last mode: random
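The gradient-based PARAFAC fit and the initialization named on this slide can be sketched as follows. This is an illustrative dense NumPy version, not the Poblano-based implementation referenced at the end of the deck. The least-squares objective, the backtracking step rule, and all function names are assumptions.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Sum of R rank-one terms a_r o b_r o c_r."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_loss_and_grads(X, A, B, C):
    """f = 0.5 * ||X - [[A, B, C]]||^2 and its gradients w.r.t. A, B and C."""
    E = cp_reconstruct(A, B, C) - X
    loss = 0.5 * np.sum(E ** 2)
    gA = np.einsum("ijk,jr,kr->ir", E, B, C)
    gB = np.einsum("ijk,ir,kr->jr", E, A, C)
    gC = np.einsum("ijk,ir,jr->kr", E, A, B)
    return loss, gA, gB, gC

def init_factors(X, R, seed=0):
    """First two modes: leading left singular vectors of the unfoldings.
    Last mode: random. Requires R <= min(I, J)."""
    I, J, K = X.shape
    A = np.linalg.svd(X.reshape(I, -1), full_matrices=False)[0][:, :R]
    B = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(J, -1), full_matrices=False)[0][:, :R]
    C = np.random.default_rng(seed).random((K, R))
    return A, B, C

def fit_cp(X, R, steps=200):
    """Gradient descent on all factors at once, with a simple backtracking step."""
    A, B, C = init_factors(X, R)
    loss, gA, gB, gC = cp_loss_and_grads(X, A, B, C)
    for _ in range(steps):
        step = 1.0
        while step > 1e-12:
            trial = (A - step * gA, B - step * gB, C - step * gC)
            new = cp_loss_and_grads(X, *trial)
            if new[0] < loss:      # accept the first step that decreases the loss
                break
            step *= 0.5
        else:
            break                  # no descent step found; stop early
        A, B, C = trial
        loss, gA, gB, gC = new
    return A, B, C
```

The sketch updates all three factor matrices in one step, in contrast to alternating schemes that update one factor at a time. A practical solver would use a stronger optimizer than plain gradient descent and would exploit sparsity instead of forming the dense reconstruction.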

SLIDE 5

Components make sense!

X (authors x conferences x years) ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + … + aR ∘ bR ∘ cR

[Figure: two example components, each plotted as coefficients over the time mode (years 1992 to 2004), the conference mode and the author mode.]

  • First example: conferences BILDMED, CARS, DAGM; authors Hans Peter Meinzer, Heinrich Niemann, Thomas Martin Lehmann
  • Second example: conference IJCAI; authors Craig Boutilier, Daphne Koller

SLIDE 6

What if data is a Sparse tensor with Missing entries?

  • Sparse Data:
  • Missing Data [Kiers, 1997; Tomasi & Bro, 2005]:
  • Sparse & Missing:

Success with 70% randomly missing data [Tomasi & Bro, 2005]
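One common way to fit a PARAFAC model when entries are missing is to count only the observed entries in the objective. The sketch below assumes that formulation, with a binary indicator tensor `W` (1 = observed, 0 = missing). The slide does not state which formulation was used, and the function name is hypothetical.

```python
import numpy as np

def masked_cp_loss_and_grads(X, W, A, B, C):
    """f_W = 0.5 * ||W * (X - [[A, B, C]])||^2 for a binary mask W.

    Missing entries of X may hold any value, including NaN; they are ignored.
    """
    model = np.einsum("ir,jr,kr->ijk", A, B, C)
    X_obs = np.where(W > 0, X, 0.0)          # neutralize missing values before subtracting
    E = W * (model - X_obs)                  # residual on observed entries only
    loss = 0.5 * np.sum(E ** 2)
    # Gradients match the unmasked case with the masked residual (valid since W is 0/1).
    gA = np.einsum("ijk,jr,kr->ir", E, B, C)
    gB = np.einsum("ijk,ir,kr->jr", E, A, C)
    gC = np.einsum("ijk,ir,jr->kr", E, A, B)
    return loss, gA, gB, gC
```

For a sparse tensor the mask also separates "known to be zero" from "not observed", which is the distinction raised by the "Sparse & Missing" bullet.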

SLIDE 7

App II: Understanding Epileptic Seizures

[Figure: multi-channel EEG recording, time x channels]

Joint work with R. Bro, B. Yener, C. A. Bingol and H. Bingol

SLIDE 8

Epilepsy Tensors

  • Data rearranged as a three-way array using continuous wavelet transform (CWT):

– EEG matrix (time samples x channels): xij is the electrical potential at the ith sample of the jth channel.

– Epilepsy Tensor (time samples x scales (freq.) x channels): xijk is the power of a wavelet coefficient at the ith sample, jth scale and kth channel.

  • Let cijk be the wavelet coefficient at time sample i at scale j for the kth channel.
  • An Epilepsy Tensor is a three-way array, X, where each entry xijk is computed as the power of the corresponding wavelet coefficient: xijk = |cijk|^2
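A rough sketch of this construction, reading "power of a wavelet coefficient" as the squared magnitude. The Ricker (Mexican-hat) wavelet, the kernel length of ten samples per unit of scale, and the names `ricker` and `epilepsy_tensor` are assumptions chosen for illustration; the slide does not say which wavelet or scales were used.

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican-hat) wavelet of width `a`, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def epilepsy_tensor(eeg, scales):
    """eeg: (time samples x channels) array, scales: widths >= 1.

    Returns a (time samples x scales x channels) tensor of wavelet power.
    """
    n_samples, n_channels = eeg.shape
    X = np.empty((n_samples, len(scales), n_channels))
    for j, a in enumerate(scales):
        # Keep the kernel no longer than the signal so mode="same" returns n_samples values.
        w = ricker(min(int(10 * a), n_samples), a)
        for k in range(n_channels):
            c = np.convolve(eeg[:, k], w, mode="same")  # coefficients c_ijk for this scale/channel
            X[:, j, k] = np.abs(c) ** 2                 # power of each coefficient
    return X
```

Calling `epilepsy_tensor(eeg, scales=[1, 2, 4, 8])` on a 2000 x 17 recording would give a 2000 x 4 x 17 array.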
SLIDE 9

Epilepsy Focus Localization

X (time samples x scales x channels) ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2

Acar et al’07, De Vos et al’07

[Figure: for each of the two components, the signature in the time domain (coefficients vs. time samples), the signature in the freq. domain (coefficients vs. scales) and the signature in the electrodes domain (electrodes Fp1, F3, F7, C3, T3, T5, O1, Fp2, F4, F8, C4, T4, T6, P4, O2, Fz, Pz).]

ALS

SLIDE 10

How many components?

[Figure: panels of extracted components, each shown as coefficients over time samples and over scales.]

SLIDE 11

How to initialize?

[Figure: components (coefficients over time samples and over scales) obtained with HOSVD initialization vs. RANDOM initialization.]
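The two initializations compared on this slide can be sketched as below. This assumes the usual HOSVD-style initialization, in which each factor matrix is set to the leading left singular vectors of the corresponding mode unfolding. The function names are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd_init(X, rank):
    """One factor per mode: the `rank` leading left singular vectors of its unfolding.

    Requires rank <= the size of every mode.
    """
    factors = []
    for mode in range(X.ndim):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])
    return factors

def random_init(X, rank, seed=0):
    """One random Gaussian factor matrix per mode."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((n, rank)) for n in X.shape]
```

HOSVD initialization is deterministic for a given tensor, while random initialization is usually repeated over several seeds so that the resulting solutions can be compared.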

SLIDE 12

Understanding Epileptic Seizures

[Figure: multi-channel EEG recording, time x channels]

SLIDE 13

Epilepsy Feature Tensor

  • Construction of an Epilepsy Feature Tensor from multi-channel EEG

– EEG matrix (channels x time samples): xij is the electrical potential at the ith channel and jth time sample.

– A signal segment s is mapped to the feature vector [f1(s), f2(s), …, fn(s)].

– Epilepsy Feature Tensor (time epochs x features x channels): xijk is the value of the jth feature at the ith epoch recorded at the kth channel.
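The construction can be sketched as follows, assuming non-overlapping epochs of equal length. The three features used here (mean, variance, line length) are placeholders, since the slide does not list the actual feature set, and `feature_tensor` is a hypothetical name.

```python
import numpy as np

# Placeholder feature functions f_1..f_n; the real feature set is not given on the slide.
FEATURES = [
    lambda s: float(np.mean(s)),
    lambda s: float(np.var(s)),
    lambda s: float(np.sum(np.abs(np.diff(s)))),  # "line length" of the segment
]

def feature_tensor(eeg, epoch_len, features=FEATURES):
    """eeg: (channels x time samples) array.

    Returns a (time epochs x features x channels) tensor whose entry (i, j, k)
    is feature j of epoch i on channel k.
    """
    n_channels, n_samples = eeg.shape
    n_epochs = n_samples // epoch_len              # an incomplete trailing epoch is dropped
    X = np.empty((n_epochs, len(features), n_channels))
    for i in range(n_epochs):
        segment = eeg[:, i * epoch_len:(i + 1) * epoch_len]
        for j, f in enumerate(features):
            for k in range(n_channels):
                X[i, j, k] = f(segment[k])
    return X
```

Because features can live on very different scales, some per-feature scaling would normally be applied before any model is fitted.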

SLIDE 14

Seizure Recognition

Training Set

  • Build a model using the training set X and the labels y.

[Figure: time epochs from the recordings Pre1, Seizure1, Post1, Pre2, Seizure2, Post2, Pre3, Seizure3, Post3, each labeled seizure or non-seizure; the labels of the test epochs are unknown (?).]

Test Set

  • Predict the labels of new recordings.

SLIDE 15

Multiway Classification(?)

  • Potential Approaches

– Modify multiway regression models, e.g., multilinear PLS [Bro, 1996; Bro et al., 2001], as classifiers.

– Unfold the data and apply two-way classification, e.g., SVM.

[Figure: Xtrain (I x J x K: time epochs x features x channels) and ytrain go into Multilinear PLS, which yields Ttrain (I x R), WJ (J x R) and WK (K x R); Xtest is mapped to Ttest, and Linear Discriminant Analysis predicts ytest. For the unfolding approach, each time epoch xi becomes one row of a time epochs x (features - channels) matrix.]
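The second approach, unfolding followed by a two-way classifier, can be sketched as below. To keep the example self-contained, a nearest-centroid rule stands in for the SVM named on the slide. That substitution and all function names are assumptions for illustration only.

```python
import numpy as np

def unfold_epochs(X):
    """(I x J x K) tensor -> (I x J*K) matrix with one row per time epoch."""
    return X.reshape(X.shape[0], -1)

def fit_centroids(M, y):
    """Mean feature vector of each class in the unfolded training matrix."""
    labels = np.unique(y)
    centroids = np.stack([M[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(M, labels, centroids):
    """Assign each row to the class with the nearest centroid (squared Euclidean distance)."""
    d = ((M[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(d, axis=1)]
```

Unfolding discards the distinction between the feature and channel modes, which is the motivation for the multiway alternative in the first bullet.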

SLIDE 16

Some challenges are …

  • Handling Sparse Data with Missing Entries:

– We need models to capture the underlying sparse factors in sparse tensors with missing entries.

  • Determining the Rank:

– Important also in practice.

  • Initialization:

– Algorithms suffer from the local minima problem. In practice, different initializations may converge to different solutions, so we may end up interpreting our results differently.

  • Supervised learning on tensors:

– We need classification models for tensors as good as the state-of-the-art two-way classification approaches such as SVMs.

SLIDE 17

Thank you!

  • References:

– Social Networks Analysis: [Tensor toolbox & Poblano toolbox (by Sandia)]

  • Acar, Kolda and Dunlavy, An Optimization Approach for Fitting Canonical Tensor Decompositions, SAND2009-0857, Feb. 2009.

– Understanding Epileptic Seizures: [PLS toolbox (by Eigenvector Research)]

  • Acar, Bingol, Bingol, Bro and Yener, Multiway Analysis of Epilepsy Tensors, Bioinformatics, 23(13): i10-i18, 2007.

  • Acar, Bingol, Bingol, Bro and Yener, Seizure Recognition on Epilepsy Feature Tensor, Proc. 29th Int. Conf. IEEE Engineering in Medicine and Biology Society, 2007.

– Survey:

  • Acar and Yener, Unsupervised Multiway Data Analysis: A Literature Survey, IEEE Transactions on Knowledge and Data Engineering, 21(1): 6-20, 2009.

  • Contact:

Evrim Acar, Sandia National Laboratories, eacarat@sandia.gov