SLIDE 1 Three right directions and three wrong directions for tensor research
Michael W. Mahoney
Stanford University (For more info, see: http://cs.stanford.edu/people/mmahoney/ or Google on “Michael Mahoney”)
SLIDE 2 Lots and lots of large data!
- High energy physics experimental data
- Hyperspectral medical and astronomical image data
- DNA microarray data and DNA SNP data
- Medical literature analysis data
- Collaboration and citation networks
- Internet networks and web graph data
- Advertiser-bidded phrase data
- Static and dynamic social network data
SLIDE 3 “Scientific” and “Internet” data
SNPs (columns) x individuals (rows):
… AG AG AG AG AA CC GG AG CG AC CC AA CC AA GG TT AG CT CG CG CG AT CT CT AG CT …
… AA AG AG AG AA CC AG GG CC AC CC AA CG AA GG TT AG CT CG CG CG AT CT CT AG CT …
… AA GG GG GG AA CT GG AA CC AC CG AA CC AA GG TT GG CC CG CG CG AT CT CT AG CT …
… AG AG AG AG AA CT GG AG CC CC CG AA CC AA GT TT AG CT CG CG CG AT CT CT AG CT …
… AA AG AG AG AA CC AG AG CG AA CC AA CG AA GG TT AA TT GG GG GG TT TT CC GG TT …
SLIDE 4 Algorithmic vs. Statistical Perspectives
Computer Scientists
- Data: are a record of everything that happened.
- Goal: process the data to find interesting patterns and associations.
- Methodology: Develop approximation algorithms under different models of data access, since the goal is typically computationally hard.
Statisticians
- Data: are a particular random instantiation of an underlying process
describing unobserved patterns in the world.
- Goal: is to extract information about the world from noisy data.
- Methodology: Make inferences (perhaps about unseen events) by
positing a model that describes the random variability of the data around the deterministic model.
Lambert (2000)
SLIDE 5 Matrices and Data
Matrices provide simple representations of data:
- Aij = 0 or 1 (perhaps then weighted), depending on whether word i appears in document j
- Aij = -1, 0, +1, if homozygous for the major allele, heterozygous, or homozygous for the minor allele
Can take advantage of “nice” properties of vector spaces:
- structural properties: SVD, Euclidean geometry
- algorithmic properties: “everything” is O(n^3)
- statistical properties: PCA, regularization, etc.
[The SNP genotype matrix from Slide 3: individuals (rows) x SNPs (columns).]
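As a minimal sketch of the two bullets above (the genotype rows and the per-SNP major/minor allele choices here are hypothetical, chosen only for illustration): encode each genotype as +1/0/-1 as described, then apply the SVD, the workhorse behind PCA-style analyses of such matrices.

```python
import numpy as np

# Hypothetical mini-dataset: 4 individuals x 5 SNPs, genotypes as strings.
genotypes = [
    ["AG", "AA", "CC", "GG", "AG"],
    ["AA", "AG", "CC", "AG", "GG"],
    ["GG", "AA", "CT", "GG", "AA"],
    ["AG", "AA", "CT", "GG", "AG"],
]

# Assumed (major, minor) allele per SNP column -- illustrative only.
alleles = [("A", "G"), ("A", "G"), ("C", "T"), ("G", "A"), ("G", "A")]

def encode(g, major, minor):
    """+1 if homozygous major, -1 if homozygous minor, 0 if heterozygous."""
    if g == major + major:
        return 1
    if g == minor + minor:
        return -1
    return 0

A = np.array([[encode(g, *alleles[j]) for j, g in enumerate(row)]
              for row in genotypes], dtype=float)

# The "nice" vector-space structure: everything reduces to the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(A.shape, s.round(3))
```

The point is only that once the data sit in R^(m x n), the full SVD/PCA toolbox applies at O(n^3) cost.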
SLIDE 6 Graphs and Data
Common variations include:
- Directed graphs
- Weighted graphs
- Bipartite graphs
Interaction graph model of networks:
- Nodes represent entities
- Edges represent interaction
between pairs of entities
SLIDE 7 Why model data as graphs and matrices?
Graphs and matrices:
- provide natural mathematical structures with algorithmic,
statistical, and geometric benefits
- provide a nice tradeoff between a rich descriptive framework
and sufficient algorithmic structure
- provide regularization due to geometry, either explicitly due
to R^n or implicitly due to approximation algorithms
SLIDE 8 What if graphs/matrices don’t work?
Employ more general mathematical structures:
- Hypergraphs
- Attributes associated with nodes
- “Kernelize” the data using, e.g., a similarity notion
- Generalized linear or hierarchical models
- Tensors!!
These structures provide greater descriptive flexibility, which typically comes at a (moderate or severe) computational cost.
SLIDE 9 What is a tensor? (1 of 3)
See L.-H. Lim’s tutorial on tensors at MMDS 2006.
SLIDE 10
What is a tensor? (2 of 3)
SLIDE 11
What is a tensor? (3 of 3)
IMPORTANT: This is similar to NLA, but there is no reason to expect the “subscript manipulation” methods, so useful in NLA, to yield anything meaningful for more general algebraic structures.
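To make the point above concrete (a sketch of mine, not from the slides; the array and vectors are random placeholders): a 3-mode tensor is naturally a multilinear map, while “subscript manipulation” such as unfolding flattens it to an ordinary matrix and discards that structure.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))  # a 3-mode tensor in R^{2x3x4}
u = rng.standard_normal(2)
v = rng.standard_normal(3)
w = rng.standard_normal(4)

# As a multilinear map: A(u, v, w) = sum_{ijk} A_ijk u_i v_j w_k
val = np.einsum("ijk,i,j,k->", A, u, v, w)

# Multilinearity: scaling one argument scales the output.
assert np.isclose(np.einsum("ijk,i,j,k->", A, 2 * u, v, w), 2 * val)

# Contrast: unfolding along mode 1 gives a plain 2x12 matrix; the
# three-way index structure is gone.
A1 = A.reshape(2, -1)
print(A1.shape)
```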
SLIDE 12
Tensor ranks and data analysis (1 of 3)
SLIDE 13
Tensor ranks and data analysis (2 of 3)
IMPORTANT: These ill-posedness results are NOT pathological: they are ubiquitous and essential properties of tensors.
SLIDE 14 Tensor ranks and data analysis (3 of 3)
THAT IS: To get a “simple” or “low-rank” tensor approximation, we focus on exceptions to the fundamental ill-posedness properties of tensors (i.e., rank-1 tensors and 2-mode tensors).
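The ill-posedness in question can be seen numerically. A minimal numpy sketch (the vectors a, b are illustrative choices of mine): the rank-3 “W tensor” is a limit of rank-2 tensors, so a best rank-2 approximation to it does not exist.

```python
import numpy as np

def outer3(x, y, z):
    """Rank-1 three-way tensor x (x) y (x) z."""
    return np.einsum("i,j,k->ijk", x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Target tensor of rank 3: a(x)a(x)b + a(x)b(x)a + b(x)a(x)a.
T = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

# A sequence of tensors of rank <= 2 converging to T:
#   T_n = n * (a + b/n)^{(x)3}  -  n * a^{(x)3}
errs = []
for n in [1, 10, 100, 1000]:
    c = a + b / n
    T_n = n * outer3(c, c, c) - n * outer3(a, a, a)
    errs.append(np.linalg.norm(T - T_n))

print(errs)  # the approximation error shrinks like 1/n
```

Since rank-2 tensors get arbitrarily close to a rank-3 tensor, the infimum of the rank-2 approximation error is 0 but is never attained: the low-rank approximation problem is ill-posed.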
SLIDE 15 Historical Perspective on NLA
- NLA grew out of statistics (among other areas) (40s and 50s)
- NLA focuses on numerical issues (60s, 70s, and 80s)
- Large-scale data generation increasingly common (90s and 00s)
- NLA has suffered due to the success of PageRank and HITS.
- Large-scale scientific and Internet data problems invite us to
take a broader perspective on traditional NLA:
- revisit the algorithmic basis of common NLA matrix algorithms
- revisit the statistical underpinnings of NLA
- expand the traditional NLA view of tensors
SLIDE 16 The gap between NLA and TCS
Matrix factorizations:
- in NLA and scientific computing - used to express a problem s.t. it can
be solved more easily.
- in TCS and statistical data analysis - used to represent structure that
may be present in a matrix obtained from object-feature observations.
MMDS06, MMDS08, … were designed to “bridge the gap” between NLA, TCS, and data applications.
NLA:
- emphasis on optimal conditioning
- backward error analysis issues
- is running time a large or small constant multiplied by n^2 or n^3?
TCS:
- motivated by large data applications
- space-constrained or pass-efficient models
- over-sampling and randomness as computational resources
SLIDE 17 How to “bridge the gap” (Lessons from MMDS)
- In a vector space, “everything is easy,” while multi-linear structure
captures the inherent intractability of NP-hard problems.
- Convexity is an appropriate generalization of linearity:
a nice algorithmic framework, as with kernels in machine learning.
- Randomness, over-sampling, approximation, …
are powerful algorithmic resources, but you need a clear objective that you are solving.
- The geometry of combinatorial objects (e.g., graphs, tensors, etc.)
has positive algorithmic, statistical, and conceptual benefits.
- Approximate computation induces implicit statistical regularization.
SLIDE 18 Examples of “tensor data”
- Chemistry: model fluorescence excitation-emission data in food science:
Aijk is samples x emission x excitation.
- Neuroscience: EEG data as patients, doses, conditions, etc. varied:
Aijk is time samples x frequency x electrodes.
- Social network and Web analysis: to discover hidden structures:
Aijk is webpages x webpages x anchor text.
Aijk is users x queries x webpages.
Aijk is advertisers x bidded-phrases x time.
- Computer Vision: image compression and face recognition:
Aijklm is pixels x illumination x expression x viewpoint x person.
- Quantum mechanics, large-scale computation, hyperspectral data, climate
data, ICA, nonnegative data, blind source separation, NP-hard problems, …
“Tensor-based data are particularly challenging due to their size and since many data analysis tools based on graph theory and linear algebra do not easily generalize.”
(Acar and Yener 2008)
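A sketch of what such “tensor data” looks like in practice (random numbers standing in for a hypothetical fluorescence-style cube; the `unfold` helper is mine): build a samples x emission x excitation array and compute its three mode-n unfoldings, the flattened matrices on which most existing matrix tools operate.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data cube: 5 samples x 4 emission x 3 excitation wavelengths.
A = rng.standard_normal((5, 4, 3))

def unfold(T, mode):
    """Mode-n unfolding: bring `mode` to the front, flatten the rest into columns."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

A0 = unfold(A, 0)  # 5 x 12: each row is one sample's flattened emission-excitation profile
A1 = unfold(A, 1)  # 4 x 15
A2 = unfold(A, 2)  # 3 x 20
print(A0.shape, A1.shape, A2.shape)
```

Each unfolding defines a different set of “features” for its mode, which is the sense of Right Direction 4 below: unfolding is a modeling choice, not a neutral operation.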
SLIDE 19
Three Right Directions
SLIDE 20
Three Right Directions
1. Understand statistical and algorithmic assumptions s.t. tensor methods work. (NOT just independence.)
SLIDE 21
Three Right Directions
1. Understand statistical and algorithmic assumptions s.t. tensor methods work. (NOT just independence.)
2. Understand the geometry of tensors. (NOT of vector spaces you unfold to.)
SLIDE 22 Three Right Directions
1. Understand statistical and algorithmic assumptions s.t. tensor methods work. (NOT just independence.)
2. Understand the geometry of tensors. (NOT of vector spaces you unfold to.)
3. Understand WHY tensors work in physical applications and what this says about less structured data applications (and vice-versa, which has been very fruitful for matrices*.)
*(E.g., low-rank off-diagonal blocks are common in matrices -- since the world is 3D, which is not true in less structured applications -- this has significant algorithmic implications.)
SLIDE 23 Four! Right Directions
1. Understand statistical and algorithmic assumptions s.t. tensor methods work. (NOT just independence.)
2. Understand the geometry of tensors. (NOT of vector spaces you unfold to.)
3. Understand WHY tensors work in physical applications and what this says about less structured data applications (and vice-versa, which has been very fruitful for matrices*.)
4. Understand “unfolding” as a process of defining features. (Since this puts you in a nice algorithmic place.)
*(E.g., low-rank off-diagonal blocks are common in matrices -- since the world is 3D, which is not true in less structured applications -- this has significant algorithmic implications.)
SLIDE 24
Three Wrong Directions
SLIDE 25
Three Wrong Directions
1. Viewing tensors as matrices with additional subscripts. (That may be true, but it hampers you, since R^n is so nice.)
SLIDE 26
Three Wrong Directions
1. Viewing tensors as matrices with additional subscripts. (That may be true, but it hampers you, since R^n is so nice.)
2. Using methods that damage geometry and enhance sparsity. (BTW, you will do this if you don’t understand the underlying geometric and sparsity structure.)
SLIDE 27
Three Wrong Directions
1. Viewing tensors as matrices with additional subscripts. (That may be true, but it hampers you, since R^n is so nice.)
2. Using methods that damage geometry and enhance sparsity. (BTW, you will do this if you don’t understand the underlying geometric and sparsity structure.)
3. Doing “Applied Ramsey Theory”:
Theorem: Given a large enough universe of data, then for any algorithm there exists a data set s.t. it performs well.
(Show me where your method fails AND where it succeeds! Otherwise, is your result about your data or your method? Of course, this applies more generally in data analysis.)
SLIDE 28 Conclusions
- Large-scale data applications have been the main
driver for a lot of the interest in tensors.
- Tensors are tricky to deal with, both
algorithmically and statistically.
- Let’s use this meeting to refine my directions in
light of motivating data applications.