SLIDE 1

Discovery of Latent Factors in High-dimensional Data via Spectral Methods

Furong Huang University of Maryland

Workshop on Quantum Machine Learning

1 / 39

SLIDE 3

Machine Learning - Excitements

Success of Supervised Learning
- Image classification
- Speech recognition
- Text processing

Key to Success
- Deep composition of nonlinear units
- Enormous labeled data
- Computation power growth

2 / 39

SLIDE 5

Machine Learning - Modern Challenges

Automated discovery of features and categories?

Real AI requires Unsupervised Learning

Summarize key features in data (filter bank learning, feature extraction, embeddings, topics)
- State-of-the-art: humans are better than machines
- Goal: intelligent machines that summarize key features in data

Interpretable modeling and learning of the data
- Theoretically guaranteed learning
- Extracted features are interpretable

2 / 39

SLIDE 6

Unsupervised Learning with Big Data

Curse of Dimensionality

More information → more unknowns/variables → challenging model learning

3 / 39

SLIDE 8

Unsupervised Learning with Big Data

Information Extraction

High-dimensional observation vs low-dimensional representation: cell types, topics, communities

Finding Needle In the Haystack Is Challenging

3 / 39

SLIDE 9

Unsupervised Learning with Big Data

Information Extraction

High-dimensional observation vs low-dimensional representation: cell types, topics, communities

My Solution: A Unified Tensor Decomposition Framework

3 / 39

SLIDE 10

App 1: Automated Categorization of Documents

Topics: Education, Crime, Sports

Document modeling
- Observed: words in a document corpus (search logs, emails, etc.)
- Hidden: (mixed) topics (personal interests, professional area, etc.)

4 / 39

SLIDE 12

App 2: Community Extraction From Connectivity Graphs (Social Networks)

- Observed: network of social ties (friendships, transactions, etc.)
- Hidden: (mixed) groups/communities of social actors

5 / 39

SLIDE 16

Tensor Methods Compared with Variational Inference

Learning Topics from PubMed on Spark: 8 million docs
[Plot: perplexity vs running time (s), tensor vs variational]

Learning Communities from Graph Connectivity
Facebook: n ∼ 20k, Yelp: n ∼ 40k, DBLPsub: n ∼ 0.1m, DBLP: n ∼ 1m
[Plots: error per group and running times (s) for FB, YP, DBLPsub, DBLP]

Orders of Magnitude Faster & More Accurate

“Online Tensor Methods for Learning Latent Variable Models”, F. Huang, U. Niranjan, M. Hakeem, A. Anandkumar, JMLR 2014.
“Tensor Methods on Apache Spark”, F. Huang, A. Anandkumar, Oct. 2015.

6 / 39

SLIDE 17

App 3: Cataloging Neuronal Cell Types In the Brain

Neuroscience
- Observed: cellular-resolution brain slices
- Hidden: neuronal cell types

7 / 39

SLIDE 18

App 3: Cataloging Neuronal Cell Types In the Brain

Our method (spatial point process) vs average expression level [Grange '14]
[Plot: comparison of the two methods]

Recovered known cell types: 1. Interneurons, 2. S1 Pyramidal, 3. Astrocytes, 4. Ependymal, 5. Microglia, 6. Endothelial, 7. Mural, 8. Oligodendrocytes

“Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model”, F. Huang, A. Anandkumar, C. Borgs, J. Chayes, E. Fraenkel, M. Hawrylycz, E. Lein, A. Ingrosso, S. Turaga, NIPS 2015 BigNeuro workshop.

8 / 39

SLIDE 19

App 4: Word Sequence Embedding Extraction

Word Embedding: football, soccer, tree

Word Sequence Embedding:
- The weather is good.
- Her life spanned years of incredible change for women.
- Mary lived through an era of liberating reform for women.

“Convolutional Dictionary Learning through Tensor Factorization”, by F. Huang, A. Anandkumar, JMLR 2015.

9 / 39

SLIDE 20

App 5: Human Disease Hierarchy Discovery

CMS: 1.6 million patients, 168 million diagnostic events, 11k diseases.
- Observed: co-occurrence of diseases in patients
- Hidden: disease similarity/hierarchy

“Scalable Latent Tree Model and its Application to Health Analytics”, by F. Huang, U. N. Niranjan, I. Perros, R. Chen, J. Sun, A. Anandkumar, NIPS 2015 MLHC workshop.

10 / 39

SLIDE 21

All of these applications involve discovering the hidden, compact structure that is embedded in high-dimensional, complex observed data.

11 / 39

SLIDE 22

How to model hidden effects?

Basic Approach: mixtures/clusters
- Hidden variable h is categorical.

Advanced: Probabilistic models
- Hidden variable h has more general distributions; can model mixed memberships.
[Diagram: hidden variables h1, h2, h3 over observed x1, ..., x5]

This talk: basic mixture model and some advanced models (topic model)

12 / 39

SLIDE 23

Challenges in Learning

Basic goal in all of the mentioned applications: discover hidden structure in data, i.e., unsupervised learning.

[Diagram: words (life, gene, data, DNA, RNA) generated from topics k1, ..., k5 via a choice variable h]
Pipeline: unlabeled data → latent variable model → learning algorithm → inference

13 / 39

SLIDE 28

Challenges in Learning - find hidden structure in data

[Diagram: words (life, gene, data, DNA, RNA) generated from topics k1, ..., k5 via a choice variable h]
Pipeline: unlabeled data → latent variable model → tensor decomposition → inference

Challenge: Conditions for Identifiability
- Can the model be identified given infinite computation and data?
- Are there tractable algorithms under identifiability?

Challenge: Efficient Learning of Latent Variable Models
- MCMC: random sampling, slow (exponential mixing time)
- Likelihood: non-convex, not scalable (exponentially many critical points)
- Efficient computational and sample complexities?

Guaranteed and efficient learning through spectral methods

13 / 39

SLIDE 30

Unsupervised Learning via Probabilistic Models

[Diagram: words (life, gene, data, DNA, RNA) generated from topics k1, ..., k5 via a choice variable h]
Pipeline: unlabeled data → latent variable model → tensor decomposition → inference

Tensor decomposition → correct model

Contributions
- Guaranteed online algorithm with a global convergence guarantee
- Highly scalable, highly parallel, dimensionality reduction
- Tensor library on CPU/GPU/Spark
- Interdisciplinary applications
- Extension to models with group invariance

14 / 39

SLIDE 31

Outline

1. Introduction
2. Introduction of Method of Moments and Tensor Notations
3. LDA and Community Models: From Data Aggregates to Model Parameters; Guaranteed Online Algorithm
4. Quantum Algorithms for Leading Eigenvector Computation
5. Conclusion

15 / 39

SLIDE 32

Method-of-Moments At A Glance

1. Determine functions of the model parameters θ that are estimable from observable data:
   - moments E_θ[f(X)]
2. Form estimates of the moments using data (iid samples {x_i}, i = 1, ..., n):
   - empirical moments Ê[f(X)]
3. Solve the approximate equations for the parameters θ:
   - moment matching E_θ[f(X)] = Ê[f(X)] as n → ∞

Toy Example

How do we estimate a Gaussian, i.e., (µ, Σ), given iid samples {x_i} from N(µ, Σ²)?

16 / 39
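The toy example above can be sketched numerically. A minimal moment-matching estimate for a 1-D Gaussian (the true parameters and sample size here are illustrative assumptions):

```python
import numpy as np

# Moment matching for a 1-D Gaussian: E[X] = mu and E[X^2] = mu^2 + sigma^2,
# so solving the two empirical moment equations recovers (mu, sigma).
rng = np.random.default_rng(0)
mu_true, sigma_true = 2.0, 1.5
x = rng.normal(mu_true, sigma_true, size=100_000)  # iid samples

m1 = x.mean()            # empirical first moment
m2 = (x ** 2).mean()     # empirical second moment
mu_hat = m1
sigma_hat = np.sqrt(m2 - m1 ** 2)
```

With 100k samples, both estimates land within a few hundredths of the truth.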

SLIDE 33

What is a tensor?

Multi-dimensional Array
- A tensor is a higher-order matrix.
- The number of dimensions is called the tensor order.

17 / 39

SLIDE 34

Slices

Horizontal slices Lateral slices Frontal slices

18 / 39
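In array terms, the order is the number of dimensions, and the three kinds of slices come from fixing one index of an order-3 tensor. A minimal numpy sketch:

```python
import numpy as np

# An order-3 tensor is a 3-dimensional array; fixing one index yields
# the three kinds of slices named above.
T = np.arange(24).reshape(2, 3, 4)   # tensor order = T.ndim = 3

horizontal = T[0, :, :]   # fix the first index:  horizontal slice
lateral    = T[:, 0, :]   # fix the second index: lateral slice
frontal    = T[:, :, 0]   # fix the third index:  frontal slice
```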

SLIDE 35

Tensor Product

[a ⊗ b]_{i1,i2} = a_{i1} b_{i2}   (rank-1 matrix)
[a ⊗ b ⊗ c]_{i1,i2,i3} = a_{i1} b_{i2} c_{i3}   (rank-1 tensor)

19 / 39
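Both products are outer products of vectors; in numpy, `einsum` gives them directly (the example vectors are arbitrary):

```python
import numpy as np

# Rank-1 matrix and rank-1 tensor built from outer (tensor) products.
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
c = np.array([5.0, 6.0])

ab = np.einsum('i,j->ij', a, b)          # [a ⊗ b]_{i1,i2} = a_{i1} b_{i2}
abc = np.einsum('i,j,k->ijk', a, b, c)   # [a ⊗ b ⊗ c]_{i1,i2,i3} = a_{i1} b_{i2} c_{i3}
```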

SLIDE 36

Tensors in Method of Moments

Matrix: pair-wise relationship
- Signal or data observed: x ∈ R^d
- Rank-1 matrix: [x ⊗ x]_{i,j} = x_i x_j
- Aggregated pair-wise relationship: M2 = E[x ⊗ x]

Tensor: triple-wise relationship or higher
- Signal or data observed: x ∈ R^d
- Rank-1 tensor: [x ⊗ x ⊗ x]_{i,j,k} = x_i x_j x_k
- Aggregated triple-wise relationship: M3 = E[x ⊗ x ⊗ x] = E[x^⊗3]

20 / 39
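Empirically, M2 and M3 are just averages of the rank-1 outer-product terms over samples. A small sketch (the Gaussian data is an assumption):

```python
import numpy as np

# Empirical second and third moments M2 = E[x ⊗ x], M3 = E[x ⊗ x ⊗ x],
# estimated by averaging rank-1 (outer-product) terms over n samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))                      # n = 1000 samples in R^5

M2 = np.einsum('ni,nj->ij', X, X) / len(X)          # d x d matrix
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / len(X)   # d x d x d tensor
```

By construction both moments are symmetric under permuting their indices.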

SLIDE 37

CP decomposition

X = Σ_{h=1}^{R} a_h ⊗ b_h ⊗ c_h: a summation of rank-1 tensors

21 / 39
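A CP-form tensor can be assembled in one `einsum` over the factor matrices (random factors are an assumption for the sketch):

```python
import numpy as np

# Build a tensor with a known CP decomposition X = sum_h a_h ⊗ b_h ⊗ c_h.
rng = np.random.default_rng(2)
d, R = 4, 3
A, B, C = (rng.normal(size=(d, R)) for _ in range(3))

X = np.einsum('ir,jr,kr->ijk', A, B, C)   # sum of R rank-1 tensors
```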

SLIDE 42

Why are tensors powerful?

Matrix Orthogonal Decomposition
- Not unique without an eigenvalue gap:
  I = e1 e1ᵀ + e2 e2ᵀ = u1 u1ᵀ + u2 u2ᵀ, with u1 = [√2/2, -√2/2], u2 = [√2/2, √2/2]
- Unique with an eigenvalue gap

Tensor Orthogonal Decomposition (Harshman, 1970)
- Unique: eigenvalue gap not needed
- A slice of the tensor has an eigenvalue gap

22 / 39
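The non-uniqueness for matrices, and how the third-order tensor resolves it, can be checked directly with the slide's e/u bases:

```python
import numpy as np

# The identity matrix has many orthogonal rank-1 decompositions (no eigenvalue
# gap), but the corresponding orthogonal *tensor* sums differ:
# e1^{⊗3} + e2^{⊗3} is not equal to u1^{⊗3} + u2^{⊗3}.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
u1 = np.array([np.sqrt(2) / 2, -np.sqrt(2) / 2])
u2 = np.array([np.sqrt(2) / 2,  np.sqrt(2) / 2])

M_e = np.outer(e1, e1) + np.outer(e2, e2)
M_u = np.outer(u1, u1) + np.outer(u2, u2)

cube = lambda v: np.einsum('i,j,k->ijk', v, v, v)
T_e = cube(e1) + cube(e2)
T_u = cube(u1) + cube(u2)
```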

SLIDE 43

Outline

1. Introduction
2. Introduction of Method of Moments and Tensor Notations
3. LDA and Community Models: From Data Aggregates to Model Parameters; Guaranteed Online Algorithm
4. Quantum Algorithms for Leading Eigenvector Computation
5. Conclusion

23 / 39

SLIDE 47

Probabilistic Topic Models - LDA

Bag of words
- Infer the topics of documents
- Learn the hidden process that drives the observations

Generative model
- Topic proportion ∼ Dir(α) for a document
- Draw a topic, then a word, for each token
[Diagram: topics (Crime, Sports, Education) and per-document topic proportions over the words campus, police, witness]

Goal
Topic-word matrix P[word = e_i | topic = j]

25 / 39
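A minimal sketch of the generative process above; the 3-word vocabulary, the topic-word matrix, and α below are illustrative assumptions, not the slide's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["campus", "police", "witness"]
# hypothetical topic-word matrix: column j is P[word | topic = j]
A = np.array([[0.1, 0.2, 0.8],
              [0.5, 0.3, 0.1],
              [0.4, 0.5, 0.1]])
alpha = np.ones(3)                       # Dirichlet prior over topics

def sample_doc(n_tokens=50):
    theta = rng.dirichlet(alpha)         # topic proportion ~ Dir(alpha)
    words = []
    for _ in range(n_tokens):
        t = rng.choice(3, p=theta)       # draw a topic for this token
        words.append(vocab[rng.choice(3, p=A[:, t])])  # then a word
    return words

doc = sample_doc()
```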

SLIDE 50

Moments Matching

Goal: linearly independent topic-word table
E[word | topic = j] = Σ_i P[word = e_i | topic = j] e_i = column j (over the words campus, police, witness)

M1: Occurrence Frequency of Words
E[word] = Σ_j E[word | topic = j] P[topic = j]
[Diagram: the word-frequency vector as a weighted sum of the Crime, Sports, and Education topic columns]

No unique decomposition of vectors

26 / 39

SLIDE 52

Moments Matching

Goal: linearly independent topic-word table
E[word | topic = j] = Σ_i P[word = e_i | topic = j] e_i = column j

M2: Modified Co-occurrence Frequency of Word Pairs
E[word1 ⊗ word2] = Σ_{j,j′} E[word1 | topic1 = j] ⊗ E[word2 | topic2 = j′] P[topic1 = j, topic2 = j′]
[Diagram: the word-pair co-occurrence matrix as a weighted sum of rank-1 topic terms]

Matrix decomposition recovers a subspace, not the actual model

26 / 39

SLIDE 53

Moments Matching

Goal: linearly independent topic-word table

Find a W such that applying W to M2 (the modified word-pair co-occurrence matrix) whitens it:
E[word1 ⊗ word2] = Σ_{j,j′} E[word1 | topic1 = j] ⊗ E[word2 | topic2 = j′] P[topic1 = j, topic2 = j′]
[Diagram: W applied to the rank-1 topic terms of M2]

Many such W's; find one, and project the data with W

26 / 39

26 / 39
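A sketch of finding one such W: for a low-rank second moment, the top-k eigendecomposition yields a whitening matrix with Wᵀ M2 W = I. The synthetic rank-k M2 below is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2
A = rng.normal(size=(d, k))              # hypothetical topic columns
w = np.array([0.6, 0.4])                 # hypothetical topic probabilities
M2 = A @ np.diag(w) @ A.T                # low-rank second moment

vals, vecs = np.linalg.eigh(M2)          # eigenvalues in ascending order
U, S = vecs[:, -k:], vals[-k:]           # top-k eigenpairs
W = U / np.sqrt(S)                       # one valid whitening matrix
```

Any rotation of W whitens M2 equally well, which is exactly why there are "many such W's".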

SLIDE 54

Moments Matching

Goal: linearly independent topic-word table

Know a W such that:

M3: Modified Co-occurrence Frequency of Word Triplets
[Diagram: W applied along each mode of M3, which then has a unique orthogonal tensor decomposition]

Unique orthogonal tensor decomposition; project the result back with W†

26 / 39

26 / 39

SLIDE 55

Moments Matching

Goal: linearly independent topic-word table

Know a W such that:

M3: Modified Co-occurrence Frequency of Word Triplets
[Diagram: W applied along each mode of M3]

Tensor decomposition uniquely discovers the correct model

Learning Topic Models through Matrix/Tensor Decomposition

26 / 39

SLIDE 56

Mixed Membership Community Models

Mixed memberships

27 / 39

SLIDE 57

Mixed Membership Community Models

Mixed memberships. What ensures guaranteed learning?
[Diagram: the network adjacency matrix decomposed as a sum of rank-1 community terms]

27 / 39

SLIDE 59

Outline

1. Introduction
2. Introduction of Method of Moments and Tensor Notations
3. LDA and Community Models: From Data Aggregates to Model Parameters; Guaranteed Online Algorithm
4. Quantum Algorithms for Leading Eigenvector Computation
5. Conclusion

28 / 39

SLIDE 66

Guaranteed Online Tensor Decomposition

Model is uniquely identifiable! How to identify?

Online Tensor Decomposition
Tensor T = Σ_i a_i ⊗ a_i ⊗ a_i ⊗ a_i, where ‖a_i‖ = 1 and a_iᵀ a_j = 0

Objective
min over {u_i : ‖u_i‖₂ = 1} of Σ_{i ≠ j} T(u_i, u_i, u_j, u_j). Non-convex!
Theorem: The proposed objective function has equivalent local optima.

Will SGD work? (Saddle points!)
Theorem: For a smooth, twice-differentiable function with non-degenerate saddle points, noisy SGD converges to a local optimum in polynomially many steps.

Global Convergence Guarantee For Online Tensor Decomposition

29 / 39
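The multilinear objective above can be evaluated concretely for an orthogonal 4th-order tensor; taking the standard basis as the assumed true components, the objective attains its minimum value 0 exactly at those components:

```python
import numpy as np

d = 3
A = np.eye(d)                                    # assumed components a_i = e_i (orthonormal)
T = np.einsum('ri,rj,rk,rl->ijkl', A, A, A, A)   # T = sum_i a_i ⊗ a_i ⊗ a_i ⊗ a_i

def T_form(u, v):
    """Multilinear form T(u, u, v, v)."""
    return np.einsum('ijkl,i,j,k,l->', T, u, u, v, v)

# objective sum_{i != j} T(u_i, u_i, u_j, u_j) evaluated at the true components
obj = sum(T_form(A[i], A[j]) for i in range(d) for j in range(d) if i != j)
```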

SLIDE 70

Why could we escape from saddle points?

Stochastic Gradient Descent with Noise
- A saddle point has zero gradient
- Non-degenerate saddle: the Hessian has both positive and negative eigenvalues
- A negative eigenvalue gives a direction of escape
- Noise could help!

“Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition”, by R. Ge, F. Huang, C. Jin, Y. Yuan, COLT 2015.

30 / 39
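The escape intuition can be seen on the toy saddle f(x, y) = x² − y² (step size, noise scale, and iteration count below are assumptions): plain gradient descent started exactly at the saddle never moves, while noisy gradient steps slide off along the negative-curvature direction.

```python
import numpy as np

rng = np.random.default_rng(0)
grad = lambda p: np.array([2 * p[0], -2 * p[1]])   # gradient of x^2 - y^2

p = np.zeros(2)                      # start exactly at the saddle (zero gradient)
for _ in range(100):
    noise = rng.normal(scale=0.01, size=2)
    p = p - 0.1 * (grad(p) + noise)  # noisy gradient step
```

The y-coordinate (negative curvature, the escape direction) is amplified each step, while the x-coordinate (positive curvature) keeps contracting toward 0.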

SLIDE 71

Outline

1. Introduction
2. Introduction of Method of Moments and Tensor Notations
3. LDA and Community Models: From Data Aggregates to Model Parameters; Guaranteed Online Algorithm
4. Quantum Algorithms for Leading Eigenvector Computation
5. Conclusion

31 / 39

SLIDE 72

First PCA

PCA problem
Sample S = {x_i}, i = 1, ..., m, where x_i ∈ R^d.
Q: identify the direction of the largest variance in the data.

Problem Formulation
Solve max over u ∈ R^d, ‖u‖₂ = 1 of uᵀAu, where the covariance matrix A = (1/m) Σ_{i=1}^m x_i x_iᵀ.

Problem Regime
Assume 0 ⪯ A ⪯ I and A is s-sparse (i.e., nnz of each row or column ≤ s).

32 / 39
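A minimal sketch of this formulation solved by the power method (the synthetic data, with one coordinate's variance inflated, is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 0] *= 5.0                   # make coordinate 0 the top-variance direction
A = X.T @ X / len(X)             # covariance A = (1/m) sum_i x_i x_i^T

v = rng.normal(size=10)          # random start
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)       # iterate: v -> A^k v0 / ||A^k v0||
```

The iterate converges to the leading eigenvector, here essentially the first coordinate axis.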

SLIDE 73

Classical Algorithm

Spectral Gap
Spectral gap ∆ = λ1 − λ2
- ordered eigenvalues 1 ≥ λ1 ≥ ... ≥ λd ≥ 0
- corresponding eigenvectors u1, ..., ud

Methods under Warm Start
Warm start: initialization v0 such that |⟨v0, u1⟩| > φ > 0.
Iterative methods achieve ε precision: ⟨vk, u1⟩ ≥ 1 − ε.
- Power method: A^k v0 / ‖A^k v0‖ takes O((sd/∆) log(1/(φε)))
- Lanczos method or accelerated power method takes O((sd/√∆) log(1/(φε)))
  ⋆ replacing the monomial A^k by its Chebyshev polynomial approximation

Question: speedup from O(d) to poly(log d)?

33 / 39

SLIDE 74

Quantum Speedup

Motivation: quantum effects can achieve significant speedups.

Examples
- Shor's algorithm: exponential speedup for factoring integers
- Grover's algorithm: quadratic speedup for searching an unstructured database
- (Harrow, Hassidim, Lloyd '09) and (Childs, Kothari, Somma '17): Ω(d) → poly(log d) for solving d-dimensional linear equation systems, under a weaker output requirement: a quantum state whose vector representation is roughly the solution to the linear equation system.

34 / 39

SLIDE 75

Quantum Leading PCA

Input model: a quantum oracle which generates a quantum state whose vector representation is v0, plus oracle access to A.
Output model: a quantum state whose vector representation is vk.

Main Result
Under warm start |⟨v0, u1⟩| = φ > 0, there is a quantum algorithm which prepares a quantum state with vector representation vk such that ⟨vk, u1⟩ ≥ 1 − ε with probability at least 2/3, using
- O(s log(s/(φε)) / (φ√∆)) queries to the quantum oracles U_{A,s}, U_{A,e},
- O(1/φ) queries to U_{v0},
- O(s (log d · log(s/(φε)) + log^{3.5}(s/(φε))) / (φ√∆)) 2-qubit quantum gates in total.

Joint work with Tongyang Li and Xiaodi Wu.

35 / 39

SLIDE 76

Intuition for Speedup

Chebyshev polynomials can be significantly accelerated in quantum computation; the matrix power A^k b is the key.
- Quantum walk: effectively constructs a degree-m Chebyshev polynomial of A/s.
- Quantum primitive, the linear combination of unitaries (LCU): effectively combines these Chebyshev polynomials linearly to derive the desired approximation polynomial.

Quantum Computation for Linear Algebraic Problems

36 / 39
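Behind both the classical √∆ acceleration and the quantum-walk construction is a standard fact: a Chebyshev polynomial of degree roughly √k approximates the monomial x^k uniformly on [-1, 1]. A small numerical sketch (the degrees and tolerance are assumptions):

```python
import numpy as np

k, deg = 50, 30                          # approximate x^50 by a degree-30 polynomial
xs = np.linspace(-1.0, 1.0, 1001)

# Chebyshev interpolation of x^k at the Chebyshev points of degree `deg`
coef = np.polynomial.chebyshev.chebinterpolate(lambda x: x ** k, deg)
approx = np.polynomial.chebyshev.chebval(xs, coef)

err = np.max(np.abs(approx - xs ** k))   # uniform error on [-1, 1]
```

A degree far below k already drives the uniform error to a small value, which is what lets A^k be replaced by a much shorter Chebyshev recurrence.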

SLIDE 77

Outline

1. Introduction
2. Introduction of Method of Moments and Tensor Notations
3. LDA and Community Models: From Data Aggregates to Model Parameters; Guaranteed Online Algorithm
4. Quantum Algorithms for Leading Eigenvector Computation
5. Conclusion

37 / 39

SLIDE 79

Summary

Spectral methods reveal hidden structure
- Text/image processing
- Social networks
- Neuroscience, healthcare, ...

Versatile for latent variable models
- Flat model → hierarchical model
- Sparse coding → convolutional model
- Efficient, with convergence guarantees
[Diagrams: moment-tensor decompositions for flat, hierarchical, and convolutional models; escaping saddle points]

38 / 39

SLIDE 80

Thank You

furongh@cs.umd.edu

39 / 39