

SLIDE 1

MLA 2017

Latent Tree Analysis

Nevin L. Zhang
The Hong Kong University of Science and Technology
www.cse.ust.hk/~lzhang

SLIDE 2

What is Latent Tree Analysis (LTA)?

• Repeated event co-occurrences might
  • be due to common hidden causes or genuine direct correlations, OR
  • be coincidental, especially in big data.
• Challenge: identify the co-occurrences that are due to hidden causes or correlations.
• Latent tree analysis solves a related and simpler problem:
  • detect co-occurrences that can be statistically explained by a tree of latent variables.
• It can be used to solve interesting tasks:
  • Multidimensional clustering
  • Hierarchical topic detection
  • Latent structure discovery

SLIDE 3

Basic Latent Tree Models (LTM)

• Tree-structured Bayesian network.
• All variables are discrete.
• Variables at leaf nodes are observed.
• Variables at internal nodes are latent.
• Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
• Semantics: the model defines a joint distribution over all variables as the product of these conditional distributions.

Also known as hierarchical latent class (HLC) models (Zhang, JMLR 2004).
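As a concrete reading of the parameter list above (my illustration; the slide itself relies on a figure), an LTM with root Y1, internal latent node Y2, and observed leaves X1 and X2 defines the joint distribution

```latex
P(Y_1, Y_2, X_1, X_2) \;=\; P(Y_1)\, P(Y_2 \mid Y_1)\, P(X_1 \mid Y_2)\, P(X_2 \mid Y_2)
```

and the distribution over the observed variables is obtained by summing out the latent ones: \(P(X_1, X_2) = \sum_{y_1, y_2} P(y_1)\, P(y_2 \mid y_1)\, P(X_1 \mid y_2)\, P(X_2 \mid y_2)\).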

SLIDE 4

Pouch Latent Tree Models (PLTM)

• An extension of the basic LTM:
  • Rooted tree.
  • Internal nodes represent discrete latent variables.
  • Each leaf node consists of one or more continuous observed variables, called a pouch. (Poon et al., ICML 2010)
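Concretely (my reading of Poon et al.'s formulation, not spelled out on the slide), each pouch \(\mathbf{W}\) of continuous variables is modelled as a conditional multivariate Gaussian given the state of its discrete parent \(Y\):

```latex
\mathbf{W} \mid Y = y \;\sim\; \mathcal{N}(\boldsymbol{\mu}_y, \boldsymbol{\Sigma}_y)
```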

SLIDE 5

More General Latent Variable Tree Models

• Internal nodes can be observed.
• Internal nodes can be continuous.
• The structure can be a forest. (Choi et al., JMLR 2011)
• Primary focus of this talk: the basic LTM.

SLIDE 6

Identifiability Issues (Zhang, JMLR 2004)

• A root change leads to an equivalent model, so edge orientations are unidentifiable.
• Hence, we are really talking about undirected models: an undirected LTM represents an equivalence class of directed LTMs.
• In implementation, the model is represented as a directed model rather than an MRF, so that the partition function is always 1.
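A quick numeric check of the root-change claim (my illustration; the variable names follow the earlier LTM slide): re-rooting a chain Y1 → Y2 leaves the joint distribution unchanged.

```python
import numpy as np

# Hedged sketch (not from the slides): verify numerically that re-rooting a
# two-node chain Y1 -> Y2 leaves the joint distribution unchanged.
rng = np.random.default_rng(0)

p_y1 = rng.dirichlet(np.ones(3))                   # P(Y1), 3 states
p_y2_given_y1 = rng.dirichlet(np.ones(2), size=3)  # P(Y2|Y1), 2 states

# Joint under the original orientation: P(y1, y2) = P(y1) P(y2|y1)
joint = p_y1[:, None] * p_y2_given_y1

# Re-root at Y2: P(y2) = sum_y1 P(y1, y2), and P(y1|y2) = P(y1, y2) / P(y2)
p_y2 = joint.sum(axis=0)
p_y1_given_y2 = joint / p_y2[None, :]

# Joint under the reversed orientation: P(y2) P(y1|y2)
joint_rerooted = p_y2[None, :] * p_y1_given_y2

assert np.allclose(joint, joint_rerooted)  # identical distributions
```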

SLIDE 7

Identifiability Issues

• |X|: the cardinality of variable X, i.e., its number of states.
• Theorem: The set of all regular models for a given set of observed variables is finite. (Zhang, JMLR 2004)
• Consequently, latent variables cannot have too many states.
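The slide does not state the regularity condition itself; the form given in Zhang (JMLR 2004), as best I recall, bounds the cardinality of each latent variable \(Z\) by the cardinalities of its neighbors \(X_1, \dots, X_k\):

```latex
|Z| \;\le\; \frac{\prod_{i=1}^{k} |X_i|}{\max_{1 \le i \le k} |X_i|}
```

Intuitively, latent states beyond this bound cannot be distinguished from data on the observed variables, which is why only finitely many regular models exist for a fixed set of observed variables.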

SLIDE 8

Latent Tree Analysis (LTA)

Learning latent tree models: determine

• Number of latent variables
• Number of possible states for each latent variable
• Connections among variables
• Probability distributions
SLIDE 9

Latent Tree Analysis (LTA)

Learning latent tree models: determine

• Number of latent variables
• Number of possible states for each latent variable
• Connections among nodes
• Probability distributions

Difficult, but doable.

SLIDE 10

Three Settings for Algorithm Development

• Setting 1: CLRG (Choi et al., 2011; Huang et al., 2015)
  • Assume the data are generated from an unknown LTM.
  • Investigate properties of LTMs and use them for learning; e.g., recover the model structure from the tree additivity of information distances (see the sketch after this list).
  • Theoretical guarantees to recover the generative model under certain conditions.
• Setting 2: EAST, BI (Chen et al., 2012; Liu et al., 2013)
  • Do not assume the data are generated from an LTM.
  • Fit an LTM to the data using the BIC score, via search or heuristics.
  • It does not make sense to talk about theoretical guarantees.
  • Obtains better models than Setting 1, because the generative assumption is usually untrue.
• Setting 3: HLTA (Liu et al., 2014; Chen et al., 2016)
  • Consider usefulness in addition to model fit; build a hierarchy of latent variables.
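For Setting 1, here is a minimal sketch of the information distance whose additivity along tree paths CLRG exploits (my paraphrase of the definition in Choi et al. 2011; the function name and example numbers are illustrative):

```python
import numpy as np

# Hedged sketch of the information distance from Choi et al. (JMLR 2011),
# for two discrete variables with the same number of states. Tree additivity
# of these distances is what CLRG exploits to recover the structure.
def information_distance(joint: np.ndarray) -> float:
    """joint[i, j] = P(X = i, Y = j); must be a square matrix."""
    p_x = joint.sum(axis=1)  # marginal of X
    p_y = joint.sum(axis=0)  # marginal of Y
    num = np.abs(np.linalg.det(joint))
    den = np.sqrt(np.prod(p_x) * np.prod(p_y))  # det of diagonal marginal matrices
    return -np.log(num / den)

# Example: two correlated binary variables.
joint_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
print(information_distance(joint_xy))  # ~0.51; 0 would mean deterministic relation
```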

SLIDE 11

Current Capabilities

• It takes a few hours on a single machine to analyze data sets with
  • thousands of variables, and
  • hundreds of thousands of instances.
• Significant additional speedup can be achieved via simplification and parallel computing.

SLIDE 12

What can LTA be used for?

• Multidimensional clustering
• Hierarchical topic detection
• Latent structure discovery
• Other applications

SLIDE 13

What can LTA be used for?

• Multidimensional clustering
• Hierarchical topic detection
• Latent structure discovery
• Other applications

SLIDE 14

How to Cluster?

• Cluster analysis: grouping of objects into clusters such that
  • objects in the same cluster are similar, and
  • objects from different clusters are dissimilar.

SLIDE 15

How to Cluster these?

SLIDE 16

How to Cluster these?

SLIDE 17

How to Cluster these?

SLIDE 18

Multidimensional Clustering

• Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways.
• It is therefore more reasonable to look for multiple ways to partition the data.
• How do we get multiple partitions?

SLIDE 19

How to get one partition?

• Finite mixture models: one latent variable Z.
  • Gaussian mixture models: continuous data.
  • Latent class model (mixture of multinomial distributions): categorical data.
• Key point: use models with one latent variable for one partition.
SLIDE 20

How to get multiple partitions?

• Use models with multiple latent variables for multiple partitions.
• Latent tree models:
  • probabilistic graphical models with multiple latent variables;
  • a generalization of latent class models.

SLIDE 21

Multidimensional Clustering of Social Survey Data

// Survey on corruption in Hong Kong and the performance of the anti-corruption agency ICAC
// 31 questions, 1200 samples

C_City: s0 s1 s2 s3              // very common, quite common, uncommon, very uncommon
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3     // totally intolerable, intolerable, tolerable, totally tolerable
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2        // yes, no, depends
LeaveContactInfo: s0 s1          // yes, no
I_EncourageReport: s0 s1 s2 s3 s4  // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4    // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4       // very sufficient, sufficient, average, ...
…

1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
…

(Chen et al, AIJ 2012)

SLIDE 22

Latent Structure Discovery

• Y2: Demographic background
• Y3: Tolerance toward corruption
• Y4: ICAC performance
• Y5: Change in level of corruption
• Y6: Level of corruption
• Y7: ICAC accountability

SLIDE 23

Multidimensional Clustering

• Y2=s0: low-income youngsters
• Y2=s1: women with no/little income
• Y2=s2: people with good education and good income
• Y2=s3: people with poor education and average income

SLIDE 24

Multidimensional Clustering

Interpretations of the values of latent variable Y3:

• Y3=s0: people who find corruption totally intolerable (57%)
• Y3=s1: people who find corruption intolerable (27%)
• Y3=s2: people who find corruption tolerable (15%)

Interesting finding:

• People who are tough on corruption are equally tough toward C_Gov and C_Bus.
• People who are lenient about corruption are more lenient toward C_Bus than C_Gov.
• Values of the observed variables: s0 - totally intolerable, …, s3 - totally tolerable.

SLIDE 25

Multidimensional Clustering

• Who are the toughest toward corruption among the 4 groups?

Interesting finding:

• Y2=s2 (good education and good income): the toughest on corruption.
• Y2=s3 (poor education and average income): the most lenient on corruption.
• The other two classes are in between.

SLIDE 26

Multidimensional Clustering: Summary

• Latent tree analysis has
  • found several interesting ways to partition the ICAC data, and
  • revealed some interesting relationships between different partitions. (Chen et al., AIJ 2012)

SLIDE 27

What can LTA be used for?

• Multidimensional clustering
• Hierarchical topic detection
• Latent structure discovery
• Other applications

SLIDE 28

Hierarchical Latent Tree Analysis (HLTA)

• Each word is a binary variable:
  • 0 - absent from the document, 1 - present in the document.
• Each document is a binary vector over the vocabulary.
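A minimal sketch of this preprocessing step (sklearn and the toy documents are my assumptions, not the authors' pipeline):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hedged sketch: turn documents into the binary word-occurrence vectors
# that HLTA takes as input.
docs = [
    "the video card driver crashed",
    "new video card installed",
    "the driver update fixed the crash",
]

vectorizer = CountVectorizer(binary=True)  # 1 = word present, 0 = absent
X = vectorizer.fit_transform(docs)         # documents x vocabulary, 0/1 entries

print(vectorizer.get_feature_names_out())
print(X.toarray())
```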

SLIDE 29

Topics

• Each latent variable partitions the documents into 2 clusters.
• Document clusters are interpreted as topics:
  • Z14=0: background topic
  • Z14=1: "video-card-driver" topic
• Each latent variable gives one topic.

SLIDE 30

Topic Hierarchy

• Latent variables at high levels capture "long-range" word co-occurrences: more general topics.
• Latent variables at low levels capture "short-range" word co-occurrences: more specific topics.

SLIDE 31

The New York Times Dataset

• From the UCI repository.
• 300,000 articles from 1987-2007.
• 10,000 words selected using TF-IDF.
• HLTA took 7 hours on a desktop machine.
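The slides do not say exactly how the 10,000 words were selected; a plausible sketch (my assumption: rank words by average TF-IDF across the corpus and keep the top ones) is:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hedged sketch of TF-IDF vocabulary selection; sklearn and the ranking
# criterion are my choices, not necessarily the authors' exact recipe.
def select_vocabulary(docs, vocab_size=10_000):
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)           # documents x words
    avg_scores = np.asarray(tfidf.mean(axis=0)).ravel()
    words = vectorizer.get_feature_names_out()
    top = np.argsort(avg_scores)[::-1][:vocab_size]  # highest average TF-IDF
    return [words[i] for i in top]
```

Applied to the 300,000 articles, this kind of procedure would yield the 10,000-word vocabulary the slide mentions.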

SLIDE 32

Model Learned from 300,000 New York Times Articles

(http://www.cse.ust.hk/~lzhang/topic/NYTimes/NYT-graph.pdf)

SLIDE 33

SLIDE 34

SLIDE 35

Hierarchical Latent Tree Analysis: Summary

• Latent tree analysis gives a novel method for hierarchical topic detection.
  • It is able to find meaningful topics and topic hierarchies.
• It differs from LDA-based methods in several important ways.
  • In empirical evaluations, the new method significantly outperforms the LDA-based methods.
• aipano.cse.ust.hk is based on HLTA.

(Chen et al., AAAI 2016; AIJ 2017)

SLIDE 36

What can LTA be used for?

• Multidimensional clustering
• Hierarchical topic detection
• Latent structure discovery
• Other applications

SLIDE 37

Link to Deep Learning

• Commonalities between HLTMs and DBNs:
  • both define a distribution over observed binary variables;
  • both have multiple layers of latent variables.
• Differences between HLTMs and DBNs:
  • HLTM: tree-structured, learned from data.
  • DBN: full connections between layers, manually specified.

SLIDE 38

Link to Deep Learning

• A potential method for learning structures for deep models:
  1. Learn an HLTM from data.
  2. Use it as the skeleton for a deep model.
  3. Add additional links for better model fit.

SLIDE 39

Link to Deep Learning

• Sparse Boltzmann Machines (SBM) (Chen et al., AAAI 2017)
• The idea can be applied to feedforward networks and CNNs.

SLIDE 40

Analysis of Online Transaction Data

• Use latent tree models to model the co-consumption of items.
• The latent variables identify user taste groups,
• potentially giving rise to a brand-new approach to collaborative filtering.

SLIDE 41

Analysis of Data from Medicine

• Use latent tree models to model the co-occurrence of symptoms.
• Results from traditional Chinese medicine (TCM) data:
  • find patterns that correspond to TCM concepts, and
  • show that TCM theories have (statistical) scientific content. (Zhang et al., JACM 2008)

SLIDE 42

Analysis of Data from Traditional Chinese Medicine (TCM)

Quote from D. Haughton and J. Haughton, Living Standards Analytics: Development through the Lens of Household Survey Data, Springer, 2012:

"Zhang et al. provide a very interesting application of latent class (tree) models to diagnoses in traditional Chinese medicine (TCM). The results tend to confirm known theories in Chinese traditional medicine. This is a significant advance, since the scientific bases for these theories are not known. The model proposed by the authors provides at least a statistical justification for them."

SLIDE 43

A Practical Problem with TCM: Patient Classification

• TCM patient classification is subjective; there is no gold standard.
• A data-driven approach to establishing objective standards:
  1. Cluster patients based on symptom patterns.
  2. Identify patient clusters that correspond to TCM classes.
  3. Use the characteristics of the clusters to establish classification rules.
• This way, objective classification rules are established based on unlabeled data.
• Latent tree models are used in step 1.

SLIDE 44

Analysis of Gene Expression Data

• Use latent tree models to model the co-expression of genes. (Gitter et al., 2016)
• Helpful in reconstructing transcriptional regulatory networks.

SLIDE 45

What can LTA be used for?

• Multidimensional clustering
• Hierarchical topic detection
• Latent structure discovery
• Other applications

SLIDE 46

LTMs for Probabilistic Modelling

• Attractive representation of joint distributions (Pearl, 1988):
  • computationally very simple to work with;
  • can represent complex relationships among observed variables.
• These two properties are exploited to build tractable probabilistic models in (Wang, Zhang, and Chen, 2008; Kaltwang, Todorovic, and Pantic, 2015; Yu, Huang, and Dauwels, 2016).
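To make "computationally very simple" concrete, here is a minimal sketch (my illustration; the structure follows the earlier LTM slide, and the numbers are made up) of exact likelihood computation in a small LTM, done in one upward pass that is linear in the number of nodes:

```python
import numpy as np

# Hedged sketch: exact likelihood of observed leaves in the LTM
# Y1 -> Y2 -> {X1, X2}, computed by one upward message pass.
p_y1 = np.array([0.6, 0.4])                   # P(Y1)
p_y2_y1 = np.array([[0.7, 0.3], [0.2, 0.8]])  # P(Y2|Y1), rows indexed by Y1
p_x1_y2 = np.array([[0.9, 0.1], [0.3, 0.7]])  # P(X1|Y2), rows indexed by Y2
p_x2_y2 = np.array([[0.8, 0.2], [0.4, 0.6]])  # P(X2|Y2), rows indexed by Y2

def likelihood(x1: int, x2: int) -> float:
    # Upward pass: messages from the leaves to Y2, then from Y2 to Y1.
    msg_to_y2 = p_x1_y2[:, x1] * p_x2_y2[:, x2]  # vector over states of Y2
    msg_to_y1 = p_y2_y1 @ msg_to_y2              # sum out Y2 for each Y1
    return float(p_y1 @ msg_to_y1)               # sum out Y1

# Sanity check: likelihoods over all possible observations sum to 1.
total = sum(likelihood(a, b) for a in (0, 1) for b in (0, 1))
print(likelihood(0, 1), total)  # total ≈ 1.0
```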

SLIDE 47

Uni-Dimensional Clustering of Categorical Data

• Latent class model (LCM): an LTM with 1 latent variable.
  • Also known as a mixture of multinomials.
  • Widely used for cluster analysis in the social, behavioral, and health sciences (Collins and Lanza, 2010).
• Key weakness: the local independence assumption is too strong.
• Latent tree models offer a natural framework in which the local independence assumption can be relaxed.
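Concretely, the local independence assumption of an LCM with latent class variable \(Z\) and observed variables \(X_1, \dots, X_n\) says the \(X_i\) are mutually independent given the class:

```latex
P(X_1, \dots, X_n) \;=\; \sum_{z} P(Z = z) \prod_{i=1}^{n} P(X_i \mid Z = z)
```

In an LTM, additional latent variables between \(Z\) and subsets of the \(X_i\) absorb residual dependence, which is the sense in which the assumption is relaxed.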

SLIDE 48

Summary

• Co-occurrence is a fundamental phenomenon in data, and it is ubiquitous:
  • co-occurrences of words in documents,
  • co-consumption of items in online transaction data,
  • co-occurrences of symptoms among patients,
  • co-expressions of genes.
• Latent tree analysis is a useful tool for modelling co-occurrences, and it has many applications:
  • multidimensional clustering, tractable probabilistic models, relaxing the local independence assumption of latent class models;
  • hierarchical topic detection, recommendation making, structure learning (for deep models);
  • medicine, survey studies.