Latent Tree Analysis
Nevin L. Zhang
The Hong Kong University of Science and Technology
www.cse.ust.hk/~lzhang
MLA 2017
What is Latent Tree Analysis (LTA)?

Repeated event co-occurrences might:
- be due to common hidden causes or genuine direct correlations, OR
- be coincidental, especially in big data.

Challenge: identify the co-occurrences that are due to hidden causes or correlations.

Latent tree analysis solves a related and simpler problem: detect co-occurrences that can be statistically explained by a tree of latent variables.

It can be used to solve interesting tasks:
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery
- ...
Basic Latent Tree Models (LTM)
- Tree-structured Bayesian network
- All variables are discrete
- Variables at leaf nodes are observed
- Variables at internal nodes are latent

Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), ...

Semantics:
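The semantics figure from the slide is lost in this transcript. As a sketch of the standard Bayesian-network semantics (using the example variables above, with Y1 as the root), the model defines a joint distribution that factorizes along the tree:

P(X_1, \dots, X_n, Y_1, \dots, Y_m) = P(Y_1) \prod_{V \ne Y_1} P(V \mid \mathrm{pa}(V))

The distribution over the observed variables is obtained by summing out the latent ones: P(X_1, \dots, X_n) = \sum_{Y_1, \dots, Y_m} P(X_1, \dots, X_n, Y_1, \dots, Y_m).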
Also known as hierarchical latent class (HLC) models (Zhang, JMLR 2004).
Pouch Latent Tree Models (PLTM)
An extension of the basic LTM:
Rooted tree
Internal nodes represent discrete latent variables
Each leaf node consists of one or more continuous observed variables, called a pouch. (Poon et al. ICML 2010)
More General Latent Variable Tree Models
- Internal nodes can be observed
- Internal nodes can be continuous
- Forests are allowed

Primary focus of this talk: the basic LTM.
(Choi et al. JMLR 2011)
Identifiability Issues

- A root change leads to an equivalent model, so edge orientations are unidentifiable. Hence, we are really talking about undirected models.
- An undirected LTM represents an equivalence class of directed LTMs.
- In implementation, it is represented as a directed model rather than an MRF, so that the partition function is always 1.

(Zhang, JMLR 2004)
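As a worked one-liner (an illustration, not from the slide): rerooting from Y1 to Y2 just refactors the same joint by Bayes' rule, so both orientations define the same distribution:

P(Y_1)\, P(Y_2 \mid Y_1) = P(Y_1, Y_2) = P(Y_2)\, P(Y_1 \mid Y_2)

Applied edge by edge along the path to the new root, this turns any rooted version of the tree into any other without changing the joint distribution.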
Identifiability Issues
|X|: cardinality of variable X, i.e., its number of states.

Theorem: The set of all regular models for a given set of observed variables is finite. (Zhang, JMLR 2004)

Latent variables cannot have too many states.
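The regularity condition itself is not spelled out in this transcript. As a hedged sketch of the condition from Zhang (JMLR 2004): a model is regular roughly when every latent variable Z with neighbors Z_1, ..., Z_k satisfies

|Z| \le \frac{\prod_{i=1}^{k} |Z_i|}{\max_i |Z_i|}

i.e., a latent variable's cardinality is bounded in terms of its neighbors' cardinalities; irregular models can be reduced to regular ones that represent the same distribution over the observed variables with no more parameters.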
Latent Tree Analysis (LTA)
Learning latent tree models: Determine
- Number of latent variables
- Numbers of possible states for each latent variable
- Connections among nodes
- Probability distributions
Difficult, but doable
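For the search-based algorithms discussed below (Setting 2), candidate models m are typically scored with BIC; as a standard sketch:

\mathrm{BIC}(m \mid D) = \log P(D \mid m, \theta^*) - \frac{d(m)}{2} \log N

where \theta^* is the maximum-likelihood parameter estimate, d(m) is the number of free parameters, and N is the sample size. The score trades model fit against complexity, which is what makes the structure search well posed.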
Three Settings for Algorithm Development

Setting 1: CLRG (Choi et al., 2011; Huang et al., 2015)
- Assume the data are generated from an unknown LTM.
- Investigate properties of LTMs and use them for learning, e.g., recover the model structure from the additivity of information distances over the tree (see the sketch after this list).
- Theoretical guarantees to recover the generative model under certain conditions.

Setting 2: EAST, BI (Chen et al., 2012; Liu et al., 2013)
- Do not assume the data are generated from an LTM.
- Fit an LTM to the data using the BIC score, via search or heuristics.
- It does not make sense to talk about theoretical guarantees.
- Often obtains better models than Setting 1, because the generative assumption is usually untrue.

Setting 3: HLTA (Liu et al., 2014; Chen et al., 2016)
- Consider usefulness in addition to model fit; build a hierarchy of latent variables.
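A minimal sketch of the information distance exploited by CLRG, assuming discrete variables and the formula from Choi et al. (2011); for binary variables it reduces to minus the log of the absolute correlation:

```python
import numpy as np

def information_distance(joint):
    """Information distance d(X, Y) = -log(|det J| / sqrt(det M_X * det M_Y)).

    `joint` is a k x k matrix with joint[a, b] = P(X = a, Y = b);
    M_X and M_Y are the diagonal matrices of the marginals.
    """
    px = joint.sum(axis=1)                    # marginal P(X)
    py = joint.sum(axis=0)                    # marginal P(Y)
    num = abs(np.linalg.det(joint))           # |det J|
    den = np.sqrt(np.prod(px) * np.prod(py))  # sqrt(det M_X * det M_Y)
    return -np.log(num / den)
```

These distances are additive along the tree: if Y lies on the path between X and Z, then d(X, Z) = d(X, Y) + d(Y, Z). That additivity is the property CLRG uses to reconstruct the structure from pairwise statistics alone.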
Current Capabilities
Takes a few hours on a single machine to analyze data sets with:
- thousands of variables, and
- hundreds of thousands of instances.

Significant additional speedup can be achieved via simplification and parallel computing.
What can LTA be used for?
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery
- Other applications
How to Cluster?
Cluster analysis: Grouping of objects into clusters such that
Objects in the same cluster are similar
Objects from different clusters are dissimilar.
How to Cluster these?
Multidimensional Clustering
Complex data usually have multiple facets and can be meaningfully partitioned in multiple ways.

It is therefore more reasonable to look for multiple ways to partition the data. How do we get multiple partitions?
How to get one partition?
Finite mixture models: one latent variable Z.
- Gaussian mixture models: continuous data.
- Latent class models (mixtures of multinomial distributions): categorical data.

Key point: use a model with one latent variable for one partition.
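A minimal sketch of "one latent variable, one partition" on continuous data; scikit-learn's GaussianMixture is an assumed choice of library here, not one prescribed by the talk:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two Gaussian blobs in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])

# One latent variable (the mixture component) -> one partition of the data.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
partition = gmm.predict(X)  # cluster label for each data point
```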
How to get multiple partitions?

Use models with multiple latent variables for multiple partitions: latent tree models.
- Probabilistic graphical models with multiple latent variables
- A generalization of latent class models
Multidimensional Clustering of Social Survey Data
// Survey on corruption in Hong Kong and performance of the anti-corruption agency -- ICAC
//31 questions, 1200 samples
C_City: s0 s1 s2 s3 // very common, quite common, uncommon, very uncommon
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3 // totally intolerable, intolerable, tolerable, totally tolerable
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2 // yes, no, depends
LeaveContactInfo: s0 s1 // yes, no
I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
.....
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
….
(Chen et al, AIJ 2012)
Latent Structure Discovery
Y2: demographic background
Y3: tolerance toward corruption
Y4: ICAC performance
Y5: change in level of corruption
Y6: level of corruption
Y7: ICAC accountability
Multidimensional Clustering
Y2=s0: low-income youngsters
Y2=s1: women with no/little income
Y2=s2: people with good education and good income
Y2=s3: people with poor education and average income
Multidimensional Clustering
Interpretations of values of latent variables
Y3=s0: people who find corruption totally intolerable (57%)
Y3=s1: people who find corruption intolerable (27%)
Y3=s2: people who find corruption tolerable (15%)
Interesting finding:
- People who are tough on corruption are equally tough toward C_Gov and C_Bus.
- People who are lenient about corruption are more lenient toward C_Bus than C_Gov.

Values of the observed variables: s0 = totally intolerable, ..., s3 = totally tolerable.
Multidimensional Clustering
Who are the toughest toward corruption among the 4 groups?
Interesting finding:
- Y2=s2 (good education and good income): the toughest on corruption.
- Y2=s3 (poor education and average income): the most lenient on corruption.
- The other two classes are in between.
Multidimensional Clustering: Summary

Latent tree analysis has found several interesting ways to partition the ICAC data, and revealed some interesting relationships between the different partitions.
(Chen et al. AIJ 2012)
What can LTA be used for?
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery
- Other applications
Hierarchical Latent Tree Analysis (HLTA)

- Each word is a binary variable: 0 = absent from the document, 1 = present in the document.
- Each document is a binary vector over the vocabulary.
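A minimal sketch of this binary bag-of-words representation; the toy vocabulary and corpus below are invented for illustration:

```python
# Word w gets value 1 in a document iff w occurs in that document.
vocab = ["video", "card", "driver", "game"]            # toy vocabulary
docs = ["the video card driver crashed", "card game"]  # toy corpus

def to_binary_vector(doc, vocab):
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

data = [to_binary_vector(d, vocab) for d in docs]
# data == [[1, 1, 1, 0], [0, 1, 0, 1]]
```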
Topics

- Each latent variable partitions the documents into 2 clusters.
- Document clusters are interpreted as topics, e.g., Z14=0: background topic; Z14=1: the "video-card-driver" topic.
- Each latent variable gives one topic.
Topic Hierarchy

- Latent variables at high levels capture "long-range" word co-occurrences, and hence more general topics.
- Latent variables at low levels capture "short-range" word co-occurrences, and hence more specific topics.
The New York Times Dataset

- From the UCI repository: 300,000 articles from 1987-2007.
- 10,000 words selected using TF-IDF.
- HLTA took 7 hours on a desktop machine.
Model Learned from 300,000 New York Times Articles
(http://www.cse.ust.hk/~lzhang/topic/NYTimes/NYT-graph.pdf)
Hierarchical Latent Tree Analysis: Summary

- Latent tree analysis gives a novel method for hierarchical topic detection.
- It is able to find meaningful topics and topic hierarchies.
- It differs from LDA-based methods in several important ways. In empirical evaluations, the new method significantly outperforms the LDA-based methods.
- aipano.cse.ust.hk is based on HLTA.

(Chen et al. AAAI 2016, AIJ 2017)
What can LTA be used for?
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery
- Other applications
Link to Deep Learning

Commonalities between HLTMs and DBNs:
- Both define a distribution over observed binary variables.
- Both have multiple layers of latent variables.

Differences between HLTMs and DBNs:
- HLTM: tree-structured, learned from data.
- DBN: full connections between layers, manually specified.
Link to Deep Learning

A potential method for learning structures for deep models:
1. Learn an HLTM from data.
2. Use it as the skeleton for a deep model.
3. Add additional links for better model fit.
Link to Deep Learning

This leads to Sparse Boltzmann Machines (SBM) (Chen et al., AAAI 2017). The idea can also be applied to feedforward networks and CNNs.
Analysis of Online Transaction Data

- Use latent tree models to model the co-consumption of items.
- The latent variables identify user taste groups.
- This potentially gives rise to a brand-new approach to collaborative filtering.
Analysis of Data from Medicine

- Use latent tree models to model the co-occurrence of symptoms.
- Results from traditional Chinese medicine (TCM) data: the patterns found correspond to TCM concepts, showing that TCM theories have (statistical) scientific content.

(Zhang et al. JACM 2008)
Analysis of Data from Traditional Chinese Medicine (TCM)

Quote from D. Haughton and J. Haughton, Living Standards Analytics: Development through the Lens of Household Survey Data, Springer, 2012:

"Zhang et al. provide a very interesting application of latent class (tree) models to diagnoses in traditional Chinese medicine (TCM). The results tend to confirm known theories in Chinese traditional medicine. This is a significant advance, since the scientific bases for these theories are not known. The model proposed by the authors provides at least a statistical justification for them."
A Practical Problem with TCM: Patient Classification

TCM patient classification is subjective; there is no gold standard. A data-driven approach to establishing objective standards:
1. Cluster patients based on symptom patterns.
2. Identify patient clusters that correspond to TCM classes.
3. Use the characteristics of the clusters to establish classification rules.

This way, objective classification rules are established from unlabeled data. Latent tree models are used in step 1.
Analysis of Gene Expression Data

- Use latent tree models to model the co-expression of genes.
- Helpful in reconstructing transcriptional regulatory networks.

(Gitter et al., 2016)
What can LTA be used for?
- Multidimensional clustering
- Hierarchical topic detection
- Latent structure discovery
- Other applications
LTMs for Probabilistic Modelling

An attractive representation of joint distributions (Pearl, 1988):
- Computationally very simple to work with.
- Can represent complex relationships among observed variables.

These two properties are exploited to build tractable probabilistic models in (Wang, Zhang, and Chen, 2008; Kaltwang, Todorovic, and Pantic, 2015; Yu, Huang, and Dauwels, 2016).
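A minimal sketch of why tree-structured models are computationally simple: the likelihood of the observed leaves is computed by one upward pass over the tree, costing O(#edges * k^2) for k-state variables. The toy structure and parameters below are assumptions for illustration:

```python
import numpy as np

# Toy LTM: root Y1 -> latent Y2 -> observed X1, X2 (all binary).
p_y1 = np.array([0.6, 0.4])                         # P(Y1)
p_y2_given_y1 = np.array([[0.9, 0.1], [0.2, 0.8]])  # P(Y2|Y1), rows indexed by Y1
p_x_given_y2 = np.array([[0.8, 0.2], [0.3, 0.7]])   # P(X|Y2), shared by X1 and X2

def likelihood(x1, x2):
    # Messages from the leaves to Y2: P(x | Y2) for each state of Y2.
    m1 = p_x_given_y2[:, x1]
    m2 = p_x_given_y2[:, x2]
    # Message from Y2 to Y1: sum out Y2.
    m_y2 = p_y2_given_y1 @ (m1 * m2)
    # Finish at the root: sum out Y1.
    return float(p_y1 @ m_y2)

# Sanity check: the likelihoods of all possible observations sum to 1.
total = sum(likelihood(a, b) for a in (0, 1) for b in (0, 1))
assert abs(total - 1.0) < 1e-12
```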
Uni-Dimensional Clustering of Categorical Data

- Latent class model (LCM): an LTM with one latent variable, also known as a mixture of multinomials.
- Widely used for cluster analysis in the social, behavioral, and health sciences (Collins and Lanza, 2010).
- Key weakness: the local independence assumption is too strong.
- Latent tree models offer a natural framework in which the local independence assumption can be relaxed.
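As a sketch, the LCM distribution and its local independence assumption (the observed variables are mutually independent given the class variable Z):

P(X_1, \dots, X_n) = \sum_{z} P(Z = z) \prod_{i=1}^{n} P(X_i \mid Z = z)

In a latent tree model, additional latent variables sit between Z and groups of observed variables, so those variables can remain dependent even given the class; this is how the local independence assumption is relaxed.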
Co-occurrence is a fundamental phenomenon in data, and it is ubiquitous:
Co-occurrences of words in documents,
Co-consumption of items in online transaction data
Co-occurrences of symptoms among patients
Co-expressions of genes
…
Latent tree analysis is a useful tool for modelling co-occurrences, and it has many applications:
- Multidimensional clustering, tractable probabilistic models, relaxing the local independence assumption of latent class models
- Hierarchical topic detection, recommendation making, structure learning (for deep models)