Latent Tree Copulas
Sergey Kirshner (skirshne@purdue.edu)
Purdue University, West Lafayette, IN, USA
Granada, Spain, September 19, 2012

Coming Attractions
• Want to fit a density to model multivariate data?
  – And organize real-valued data into a hierarchy of features?
• New density estimation model based on tree-structured dependence with latent variables
  – Distribution = Univariate Marginals + Copula
  – Hierarchy of variables as a latent tree copula
  – Parameter estimation and structure learning
• Efficient inference for Gaussian copulas (100s of variables), several structure learning approaches
• Variational inference for other copulas (10-30 variables)
September 19, 2012, Sergey Kirshner, Latent Tree Copulas (PGM 2012)

Building a Hierarchy of Rainfall Stations
[Figure: map of the State of Indiana (USA), latitude vs. longitude, showing 15 rainfall stations. Data: average monthly observations, 1951-1996 (47 years).]

Most Popular Distribution…
• Interpretable
• Closed under taking marginals
• Generalizes to multiple dimensions
• Models pairwise dependence
• Tractable
• 245 pages out of 691 from Continuous Multivariate Distributions by Kotz, Balakrishnan, and Johnson
[Figure: surface plot of a bivariate Gaussian density]

What If the Data Is NOT Gaussian?

Separating Univariate Marginals
[Equation not recovered. Annotation labels: univariate marginals; multivariate dependence term; independent variables; copula]

Monotonic Transformation of the Variables

Copula
A copula C is a multivariate distribution function (cdf) defined on the unit hypercube with uniform univariate marginals.
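The formula on this slide did not survive extraction. The standard definition can be written as follows; the dimension d and the symbols a_1, ..., a_d for the uniform variables are notation assumed here to match the a_i used on later slides.

```latex
C : [0,1]^d \to [0,1], \qquad
C(a_1,\dots,a_d) = P(A_1 \le a_1, \dots, A_d \le a_d), \qquad
A_i \sim \mathrm{Uniform}[0,1],
\\[4pt]
C(1,\dots,1,a_i,1,\dots,1) = a_i \quad \text{for every } i = 1,\dots,d,
\qquad
c(a_1,\dots,a_d) = \frac{\partial^d C(a_1,\dots,a_d)}{\partial a_1 \cdots \partial a_d}.
```

The last expression is the copula density c, which exists when C is sufficiently smooth.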

Sklar's Theorem [Sklar 59]
Distribution = univariate marginals + copula
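Stated in symbols (standard form of the theorem; F, F_i, f_i, and c are notation assumed here, not recovered from the slide):

```latex
F(x_1,\dots,x_d) = C\bigl(F_1(x_1),\dots,F_d(x_d)\bigr),
\\[4pt]
p(x_1,\dots,x_d)
  = \underbrace{\prod_{i=1}^{d} f_i(x_i)}_{\text{univariate marginals}}
    \;\cdot\;
    \underbrace{c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)}_{\text{multivariate dependence term (copula density)}}.
```

If the marginals F_i are continuous, C is unique. For independent variables the copula density is identically 1, so the joint density reduces to the product of the marginals.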

Example: Multivariate Gaussian Copula
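The slide's formula was lost in extraction. As a minimal sketch, the bivariate special case of the Gaussian copula density can be evaluated with the standard library alone. The function name and the restriction to two variables are choices made here for illustration, not taken from the talk.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1), |rho| < 1."""
    # Map the uniform values to normal scores with the inverse standard normal cdf.
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    one_minus_rho2 = 1.0 - rho * rho
    # Ratio of the bivariate normal density to the product of its N(0, 1) marginals.
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus_rho2)
    return math.exp(exponent) / math.sqrt(one_minus_rho2)
```

With rho = 0 the density is 1 everywhere, which is the independence copula.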

Separating Univariate Marginals
1. Fit univariate marginals (parametric or non-parametric)
2. Replace each data point with its value under the fitted marginal cdfs
3. Estimate the copula density
Inference for the margins [Joe and Xu 96]; canonical maximum likelihood [Genest et al 95]
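The three steps above can be sketched as follows. This sketch assumes rank-based (empirical cdf) marginals and a bivariate Gaussian copula whose parameter is estimated by the correlation of the normal scores. Both choices and both function names are illustrative assumptions, and ties in the data are not handled.

```python
import math
from statistics import NormalDist


def pseudo_observations(column):
    """Steps 1-2: rescaled empirical cdf, u_i = rank_i / (n + 1), so values stay in (0, 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def normal_scores_correlation(xs, ys):
    """Step 3 (Gaussian copula assumed): correlation of the normal scores."""
    std = NormalDist()
    zx = [std.inv_cdf(u) for u in pseudo_observations(xs)]
    zy = [std.inv_cdf(u) for u in pseudo_observations(ys)]
    n = len(zx)
    mx, my = sum(zx) / n, sum(zy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(zx, zy))
    sxx = sum((a - mx) ** 2 for a in zx)
    syy = sum((b - my) ** 2 for b in zy)
    return sxy / math.sqrt(sxx * syy)
```

Because only ranks enter the estimate, it is unchanged by any strictly increasing transformation of either variable.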

Graphical Model Using a Copula
[Figure: graphical models over variables x1, ..., x5 and over a1, ..., a5]

Graphical Model Approaches to Estimating Copulas
• Vines [Bedford and Cooke 02]
• Trees [Kirshner 08]
• Nonparanormal model [Liu et al 09]
• Copula Bayesian networks [Elidan 10]

Tree-Structured Densities
[Figure: tree over variables x1, ..., x5]

Tree-Structured Copulas [Kirshner 08]
[Figure: tree over variables x1, ..., x5 and the corresponding tree over a1, ..., a5]
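The formulas for these two slides were lost in extraction. The standard factorization of a tree-structured density, and the copula form it implies, can be written as below. The vertex set V, edge set E, and the substitution a_v = F_v(x_v) are notation assumed here.

```latex
p(x) = \prod_{v \in V} p(x_v) \prod_{(u,v) \in E} \frac{p(x_u, x_v)}{p(x_u)\, p(x_v)},
\\[4pt]
\frac{p(x_u, x_v)}{p(x_u)\, p(x_v)} = c_{uv}\bigl(F_u(x_u), F_v(x_v)\bigr)
\quad\Longrightarrow\quad
c_T(a) = \prod_{(u,v) \in E} c_{uv}(a_u, a_v), \qquad a_v = F_v(x_v).
```

Each edge ratio is a bivariate copula density by Sklar's theorem, so a tree-structured copula is fully specified by one bivariate copula per edge.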

Using Tree-Structured Copulas
• Tree-structured copulas are convenient but restrictive
  – True distribution may require much larger cliques to decompose
• Can approximate other dependencies using latent variables
  – Mixtures [Kirshner 08]: discrete latent variables
  – Latent tree copulas: continuous random variables embedded in copula trees

Latent Tree Copulas
[Figure: tree over observed variables x1, ..., x5 and latent variables x6, x7, x8, with the corresponding tree over a1, ..., a8]
• Defined as a tuple of variables, tree structure, and bivariate copulas
• "Siblings" of latent tree models (LTMs) for categorical variables [e.g., Zhang 02, 04]
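One way to write the resulting density over the observed variables is sketched below. The index sets O (observed) and H (latent) are notation assumed here, and the latent variables are taken to be marginally uniform on [0, 1].

```latex
c(a_O) = \int_{[0,1]^{|H|}} \prod_{(u,v) \in E} c_{uv}(a_u, a_v) \, da_H .
```

Integrating latent variables out of a copula density leaves a copula density over the remaining variables, so the observed a_O keep uniform marginals.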

Inference
[Figure: tree over a1, ..., a8]
• Good news: posterior distribution is also tree-structured
  – Fairly easy to carry out inference for LTMs
• Bad news: latent variables are continuous, with an infinite number of possible values
  – Need to estimate the joint posterior densities

Inference
[Figure: tree over a1, ..., a8]
• Easy for Gaussian copulas
  – Apply inverse standard normal CDF; use belief propagation on jointly Gaussian distribution
• Difficult for non-Gaussian copulas
  – May have no exact form for the posterior!
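For the Gaussian case, the recipe of applying the inverse normal CDF and then passing Gaussian messages can be sketched on the smallest possible latent tree: one latent root connected to observed leaves. The star shape, the function name, and the per-edge correlations `rhos` are assumptions made for this illustration; the slides' general belief propagation over an arbitrary tree is not reproduced here.

```python
from statistics import NormalDist


def latent_root_posterior(u_obs, rhos):
    """Posterior mean and variance of a latent root's normal score.

    u_obs: observed leaf values in (0, 1), one per leaf.
    rhos:  correlation on the edge between the root and each leaf, |rho| < 1.
    """
    std = NormalDist()
    precision = 1.0  # prior on the root's normal score is N(0, 1)
    linear = 0.0
    for u, rho in zip(u_obs, rhos):
        z = std.inv_cdf(u)            # inverse standard normal cdf of the observation
        noise_var = 1.0 - rho * rho   # z_leaf | z_root ~ N(rho * z_root, 1 - rho^2)
        precision += rho * rho / noise_var  # Gaussian message from this leaf
        linear += rho * z / noise_var
    return linear / precision, 1.0 / precision
```

With a single leaf this reduces to the usual bivariate normal conditional: mean rho * z and variance 1 - rho^2.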

Inference for non-Gaussian Case
• Variational approach:
  – Approximate the posterior distribution using a tree-structured distribution over piecewise uniform variables
  – Essentially, approximate using the tree over categorical variables
  – Use s iterations to find the fixed point
  – Requires integrating the logarithm of bivariate copula pdfs: numerical integration!
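The numerical integration step can be illustrated with a rough sketch. It assumes a k-by-k grid of cells on the unit square, a Frank copula on the edge, and a midpoint rule with m points per axis in each cell. All three choices, and the function names, are assumptions made for illustration and are not taken from the talk.

```python
import math


def frank_density(u, v, theta):
    """Frank copula density for theta > 0 and u, v in [0, 1]."""
    a = 1.0 - math.exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return num / den


def cell_average(f, i, j, k, m=8):
    """Midpoint-rule average of f over the cell [i/k, (i+1)/k] x [j/k, (j+1)/k]."""
    h = 1.0 / (k * m)
    total = 0.0
    for p in range(m):
        u = i / k + (p + 0.5) * h
        for q in range(m):
            v = j / k + (q + 0.5) * h
            total += f(u, v)
    return total / (m * m)


def log_density_table(theta, k, m=8):
    """k x k table of cell averages of log c(u, v), one entry per pair of pieces."""
    def log_c(u, v):
        return math.log(frank_density(u, v, theta))
    return [[cell_average(log_c, i, j, k, m) for j in range(k)] for i in range(k)]
```

Under a piecewise-uniform approximation q with cell probabilities q_ij, the expected log copula density on an edge is the sum over cells of q_ij times the corresponding table entry, so the table only needs recomputing when the copula parameter changes.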

Parameter Estimation with Known Structure
[Figure: tree over a1, ..., a8]
• (Variational) EM
  – E-step: minimize KL divergence
  – M-step: maximize the expected complete-data log-likelihood

Parameter Estimation with Known Structure
[Figure: tree over a1, ..., a8]
• Gaussian copula case: EM
  – E-step: closed form inference, O(Nt) per iteration
  – M-step: maximize the expected complete-data log-likelihood

Parameter Estimation with Known Structure
[Figure: tree over a1, ..., a8]
• Non-Gaussian copula case: variational EM
  – E-step: approximate inference, O(sN|E|k^2) + |E| bivariate integrals per iteration
  – M-step: approximate maximization, need to update |E| bivariate copula parameters

Unknown Structure
• Gaussian LTCs: same as for tree-structured Gaussians
  – Size of possible trees can be limited
  – e.g., can use information distances [Choi et al 2011]
• Non-Gaussian LTCs: need to restrict the space of possible models
  – Very large space of structures/copula families
  – Fix the bivariate copula family
  – Consider only binary latent tree copulas
    • Observed nodes = leaves
    • Motivation: any Gaussian LTC is equivalent to some binary LTC

Bottom-up Binary LTC Learning [Similar to Harmeling and Williams 11]
• Initialize the subtrees to consist of individual variables (variable = root of a subtree)
• Iterate until all variables are in one tree
  – Estimate mutual informations (MIs) between the root nodes
  – Pick the pair of roots with the largest MI
  – Merge the subtrees by creating a new latent root node
  – Re-estimate parameters (EM)
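The greedy loop above can be sketched structurally as follows. In the procedure on the slide, MIs are estimated between the current root nodes and parameters are re-estimated by EM after each merge. This sketch replaces both with hypothetical hooks: `root_mi` is any caller-supplied scoring function, `refit` is an optional callback, and `single_linkage_mi` is only a stand-in scorer (the largest leaf-to-leaf MI) used to make the example runnable.

```python
from itertools import combinations


def leaves_of(node):
    """Leaves under a node; merged subtrees are represented as 2-tuples."""
    if isinstance(node, tuple):
        return leaves_of(node[0]) + leaves_of(node[1])
    return [node]


def single_linkage_mi(pair_mi):
    """Stand-in root-to-root score: largest MI between any leaf pair (not the talk's estimator)."""
    def root_mi(a, b):
        return max(pair_mi[frozenset((x, y))] for x in leaves_of(a) for y in leaves_of(b))
    return root_mi


def bottom_up_binary_tree(variables, root_mi, refit=None):
    """Greedily merge the two roots with the largest MI until one tree remains."""
    roots = list(variables)  # every variable starts as the root of its own subtree
    while len(roots) > 1:
        i, j = max(combinations(range(len(roots)), 2),
                   key=lambda ij: root_mi(roots[ij[0]], roots[ij[1]]))
        merged = (roots[i], roots[j])  # new latent root with two children
        roots = [r for idx, r in enumerate(roots) if idx not in (i, j)]
        roots.append(merged)
        if refit is not None:
            refit(roots)  # placeholder for the EM re-estimation step
    return roots[0]
```

For example, with pairwise MIs of 0.9 for (x1, x2), 0.2 for (x1, x3), and 0.3 for (x2, x3), the first merge joins x1 and x2, and the second attaches x3 to that subtree.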

Bottom-up Binary LTC Learning [Similar to Harmeling and Williams 11]
[Figure: binary latent tree with leaves a1, ..., a5 and latent nodes a6, ..., a9. Edges and their parameters: a8-a9 (θ89), a1-a9 (θ19), a6-a8 (θ68), a7-a8 (θ78), a2-a6 (θ26), a4-a6 (θ46), a3-a7 (θ37), a5-a7 (θ57)]

Illustration for Building of Hierarchy of Rainfall Stations
[Figure: map of the State of Indiana (USA), latitude vs. longitude, showing 15 rainfall stations. Data: average monthly observations, 1951-1996 (47 years).]

Experiments: Log-Likelihood on Test Data
• UCI ML Repository MAGIC data set
• 12000 10-dimensional vectors
• 2000 examples in test sets
• Average over 10 partitions
[Figure: log-likelihood per feature (axis range about -3.2 to -2.6) vs. training set size (50, 100, 200, 500, 1000, 2000, 5000, 10000). Curves: Independent KDE, Product KDE, Gaussian, Gaussian Copula, Gaussian TCopula, Frank TCopula, 2-mix of Gaussian TCopulas, Gaussian LTC]

Summary
• Multivariate distribution = univariate marginals + copula
• New model: tree-structured multivariate distribution with marginally uniform latent variables (latent tree copula, LTC)
  – Sufficient to employ only bivariate copula families!
• Closed form inference for Gaussian copulas (efficient)
• Variational inference for non-Gaussian copulas (slow)
• Parameter estimation using the EM algorithm
• Bottom-up structure learning for bivariate LTCs
• Can be used for parsimonious multivariate density estimation or to structure variables into hierarchies
