Advances in EM-test for Finite Mixture Models
Jiahua Chen, Canada Research Chair, Tier I
Department of Statistics, University of British Columbia
International Workshop on Perspectives on High-dimensional Data Analysis
Jiahua Chen (UBC), June 9-11, 2011

Outline
1. Finite mixture models: genetic example; finite mixture models
2. Hypothesis test: test of homogeneity; advances toward a realistic solution
3. EM-test: further advances; limiting distribution

A genetic example: trait
Geneticists often study sodium-lithium countertransport (SLC) activity in red blood cells because it is related to blood pressure and the prevalence of hypertension, and because it is easier to study than blood pressure itself.
A search for “Sodium-lithium countertransport” returns 12,400 results. The leading one has been cited 676 times.

Population heterogeneity
One genetic hypothesis is that SLC activity is determined by a simple model of inheritance, compatible with the action of a single gene with two alleles.
Each observation (an SLC value) is the sum of the effect of a genetic component and a normally distributed fluctuation.
Thus, a general population may be divided into three subpopulations: (1) those who have two copies of the allele that elevates SLC activity; (2) those who have one copy; and (3) those who have no copies.
Hence, a random sample from the population should behave as a finite mixture of up to three components.

Heterogeneity leads to mixture model
There are two competing genetic models: the simple dominance model and the additive model.
If one allele is dominant, individuals with one or two copies of it share the same distribution, so the data are a random sample from a two-component normal mixture model.
If the genetic effect is additive, each of the three genotypes has its own distribution, so the data are a random sample from a three-component normal mixture model.
The data are shown on the next slide.

SLC data
Figure: Histogram of 190 SLC measurements and suggestive normal mixture models with 2 and 3 components.
[Plot: horizontal axis, SLC measurement; vertical axis, density. Fitted curves: two-component mixture with unequal variances; three-component mixture with equal variance.]

Reading from the histogram and fits
It is not apparent whether the 2-component or the 3-component model is the “correct model”.
A rigorous statistical analysis would help shed light on which of the two competing models is preferable.
One may take a model selection approach, a diagnostic approach, and so on to answer this question.
A statistical hypothesis test is likely the most desirable approach.

Density function of a finite mixture
Let $\{ f(x; \theta) : \theta \in \Theta \}$ be a parametric distribution family, where $\Theta$ is the parameter space for $\theta$.
A finite mixture model is a class of distributions with density function of the form
$$ f(x; \Psi) = \sum_{h=1}^{m} \alpha_h f(x; \theta_h). $$
$f(x; \theta)$: kernel/component density function.
$m$: order of the finite mixture model.
$\theta_h$: the parameter of the $h$th sub-population.
$\alpha_h$: the proportion of the $h$th sub-population.
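The mixture density above can be evaluated directly from its definition. The following is a minimal sketch, assuming a normal kernel; the function names and the example weights and parameters are illustrative and do not come from the slides.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x; plays the role of the kernel f(x; theta)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, params, kernel=normal_pdf):
    """Evaluate f(x; Psi) = sum over h of alpha_h * f(x; theta_h)."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("mixing proportions must be non-negative and sum to 1")
    return sum(a * kernel(x, *theta) for a, theta in zip(weights, params))

# Illustrative (made-up) two-component example, m = 2.
weights = [0.3, 0.7]               # alpha_1, alpha_2
params = [(0.0, 1.0), (3.0, 1.0)]  # theta_h = (mean, sd)
print(mixture_pdf(1.0, weights, params))
```

With a single component (m = 1) the function reduces to the kernel itself, which is a quick sanity check on any implementation.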

Mixing distribution
One may put all parameters into a mixing distribution:
$$ \Psi(\theta) = \sum_{h=1}^{m} \alpha_h I(\theta_h \le \theta). $$
$\Psi(\theta)$ is a distribution on $\Theta$ with $m$ support points.
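As a small worked illustration of this definition, the sketch below evaluates the step function $\Psi(\theta)$ for a scalar parameter. The scalar restriction, the function name and the example values are assumptions made for illustration.

```python
def mixing_cdf(theta, weights, support):
    """Psi(theta) = sum over h of alpha_h * I(theta_h <= theta), for scalar theta."""
    return sum(a for a, t in zip(weights, support) if t <= theta)

# Illustrative mixing distribution with m = 2 support points.
weights = [0.3, 0.7]
support = [0.0, 3.0]
for theta in (-1.0, 0.0, 2.0, 3.0):
    print(theta, mixing_cdf(theta, weights, support))
```

The function jumps by $\alpha_h$ at each support point $\theta_h$ and is flat elsewhere, which is what "a distribution with $m$ support points" means here.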

Density function of a 2-component normal mixture
[Figure: density curve of a 2-component normal mixture; horizontal axis from about −4 to 6, vertical axis (density) from 0 to 0.6.]

Incomplete data structure
A random variable $X$ from a finite mixture model can be regarded as generated in two steps.
In the first step, a value of $\theta$ is generated from the mixing distribution $\Psi$. When $\Psi$ is discrete, this $\theta$ is labelled by $h$, the $h$th subpopulation.
Given $\theta_h$, $X$ is a random outcome from sub-population $f(x; \theta_h)$.
Thus, the data from mixture models are “by definition” incomplete observations: the label $h$ is not observed.
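The two-step mechanism can be sketched as a small simulation. This assumes normal components; the seed, weights and component parameters are made up for illustration.

```python
import random

def sample_mixture(n, weights, params, rng):
    """Draw n (label, value) pairs by the two-step scheme described above."""
    draws = []
    for _ in range(n):
        # Step 1: pick the subpopulation label h from the mixing distribution.
        h = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: given theta_h = (mean, sd), draw X from that component.
        mean, sd = params[h]
        draws.append((h, rng.gauss(mean, sd)))
    return draws

rng = random.Random(2011)
complete = sample_mixture(1000, [0.3, 0.7], [(0.0, 1.0), (3.0, 1.0)], rng)
observed = [x for _, x in complete]  # the labels h are discarded: incomplete data
print(len(observed), sum(h for h, _ in complete) / len(complete))
```

Only `observed` corresponds to what the analyst sees; the labels in `complete` are the missing part of the data.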

Genetic example and the mixture model
An individual can have genotype $AA$, $Aa$ or $aa$.
The SLC activity level of a randomly selected individual has density function
$$ f(x; \Psi) = \sum_{h \in \{AA, Aa, aa\}} \alpha_h \phi(x; \mu_h, \sigma_h^2), $$
where $\phi(x; \mu_h, \sigma_h^2)$ is the normal density with mean $\mu_h$ and variance $\sigma_h^2$.
The genotype of a sampled individual is generally unknown, particularly in this case.

Genetic question in statistical terminology
Ignoring some details, the statistical problem concerning the existence of a major gene is to test the null hypothesis $m = 1$ against $m > 1$. This is a homogeneity test.
To determine whether the major gene (allele) is additive or dominant, the statistical problem is to test the null hypothesis $m = 2$ against $m = 3$. This is a test of the order of the mixture model.

Two-component model
Given an iid sample $X_1, \ldots, X_n$ from a two-component mixture, the log-likelihood function of the mixing distribution is given by
$$ \ell_n(\alpha_1, \alpha_2, \theta_1, \theta_2) = \sum_{i=1}^{n} \log \{ \alpha_1 f(x_i; \theta_1) + \alpha_2 f(x_i; \theta_2) \}. $$
Is the underlying population in fact homogeneous? That is, does $\theta_1 = \theta_2$?
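The log-likelihood can be written out in a few lines. This is a minimal sketch, assuming a normal kernel with a common, known standard deviation and $\alpha_2 = 1 - \alpha_1$; the function names and sample values are illustrative.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Kernel f(x; theta): normal density with mean theta and known sd (assumed)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def loglik_two_component(xs, alpha1, theta1, theta2, sd=1.0):
    """l_n = sum_i log{ alpha1 * f(x_i; theta1) + alpha2 * f(x_i; theta2) }."""
    alpha2 = 1.0 - alpha1
    return sum(
        math.log(alpha1 * normal_pdf(x, theta1, sd) + alpha2 * normal_pdf(x, theta2, sd))
        for x in xs
    )

xs = [0.1, 1.5, -0.7, 2.2]  # made-up observations
print(loglik_two_component(xs, 0.4, 0.0, 2.0))
# When theta1 == theta2 the value does not depend on alpha1:
print(loglik_two_component(xs, 0.2, 1.0, 1.0), loglik_two_component(xs, 0.9, 1.0, 1.0))
```

The second print line shows that under homogeneity ($\theta_1 = \theta_2$) the mixing proportion drops out of the likelihood.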

Likelihood ratio test (LRT) for homogeneity
The standard approach is to compute the likelihood ratio test statistic:
$$ R_n = 2 \{ \sup \ell_n(\alpha_1, \alpha_2, \theta_1, \theta_2) - \sup \ell_n(\alpha_1, \alpha_2, \theta, \theta) \}. $$
Reject $H_0$ if $R_n$ is larger than some threshold value.
This leaves only the technical issue of computing a proper threshold value.
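To make the statistic concrete, the sketch below approximates $R_n$ for a normal mixture in the mean. It rests on several assumptions not taken from the slides: unit variance is treated as known, the supremum under the alternative is approximated by running a plain EM iteration from a few quantile-based starting points (EM may stop at a local maximum), and the data are simulated.

```python
import math
import random

def normal_pdf(x, mean):
    """Density of N(mean, 1); the known unit variance is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def loglik(xs, alpha, theta1, theta2):
    """l_n for the mixture alpha*N(theta1, 1) + (1 - alpha)*N(theta2, 1)."""
    return sum(
        math.log(alpha * normal_pdf(x, theta1) + (1.0 - alpha) * normal_pdf(x, theta2))
        for x in xs
    )

def em_fit(xs, alpha, theta1, theta2, max_iter=500, tol=1e-9):
    """Run EM from one starting point and return the log-likelihood reached."""
    n = len(xs)
    current = loglik(xs, alpha, theta1, theta2)
    for _ in range(max_iter):
        # E-step: posterior probability that each x_i came from component 1.
        w = []
        for x in xs:
            p1 = alpha * normal_pdf(x, theta1)
            p2 = (1.0 - alpha) * normal_pdf(x, theta2)
            w.append(p1 / (p1 + p2))
        s = sum(w)
        if s <= 0.0 or s >= n:  # degenerate weights: stop rather than divide by zero
            break
        # M-step: weighted proportion and weighted means.
        alpha = s / n
        theta1 = sum(wi * x for wi, x in zip(w, xs)) / s
        theta2 = sum((1.0 - wi) * x for wi, x in zip(w, xs)) / (n - s)
        updated = loglik(xs, alpha, theta1, theta2)
        improved = updated - current
        current = max(current, updated)
        if improved < tol:
            break
    return current

def lrt_homogeneity(xs):
    """Approximate R_n = 2 * (sup under two components - sup under theta1 == theta2)."""
    n = len(xs)
    ordered = sorted(xs)
    theta_null = sum(xs) / n                # MLE of the common mean under H0
    null_ll = loglik(xs, 0.5, theta_null, theta_null)
    best = null_ll                          # the null model is nested in the alternative
    for frac in (0.25, 0.5, 0.75):          # a few quantile-split starting points
        k = max(1, min(n - 1, int(frac * n)))
        start1 = sum(ordered[:k]) / k
        start2 = sum(ordered[k:]) / (n - k)
        best = max(best, em_fit(xs, frac, start1, start2))
    return 2.0 * (best - null_ll)

rng = random.Random(1)
homogeneous = [rng.gauss(0.0, 1.0) for _ in range(200)]
mixed = [rng.gauss(0.0, 1.0) if rng.random() < 0.5 else rng.gauss(4.0, 1.0)
         for _ in range(200)]
print(lrt_homogeneity(homogeneous), lrt_homogeneity(mixed))
```

Computing $R_n$ is the easy part; deciding how large it must be before rejecting $H_0$ is the threshold question raised on the slide.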

The technical issue is challenging
For regular models, $R_n$ has an asymptotic chi-squared distribution under the null hypothesis.
Chi-squared distributions are well documented and easily computed numerically.
Hence, for hypothesis testing under regular models, a proper threshold value can easily be determined from the chi-squared distribution.
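For the regular case described here, the threshold is simply a chi-squared quantile. The sketch below covers only one degree of freedom, where the quantile equals the square of a standard normal quantile; the function name and the 5% level are illustrative choices. It says nothing about the correct threshold for the mixture problem, which is the difficulty the slide title refers to.

```python
from statistics import NormalDist

def chi2_1_critical_value(level=0.05):
    """Upper-`level` critical value of the chi-squared distribution with 1 df.

    If Z ~ N(0, 1) then Z^2 ~ chi-squared(1), so the quantile is z_{1 - level/2}^2.
    """
    z = NormalDist().inv_cdf(1.0 - level / 2.0)
    return z * z

print(round(chi2_1_critical_value(0.05), 3))  # 3.841
```

In a regular one-parameter problem one would reject $H_0$ at the 5% level when $R_n$ exceeds this value.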
