Learning Mixtures of Truncated Basis Functions from Data Helge - PowerPoint PPT Presentation

Learning Mixtures of Truncated Basis Functions from Data Helge Langseth, Thomas D. Nielsen, and Antonio Salmerón PGM 2012 This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (Nils mobility project). Supported and Coordinated by Universidad Complutense de Madrid, by the Spanish Ministry of Science and Innovation through projects TIN2010-20900-C04-02-03, and by ERDF (FEDER) funds. Learning MoTBFs from data 1

Background: Approximations Learning MoTBFs from data Background: Approximations 2

Geometry of approximations A quick recall of how of how to do approximations in R n : 5 4 3 2 1 5 4 0 3 0 1 2 2 3 1 4 5 0 We want to approximate the vector f = (3 , 2 , 5) with A vector along e 1 = (1 , 0 , 0) . Learning MoTBFs from data Background: Approximations 2

Geometry of approximations A quick recall of how of how to do approximations in R n : 5 4 3 2 1 5 4 0 3 0 1 2 2 3 1 4 5 0 We want to approximate the vector f = (3 , 2 , 5) with A vector along e 1 = (1 , 0 , 0) . Best choice is � f , e 1 � · e 1 = (3 , 0 , 0) . Learning MoTBFs from data Background: Approximations 2

Geometry of approximations A quick recall of how of how to do approximations in R n : 5 4 3 2 1 5 4 0 3 0 1 2 2 3 1 4 5 0 We want to approximate the vector f = (3 , 2 , 5) with A vector along e 1 = (1 , 0 , 0) . Best choice is � f , e 1 � · e 1 = (3 , 0 , 0) . Now, add a vector along e 2 . Best choice is � f , e 2 � · e 2 , independently of the choice made for e 1 . Also, the choice we made for e 1 is still optimal since e 1 ⊥ e 2 . Best approximation is in general � ℓ � f , e ℓ � · e ℓ . Learning MoTBFs from data Background: Approximations 2

Geometry of approximations A quick recall of how of how to do approximations in R n : 5 4 3 2 1 5 4 0 3 0 1 2 2 3 1 4 5 0 We want to approximate the vector f = (3 , 2 , 5) with A vector along e 1 = (1 , 0 , 0) . Best choice is � f , e 1 � · e 1 = (3 , 0 , 0) . Now, add a vector along e 2 . Best choice is � f , e 2 � · e 2 , independently of the choice made for e 1 . Also, the choice we made for e 1 is still optimal since e 1 ⊥ e 2 . Best approximation is in general � ℓ � f , e ℓ � · e ℓ . All of this maps over to approximations of functions! We only need a definition of the inner product and the equivalent to orthonormal basis vectors. Learning MoTBFs from data Background: Approximations 2

Geometry of approximations A quick recall of how of how to do approximations in R n : 5 4 3 2 1 5 4 0 3 0 1 2 2 3 1 4 5 0 We want to approximate the vector f = (3 , 2 , 5) with A vector along e 1 = (1 , 0 , 0) . Best choice is � f , e 1 � · e 1 = (3 , 0 , 0) . Now, add a vector along e 2 . Best choice is � f , e 2 � · e 2 , independently of the choice made for e 1 . Also, the choice we made for e 1 is still optimal since e 1 ⊥ e 2 . Best approximation is in general � ℓ � f , e ℓ � · e ℓ . Inner product for functions For two functions u ( · ) and v ( · ) defined on Ω ⊆ R , we use � u, v � = � Ω u ( x ) v ( x ) d x . Learning MoTBFs from data Background: Approximations 2

Generalised Fourier series Definition (Legal set of basis functions) Let Ψ = { ψ i } ∞ i =0 be an indexed set of basis functions . Let Q be the set of all linear combination of functions in Ψ . Ψ is a legal set of basis functions if: ψ 0 is constant; 1 u ∈ Q and v ∈ Q implies that ( u · v ) ∈ Q ; 2 For any pair of real numbers s and t , s � = t , there exists a function ψ i ∈ Ψ 3 s.t. ψ i ( s ) � = ψ i ( t ) . Legal basis functions { 1 , x, x 2 , x 3 , . . . } is a legal set of basis functions. { 1 , exp( − x ) , exp( x ) , exp( − 2 x ) , exp(2 x ) , . . . } is also legal. { 1 , log( x ) , log(2 x ) , log(3 x ) , . . . } is not a legal set of basis functions. Learning MoTBFs from data Background: Approximations 3

Generalised Fourier series Definition (Legal set of basis functions) Let Ψ = { ψ i } ∞ i =0 be an indexed set of basis functions . Let Q be the set of all linear combination of functions in Ψ . Ψ is a legal set of basis functions if: ψ 0 is constant; 1 u ∈ Q and v ∈ Q implies that ( u · v ) ∈ Q ; 2 For any pair of real numbers s and t , s � = t , there exists a function ψ i ∈ Ψ 3 s.t. ψ i ( s ) � = ψ i ( t ) . Generalized Fourier series Assume Ψ is legal and contains orthonormal basis functions (if not, they can be made orthonormal through a Gram-Schmidt process). Then, the Generalized Fourier Series approximation to a function f is defined as ˆ f ( · ) = � ℓ � f, ψ ℓ � ψ ℓ ( · ) . Learning MoTBFs from data Background: Approximations 3

Generalised Fourier series Definition (Legal set of basis functions) Let Ψ = { ψ i } ∞ i =0 be an indexed set of basis functions . Let Q be the set of all linear combination of functions in Ψ . Ψ is a legal set of basis functions if: ψ 0 is constant; 1 u ∈ Q and v ∈ Q implies that ( u · v ) ∈ Q ; 2 For any pair of real numbers s and t , s � = t , there exists a function ψ i ∈ Ψ 3 s.t. ψ i ( s ) � = ψ i ( t ) . Important properties Any function – including density functions – can be approximated 1 arbitrarily well by this approach. 2 � � 2 � � 2 � � f ( x ) − � k f ( x ) − � k ℓ =0 c i ψ ℓ ( x ) d x ≥ ℓ =0 � f, ψ ℓ � ψ ℓ ( x ) d x , Ω Ω so the generalized Fourier series approximation is optimal in L 2 sense. Learning MoTBFs from data Background: Approximations 3

MoTBFs Learning MoTBFs from data MoTBFs 4

The marginal MoTBF potential Definition Let Ψ = { ψ i } ∞ i =0 with ψ i : R �→ R define a legal set of basis functions on Ω ⊆ R . Then g k : Ω �→ R + 0 is an MoTBF potential at level k wrt. Ψ . . . if 1 k � g k ( x ) = a i ψ i ( x ) i =0 for all x ∈ Ω , where a i are real constants; . . . or there is a partition of Ω into intervals I 1 , . . . , I m s.t. g k is defined as 2 above on each I j . Special cases An MoTBFs potential at level k = 0 is simply a standard discretisation. MoPs (original definition) and MTEs are also special cases of MoTBFs. Learning MoTBFs from data MoTBFs 4

The marginal MoTBF potential Definition Let Ψ = { ψ i } ∞ i =0 with ψ i : R �→ R define a legal set of basis functions on Ω ⊆ R . Then g k : Ω �→ R + 0 is an MoTBF potential at level k wrt. Ψ . . . if 1 k � g k ( x ) = a i ψ i ( x ) i =0 for all x ∈ Ω , where a i are real constants; . . . or there is a partition of Ω into intervals I 1 , . . . , I m s.t. g k is defined as 2 above on each I j . Simplification We do not utilize the option to split the domain into subdomains here. Learning MoTBFs from data MoTBFs 4

Example: Polynomials vs. the Std. Gaussian 0.4 0.4 0.4 0.35 0.35 0.35 0.3 0.3 0.3 0.25 0.25 0.25 0.2 0.2 0.2 0.15 0.15 0.15 0.1 0.1 0.1 0.05 0.05 0.05 0 0 0 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 -3 -2 -1 0 1 2 3 g 0 = 0 . 4362 · ψ 0 g 2 = 0 . 4362 · ψ 0 + g 8 = 0 . 4362 · ψ 0 + 0 · ψ 1 + 0 · ψ 1 + . . − 0 . 1927 · ψ 2 . . . . 0 . 0052 · ψ 8 Use orthonormal polynomials (shifted & scaled Legendre polynomials). Approximation always integrates to unity . Direct computations give the g k closest in L 2 -norm. Positivity constraint and KL minimisation � convex optimization. Learning MoTBFs from data MoTBFs 5

Learning Univariate Distributions Learning MoTBFs from data Learning Univariate Distributions 6

Relationship between KL and ML Idea for learning MoTBFs from data Generate a kernel density for a (marginal) probability distribution, and use the translation-scheme to approximate it with an MoTBF. Learning MoTBFs from data Learning Univariate Distributions 6

Relationship between KL and ML Idea for learning MoTBFs from data Generate a kernel density for a (marginal) probability distribution, and use the translation-scheme to approximate it with an MoTBF. Setup Let f ( x ) be the density generating { x 1 , . . . , x N } . Let g k ( x | θ ) = � k i =0 θ i · ψ i ( x ) be an MoTBF of order k . Let h N ( x ) be a kernel density estimator. Result: KL minimization is likelihood maximization in the limit Let ˆ θ N = arg min θ D ( h N ( · ) � g k ( ·| θ ) ) . Then ˆ θ N converges to the maximum likelihood estimator of θ as N → ∞ (given certain regularity conditions). Learning MoTBFs from data Learning Univariate Distributions 6

Example: Learning the standard Gaussian 0.6 0.4 0.2 0 -3 -2 -1 0 1 2 3 Density estimate; 50 samples. Learning MoTBFs from data Learning Univariate Distributions 7

Example: Learning the standard Gaussian 0.6 0.4 0.2 0 -3 -2 -1 0 1 2 3 Density estimate; 50 samples. g 0 : BIC = − 91 . 54 . Learning MoTBFs from data Learning Univariate Distributions 7

Example: Learning the standard Gaussian 0.6 0.4 0.2 0 -3 -2 -1 0 1 2 3 Density estimate; 50 samples. g 0 : BIC = − 91 . 54 . g 2 : BIC = − 83 . 21 . Learning MoTBFs from data Learning Univariate Distributions 7

Example: Learning the standard Gaussian 0.6 0.4 0.2 0 -3 -2 -1 0 1 2 3 Density estimate; 50 samples. g 0 : BIC = − 91 . 54 . g 2 : BIC = − 83 . 21 . g 4 : BIC = − 76 . 13 . Learning MoTBFs from data Learning Univariate Distributions 7

Learning Mixtures of Truncated Basis Functions from Data Helge - PowerPoint PPT Presentation

Learning Mixtures of Truncated Basis Functions from Data Helge Langseth, Thomas D. Nielsen, and Antonio Salmern PGM 2012 This work is supported by an Abel grant from Iceland, Liechtenstein, and Norway through the EEA Financial Mechanism (Nils

Truncated Differentials Lars R. Knudsen June 2014 Lars R. Knudsen Truncated Differentials

Learning Conditional Distributions using Mixtures of Truncated Basis Functions Inmaculada

Analysis of a model of elastic plastic mixtures (Prandtl-Reuss-mixtures) Project of Josef

On truncated discrete moment problems Tobias Kuna University of Reading, UK (Joint work with

Maximum Likelihood vs. Least Squares for Estimating Mixtures of Truncated Exponentials Helge

Parameter Estimation in Mixtures of Truncated Exponentials Helge Langseth 1 Thomas D. Nielsen 2

Inference in hybrid Bayesian networks with H. Langseth, T.D. Nielsen, R. Rum , mixtures of

Release granular mushrooms Release granular mushrooms and dried mixtures and dried mixtures

The science of mixtures and separation techniques Rahul Bhambure PhD Scientist, Chemical

Mixtures of models Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory

Orthonormal bases of functions April 24, 2018 Data - Vectors or Functions Vectors Functions

Minimax risk of truncated series estimators over symmetric convex polytopes Adel Javanmard

Bivariate Truncated Moment Problems with Algebraic Variety in the Nonnegative Quadrant in R 2

Truncated Sums, Matrix Iteration Giacomo Boffi

Asymmetric truncated Toeplitz operators of rank one Bartosz anucha Maria Curie-Skodowska

Truncated Sums, Matrix Iteration Giacomo Boffi

State of Reliability Report Preview of Key Findings Mark Lauby, Senior Vice President and Chief

Labour market integration in Ireland: migration channels, ethnicity and Irish citizenship DATE

Relevant Persons of Northern Ireland and the EU Settlement Scheme Statement of changes in

and their impact on research led teaching in Classics. Simon Mahony (University College London)

Chapter 1: Programming Principles Abstraction and Information Hiding Abstraction Object

Operational readiness in transition Sven Ternov, LFV/Swedavia Barry Kirwan, EEC Paul Humphreys,

Autonomous Intelligent Robotics Instructor: Shiqi Zhang

The Chronology of OECD Waste Decisions, EU -Legislation, Basel Convention Source: OECD