Approximate Kernel Methods and Learning on Aggregates


SLIDE 1

Approximate Kernel Methods and Learning on Aggregates

Dino Sejdinovic joint work with Leon Law, Seth Flaxman, Dougal Sutherland, Kenji Fukumizu, Ewan Cameron, Tim Lucas, Katherine Battle (and many others)

Department of Statistics University of Oxford

GPSS Workshop on Advances in Kernel Methods, Sheffield 06/09/2018

SLIDE 2

Learning on Aggregates

• Supervised learning: obtaining inputs has a lower cost than obtaining outputs/labels, hence we build a (predictive) functional relationship or a conditional probabilistic model of outputs given inputs.
• Semi-supervised learning: because of the lower cost, there are many more unlabelled than labelled inputs.
• Weakly supervised learning on aggregates: because of the lower cost, inputs are at a much higher resolution than outputs.

Figure: left: Malaria incidences reported per administrative unit; centre: land surface temperature at night; right: topographic wetness index.

SLIDE 3

Outline

1. Preliminaries on Kernels and GPs
2. Bayesian Approaches to Distribution Regression
3. Variational Learning on Aggregates with GPs

SLIDE 5

Reproducing Kernel Hilbert Space (RKHS)

Definition ([Aronszajn, 1950; Berlinet & Thomas-Agnan, 2004])

Let X be a non-empty set and H be a Hilbert space of real-valued functions defined on X. A function k : X × X → R is called a reproducing kernel of H if:

1. $\forall x \in \mathcal{X}$: $k(\cdot, x) \in \mathcal{H}$, and
2. $\forall x \in \mathcal{X}, \forall f \in \mathcal{H}$: $\langle f, k(\cdot, x) \rangle_{\mathcal{H}} = f(x)$.

If $\mathcal{H}$ has a reproducing kernel, it is said to be a reproducing kernel Hilbert space (RKHS).

• Equivalent to the notion of a kernel as an inner product of features: any function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ for which there exists a Hilbert space $\mathcal{H}$ and a map $\varphi : \mathcal{X} \to \mathcal{H}$ s.t. $k(x, x') = \langle \varphi(x), \varphi(x') \rangle_{\mathcal{H}}$ for all $x, x' \in \mathcal{X}$.
• In particular, for any $x, y \in \mathcal{X}$, $k(x, y) = \langle k(\cdot, y), k(\cdot, x) \rangle_{\mathcal{H}} = \langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}}$. Thus $\mathcal{H}$ serves as a canonical feature space with feature map $x \mapsto k(\cdot, x)$.
• Equivalently, all evaluation functionals $f \mapsto f(x)$ are continuous (norm convergence implies pointwise convergence).
• Moore–Aronszajn theorem: every positive semidefinite $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel and has a unique RKHS $\mathcal{H}_k$.
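As an aside not on the slides, a minimal numpy sketch of the "kernel as inner product of features" view, using the polynomial kernel $k(x, y) = (x^\top y)^2$, whose feature map $\varphi(x) = \mathrm{vec}(xx^\top)$ can be written out explicitly:

```python
import numpy as np

def poly2_kernel(x, y):
    # k(x, y) = (x^T y)^2, a positive semidefinite kernel
    return np.dot(x, y) ** 2

def poly2_features(x):
    # explicit feature map phi(x) = vec(x x^T), so k(x, y) = <phi(x), phi(y)>
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
print(poly2_kernel(x, y), poly2_features(x) @ poly2_features(y))  # the two agree
```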

SLIDE 6

Reproducing Kernel Hilbert Space (RKHS)

The Gaussian RBF kernel $k(x, x') = \exp\left(-\frac{1}{2\gamma^2}\|x - x'\|^2\right)$ has an infinite-dimensional $\mathcal{H}$, with elements $h(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x)$ and their limits, which give the completion with respect to the inner product
$$\left\langle \sum_{i=1}^{n} \alpha_i k(x_i, \cdot),\; \sum_{j=1}^{m} \beta_j k(y_j, \cdot) \right\rangle_{\mathcal{H}} = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \beta_j k(x_i, y_j).$$
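A small numpy sketch (not from the slides) of this inner product on finite expansions: with Gram matrix $K_{ij} = k(x_i, y_j)$, the inner product is just $\alpha^\top K \beta$:

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 gamma^2))
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * gamma ** 2))

rng = np.random.default_rng(1)
X, Y = rng.normal(size=(4, 2)), rng.normal(size=(6, 2))
alpha, beta = rng.normal(size=4), rng.normal(size=6)

# <sum_i alpha_i k(x_i, .), sum_j beta_j k(y_j, .)>_H = alpha^T K(X, Y) beta
inner = alpha @ rbf(X, Y) @ beta
# squared RKHS norm of h = sum_i alpha_i k(x_i, .): alpha^T K(X, X) alpha >= 0
norm_sq = alpha @ rbf(X, X) @ alpha
```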

SLIDE 7

Kernel Trick and Kernel Mean Trick

• implicit feature map $x \mapsto k(\cdot, x) \in \mathcal{H}_k$ replaces $x \mapsto [\varphi_1(x), \ldots, \varphi_s(x)] \in \mathbb{R}^s$
• $\langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}_k} = k(x, y)$: inner products readily available
• nonlinear decision boundaries, nonlinear regression functions, learning on non-Euclidean/structured data

[Cortes & Vapnik, 1995; Schölkopf & Smola, 2001]

SLIDE 8

Kernel Trick and Kernel Mean Trick


RKHS embedding: implicit feature mean [Smola et al, 2007; Sriperumbudur et al, 2010; Muandet et al, 2017]

• $P \mapsto \mu_k(P) = \mathbb{E}_{X \sim P}\, k(\cdot, X) \in \mathcal{H}_k$ replaces $P \mapsto [\mathbb{E}\varphi_1(X), \ldots, \mathbb{E}\varphi_s(X)] \in \mathbb{R}^s$
• $\langle \mu_k(P), \mu_k(Q) \rangle_{\mathcal{H}_k} = \mathbb{E}_{X \sim P, Y \sim Q}\, k(X, Y)$: inner products easy to estimate
• nonparametric two-sample, independence, conditional independence, interaction testing, learning on distributions

[Gretton et al, 2005; Gretton et al, 2006; Fukumizu et al, 2007; DS et al, 2013; Muandet et al, 2012; Szabo et al, 2015]
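The inner product of embeddings is estimated by averaging the cross Gram matrix; a minimal sketch (the rbf helper and its bandwidth are illustrative assumptions):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * gamma ** 2))

rng = np.random.default_rng(2)
X = rng.normal(loc=0.0, size=(500, 2))    # sample from P
Y = rng.normal(loc=0.5, size=(600, 2))    # sample from Q
# <mu_k(P), mu_k(Q)>_H = E k(X, Y): estimated by averaging the cross Gram matrix
inner_PQ = rbf(X, Y).mean()
```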

SLIDE 9

Maximum Mean Discrepancy

Maximum Mean Discrepancy (MMD) [Borgwardt et al, 2006; Gretton et al, 2007] between P and Q:

$$\mathrm{MMD}_k(P, Q) = \|\mu_k(P) - \mu_k(Q)\|_{\mathcal{H}_k} = \sup_{f \in \mathcal{H}_k:\, \|f\|_{\mathcal{H}_k} \le 1} |\mathbb{E}f(X) - \mathbb{E}f(Y)|$$

Characteristic kernels: $\mathrm{MMD}_k(P, Q) = 0$ iff $P = Q$ (also metrizes weak* convergence [Sriperumbudur, 2010]).

• Gaussian RBF $\exp\left(-\frac{1}{2\sigma^2}\|x - x'\|_2^2\right)$, Matérn family, inverse multiquadrics.

Can encode structural properties in the data: kernels on non-Euclidean domains, networks, images, text...

SLIDE 10

GPs and RKHSs: shared mathematical foundations

The same notion of a (positive definite) kernel, but conceptual gaps between communities.

Orthogonal projection in RKHS ⇔ Conditioning in GPs.

Beware! 0/1 laws: GP sample paths with (infinite-dimensional) covariance kernel $k$ almost surely fall outside of $\mathcal{H}_k$.

  • But the space of sample paths is only slightly larger than Hk (outer shell).
  • It is typically also an RKHS (with another kernel).

Worst-case in RKHS ⇔ Average-case in GPs:
$$\mathrm{MMD}^2(P, Q; \mathcal{H}_k) = \sup_{\|f\|_{\mathcal{H}_k} \le 1} (Pf - Qf)^2 = \mathbb{E}_{f \sim GP(0, k)}\left[(Pf - Qf)^2\right].$$

Radford Neal, 1998: "prior beliefs regarding the true function being modeled and expectations regarding the properties of the best predictor for this function [...] need not be at all similar."

Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences
M. Kanagawa, P. Hennig, DS, and B. K. Sriperumbudur
ArXiv e-prints:1807.02582, https://arxiv.org/abs/1807.02582
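A quick numeric check of the worst-case/average-case identity, as a sketch under a simplifying assumption: discrete measures supported on a grid, so that $\mathrm{MMD}^2 = w^\top K w$ with $w = p - q$, compared to a Monte Carlo average over GP draws:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
grid = np.linspace(-3, 3, n)[:, None]
K = np.exp(-((grid - grid.T) ** 2) / 2)           # Gaussian kernel Gram on the grid

# two discrete distributions P, Q supported on the grid
p = np.exp(-grid.ravel() ** 2); p /= p.sum()
q = np.exp(-(grid.ravel() - 1) ** 2); q /= q.sum()
w = p - q

mmd2 = w @ K @ w                                  # ||mu_k(P) - mu_k(Q)||^2 in closed form

# average case: draw f ~ GP(0, k) on the grid, average (Pf - Qf)^2 over draws
L = np.linalg.cholesky(K + 1e-6 * np.eye(n))      # jitter for numerical stability
f = L @ rng.normal(size=(n, 20000))               # 20000 approximate GP sample paths
avg = ((w @ f) ** 2).mean()
print(mmd2, avg)                                  # the two agree up to MC error
```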

SLIDE 11

Some uses of MMD

within-sample average similarity − between-sample average similarity

Figure by Arthur Gretton: within-sample terms $k(\mathrm{dog}_i, \mathrm{dog}_j)$, $k(\mathrm{fish}_i, \mathrm{fish}_j)$ versus between-sample terms $k(\mathrm{dog}_i, \mathrm{fish}_j)$.

MMD has been applied to:
• two-sample tests and independence tests (on graphs, text, audio...) [Gretton et al, 2009; Gretton et al, 2012]
• model criticism and interpretability [Lloyd & Ghahramani, 2015; Kim, Khanna & Koyejo, 2016]
• analysis of Bayesian quadrature [Briol et al, 2018]
• ABC summary statistics [Park, Jitkrittum & DS, 2015; Mitrovic, DS & Teh, 2016]
• summarising streaming data [Paige, DS & Wood, 2016]
• traversal of manifolds learned by convolutional nets [Gardner et al, 2015]
• MMD-GAN: training deep generative models [Dziugaite, Roy & Ghahramani, 2015; Sutherland et al, 2017; Li et al, 2017]

$$\mathrm{MMD}^2_k(P, Q) = \mathbb{E}_{X, X' \overset{\text{i.i.d.}}{\sim} P}\, k(X, X') + \mathbb{E}_{Y, Y' \overset{\text{i.i.d.}}{\sim} Q}\, k(Y, Y') - 2\, \mathbb{E}_{X \sim P,\, Y \sim Q}\, k(X, Y).$$

SLIDE 12

Some uses of MMD

The corresponding unbiased empirical estimate, from samples $\{X_i\}_{i=1}^{n_x} \sim P$ and $\{Y_j\}_{j=1}^{n_y} \sim Q$:

$$\widehat{\mathrm{MMD}}^2_k(P, Q) = \frac{1}{n_x(n_x - 1)} \sum_{i \ne j} k(X_i, X_j) + \frac{1}{n_y(n_y - 1)} \sum_{i \ne j} k(Y_i, Y_j) - \frac{2}{n_x n_y} \sum_{i, j} k(X_i, Y_j).$$
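The estimator in code; a minimal numpy sketch (the rbf helper and bandwidth are illustrative assumptions):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * gamma ** 2))

def mmd2_unbiased(X, Y, gamma=1.0):
    # unbiased estimate: within-sample diagonal terms k(X_i, X_i) are excluded
    nx, ny = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, gamma), rbf(Y, Y, gamma), rbf(X, Y, gamma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (nx * (nx - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (ny * (ny - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
Y = rng.normal(loc=0.3, size=(250, 2))
print(mmd2_unbiased(X, Y))
```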

SLIDE 13

Kernel Embeddings for Distribution Regression

Labels $y_i = f(P_i)$, but we observe only $\{x_i^j\}_{j=1}^{N_i} \sim P_i$.

The goal: build a predictive model $\hat{y}_\star = f(\{x_\star^j\}_{j=1}^{N_\star})$ for a new sample $\{x_\star^j\}_{j=1}^{N_\star} \sim P_\star$.

Represent each sample with the empirical mean embedding $\hat{\mu}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} k(\cdot, x_i^j) \in \mathcal{H}_k$.

Now we can use the induced inner product structure on empirical measures to build a regression model:

• Linear kernel on the RKHS: $K(\hat{\mu}_i, \hat{\mu}_j) = \langle \hat{\mu}_i, \hat{\mu}_j \rangle_{\mathcal{H}_k} = \frac{1}{N_i N_j} \sum_{r,s} k(x_i^r, x_j^s)$
• Gaussian kernel on the RKHS: $K(\hat{\mu}_i, \hat{\mu}_j) = \exp\left(-\gamma \|\hat{\mu}_i - \hat{\mu}_j\|^2_{\mathcal{H}_k}\right) = \exp\left(-\gamma\, \mathrm{MMD}^2_k(P_i, P_j)\right)$
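Both set kernels reduce to averages of the Gram matrix between bags; a sketch with illustrative helper names and an assumed RBF base kernel:

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * gamma ** 2))

def linear_bag_kernel(bag_i, bag_j, gamma=1.0):
    # K(mu_i, mu_j) = (1 / (N_i N_j)) sum_{r,s} k(x_i^r, x_j^s)
    return rbf(bag_i, bag_j, gamma).mean()

def gaussian_bag_kernel(bag_i, bag_j, gamma=1.0, gamma_outer=1.0):
    # ||mu_i - mu_j||^2_H = <mu_i, mu_i> + <mu_j, mu_j> - 2 <mu_i, mu_j>
    mmd2 = (linear_bag_kernel(bag_i, bag_i, gamma)
            + linear_bag_kernel(bag_j, bag_j, gamma)
            - 2 * linear_bag_kernel(bag_i, bag_j, gamma))
    return np.exp(-gamma_outer * mmd2)

rng = np.random.default_rng(5)
bags = [rng.normal(loc=m, size=(rng.integers(20, 50), 2)) for m in (0.0, 0.2, 1.0)]
G = np.array([[gaussian_bag_kernel(a, b) for b in bags] for a in bags])
# G can now be plugged into e.g. kernel ridge regression on bag-level labels y_i
```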

SLIDE 15

Kernel Embeddings for Distribution Regression

supervised learning where labels are available at the group, rather than at the individual level.

Figures: from Flaxman et al, 2015 (bags of individuals $x_i^j$ with mean embeddings $\mu_i$ in feature space, e.g. women/men subgroups per region, regressed onto the aggregate % vote for Obama; regions 1–3 with observed labels $y_1, y_2, y_3$) and from Mooij et al, 2014.

• classifying text based on word features [Yoshikawa et al, 2014; Kusner et al, 2015]
• aggregate voting behaviour of demographic groups [Flaxman et al, 2015; 2016]
• image labels based on a distribution of small patches [Szabo et al, 2016]
• "traditional" parametric statistical inference by learning a function from sets of samples to parameters: ABC [Mitrovic et al, 2016], EP [Jitkrittum et al, 2015]
• identifying the cause-effect direction between a pair of variables from a joint sample [Lopez-Paz et al, 2015]

SLIDE 16

Next:

How to model uncertainty of kernel embeddings when learning on aggregates?

• A simple Bayesian (GP) model for kernel mean embeddings leads to shrinkage estimators with better predictive performance in high-noise regimes.

How to predict on individual inputs when only aggregate count data is available?

• Variational bounds leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account.

SLIDE 17

Outline

1. Preliminaries on Kernels and GPs
2. Bayesian Approaches to Distribution Regression
3. Variational Learning on Aggregates with GPs

SLIDE 18

Uncertainty in Bag Sizes

Recall: we represent each sample with the empirical mean embedding $\hat{\mu}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} k(\cdot, x_i^j) \in \mathcal{H}_k$.

An empirical mean in an infinite-dimensional space? Stein's phenomenon? Shrinkage estimators can be better behaved [Muandet et al, 2013].

These inputs (with or without shrinkage) are noisy: we do not observe the true embedding $\mu_i$. Moreover, bags with small $N_i$ are noisier. Can this uncertainty be included in the predictive model?

Bayesian Approaches to Distribution Regression
Ho Chung Leon Law, Dougal Sutherland, DS, and Seth Flaxman
AISTATS 2018, http://proceedings.mlr.press/v84/law18a.html

SLIDE 19

Uncertainty in Mean Embeddings

The empirical mean embedding is $\hat{\mu}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} k(\cdot, x_i^j) \in \mathcal{H}_k$.

Bayesian model for kernel mean embeddings [Flaxman, DS, Cunningham & Filippi, UAI 2016]:

• Place a prior on the RKHS: $\mu_i \sim GP(m_0(\cdot), r(\cdot, \cdot))$ (requires care due to 0/1 laws [Kallianpur, 1970; Wahba, 1990; Steinwart, 2014+]).
• Posit a normal likelihood for the evaluations of the embedding at a set of points $u$: $\hat{\mu}_i(u) \mid \mu_i(u) \sim N(\mu_i(u), \Sigma_i / N_i)$.
• Leads to a closed-form GP posterior $\mu_i \mid \{x_i^j\}$:
$$\mu_i(z) \mid \{x_i^j\} \sim N\Big( R_{zu}(R_{uu} + \Sigma_i/N_i)^{-1}(\hat{\mu}_i - m_0) + m_0,\;\; R_{zz} - R_{zu}(R_{uu} + \Sigma_i/N_i)^{-1} R_{uz} \Big)$$
• Recovers the frequentist shrinkage estimator of mean embeddings [Muandet et al, 2013] (but with $r$ instead of $k$), similar to the James–Stein estimator.
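A sketch of the posterior computation, under simplifying assumptions beyond what the slide states: prior mean $m_0 = 0$, likelihood covariance $\Sigma_i = \tau^2 I$, and an RBF kernel standing in for $r$ (here reused for $k$ as well):

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # kernel used both for the embedding (k) and the GP prior (r) in this sketch
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2 * gamma ** 2))

def embedding_posterior(z, u, mu_hat_u, N_i, tau2=1.0):
    """Posterior of mu_i at points z given noisy evaluations mu_hat_u at points u.
    Assumes m_0 = 0 and Sigma_i = tau2 * I (simplifying assumptions)."""
    S = rbf(u, u) + (tau2 / N_i) * np.eye(len(u))    # R_uu + Sigma_i / N_i
    Rzu = rbf(z, u)
    mean = Rzu @ np.linalg.solve(S, mu_hat_u)        # shrinks towards m_0 = 0
    cov = rbf(z, z) - Rzu @ np.linalg.solve(S, Rzu.T)
    return mean, cov

rng = np.random.default_rng(6)
x_bag = rng.normal(size=(15, 2))                     # small bag: a noisy embedding
u = rng.normal(size=(10, 2))
mu_hat_u = rbf(u, x_bag).mean(axis=1)                # empirical embedding at u
mean, cov = embedding_posterior(u, u, mu_hat_u, N_i=len(x_bag))
# shrinkage towards the prior mean is stronger for smaller N_i
```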

SLIDE 20

Distribution Regression Model

Model label as a function of the "true" kernel mean embedding: $y_i = f(\mu_i) + \epsilon$, $\mu_i = \mathbb{E}_{X \sim P_i} k(\cdot, X)$.

Linear model on the evaluation of the kernel mean embedding at a set of "landmark points" $z$: $f(\mu_i) = \beta^\top \mu_i(z)$.

Can model uncertainty in $\beta$ (BLR) or in $\mu_i$ (shrinkage) or in both (BDR, which requires MCMC due to non-conjugacy).

Shrinkage: integrate the likelihood $y_i \sim N(f(\mu_i), \sigma^2)$ through the posterior $\mu_i \mid \{x_i^j\}$ to obtain

$$y_i \mid \{x_i^j\}, \beta \sim N(\xi_i^\beta, \nu_i^\beta),$$
$$\xi_i^\beta = \beta^\top R_{z x_i} \Big( R_{x_i x_i} + \tfrac{\Sigma_i}{N_i} \Big)^{-1} (\hat{\mu}_i - m_0) + \beta^\top m_0, \qquad \nu_i^\beta = \beta^\top \Big( R_{zz} - R_{z x_i} \Big( R_{x_i x_i} + \tfrac{\Sigma_i}{N_i} \Big)^{-1} R_{x_i z} \Big) \beta + \sigma^2.$$

Can be optimized to find MAP of $\beta$, $\sigma^2$, kernel parameters, locations of landmark points, ...
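The predictive moments are plain linear algebra once the kernel matrices are assembled; a sketch assuming $m_0 = 0$ for brevity (all names are illustrative and follow the slide's notation):

```python
import numpy as np

def shrinkage_predictive(beta, Rzx, Rxx, Rzz, mu_hat_x, Sigma_over_N, sigma2):
    """Predictive mean xi and variance nu of y_i | {x_i^j}, beta.
    Assumes prior mean m_0 = 0; Rzx = R_{z x_i}, Rxx = R_{x_i x_i}, Rzz = R_{zz}."""
    S = Rxx + Sigma_over_N                                      # R_{x_i x_i} + Sigma_i/N_i
    xi = beta @ (Rzx @ np.linalg.solve(S, mu_hat_x))            # predictive mean
    nu = beta @ (Rzz - Rzx @ np.linalg.solve(S, Rzx.T)) @ beta + sigma2
    return xi, nu
```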

SLIDE 21

Age prediction from images

IMDb-Wiki database of images with age labels.

• Very noisy labels in the dataset.

Distribution regression: group pictures of actors, predict mean age.

Image features: last hidden layer from a convolutional neural network by [Rothe et al, IJCV 2016].

Lots of variation in $N_i$:

Figure: Histogram of bag sizes $N_i$ (examples: Jennifer Aniston, Brad Pitt, Angelina Jolie); $N_i = 1$ for 23% of bags.

SLIDE 22

Age prediction from images

Propagating uncertainty using shrinkage helps!

Figure: Results across 10 data splits (means and standard deviations). RBF net is tuned for RMSE, other methods for NLL. CNN takes the mean of the predictive distributions of [Rothe, 2016] for each point in the bag.

Tensorflow implementation: https://github.com/hcllaw/bdr

SLIDE 23

Outline

1. Preliminaries on Kernels and GPs
2. Bayesian Approaches to Distribution Regression
3. Variational Learning on Aggregates with GPs

SLIDE 24

Disaggregating Aggregate Outputs

Variational Learning on Aggregate Outputs with Gaussian Processes

H. C. L. Law, DS, E. Cameron, T. C. D. Lucas, S. Flaxman, K. Battle, and K. Fukumizu
to appear in NIPS 2018, https://arxiv.org/abs/1805.08463

SLIDE 25

Distribution regression: train on bags, predict on bags

Figure: a bag $x^a = \{x_1^a, x_2^a, \ldots, x_{N_a}^a\}$ is a sample drawn i.i.d. from $P^a$, with a single aggregate output $y^a$.

Individual labels need not exist; the label is a function of the whole population.

SLIDE 26

Output disaggregation: train on bags, predict on individuals

Figure: a bag $x^a = \{x_1^a, \ldots, x_{N_a}^a\}$ with unobserved individual outputs $y_1^a, \ldots, y_{N_a}^a$ and an observed aggregate output $y^a$.

Weakly supervised ML problem. The classification instance is widely studied in ML (learning with label proportions) [Quadrianto et al, 2009; Yu et al, 2013], but there is little work on regression / other observation likelihoods.

Spatial statistics: 'down-scaling', 'fine-scale modelling' or 'spatial disaggregation' in the analysis of disease mapping, agricultural data, and species distribution modelling, but mostly simple linear models.

This work: scalable variational GP machinery + a general aggregation model.

SLIDE 27

Output disaggregation: train on bags, predict on individuals

Figure: the same bag, now with latent GP values $f(x_1^a), \ldots, f(x_{N_a}^a)$ combined into an aggregate parameter $f^a$, with observation model $y^a \mid f^a$.

SLIDE 28

Bag Observation Model: Aggregation in Mean Parameters

An exponential family model $p(y \mid \eta)$ for output $y \in \mathcal{Y}$, with mean parameter $\eta = \eta(x)$ depending on the individual input $x \in \mathcal{X}$.

Given a fixed set of points $x_i^a \in \mathcal{X}$ such that $x^a = \{x_1^a, \ldots, x_{N_a}^a\}$, i.e. a bag of points with $N_a$ individuals.

Observe the aggregate outputs for each of the bags: training data $(\{x_i^1\}_{i=1}^{N_1}, y^1), \ldots, (\{x_i^n\}_{i=1}^{N_n}, y^n)$.

However, we wish to estimate the regression value $\eta(x_i^a)$ for each individual (in-sample or out-of-sample), not for new bags.

No restrictions on the collection of the individuals, with the bagging process possibly dependent on the covariates $x_i^a$.

To relate the aggregate $y^a$ and the bag $x^a = (x_i^a)_{i=1}^{N_a}$, we use the following bag observation model:
$$y^a \mid x^a \sim p(y \mid \eta^a), \qquad \eta^a = \sum_{i=1}^{N_a} p_i^a\, \eta(x_i^a), \tag{1}$$
where $p_i^a$ is an optional fixed non-negative weight used to adjust the scales.
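A sketch of simulating from the bag observation model (1) with a Poisson likelihood; the individual-level function eta below is a hypothetical stand-in, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)

def eta(x):
    # hypothetical individual-level mean parameter (e.g. an incidence rate)
    return np.exp(-0.5 * (x ** 2).sum(axis=-1))

x_bag = rng.normal(size=(30, 2))                  # covariates of N_a = 30 individuals
p = rng.integers(50, 500, size=30).astype(float)  # weights p_i^a, e.g. population per pixel
eta_bag = np.sum(p * eta(x_bag))                  # eta^a = sum_i p_i^a eta(x_i^a)
y_bag = rng.poisson(eta_bag)                      # observed aggregate output y^a
```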

SLIDE 29

Poisson Bag Model

$$y^a \mid x^a \sim \mathrm{Poisson}\left( \sum_{i=1}^{N_a} p_i^a \lambda_i^a \right), \qquad \lambda_i^a = \Psi(f(x_i^a)), \qquad f \sim GP(\mu, k)$$

Nonnegative link functions: $\Psi(f) = f^2$ and $\Psi(f) = e^f$.

Standard variational bound using inducing points $u = [f(w_1), \ldots, f(w_m)]^\top$ and a multivariate normal variational posterior $q(u)$:

$$\log p(y \mid \Theta) = \log \iint p(y, f, u \mid X, W, \Theta)\, df\, du \ge \iint \log \left[ \frac{p(y \mid f, \Theta)\, p(u)}{q(u)} \right] p(f \mid u, \Theta)\, q(u)\, df\, du \quad \text{(Jensen's inequality)}$$
$$= \sum_a y^a \int \log \left( \sum_{i=1}^{N_a} p_i^a \Psi(f(x_i^a)) \right) q(f)\, df - \sum_a \sum_{i=1}^{N_a} \int p_i^a \Psi(f(x_i^a))\, q(f)\, df - \sum_a \log(y^a!) - \mathrm{KL}(q(u) \,\|\, p(u)) =: \mathcal{L}(q, \Theta),$$

which is still intractable due to aggregation. Needs a further lower bound or an approximation.
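For the exponential link $\Psi(f) = e^f$, the intractable term can be lower-bounded with the log-sum lemma on the next slide, after which every term is closed form under Gaussian marginals $q(f_i) = N(m_i, s_i^2)$. A per-bag sketch (not the authors' implementation; see the VBAgg repository for that):

```python
import numpy as np
from math import lgamma

def poisson_bag_bound_terms(y, p, m, s2):
    """Per-bag terms of the variational bound for Psi(f) = exp(f).
    After the log-sum lemma (next slide), xi_i = m_i, and
    E_q Psi(f_i) = exp(m_i + s2_i / 2) for Gaussian marginals q(f_i) = N(m_i, s2_i).
    y: aggregate count; p: weights p_i^a; m, s2: marginal means/variances of q(f)."""
    data_term = y * np.log(np.sum(p * np.exp(m)))   # lower-bounds y^a * E log(sum ...)
    rate_term = np.sum(p * np.exp(m + 0.5 * s2))    # E sum_i p_i^a Psi(f(x_i^a))
    return data_term - rate_term - lgamma(y + 1.0)  # minus log(y^a!)

# summing these over bags and subtracting KL(q(u) || p(u)) gives the tractable bound
```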

SLIDE 30

Log-sum Lemma

Lemma

Let $v = [v_1, \ldots, v_N]^\top$ be a random vector with probability density $q(v)$, and let $w_i \ge 0$, $i = 1, \ldots, N$. Then, for any non-negative valued function $\Psi$,
$$\int \log \left( \sum_{i=1}^{N} w_i \Psi(v_i) \right) q(v)\, dv \ge \log \sum_{i=1}^{N} w_i e^{\xi_i}, \qquad \xi_i := \int \log \Psi(v_i)\, q_i(v_i)\, dv_i.$$

Additionally, a Taylor approximation can be used for $\Psi(f) = f^2$ (where the intractable term essentially becomes $\mathbb{E} \log V^2$ with $V$ multivariate normal); note that the log-sum lemma still gives a lower bound in terms of special functions in that case (problematic for backpropagation!).
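A Monte Carlo sanity check of the lemma for $\Psi(v) = e^v$, where $\xi_i = \mathbb{E} \log \Psi(v_i) = \mu_i$ is available in closed form; a sketch assuming independent Gaussian components (a special case of the lemma):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 5
w = rng.uniform(size=N)                   # non-negative weights w_i
mu = rng.normal(size=N)                   # marginal means of v_i
sd = rng.uniform(0.5, 1.5, size=N)        # marginal standard deviations

# Psi(v) = exp(v): xi_i = E log Psi(v_i) = mu_i, so the bound is closed form
rhs = np.log(np.sum(w * np.exp(mu)))

# Monte Carlo estimate of the left-hand side, E log sum_i w_i Psi(v_i)
v = rng.normal(mu, sd, size=(200000, N))  # independent components, a special case
lhs = np.log(np.sum(w * np.exp(v), axis=1)).mean()

print(lhs, rhs)                           # lhs >= rhs, up to Monte Carlo error
```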

SLIDE 31

Results

Tensorflow implementation: https://github.com/hcllaw/VBAgg

SLIDE 33

Summary

• Both contributions study learning on aggregates, i.e. where the responses are available at the group level, and demonstrate how statistical modelling can be brought to bear.
• Increasing confluence between statistical modelling and machine learning: making use of the well-engineered deep learning (black-box) infrastructure, while carefully considering appropriate statistical models.
• Flexibility of the RKHS framework and Gaussian processes as a common ground between deep learning and statistical inference.

SLIDE 34

References

• Ho Chung Leon Law, Dougal J. Sutherland, DS, and Seth Flaxman. Bayesian Approaches to Distribution Regression. International Conference on Artificial Intelligence and Statistics (AISTATS), 2018, PMLR 84:1167–1176.
• Ho Chung Leon Law, DS, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. Variational Learning on Aggregate Outputs with Gaussian Processes. Advances in Neural Information Processing Systems (NIPS), 2018, to appear. ArXiv e-prints:1805.08463.
