Nonparametric Distributed Learning Architecture: Algorithm and Application


  1. Nonparametric Distributed Learning Architecture: Algorithm and Application. Scott Bruce, Zeda Li, Hsiang-Chieh (Alex) Yang, and Subhadeep (DEEP) Mukhopadhyay. Temple University Department Award Day Seminar, April 15, 2016. Best Paper Award, JSM 2016 Section on Nonparametric Statistics of the ASA; winner of the Fox School Ph.D. Student Research Competition. Outline: Introduction, Elements of Distributed Statistical Learning, Applications.

  2. Big Data Statistical Inference: Motivating Example
Goal: a nonparametric two-sample inference algorithm for the Expedia personalized hotel recommendation engine. We develop a scalable distributed algorithm that can mine search data from millions of travelers to find the important features that best predict customers' likelihood to book a hotel.
Key challenges:
- Variety: different data types require different statistical measures.
- Volume: over 10 million observations across 52 variables.
- Scalability: distributed, parallel processing for massive data analysis.

  3. Summary of Main Contributions
Dramatic increases in the size of datasets have made traditional "centralized" statistical inference techniques prohibitive. Surprisingly, very little attention has been given to developing inferential algorithms for data whose volume exceeds the capacity of a single-machine system; the topic of big data statistical inference is still in its nascent stage of development.
A question of immediate concern: how can we design a data-intensive statistical inference architecture without changing the fundamental data modeling principles developed for 'small' data over the last century?
To address this problem we present MetaLP, a flexible and distributed statistical modeling paradigm suitable for large-scale data analysis that addresses (1) massive volume and (2) variety, the mixed data problem.

  4. LP Nonparametric Harmonic Analysis
The conventional statistical approach fails to address the 'mixed data problem'. We resolve this by representing data in a new transform domain via a specially designed procedure (analogous to the time → frequency domain representation via the Fourier transform).
Theorem (Mukhopadhyay and Parzen, 2014). A random variable X (discrete or continuous) with finite variance admits the decomposition
X - \mathrm{E}(X) = \sum_{j > 0} T_j(X; X)\, \mathrm{E}[\, X\, T_j(X; X) \,]
with probability 1.
Traditional and modern statistical measures developed for different data types can be compactly expressed as inner products in the LP Hilbert space.
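The score functions T_j are not written out on this slide; the sketch below shows one way to construct them empirically, following the mid-distribution-transform construction of Mukhopadhyay and Parzen (2014). The function name lp_score_functions and the Gram-Schmidt details are our own illustration, not code from the paper.

```python
import numpy as np

def lp_score_functions(x, m=4):
    """Empirical LP orthonormal score functions T_1..T_m, evaluated at each
    observation of x (works for discrete or continuous x).

    Sketch: orthonormalize powers of the mid-distribution transform
    F_mid(x) = F(x) - 0.5 * p(x) with respect to the empirical distribution."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    vals, counts = np.unique(x, return_counts=True)
    m = min(m, len(vals) - 1)              # at most (#distinct values - 1) scores exist
    p = counts / n                         # empirical pmf
    F = np.cumsum(p)                       # empirical cdf
    u = (F - 0.5 * p)[np.searchsorted(vals, x)]       # mid-rank transform per observation

    B = np.vander(u, m + 1, increasing=True)[:, 1:]   # columns u, u^2, ..., u^m
    T = np.empty_like(B)
    for j in range(m):                     # Gram-Schmidt w.r.t. <f, g> = mean(f * g)
        t = B[:, j] - B[:, j].mean()
        for k in range(j):
            t -= np.mean(t * T[:, k]) * T[:, k]
        T[:, j] = t / np.sqrt(np.mean(t ** 2))
    return T                               # n x m matrix, column j holds T_j(x_i; x)
```

Plotting the columns of T against u gives the data-adaptive shapes shown on the next slide: piecewise-constant curves for discrete variables and smooth polynomial-like curves for continuous ones.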

  5. Data-Adaptive Shapes
Figure 1: The left 2x2 panel shows the first four LP orthonormal score functions for the discrete variable length of stay. The right panel shows the shapes of the score functions for the continuous variable price usd.

  6. LP Hilbert Functional Representation
Define the two-sample LP statistic for variable selection of a mixed random variable X (either continuous or discrete), based on our specially designed score functions:
\mathrm{LP}[j; X, Y] = \mathrm{Cor}[\, T_j(X; X),\, Y \,] = \mathrm{E}[\, T_j(X; X)\, T_1(Y; Y) \,].   (1)
Properties:
- The sample LP statistics, scaled by √n, asymptotically converge to i.i.d. standard normal distributions (Mukhopadhyay and Parzen, 2014).
- LP[1; X, Y] unifies various measures of linear association for different data-type combinations.
- Higher-order LP statistics capture distributional differences.
- This allows data scientists to write a single computing formula irrespective of data type, with a common metric and common asymptotic characteristics: a step towards unified algorithms.
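Given the score functions above, Equation (1) reduces to an average of products of scores. A minimal sketch with illustrative names (in the Expedia application, y would be the binary booking indicator):

```python
import numpy as np

def lp_statistics(x, y, m=4):
    """Sample LP statistics LP[j; X, Y], j = 1..m, from Equation (1):
    LP[j; X, Y] = E[ T_j(X; X) * T_1(Y; Y) ], estimated by a sample mean."""
    Tx = lp_score_functions(x, m)          # T_j(x_i; X), an n x m matrix
    Ty = lp_score_functions(y, 1)[:, 0]    # T_1(y_i; Y) for the (binary) response
    return Tx.T @ Ty / len(Ty)

# Under the null, sqrt(n) * LP[j; X, Y] is approximately N(0, 1), so
# |LP| > 1.96 / sqrt(n) flags a significant association at the 5% level.
```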

  7. Meta-Analysis and Data-Parallelism
The key is to recognize that meta-analytic logic provides a formal statistical framework for the central question: how do we judiciously combine the "local" LP-inferences executed in parallel by different servers to obtain the "global" inference for the original big data? Towards large-scale parallel computing, we use meta-analysis to parallelize the statistical inference process for massive datasets. A sketch of the local, per-partition step follows; the combining step is developed on the next slides.
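As a concrete picture of the data-parallel "map" step, here is an illustrative driver that computes local LP statistics per partition. In the real architecture each partition would live on a separate server; a plain loop stands in for that here, and the function and argument names are ours.

```python
import numpy as np

def local_lp_by_partition(x, y, partition_id, m=4):
    """Map step: local LP statistics and subpopulation sizes for each partition."""
    x, y, partition_id = map(np.asarray, (x, y, partition_id))
    results = {}
    for pid in np.unique(partition_id):
        mask = partition_id == pid
        results[pid] = (lp_statistics(x[mask], y[mask], m), int(mask.sum()))
    return results                          # {partition: (local LP vector, n_l)}
```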

  8. What to Combine?
Instead of simply providing point estimates, we seek a distribution estimator (analogous to the Bayesian posterior distribution) for the LP statistics via a Confidence Distribution (CD), which contains information for virtually all types of statistical inference (e.g. estimation, hypothesis testing, confidence intervals).
Definition (Confidence Distribution). Suppose Θ is the parameter space of the unknown parameter of interest θ, and ω is the sample space corresponding to the data X_n = {X_1, X_2, ..., X_n}^T. Then a function H_n(·) = H_n(X_n, ·) on ω × Θ → [0, 1] is a confidence distribution (CD) if:
(i) for each given X_n ∈ ω, H_n(·) is a continuous cumulative distribution function on Θ;
(ii) at the true parameter value θ = θ_0, H_n(θ_0) = H_n(X_n, θ_0), as a function of the sample X_n, follows U[0, 1].
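As a concrete instance (implied by the √n-asymptotics on slide 6, though not written on this slide), the normal limit of the sample LP statistic already supplies an asymptotic CD (aCD):

```latex
% Asymptotic CD for LP[j; X, Y], since sqrt(n)(\widehat{LP} - LP) -> N(0, 1):
H_n(\theta) = \Phi\!\left( \sqrt{n}\,\bigl( \theta - \widehat{\mathrm{LP}}[j; X, Y] \bigr) \right),
\qquad \theta \in \Theta .
% (i): H_n is a continuous cdf in theta;
% (ii): at the true value, H_n(theta_0) is asymptotically U[0, 1].
```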

  9. How to Combine?
The combining function for CDs across k different studies can be expressed as
H^{(c)}(\mathrm{LP}[j; X, Y]) = G_c\bigl\{ g_c\bigl( H(\mathrm{LP}_1[j; X, Y]), \ldots, H(\mathrm{LP}_k[j; X, Y]) \bigr) \bigr\}.
The function G_c is determined by the monotone function g_c via
G_c(t) = P\bigl( g_c(U_1, \ldots, U_k) \le t \bigr),
in which U_1, \ldots, U_k are independent U[0, 1] random variables. A popular and useful choice for g_c is
g_c(u_1, \ldots, u_k) = \alpha_1 F_0^{-1}(u_1) + \cdots + \alpha_k F_0^{-1}(u_k),
where F_0(\cdot) is a given cumulative distribution function and \alpha_\ell \ge 0, with at least one \alpha_\ell \neq 0, are generic weights.
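A minimal sketch of this recipe with F_0 = Φ (function and argument names are ours): evaluate each local CD at a common θ, map through Φ^{-1}, take the weighted sum, and renormalize by its null distribution G_c.

```python
import numpy as np
from scipy.stats import norm

def combine_cds(h_values, weights=None):
    """H^(c)(theta) = G_c{ g_c(H_1(theta), ..., H_k(theta)) } with
    g_c(u_1, ..., u_k) = sum_l alpha_l * Phi^{-1}(u_l)."""
    u = np.asarray(h_values, dtype=float)
    a = np.ones_like(u) if weights is None else np.asarray(weights, dtype=float)
    g = np.sum(a * norm.ppf(u))
    # For U_l ~ U[0, 1] iid, g_c(U_1, ..., U_k) ~ N(0, sum(alpha^2)), hence:
    return norm.cdf(g / np.sqrt(np.sum(a ** 2)))
```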

  10. Combining Formula for the LP CDs
Theorem (Bruce, Li, Yang and Mukhopadhyay, 2016). Setting F_0^{-1}(t) = \Phi^{-1}(t) and \alpha_\ell = \sqrt{n_\ell}, where n_\ell is the size of subpopulation \ell = 1, \ldots, k, the combined aCD for LP[j; X, Y] is
H^{(c)}(\mathrm{LP}[j; X, Y]) = \Phi\Bigl( \bigl( \textstyle\sum_{\ell=1}^{k} n_\ell \bigr)^{1/2} \bigl( \mathrm{LP}[j; X, Y] - \widehat{\mathrm{LP}}^{(c)}[j; X, Y] \bigr) \Bigr),
where
\widehat{\mathrm{LP}}^{(c)}[j; X, Y] = \frac{\sum_{\ell=1}^{k} n_\ell\, \widehat{\mathrm{LP}}_\ell[j; X, Y]}{\sum_{\ell=1}^{k} n_\ell}
and \bigl( \sum_{\ell=1}^{k} n_\ell \bigr)^{-1} are the mean and variance, respectively, of the combined aCD for LP[j; X, Y].
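With α_ℓ = √n_ℓ the combined aCD is itself normal, so it is enough to track its mean and variance. An illustrative sketch:

```python
import numpy as np

def combined_lp_acd(lp_local, n_local):
    """Mean and variance of the combined aCD under alpha_l = sqrt(n_l):
    mean = sum(n_l * LP_l) / sum(n_l), variance = 1 / sum(n_l)."""
    lp = np.asarray(lp_local, dtype=float)
    n = np.asarray(n_local, dtype=float)
    return np.sum(n * lp) / np.sum(n), 1.0 / np.sum(n)

# The full curve is then H_c(theta) = Phi((theta - mean) / sqrt(variance)).
```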

  11. Parallel-Broken Big Datasets are Often Heterogeneous
Failure to take heterogeneity into account can easily spoil the big data discovery process. A hierarchical random-effects model captures the between-partition variation:
\widehat{\mathrm{LP}}_\ell[j; X, Y] \mid \mathrm{LP}_\ell[j; X, Y], s_\ell \;\overset{\mathrm{ind}}{\sim}\; N\bigl( \mathrm{LP}_\ell[j; X, Y],\, s_\ell^2 \bigr)   (2)
\mathrm{LP}_\ell[j; X, Y] \mid \mathrm{LP}[j; X, Y], \tau \;\overset{\mathrm{iid}}{\sim}\; N\bigl( \mathrm{LP}[j; X, Y],\, \tau^2 \bigr)   (3)
Figure 2: Histogram of the local LP statistics for the variable price usd under a random partition and under the visitor location country id partition.
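The slide states the hierarchical model but not how τ² is estimated; one standard meta-analytic choice (an assumption on our part, not stated on the slide) is the DerSimonian-Laird moment estimator, sketched below with within-partition variances s_ℓ² = 1/n_ℓ.

```python
import numpy as np

def tau_squared_dl(lp_local, n_local):
    """DerSimonian-Laird moment estimate of the between-partition variance tau^2
    for the random-effects model (2)-(3); this estimator choice is illustrative."""
    lp = np.asarray(lp_local, dtype=float)
    n = np.asarray(n_local, dtype=float)
    w = n                                       # 1 / s_l^2 with s_l^2 = 1 / n_l
    lp_fe = np.sum(w * lp) / np.sum(w)          # fixed-effect pooled estimate
    Q = np.sum(w * (lp - lp_fe) ** 2)           # Cochran's heterogeneity statistic
    denom = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (len(lp) - 1)) / denom)
```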

  12. Heterogeneity-Corrected LP Confidence Distribution
Theorem (Bruce, Li, Yang and Mukhopadhyay, 2016). Setting F_0^{-1}(t) = \Phi^{-1}(t) and \alpha_\ell = 1 / \sqrt{\tau^2 + 1/n_\ell}, where n_\ell is the size of subpopulation \ell = 1, \ldots, k, the combined aCD for LP[j; X, Y] is
H^{(c)}(\mathrm{LP}[j; X, Y]) = \Phi\Bigl( \bigl( \textstyle\sum_{\ell=1}^{k} \tfrac{1}{\tau^2 + 1/n_\ell} \bigr)^{1/2} \bigl( \mathrm{LP}[j; X, Y] - \widehat{\mathrm{LP}}^{(c)}[j; X, Y] \bigr) \Bigr),
where
\widehat{\mathrm{LP}}^{(c)}[j; X, Y] = \frac{\sum_{\ell=1}^{k} (\tau^2 + 1/n_\ell)^{-1}\, \widehat{\mathrm{LP}}_\ell[j; X, Y]}{\sum_{\ell=1}^{k} (\tau^2 + 1/n_\ell)^{-1}}
and \bigl( \sum_{\ell=1}^{k} (\tau^2 + 1/n_\ell)^{-1} \bigr)^{-1} are the mean and variance, respectively, of the combined aCD for LP[j; X, Y].
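The heterogeneity-corrected combination differs from slide 10 only in the weights; a sketch (names illustrative, with τ² e.g. from the estimator above):

```python
import numpy as np

def combined_lp_acd_hetero(lp_local, n_local, tau2):
    """Mean and variance of the heterogeneity-corrected combined aCD,
    using random-effects weights w_l = 1 / (tau^2 + 1/n_l)."""
    lp = np.asarray(lp_local, dtype=float)
    n = np.asarray(n_local, dtype=float)
    w = 1.0 / (tau2 + 1.0 / n)
    return np.sum(w * lp) / np.sum(w), 1.0 / np.sum(w)
```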

  13. Expedia: Variable Importance, Impact of Regularization
Figure 3: 95% confidence intervals for each variable's LP statistic under random-sampling partitioning (black) and country-ID partitioning (red); x-axis: variables, y-axis: LP confidence interval.
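The intervals in Figure 3 can be read directly off the combined aCD; a small illustrative helper:

```python
from scipy.stats import norm

def lp_confidence_interval(mean, var, level=0.95):
    """Central (level)-confidence interval for LP[j; X, Y] from the combined aCD."""
    z = norm.ppf(0.5 + level / 2.0)
    half = z * var ** 0.5
    return mean - half, mean + half
```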
