
SLIDE 1

Big Data Big Bias Small Surprise

S. Ejaz Ahmed
Faculty of Math and Science, Brock University, ON, Canada
sahmed5@brocku.ca | www.brocku.ca/sahmed

Fields Workshop, May 23, 2014. Joint work with X. Gao.

SLIDE 2

Outline of Presentation

  • Proposed Estimation Strategies
  • Asymptotic and Simulation Study
  • Applications
  • Envoi


SLIDE 8

Classical Linear Model

Consider a classical linear model with observed response y_i and covariates x_i = (x_{i1}, ..., x_{i p_n})':

  y_i = x_i' β_n + ε_i,  1 ≤ i ≤ n,

where β_n = (β_1, ..., β_{p_n})' is a p_n-dimensional vector of unknown parameters, and the ε_i are independent and identically distributed with mean 0 and variance σ². The subscript n in p_n indicates that the number of coefficients may grow with the sample size n.

SLIDE 9

Model Selection & Estimation Problem

  • Candidate Full Model Estimation
  • A Great Deal of Redundancy in the Candidate Full Model
  • Too Many Nuisance Regression Parameters
  • Candidate Full Model is Sparse
  • Candidate Subspace


SLIDE 16

Model Selection & Estimation Problem

We want to estimate β when it is plausible that β lies in the subspace Hβ = h.
  • Human Eye: Uncertain Prior Information (UPI)
  • Machine Eye: Auxiliary Information (AE)
UPI or AI: Hβ = h. In many applications it is assumed that the model is sparse, i.e. β = (β_1', β_2')' with β_2 = 0.


SLIDE 21

Classical Estimation Problem

Candidate full model estimation: maximum likelihood, least squares, ridge regression, or any other method.

Candidate submodel estimation:

  β̂^SM = β̂^FM − (X'X)^{-1} H' (H(X'X)^{-1}H')^{-1} (H β̂^FM − h).

An interesting application of the restriction: β can be partitioned as β = (β_1', β_2')'; if the model is sparse, then β_2 = 0.

Sparsity is the Name of the Game?
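A minimal R sketch of these two estimators (not from the slides; the function and variable names are illustrative): it computes the full-model least squares fit and the restricted submodel estimator under the constraint Hβ = h.

```r
## Full-model least squares and the restricted submodel estimator.
restricted_ls <- function(X, y, H, h) {
  XtX_inv <- solve(crossprod(X))                  # (X'X)^{-1}
  beta_FM <- XtX_inv %*% crossprod(X, y)          # full-model LS estimator
  A       <- H %*% XtX_inv %*% t(H)               # H (X'X)^{-1} H'
  beta_SM <- beta_FM - XtX_inv %*% t(H) %*% solve(A, H %*% beta_FM - h)
  list(beta_FM = drop(beta_FM), beta_SM = drop(beta_SM))
}

## Example: restrict the last two coefficients to zero (a sparsity restriction).
set.seed(1)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1.5, 3, 2, 0, 0) + rnorm(n)
H <- cbind(matrix(0, 2, 3), diag(2)); h <- rep(0, 2)
fit <- restricted_ls(X, y, H, h)
```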


SLIDE 25

Classical Model Selection

Preliminary Testing
  H_0: Hβ = h   versus   H_a: Hβ ≠ h.
Test statistic:

  T_n = (H β̂^FM − h)' (H C^{-1} H')^{-1} (H β̂^FM − h) / s_e²,   (1)

where s_e² = (Y − X β̂^FM)'(Y − X β̂^FM) / (n − p).

SLIDE 26

Estimation Strategies

Pretest Estimation Strategy
The pretest estimator (PTE) of β based on β̂^FM and β̂^SM is defined as

  β̂^PT = β̂^FM − (β̂^FM − β̂^SM) I(T_n ≤ χ²_{p2,α}),  p_2 ≥ 1,

where I(A) is the indicator function of a set A and χ²_{p2,α} is the α-level critical value of the distribution of T_n under H_0.
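A hedged R sketch of the pretest rule, assuming the design matrix X, response y, restriction (H, h) and the two estimators from the previous sketch; it evaluates T_n as in (1) with C = X'X and keeps β̂^SM only when the test does not reject.

```r
## Pretest estimator: beta_SM if T_n <= chi-square critical value, else beta_FM.
pretest_estimator <- function(X, y, H, h, beta_FM, beta_SM, alpha = 0.05) {
  n <- nrow(X); p <- ncol(X); p2 <- nrow(H)
  s2e <- sum((y - X %*% beta_FM)^2) / (n - p)               # s_e^2
  d   <- H %*% beta_FM - h
  Tn  <- drop(t(d) %*% solve(H %*% solve(crossprod(X)) %*% t(H), d)) / s2e
  crit <- qchisq(1 - alpha, df = p2)                        # alpha-level critical value
  if (Tn <= crit) beta_SM else beta_FM
}
```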

SLIDE 27

Estimation Strategies

Shrinkage Estimation Strategy

  β̂^S = β̂^SM + { 1 − (p_2 − 2) T_n^{-1} } (β̂^FM − β̂^SM),  p_2 ≥ 3.

The possible over-shrinking problem is addressed by the positive-part estimator

  β̂^S+ = β̂^SM + { 1 − (p_2 − 2) T_n^{-1} }^+ (β̂^FM − β̂^SM),  where z^+ = max(0, z).
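A small R sketch (illustrative names, assuming β̂^FM, β̂^SM and T_n are already available) of the Stein-type and positive-part shrinkage rules above.

```r
## Stein-type shrinkage and its positive-part version.
shrinkage_estimators <- function(beta_FM, beta_SM, Tn, p2) {
  stopifnot(p2 >= 3)
  w <- 1 - (p2 - 2) / Tn                                  # shrinkage weight
  beta_S      <- beta_SM + w * (beta_FM - beta_SM)        # may over-shrink if w < 0
  beta_S_plus <- beta_SM + max(0, w) * (beta_FM - beta_SM)
  list(S = beta_S, S_plus = beta_S_plus)
}
```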


SLIDE 30

Executive Summary

Bancroft (1944) suggested two problems based on the preliminary test strategy:
  • a data-pooling problem based on a preliminary test, a stream followed by a host of researchers;
  • a model selection problem in the linear regression model based on a preliminary test.
Stein (1956, 1961) developed highly efficient shrinkage estimators in balanced designs. Most statisticians ignored these procedures, perhaps due to a lack of understanding. Modern regularization strategies based on penalized least squares powerfully extend Stein's procedures.


SLIDE 36

Big Data Analysis

Penalty Estimation Strategy
Penalty estimators are members of the penalized least squares (PLS) family; they are obtained by optimizing a quadratic objective subject to a penalty. PLS estimation generalizes both nonparametric least squares and weighted projection estimators. A popular version of PLS is Tikhonov (1963) regularization. A generalized version of the penalty estimator is bridge regression (Frank and Friedman, 1993).


SLIDE 41

Big Data Analysis

Penalty Estimation Strategy
For a given penalty function π(·) and regularization parameter λ, the general form of the objective function is

  φ(β) = (y − Xβ)'(y − Xβ) + λ π(β),

with penalty of the form

  π(β) = Σ_{j=1}^{p} |β_j|^γ,  γ > 0.   (2)
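The objective (2) is easy to evaluate directly; the following R sketch (illustrative, not from the slides) computes φ(β) for a candidate coefficient vector, with γ = 1 giving the LASSO penalty and γ = 2 the ridge penalty.

```r
## Bridge objective: residual sum of squares plus the power penalty of (2).
bridge_objective <- function(beta, X, y, lambda, gamma) {
  rss <- sum((y - X %*% beta)^2)
  rss + lambda * sum(abs(beta)^gamma)
}
```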


SLIDE 44

Big Data Analysis

Penalty Estimation Strategy
For γ = 2 we have the ridge estimates, obtained by minimizing the penalized residual sum of squares

  β̂^ridge = arg min_β  || y − Σ_{j=1}^{p} X_j β_j ||² + λ Σ_{j=1}^{p} ||β_j||²,   (3)

where λ is the tuning parameter that controls the amount of shrinkage and ||·|| = ||·||_2 is the L2 norm.
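A minimal R sketch of the closed-form ridge solution to (3) for a single λ (standardization and intercept handling are omitted).

```r
## Ridge coefficients: (X'X + lambda I)^{-1} X'y.
ridge_estimator <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}
```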

SLIDE 45

Big Data Analysis

Penalty Estimation Strategy
For γ < 2, the penalty shrinks the coefficients towards zero and, depending on the value of λ, sets some of them exactly to zero; the procedure thus combines variable selection with shrinkage of the coefficients of a penalized regression. An important member of the penalized least squares family is the L1-penalized least squares estimator, obtained when γ = 1. This is the Least Absolute Shrinkage and Selection Operator (LASSO; Tibshirani, 1996).


SLIDE 50

Big Data Analysis

Penalty Estimation Strategy
LASSO is closely related to ridge regression; its solution is obtained by replacing the squared penalty ||β_j||² in the ridge problem (3) with the absolute penalty ||β_j||_1:

  β̂^LASSO = arg min_β  || y − Σ_{j=1}^{p} X_j β_j ||² + λ Σ_{j=1}^{p} ||β_j||_1.   (4)

Good strategy if the model is sparse.
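A short R sketch using the glmnet package (the package choice is an assumption; it is not named on the slides), assuming a numeric design matrix X and response y as in the earlier sketches; alpha = 1 solves the LASSO problem (4), alpha = 0 the ridge problem (3).

```r
## LASSO fit with a cross-validated tuning parameter.
library(glmnet)
cv  <- cv.glmnet(X, y, alpha = 1)                 # choose lambda by CV
fit <- glmnet(X, y, alpha = 1, lambda = cv$lambda.min)
beta_lasso <- as.matrix(coef(fit))[-1, 1]         # coefficients, intercept dropped
```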


SLIDE 52

Penalty Estimation

Algorithm, Algorithm, Algorithm
Efron et al. (2004, Annals of Statistics, 32) proposed an efficient algorithm, Least Angle Regression (LARS), that produces the entire Lasso solution path in only p steps; in comparison, the classical Lasso computation requires hundreds or thousands of steps. LARS provides a clever and very efficient algorithm for computing the complete LASSO sequence of solutions as the constraint s varies from 0 to ∞. Friedman et al. (2007, 2008) and Wu and Lange developed the coordinate descent (CD) algorithm for penalized linear regression and penalized logistic regression, which was shown to be computationally superior. For a review, we refer to Zhang et al. (2010).
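For illustration, the lars package implements the LARS algorithm of Efron et al. (2004) and returns the whole piecewise-linear LASSO path; the call below is a sketch assuming X and y as before (glmnet's coordinate descent gives the same kind of path on a grid of λ values).

```r
## Entire LASSO solution path via LARS.
library(lars)
path <- lars(X, y, type = "lasso")   # piecewise-linear coefficient path
plot(path)                           # coefficients as the constraint s varies
```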


SLIDE 57

Penalty Estimation

Family Ever Growing!!
  • Adaptive LASSO
  • Elastic Net Penalty
  • Minimax Concave Penalty (MCP)
  • SCAD

SLIDE 58

Penalty Estimation

Extension and Comparison with Non-penalty Estimators
  • Ahmed et al. (2008, 2009): penalty estimation for partially linear models.
  • Fallahpour, Ahmed and Doksum (2010): partially linear models with random coefficient autoregressive errors.
  • Ahmed and Fallahpour (2012): quasi-likelihood models.
  • Ahmed et al. (2012): Weibull censored regression models.
The relative performance of penalty, shrinkage and pretest estimators was showcased in these works.


SLIDE 65

Penalty Estimation

Extension and Comparison with Non-penalty Estimators
  • S. E. Ahmed (2014). Penalty, Pretest and Shrinkage Estimation: Variable Selection and Estimation. Springer.
  • S. E. Ahmed (Editor). Perspectives on Big Data Analysis: Methodologies and Applications. To be published in Contemporary Mathematics, a co-publication of the American Mathematical Society and CRM, 2014.


SLIDE 69

Innate Difficulties: Can Signals be Separated from Noise?

Not all penalty estimators provide both estimation consistency and variable selection consistency simultaneously. Adaptive LASSO, SCAD, and MCP are oracle (asymptotically). These asymptotic properties rest on assumptions on both the true model and the design covariates:
  • sparsity of the model (most coefficients are exactly 0, only a few are not);
  • nonzero coefficients are big enough to be separated from the zero ones.


SLIDE 75

Innate Difficulties: Ultrahigh-Dimensional Features

In genetic microarray studies, n is measured in hundreds while the number of features p per sample can exceed millions. Penalty estimators are not efficient when the dimension p becomes extremely large compared with the sample size n. Challenging problems remain when p grows at a non-polynomial rate with n; non-polynomial dimensionality poses substantial computational challenges. Developments in this arena of penalty estimation are still in their infancy.


SLIDE 81

Shrinkage Estimation for Big Data

Classical shrinkage estimation methods are limited to fixed p. The asymptotic results depend heavily on a full-model maximum likelihood estimator with component-wise consistency at rate √n. When p_n > n, a component-wise consistent estimator of β_n is not available because β_n is not identifiable: there always exist two different values β_n^(1) ≠ β_n^(2) such that x_i' β_n^(1) = x_i' β_n^(2) for all 1 ≤ i ≤ n.


SLIDE 86

Shrinkage Estimation for Big Data

We write the p_n-dimensional coefficient vector as β_n = (β_{1n}', β_{2n}')', where β_{1n} is the coefficient vector for the main covariates and β_{2n} collects all nuisance parameters. The sub-vectors β_{1n} and β_{2n} have dimensions p_{1n} and p_{2n}, respectively, with p_{1n} ≤ n and p_{1n} + p_{2n} = p_n. Let X_{1n} and X_{2n} be the sub-matrices of X_n corresponding to β_{1n} and β_{2n}. Assume the true parameter vector is β_0 = (β_{01}, ..., β_{0 p_n})' = (β_{10}', β_{20}')'.

SLIDE 87

Shrinkage Estimator for High-Dimensional Data

Let S_10 and S_20 represent the index sets for β_10 and β_20, respectively. S_10 indexes the important predictors, while S_20 indexes sparse and weak signals satisfying the following assumption:

(A0) |β_0j| = O(n^{-ς}) for all j ∈ S_20, where ς > 1/2 does not change with n.

Condition (A0) is the sparsity condition on the model. A simpler finite-sample representation is β_0j = 0 for all j ∈ S_20, that is, most coefficients are exactly 0.

SLIDE 88

Shrinkage Estimator for High-Dimensional Data

A Class of Submodels
Predictors indexed by S_10 are used to construct a submodel. However, other predictors, especially those in S_20, may also contribute to the response and cannot be ignored. Consider the UPI or AI: β_{20} = 0_{p_{2n}}.

SLIDE 89

A Candidate Submodel Estimator

We make the following assumptions on the random error and the design matrix of the true model:

(A1) The random errors ε_i are independent and identically distributed with mean 0 and variance 0 < σ² < ∞. Further, E(ε_i^m) < ∞ for an even integer m not depending on n.
(A2) ρ_{1n} > 0 for all n, where ρ_{1n} is the smallest eigenvalue of C_{12n}.

Under (A1)-(A2) and the UPI/AE, the submodel estimator (SME) of β_{1n} is defined as

  β̂_{1n}^SM = (X_{1n}' X_{1n})^{-1} X_{1n}' y.

SLIDE 90

A Candidate Full Model Estimator

Weighted Ridge Estimation
We obtain an estimator of β_n by minimizing a partially penalized objective function,

  β̂(r_n) = arg min { ||y − X_{1n}β_{1n} − X_{2n}β_{2n}||² + r_n ||β_{2n}||² },

where ||·|| is the ℓ2 norm and r_n > 0 is a tuning parameter.

SLIDE 91

Weighted Ridge Estimation
Since p_n ≫ n, and under the sparsity assumption, define a_n = c_1 n^{-ω}, 0 < ω ≤ 1/2, c_1 > 0. The weighted ridge estimator of β_n is

  β̂_n^WR(r_n, a_n) = ( β̂_{1n}^WR(r_n), β̂_{2n}^WR(r_n, a_n) ),

where β̂_{1n}^WR(r_n) = β̂_{1n}(r_n), and for j ∉ S_10 the component β̂_j^WR(r_n, a_n) equals the partially penalized estimate β̂_j(r_n) when its magnitude exceeds a_n, and 0 otherwise.
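A hedged R sketch of this estimator (illustrative function name; X1 and X2 are the submodel and nuisance blocks of the design, as in the slides): it solves the partially penalized least squares problem with ridge penalty r_n on β_2n only, then hard-thresholds the nuisance block at a_n. It assumes X1 has full column rank.

```r
## Weighted ridge: penalize only the nuisance block, then threshold it.
weighted_ridge <- function(X1, X2, y, rn, an) {
  p1 <- ncol(X1); p2 <- ncol(X2)
  X  <- cbind(X1, X2)
  P  <- diag(c(rep(0, p1), rep(rn, p2)))          # ridge penalty on beta_2n only
  b  <- solve(crossprod(X) + P, crossprod(X, y))  # partially penalized LS solution
  b1 <- b[1:p1]
  b2 <- b[(p1 + 1):(p1 + p2)]
  b2[abs(b2) <= an] <- 0                          # hard threshold at a_n
  list(beta1_WR = b1, beta2_WR = b2)
}
```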

SLIDE 92

Weighted Ridge Estimation

We call β̂(r_n, a_n) a weighted ridge estimator for two reasons. We use a weighted ridge penalty instead of an ordinary ridge penalty in the HD shrinkage strategy because we do not want to introduce additional bias through a penalty on β_{1n} when we already have a candidate submodel. Here β̂_{1n}^WR(r_n) changes with r_n, while β̂_{2n}^WR(r_n, a_n) changes with both r_n and a_n. For notational convenience we denote the weighted ridge estimators by β̂_{1n}^WR and β̂_{2n}^WR.

SLIDE 93

A Candidate HD Shrinkage Estimator

The HD shrinkage estimator (HD-SE) β̂_{1n}^S is

  β̂_{1n}^S = β̂_{1n}^WR − (h − 2) T_n^{-1} ( β̂_{1n}^WR − β̂_{1n}^SM ),

where h > 2 is the number of nonzero elements in β̂_{2n}^WR,

  T_n = ( β̂_2^WR )' ( X_2' M_1 X_2 ) β̂_2^WR / σ̂²,   (5)

with M_1 = I_n − X_{1n}(X_{1n}'X_{1n})^{-1}X_{1n}', and σ̂² a consistent estimator of σ². For example, we can choose σ̂² = Σ_{i=1}^{n} (y_i − x_i' β̂^SM)² / (n − 1) under the UPI or AI.

SLIDE 94

A Candidate HD Positive Shrinkage Estimator

The HD positive shrinkage estimator (HD-PSE) is

  β̂_{1n}^PSE = β̂_{1n}^WR − ((h − 2) T_n^{-1})_1 ( β̂_{1n}^WR − β̂_{1n}^SM ),

where (a)_1 = 1 if a > 1 and (a)_1 = a if a ≤ 1.
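A hedged R sketch combining the last two slides (illustrative names; β̂_{1n}^SM, β̂_{1n}^WR and β̂_{2n}^WR from the earlier sketches are assumed): it computes T_n as in (5), with σ̂² from the submodel residuals, and forms the HD-SE and HD-PSE.

```r
## HD shrinkage (HD-SE) and positive shrinkage (HD-PSE) estimators.
hd_shrinkage <- function(X1, X2, y, beta1_SM, beta1_WR, beta2_WR) {
  n  <- nrow(X1)
  M1 <- diag(n) - X1 %*% solve(crossprod(X1), t(X1))        # project off X1
  sigma2 <- sum((y - X1 %*% beta1_SM)^2) / (n - 1)           # sigma^2 under UPI/AI
  Tn <- drop(t(beta2_WR) %*% (t(X2) %*% M1 %*% X2) %*% beta2_WR) / sigma2
  h  <- sum(beta2_WR != 0)                                    # nonzero elements of beta2_WR
  shrink <- (h - 2) / Tn
  beta1_S   <- beta1_WR - shrink * (beta1_WR - beta1_SM)           # HD-SE
  beta1_PSE <- beta1_WR - min(shrink, 1) * (beta1_WR - beta1_SM)   # HD-PSE, (a)_1 = min(a, 1)
  list(Tn = Tn, beta1_S = beta1_S, beta1_PSE = beta1_PSE)
}
```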

SLIDE 95

Consistency and Asymptotic Normality

Weighted Ridge Estimation
Let s_n² = σ² d_n' Σ_n^{-1} d_n for any p_{12n} × 1 vector d_n satisfying ||d_n|| ≤ 1. Then

  n^{1/2} s_n^{-1} d_n' ( β̂_{12n}^WR − β_{120} ) = n^{-1/2} s_n^{-1} Σ_{i=1}^{n} ε_i d_n' Σ_n^{-1} z_i + o_P(1)  →_d  N(0, 1).

SLIDE 96

Asymptotic Distributional Risk

Define
  Σ_{n11} = lim_{n→∞} X_{1n}'X_{1n}/n,
  Σ_{n22} = lim_{n→∞} X_{2n}'X_{2n}/n,
  Σ_{n12} = lim_{n→∞} X_{1n}'X_{2n}/n,
  Σ_{n21} = lim_{n→∞} X_{2n}'X_{1n}/n,
  Σ_{n22.1} = lim_{n→∞} n^{-1} [ X_{2n}'X_{2n} − X_{2n}'X_{1n}(X_{1n}'X_{1n})^{-1}X_{1n}'X_{2n} ],
  Σ_{n11.2} = lim_{n→∞} n^{-1} [ X_{1n}'X_{1n} − X_{1n}'X_{2n}(X_{2n}'X_{2n})^{-1}X_{2n}'X_{1n} ].

SLIDE 97

Asymptotic Distributional Risk

Consider the local alternatives K_n: β_20 = n^{-1/2} δ and β_30 = 0_{p_{3n}}, with δ = (δ_1, δ_2, ..., δ_{p_{2n}})' ∈ R^{p_{2n}} and each δ_j fixed. Define Δ_n = δ' Σ_{n22.1} δ. Then n^{1/2} d_{1n}' s_{1n}^{-1} (β*_{1n} − β_{10}) is asymptotically normal under {K_n}, where s_{1n}² = σ² d_{1n}' Σ_{n11.2}^{-1} d_{1n}.

The asymptotic distributional risk (ADR) of d_{1n}' β*_{1n} is

  ADR(d_{1n}' β*_{1n}) = lim_{n→∞} E{ [ n^{1/2} s_{1n}^{-1} d_{1n}' (β*_{1n} − β_{10}) ]² }.

SLIDE 98

Asymptotic Distributional Risk Analysis

Mathematical Proof
Under regularity conditions and K_n, and supposing there exists 0 ≤ c ≤ 1 such that c = lim_{n→∞} s_{1n}^{-2} d_{1n}' Σ_{n11}^{-1} d_{1n}, we have

  ADR(d_{1n}' β̂_{1n}^WR) = 1,                                  (6a)
  ADR(d_{1n}' β̂_{1n}^SM) = 1 − (1 − c)(1 − Δ_{d1n}),            (6b)
  ADR(d_{1n}' β̂_{1n}^S)  = 1 − E[g_1(z_2 + δ)],                 (6c)
  ADR(d_{1n}' β̂_{1n}^PSE) = 1 − E[g_2(z_2 + δ)],                (6d)

where
  Δ_{d1n} = [ d_{1n}'(Σ_{n11}^{-1}Σ_{n12} δδ' Σ_{n21}Σ_{n11}^{-1})d_{1n} ] / [ d_{1n}'(Σ_{n11}^{-1}Σ_{n12}Σ_{n22.1}^{-1}Σ_{n21}Σ_{n11}^{-1})d_{1n} ],
  s_{2n}^{-1} d_{2n}' z_2 → N(0, 1),
  d_{2n} = Σ_{n21}Σ_{n11}^{-1} d_{1n},
  s_{2n}² = d_{2n}' Σ_{n22.1}^{-1} d_{2n}.

SLIDE 99

Asymptotic Distributional Risk Analysis

Mathematical Proof
  g_1(x) = lim_{n→∞} (1 − c) [ (p_{2n} − 2) / (x'Σ_{n22.1}x) ] [ 2 − x'((p_{2n} + 2) d_{2n}d_{2n}')x / (s_{2n}² x'Σ_{n22.1}x) ],

  g_2(x) = lim_{n→∞} [ (p_{2n} − 2) / (x'Σ_{n22.1}x) ] (1 − c) [ 2 − x'((p_{2n} + 2) d_{2n}d_{2n}')x / (s_{2n}² x'Σ_{n22.1}x) ] I(x'Σ_{n22.1}x ≥ p_{2n} − 2)
         + lim_{n→∞} [ (2 − s_{2n}^{-2} x'δ_{2n}δ_{2n}'x)(1 − c) ] I(x'Σ_{n22.1}x ≤ p_{2n} − 2).

SLIDE 100

Moral of the Story

Ignoring the bias will not make it go away! Submodel estimators provided by some existing variable selection techniques when p_n ≫ n are subject to bias. Prediction performance can be improved by the shrinkage strategy, particularly when an under-fitted submodel is selected by an aggressive penalty parameter.

SLIDE 101

Moral of the Story

Ignoring the bias will not make it go away! When p ≫ n, we assume the true model is sparse in the sense that most coefficients go to 0 as n → ∞. However, it is realistic to assume that some β_j may be small but not exactly 0. Such predictors, with a small amount of influence on the response, are often incorrectly ignored by HD variable selection methods. We borrow (re-gain) information from those predictors using the shrinkage strategy to improve prediction performance.

SLIDE 102

Engineering Proof: Simulation

In all experiments the ε_i are simulated as i.i.d. standard normal random variables, and x_{is} = (ξ¹_{(is)})² + ξ²_{(is)}, where ξ¹_{(is)} and ξ²_{(is)}, i = 1, ..., n, s = 1, ..., p_n, are also independent copies of the standard normal distribution. In all sampling experiments we let p_n = n^α for different sample sizes n, where α ranges from 1 to 1.8 in increments of 0.2. The HD-PSE is computed with r_n = p_n^{1/8} and a_n = 0.1 n^{-1/3}.
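A small R sketch of this data-generating design (illustrative; it only builds the covariates and the tuning values used for the HD-PSE).

```r
## Simulation design: x_{is} = (xi1_{is})^2 + xi2_{is}, with p_n = n^alpha.
make_design <- function(n, alpha) {
  pn  <- floor(n^alpha)
  xi1 <- matrix(rnorm(n * pn), n, pn)
  xi2 <- matrix(rnorm(n * pn), n, pn)
  X   <- xi1^2 + xi2
  list(X = X, pn = pn, rn = pn^(1/8), an = 0.1 * n^(-1/3))   # tuning values for HD-PSE
}
```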

SLIDE 103

Simulation Results

Engineering Proof
The performance of an estimator of β is appraised using the mean squared error (MSE) criterion. All computations were conducted using the R statistical software. We numerically calculated the relative MSE of the estimators with respect to β̂^WR by simulation: the simulated relative efficiency (SRE) of an estimator β⋄ relative to β̂^WR is

  SRE(β̂^WR : β⋄) = MSE(β̂^WR) / MSE(β⋄).

An SRE larger than one indicates the degree of superiority of the estimator β⋄ over β̂^WR.
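A minimal R sketch of the SRE computation, assuming matrices of Monte Carlo replicates (one estimate per row) for the candidate estimator and for β̂^WR.

```r
## Simulated relative efficiency of beta_hat relative to the weighted ridge benchmark.
sre <- function(beta_hat_reps, beta_wr_reps, beta_true) {
  mse <- function(reps) mean(apply(reps, 1, function(b) sum((b - beta_true)^2)))
  mse(beta_wr_reps) / mse(beta_hat_reps)   # > 1 means beta_hat beats beta_WR
}
```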

SLIDE 104

Simulation Results

Engineering Proof: Relative Performance
We let β_10 = (1.5, 3, 2)' be fixed for every design, let Δ* = ‖β_20 − 0‖ vary between 0 and 4, and choose n = 30 or 100.

SLIDE 105

Table: Simulated RMSEs.

SLIDE 106

(n, p_n)    Δ*    β̂SM_1n   β̂PSE_1n  |  (n, p_n)    Δ*    β̂SM_1n   β̂PSE_1n
(30, 30)    0.00  16.654    4.101   |  (30, 59)    0.00   8.953    5.385
            0.05   8.202    3.446   |              0.05   4.456    3.794
            0.20   2.855    2.610   |              0.20   1.551    3.216
            0.25   2.074    2.437   |              0.25   1.422    2.833
            0.30   1.857    2.180   |              0.30   1.091    2.459
            0.35   1.643    1.949   |              0.35   0.986    2.447
            0.80   0.649    1.506   |              0.80   0.542    1.601
            2.50   0.232    1.160   |              2.50   0.234    1.171
            3.30   0.170    1.095   |              3.30   0.210    1.108
(100, 158)  0.00  12.672    4.260   |  (100, 398)  0.00   5.546    5.388
            0.05   2.546    3.538   |              0.05   1.255    1.900
            0.10   1.129    3.256   |              0.15   0.441    1.322
            0.20   0.628    2.948   |              0.20   0.361    1.382
            0.25   0.481    3.366   |              0.25   0.316    1.358
            0.40   0.311    2.272   |              0.40   0.198    1.543
            1.40   0.110    1.500   |              1.40   0.096    1.826
            3.10   0.066    1.181   |              3.10   0.079    1.304
            3.50   0.060    1.217   |              3.50   0.075    1.297

SLIDE 107

Figure: The top three panels (a-c) are for n = 30 and p_n = 30, 59, 117 from left to right. The bottom panels (d-f) are for n = 100 and p_n = 158, 251, 398 from left to right. Solid curves: RMSE(β̂_1n^SM); dashed curves: RMSE(β̂_1n^PSE).

SLIDE 108

Shrinkage Versus Penalty Estimators

Engineering Solution: Simulation Results
Performance of the HD-PSE relative to penalty estimators including Lasso, ALasso, SCAD, MCP and threshold ridge (TR). We let β_10 = (1.5, 3, 2, 0.1, ..., 0.1)' with p_{1n} − 3 entries equal to 0.1, and β_20 = 0'_{p_{2n}}; the model includes some predictors with weak signals. We consider n = 30 and p_{1n} = 3, 4, 10, 20. We choose a = 3.7 and γ = 3 for SCAD and MCP, respectively. For TR we choose α_n = c_6 n^{-1/3} and λ = c_7 (log log n)³ / α_n², where c_6 and c_7 are two tuning parameters. All tuning parameters are chosen using generalized cross-validation.

SLIDE 109

Figure: RMSEs for n = 30. Plots (a-d) are for p_1 = 3, 4, 10, 20, respectively.


SLIDE 112

p_1   p_n   β̂SM_1n   β̂PSE_1n  β̂SCAD_1n  β̂MCP_1n  β̂ALasso_1n  β̂Lasso_1n  β̂TR_1n
  3    30   23.420    8.740    14.486    14.247     11.399      3.130     1.097
  3    59    9.900    6.951     7.588     7.499      6.244      1.257     0.015
  3   231    4.292    4.291     2.568     2.622      2.714      0.166     0.003
  3   456    3.977    3.977     1.739     1.576      2.059      0.099     0.002
  4    30   15.055    6.882    11.809    11.291      9.528      2.830     0.993
  4    59    6.954    4.933     5.260     5.204      4.469      0.966     0.019
  4   231    3.605    3.605     2.222     2.154      2.045      0.167     0.004
  4   456    3.184    3.184     1.648     1.436      1.703      0.102     0.003
 10    30    7.528    4.526     1.232     1.469      2.391      1.497     1.001
 10    59    3.899    3.534     0.493     0.538      0.746      0.321     0.032
 10   231    2.212    2.212     0.104     0.083      0.117      0.034     0.005
 10   456    1.997    1.997     0.052     0.032      0.050      0.017     0.003
 20    30    4.603    3.139     0.099     0.128      0.892      0.599     0.981
 20    59    2.231    2.194     0.016     0.018      0.067      0.031     0.013
 20   231    1.489    1.489     0.002     0.002      0.003      0.002     0.002
 20   456    1.392    1.392     0.001     0.001      0.002      0.001     0.001

SLIDE 113

Threshold Ridge Regression

The threshold ridge (TR) estimator of β_j, 1 ≤ j ≤ p_n, is given by (Shao and Deng, 2008)

  β̂_j^TR = β̃_j if |β̃_j| > a_n, and 0 if |β̃_j| ≤ a_n,

where

  β̃_n = arg min_β { Σ_{i=1}^{n} ( y_i − Σ_{j=1}^{p_n} x_{ij}β_j )² + λ Σ_{j=1}^{p_n} β_j² }

and a_n = c n^{-ω} for 0 < ω < 1/2 and c > 0.
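A minimal R sketch of the TR estimator (illustrative defaults for c and ω, with 0 < ω < 1/2): a ridge step followed by hard thresholding at a_n = c n^{-ω}.

```r
## Threshold ridge: ridge coefficients, then hard threshold at a_n.
threshold_ridge <- function(X, y, lambda, c = 1, omega = 1/3) {
  n  <- nrow(X); p <- ncol(X)
  an <- c * n^(-omega)
  b  <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))   # ridge step
  b[abs(b) <= an] <- 0                                            # hard threshold
  drop(b)
}
```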

SLIDE 114

Shrinkage Versus Penalty Estimators

  • The submodel estimator dominates all other estimators in the class, since β̂^SM is computed from the true submodel.
  • SCAD and MCP work better than the HD-PSE for smaller p_n; the HD-PSE performs better than the penalty estimators for larger p_n.
  • The penalty estimators are even less efficient than the weighted ridge estimate. This phenomenon can be explained by the presence of predictors with weak effects, which cannot be separated from zero effects by Lasso-type methods.
  • Because the predictors are designed to be correlated, the weighted ridge step can produce a better estimate as a starting point.

SLIDE 121

Microarray Data Example

We apply the proposed HD-PSE strategy to the data set reported in Scheetz et al. (2006) and also analyzed by Huang, Ma and Zhang (2008). In this dataset, 120 twelve-week-old male offspring of F1 animals were selected for tissue harvesting from the eyes for microarray analysis. The microarrays used to analyze the RNA from the eyes of these F2 animals contain over 31,042 different probe sets (Affymetrix GeneChip Rat Genome 230 2.0 Array).

slide-122
SLIDE 122

Microarray Data Example

Huang, Ma and Zhang (2008) studied a total of 18,976 probes, including the gene TRIM32, which was recently found to cause Bardet-Biedl syndrome (Chiang et al. (2006)), a genetically heterogeneous disease of multiple organ systems including the retina. A regression analysis was conducted to find the probes among the remaining 18,975 that are most related to TRIM32 (Probe ID: 1389163_at). Huang et al. (2008) selected 24 and 19 probes based on Lasso and adaptive Lasso, respectively. We compute HD-PSEs based on two candidate submodels, consisting of the 24 probes selected by Lasso and the 19 probes selected by adaptive Lasso.
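For concreteness, the rough sketch below shows one way the two candidate submodels could be formed (an assumed workflow, not the authors' code): the adaptive Lasso is implemented as a weighted Lasso whose weights come from an initial ridge fit, and `X` and `y` are placeholder names for the probe-expression matrix and the TRIM32 expression vector.

```python
# Sketch of candidate-submodel selection via Lasso and adaptive Lasso.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV

def lasso_support(X, y):
    """Indices of probes selected by cross-validated Lasso."""
    fit = LassoCV(cv=5).fit(X, y)
    return np.flatnonzero(fit.coef_)

def adaptive_lasso_support(X, y, eps=1e-6):
    """Adaptive Lasso as a weighted Lasso: weights w_j = 1/|beta_init_j|
    from an initial ridge fit; dividing column j of X by w_j turns the
    weighted Lasso into an ordinary Lasso on the rescaled design."""
    init = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
    w = 1.0 / (np.abs(init.coef_) + eps)
    fit = LassoCV(cv=5).fit(X / w, y)
    return np.flatnonzero(fit.coef_)

# Hypothetical usage with X (n x p probe expressions) and y (TRIM32 expression):
# J_lasso  = lasso_support(X, y)
# J_alasso = adaptive_lasso_support(X, y)
```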

  • S. Ejaz Ahmed

Big Data Analysis

slide-123
SLIDE 123

Microarray Data Example

In the largest full model, we consider at most the 1,000 probes with the largest variances. Other, smaller full models with the top pn probes are also considered; here we choose different values of pn between 200 and 1,000. The relative prediction error (RPE) of an estimator β∗_J relative to the weighted ridge estimator β̂WR_J is computed as

\[
\mathrm{RPE}(\beta^{*}_{J}) \;=\; \frac{\sum_{i=1}^{n}\Bigl(y_i - \sum_{j \in J} x_{ij}\,\hat{\beta}^{\mathrm{WR}}_{J,j}\Bigr)^{2}}{\sum_{i=1}^{n}\Bigl(y_i - \sum_{j \in J} x_{ij}\,\beta^{*}_{J,j}\Bigr)^{2}},
\]

where J is the index set of the candidate submodel containing either the 24 or the 19 selected probes.
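A small helper matching the RPE definition above (a sketch; the variable names are assumptions):

```python
import numpy as np

def rpe(y, X_J, beta_wr_J, beta_star_J):
    """Relative prediction error of beta_star_J with respect to the
    weighted ridge estimator beta_wr_J, both restricted to submodel J.
    Values above 1 mean beta_star_J predicts better."""
    rss_wr = np.sum((y - X_J @ beta_wr_J) ** 2)
    rss_star = np.sum((y - X_J @ beta_star_J) ** 2)
    return rss_wr / rss_star
```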

  • S. Ejaz Ahmed

Big Data Analysis

slide-124
SLIDE 124
  • S. Ejaz Ahmed

Big Data Analysis

slide-125
SLIDE 125

Envoi

We generalized classical Stein shrinkage estimation to a high-dimensional sparse model in which some predictors carry weak signals. When pn grows quickly with n, it is reasonable to suspect that most predictors do not contribute, that is, the model is sparse. We proposed an HD shrinkage estimation strategy that shrinks a weighted ridge estimator in the direction of a candidate submodel.
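For intuition only, here is a schematic of a generic positive-part shrinkage step combining a submodel estimator with a weighted ridge estimator; the specific shrinkage weight and test statistic used in the talk's HD-PSE are not reproduced here, so `t_n` and `k` are placeholders.

```python
import numpy as np

def positive_part_shrinkage(beta_sm, beta_wr, t_n, k):
    """Shrink the weighted ridge estimate toward the candidate submodel
    estimate: beta_ps = beta_sm + max(0, 1 - k / t_n) * (beta_wr - beta_sm).
    Here t_n (a test statistic for the zero restriction on the nuisance
    coefficients) and k (a shrinkage constant) stand in for the quantities
    defined in the paper."""
    weight = max(0.0, 1.0 - k / t_n)
    return np.asarray(beta_sm) + weight * (np.asarray(beta_wr) - np.asarray(beta_sm))
```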

  • S. Ejaz Ahmed

Big Data Analysis

slide-126
SLIDE 126

Envoi

Existing penalized regularization approaches have the advantage of producing a parsimonious sparse model, but they tend to ignore possible small contributions from some predictors. Lasso-type methods base estimation and prediction only on the selected candidate submodel, which is often inefficient in the presence of mild or weak signals. Our proposed HD shrinkage strategy accounts for the possible contributions of the remaining nuisance parameters and dominates, in prediction performance, the submodel estimates generated by Lasso-type methods, which depend strongly on the sparsity assumption for the true model.

  • S. Ejaz Ahmed

Big Data Analysis

slide-127
SLIDE 127

Envoi

Gauss offered two justifications for least squares: first, what we now call the maximum likelihood argument in the Gaussian error model; second, the concept of risk and the start of what we now call the Gauss-Markov theorem. Stein's 1956 paper revealed that neither maximum likelihood estimators nor unbiased estimators have desirable risk functions when the dimension of the parameter space is not small. The PSE outperforms the maximum likelihood estimator of the regression parameter vector in the entire parameter space.

  • S. Ejaz Ahmed

Big Data Analysis


slide-131
SLIDE 131

Envoi

Big data is the future of science, and transdisciplinary research in the statistical sciences is a must. We need greater collaboration between statisticians, computer scientists, and social scientists (Facebook clicks, Netflix queues, and GPS data, to name a few sources). Data are never neutral and unbiased; we must pool expertise across a host of fields to combat the biases in estimation.

  • S. Ejaz Ahmed

Big Data Analysis

slide-132
SLIDE 132

Is Classical Shrinkage Estimation Dead?

Long Live L2 Shrinkage! Long Live L2 Shrinkage! Long Live L2 Shrinkage!

  • S. Ejaz Ahmed

Big Data Analysis


slide-136
SLIDE 136

Clash of Cultures

Culture in Statistical Sciences

  • Study classical problems - Classical assumptions
  • Exact/Analytic Solutions
  • Low-dimensional Data Analysis
  • Work Alone or in Small Teams
  • Glory of the Individual

  • S. Ejaz Ahmed

Big Data Analysis


slide-142
SLIDE 142

Clash of Cultures

World is Changing

  • Complex Problems, Approximate Solutions
  • Visualizing Complex Data - Use of Technology
  • High-Dimensional Statistical Inference
  • Think Tanks - Trans-disciplinary Research
  • Glory of the Research Team

  • S. Ejaz Ahmed

Big Data Analysis


slide-148
SLIDE 148

Thank you!

Thank you and thanks to organizers!

  • S. Ejaz Ahmed

Big Data Analysis