SLIDE 1

Advanced Econometrics #3: Model & Variable Selection*

  • A. Charpentier (Université de Rennes 1)

Université de Rennes 1, Graduate Course, 2018.

SLIDE 2

“Great plot. Now need to find the theory that explains it” Deville (2017) http://twitter.com

SLIDE 3

Preliminary Results: Numerical Optimization

Problem: $x^\star \in \text{argmin}\{f(x);\ x \in \mathbb{R}^d\}$

Gradient descent: $x_{k+1} = x_k - \eta\nabla f(x_k)$, starting from some $x_0$.

Problem: $x^\star \in \text{argmin}\{f(x);\ x \in \mathcal{X} \subset \mathbb{R}^d\}$

Projected descent: $x_{k+1} = \Pi_{\mathcal{X}}\big(x_k - \eta\nabla f(x_k)\big)$, starting from some $x_0$.

A constrained problem is said to be convex if we minimize $f(x)$ with $f$ convex, subject to $g_i(x) = 0$, $\forall i = 1, \cdots, n$ with $g_i$ linear, and $h_i(x) \leq 0$, $\forall i = 1, \cdots, m$ with $h_i$ convex.

Lagrangian:
$$\mathcal{L}(x, \lambda, \mu) = f(x) + \sum_{i=1}^n \lambda_i g_i(x) + \sum_{i=1}^m \mu_i h_i(x)$$
where $x$ are primal variables and $(\lambda, \mu)$ are dual variables.

Remark: $\mathcal{L}$ is an affine function in $(\lambda, \mu)$.
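As an illustration, a minimal R sketch of both updates; the function names, the step size η, the quadratic objective and the nonnegativity projection are illustrative choices, not from the slides:

grad_descent <- function(grad, x0, eta = 0.1, n_iter = 100,
                         project = identity) {
  x <- x0
  for (k in 1:n_iter) {
    x <- project(x - eta * grad(x))   # plain or projected descent step
  }
  x
}

c0 <- c(2, -3)
grad_f <- function(x) x - c0          # gradient of f(x) = ||x - c0||^2 / 2
grad_descent(grad_f, x0 = c(0, 0))                                    # -> (2, -3)
grad_descent(grad_f, x0 = c(0, 0), project = function(x) pmax(x, 0))  # -> (2, 0)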

SLIDE 4

Preliminary Results: Numerical Optimization

Karush-Kuhn-Tucker conditions: a convex problem has a solution $x^\star$ if and only if there are $(\lambda^\star, \mu^\star)$ such that the following conditions hold:

  • stationarity: $\nabla_x\mathcal{L}(x, \lambda, \mu) = 0$ at $(x^\star, \lambda^\star, \mu^\star)$
  • primal admissibility: $g_i(x^\star) = 0$ and $h_i(x^\star) \leq 0$, $\forall i$
  • dual admissibility: $\mu^\star \geq 0$
  • complementary slackness: $\mu_i^\star h_i(x^\star) = 0$, $\forall i$

Let $\underline{\mathcal{L}}$ denote the associated dual function,
$$\underline{\mathcal{L}}(\lambda, \mu) = \min_x\{\mathcal{L}(x, \lambda, \mu)\}$$
$\underline{\mathcal{L}}$ is a concave function in $(\lambda, \mu)$ and the dual problem is $\max_{\lambda,\mu}\{\underline{\mathcal{L}}(\lambda, \mu)\}$.

SLIDE 5

References

Motivation:
Banerjee, A., Chandrasekhar, A.G., Duflo, E. & Jackson, M.O. (2016). Gossip: Identifying Central Individuals in a Social Network.

References:
Belloni, A. & Chernozhukov, V. (2009). Least Squares after Model Selection in High-Dimensional Sparse Models.
Hastie, T., Tibshirani, R. & Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press.

SLIDE 6

Preamble

Assume that $y = m(x) + \varepsilon$, where $\varepsilon$ is some idiosyncratic, unpredictable noise. The error $\mathbb{E}[(y - \widehat{m}(x))^2]$ is the sum of three terms:

  • variance of the estimator: $\mathbb{E}\big[(\widehat{m}(x) - \mathbb{E}[\widehat{m}(x)])^2\big]$
  • bias² of the estimator: $\big[m(x) - \mathbb{E}[\widehat{m}(x)]\big]^2$
  • variance of the noise: $\mathbb{E}\big[(y - m(x))^2\big]$

(the latter exists, even with a 'perfect' model).

SLIDE 7

Preamble

Consider a parametric model, with true (unknown) parameter $\theta$; then
$$\text{mse}(\widehat{\theta}) = \mathbb{E}\big[(\widehat{\theta} - \theta)^2\big] = \underbrace{\mathbb{E}\big[(\widehat{\theta} - \mathbb{E}[\widehat{\theta}])^2\big]}_{\text{variance}} + \underbrace{\mathbb{E}\big[(\mathbb{E}[\widehat{\theta}] - \theta)^2\big]}_{\text{bias}^2}$$
Let $\widetilde{\theta}$ denote an unbiased estimator of $\theta$. Then
$$\widehat{\theta} = \frac{\theta^2}{\theta^2 + \text{mse}(\widetilde{\theta})}\cdot\widetilde{\theta} = \widetilde{\theta} - \underbrace{\frac{\text{mse}(\widetilde{\theta})}{\theta^2 + \text{mse}(\widetilde{\theta})}\cdot\widetilde{\theta}}_{\text{penalty}}$$
satisfies $\text{mse}(\widehat{\theta}) \leq \text{mse}(\widetilde{\theta})$.

SLIDE 8

Bayes vs. Frequentist, inference on heads/tails

Consider some Bernoulli sample $x = \{x_1, x_2, \cdots, x_n\}$, where $x_i \in \{0, 1\}$. The $X_i$'s are i.i.d. $\mathcal{B}(p)$ variables, $f_X(x) = p^x[1-p]^{1-x}$, $x \in \{0, 1\}$.

Standard frequentist approach:
$$\widehat{p} = \frac{1}{n}\sum_{i=1}^n x_i = \underset{p}{\text{argmax}}\Big\{\underbrace{\prod_{i=1}^n f_X(x_i)}_{\mathcal{L}(p;x)}\Big\}$$
From the central limit theorem,
$$\sqrt{n}\,\frac{\widehat{p} - p}{\sqrt{p(1-p)}} \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, 1) \text{ as } n \to \infty,$$
we can derive an approximated 95% confidence interval
$$\widehat{p} \pm \frac{1.96}{\sqrt{n}}\sqrt{\widehat{p}(1-\widehat{p})}$$
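A minimal R sketch of this interval, using the claims counts from the next slide (x = 159, n = 1047) as an illustration:

x <- 159; n <- 1047
p_hat <- x / n                    # maximum likelihood estimate of p
p_hat + c(-1, 1) * 1.96 * sqrt(p_hat * (1 - p_hat) / n)
# approximately (0.130, 0.174)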

SLIDE 9

Bayes vs. Frequentist, inference on heads/tails

Example: out of 1,047 contracts, 159 claimed a loss.

[Figure: probability distribution of the number of insured claiming a loss; (true) Binomial distribution, with Poisson and Gaussian approximations.]

SLIDE 10

Bayes's theorem

Consider some hypothesis H and some evidence E; then
$$P_E(H) = P(H|E) = \frac{P(H \cap E)}{P(E)} = \frac{P(H)\cdot P(E|H)}{P(E)}$$
Bayes' rule: prior probability $P(H)$ versus posterior probability after receiving evidence $E$, $P_E(H) = P(H|E)$.

In Bayesian (parametric) statistics, $H = \{\theta \in \Theta\}$ and $E = \{X = x\}$. Bayes' theorem:
$$\pi(\theta|x) = \frac{\pi(\theta)\cdot f(x|\theta)}{f(x)} = \frac{\pi(\theta)\cdot f(x|\theta)}{\int f(x|\theta)\pi(\theta)\,d\theta} \propto \pi(\theta)\cdot f(x|\theta)$$

SLIDE 11

Small Data and Black Swans

Consider the sample $x = \{0, 0, 0, 0, 0\}$. Here the likelihood is
$$f(x_i|\theta) = \theta^{x_i}[1-\theta]^{1-x_i}, \qquad f(x|\theta) = \theta^{x^T\mathbf{1}}[1-\theta]^{n-x^T\mathbf{1}}$$
and we need an a priori distribution $\pi(\cdot)$, e.g. a beta distribution
$$\pi(\theta) = \frac{\theta^\alpha[1-\theta]^\beta}{B(\alpha, \beta)}, \qquad \pi(\theta|x) = \frac{\theta^{\alpha+x^T\mathbf{1}}[1-\theta]^{\beta+n-x^T\mathbf{1}}}{B(\alpha+x^T\mathbf{1}, \beta+n-x^T\mathbf{1})}$$
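A minimal R sketch of this update, assuming a uniform Beta(1, 1) prior in R's standard parametrization (the prior choice is illustrative):

x <- c(0, 0, 0, 0, 0)           # the "black swan" sample
a0 <- 1; b0 <- 1                # Beta(1, 1) prior, i.e. uniform on [0, 1]
s <- sum(x); n <- length(x)
theta <- seq(0, 1, by = 0.01)
plot(theta, dbeta(theta, a0 + s, b0 + n - s), type = "l",
     xlab = expression(theta), ylab = "density")   # posterior is Beta(1, 6)
lines(theta, dbeta(theta, a0, b0), lty = 2)        # prior
(a0 + s) / (a0 + b0 + n)        # posterior mean 1/7: positive despite no 1's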

SLIDE 12

On Bayesian Philosophy, Confidence vs. Credibility

For frequentists, a probability is a measure of the frequency of repeated events → parameters are fixed (but unknown), and data are random.

For Bayesians, a probability is a measure of the degree of certainty about values → parameters are random and data are fixed.

"Bayesians: Given our observed data, there is a 95% probability that the true value of θ falls within the credible region. vs. Frequentists: There is a 95% probability that when I compute a confidence interval from data of this sort, the true value of θ will fall within it." in Vanderplas (2014)

Example: see Jaynes (1976), e.g. the truncated exponential.

SLIDE 13

On Bayesian Philosophy, Confidence vs. Credibility

Example: What is a 95% confidence interval of a proportion? Here x = 159 and n = 1047.

1. draw sets $(\tilde{x}_1, \cdots, \tilde{x}_n)_k$ with $X_i \sim \mathcal{B}(x/n)$
2. compute for each set of values a confidence interval
3. determine the fraction of these confidence intervals that contain the true value x/n

→ the parameter is fixed, and we guarantee that 95% of the confidence intervals will contain it. A simulation of this guarantee is sketched below.
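A minimal R simulation following the three steps above (the number of replications is an arbitrary choice):

x <- 159; n <- 1047
p0 <- x / n
set.seed(1)
covers <- replicate(1e4, {
  xs <- rbinom(n, size = 1, prob = p0)        # step 1: simulated sample
  p  <- mean(xs)                              # step 2: CLT-based interval
  ci <- p + c(-1, 1) * 1.96 * sqrt(p * (1 - p) / n)
  ci[1] <= p0 && p0 <= ci[2]                  # step 3: does it contain p0?
})
mean(covers)                                  # close to 0.95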

SLIDE 14

On Bayesian Philosophy, Confidence vs. Credibility

Example: What is a 95% credible region of a proportion? Here x = 159 and n = 1047.

1. draw random parameters $p_k$ from the posterior distribution, $\pi(\cdot|x)$
2. sample sets $(\tilde{x}_1, \cdots, \tilde{x}_n)_k$ with $X_{i,k} \sim \mathcal{B}(p_k)$
3. compute for each set of values the mean $\overline{x}_k$
4. look at the proportion of those $\overline{x}_k$ that are within the credible region $[\Pi^{-1}(.025|x); \Pi^{-1}(.975|x)]$

→ the credible region is fixed, and we guarantee that 95% of possible values of $\overline{x}$ will fall within it.

SLIDE 15

Occam's Razor

The "law of parsimony": "pluralitas non est ponenda sine necessitate" (plurality should not be posited without necessity). Penalize models that are too complex.

SLIDE 16

James & Stein Estimator

Let $X \sim \mathcal{N}(\mu, \sigma^2 I)$. We want to estimate $\mu$.
$$\widehat{\mu}^{\text{mle}} = \overline{X}_n \sim \mathcal{N}\Big(\mu, \frac{\sigma^2}{n}I\Big).$$
From James & Stein (1961) Estimation with Quadratic Loss,
$$\widehat{\mu}^{\text{JS}} = \left(1 - \frac{(d-2)\sigma^2}{n\|\overline{y}\|^2}\right)\overline{y}$$
where $\|\cdot\|$ is the Euclidean norm.

One can prove that if $d \geq 3$,
$$\mathbb{E}\big[\|\widehat{\mu}^{\text{JS}} - \mu\|^2\big] < \mathbb{E}\big[\|\widehat{\mu}^{\text{mle}} - \mu\|^2\big]$$
Samworth (2015) Stein's Paradox: "one should use the price of tea in China to obtain a better estimate of the chance of rain in Melbourne".
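A minimal R simulation of that inequality, with d = 10, σ = 1 and a single observation per component (all illustrative choices):

set.seed(1)
d  <- 10
mu <- rep(0.5, d)
mse <- replicate(1e4, {
  x  <- rnorm(d, mean = mu)                 # one draw of X ~ N(mu, I)
  js <- (1 - (d - 2) / sum(x^2)) * x        # James-Stein shrinkage towards 0
  c(mle = sum((x - mu)^2), js = sum((js - mu)^2))
})
rowMeans(mse)   # the James-Stein risk is strictly smaller when d >= 3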

SLIDE 17

James & Stein Estimator

Heuristic: accept a biased estimator in order to decrease the variance. See Efron (2010) Large-Scale Inference.

SLIDE 18

Motivation: Avoiding Overfit

Generalization: the model should perform well on new data (and not only on the training ones).

[Figure: fitted models on training data, illustrating generalization vs. overfitting.]

SLIDE 19

Reducing Dimension with PCA

Use principal components to reduce dimension (on centered and scaled variables): we want d vectors $z_1, \cdots, z_d$ such that

First component: $z_1 = X\omega_1$ where
$$\omega_1 = \underset{\|\omega\|=1}{\text{argmax}}\big\{\|X\cdot\omega\|^2\big\} = \underset{\|\omega\|=1}{\text{argmax}}\big\{\omega^TX^TX\omega\big\}$$
Second component: $z_2 = X\omega_2$ where
$$\omega_2 = \underset{\|\omega\|=1}{\text{argmax}}\big\{\|X^{(1)}\cdot\omega\|^2\big\}$$
with $X^{(1)} = X - X\omega_1\omega_1^T = X - z_1\omega_1^T$.

[Figure: log mortality rate against age, and the first two PC scores; the years 1914-1919 and 1940-1944 stand out.]
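A minimal R sketch of the first component, on an illustrative built-in data set rather than the mortality data of the figure:

X  <- scale(as.matrix(USArrests))   # centered and scaled variables
e  <- eigen(crossprod(X))           # eigendecomposition of X^T X
w1 <- e$vectors[, 1]                # omega_1, the first weight vector
z1 <- X %*% w1                      # first principal component
pc <- prcomp(X)
cor(z1, pc$x[, 1])                  # +/- 1: same component up to sign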

SLIDE 20

Reducing Dimension with PCA

A regression on (the d) principal components, $y = z^T\beta + \eta$, could be an interesting idea; unfortunately, principal components have no reason to be correlated with y. The first component was $z_1 = X\omega_1$ where
$$\omega_1 = \underset{\|\omega\|=1}{\text{argmax}}\big\{\|X\cdot\omega\|^2\big\} = \underset{\|\omega\|=1}{\text{argmax}}\big\{\omega^TX^TX\omega\big\}$$
It is an unsupervised technique.

Instead, use partial least squares, introduced in Wold (1966) Estimation of Principal Components and Related Models by Iterative Least Squares. The first component is $z_1 = X\omega_1$ where
$$\omega_1 = \underset{\|\omega\|=1}{\text{argmax}}\big\{\langle y, X\cdot\omega\rangle\big\} = \underset{\|\omega\|=1}{\text{argmax}}\big\{\omega^TX^Tyy^TX\omega\big\}$$
SLIDE 21

Terminology

Consider a dataset $\{y_i, x_i\}$, assumed to be generated from $(Y, X)$, from an unknown distribution $P$. Let $m_0(\cdot)$ be the "true" model, and assume that $y_i = m_0(x_i) + \varepsilon_i$. In a regression context (quadratic loss function), the risk associated with $m$ is
$$\mathcal{R}(m) = \mathbb{E}_P\big[(Y - m(X))^2\big]$$
An optimal model $m^\star$ within a class $\mathcal{M}$ satisfies
$$\mathcal{R}(m^\star) = \inf_{m\in\mathcal{M}}\{\mathcal{R}(m)\}$$
Such a model $m^\star$ is usually called an oracle.

Observe that $m^\star(x) = \mathbb{E}[Y|X=x]$ is the solution of $\mathcal{R}(m^\star) = \inf_{m\in\mathcal{M}}\{\mathcal{R}(m)\}$ where $\mathcal{M}$ is the set of measurable functions.

SLIDE 22

The empirical risk is
$$\widehat{\mathcal{R}}_n(m) = \frac{1}{n}\sum_{i=1}^n \big(y_i - m(x_i)\big)^2$$
For instance, $m$ can be a linear predictor, $m(x) = \beta_0 + x^T\beta$, where $\theta = (\beta_0, \beta)$ should be estimated (trained).

$\mathbb{E}\big[\widehat{\mathcal{R}}_n(\widehat{m})\big] = \mathbb{E}\big[(\widehat{m}(X) - Y)^2\big]$ can be expressed as
$$\underbrace{\mathbb{E}\big[(\widehat{m}(X) - \mathbb{E}[\widehat{m}(X)|X])^2\big]}_{\text{variance of }\widehat{m}} + \underbrace{\mathbb{E}\big[(\mathbb{E}[\widehat{m}(X)|X] - \underbrace{\mathbb{E}[Y|X]}_{m_0(X)})^2\big]}_{\text{bias of }\widehat{m}} + \underbrace{\mathbb{E}\big[(Y - \underbrace{\mathbb{E}[Y|X]}_{m_0(X)})^2\big]}_{\text{variance of the noise}}$$
The third term is the risk of the "optimal" estimator $m_0$, that cannot be decreased.

SLIDE 23

Mallows Penalty and Model Complexity

Consider a linear predictor (see #1), i.e. $\widehat{y} = \widehat{m}(x) = Ay$. Assume that $y = m_0(x) + \varepsilon$, with $\mathbb{E}[\varepsilon] = 0$ and $\text{Var}[\varepsilon] = \sigma^2 I$. Let $\|\cdot\|$ denote the Euclidean norm.

Empirical risk: $\widehat{\mathcal{R}}_n(m) = \frac{1}{n}\|y - m(x)\|^2$

Vapnik's risk: $\mathbb{E}\big[\widehat{\mathcal{R}}_n(m)\big] = \frac{1}{n}\|m_0(x) - m(x)\|^2 + \frac{1}{n}\mathbb{E}\big[\|y - m_0(x)\|^2\big]$, with $m_0(x) = \mathbb{E}[Y|X=x]$.

Observe that
$$n\,\mathbb{E}\big[\widehat{\mathcal{R}}_n(\widehat{m})\big] = \mathbb{E}\big[\|y - \widehat{m}(x)\|^2\big] = \|(I-A)m_0\|^2 + \sigma^2\|I-A\|^2$$
while the (true) risk satisfies
$$n\,\mathcal{R}_n(\widehat{m}) = \mathbb{E}\big[\|m_0(x) - \widehat{m}(x)\|^2\big] = \underbrace{\|(I-A)m_0\|^2}_{\text{bias}} + \underbrace{\sigma^2\|A\|^2}_{\text{variance}}$$

SLIDE 24

Mallows Penalty and Model Complexity

One can obtain
$$\mathbb{E}\big[\mathcal{R}_n(\widehat{m})\big] = \mathbb{E}\big[\widehat{\mathcal{R}}_n(\widehat{m})\big] + \frac{2\sigma^2}{n}\text{trace}(A).$$
If $\text{trace}(A) \geq 0$, the empirical risk underestimates the true risk of the estimator. The number of degrees of freedom of the (linear) predictor is related to $\text{trace}(A)$, and $\frac{2\sigma^2}{n}\text{trace}(A)$ is called Mallows' penalty $C_L$.

If $A$ is a projection matrix, $\text{trace}(A)$ is the dimension of the projection space, $p$, and we obtain Mallows' $C_P$, $\frac{2\sigma^2}{n}p$.

Remark: Mallows (1973) Some Comments on Cp introduced this penalty while focusing on the R².

SLIDE 25

Penalty and Likelihood

$C_P$ is associated with a quadratic risk; an alternative is to use a distance on the (conditional) distribution of $Y$, namely the Kullback-Leibler distance:

discrete case: $D_{KL}(P\|Q) = \sum_i P(i)\log\dfrac{P(i)}{Q(i)}$

continuous case: $D_{KL}(P\|Q) = \displaystyle\int_{-\infty}^{\infty} p(x)\log\frac{p(x)}{q(x)}\,dx$

Let $f$ denote the true (unknown) density, and $f_\theta$ some parametric distribution; then
$$D_{KL}(f\|f_\theta) = \int_{-\infty}^{\infty} f(x)\log\frac{f(x)}{f_\theta(x)}\,dx = \underbrace{\int f(x)\log[f(x)]\,dx}_{\text{relative information}} - \int f(x)\log[f_\theta(x)]\,dx$$
Hence
$$\text{minimize }\{D_{KL}(f\|f_\theta)\} \longleftrightarrow \text{maximize }\mathbb{E}\big[\log[f_\theta(X)]\big]$$
SLIDE 26

Penalty and Likelihood

Akaike (1974) A New Look at the Statistical Model Identification observed that for $n$ large enough
$$\mathbb{E}\big[\log[f_\theta(X)]\big] \sim \log[L(\widehat{\theta})] - \dim(\theta)$$
Thus
$$AIC = -2\log L(\widehat{\theta}) + 2\dim(\theta)$$
Example: in a (Gaussian) linear model, $y_i = \beta_0 + x_i^T\beta + \varepsilon_i$,
$$AIC = n\log\left(\frac{1}{n}\sum_{i=1}^n \widehat{\varepsilon}_i^2\right) + 2[\dim(\beta) + 2]$$

SLIDE 27

Penalty and Likelihood

Remark: this is valid for large samples (rule of thumb: $n/\dim(\theta) > 40$); otherwise use a corrected AIC,
$$AIC_c = \underbrace{AIC + \frac{2k(k+1)}{n-k-1}}_{\text{bias correction}}$$
where $k = \dim(\theta)$; see Sugiura (1978) Further Analysis of the Data by Akaike's Information Criterion and the Finite Corrections (second-order AIC).

Using a Bayesian interpretation, Schwarz (1978) Estimating the Dimension of a Model obtained
$$BIC = -2\log L(\widehat{\theta}) + \log(n)\dim(\theta).$$
Observe that the criteria considered are of the form
$$\text{criterion} = -\text{function}\big(L(\widehat{\theta})\big) + \underbrace{\text{penalty}}_{\text{complexity}}$$
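A minimal R sketch comparing the two criteria on nested linear models (the models and the built-in data set are illustrative):

fit1 <- lm(dist ~ speed, data = cars)
fit2 <- lm(dist ~ poly(speed, 5), data = cars)
AIC(fit1, fit2)   # -2 log L + 2 dim(theta)
BIC(fit1, fit2)   # -2 log L + log(n) dim(theta): a heavier penalty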
SLIDE 28

Estimation of the Risk

Consider a naive bootstrap procedure, based on a bootstrap sample $S_b = \{(y_i^{(b)}, x_i^{(b)})\}$.

The plug-in estimator of the empirical risk is
$$\widehat{\mathcal{R}}_n(\widehat{m}^{(b)}) = \frac{1}{n}\sum_{i=1}^n \big(y_i - \widehat{m}^{(b)}(x_i)\big)^2$$
and then
$$\widehat{\mathcal{R}}_n = \frac{1}{B}\sum_{b=1}^B \widehat{\mathcal{R}}_n(\widehat{m}^{(b)}) = \frac{1}{B}\sum_{b=1}^B \frac{1}{n}\sum_{i=1}^n \big(y_i - \widehat{m}^{(b)}(x_i)\big)^2$$

SLIDE 29

Estimation of the Risk

One might improve this estimate using an out-of-bag procedure,
$$\widehat{\mathcal{R}}_n = \frac{1}{n}\sum_{i=1}^n \frac{1}{\#B_i}\sum_{b\in B_i}\big(y_i - \widehat{m}^{(b)}(x_i)\big)^2$$
where $B_i$ is the set of all bootstrap samples that do not contain $(y_i, x_i)$.

Remark: $P\big((y_i, x_i) \notin S_b\big) = \left(1 - \frac{1}{n}\right)^n \sim e^{-1} = 36.78\%$.
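A minimal R sketch of this out-of-bag estimate for a linear model (the data set and B are illustrative choices):

set.seed(1)
B <- 200
n <- nrow(cars)
sq_err <- matrix(NA, B, n)          # squared error of obs i under model b
for (b in 1:B) {
  idx <- sample(n, replace = TRUE)  # bootstrap sample S_b
  fit <- lm(dist ~ speed, data = cars[idx, ])
  oob <- setdiff(1:n, idx)          # observations NOT in S_b
  sq_err[b, oob] <- (cars$dist[oob] - predict(fit, cars[oob, ]))^2
}
mean(colMeans(sq_err, na.rm = TRUE))   # out-of-bag risk estimate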

SLIDE 30

Linear Regression Shortcoming

Least squares estimator: $\widehat{\beta} = (X^TX)^{-1}X^Ty$

Unbiased estimator: $\mathbb{E}[\widehat{\beta}] = \beta$

Variance: $\text{Var}[\widehat{\beta}] = \sigma^2(X^TX)^{-1}$, which can be (extremely) large when $\det[(X^TX)] \sim 0$.
$$X = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 1 & 0 \\ 1 & 2 & -1 \\ 1 & 0 & 1 \end{pmatrix} \text{ then } X^TX = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 6 & -4 \\ 2 & -4 & 6 \end{pmatrix} \text{ while } X^TX + I = \begin{pmatrix} 5 & 2 & 2 \\ 2 & 7 & -4 \\ 2 & -4 & 7 \end{pmatrix}$$
eigenvalues: $\{10, 6, 0\}$ and $\{11, 7, 1\}$.

Ad-hoc strategy: use $X^TX + \lambda I$.
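The same computation in R (the design matrix is the reconstruction above):

X <- matrix(c(1, -1,  2,
              1,  1,  0,
              1,  2, -1,
              1,  0,  1), nrow = 4, byrow = TRUE)
eigen(t(X) %*% X)$values            # 10, 6, 0: X^T X is singular
eigen(t(X) %*% X + diag(3))$values  # 11, 7, 1: the ad-hoc fix works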

SLIDE 31

Linear Regression Shortcoming

Evolution of
$$(\beta_1, \beta_2) \mapsto \sum_{i=1}^n [y_i - (\beta_1 x_{1,i} + \beta_2 x_{2,i})]^2$$
when $\text{cor}(X_1, X_2) = r \in [0, 1]$, on top. Below, ridge regression,
$$(\beta_1, \beta_2) \mapsto \sum_{i=1}^n [y_i - (\beta_1 x_{1,i} + \beta_2 x_{2,i})]^2 + \lambda(\beta_1^2 + \beta_2^2)$$
where $\lambda \in [0, \infty)$, when $\text{cor}(X_1, X_2) \sim 1$ (collinearity).

[Figure: contour plots of the sum of squares and of the penalized objective in the $(\beta_1, \beta_2)$ plane.]

SLIDE 32

Normalization: Euclidean ℓ₂ vs. Mahalanobis

We want to penalize complicated models: if $\beta_k$ is "too small", we prefer to have $\beta_k = 0$.

Instead of $d(x, y) = \sqrt{(x-y)^T(x-y)}$, use $d_\Sigma(x, y) = \sqrt{(x-y)^T\Sigma^{-1}(x-y)}$.

[Figure: circular (Euclidean) vs. elliptical (Mahalanobis) penalty contours in the $(\beta_1, \beta_2)$ plane.]

SLIDE 33

Ridge Regression

... like least squares, but it shrinks estimated coefficients towards 0.
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\beta}{\text{argmin}}\left\{\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \lambda\sum_{j=1}^p \beta_j^2\right\}$$
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\beta}{\text{argmin}}\Big\{\underbrace{\|y - X\beta\|^2_{\ell_2}}_{=\text{criterion}} + \underbrace{\lambda\|\beta\|^2_{\ell_2}}_{=\text{penalty}}\Big\}$$
$\lambda \geq 0$ is a tuning parameter. The constant is usually unpenalized; the true equation is
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\beta}{\text{argmin}}\Big\{\underbrace{\|y - (\beta_0 + X\beta)\|^2_{\ell_2}}_{=\text{criterion}} + \underbrace{\lambda\|\beta\|^2_{\ell_2}}_{=\text{penalty}}\Big\}$$

SLIDE 34

Ridge Regression
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\beta}{\text{argmin}}\big\{\|y - (\beta_0 + X\beta)\|^2_{\ell_2} + \lambda\|\beta\|^2_{\ell_2}\big\}$$
can be seen as a constrained optimization problem:
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\|\beta\|^2_{\ell_2}\leq h_\lambda}{\text{argmin}}\big\{\|y - (\beta_0 + X\beta)\|^2_{\ell_2}\big\}$$
Explicit solution: $\widehat{\beta}_\lambda = (X^TX + \lambda I)^{-1}X^Ty$.

If $\lambda \to 0$, $\widehat{\beta}_0^{\text{ridge}} = \widehat{\beta}^{\text{ls}}$; if $\lambda \to \infty$, $\widehat{\beta}_\infty^{\text{ridge}} = 0$.

[Figure: ridge solutions in the $(\beta_1, \beta_2)$ plane, as the circular constraint region shrinks.]

SLIDE 35

Ridge Regression

This penalty can be seen as rather unfair if components of x are not expressed on the same scale:

  • center: $\overline{x}_j = 0$, then $\widehat{\beta}_0 = \overline{y}$
  • scale: $x_j^Tx_j = 1$

Then compute
$$\widehat{\beta}_\lambda^{\text{ridge}} = \underset{\beta}{\text{argmin}}\Big\{\underbrace{\|y - X\beta\|^2_{\ell_2}}_{=\text{loss}} + \underbrace{\lambda\|\beta\|^2_{\ell_2}}_{=\text{penalty}}\Big\}$$

SLIDE 36

Ridge Regression

Observe that if $x_{j_1} \perp x_{j_2}$, then
$$\widehat{\beta}_\lambda^{\text{ridge}} = [1+\lambda]^{-1}\widehat{\beta}^{\text{ls}}$$
which explains the relationship with shrinkage. But generally, it is not the case...

Theorem. There exists $\lambda$ such that $\text{mse}[\widehat{\beta}_\lambda^{\text{ridge}}] \leq \text{mse}[\widehat{\beta}^{\text{ls}}]$.

SLIDE 37

Ridge Regression
$$\mathcal{L}_\lambda(\beta) = \sum_{i=1}^n (y_i - \beta_0 - x_i^T\beta)^2 + \lambda\sum_{j=1}^p \beta_j^2$$
$$\frac{\partial\mathcal{L}_\lambda(\beta)}{\partial\beta} = -2X^Ty + 2(X^TX + \lambda I)\beta, \qquad \frac{\partial^2\mathcal{L}_\lambda(\beta)}{\partial\beta\,\partial\beta^T} = 2(X^TX + \lambda I)$$
where $X^TX$ is a positive semi-definite matrix and $\lambda I$ is a positive definite matrix, so
$$\widehat{\beta}_\lambda = (X^TX + \lambda I)^{-1}X^Ty$$
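A minimal R check of this closed form on centered and scaled variables (the data set and λ are illustrative):

y <- scale(mtcars$mpg)
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))
lambda <- 1
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
beta_ols   <- solve(t(X) %*% X, t(X) %*% y)
cbind(beta_ols, beta_ridge)   # ridge coefficients are shrunk towards 0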

SLIDE 38

The Bayesian Interpretation

From a Bayesian perspective,
$$\underbrace{\mathbb{P}[\theta|y]}_{\text{posterior}} \propto \underbrace{\mathbb{P}[y|\theta]}_{\text{likelihood}} \cdot \underbrace{\mathbb{P}[\theta]}_{\text{prior}}$$
i.e.
$$\log\mathbb{P}[\theta|y] = \underbrace{\log\mathbb{P}[y|\theta]}_{\text{log-likelihood}} + \underbrace{\log\mathbb{P}[\theta]}_{\text{penalty}}$$
If $\beta$ has a prior $\mathcal{N}(0, \tau^2 I)$ distribution, then its posterior distribution has mean
$$\mathbb{E}[\beta|y, X] = \left(X^TX + \frac{\sigma^2}{\tau^2}I\right)^{-1}X^Ty.$$

SLIDE 39

Properties of the Ridge Estimator
$$\widehat{\beta}_\lambda = (X^TX + \lambda I)^{-1}X^Ty$$
$$\mathbb{E}[\widehat{\beta}_\lambda] = X^TX(\lambda I + X^TX)^{-1}\beta \neq \beta,$$
i.e. the ridge estimator is biased. Observe that $\mathbb{E}[\widehat{\beta}_\lambda] \to 0$ as $\lambda \to \infty$.

Assume that $X$ is an orthogonal design matrix, i.e. $X^TX = I$; then $\widehat{\beta}_\lambda = (1+\lambda)^{-1}\widehat{\beta}^{\text{ls}}$.

SLIDE 40

Properties of the Ridge Estimator

Set $W_\lambda = (I + \lambda[X^TX]^{-1})^{-1}$. One can prove that $W_\lambda\widehat{\beta}^{\text{ls}} = \widehat{\beta}_\lambda$. Thus,
$$\text{Var}[\widehat{\beta}_\lambda] = W_\lambda\text{Var}[\widehat{\beta}^{\text{ls}}]W_\lambda^T = \sigma^2(X^TX + \lambda I)^{-1}X^TX[(X^TX + \lambda I)^{-1}]^T.$$
Observe that
$$\text{Var}[\widehat{\beta}^{\text{ls}}] - \text{Var}[\widehat{\beta}_\lambda] = \sigma^2 W_\lambda\big[2\lambda(X^TX)^{-2} + \lambda^2(X^TX)^{-3}\big]W_\lambda^T \geq 0.$$

SLIDE 41

Properties of the Ridge Estimator

Hence, the confidence ellipsoid of the ridge estimator is indeed smaller than the OLS one. If $X$ is an orthogonal design matrix, $\text{Var}[\widehat{\beta}_\lambda] = \sigma^2(1+\lambda)^{-2}I$.
$$\text{mse}[\widehat{\beta}_\lambda] = \sigma^2\text{trace}\big(W_\lambda(X^TX)^{-1}W_\lambda^T\big) + \beta^T(W_\lambda - I)^T(W_\lambda - I)\beta.$$
If $X$ is an orthogonal design matrix,
$$\text{mse}[\widehat{\beta}_\lambda] = \frac{p\sigma^2}{(1+\lambda)^2} + \frac{\lambda^2}{(1+\lambda)^2}\beta^T\beta$$

SLIDE 42

Properties of the Ridge Estimator
$$\text{mse}[\widehat{\beta}_\lambda] = \frac{p\sigma^2}{(1+\lambda)^2} + \frac{\lambda^2}{(1+\lambda)^2}\beta^T\beta \quad\text{is minimal for}\quad \lambda^\star = \frac{p\sigma^2}{\beta^T\beta}$$
Note that there exists $\lambda > 0$ such that $\text{mse}[\widehat{\beta}_\lambda] < \text{mse}[\widehat{\beta}_0] = \text{mse}[\widehat{\beta}^{\text{ls}}]$.

SLIDE 43

SVD decomposition

For any m × n matrix A, there are orthogonal matrices U (m × m), V (n × n) and a "diagonal" matrix Σ (m × n) such that A = UΣVᵀ, or AV = UΣ. Hence, there exists a special orthonormal set of vectors (the columns of V) that is mapped by the matrix A into an orthonormal set of vectors (the columns of U).

Let r = rank(A); then
$$A = \sum_{i=1}^r \sigma_i u_i v_i^T$$
(called the dyadic decomposition of A). Observe that it can be used to compute (e.g.) the Frobenius norm of A,
$$\|A\|^2 = \sum_{i,j} a_{i,j}^2 = \sigma_1^2 + \cdots + \sigma_{\min\{m,n\}}^2.$$
Further, $A^TA = V\Sigma^T\Sigma V^T$ while $AA^T = U\Sigma\Sigma^TU^T$; hence the $\sigma_i^2$'s are related to the eigenvalues of $A^TA$ and $AA^T$, and $u_i$, $v_i$ are the associated eigenvectors.

Golub & Reinsch (1970) Singular Value Decomposition and Least Squares Solutions

SLIDE 44

SVD decomposition

Consider the singular value decomposition of X, X = UDVᵀ. Then
$$\widehat{\beta}^{\text{ls}} = VD^{-2}D\,U^Ty \qquad \widehat{\beta}_\lambda = V(D^2 + \lambda I)^{-1}D\,U^Ty$$
Observe that
$$D_{i,i}^{-1} \geq \frac{D_{i,i}}{D_{i,i}^2 + \lambda}$$
hence the ridge penalty shrinks singular values.

Set now R = UD (an n × n matrix), so that X = RVᵀ and
$$\widehat{\beta}_\lambda = V(R^TR + \lambda I)^{-1}R^Ty$$
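A minimal R check that the SVD formula matches the direct closed form (same illustrative data and λ as before):

y <- scale(mtcars$mpg)
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))
lambda <- 1
s <- svd(X)                                  # X = U D V^T
b_svd <- s$v %*% ((s$d / (s$d^2 + lambda)) * (t(s$u) %*% y))
b_dir <- solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
cbind(b_svd, b_dir)                          # identical, as they should be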

SLIDE 45

Hat matrix and Degrees of Freedom

Recall that $\widehat{Y} = HY$ with $H = X(X^TX)^{-1}X^T$. Similarly, $H_\lambda = X(X^TX + \lambda I)^{-1}X^T$ and
$$\text{trace}[H_\lambda] = \sum_{j=1}^p \frac{d_{j,j}^2}{d_{j,j}^2 + \lambda} \to 0, \text{ as } \lambda \to \infty.$$

SLIDE 46

Sparsity Issues

In several applications, k can be (very) large, but a lot of features are just noise: $\beta_j = 0$ for many j's. Let s denote the number of relevant features, with s ≪ k; cf. Hastie, Tibshirani & Wainwright (2015) Statistical Learning with Sparsity:
$$s = \text{card}\{\mathcal{S}\} \text{ where } \mathcal{S} = \{j; \beta_j \neq 0\}$$
The model is now $y = X_\mathcal{S}^T\beta_\mathcal{S} + \varepsilon$, where $X_\mathcal{S}^TX_\mathcal{S}$ is a full-rank matrix.

SLIDE 47

Going further on sparsity issues

The ridge regression problem was to solve
$$\widehat{\beta} = \underset{\beta\in\{\|\beta\|_{\ell_2}\leq s\}}{\text{argmin}}\big\{\|Y - X^T\beta\|^2_{\ell_2}\big\}$$
Define $\|a\|_{\ell_0} = \sum \mathbf{1}(|a_i| > 0)$. Here $\dim(\beta) = k$ but $\|\beta\|_{\ell_0} = s$. We wish we could solve
$$\widehat{\beta} = \underset{\beta\in\{\|\beta\|_{\ell_0}= s\}}{\text{argmin}}\big\{\|Y - X^T\beta\|^2_{\ell_2}\big\}$$
Problem: it is usually not possible to describe all possible constraints, since $\binom{k}{s}$ subsets of coefficients could be chosen here (with k (very) large).

SLIDE 48

Going further on sparsity issues

In a convex problem, solve the dual problem; e.g. in the ridge regression, the primal problem is
$$\min_{\beta\in\{\|\beta\|_{\ell_2}\leq s\}}\big\{\|Y - X^T\beta\|^2_{\ell_2}\big\}$$
and the dual problem is
$$\min_{\beta\in\{\|Y - X^T\beta\|_{\ell_2}\leq t\}}\big\{\|\beta\|^2_{\ell_2}\big\}$$
[Figure: the same solution obtained by minimizing the loss under a penalty constraint, and the penalty under a loss constraint.]

SLIDE 49

Going further on sparsity issues

Idea: solve the dual problem
$$\widehat{\beta} = \underset{\beta\in\{\|Y - X^T\beta\|_{\ell_2}\leq h\}}{\text{argmin}}\big\{\|\beta\|_{\ell_0}\big\}$$
where we might convexify the ℓ₀ norm, $\|\cdot\|_{\ell_0}$.

SLIDE 50

Going further on sparsity issues

On $[-1, +1]^k$, the convex hull of $\|\beta\|_{\ell_0}$ is $\|\beta\|_{\ell_1}$. On $[-a, +a]^k$, the convex hull of $\|\beta\|_{\ell_0}$ is $a^{-1}\|\beta\|_{\ell_1}$. Hence, why not solve
$$\widehat{\beta} = \underset{\beta;\|\beta\|_{\ell_1}\leq\tilde{s}}{\text{argmin}}\big\{\|Y - X^T\beta\|_{\ell_2}\big\}$$
which is equivalent (Kuhn-Tucker theorem) to the Lagrangian optimization problem
$$\widehat{\beta} = \text{argmin}\big\{\|Y - X^T\beta\|^2_{\ell_2} + \lambda\|\beta\|_{\ell_1}\big\}$$

SLIDE 51

LASSO: Least Absolute Shrinkage and Selection Operator
$$\widehat{\beta} \in \text{argmin}\big\{\|Y - X^T\beta\|^2_{\ell_2} + \lambda\|\beta\|_{\ell_1}\big\}$$
is a convex problem (several algorithms⋆), but not strictly convex (no uniqueness of the minimizer). Nevertheless, predictions $\widehat{y} = x^T\widehat{\beta}$ are unique.

⋆ MM (majorize-minimize), coordinate descent; see Hunter & Lange (2003) A Tutorial on MM Algorithms.

SLIDE 52

LASSO Regression

No explicit solution... If $\lambda \to 0$, $\widehat{\beta}_0^{\text{lasso}} = \widehat{\beta}^{\text{ls}}$; if $\lambda \to \infty$, $\widehat{\beta}_\infty^{\text{lasso}} = 0$.

[Figure: LASSO solutions in the $(\beta_1, \beta_2)$ plane, where the diamond-shaped ℓ₁ constraint produces corner solutions.]

SLIDE 53

LASSO Regression

For some λ, there are k's such that $\widehat{\beta}_{k,\lambda}^{\text{lasso}} = 0$. Further, $\lambda \mapsto \widehat{\beta}_{k,\lambda}^{\text{lasso}}$ is piecewise linear.

SLIDE 54

LASSO Regression

In the orthogonal case, $X^TX = I$,
$$\widehat{\beta}_{k,\lambda}^{\text{lasso}} = \text{sign}\big(\widehat{\beta}_k^{\text{ls}}\big)\left(\big|\widehat{\beta}_k^{\text{ls}}\big| - \frac{\lambda}{2}\right)_+$$
i.e. the LASSO estimate is related to the soft-threshold function...

SLIDE 55

Optimal LASSO Penalty

Use cross-validation, e.g. K-fold: fit on the data outside fold $I_k$,
$$\widehat{\beta}_{(-k)}(\lambda) = \text{argmin}\left\{\sum_{i\notin I_k} [y_i - x_i^T\beta]^2 + \lambda\|\beta\|_{\ell_1}\right\}$$
then compute the sum of squared errors on the fold,
$$Q_k(\lambda) = \sum_{i\in I_k} \big[y_i - x_i^T\widehat{\beta}_{(-k)}(\lambda)\big]^2$$
and finally solve
$$\lambda^\star = \text{argmin}\left\{\overline{Q}(\lambda) = \frac{1}{K}\sum_k Q_k(\lambda)\right\}$$
Note that this might overfit, so Hastie, Tibshirani & Friedman (2009) Elements of Statistical Learning suggest the largest λ such that
$$\overline{Q}(\lambda) \leq \overline{Q}(\lambda^\star) + \text{se}[\lambda^\star] \quad\text{with}\quad \text{se}[\lambda]^2 = \frac{1}{K^2}\sum_{k=1}^K \big[Q_k(\lambda) - \overline{Q}(\lambda)\big]^2$$
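A minimal R sketch with glmnet, whose cv.glmnet() returns both rules, lambda.min (the minimizer λ⋆) and lambda.1se (the one-standard-error rule); the simulated data are illustrative:

library(glmnet)
set.seed(1)
n <- 100; k <- 20
X <- matrix(rnorm(n * k), n, k)
y <- X[, 1:3] %*% c(2, -1, 1) + rnorm(n)   # only 3 relevant features
cv <- cv.glmnet(X, y, alpha = 1, nfolds = 10)
cv$lambda.min   # lambda minimizing the cross-validated error
cv$lambda.1se   # largest lambda within one standard error, as above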

SLIDE 56

LASSO and Ridge, with R

library(glmnet)
chicago <- read.table("http://freakonometrics.free.fr/chicago.txt",
                      header = TRUE, sep = ";")
standardize <- function(x) (x - mean(x)) / sd(x)
z0 <- standardize(chicago[, 1])
z1 <- standardize(chicago[, 3])
z2 <- standardize(chicago[, 4])
ridge   <- glmnet(cbind(z1, z2), z0, alpha = 0,  intercept = FALSE, lambda = 1)
lasso   <- glmnet(cbind(z1, z2), z0, alpha = 1,  intercept = FALSE, lambda = 1)
elastic <- glmnet(cbind(z1, z2), z0, alpha = .5, intercept = FALSE, lambda = 1)

Elastic net: $\lambda_1\|\beta\|_{\ell_1} + \lambda_2\|\beta\|^2_{\ell_2}$

SLIDE 57

LASSO Regression, Smoothing and Overfit

LASSO can be used to avoid overfitting.

SLIDE 58

Going further, ℓ₀, ℓ₁ and ℓ₂ penalty

Define, for $a \in \mathbb{R}^d$,
$$\|a\|_{\ell_0} = \sum_{i=1}^d \mathbf{1}(a_i \neq 0), \qquad \|a\|_{\ell_1} = \sum_{i=1}^d |a_i|, \qquad \|a\|_{\ell_2} = \Big(\sum_{i=1}^d a_i^2\Big)^{1/2}.$$
Constrained optimization vs. penalized optimization:
$$(\ell_0)\quad \underset{\beta;\,\|\beta\|_{\ell_0}\leq s}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) \qquad\text{vs.}\qquad \underset{\beta,\lambda}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) + \lambda\|\beta\|_{\ell_0}$$
$$(\ell_1)\quad \underset{\beta;\,\|\beta\|_{\ell_1}\leq s}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) \qquad\text{vs.}\qquad \underset{\beta,\lambda}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) + \lambda\|\beta\|_{\ell_1}$$
$$(\ell_2)\quad \underset{\beta;\,\|\beta\|_{\ell_2}\leq s}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) \qquad\text{vs.}\qquad \underset{\beta,\lambda}{\text{argmin}}\ \sum_{i=1}^n \ell(y_i, \beta_0 + x^T\beta) + \lambda\|\beta\|_{\ell_2}$$
Assume that ℓ is the quadratic norm.

SLIDE 59

Going further, ℓ₀, ℓ₁ and ℓ₂ penalty

The two problems (ℓ₂) are equivalent: ∀(β⋆, s⋆) solution of the left problem, ∃λ⋆ such that (β⋆, λ⋆) is a solution of the right problem, and conversely.

The two problems (ℓ₁) are equivalent: ∀(β⋆, s⋆) solution of the left problem, ∃λ⋆ such that (β⋆, λ⋆) is a solution of the right problem, and conversely. Nevertheless, even if there is a theoretical equivalence, there may be numerical issues since the solution is not necessarily unique.

The two problems (ℓ₀) are not equivalent: if (β⋆, λ⋆) is a solution of the right problem, ∃s⋆ such that β⋆ is a solution of the left problem; but the converse is not true.

More generally, consider an ℓ_p norm:

  • sparsity is obtained when p ≤ 1
  • convexity is obtained when p ≥ 1

SLIDE 60

Going further, ℓ₀, ℓ₁ and ℓ₂ penalty

Foster & George (1994) The Risk Inflation Criterion for Multiple Regression tried to solve directly the penalized problem of (ℓ₀). But it is a complex combinatorial problem in high dimension (Natarajan (1995) Sparse Approximate Solutions to Linear Systems proved that it is an NP-hard problem).

One can prove that if $\lambda \sim \sigma^2\log(p)$, then
$$\mathbb{E}\big([x^T\widehat{\beta} - x^T\beta_0]^2\big) \leq \underbrace{\mathbb{E}\big([x_\mathcal{S}^T\widehat{\beta}_\mathcal{S} - x^T\beta_0]^2\big)}_{=\sigma^2\#\mathcal{S}}\cdot\big(4\log p + 2 + o(1)\big).$$
In that case
$$\widehat{\beta}_{\lambda,j}^{\text{sub}} = \begin{cases} 0 & \text{if } j \notin \mathcal{S}_\lambda(\widehat{\beta}) \\ \widehat{\beta}_j^{\text{ls}} & \text{if } j \in \mathcal{S}_\lambda(\widehat{\beta}) \end{cases}$$
where $\mathcal{S}_\lambda(\widehat{\beta})$ is the set of non-null values in the solutions of (ℓ₀).

SLIDE 61

If ℓ is no longer the quadratic norm but ℓ₁, problem (ℓ₁) is not always strictly convex, and the optimum is not always unique (e.g. if XᵀX is singular). But in the quadratic case, ℓ is strictly convex, and at least $X\widehat{\beta}$ is unique. Further, note that solutions are necessarily coherent (signs of coefficients): it is not possible to have $\widehat{\beta}_j < 0$ for one solution and $\widehat{\beta}_j > 0$ for another one. In many cases, problem (ℓ₁) yields a corner-type solution, which can be seen as a "best subset" solution, like in (ℓ₀).

SLIDE 62

Going further, ℓ₀, ℓ₁ and ℓ₂ penalty

Consider a simple regression $y_i = x_i\beta + \varepsilon_i$, with ℓ₁-penalty and ℓ₂-loss function. (ℓ₁) becomes
$$\min\big\{y^Ty - 2y^Tx\beta + \beta x^Tx\beta + 2\lambda|\beta|\big\}$$
The first-order condition can be written
$$-2y^Tx + 2x^Tx\widehat{\beta} \pm 2\lambda = 0$$
(the sign in ± being the sign of $\widehat{\beta}$). Assume that the least-squares estimate (λ = 0) is (strictly) positive, i.e. $y^Tx > 0$. If λ is not too large, $\widehat{\beta}$ and $\widehat{\beta}^{\text{ols}}$ have the same sign, and
$$-2y^Tx + 2x^Tx\widehat{\beta} + 2\lambda = 0, \quad\text{with solution}\quad \widehat{\beta}_\lambda^{\text{lasso}} = \frac{y^Tx - \lambda}{x^Tx}.$$

SLIDE 63

Going further, ℓ₀, ℓ₁ and ℓ₂ penalty

Increase λ so that $\widehat{\beta}_\lambda = 0$. If we increase λ slightly more, $\widehat{\beta}_\lambda$ cannot become negative, because the sign of the first-order condition would change, and we would have to solve
$$-2y^Tx + 2x^Tx\widehat{\beta} - 2\lambda = 0, \quad\text{with solution}\quad \widehat{\beta}_\lambda^{\text{lasso}} = \frac{y^Tx + \lambda}{x^Tx}.$$
But that solution is positive (we assumed $y^Tx > 0$), while we should have $\widehat{\beta}_\lambda < 0$. Thus, at some point $\widehat{\beta}_\lambda = 0$, which is a corner solution.

In higher dimension, see Tibshirani & Wasserman (2016) A Closer Look at Sparse Regression or Candès & Plan (2009) Near-Ideal Model Selection by ℓ₁ Minimization. Under some additional technical assumptions, the LASSO estimator is "sparsistent", in the sense that the support of $\widehat{\beta}_\lambda^{\text{lasso}}$ is the same as that of β. Thus, LASSO can be used for variable selection (see Hastie et al. (2001) The

SLIDE 64

Elements of Statistical Learning).

Generally, $\widehat{\beta}_\lambda^{\text{lasso}}$ is a biased estimator, but its variance can be small enough to yield a smaller mean squared error than the OLS estimate. With orthonormal covariates, one can prove that
$$\widehat{\beta}_{\lambda,j}^{\text{sub}} = \widehat{\beta}_j^{\text{ols}}\mathbf{1}_{|\widehat{\beta}_j^{\text{ols}}|>b}, \qquad \widehat{\beta}_{\lambda,j}^{\text{ridge}} = \frac{\widehat{\beta}_j^{\text{ols}}}{1+\lambda}, \qquad \widehat{\beta}_{\lambda,j}^{\text{lasso}} = \text{sign}\big[\widehat{\beta}_j^{\text{ols}}\big]\cdot\big(|\widehat{\beta}_j^{\text{ols}}| - \lambda\big)_+.$$

SLIDE 65

Optimization Heuristics

First idea: given some initial guess $\beta_{(0)}$,
$$|\beta| \sim |\beta_{(0)}| + \frac{1}{2|\beta_{(0)}|}\big(\beta^2 - \beta_{(0)}^2\big)$$
The LASSO estimate can then be derived from iterated ridge estimates:
$$\|y - X\beta_{(k+1)}\|^2_{\ell_2} + \lambda\|\beta_{(k+1)}\|_{\ell_1} \sim \|y - X\beta_{(k+1)}\|^2_{\ell_2} + \frac{\lambda}{2}\sum_j \frac{1}{|\beta_{j,(k)}|}\big[\beta_{j,(k+1)}\big]^2$$
which is a weighted ridge penalty function. Thus
$$\beta_{(k+1)} = \big(X^TX + \lambda\Delta_{(k)}\big)^{-1}X^Ty \quad\text{where}\quad \Delta_{(k)} = \text{diag}\big[|\beta_{j,(k)}|^{-1}\big],$$
and $\beta_{(k)} \to \widehat{\beta}^{\text{lasso}}$ as $k \to \infty$.

SLIDE 66

Properties of LASSO Estimate

From this iterative technique,
$$\widehat{\beta}_\lambda^{\text{lasso}} \sim \big(X^TX + \lambda\Delta\big)^{-1}X^Ty$$
where $\Delta = \text{diag}\big[|\widehat{\beta}_{j,\lambda}^{\text{lasso}}|^{-1}\big]$ if $\widehat{\beta}_{j,\lambda}^{\text{lasso}} \neq 0$, and 0 otherwise. Thus,
$$\mathbb{E}\big[\widehat{\beta}_\lambda^{\text{lasso}}\big] \sim \big(X^TX + \lambda\Delta\big)^{-1}X^TX\beta$$
and
$$\text{Var}\big[\widehat{\beta}_\lambda^{\text{lasso}}\big] \sim \sigma^2\big(X^TX + \lambda\Delta\big)^{-1}X^TX\big[(X^TX + \lambda\Delta)^{-1}\big]^T$$

SLIDE 67

Optimization Heuristics

Consider here a simplified problem,
$$\min_{a\in\mathbb{R}}\Big\{\underbrace{\frac{1}{2}(a-b)^2 + \lambda|a|}_{g(a)}\Big\} \quad\text{with } \lambda > 0.$$
Observe that $g'(0) = -b \pm \lambda$. Then

  • if |b| ≤ λ, then a⋆ = 0
  • if b ≥ λ, then a⋆ = b − λ
  • if b ≤ −λ, then a⋆ = b + λ

$$a^\star = \underset{a\in\mathbb{R}}{\text{argmin}}\Big\{\frac{1}{2}(a-b)^2 + \lambda|a|\Big\} = S_\lambda(b) = \text{sign}(b)\cdot(|b| - \lambda)_+,$$
also called the soft-thresholding operator.
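A minimal R sketch of $S_\lambda$:

soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

b <- seq(-3, 3, by = 0.1)
plot(b, soft_threshold(b, lambda = 1), type = "l",
     ylab = expression(S[lambda](b)))   # flat at 0 on [-1, 1]
abline(0, 1, lty = 2)                   # identity, for comparison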

SLIDE 68

Optimization Heuristics

Definition: for any convex function h, define the proximal operator of h,
$$\text{proximal}_h(y) = \underset{x\in\mathbb{R}^d}{\text{argmin}}\Big\{\frac{1}{2}\|x - y\|^2_{\ell_2} + h(x)\Big\}$$
Note that
$$\text{proximal}_{\lambda\|\cdot\|^2_{\ell_2}}(y) = \frac{1}{1+\lambda}\,y \quad\text{(shrinkage operator)}$$
$$\text{proximal}_{\lambda\|\cdot\|_{\ell_1}}(y) = S_\lambda(y) = \text{sign}(y)\cdot(|y| - \lambda)_+$$

SLIDE 69

Optimization Heuristics

We want to solve here
$$\widehat{\theta} \in \underset{\theta\in\mathbb{R}^d}{\text{argmin}}\Big\{\underbrace{\frac{1}{n}\|y - m_\theta(x)\|^2_{\ell_2}}_{f(\theta)} + \lambda\cdot\underbrace{\text{penalty}(\theta)}_{g(\theta)}\Big\}$$
where f is convex and smooth, and g is convex, but not smooth...

1. Focus on f: descent lemma, ∀θ, θ′,
$$f(\theta) \leq f(\theta') + \langle\nabla f(\theta'), \theta - \theta'\rangle + \frac{t}{2}\|\theta - \theta'\|^2_{\ell_2}$$
Consider a gradient descent sequence $\theta_k$, i.e. $\theta_{k+1} = \theta_k - t^{-1}\nabla f(\theta_k)$; then
$$f(\theta) \leq \underbrace{f(\theta_k) + \langle\nabla f(\theta_k), \theta - \theta_k\rangle + \frac{t}{2}\|\theta - \theta_k\|^2_{\ell_2}}_{\varphi(\theta):\ \theta_{k+1} = \text{argmin}\{\varphi(\theta)\}}$$

SLIDE 70

Optimization Heuristics

2. Add function g:
$$f(\theta) + g(\theta) \leq \underbrace{f(\theta_k) + \langle\nabla f(\theta_k), \theta - \theta_k\rangle + \frac{t}{2}\|\theta - \theta_k\|^2_{\ell_2} + g(\theta)}_{\psi(\theta)}$$
And one can prove that
$$\theta_{k+1} = \underset{\theta\in\mathbb{R}^d}{\text{argmin}}\{\psi(\theta)\} = \text{proximal}_{g/t}\big(\theta_k - t^{-1}\nabla f(\theta_k)\big)$$
the so-called proximal gradient descent algorithm, since
$$\text{argmin}\{\psi(\theta)\} = \text{argmin}\Big\{\frac{t}{2}\big\|\theta - \big(\theta_k - t^{-1}\nabla f(\theta_k)\big)\big\|^2_{\ell_2} + g(\theta)\Big\}$$

SLIDE 71

Coordinate-wise minimization

Consider some convex differentiable function $f : \mathbb{R}^k \to \mathbb{R}$. Consider $x^\star \in \mathbb{R}^k$ obtained by minimizing along each coordinate axis, i.e.
$$f(x_1^\star, \cdots, x_{i-1}^\star, x_i, x_{i+1}^\star, \cdots, x_k^\star) \geq f(x_1^\star, \cdots, x_{i-1}^\star, x_i^\star, x_{i+1}^\star, \cdots, x_k^\star)$$
for all i. Is $x^\star$ a global minimizer, i.e. $f(x) \geq f(x^\star)$, $\forall x \in \mathbb{R}^k$?

Yes, if f is convex and differentiable:
$$\nabla f(x)\big|_{x=x^\star} = \left(\frac{\partial f(x)}{\partial x_1}, \cdots, \frac{\partial f(x)}{\partial x_k}\right) = 0$$
There might be a problem if f is not differentiable (except in each axis direction). If $f(x) = g(x) + \sum_{i=1}^k h_i(x_i)$ with g convex and differentiable, the answer is again yes, since
$$f(x) - f(x^\star) \geq \nabla g(x^\star)^T(x - x^\star) + \sum_i \big[h_i(x_i) - h_i(x_i^\star)\big]$$

SLIDE 72

Coordinate-wise minimization
$$f(x) - f(x^\star) \geq \sum_i \underbrace{\big[\nabla_i g(x^\star)^T(x_i - x_i^\star) + h_i(x_i) - h_i(x_i^\star)\big]}_{\geq 0} \geq 0$$
Thus, for functions $f(x) = g(x) + \sum_{i=1}^k h_i(x_i)$ we can use coordinate descent to find a minimizer, i.e. at step j
$$x_1^{(j)} \in \underset{x_1}{\text{argmin}}\ f\big(x_1, x_2^{(j-1)}, x_3^{(j-1)}, \cdots, x_k^{(j-1)}\big)$$
$$x_2^{(j)} \in \underset{x_2}{\text{argmin}}\ f\big(x_1^{(j)}, x_2, x_3^{(j-1)}, \cdots, x_k^{(j-1)}\big)$$
$$x_3^{(j)} \in \underset{x_3}{\text{argmin}}\ f\big(x_1^{(j)}, x_2^{(j)}, x_3, \cdots, x_k^{(j-1)}\big)$$
Tseng (2001) Convergence of Block Coordinate Descent Method: if f is continuous, then $x^\infty$ is a minimizer of f.

SLIDE 73

Application in Linear Regression

Let $f(x) = \frac{1}{2}\|y - Ax\|^2$, with $y \in \mathbb{R}^n$ and $A \in \mathcal{M}_{n\times k}$. Let $A = [A_1, \cdots, A_k]$. Let us minimize in direction i, and let $x_{-i}$ denote the vector in $\mathbb{R}^{k-1}$ without $x_i$. Here
$$0 = \frac{\partial f(x)}{\partial x_i} = A_i^T[Ax - y] = A_i^T[A_ix_i + A_{-i}x_{-i} - y]$$
thus the optimal value is
$$x_i^\star = \frac{A_i^T[y - A_{-i}x_{-i}]}{A_i^TA_i}$$

SLIDE 74

Application to LASSO

Let $f(x) = \frac{1}{2}\|y - Ax\|^2 + \lambda\|x\|_{\ell_1}$, so that the non-differentiable part is separable, since $\|x\|_{\ell_1} = \sum_{i=1}^k |x_i|$. Let us minimize in direction i, and let $x_{-i}$ denote the vector in $\mathbb{R}^{k-1}$ without $x_i$. Here
$$0 = \frac{\partial f(x)}{\partial x_i} = A_i^T[A_ix_i + A_{-i}x_{-i} - y] + \lambda s_i$$
where $s_i \in \partial|x_i|$. Thus, the solution is obtained by soft-thresholding,
$$x_i^\star = S_{\lambda/\|A_i\|^2}\left(\frac{A_i^T[y - A_{-i}x_{-i}]}{A_i^TA_i}\right)$$
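A minimal R sketch of cyclic coordinate descent built on this update (the function name, the simulated data and λ are illustrative):

lasso_cd <- function(A, y, lambda, n_iter = 100) {
  k <- ncol(A); x <- rep(0, k)
  soft <- function(b, t) sign(b) * pmax(abs(b) - t, 0)
  for (j in 1:n_iter) {
    for (i in 1:k) {
      r    <- y - A[, -i, drop = FALSE] %*% x[-i]   # partial residual
      bi   <- sum(A[, i] * r) / sum(A[, i]^2)       # unpenalized update
      x[i] <- soft(bi, lambda / sum(A[, i]^2))      # soft-threshold it
    }
  }
  x
}

set.seed(1)
A <- matrix(rnorm(100 * 5), 100, 5)
y <- A %*% c(2, 0, 0, -1, 0) + rnorm(100)
lasso_cd(A, y, lambda = 20)   # zeros appear on the noise coordinates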

SLIDE 75

Convergence rate for LASSO

Let $f(x) = g(x) + \lambda\|x\|_{\ell_1}$ with

  • g convex, ∇g Lipschitz with constant L > 0, and $\text{Id} - \nabla g/L$ monotone increasing in each component
  • there exists z such that, componentwise, either $z \geq S_\lambda(z - \nabla g(z))$ or $z \leq S_\lambda(z - \nabla g(z))$

Saha & Tewari (2010) On the Finite Time Convergence of Cyclic Coordinate Descent Methods proved that a coordinate descent starting from z satisfies
$$f(x^{(j)}) - f(x^\star) \leq \frac{L\|z - x^\star\|^2}{2j}$$

SLIDE 76

Lasso for Autoregressive Time Series

Consider some AR(p) autoregressive time series,
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_{p-1} X_{t-p+1} + \phi_p X_{t-p} + \varepsilon_t,$$
for some white noise $(\varepsilon_t)$, with a causal-type representation. Write $y = x^T\phi + \varepsilon$. The LASSO estimator $\widehat{\phi}$ is a minimizer of
$$\frac{1}{2T}\|y - x^T\phi\|^2 + \lambda\sum_{i=1}^p \lambda_i|\phi_i|,$$
for some tuning parameters $(\lambda, \lambda_1, \cdots, \lambda_p)$. See Nardi & Rinaldo (2011).

SLIDE 77

Graphical Lasso and Covariance Estimation

We want to estimate an (unknown) covariance matrix Σ, or Σ⁻¹. An estimate for Σ⁻¹ is Θ⋆, solution of
$$\Theta^\star \in \underset{\Theta\in\mathcal{M}_{k\times k}}{\text{argmin}}\big\{-\log[\det(\Theta)] + \text{trace}[S\Theta] + \lambda\|\Theta\|_{\ell_1}\big\}$$
where $S = \frac{X^TX}{n}$ and where $\|\Theta\|_{\ell_1} = \sum|\Theta_{i,j}|$. See van Wieringen (2016) Undirected Network Reconstruction from High-Dimensional Data and https://github.com/kaizhang/glasso

SLIDE 78

Application to Network Simplification

Can be applied to networks, to spot 'significant' connections... Source: http://khughitt.github.io/graphical-lasso/

SLIDE 79

Extension of Penalization Techniques

In a more general context, we want to solve
$$\widehat{\theta} \in \underset{\theta\in\mathbb{R}^d}{\text{argmin}}\left\{\frac{1}{n}\sum_{i=1}^n \ell\big(y_i, m_\theta(x_i)\big) + \lambda\cdot\text{penalty}(\theta)\right\}.$$
The quadratic loss function was related to the Gaussian case, but many more alternatives can be considered...

SLIDE 80

Linear models, nonlinear models and GLMs

Linear model:

  • $(Y|X = x) \sim \mathcal{N}(\theta_x, \sigma^2)$
  • $\mathbb{E}[Y|X = x] = \theta_x = x^T\beta$

fit <- lm(y ~ x, data = df)

SLIDE 81

Linear models, nonlinear models and GLMs

Nonlinear models:

  • $(Y|X = x) \sim \mathcal{N}(\theta_x, \sigma^2)$
  • $\mathbb{E}[Y|X = x] = \theta_x = m(x)$

fit <- lm(y ~ poly(x, k), data = df)
fit <- lm(y ~ bs(x), data = df)   # bs() requires library(splines)

SLIDE 82

Linear models, nonlinear models and GLMs

Generalized linear models:

  • $(Y|X = x) \sim \mathcal{L}(\theta_x, \phi)$
  • $\mathbb{E}[Y|X = x] = h^{-1}(\theta_x) = h^{-1}(x^T\beta)$

fit <- glm(y ~ x, data = df,
           family = poisson(link = "log"))

SLIDE 83

The Exponential Family

Consider distributions with parameter θ (and ϕ), with density (with respect to the appropriate measure, on ℕ or on ℝ)
$$f(y|\theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right),$$
where a(·), b(·) and c(·) are functions, and θ is called the canonical parameter: θ is the quantity of interest, while ϕ is a nuisance parameter.

SLIDE 84

The Exponential Family

Example: The Gaussian distribution with mean µ and variance σ², $\mathcal{N}(\mu, \sigma^2)$, belongs to this family, with θ = µ, ϕ = σ², a(ϕ) = ϕ, b(θ) = θ²/2 and
$$c(y, \phi) = -\frac{1}{2}\left(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\right), \quad y \in \mathbb{R}.$$
Example: The Bernoulli distribution with mean π, $\mathcal{B}(\pi)$, is obtained with θ = log{p/(1 − p)}, a(ϕ) = 1, b(θ) = log(1 + exp(θ)), ϕ = 1 and c(y, ϕ) = 0.

Example: The binomial distribution with mean nπ, $\mathcal{B}(n, \pi)$, is obtained with θ = log{p/(1 − p)}, a(ϕ) = 1, b(θ) = n log(1 + exp(θ)), ϕ = 1 and $c(y, \phi) = \log\binom{n}{y}$.

Example: The Poisson distribution with mean λ, $\mathcal{P}(\lambda)$, belongs to this family,
$$f(y|\lambda) = \exp(-\lambda)\frac{\lambda^y}{y!} = \exp\big(y\log\lambda - \lambda - \log y!\big), \quad y \in \mathbb{N},$$
with θ = log λ, ϕ = 1, a(ϕ) = 1, b(θ) = exp θ = λ and c(y, ϕ) = − log y!.

SLIDE 85

The Exponential Family

Example: The negative binomial distribution with parameters r and p,
$$f(y|r, p) = \binom{y+r-1}{y}(1-p)^r p^y, \quad y \in \mathbb{N},$$
can be written, equivalently,
$$f(y|r, p) = \exp\left(y\log p + r\log(1-p) + \log\binom{y+r-1}{y}\right)$$
i.e. θ = log p, b(θ) = −r log(1 − p) = −r log(1 − eᶿ) and a(ϕ) = 1.

SLIDE 86

The Exponential Family

Example: The Gamma distribution with mean µ and variance µ²ν⁻¹,
$$f(y|\mu, \nu) = \frac{1}{\Gamma(\nu)}\left(\frac{\nu}{\mu}\right)^\nu y^{\nu-1}\exp\left(-\frac{\nu}{\mu}y\right), \quad y \in \mathbb{R}_+,$$
is also in the exponential family, with
$$\theta = -\frac{1}{\mu}, \quad a(\phi) = \phi, \quad b(\theta) = -\log(-\theta), \quad \phi = \nu^{-1}, \quad c(y, \phi) = \left(\frac{1}{\phi} - 1\right)\log(y) - \log\Gamma\left(\frac{1}{\phi}\right)$$

SLIDE 87

Mean and Variance

Let Y be a random variable in the exponential family. Then
$$\mathbb{E}(Y) = b'(\theta) \quad\text{and}\quad \text{Var}(Y) = b''(\theta)\phi,$$
i.e. the variance of Y is the product of two quantities:

  • b″(θ), a function of θ (only), called the variance function,
  • a function of ϕ.

Observe that µ = E(Y); hence the parameter θ is related to the mean µ, so the variance function is a function of µ, and can be denoted $V(\mu) = b''([b']^{-1}(\mu))\phi$.

Example: In the Gaussian case, V(µ) = 1, while for the Poisson distribution V(µ) = µ, and for the Gamma one V(µ) = µ².

SLIDE 88

From the Exponential Family to GLMs

Consider independent random variables $Y_1, Y_2, \ldots, Y_n$ such that
$$f(y_i|\theta_i, \phi) = \exp\left(\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right)$$
so that the likelihood can be written
$$\mathcal{L}(\theta, \phi|y) = \prod_{i=1}^n f(y_i|\theta_i, \phi) = \exp\left(\frac{\sum_{i=1}^n y_i\theta_i - \sum_{i=1}^n b(\theta_i)}{a(\phi)} + \sum_{i=1}^n c(y_i, \phi)\right),$$
or
$$\log\mathcal{L}(\beta) = \sum_{i=1}^n \frac{y_i\theta_i - b(\theta_i)}{a(\phi)}$$
(up to an additive constant...)
