Random projections, reweighting and half-sampling for high-dimensional statistical inference (PowerPoint PPT Presentation)



SLIDE 1

MC for high-dimensional statistics 1

Random projections, reweighting and half-sampling for high-dimensional statistical inference

Art B. Owen, Stanford University
Based on joint works with Dean Eckles (Facebook Inc.) and Sarah Emerson (Oregon State University)

MCQMC 2012, February 2012

SLIDE 2

About these slides

These are the slides I presented on February 15 at MCQMC 2012 in Sydney, Australia. I have corrected some typos and extended the presentation of the challenging integral over the Stiefel manifold. A few of these slides were skipped over in order to allow time for questions. This talk covers two projects. The bootstrap work with Dean Eckles has now been accepted by the Annals of Applied Statistics. The projection work with Sarah Emerson is still in progress.

SLIDE 3

Monte Carlo methods for statistics

Mainstays:
1) Markov chain Monte Carlo
2) Bootstrap resampling

We also use:
1) Random permutations
2) Random projections
3) Sample splitting

Probability/Statistics and Monte Carlo are closely intertwined.

SLIDE 4

Statistics and Monte Carlo

[Image: M. C. Escher (1948)]

This talk will show some uses of MC in statistics.

SLIDE 5

Some statistical notions

X ∼ F: the random vector X has distribution F.

Xi iid∼ F: the Xi are statistically Independent and Identically Distributed (IID) from F.

Nd(µ, Σ): the Gaussian distribution with mean µ ∈ Rd and variance-covariance matrix Σ ∈ Rd×d. Here X ∼ N(µ, Σ) means

Pr(X ∈ A) = ∫_A f(x) dx, where f(x) = exp(−(1/2)(x − µ)ᵀΣ⁻¹(x − µ)) / ((2π)^{d/2} det(Σ)^{1/2}).

p-values: observe T = t and compute p = Pr(T ≥ t). If p < 0.01 then the observed value t happens 1% or less of the time. This is evidence against the hypothesized distribution of T.

SLIDE 6

Problem one

We have

X1, …, Xnx iid∼ F in Rd and Y1, …, Yny iid∼ G in Rd.

Is F = G? We might assume F = N(µ1, Σ) and G = N(µ2, Σ). Then we test µ1 = µ2. This is an old problem.

Revived interest when d ≫ nx + ny:

DNA microarrays: expression levels of d ≈ 30,000 genes on nx healthy and ny diseased individuals, with nx, ny in the tens or hundreds.

Genome-wide association studies: d ≈ 2,000,000 markers with nx, ny in the thousands or more.

Also: fMRI, finance.

SLIDE 7

Illustration

[Scatterplot: 50 red and 50 black points in R2, axes X1 and X2.]

The black points are normally distributed; the red points are shifted Northwest relative to the black ones.

X1: not significantly different, p = 0.47
X2: not significantly different, p = 0.09
X1 + X2: not significantly different, p = 0.60
X1 − X2: very significantly different, p = 1.7 × 10⁻⁴

So: how do we find the interesting projection?

SLIDE 8

Hotelling’s T²

Find θ ∈ Rd with θᵀθ = 1 to maximize the apparent separation between X̃i = θᵀXi ∈ R and Ỹi = θᵀYi ∈ R. The answer depends on

X̄ = (1/nx) ∑_{i=1}^{nx} Xi,  Sx = ∑_{i=1}^{nx} (Xi − X̄)(Xi − X̄)ᵀ,
Ȳ = (1/ny) ∑_{i=1}^{ny} Yi,  Sy = ∑_{i=1}^{ny} (Yi − Ȳ)(Yi − Ȳ)ᵀ.

Algebraically we get

T² = (nxny/(nx + ny)) (X̄ − Ȳ)ᵀS⁻¹(X̄ − Ȳ), where S = (Sx + Sy)/(nx + ny − 2).

Hotelling (1931). For the illustration we get T² = 18.58, with Pr(T² ≥ 18.58) = 2.6 × 10⁻⁴ (p-value).
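As a concrete sketch (our own code, not part of the talk; the function name and the illustrative shift are assumptions), the pooled-covariance T² above takes a few lines of NumPy:

```python
import numpy as np

def hotelling_T2(X, Y):
    # Pooled covariance S = (Sx + Sy)/(nx + ny - 2), then
    # T^2 = (nx*ny/(nx+ny)) (Xbar - Ybar)' S^{-1} (Xbar - Ybar)
    nx, ny = X.shape[0], Y.shape[0]
    xbar, ybar = X.mean(axis=0), Y.mean(axis=0)
    Sx = (X - xbar).T @ (X - xbar)   # sums of squares, not divided by n
    Sy = (Y - ybar).T @ (Y - ybar)
    S = (Sx + Sy) / (nx + ny - 2)
    diff = xbar - ybar
    return (nx * ny / (nx + ny)) * diff @ np.linalg.solve(S, diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = rng.normal(size=(50, 2)) + np.array([-0.3, 0.3])  # a Northwest-style shift
t2 = hotelling_T2(X, Y)
```

A useful sanity check: unlike any single projection θᵀX, T² is invariant under invertible linear maps of the data.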

SLIDE 9

In high dimensions

When d ≫ nx + ny the covariance matrix S is not invertible, so we can’t use

T² = (nxny/(nx + ny)) (X̄ − Ȳ)ᵀS⁻¹(X̄ − Ȳ).

Geometrically: some projection θ ∈ Rd has θᵀXi = constant for i = 1, …, nx and θᵀYi = a different constant. That is, we will get perfect separation, even if F = G.

A classic remedy by Dempster (1958) takes

T²_Dempster = (nxny/(nx + ny)) ‖X̄ − Ȳ‖² / tr(S) = (nxny/(nx + ny)) ∑_{j=1}^d (X̄j − Ȳj)² / ∑_{j=1}^d Sjj,

but this makes no use of correlations. Recent improvements by Bai, Saradanasa, Hall, Fan, Chen, and Srivastava also don’t use correlations.

SLIDE 10

Random projections

Lopes, Jacob, Wainwright (2011): choose a random Θ ∈ Rd×k with ΘᵀΘ = Ik. Put

X̃i = ΘᵀXi and Ỹi = ΘᵀYi.

Then use

T²_Θ = (nxny/(nx + ny)) (X̄ − Ȳ)ᵀΘ(ΘᵀSΘ)⁻¹Θᵀ(X̄ − Ȳ),

which exists if k < nx + ny − 2. That is: project the data into a random k-dimensional subspace and test the means of the projected data. This retains some of the correlations.

SLIDE 11

Uniform random projections

To project from Rd to R: normalize a Gaussian vector,

θ = Z/‖Z‖, Z ∼ N(0, Id).

To project from Rd to Rk: take Z ∈ Rd×k with entries Zij iid∼ N(0, 1). Gram-Schmidt yields Z = QR; deliver Θ = Q ∈ Rd×k and project X̃i = ΘᵀXi.

Any QR decomposition with positive Rii will do.
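A minimal NumPy sketch of this recipe (our own code, not the authors’): QR-factor a Gaussian matrix and flip column signs so that the Rii are positive, giving a Haar-distributed Θ.

```python
import numpy as np

def uniform_projection(d, k, rng):
    # Z has iid N(0,1) entries; Q from Z = QR is uniform on the Stiefel
    # manifold V_{d,k} once we force the diagonal of R to be positive.
    Z = rng.standard_normal((d, k))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))   # flip any column where R_ii < 0

rng = np.random.default_rng(1)
Theta = uniform_projection(200, 49, rng)   # project R^200 -> R^49 via Theta.T @ x
```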

SLIDE 12

Lopes et al. ctd.

They make just one random projection of the data, and find that k ≈ (nx + ny − 2)/2 performs well.

Why just one? If your one projection is ‘unlucky’ then you might miss the pattern. But with just one projection the distribution of T²_Θ is known.

Multiple projections: take T̄² = (1/M) ∑_{i=1}^M T²_i with T²_i based on Θi ∈ Rd×k. We get some kind of ‘average’ luck. But the distribution of T̄² is not known.

SLIDE 13

Multiple projections

Work with S. Emerson: average over M independent random Θi ∈ Rd×k,

T̄² = (1/M) ∑_{i=1}^M T²_i, where T²_i = (nxny/(nx + ny)) (X̄ − Ȳ)ᵀΘi(Θᵢᵀ S Θi)⁻¹Θᵢᵀ(X̄ − Ȳ).

Easily:
1) E(T̄²) = E(T²_i)
2) Var(T̄²) < Var(T²_i), unless both are infinite! (averaging reduces variance)

Less easily:
1) Finite variance requires k ≤ nx + ny − 6
2) Finite mean requires k ≤ nx + ny − 4

Unfortunately, the distribution of T̄² is not known.

SLIDE 14

Separation

Simulate 2000 data sets,

Xi iid∼ N(0, Σ), Yi iid∼ N(δ, Σ), δ ∈ Rd.

Of these: 1000 null cases with δ = 0 and 1000 non-null cases with δ ≠ 0.

Rank the 2000 T̄² scores and see if the nulls get smaller T̄² values. The ROC* curve, shown later, shows how well the test separates the two cases.

*Receiver Operating Characteristic (don’t ask)

SLIDE 15

Simulated case

Xi, Yi ∈ R200, nx = ny = 50.

Pick δ uniform on the 200-dimensional sphere with ‖δ‖ = 3. Pick Σ = Id × 50/√d.

Why these? A uniform δ means that the group separation is unrelated to the covariance structure. Debatable; we follow Lopes et al. in making this assumption.

WLOG, under uniformity, Σ = diag(λ1, …, λd) with λ1 ≥ λ2 ≥ · · · ≥ λd. The interesting cases are equal λj and rapidly decreasing λj.

SLIDE 16

Multiple projections

[Histograms of simulated T̄² under the Null and the Alternative, for M = 1 and M = 32.]

nx = ny = 50, d = 200, k = 49. Null: δ = 0. Alt: ‖δ‖ = 3.

SLIDE 17

The ROCs

[ROC curves for M = 1, 2, 4, 8, 16, 32: true positives vs. false positives.]

Larger M has greater area under the curve:

M   AUC
1   71.9
2   80.6
4   87.1
8   91.4
16  94.3
32  95.7

SLIDE 18

Varying k

Lopes et al. prefer k ≈ (nx + ny − 2)/2. That is not always optimal, but it may be a good default. For the previous scenario, small k does relatively poorly; 32 ≤ k ≤ 56 all gave AUC ≈ 0.95 with M = 32.

Other scenarios: S. Emerson finds that the advantage of averaging persists under other decay rates for the eigenvalues of Σ.

SLIDE 19

Using T̄²

The usual p-value is Pr(T̄² ≥ t²), where t² is the observed value on our data. We have no good approximation for this. Even the moments of T̄² involve difficult integrals over Θ ∈ Vd,k, the Stiefel manifold, e.g.

∫_{Θ∈Vd,k} Θ(ΘᵀSΘ)⁻¹Θᵀ dU(Θ) = (2π)^{−dk/2} ∫_{Z∈Rd×k} Z(ZᵀSZ)⁻¹Zᵀ e^{−tr(ZᵀZ)/2} dZ,

for non-negative diagonal S ∈ Rd×d with nx + ny − 2 positive entries. Here U(Θ) is the uniform (Haar) measure.

The above is the first moment. Closed forms for the first and second moments could lead to useful test statistics.

SLIDE 20

Permutation tests

There are C(nx + ny, nx) ways to allocate nx of the pooled observations (X1, …, Xnx, Y1, …, Yny) to the first sample (the X’s). The re-allocated data (X*1, …, X*nx, Y*1, …, Y*ny) have statistic T̄*². Then

p = #{T̄*² : T̄*² ≥ T̄²} / C(nx + ny, nx).

For nx = ny = 50 there are ≈ 10²⁹ allocations (permutations), so we use a Monte Carlo sample of random permutations. Justified in the text by Lehmann & Romano (2005), “Testing Statistical Hypotheses”. ‘Combination test’ might be a better name.
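A sketch of the Monte Carlo permutation test (our code, with the customary +1 so that the observed allocation counts itself; any statistic, such as T̄², can be plugged in):

```python
import numpy as np

def perm_pvalue(X, Y, statistic, n_perm, rng):
    # Randomly re-allocate the pooled sample into groups of sizes nx, ny
    # and count how often the permuted statistic reaches the observed one.
    pooled = np.concatenate([X, Y])
    nx = len(X)
    observed = statistic(X, Y)
    count = 1                          # the observed allocation itself
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if statistic(pooled[perm[:nx]], pooled[perm[nx:]]) >= observed:
            count += 1
    return count / (n_perm + 1)

rng = np.random.default_rng(5)
X = rng.normal(size=100)
Y = rng.normal(size=100) + 5.0         # a very large shift
p = perm_pvalue(X, Y, lambda a, b: abs(a.mean() - b.mean()), 99, rng)
```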

SLIDE 21

Summary of problem one

• If d > nx + ny − 2, then T² is not defined
• Dempster (almost) used coordinate projections
• Lopes et al. use one random projection from d to k dimensions
• We find benefits from multiple projections
• But we have to use permutation tests for significance
• Monte Carlo enters to compute the test statistic and then to judge its significance

SLIDE 22

Problem two

• How can we judge uncertainty in two-way and higher-way data?
• We would like to use the bootstrap.
• But McCullagh (2000) proved this is impossible.

Solution: we will get a Monte Carlo method that mildly overestimates the sampling uncertainty.

But first, it is necessary to describe the bootstrap, as well as multi-way data.

SLIDE 23

Bootstrap sampling

Data are X1, …, Xn iid∼ F. We compute T̂ = T(X1, …, Xn). What is the sampling uncertainty in T̂, e.g. Var(T̂) = Var(T̂ | F)?

Combine two ideas:

Monte Carlo: sample from F to estimate Var(T̂ | F) (but we don’t know F).

Plug in: use Var(T̂ | F̂), as if F = F̂, the empirical distribution*.

From Efron (1979). There exist extensive variations on the idea.

*The empirical distribution F̂ = (1/n) ∑_{i=1}^n δ_{Xi} puts probability 1/n on each sample observation; δx is a ‘point mass’ at x. To sample X ∼ F̂, pick one of the original data points at random.

SLIDE 24

Bootstrap pseudocode

Given X1, …, Xn iid∼ F, resample the data B times:

For b = 1, …, B
    For i = 1, …, n
        i* ∼ U{1, …, n}
        X*i = Xi*
    T*b = T(X*1, …, X*n)

Compute the summaries

T̄* = (1/B) ∑_{b=1}^B T*b,  V̂ar(T̂) = (1/(B − 1)) ∑_{b=1}^B (T*b − T̄*)².
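The pseudocode above translates directly into NumPy. This is our sketch, not the talk’s code; for T = the sample mean, the answer should come out near the classical s²/n.

```python
import numpy as np

def bootstrap_var(x, statistic, B, rng):
    # Resample n observations with replacement, B times, and take the
    # sample variance of the replicated statistics.
    n = len(x)
    Tstar = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # i* ~ U{1, ..., n}
        Tstar[b] = statistic(x[idx])
    return Tstar.var(ddof=1)

rng = np.random.default_rng(2)
x = rng.normal(size=200)
v = bootstrap_var(x, np.mean, B=2000, rng=rng)   # compare with s^2 / n
```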

SLIDE 25

First reactions

1) How could that ever work? 2) How could that ever fail?

The bootstrap works by a continuity argument.

Plug-in: under conditions, Var(T̂ | F) is continuous in F, and F̂ → F, so Var(T̂ | F̂) → Var(T̂ | F).

Monte Carlo: as B → ∞, the Monte Carlo estimate converges to Var(T̂ | F̂).

It fails when the continuity conditions fail, or when F̂ does not mimic F well enough.

SLIDE 26

Bootstrapping the mean

The base case is

T(X1, …, Xn) = X̄ ≡ (1/n) ∑_{i=1}^n Xi.

We have non-bootstrap methods for Var(X̄) (!). Correctness for the mean extends to more complicated statistics via Taylor approximations.

Why we like it: there is no need to assume the Gaussian or any other distributional form, and it is explainable to scientific colleagues.

SLIDE 27

Some variants

The Bayesian bootstrap, Rubin (1981):

F̂ = ∑_{i=1}^n Wi δ_{Xi} / ∑_{i=1}^n Wi,  Wi iid∼ Exp(1).

That is, place an independent random weight on each observation. If T(X1, …, Xn) is the sample mean, then under this bootstrap

T* = ∑_{i=1}^n Wi Xi / ∑_{i=1}^n Wi

is a random ratio.

SLIDE 28

Weight condition

We need E(Wi) = 1 and Var(Wi) = 1; then the Bayesian bootstrap is comparable to the ordinary bootstrap. One can also use Poi(1) weights (Poisson distribution), used in machine learning by Oza (2001). Half sampling:

Wi = 0 with probability 1/2, Wi = 2 with probability 1/2.
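The three weight choices fit one sketch (our code; the function name is made up). Each has E(Wi) = Var(Wi) = 1, so the resulting variance estimates should all be comparable to the ordinary bootstrap.

```python
import numpy as np

def weighted_boot_var(x, B, rng, weights="exp"):
    # Reweighted bootstrap of the mean with E(W)=Var(W)=1 weights:
    # Exp(1) (Bayesian bootstrap), Poi(1), or half sampling (0 or 2).
    n = len(x)
    draw = {
        "exp":     lambda: rng.exponential(1.0, size=n),
        "poisson": lambda: rng.poisson(1.0, size=n).astype(float),
        "half":    lambda: 2.0 * rng.integers(0, 2, size=n),
    }[weights]
    reps = np.empty(B)
    for b in range(B):
        w = draw()
        while w.sum() == 0:      # guard against 0/0; vanishingly rare
            w = draw()
        reps[b] = (w * x).sum() / w.sum()
    return reps.var(ddof=1)

rng = np.random.default_rng(6)
x = rng.normal(size=200)
```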

SLIDE 29

Next

That was the bootstrap. Next we’ll bootstrap two-way data.

SLIDE 30

One-way data

Patient | Height | Weight | Age | · · · | Blood pressure
42      | 1.7    | 72     | 32  | · · · | 141
43      | 2.1    | 97     | 42  | · · · | 109
⋮

A usual data matrix has Xi in row i. Data within a row are dependent; e.g. the height and weight of the same person are correlated, as are blood pressure and age. Data in two different rows are independent, e.g. patient 42 and patient 43.

SLIDE 31

Two-way data

patients × medical interns: blood pressure measurements
genes × environments: crop yields
students × exam questions: points

Features: measurements on the same patient are correlated, and measurements by the same intern are also correlated (hopefully that effect is smaller). Neither the rows nor the columns are IID.

SLIDE 32

Netflix data

Rating  | Viewer 1 | Viewer 2 | Viewer 3 | · · · | Viewer C
Movie 1 | 4        | 4        | 1        | · · · | 4
Movie 2 | 5        | 5        | NA       | · · · | NA
Movie 3 | 3        | 3        | NA       | · · · | 2
⋮
Movie R | NA       | 5        | 3        | · · · | 4

Ratings of the same movie are correlated, as are ratings by the same person. Danny Deckchair (2003): the most common rating was 4 stars.

SLIDE 33

Netflix continued

17,770 movies, 480,189 customers, 100,000,000+ ratings.

Sampling uncertainty: ratings made on Tuesday came out a little lower than Sunday ratings. Is that real, or a sampling artifact?

Further problems: 1) missing data make the matrix very unbalanced, and 2) variances are unequal, e.g. some customers just give 1s or 5s.

SLIDE 34

Facebook data

Alice (shares a URL): “Hey, check out http://www.mcqmc2012.unsw.edu.au/”
Bob (comments on it): “Thanks for sharing that, I learned a lot.”

Data: url = http://www.mcqmc2012.unsw.edu.au/, sharer = Alice, commenter = Bob, log length X = log(41) ≐ 3.71.

Data size: 18,134,419 comments by 8,078,531 commenters on 2,085,639 URLs. This is 3-way data: url × sharer × commenter.

Of interest: users’ sharing and commenting behaviour, e.g. who makes longer comments, U.S. or U.K. users? There is probably greater interest in ad clicks, linking, and liking activity.

SLIDE 35

Random effects model

Xij = µ + ai + bj + εij,  i = 1, …, R,  j = 1, …, C,
ai ∼ N(0, σ²A)  (e.g. patients),
bj ∼ N(0, σ²B)  (e.g. interns),
εij ∼ N(0, σ²E).

This is the simplest model for two-way data. It is used in agriculture and has been studied for decades. µ̂ is X̄••.

No bootstrap exists for Var(µ̂). None can exist · · · McCullagh (2000). We can’t even bootstrap a balanced X̄!

He rules out resampling, permuting, or any “monoid” operations on rows and columns.

SLIDE 36

What about classical approaches?

Prime reference: “Variance Components” by Searle, Casella, McCulloch (1992)

  • Excellent for balanced Gaussian data
  • Unbalance =⇒ invert large matrices
  • Emphasis on homogeneous variances

SLIDE 37

McCullagh (2000)

For

µ̂ = X̄•• = (1/(RC)) ∑_{i=1}^R ∑_{j=1}^C Xij:

Naive: resample from the N = RC values.
Product: resample R rows and resample C columns (independently).

VRE(µ̂) = σ²A/R + σ²B/C + σ²E/(RC)  (the true variance)

ERE(V̂Naive(µ̂)) ≐ (σ²A + σ²B + σ²E)/(RC)  (way too small)

ERE(V̂Prod(µ̂)) ≐ σ²A/R + σ²B/C + 3σ²E/(RC)  (not so bad)

Naive resampling is seriously flawed; product resampling is close.

SLIDE 38

Notation

Index j takes values ij = 1, 2, 3, …. Observation i has multi-index i = (i1, …, ir) ∈ {1, 2, …}ʳ. Random Xi ∈ Rd with

Zi = 1 if Xi is known, and Zi = 0 if Xi is missing.

Sample size: 0 < N ≡ ∑_{i∈Nʳ} Zi < ∞. Sample mean: X̄ = ∑i ZiXi / ∑i Zi.

We want to estimate Var(X̄), treating the Zi as fixed.

SLIDE 39

r-fold product bootstrap

X̄* = ∑i ZiWiXi / ∑i ZiWi, where Wi = ∏_{j=1}^r Wj,ij with E(Wj,ij) = 1 and Var(Wj,ij) = 1.

From replications X̄*¹, …, X̄*ᴮ:

V̂ar(X̄) = (1/(B − 1)) ∑_{b=1}^B (X̄*ᵇ − X̄*·)², where X̄*· is the average of the X̄*ᵇ.

It is operationally much easier to have independent Wj,ij (as in the Bayesian bootstrap). Data for URLs might be scattered over several continents; then keeping ∑i Wj,ij = N is awkward.
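For r = 2 the recipe is short. The sketch below is our own code (Exp(1) weights, made-up function name): each observation is reweighted by the product of its row weight and column weight, and the simulated example follows the two-way random effects model with σ²A = σ²B = σ²E = 1.

```python
import numpy as np

def product_boot_var(rows, cols, x, B, rng):
    # Two-way product-weight bootstrap of the mean: one Exp(1) weight per
    # row level and per column level; an observation's weight is the product.
    R, C = rows.max() + 1, cols.max() + 1
    reps = np.empty(B)
    for b in range(B):
        wr = rng.exponential(1.0, size=R)
        wc = rng.exponential(1.0, size=C)
        w = wr[rows] * wc[cols]
        reps[b] = (w * x).sum() / w.sum()
    return reps.var(ddof=1)

rng = np.random.default_rng(7)
R, C = 30, 30
rows = np.repeat(np.arange(R), C)
cols = np.tile(np.arange(C), R)
x = (rng.normal(size=R)[rows] + rng.normal(size=C)[cols]
     + rng.normal(size=R * C))          # X_ij = a_i + b_j + eps_ij
vhat = product_boot_var(rows, cols, x, B=500, rng=rng)
```

Here the true variance is σ²A/R + σ²B/C + σ²E/(RC), while a naive iid-weight bootstrap would come out near (σ²A + σ²B + σ²E)/(RC), far too small.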

SLIDE 40

r-fold random effects

We study Var(X̄) under this model:

Xi = µ + ∑_{u⊆{1,…,r}, u≠∅} εiu,u.

If u = (j1, …, jk) then iu = (ij1, …, ijk).

Moments: E(εiu,u) = 0, Var(εiu,u) = σ²u, and Cov(εiu,u, εi′u′,u′) = 0 unless u = u′ and iu = i′u′.

The σ²u can be allowed to depend on iu.

SLIDE 41

Variance

VarRE(µ̂) = (1/N²) ∑_{u≠∅} ∑_{u′≠∅} ∑i ∑i′ ZiZi′ Cov(εi,u, εi′,u′)
         = (1/N²) ∑_{u≠∅} ∑i ∑i′ ZiZi′ 1{iu = i′u} σ²u
         = (1/N) ∑_{u≠∅} νu σ²u,

for gain coefficients

νu = (1/N) ∑i Zi Ni,u, where Ni,u = ∑i′ Zi′ 1{iu = i′u}.
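The gain coefficients are cheap to compute from the observed multi-indices. A sketch (our code; `gain_coefficients` is a made-up name):

```python
import numpy as np
from itertools import combinations

def gain_coefficients(indices):
    # indices: one row per observed i (those with Z_i = 1), r columns.
    # nu_u = (1/N) sum_i N_{i,u}, with N_{i,u} the number of observations
    # that agree with observation i on every coordinate in u.
    N, r = indices.shape
    nu = {}
    for size in range(1, r + 1):
        for u in combinations(range(r), size):
            _, inv, counts = np.unique(indices[:, list(u)], axis=0,
                                       return_inverse=True,
                                       return_counts=True)
            # counts[inv[i]] is N_{i,u}; average over the N observations
            nu[u] = counts[inv.reshape(-1)].mean()
    return nu

# A balanced 4 x 6 grid: every (i, j) cell observed once.
grid = np.array([(i, j) for i in range(4) for j in range(6)])
nu = gain_coefficients(grid)
```

On a balanced R × C grid this recovers ν_rows = C, ν_cols = R, and ν_interaction = 1, matching the V_RE(µ̂) formula for two-way data.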

SLIDE 42

Running examples

VRE(µ̂) ≡ (1/N) ∑_{u≠∅} νu σ²u ≐ (1/N)(56,200 σ²movies + 646 σ²viewers + σ²interaction)  (for Netflix)

For Facebook: νsh ≐ 17.71, νcom ≐ 7.71, νurl ≐ 26,854.92 (!), νsh,com ≐ 5.92, νsh,url ≐ 12.91, νcom,url ≐ 5.19, and νsh,com,url ≐ 4.88.

Note that νurl ≈ 26,000.

SLIDE 43

Variances

For gain coefficients νu:

VarRE(X̄) = (1/N) ∑_{u⊆{1,…,r}, u≠∅} νu σ²u.

Similarly, for the γu (defined later):

ERE(V̂arProd(X̄*)) ≐ (1/N) ∑_{u≠∅} γu σ²u.

And:

ERE(V̂arNaive(X̄*)) ≐ (1/N) ∑_{u≠∅} σ²u.

Typically 1 ≪ νu ≪ N for u ≠ {1, …, r}, so the naive bootstrap is unreliable. We want γu = νu. We’ll get γu ≥ νu.

SLIDE 44

The case r = 2

Owen (2007): independent bootstrap of rows and columns. We still get

ERE(V̂Prod(µ̂)) ≐ VRE(µ̂),

i.e. still ≈ 1 × the main effect contributions and ≈ 3 × the interaction contribution.

The Sunday vs. Tuesday edge of 0.02 stars is real (about 8 standard errors).

New here: 1) arbitrary order r ≥ 2, and 2) independent product weights (vs. resampling).

SLIDE 45

Product bootstrap

µ̂* = ∑i ZiWiXi / ∑i ZiWi ≡ T*/N*  (a ratio estimator)

VProd(µ̂*) ≈ ṼProd(µ̂*) ≡ (1/N²) EProd((T* − µ̂N*)²).

Main result:

ERE(ṼProd(µ̂*)) = (1/N) ∑_{u≠∅} γu σ²u,

where γu ≈ νu if |u| = 1 (i.e. cardinality 1); otherwise γu/νu > 1, but not by much.

SLIDE 46

The exact formula depends on:

• the number of index pairs i, i′ that match in the set u: ∑_{i∈Nʳ} ∑_{i′∈Nʳ} ZiZi′ 1{iu = i′u}
• the number of index pairs i, i′ that match in precisely k places
• the number of index triples i, i′, i′′ where i matches i′ in the set u and matches i′′ in precisely k places

SLIDE 47

Duplication indices

(level dup) ǫ = (greatest item popularity)/N
(variable dup) η = max_{∅ ≠ u ⊊ v} νv/νu

Examples:

Netflix: ǫ = 232,944/100,480,507 ≐ 0.00232 (Miss Congeniality); η = 1/646 ≐ 0.00155 (νinteraction/νmovies).
Facebook: ǫ = 686,990/18,134,419 ≐ 0.0379 (a popular URL); η = 4.88/5.19 ≐ 0.94 (νsh,com,url/νcom,url).

η is not small for the Facebook data, so the bootstrap variances will be somewhat more conservative.

SLIDE 48

Approximations

Theorem 1. In the homogeneous random effects model, the product weight bootstrap with Var(Wj,ij) = τ² = 1 satisfies

γu = νu[2^|u| − 1 + Θuǫ] + ∑_{v⊋u} 2^|v| νv,

where |Θu| ≤ 2^{r+1} − 2.

Proof: Owen & Eckles (2011), who consider general τ².

For small ǫ and r (i.e. 2ʳǫ ≪ 1):

γu ≈ (2^|u| − 1)νu + ∑_{v⊋u} 2^|v| νv.

If also η ≪ 1: γu ≈ (2^|u| − 1)νu.

SLIDE 49

Some specific approximations

For r = 2:

γ{j} = ν{j}(1 + Θjǫ) + 2, j = 1, 2,
γ{1,2} = ν{1,2}(3 + Θ{1,2}ǫ), where |Θu| ≤ 6.

For r = 3:

γ{1} ≈ ν{1} + 4ν{1,2} + 4ν{1,3} + 8,
γ{1,2} ≈ 3ν{1,2} + 8,
γ{1,2,3} ≈ 7.

If 0 < m ≤ min_u σ²u ≤ max_u σ²u ≤ M < ∞ then

ERE(ṼProd(µ̂*)) / VRE(µ̂) = 1 + O(η + ǫ).

SLIDE 50

Facebook loquacity

For each commenter, url and sharer, we obtain X = log(#char in comment), as well as the country c ∈ {US, UK} of the commenter and the mode m ∈ {web, mobile} of the commenter. Now let

µ̂cm = ∑i ZiXi 1{country = c} 1{mode = m} / ∑i Zi 1{country = c} 1{mode = m}.

We see small differences,

        US    UK
web    3.62  3.55
mobile 3.50  3.57

but they’re larger than the sample fluctuations.

SLIDE 51

Loquacity ECDFs

[Figure: empirical CDFs over 50 bootstraps of µ̂US,m − µ̂UK,m (mean log characters for US minus mean log characters for UK), with panels for Mobile and Web, reweighting one way (commenter), two ways (commenter, sharer), or three ways (commenter, sharer, URL).]

SLIDE 52

Loquacity confidence intervals

[Figure: central 95% confidence intervals from 50 bootstraps of µ̂US,m − µ̂UK,m, with panels for Mobile and Web, reweighting one, two, or three ways.]

SLIDE 53

Summary of problem two

• Crossed random effects data require special care
• No correct bootstrap exists
• Product sampling is at least conservative, mildly over-estimating the variance
• It also works with unequal variances

What remains to do: better seeding, and extensions to more complicated analyses.

SLIDE 54

Conclusion

Statistics and machine learning are still finding new ways to consume Monte Carlo ideas. In addition to MCMC there are:

  • permutations
  • rotations
  • projections
  • subsampling
  • reweighting

SLIDE 55

Thanks

  • Collaborators: Dean Eckles and Sarah Emerson
  • NSF DMS-0906056
  • Data: Netflix and Facebook
  • UNSW
  • Organizers: Ian Sloan, Frances Kuo, Josef Dick and Gareth Peters

SLIDE 56

Missingness

The ≈ 100,000,000 movie ratings we see are only about 1% of the full R × C table. Those we see are probably biased towards higher values. How do we adjust? Answer: we can’t, because nobody knows the bias function. It would not be reasonable to expect a bootstrap method to fix missing data bias; for instance, how could one sampling method fix both positive and negative bias? Given an adjustment algorithm (based on assumptions or knowledge from outside the given data) we might be able to bootstrap its predictions.
