Speeding up Permutation Testing - Vamsi Ithapu

Speeding up Permutation Testing Vamsi Ithapu http://pages.cs.wisc.edu/~vamsi/pt_fast November 17, 2013

The paper
◮ “Speeding up Permutation Testing in Neuroimaging”
◮ Joint work with Chris Hinrichs¹, Vikas Singh and Qinyuan Sun
◮ NIPS 2013 Spotlight
Basic Idea: The traditional permutation testing procedure is computationally intensive. Our model leverages the structure of the permutation testing matrix and reduces the computation time by at least 50 times without losing accuracy in estimating the null distribution.
¹ Vamsi and Chris are joint first authors

Background
Consider a study with n subjects from two groups (e.g. diabetic vs. non-diabetic). For each subject, an m-dimensional measurement is obtained (voxels, ROIs, genes, etc.). Multiple hypothesis testing checks for a group difference by
◮ Computing m univariate hypothesis tests (e.g. t-tests)
◮ Calculating the corrected p-value by adjusting for multiple testing
The Bonferroni method computes the corrected α threshold using the union bound (i.e. dividing α by the number of tests m).
Problem: If m is large, Bonferroni’s corrected α ≪ true α, so the correction is overly conservative.
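To make the Bonferroni problem concrete, here is a minimal sketch of the corrected per-test level. The function name `bonferroni_threshold` and the values α = 0.05 and m = 300,000 are illustrative assumptions, not taken from the slides.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the FWER at most alpha (union bound)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    return alpha / m


# Illustrative values only: a neuroimaging-sized m makes the per-test level tiny.
per_test = bonferroni_threshold(0.05, 300_000)
print(per_test)
```

Because the union bound ignores any correlation between the m tests, the resulting threshold can be much stricter than needed when neighbouring voxels are strongly dependent.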

Permutation Testing - Background/Setup
Permutation testing is a random sampling method: a non-parametric way to estimate the FWER by sampling from the Global/Max Null distribution. If the two groups do not differ, then I can permute the group/class labels and end up with approximately the same set of t statistics.
Given m, n and T (number of trials/permutations), repeat T times:
◮ Randomly “permute” group labels across the n subjects and compute t statistics for the m dimensions. This yields the m × T permutation testing matrix (denoted by P).
Compute the max. t statistic for each permutation (column of P), and estimate the max. Null distribution.
Compute the p-value of the “true” labeling using the max. Null.
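The procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the Welch form of the t statistic, the one-sided maximum, the add-one p-value estimate, and all function names are assumptions made for the example.

```python
import numpy as np


def t_statistics(X, labels):
    """Welch two-sample t statistic for each of the m rows of X (shape m x n)."""
    labels = np.asarray(labels, dtype=bool)
    a, b = X[:, labels], X[:, ~labels]
    var_a = a.var(axis=1, ddof=1) / a.shape[1]
    var_b = b.var(axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(var_a + var_b)


def permutation_matrix(X, labels, T, rng):
    """Build the m x T matrix P: one column of t statistics per relabeling."""
    P = np.empty((X.shape[0], T))
    for j in range(T):
        P[:, j] = t_statistics(X, rng.permutation(labels))
    return P


def max_null_pvalue(X, labels, T=1000, seed=0):
    """Return P, the max. Null samples, and the corrected p-value of the true labeling."""
    rng = np.random.default_rng(seed)
    P = permutation_matrix(X, labels, T, rng)
    max_null = P.max(axis=0)                 # max statistic of each column of P
    observed = t_statistics(X, labels).max()  # max statistic under the true labels
    p_value = (1 + np.sum(max_null >= observed)) / (T + 1)
    return P, max_null, p_value


# Toy data: m = 50 measurements, n = 20 subjects, no real group difference.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 20))
labels = np.array([True] * 10 + [False] * 10)
P, max_null, p_value = max_null_pvalue(X, labels, T=200)
print(P.shape, p_value)
```

Every column of P requires recomputing all m statistics, which is where the cost discussed on the next slide comes from.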

Permutation Testing - continued
For a good estimate of the max. Null, T should be very large. Depending on m, n and T (number of random permutations), permutation testing is extremely computationally intensive.
◮ In neuroimaging, typically m ∼ 3 × 10⁵, n ∼ 400 and T ∼ 10⁴
◮ In bioinformatics, typically m ∼ 1000, n ∼ 10³ and T ∼ 10³
The computation time can be days, and in some cases weeks!
Observation:
◮ P is “highly structured”: a combination of a low-rank signal and a high-rank residual.
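A rough back-of-the-envelope count shows why these sizes are expensive. The sketch below assumes each entry of P is one t statistic that reads all n subject values once; that cost model and the function name are assumptions for illustration only.

```python
def permutation_cost(m: int, n: int, T: int):
    """Count entries of P and subject values read under a simple cost model."""
    entries = m * T      # one t statistic per (dimension, permutation) pair
    reads = entries * n  # each statistic touches all n subjects once
    return entries, reads


# Neuroimaging-scale values quoted on the slide.
entries, reads = permutation_cost(m=300_000, n=400, T=10_000)
print(f"{entries:.1e} entries of P, {reads:.1e} subject values read")
```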

Example P
MRI data. 100 healthy vs. non-healthy. m = 1000, T = 2000

So what?
From a high-level viewpoint, this means P is “highly structured”
⇒ Each column looks “similar” to other columns, and each row looks “similar” to other rows
⇒ If you give me “sufficiently many” random (i.e. at random positions) entries of P, I will give you a highly accurate estimate of the entire matrix P
Mathematically, P = UW + S, where U is low rank and S is a random residual. Given some entries, I can estimate U, W and S (Matrix Completion).
Sufficiently many ∼ < 1% sub-sampling!
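The matrix-completion idea can be illustrated with a generic iterative truncated-SVD imputation ("hard-impute"). This is a swapped-in textbook technique, not the algorithm from the paper: it only recovers the low-rank part UW and does not model the residual S, and the function name, rank, iteration count, and toy sizes are all assumptions. The toy matrix is small, so it uses 50% sampling rather than the < 1% quoted above.

```python
import numpy as np


def complete_low_rank(P_obs, mask, rank, n_iter=200):
    """Estimate a rank-`rank` matrix from the entries of P_obs where mask is True."""
    estimate = np.zeros_like(P_obs, dtype=float)
    for _ in range(n_iter):
        # Keep observed entries, fill unobserved ones with the current estimate.
        filled = np.where(mask, P_obs, estimate)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Project the filled matrix back onto rank-`rank` matrices.
        estimate = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return estimate


# Synthetic P = UW + S with a small residual S (illustrative sizes only).
rng = np.random.default_rng(0)
m, T, rank = 100, 80, 2
U_true = rng.normal(size=(m, rank))
W_true = rng.normal(size=(rank, T))
S_true = 0.01 * rng.normal(size=(m, T))
P_full = U_true @ W_true + S_true

mask = rng.random((m, T)) < 0.5          # observe about half of the entries
P_obs = np.where(mask, P_full, 0.0)      # unobserved entries are never used
P_hat = complete_low_rank(P_obs, mask, rank)

rel_err = np.linalg.norm(P_hat - P_full) / np.linalg.norm(P_full)
zero_fill_err = np.linalg.norm(P_obs - P_full) / np.linalg.norm(P_full)
print(rel_err, zero_fill_err)
```

Comparing `rel_err` with `zero_fill_err` shows how much the low-rank structure contributes beyond simply using the observed entries.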

Evaluations Setup
Data
◮ MRI data from 4 studies of cognitively healthy vs. non-healthy subjects
◮ n = 40, 50, 55 and 70
◮ m ∼ 275000 and T = 10⁴
Questions
◮ Can we recover the max. Null?
◮ What is the computational speed-up?
◮ How stable is the estimated α threshold?
The baseline computes the max. Null from the sub-sampled data directly (i.e. no completion of P).

max Null recovery
Recovery measured using D_KL (KL divergence) and D_B (Bhattacharyya distance) in log-scale
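Both distances can be computed between histogram estimates of two max. Null samples. The sketch below is one plausible way to do that; the shared bin edges, the small smoothing constant `eps`, and the synthetic samples are assumptions, not details from the slides.

```python
import numpy as np


def histogram_pmf(samples, edges, eps=1e-12):
    """Normalized histogram on shared bin edges; eps avoids empty bins."""
    counts, _ = np.histogram(samples, bins=edges)
    pmf = counts.astype(float) + eps
    return pmf / pmf.sum()


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions on the same bins."""
    return float(np.sum(p * np.log(p / q)))


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -log(sum_i sqrt(p_i * q_i))."""
    return float(-np.log(np.sum(np.sqrt(p * q))))


# Synthetic stand-ins for a "true" and a "recovered" max. Null sample.
rng = np.random.default_rng(0)
true_null = rng.normal(loc=4.0, scale=0.5, size=5000)
recovered_null = rng.normal(loc=4.3, scale=0.5, size=5000)

edges = np.linspace(1.0, 8.0, 41)
p = histogram_pmf(true_null, edges)
q = histogram_pmf(recovered_null, edges)
d_kl = kl_divergence(p, q)
d_b = bhattacharyya_distance(p, q)
print(np.log(d_kl), np.log(d_b))  # log-scale, as on the slide
```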

Computational Speed-up Time measured in minutes.

recovery vs. speed-up

Stability of α thresholds
t-statistic thresholds at α = 0.95
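A threshold of this kind can be read off the max. Null as a quantile. The sketch below assumes that the 0.95 on the slide is the quantile level of the max. Null (that is, a 5% FWER); this reading, the function name, and the synthetic sample are assumptions.

```python
import numpy as np


def fwer_threshold(max_null, level=0.95):
    """t-statistic threshold: the `level` quantile of the max. Null samples."""
    return float(np.quantile(max_null, level))


# Synthetic stand-in for a max. Null sample of size T = 1000.
rng = np.random.default_rng(0)
max_null = rng.normal(loc=4.0, scale=0.5, size=1000)
threshold = fwer_threshold(max_null, level=0.95)
exceed_fraction = float(np.mean(max_null > threshold))
print(threshold, exceed_fraction)
```

Repeating this over several independent estimates of the max. Null and comparing the resulting thresholds is one simple way to examine their stability.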

Conclusion
◮ A novel method for estimating the permutation testing matrix is proposed
◮ A computational speed-up of > 50× is achieved while recovering the max. Null up to a high degree of accuracy
