
Statistical Inference on Large Contingency Tables: Convergence, Testability, Stability

Marianna Bolla
Institute of Mathematics, Budapest University of Technology and Economics
marib@math.bme.hu

COMPSTAT 2010, Paris, August 23, 2010

Outline: Preliminaries; Convergence of contingency tables; Testability; Homogeneous partitions, spectra; Application; References

Motivation

To recover the structure of large rectangular arrays (for example, microarrays or social, economic, or communication networks), classical methods of cluster and correspondence analysis may not be feasible on the whole table because of computational size limitations. In other situations, we want to compare contingency tables of different sizes.

Two directions:
1. Select a smaller part (by an appropriate randomization) and perform SVD or correspondence analysis on it.
2. Regard the table as a continuous object and set up a bilinear programming task with constraints. In this way, fuzzy clusters are obtained.

References

We generalize some theorems of Borgs, Chayes, Lovász, Sós, Vesztergombi, Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing, Advances in Math. 2008, to rectangular arrays and to testable parameters defined on them.

In Bolla, Friedl, Krámli, Singular value decomposition of large random matrices (for two-way classification of microarrays), Journal of Multivariate Analysis 101, 2010, we investigated the effects of random perturbations of the entries on the singular spectrum, the clustering effect, and the correspondence factors.

Notation

Let $C = C_{m \times n}$ be a contingency table with row set $\mathrm{Row}_C = \{1, \dots, m\}$ and column set $\mathrm{Col}_C = \{1, \dots, n\}$. The entries $c_{ij}$ are interactions between the rows and columns, normalized so that $0 \le c_{ij} \le 1$.

Binary table: 0/1 entries.
Row-weights: $\alpha_1, \dots, \alpha_m \ge 0$
Column-weights: $\beta_1, \dots, \beta_n \ge 0$
(The weights express the individual importance of the categories. In correspondence analysis, these are the marginals.)

A contingency table is called simple if all the row- and column-weights are equal to 1. Assume that $C$ does not contain identically zero rows or columns; moreover, $C$ is dense in the sense that the number of nonzero entries is comparable with $mn$. Let $\mathcal{C}$ denote the set of such tables (with any natural numbers $m$ and $n$).

Consider a simple binary table $F_{a \times b}$ and maps $\Phi : \mathrm{Row}_F \to \mathrm{Row}_C$, $\Psi : \mathrm{Col}_F \to \mathrm{Col}_C$; further, let

\[
\alpha_\Phi := \prod_{i=1}^{a} \alpha_{\Phi(i)}, \qquad
\beta_\Psi := \prod_{j=1}^{b} \beta_{\Psi(j)}, \qquad
\alpha_C := \sum_{i=1}^{m} \alpha_i, \qquad
\beta_C := \sum_{j=1}^{n} \beta_j .
\]

Homomorphism density

Definition. The $F \to C$ homomorphism density is

\[
t(F, C) = \frac{1}{(\alpha_C)^a (\beta_C)^b} \sum_{\Phi, \Psi} \alpha_\Phi \beta_\Psi \prod_{f_{ij} = 1} c_{\Phi(i)\Psi(j)} .
\]

If $C$ is simple, then

\[
t(F, C) = \frac{1}{m^a n^b} \sum_{\Phi, \Psi} \prod_{f_{ij} = 1} c_{\Phi(i)\Psi(j)} .
\]

If, in addition, $C$ is binary, then $t(F, C)$ is the probability that a random map $F \to C$ is a homomorphism (preserves the 1's).
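The definition can be checked numerically on very small tables. The following is a minimal brute-force sketch, not an implementation from the talk: the function name `hom_density` and the list-of-lists representation are assumptions made for illustration. It enumerates every pair of maps $(\Phi, \Psi)$, so it is only usable when $m^a n^b$ is tiny.

```python
from itertools import product


def hom_density(F, C, alpha=None, beta=None):
    """Brute-force t(F, C) for a simple binary a x b table F and an m x n table C.

    F and C are lists of rows; alpha and beta are the row- and column-weights
    of C (all equal to 1 when omitted, i.e. C is treated as simple).
    """
    a, b = len(F), len(F[0])
    m, n = len(C), len(C[0])
    alpha = [1.0] * m if alpha is None else alpha
    beta = [1.0] * n if beta is None else beta
    # Positions (i, j) with f_ij = 1: only these contribute to the product.
    ones = [(i, j) for i in range(a) for j in range(b) if F[i][j] == 1]

    total = 0.0
    for phi in product(range(m), repeat=a):        # all maps Row_F -> Row_C
        w_phi = 1.0
        for r in phi:
            w_phi *= alpha[r]                       # alpha_Phi
        for psi in product(range(n), repeat=b):    # all maps Col_F -> Col_C
            w = w_phi
            for c in psi:
                w *= beta[c]                        # beta_Psi
            for i, j in ones:
                w *= C[phi[i]][psi[j]]              # product of c_{Phi(i)Psi(j)}
            total += w
    return total / (sum(alpha) ** a * sum(beta) ** b)


C = [[0.5, 1.0],
     [0.0, 0.5]]
print(hom_density([[1]], C))      # 1 x 1 probe: the average entry of C
print(hom_density([[1, 1]], C))   # 1 x 2 probe: two 1's sharing a row
```

For the $1 \times 1$ table $F = (1)$ the density reduces to the weighted average of the entries of $C$, which is a convenient sanity check.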

The maps $\Phi$ and $\Psi$ correspond to sampling $a$ rows and $b$ columns out of $\mathrm{Row}_C$ and $\mathrm{Col}_C$ with replacement, respectively. For a simple $C$ this means uniform sampling; otherwise the rows and columns are selected with probabilities proportional to their weights.

The following simple binary random table $\xi(a \times b, C)$ will play an important role in proving the equivalence theorems of testability. Select $a$ rows and $b$ columns of $C$ with replacement, with probabilities $\alpha_i / \alpha_C$ ($i = 1, \dots, m$) and $\beta_j / \beta_C$ ($j = 1, \dots, n$), respectively. If the $i$th row and $j$th column of $C$ are selected, the corresponding entry of $\xi$ is 1 with probability $c_{ij}$ and 0 otherwise, independently of the other selected row-column pairs, conditioned on the selection of the rows and columns.

For large $m$ and $n$, $P(\xi(a \times b, C) = F)$ and $t(F, C)$ are close to each other.
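The sampling scheme for $\xi(a \times b, C)$ translates almost line by line into code. The sketch below uses only the Python standard library; the name `sample_xi` and the optional `rng` argument are assumptions introduced here for reproducibility, not part of the original slides.

```python
import random


def sample_xi(a, b, C, alpha=None, beta=None, rng=None):
    """Draw one realization of the random binary table xi(a x b, C).

    Rows and columns of C are sampled with replacement, with probabilities
    proportional to the weights alpha and beta (uniform when omitted).
    """
    rng = rng or random.Random()
    m, n = len(C), len(C[0])
    alpha = [1.0] * m if alpha is None else alpha
    beta = [1.0] * n if beta is None else beta

    # random.choices normalizes the weights, so alpha_i / alpha_C is implicit.
    rows = rng.choices(range(m), weights=alpha, k=a)
    cols = rng.choices(range(n), weights=beta, k=b)

    # Conditioned on the selection, each entry is an independent Bernoulli(c_ij).
    return [[1 if rng.random() < C[r][c] else 0 for c in cols] for r in rows]


C = [[0.5, 1.0],
     [0.0, 0.5]]
xi = sample_xi(3, 4, C, rng=random.Random(0))
for row in xi:
    print(row)
```

Averaging an indicator such as `xi == F` over many draws gives a Monte Carlo estimate of $P(\xi(a \times b, C) = F)$, which is how a large table would be probed in practice without enumerating all maps.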

Definition

We say that the sequence $(C_{m \times n})$ of contingency tables is convergent if the sequence $t(F, C_{m \times n})$ converges for any simple binary table $F$ as $m, n \to \infty$.

Convergence means that the tables $C_{m \times n}$ become more and more similar in small details, as probed by small 0-1 tables, when $m, n \to \infty$.
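As a quick sanity check of the definition, consider the simplest possible sequence: simple tables whose entries are all equal to a constant $p \in (0, 1]$. The symbol $e(F)$ for the number of 1's in $F$ is notation introduced here for illustration only.

```latex
% Worked example: constant simple tables, c_{ij} = p for all i, j.
% e(F) := #{(i, j) : f_{ij} = 1} is notation assumed for this example.
\[
t(F, C_{m \times n})
  = \frac{1}{m^a n^b} \sum_{\Phi, \Psi} \prod_{f_{ij} = 1} p
  = \frac{1}{m^a n^b} \cdot m^a n^b \cdot p^{e(F)}
  = p^{e(F)} .
\]
% The value does not depend on m and n, so the sequence t(F, C_{m x n})
% is constant for every F and the sequence of tables is convergent.
```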

The limit object

The limit object is a measurable function $U : [0, 1]^2 \to [0, 1]$, called a contingon.

In the symmetric case with $m = n$, $C$ can be regarded as the weight matrix of an edge- and node-weighted graph (the row-weights are equal to the column-weights, loops are possible), and the limit object was introduced as the graphon; see Borgs et al.

The step-function contingon $U_C$ is assigned to $C$ in the following way: the sides of the unit square are divided into intervals $I_1, \dots, I_m$ and $J_1, \dots, J_n$ of lengths $\alpha_1 / \alpha_C, \dots, \alpha_m / \alpha_C$ and $\beta_1 / \beta_C, \dots, \beta_n / \beta_C$, respectively; then over the rectangle $I_i \times J_j$ the step-function takes on the value $c_{ij}$.
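The construction of $U_C$ can be sketched as a small closure that maps a point of the unit square to the entry of the rectangle containing it. The name `step_contingon` is hypothetical, and the choice of half-open intervals $[\text{left}, \text{right})$, with the point 1 assigned to the last interval, is an assumption; the slides do not fix the behaviour on interval boundaries, which form a null set anyway.

```python
from bisect import bisect_right
from itertools import accumulate


def step_contingon(C, alpha=None, beta=None):
    """Return the step function U_C : [0, 1]^2 -> [0, 1] assigned to C."""
    m, n = len(C), len(C[0])
    alpha = [1.0] * m if alpha is None else alpha
    beta = [1.0] * n if beta is None else beta

    # Right endpoints of I_1, ..., I_m and J_1, ..., J_n.
    row_ends = [s / sum(alpha) for s in accumulate(alpha)]
    col_ends = [s / sum(beta) for s in accumulate(beta)]

    def U(x, y):
        # Index of the interval containing x (resp. y); clamp x = 1 or y = 1
        # to the last interval.
        i = min(bisect_right(row_ends, x), m - 1)
        j = min(bisect_right(col_ends, y), n - 1)
        return C[i][j]

    return U


C = [[0.2, 0.9],
     [0.5, 0.0]]
U = step_contingon(C, alpha=[1.0, 3.0], beta=[1.0, 1.0])
print(U(0.1, 0.7))   # first row interval [0, 0.25), second column interval
print(U(0.5, 0.2))   # second row interval [0.25, 1], first column interval
```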

The metric inducing the convergence

Definition. The cut distance between the contingons $U$ and $V$ is

\[
\delta_\square(U, V) = \inf_{\mu, \nu} \| U - V^{\mu, \nu} \|_\square \tag{1}
\]

where the cut norm of the contingon $U$ is defined by

\[
\| U \|_\square = \sup_{S, T \subset [0, 1]} \left| \iint_{S \times T} U(x, y) \, dx \, dy \right| ,
\]

and the infimum in (1) is taken over all measure preserving bijections $\mu, \nu : [0, 1] \to [0, 1]$, while $V^{\mu, \nu}$ denotes the transformed $V$ after performing the measure preserving bijections $\mu$ and $\nu$ on the sides of the unit square, respectively.
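For step functions the cut norm can be evaluated exactly on small examples. The integral over $S \times T$ is bilinear in the fractions of each step interval covered by $S$ and $T$, so the supremum is attained when $S$ and $T$ are unions of whole intervals, and it suffices to search over subsets of rows and columns. The sketch below does this by brute force over row subsets; `cut_norm` is a hypothetical helper, and it takes a signed matrix `W` (for instance the entrywise difference of two tables with the same size and weights). It does not compute the infimum over $\mu, \nu$ in (1), so applied to a difference it only yields an upper bound on $\delta_\square$.

```python
from itertools import product


def cut_norm(W, alpha=None, beta=None):
    """Cut norm of the step function with values W[i][j] on I_i x J_j.

    Exponential in the number of rows: intended for tiny examples only.
    """
    m, n = len(W), len(W[0])
    alpha = [1.0] * m if alpha is None else alpha
    beta = [1.0] * n if beta is None else beta
    a = [x / sum(alpha) for x in alpha]   # lengths of I_1, ..., I_m
    b = [x / sum(beta) for x in beta]     # lengths of J_1, ..., J_n

    best = 0.0
    for S in product((0, 1), repeat=m):   # indicator vector of a row subset
        # Integral of the step function over S x J_j, for each column j.
        col = [b[j] * sum(a[i] * W[i][j] for i in range(m) if S[i])
               for j in range(n)]
        # For fixed S the best T collects all positive (or all negative) columns.
        pos = sum(v for v in col if v > 0)
        neg = -sum(v for v in col if v < 0)
        best = max(best, pos, neg)
    return best


C = [[1.0, 0.0],
     [0.0, 1.0]]
D = [[0.5, 0.5],
     [0.5, 0.5]]
diff = [[C[i][j] - D[i][j] for j in range(2)] for i in range(2)]
print(cut_norm(diff))   # much smaller than the L1 norm of the difference
```

In this example the $L^1$ norm of the difference is 0.5, while the cut norm is 0.125, which illustrates why the cut norm is the weaker, "structural" notion of closeness.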

Equivalence classes of contingons

An equivalence relation is defined over the set of contingons: two contingons belong to the same class if they can be transformed into each other by measure preserving maps, i.e., their cut distance is zero.

In the sequel, we consider contingons modulo measure preserving maps, and by a contingon we understand the whole equivalence class.

By a theorem of Borgs et al. (2008), the equivalence classes form a compact metric space with the $\delta_\square$ metric.
