Business Statistics CONTENTS Contingency tables Independence of - PowerPoint PPT Presentation

CONTINGENCY TABLES: TESTS Business Statistics

CONTENTS
▪ Contingency tables
▪ Independence of categorical variables
▪ 2 × 2 -tables
▪ Old exam question
▪ Further study

CONTINGENCY TABLES
Contingency table (see “Summarizing data”):
▪ matrix with counts
▪ rows represent levels of categorical variable $X$ ($= 1, 2$)
▪ columns represent levels of categorical variable $Y$ ($= 1, 2, 3$)
▪ “margins” contain totals

CONTINGENCY TABLES
Simple example: election poll (sample)
▪ three cities
▪ four parties
▪ count data
▪ is political preference in these cities the same?

CONTINGENCY TABLES
Notation for contingency table
▪ #counts in cell $(i, j)$: $n_{ij}$
▪ total in row $i$: $n_{i\cdot} = \sum_{j=1}^{c} n_{ij}$
▪ total in column $j$: $n_{\cdot j} = \sum_{i=1}^{r} n_{ij}$
▪ “total total”: $n_{\cdot\cdot} = \sum_{j=1}^{c} \sum_{i=1}^{r} n_{ij}$
▪ there are $r$ rows and $c$ columns

        1                2                3                tot
1       $n_{11}$         $n_{12}$         $n_{13}$         $n_{1\cdot}$
2       $n_{21}$         $n_{22}$         $n_{23}$         $n_{2\cdot}$
tot     $n_{\cdot 1}$    $n_{\cdot 2}$    $n_{\cdot 3}$    $n_{\cdot\cdot}$

INDEPENDENCE OF CATEGORICAL VARIABLES
What does it mean when categorical variable $X$ is independent of categorical variable $Y$?
▪ knowledge of $x_i$ doesn’t help you to predict $y_i$
▪ $P(Y = j \mid X = i) = P(Y = j)$
▪ $P(X = i \cap Y = j) = P(X = i)\,P(Y = j)$
▪ (for all $i$ and $j$)
Can we calculate a statistic for independence?

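INDEPENDENCE OF CATEGORICAL VARIABLES
Again

        1                2                3                tot
1       $n_{11}$         $n_{12}$         $n_{13}$         $n_{1\cdot}$
2       $n_{21}$         $n_{22}$         $n_{23}$         $n_{2\cdot}$
tot     $n_{\cdot 1}$    $n_{\cdot 2}$    $n_{\cdot 3}$    $n_{\cdot\cdot}$

For the totals we have:
▪ for row $i$: $n_{i\cdot} = n_{i1} + n_{i2} + n_{i3} = \sum_{j=1}^{c} n_{ij}$
▪ for column $j$: $n_{\cdot j} = n_{1j} + n_{2j} = \sum_{i=1}^{r} n_{ij}$
▪ for entire table: $n_{\cdot\cdot} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{j=1}^{c} n_{\cdot j}$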
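The row, column and grand totals can be sketched in a few lines of Python. The counts are the handedness-by-gender figures from the 2 × 2 slides later in this lecture (12 of 120 females and 24 of 180 males are left-handed; the male/right-handed count of 156 is implied by those totals). The function name `margins` is only illustrative.

```python
# Handedness-by-gender counts from the 2 x 2 slides of this lecture.
# Rows: female, male. Columns: left-handed, right-handed.
table = [[12, 108],
         [24, 156]]


def margins(counts):
    """Return (row totals, column totals, grand total) of a table of counts."""
    row_totals = [sum(row) for row in counts]
    # zip(*counts) transposes the table, so each tuple is one column.
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    return row_totals, col_totals, grand_total


row_totals, col_totals, grand_total = margins(table)
print(row_totals, col_totals, grand_total)  # [120, 180] [36, 264] 300
```

Summing the row totals and summing the column totals give the same grand total, which is a quick sanity check on any table entered by hand.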

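INDEPENDENCE OF CATEGORICAL VARIABLES
▪ There are $n_{i\cdot}$ (out of $n_{\cdot\cdot}$) cases in row $i$
▪ and $n_{\cdot j}$ (out of $n_{\cdot\cdot}$) cases in column $j$
▪ A fraction $\frac{n_{i\cdot}}{n_{\cdot\cdot}}$ of the cases is in row $i$
▪ and a fraction $\frac{n_{\cdot j}}{n_{\cdot\cdot}}$ of the cases is in column $j$
▪ So, if there is no dependence between row $i$ and column $j$, we expect to have a fraction $\frac{n_{i\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot j}}{n_{\cdot\cdot}}$ of the cases in cell $(i, j)$
▪ which gives an expected count of $e_{ij} = n_{\cdot\cdot} \times \frac{n_{i\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot j}}{n_{\cdot\cdot}} = \frac{n_{i\cdot} \times n_{\cdot j}}{n_{\cdot\cdot}}$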
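As a minimal sketch of the expected-count rule (row total times column total, divided by the grand total), the following Python applies it to every cell. The table is the handedness-by-gender data from the 2 × 2 slides of this lecture; the function name `expected_counts` is an illustrative choice.

```python
def expected_counts(counts):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: female, male. Columns: left-handed, right-handed.
table = [[12, 108],
         [24, 156]]
expected = expected_counts(table)
print(expected)  # [[14.4, 105.6], [21.6, 158.4]]

# The single cell quoted in the GATT example of this lecture:
# row total 119, column total 75, grand total 153.
print(round(119 * 75 / 153, 1))  # 58.3
```

The expected counts have the same margins as the observed table, only the interior cells are redistributed.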

EXERCISE 1
Given is the following contingency table:
If the two variables (party preference and state) were independent, what would be the expected count for Democrat/Utah?

INDEPENDENCE OF CATEGORICAL VARIABLES
▪ Compare expected count ($e_{ij}$) in cell $(i, j)$ to observed count ($n_{ij}$)
▪ Discrepancy between observed count and expected count under the null hypothesis of independence: $n_{ij} - e_{ij}$
▪ Can we aggregate this over all cells ($i = 1, \ldots, r$ and $j = 1, \ldots, c$)?
▪ Yes, but (again!) positive and negative deviations would easily cancel each other
▪ so we aggregate the squared discrepancy $(n_{ij} - e_{ij})^2$ over all cells

INDEPENDENCE OF CATEGORICAL VARIABLES
Still one thing to do:
▪ a discrepancy of 5 when 8 is expected is much worse than a discrepancy of 5 when 1000 is expected
▪ so we “standardize” by the expected count: $\frac{(n_{ij} - e_{ij})^2}{e_{ij}}$
Therefore, we measure the overall discrepancy between expected and observed frequencies as:
$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}$$
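The overall discrepancy measure can be computed directly from a table of counts. The sketch below uses the handedness-by-gender data from the 2 × 2 slides of this lecture and reproduces the value 0.7576 reported there; the function name `chi_square_statistic` is an illustrative choice.

```python
def chi_square_statistic(counts):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            total += (observed - expected) ** 2 / expected
    return total


# Rows: female, male. Columns: left-handed, right-handed.
table = [[12, 108],
         [24, 156]]
chi2_calc = chi_square_statistic(table)
print(round(chi2_calc, 4))  # 0.7576
```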

INDEPENDENCE OF CATEGORICAL VARIABLES
It can be shown that under $H_0$ (independence of $X$ (rows) and $Y$ (columns)):
$$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{(r-1)(c-1)}$$
provided that all $e_{ij} \geq 5$
▪ Therefore, we call our test value $\chi^2_{\text{calc}}$
▪ Reject $H_0$ at $\alpha$ when $\chi^2_{\text{calc}} > \chi^2_{(r-1)(c-1);\alpha}$
  ▪ for large values of $\chi^2_{\text{calc}}$ only
  ▪ 1-tailed, but 2-sided ...

INDEPENDENCE OF CATEGORICAL VARIABLES
Example:
Calculations:
▪ $n_{\text{GATT,Christ}} = 55$, etc.
▪ $e_{\text{GATT,Christ}} = \frac{119 \times 75}{153} = 58.3$, etc.
▪ $\chi^2_{\text{calc}} = \frac{(55 - 58.3)^2}{58.3} + \cdots = 1.88$ ($2 \times 3 = 6$ terms)
▪ $\chi^2_{\text{crit;upper}} = \chi^2_{2;0.05} = 5.991$
▪ do not reject $H_0$ and conclude that there is no evidence of dependence between religion and GATT membership
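The critical value 5.991 can be checked without tables in this particular case. The sketch below relies on a standard fact that is not stated on the slides: a chi-square distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), so the critical value at level alpha is -2 ln(alpha). This shortcut holds only for 2 degrees of freedom.

```python
import math

alpha = 0.05
chi2_calc = 1.88  # test value from the GATT example

# Upper critical value for 2 degrees of freedom: solve exp(-x / 2) = alpha.
chi2_crit = -2 * math.log(alpha)
print(round(chi2_crit, 3))  # 5.991

# p-value for 2 degrees of freedom: upper-tail probability at chi2_calc.
p_value = math.exp(-chi2_calc / 2)
print(round(p_value, 3))  # 0.391

print(chi2_calc > chi2_crit)  # False: do not reject independence
```

The p-value of about 0.39 is far above 0.05, in line with the conclusion not to reject independence.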

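INDEPENDENCE OF CATEGORICAL VARIABLES
[Tables not recovered: observed and (under $H_0$) expected counts; $\chi^2$ calculations $\frac{(n_{ij} - e_{ij})^2}{e_{ij}}$, adding up to $\chi^2_{\text{calc}}$, and standardized residuals $\frac{n_{ij} - e_{ij}}{\sqrt{e_{ij}}}$]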

EXERCISE 2
Find the critical value (at $\alpha = 5\%$) of the appropriate distribution for a contingency table of 3 rows and 4 columns (without the totals).

INDEPENDENCE OF CATEGORICAL VARIABLES
▪ Step 1:
  ▪ $H_0$: GATT membership and religion are independent; $H_1$: GATT membership and religion are dependent; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic $\chi^2 = \sum \frac{(n_{\text{obs}} - n_{\text{exp}})^2}{n_{\text{exp}}}$; reject for large values
▪ Step 3:
  ▪ under $H_0$: $\chi^2 \sim \chi^2_{df=2}$
  ▪ requirement: all expected counts $\geq 5$
▪ Step 4:
  ▪ $\chi^2_{\text{calc}} = 1.88$; $\chi^2_{\text{crit}} = \chi^2_{2;0.05} = 5.991$
▪ Step 5:
  ▪ do not reject $H_0$ at $\alpha = 0.05$ and conclude that ...

2 × 2 -TABLES
Suppose there are only two rows and two columns
▪ e.g., GATT/no GATT and Christ/no Christ
▪ or male/female and right-handed/left-handed
We can still use contingency tables to check for independence
But there is a more versatile way:
▪ test for two proportions (see: comparing two $\pi$s)

2 × 2 -TABLES
Using a contingency table
▪ $\chi^2 = 0.7576$
▪ $\chi^2_{\text{crit}} = \chi^2_{1;0.05} = 3.841$
▪ $p$-value $= 0.384$
▪ independence not rejected
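For a 2 × 2 table the p-value can be reproduced with the standard library alone. The sketch below uses a standard fact not stated on the slides: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability at x equals erfc(sqrt(x / 2)). The function name is illustrative, and the shortcut applies to 1 degree of freedom only.

```python
import math


def chi2_p_value_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


chi2_calc = 0.7576  # test value reported for the handedness table
p_value = chi2_p_value_df1(chi2_calc)
print(round(p_value, 3))  # 0.384

print(chi2_calc > 3.841)  # False: independence not rejected at 5%
```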

2 × 2 -TABLES

2 × 2 -TABLES
Approach 1: (see also next lecture)
▪ group 1 = female; group 2 = male
▪ “success” = left-handed
▪ $H_0: \pi_1 = \pi_2$
▪ $p_1 = \frac{12}{120} = 0.10$; $p_2 = \frac{24}{180} = 0.13$
▪ pooled proportion: $\bar{p} = \frac{12 + 24}{120 + 180} = 0.12$
▪ $z_{\text{calc}} = \frac{0.10 - 0.13}{\sqrt{0.12\,(1 - 0.12)\left(\frac{1}{120} + \frac{1}{180}\right)}} = -0.87$
▪ $z_{\text{crit}} = z_{0.025} = -1.96$
▪ $p$-value $= 2 \times 0.192 = 0.384$
▪ there is no indication that the proportion of left-handed persons depends on gender

2 × 2 -TABLES
Approach 2: (see also next lecture)
▪ group 1 = left-handed; group 2 = right-handed
▪ “success” = female
▪ $H_0: \pi_1 = \pi_2$
▪ $p_1 = \frac{12}{36} = 0.33$; $p_2 = \frac{108}{264} = 0.41$
▪ pooled proportion: $\bar{p} = \frac{12 + 108}{36 + 264} = 0.40$
▪ $z_{\text{calc}} = \frac{0.33 - 0.41}{\sqrt{0.40\,(1 - 0.40)\left(\frac{1}{36} + \frac{1}{264}\right)}} = -0.87$
▪ $z_{\text{crit}} = z_{0.025} = -1.96$
▪ $p$-value $= 2 \times 0.192 = 0.384$
▪ there is no indication that the proportion of females depends on handedness
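Both approaches can be run through one small function. This is a sketch of the pooled two-proportion z-test as written out on the slides, with an illustrative function name; it works with unrounded proportions, which is why it gives -0.87 even though the rounded values 0.10 and 0.13 alone would not. It also shows that the squared z-value equals the chi-square test value 0.7576 from the contingency-table approach.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-test for H0: pi1 = pi2. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided


# Approach 1: groups female/male, success = left-handed.
z1, pv1 = two_proportion_z(12, 120, 24, 180)
# Approach 2: groups left-/right-handed, success = female.
z2, pv2 = two_proportion_z(12, 36, 108, 264)

print(round(z1, 2), round(pv1, 3))  # -0.87 0.384
print(round(z2, 2), round(pv2, 3))  # -0.87 0.384
print(round(z1 ** 2, 4))            # 0.7576
```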

2 × 2 -TABLES
Why is this method more versatile?
▪ It also allows testing hypotheses other than “no relation” or “$\pi_1 = \pi_2$”
▪ For instance
  ▪ $\pi_1 \geq \pi_2$
  ▪ $\pi_1 = \pi_2 + 0.2$
▪ The $\chi^2$-test can only test independence (= “no relation”)
▪ but it has the benefit of also working for tables larger than $2 \times 2$
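As an illustration of the extra flexibility, a one-sided p-value can be read off the same z-value. The sketch below assumes the null hypothesis that proportion 1 is at least proportion 2, tested against the alternative that it is smaller; this choice of alternative is an assumption made for the example, and the z-value is the rounded -0.87 from the two approaches in this lecture.

```python
from statistics import NormalDist

z_calc = -0.87  # rounded test value from the handedness example

# Left-tailed p-value: evidence that proportion 1 is smaller than proportion 2.
p_left = NormalDist().cdf(z_calc)
print(round(p_left, 3))  # 0.192

# Right-tailed p-value for the opposite alternative.
p_right = 1 - p_left
print(round(p_right, 3))  # 0.808
```

The left-tailed value is exactly half of the two-sided p-value 0.384, which is where the factor 2 in the two-sided calculation comes from.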

OLD EXAM QUESTION 26 March 2015, Q1k

FURTHER STUDY
Doane & Seward 5/E 15.1
Tutorial exercises week 5
▪ $\chi^2$-test
▪ comparing two proportions
