
Business Statistics: Contingency Tables (Tests)


  1. CONTINGENCY TABLES: TESTS Business Statistics

  2. CONTENTS
     ▪ Contingency tables
     ▪ Independence of categorical variables
     ▪ 2 × 2-tables
     ▪ Old exam question
     ▪ Further study

  3. CONTINGENCY TABLES
     Contingency table (see “Summarizing data”):
     ▪ matrix with counts
     ▪ rows represent levels of categorical variable $X$ ($= 1, 2$)
     ▪ columns represent levels of categorical variable $Y$ ($= 1, 2, 3$)
     ▪ “margins” contain totals

  4. CONTINGENCY TABLES
     Simple example: election poll (sample)
     ▪ three cities
     ▪ four parties
     ▪ count data
     ▪ is political preference the same in these cities?

  5. CONTINGENCY TABLES
     Notation for a contingency table with $r$ rows and $c$ columns:
     ▪ count in cell $(j, k)$: $n_{jk}$
     ▪ total in row $j$: $n_{j\cdot} = \sum_{k=1}^{c} n_{jk}$
     ▪ total in column $k$: $n_{\cdot k} = \sum_{j=1}^{r} n_{jk}$
     ▪ “total total”: $n_{\cdot\cdot} = \sum_{k=1}^{c} \sum_{j=1}^{r} n_{jk}$

           1          2          3          tot
     1     $n_{11}$   $n_{12}$   $n_{13}$   $n_{1\cdot}$
     2     $n_{21}$   $n_{22}$   $n_{23}$   $n_{2\cdot}$
     tot   $n_{\cdot 1}$   $n_{\cdot 2}$   $n_{\cdot 3}$   $n_{\cdot\cdot}$
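
The margin totals can be sketched in a few lines of Python. The counts below are made up purely for illustration (they are not from the slides), and the variable names are hypothetical.

```python
# Illustrative counts only (not from the slides): 2 rows, 3 columns.
counts = [
    [10, 20, 30],
    [15, 25, 20],
]

row_totals = [sum(row) for row in counts]        # one total per row
col_totals = [sum(col) for col in zip(*counts)]  # one total per column
grand_total = sum(row_totals)                    # the "total total"

print(row_totals)   # [60, 60]
print(col_totals)   # [25, 45, 50]
print(grand_total)  # 120
```

Summing the row totals and summing the column totals must give the same grand total, which is a quick sanity check on any table.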

  6. INDEPENDENCE OF CATEGORICAL VARIABLES
     What does it mean when categorical variable $X$ is independent of categorical variable $Y$?
     ▪ knowledge of $x_i$ doesn’t help you to predict $y_i$
     ▪ $P(Y = k \mid X = j) = P(Y = k)$
     ▪ $P(X = j \cap Y = k) = P(X = j)\,P(Y = k)$
     ▪ (for all $j$ and $k$)
     Can we calculate a statistic for independence?
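
As a small illustration of the product rule for independence, the sketch below checks whether every joint probability equals the product of its two marginal probabilities. Both joint distributions are invented for this example, and `is_independent` is a hypothetical helper name.

```python
from math import isclose

def is_independent(joint, tol=1e-9):
    """True if every joint probability equals the product of its marginals."""
    p_rows = [sum(row) for row in joint]        # marginal of the row variable
    p_cols = [sum(col) for col in zip(*joint)]  # marginal of the column variable
    return all(
        isclose(joint[j][k], p_rows[j] * p_cols[k], abs_tol=tol)
        for j in range(len(p_rows))
        for k in range(len(p_cols))
    )

# Invented joint distributions with the same marginals (0.6, 0.4) and (0.2, 0.3, 0.5).
independent = [[0.12, 0.18, 0.30], [0.08, 0.12, 0.20]]
dependent = [[0.20, 0.10, 0.30], [0.00, 0.20, 0.20]]

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

The two tables share the same margins, which shows that the margins alone do not determine whether the variables are independent.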

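  7. INDEPENDENCE OF CATEGORICAL VARIABLES
     Again:

           1          2          3          tot
     1     $n_{11}$   $n_{12}$   $n_{13}$   $n_{1\cdot}$
     2     $n_{21}$   $n_{22}$   $n_{23}$   $n_{2\cdot}$
     tot   $n_{\cdot 1}$   $n_{\cdot 2}$   $n_{\cdot 3}$   $n_{\cdot\cdot}$

     For the totals we have:
     ▪ for row $j$: $n_{j\cdot} = n_{j1} + n_{j2} + n_{j3} = \sum_{k=1}^{c} n_{jk}$
     ▪ for column $k$: $n_{\cdot k} = n_{1k} + n_{2k} = \sum_{j=1}^{r} n_{jk}$
     ▪ for the entire table: $n_{\cdot\cdot} = \sum_{j=1}^{r} n_{j\cdot} = \sum_{k=1}^{c} n_{\cdot k}$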

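  8. INDEPENDENCE OF CATEGORICAL VARIABLES
     ▪ There are $n_{j\cdot}$ (out of $n_{\cdot\cdot}$) cases in row $j$
     ▪ and $n_{\cdot k}$ (out of $n_{\cdot\cdot}$) cases in column $k$
     ▪ A fraction $\frac{n_{j\cdot}}{n_{\cdot\cdot}}$ of the cases is in row $j$
     ▪ and a fraction $\frac{n_{\cdot k}}{n_{\cdot\cdot}}$ of the cases is in column $k$
     ▪ So, if there is no dependence between row $j$ and column $k$, we expect to have a fraction $\frac{n_{j\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot k}}{n_{\cdot\cdot}}$ of the cases in cell $(j, k)$
     ▪ which gives an expected count of
       $$e_{jk} = n_{\cdot\cdot} \times \frac{n_{j\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot k}}{n_{\cdot\cdot}} = \frac{n_{j\cdot} \times n_{\cdot k}}{n_{\cdot\cdot}}$$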
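
The expected-count rule (row total times column total, divided by the grand total) can be sketched as follows. The function names are hypothetical. The single-cell call uses the totals 119, 75 and 153 that appear in the GATT example later in the deck; the full table is invented for illustration.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence."""
    return row_total * col_total / grand_total

def expected_table(counts):
    """Expected counts for every cell of a table of observed counts."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    return [
        [expected_count(rt, ct, grand_total) for ct in col_totals]
        for rt in row_totals
    ]

# Totals taken from the GATT example in the slides.
print(round(expected_count(119, 75, 153), 1))  # 58.3

# Invented 2 x 3 table, for illustration only.
print(expected_table([[10, 20, 30], [15, 25, 20]]))
# [[12.5, 22.5, 25.0], [12.5, 22.5, 25.0]]
```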

  9. EXERCISE 1
     Given is the following contingency table: [table not reproduced]
     If the two variables (party preference and state) were independent, what would be the expected count for Democrat/Utah?

  10. INDEPENDENCE OF CATEGORICAL VARIABLES
     ▪ Compare the expected count ($e_{jk}$) in cell $(j, k)$ to the observed count ($n_{jk}$)
     ▪ Discrepancy between observed count and expected count under the null hypothesis of independence: $n_{jk} - e_{jk}$
     ▪ Can we aggregate this over all cells ($j = 1, \dots, r$ and $k = 1, \dots, c$)?
     ▪ Yes, but (again!) positive and negative deviations would easily cancel each other
     ▪ so we aggregate the squared discrepancy $(n_{jk} - e_{jk})^2$ over all cells

  11. INDEPENDENCE OF CATEGORICAL VARIABLES
     Still one thing to do:
     ▪ a discrepancy of 5 when 8 is expected is much worse than a discrepancy of 5 when 1000 is expected
     ▪ so we “standardize” by the expected count: $\frac{(n_{jk} - e_{jk})^2}{e_{jk}}$
     Therefore, we measure the overall discrepancy between expected and observed frequencies as:
     $$\sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(n_{jk} - e_{jk})^2}{e_{jk}}$$
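
A minimal sketch of this overall discrepancy measure: for each cell, square the difference between observed and expected count, divide by the expected count, and add everything up. The function name is hypothetical. The input is the 2 × 2 handedness table implied by the counts given in the later 2 × 2 slides (12 of 120 females and 24 of 180 males are left-handed).

```python
def chi_square_statistic(counts):
    """Sum over all cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for j, row in enumerate(counts):
        for k, observed in enumerate(row):
            # Expected count under independence for this cell.
            expected = row_totals[j] * col_totals[k] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Rows: female, male. Columns: left-handed, right-handed.
handedness = [[12, 108], [24, 156]]
print(round(chi_square_statistic(handedness), 4))  # 0.7576
```

The result agrees with the value 0.7576 reported on the 2 × 2 slide.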

  12. INDEPENDENCE OF CATEGORICAL VARIABLES
     It can be shown that under $H_0$ (independence of $X$ (rows) and $Y$ (columns)):
     $$\sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(n_{jk} - e_{jk})^2}{e_{jk}} \sim \chi^2_{(r-1)(c-1)}$$
     provided that all $e_{jk} \ge 5$
     ▪ Therefore, we call our test value $\chi^2_{\text{calc}}$
     ▪ Reject $H_0$ at $\alpha$ when $\chi^2_{\text{calc}} > \chi^2_{(r-1)(c-1);\alpha}$
       ▪ for large values of $\chi^2_{\text{calc}}$ only
       ▪ 1-tailed, but 2-sided ...

  13. INDEPENDENCE OF CATEGORICAL VARIABLES
     Example:
     Calculations:
     ▪ $n_{\text{GATT,Christ}} = 55$, etc.
     ▪ $e_{\text{GATT,Christ}} = \frac{119 \times 75}{153} = 58.3$, etc.
     ▪ $\chi^2_{\text{calc}} = \frac{(55 - 58.3)^2}{58.3} + \cdots = 1.88$ ($2 \times 3 = 6$ terms)
     ▪ $\chi^2_{\text{crit;upper}} = \chi^2_{2;0.05} = 5.991$
     ▪ do not reject $H_0$ and conclude that there is no evidence of dependence between religion and GATT membership
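
The critical value and decision in this example can be double-checked without tables. The sketch below relies on a fact that holds only for 2 degrees of freedom: the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(alpha). The function names are hypothetical, and the p-value shown is an addition that does not appear on the slide.

```python
from math import exp, log

def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df > x); this closed form is valid for df = 2 only."""
    return exp(-x / 2)

def chi2_critical_df2(alpha):
    """Upper critical value for 2 df, obtained by solving exp(-x/2) = alpha."""
    return -2 * log(alpha)

calc = 1.88  # test value from the example
crit = chi2_critical_df2(0.05)

print(round(crit, 3))                       # 5.991
print(calc > crit)                          # False: do not reject
print(round(chi2_upper_tail_df2(calc), 3))  # 0.391
```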

  14. INDEPENDENCE OF CATEGORICAL VARIABLES
     [Tables not reproduced: observed and (under $H_0$) expected counts; $\chi^2$ calculations, with cell contributions $\frac{(n_{jk} - e_{jk})^2}{e_{jk}}$ summing to $\chi^2_{\text{calc}}$, and standardized residuals]

  15. EXERCISE 2
     Find the critical value (at $\alpha = 5\%$) of the appropriate distribution for a contingency table of 3 rows and 4 columns (without the totals).
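
One way to check an answer to this exercise without a printed table is sketched below. It assumes an even number of degrees of freedom, for which the chi-square upper-tail probability has a simple closed form, and finds the critical value by bisection. All function names are hypothetical; in practice a statistics library or the textbook table would normally be used.

```python
from math import exp, factorial

def degrees_of_freedom(rows, cols):
    """(rows - 1) * (cols - 1), counting the cells without the totals."""
    return (rows - 1) * (cols - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x); this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))

def chi2_critical_even_df(alpha, df, upper=1000.0):
    """Bisection for the x whose upper-tail probability equals alpha."""
    lower = 0.0
    for _ in range(200):
        mid = (lower + upper) / 2
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lower = mid  # tail still too heavy: move right
        else:
            upper = mid
    return (lower + upper) / 2

df = degrees_of_freedom(3, 4)
print(df)                                          # 6
print(round(chi2_critical_even_df(0.05, df), 3))   # 12.592
```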

  16. INDEPENDENCE OF CATEGORICAL VARIABLES
     ▪ Step 1:
       ▪ $H_0$: GATT membership and religion are independent; $H_1$: GATT membership and religion are dependent; $\alpha = 0.05$
     ▪ Step 2:
       ▪ sample statistic $\chi^2 = \sum \frac{(n_{\text{obs}} - n_{\text{exp}})^2}{n_{\text{exp}}}$; reject for large values
     ▪ Step 3:
       ▪ under $H_0$: $\chi^2 \sim \chi^2_{df=2}$
       ▪ requirement: all expected counts $\ge 5$
     ▪ Step 4:
       ▪ $\chi^2_{\text{calc}} = 1.88$; $\chi^2_{\text{crit}} = \chi^2_{2;0.05} = 5.991$
     ▪ Step 5:
       ▪ do not reject $H_0$ at $\alpha = 0.05$ and conclude that ...

  17. 2 × 2-TABLES
     Suppose there are only two rows and two columns
     ▪ e.g., GATT/no GATT and Christ/no Christ
     ▪ or male/female and right-handed/left-handed
     We can still use contingency tables to check for independence
     But there is a more versatile way:
     ▪ test for two proportions (see: comparing two $\pi$s)

  18. 2 × 2-TABLES
     Using a contingency table
     ▪ $\chi^2 = 0.7576$
     ▪ $\chi^2_{\text{crit}} = \chi^2_{1;0.05} = 3.841$
     ▪ $p$-value $= 0.384$
     ▪ independence not rejected
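
The p-value on this slide can be reproduced with the standard library alone. The sketch assumes 1 degree of freedom, where a chi-square variable is the square of a standard normal one, so its upper-tail probability equals the two-sided normal tail probability and can be written with the complementary error function. The function name is hypothetical.

```python
from math import erfc, sqrt

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)); valid for df = 1 only."""
    return erfc(sqrt(x / 2))

p_value = chi2_upper_tail_df1(0.7576)
print(round(p_value, 3))  # 0.384
print(p_value < 0.05)     # False: independence not rejected
```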

  19. 2 × 2-TABLES

  20. 2 × 2-TABLES
     Approach 1: (see also next lecture)
     ▪ group 1 = female; group 2 = male
     ▪ “success” = left-handed
     ▪ $H_0$: $\pi_1 = \pi_2$
     ▪ $p_1 = \frac{12}{120} = 0.10$; $p_2 = \frac{24}{180} = 0.13$
     ▪ pooled proportion: $\bar{p} = \frac{12 + 24}{120 + 180} = 0.12$
     ▪ $z_{\text{calc}} = \frac{0.10 - 0.13}{\sqrt{0.12\,(1 - 0.12)\left(\frac{1}{120} + \frac{1}{180}\right)}} = -0.87$
     ▪ $z_{\text{crit}} = z_{0.025} = -1.96$
     ▪ $p$-value $= 2 \times 0.192 = 0.384$
     ▪ there is no indication that the proportion of left-handed persons depends on gender

  21. 2 × 2-TABLES
     Approach 2: (see also next lecture)
     ▪ group 1 = left-handed; group 2 = right-handed
     ▪ “success” = female
     ▪ $H_0$: $\pi_1 = \pi_2$
     ▪ $p_1 = \frac{12}{36} = 0.33$; $p_2 = \frac{108}{264} = 0.41$
     ▪ pooled proportion: $\bar{p} = \frac{12 + 108}{36 + 264} = 0.40$
     ▪ $z_{\text{calc}} = \frac{0.33 - 0.41}{\sqrt{0.40\,(1 - 0.40)\left(\frac{1}{36} + \frac{1}{264}\right)}} = -0.87$
     ▪ $z_{\text{crit}} = z_{0.025} = -1.96$
     ▪ $p$-value $= 2 \times 0.192 = 0.384$
     ▪ there is no indication that the proportion of females depends on handedness
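
Both approaches can be reproduced with one small function for the pooled two-proportion z-test. This is a sketch using the counts from the slides; `two_proportion_z` is a hypothetical name, and the two-sided p-value is computed from the standard normal tail via the complementary error function.

```python
from math import erfc, sqrt

def two_proportion_z(successes1, n1, successes2, n2):
    """Pooled z statistic and two-sided p-value for equal proportions."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * P(Z > |z|)
    return z, p_value

# Approach 1: females vs. males, "success" = left-handed.
z_a, p_a = two_proportion_z(12, 120, 24, 180)
# Approach 2: left-handed vs. right-handed, "success" = female.
z_b, p_b = two_proportion_z(12, 36, 108, 264)

print(round(z_a, 2), round(p_a, 3))  # -0.87 0.384
print(round(z_b, 2), round(p_b, 3))  # -0.87 0.384
print(round(z_a ** 2, 4))            # 0.7576
```

The squared z statistic equals the chi-square value 0.7576 obtained from the contingency table, which is why both routes give the same p-value for this two-sided test.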

  22. 2 × 2-TABLES
     Why is this method more versatile?
     ▪ It also allows testing hypotheses other than “no relation” or “$\pi_1 = \pi_2$”
     ▪ For instance
       ▪ $\pi_1 \ge \pi_2$
       ▪ $\pi_1 = \pi_2 + 0.2$
     ▪ The $\chi^2$-test can only test independence (= “no relation”)
     ▪ but it has the benefit of also working for tables larger than 2 × 2

  23. OLD EXAM QUESTION 26 March 2015, Q1k

  24. FURTHER STUDY
     ▪ Doane & Seward 5/E, 15.1
     ▪ Tutorial exercises week 5
     ▪ $\chi^2$-test; comparing two proportions
