Business Statistics: Contingency Tables: Tests (PowerPoint presentation)

SLIDE 1

CONTINGENCY TABLES: TESTS

Business Statistics

SLIDE 2

CONTENTS

▪ Contingency tables
▪ Independence of categorical variables
▪ 2 × 2-tables
▪ Old exam question
▪ Further study

SLIDE 3

CONTINGENCY TABLES

Contingency table (see “Summarizing data”):
▪ matrix with counts
▪ rows represent the levels of categorical variable $X$ (= 1, 2)
▪ columns represent the levels of categorical variable $Y$ (= 1, 2, 3)
▪ the “margins” contain the totals

SLIDE 4

CONTINGENCY TABLES

Simple example: election poll (sample)
▪ three cities
▪ four parties
▪ count data
▪ is political preference the same in these cities?

SLIDE 5

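CONTINGENCY TABLES

Notation for a contingency table (there are $r$ rows and $c$ columns):
▪ number of counts in cell $(j, k)$: $n_{jk}$
▪ total in row $j$: $n_{j\cdot} = \sum_{k=1}^{c} n_{jk}$
▪ total in column $k$: $n_{\cdot k} = \sum_{j=1}^{r} n_{jk}$
▪ “total total”: $n_{\cdot\cdot} = \sum_{k=1}^{c} \sum_{j=1}^{r} n_{jk}$

$$
\begin{array}{c|ccc|c}
 & 1 & 2 & 3 & \text{tot} \\ \hline
1 & n_{11} & n_{12} & n_{13} & n_{1\cdot} \\
2 & n_{21} & n_{22} & n_{23} & n_{2\cdot} \\ \hline
\text{tot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{\cdot\cdot}
\end{array}
$$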
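As a minimal Python sketch of this notation (assuming a table is stored as a list of rows, so `table[j][k]` holds $n_{jk}$ with Python's 0-based indices), the margins can be computed as follows. The counts are the gender × handedness table from the 2 × 2-TABLES slides of this deck (female: 12 left-handed, 108 right-handed; male: 24 left-handed, 156 right-handed, where 156 follows from the totals 180 and 264). The function name `margins` is hypothetical.

```python
def margins(table):
    """Return (row totals, column totals, grand total) of a contingency table.

    `table` is a list of rows; table[j][k] holds the count n_jk.
    """
    row_totals = [sum(row) for row in table]         # n_j. = sum over k of n_jk
    col_totals = [sum(col) for col in zip(*table)]   # n_.k = sum over j of n_jk
    grand_total = sum(row_totals)                    # n_.. = sum of all cells
    return row_totals, col_totals, grand_total


# Rows: female, male; columns: left-handed, right-handed
handedness = [[12, 108],
              [24, 156]]
print(margins(handedness))  # ([120, 180], [36, 264], 300)
```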

SLIDE 6

INDEPENDENCE OF CATEGORICAL VARIABLES

What does it mean when categorical variable $X$ is independent of categorical variable $Y$?
▪ knowledge of $x_i$ doesn’t help you predict $y_i$
▪ $P(Y = k \mid X = j) = P(Y = k)$
▪ $P(X = j \cap Y = k) = P(X = j)\,P(Y = k)$
▪ (for all $j$ and $k$)
Can we calculate a statistic for independence?

SLIDE 7

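INDEPENDENCE OF CATEGORICAL VARIABLES

Again:

$$
\begin{array}{c|ccc|c}
 & 1 & 2 & 3 & \text{tot} \\ \hline
1 & n_{11} & n_{12} & n_{13} & n_{1\cdot} \\
2 & n_{21} & n_{22} & n_{23} & n_{2\cdot} \\ \hline
\text{tot} & n_{\cdot 1} & n_{\cdot 2} & n_{\cdot 3} & n_{\cdot\cdot}
\end{array}
$$

For the totals we have:
▪ for row $j$: $n_{j\cdot} = n_{j1} + n_{j2} + n_{j3} = \sum_{k=1}^{c} n_{jk}$
▪ for column $k$: $n_{\cdot k} = n_{1k} + n_{2k} = \sum_{j=1}^{r} n_{jk}$
▪ for the entire table: $n_{\cdot\cdot} = \sum_{j=1}^{r} n_{j\cdot} = \sum_{k=1}^{c} n_{\cdot k}$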

SLIDE 8

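INDEPENDENCE OF CATEGORICAL VARIABLES

▪ There are $n_{j\cdot}$ (out of $n_{\cdot\cdot}$) cases in row $j$
▪ and $n_{\cdot k}$ (out of $n_{\cdot\cdot}$) cases in column $k$
▪ A fraction $n_{j\cdot}/n_{\cdot\cdot}$ of the cases is in row $j$
▪ and a fraction $n_{\cdot k}/n_{\cdot\cdot}$ of the cases is in column $k$
▪ So, if there is no dependence between row $j$ and column $k$, we expect a fraction $\frac{n_{j\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot k}}{n_{\cdot\cdot}}$ of the cases in cell $(j, k)$
▪ which gives an expected count of $\frac{n_{j\cdot}}{n_{\cdot\cdot}} \times \frac{n_{\cdot k}}{n_{\cdot\cdot}} \times n_{\cdot\cdot} = \frac{n_{j\cdot} \times n_{\cdot k}}{n_{\cdot\cdot}}$

$$e_{jk} = \frac{n_{j\cdot} \times n_{\cdot k}}{n_{\cdot\cdot}}$$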
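The expected counts follow directly from the margins. Here is a small sketch under the same assumptions: a list-of-rows table, a hypothetical function name `expected_counts`, and the handedness counts from the 2 × 2-TABLES slides.

```python
def expected_counts(table):
    """Expected counts under independence: e_jk = n_j. * n_.k / n_.."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


handedness = [[12, 108],   # female: left-handed, right-handed
              [24, 156]]   # male:   left-handed, right-handed
for row in expected_counts(handedness):
    print([round(e, 1) for e in row])
# [14.4, 105.6]
# [21.6, 158.4]
```

The same formula gives the value used in the GATT example: $119 \times 75 / 153 \approx 58.3$.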

SLIDE 9

EXERCISE 1

Consider the following contingency table: if the two variables (party preference and state) were independent, what would be the expected count for Democrat/Utah?

SLIDE 10

INDEPENDENCE OF CATEGORICAL VARIABLES

▪ Compare the expected count ($e_{jk}$) in cell $(j, k)$ to the observed count ($n_{jk}$)
▪ Discrepancy between the observed count and the expected count under the null hypothesis of independence: $n_{jk} - e_{jk}$
▪ Can we aggregate this over all cells ($j = 1, \ldots, r$ and $k = 1, \ldots, c$)?
▪ Yes, but (again!) positive and negative deviations would cancel each other out
▪ so we aggregate the squared discrepancy $(n_{jk} - e_{jk})^2$ over all cells
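You can check numerically why squaring is needed. The expected counts have the same row and column totals as the observed counts, so the raw discrepancies $n_{jk} - e_{jk}$ sum to zero within every row and column, and therefore over the whole table. This is a minimal sketch using a hypothetical helper `deviations` and the handedness counts from the 2 × 2-TABLES slides.

```python
def deviations(table):
    """Raw discrepancies n_jk - e_jk for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[obs - r * c / n for obs, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]


handedness = [[12, 108], [24, 156]]
dev = deviations(handedness)
print([[round(d, 2) for d in row] for row in dev])        # [[-2.4, 2.4], [2.4, -2.4]]
print(abs(sum(sum(row) for row in dev)) < 1e-9)           # True: they cancel exactly
print(round(sum(d * d for row in dev for d in row), 2))   # 23.04: squares do not cancel
```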

SLIDE 11

INDEPENDENCE OF CATEGORICAL VARIABLES

Still one thing to do:
▪ a discrepancy of 5 when 8 is expected is much worse than a discrepancy of 5 when 1000 is expected
▪ so we “standardize” by the expected count: $\frac{(n_{jk} - e_{jk})^2}{e_{jk}}$

Therefore, we measure the overall discrepancy between expected and observed frequencies as:

$$\sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(n_{jk} - e_{jk})^2}{e_{jk}}$$
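The weighting matters. A discrepancy of 5 contributes $5^2/8 \approx 3.1$ when 8 is expected, but only $5^2/1000 = 0.025$ when 1000 is expected. Putting the pieces together gives the sketch below, which uses a hypothetical function name and the same list-of-rows convention. It reproduces the value $\chi^2 = 0.7576$ reported for the handedness table on the 2 × 2-TABLES slides.

```python
def chi_square_statistic(table):
    """Sum over all cells of (n_jk - e_jk)^2 / e_jk."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for obs, c in zip(row, col_totals):
            exp = r * c / n                  # e_jk under independence
            stat += (obs - exp) ** 2 / exp   # standardized squared discrepancy
    return stat


handedness = [[12, 108], [24, 156]]
print(round(chi_square_statistic(handedness), 4))  # 0.7576
```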

SLIDE 12

INDEPENDENCE OF CATEGORICAL VARIABLES

It can be shown that under $H_0$ (independence of $X$ (rows) and $Y$ (columns)), provided that all $e_{jk} \geq 5$, approximately:

$$\sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(n_{jk} - e_{jk})^2}{e_{jk}} \sim \chi^2_{(r-1)(c-1)}$$

▪ Therefore, we call our test value $\chi^2_{\text{calc}}$
▪ Reject $H_0$ at $\alpha$ when $\chi^2_{\text{calc}} > \chi^2_{(r-1)(c-1);\alpha}$
▪ reject for large values of $\chi^2_{\text{calc}}$ only: 1-tailed, but 2-sided ...
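You can compute the degrees of freedom and the critical value without tables. Python's standard library has no $\chi^2$ quantile function, so this sketch evaluates the $\chi^2$ CDF using the series for the regularized lower incomplete gamma function, then inverts it by bisection. The function names are hypothetical. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` should return the same critical values.

```python
import math


def chi2_cdf(x, df):
    """P(chi^2_df <= x), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:       # add series terms until they are negligible
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df, alpha):
    """Upper-tail critical value x with P(chi^2_df > x) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:   # grow the bracket until it contains x
        hi *= 2
    for _ in range(100):                  # halve the bracket 100 times
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def degrees_of_freedom(r, c):
    """(r - 1)(c - 1) for a table with r rows and c columns (totals excluded)."""
    return (r - 1) * (c - 1)


print(round(chi2_critical(degrees_of_freedom(2, 3), 0.05), 3))  # 5.991
print(round(chi2_critical(degrees_of_freedom(2, 2), 0.05), 3))  # 3.841
```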

SLIDE 13

INDEPENDENCE OF CATEGORICAL VARIABLES

Example: calculations
▪ $n_{\text{GATT,Christ}} = 55$, etc.
▪ $e_{\text{GATT,Christ}} = \frac{119 \times 75}{153} = 58.3$, etc.
▪ $\chi^2_{\text{calc}} = \frac{(55 - 58.3)^2}{58.3} + \cdots = 1.88$ ($2 \times 3 = 6$ terms)
▪ $\chi^2_{\text{crit;upper}} = \chi^2_{2;0.05} = 5.991$
▪ do not reject $H_0$ and conclude that there is no evidence of dependence between religion and GATT membership
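Here are the first of the six terms and the degrees of freedom, worked out in full. The other five terms follow the same pattern using the remaining cells.

$$
e_{\text{GATT,Christ}} = \frac{119 \times 75}{153} = \frac{8925}{153} \approx 58.33,
\qquad
\frac{(55 - 58.33)^2}{58.33} \approx \frac{11.11}{58.33} \approx 0.19
$$

$$
df = (r-1)(c-1) = (2-1)(3-1) = 2,
\qquad
\chi^2_{\text{calc}} = 1.88 < 5.991 = \chi^2_{2;0.05}
$$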

SLIDE 14

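INDEPENDENCE OF CATEGORICAL VARIABLES

Observed and (under $H_0$) expected counts, $\chi^2$ calculations and standardized residuals:
▪ contribution of cell $(j, k)$ to $\chi^2$: $\frac{(n_{jk} - e_{jk})^2}{e_{jk}}$
▪ standardized residual of cell $(j, k)$: $\frac{n_{jk} - e_{jk}}{\sqrt{e_{jk}}}$
▪ sum of all contributions: $\chi^2_{\text{calc}}$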
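This is a sketch of the standardized residuals, assuming the usual definition $(n_{jk} - e_{jk})/\sqrt{e_{jk}}$. It uses a hypothetical function name and the handedness counts from the 2 × 2-TABLES slides. Each squared residual equals that cell's contribution, so the squares add up to $\chi^2_{\text{calc}}$.

```python
import math


def standardized_residuals(table):
    """(n_jk - e_jk) / sqrt(e_jk) for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[(obs - r * c / n) / math.sqrt(r * c / n)
             for obs, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]


handedness = [[12, 108], [24, 156]]
res = standardized_residuals(handedness)
print([[round(x, 2) for x in row] for row in res])        # [[-0.63, 0.23], [0.52, -0.19]]
print(round(sum(x * x for row in res for x in row), 4))   # 0.7576 = chi^2_calc
```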

SLIDE 15

EXERCISE 2

Find the critical value (at $\alpha = 5\%$) of the appropriate distribution for a contingency table of 3 rows and 4 columns (without the totals).

SLIDE 16

INDEPENDENCE OF CATEGORICAL VARIABLES

▪ Step 1:
  ▪ $H_0$: GATT membership and religion are independent; $H_1$: GATT membership and religion are dependent; $\alpha = 0.05$
▪ Step 2:
  ▪ sample statistic $\chi^2 = \sum \frac{(n_{\text{obs}} - n_{\text{exp}})^2}{n_{\text{exp}}}$; reject for large values
▪ Step 3:
  ▪ under $H_0$: $\chi^2 \sim \chi^2_{df=2}$
  ▪ requirement: all expected counts $\geq 5$
▪ Step 4:
  ▪ $\chi^2_{\text{calc}} = 1.88$; $\chi^2_{\text{crit}} = \chi^2_{2;0.05} = 5.991$
▪ Step 5:
  ▪ do not reject $H_0$ at $\alpha = 0.05$ and conclude that ...
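You can bundle the five steps into one function. This is a sketch, not a library routine: the name `chi_square_independence_test` and its return values are hypothetical. The $\chi^2$ CDF is computed with a series-plus-bisection approach, as a plain-Python stand-in for a statistics package. The example input is the handedness table from the 2 × 2-TABLES slides.

```python
import math


def chi2_cdf(x, df):
    """P(chi^2_df <= x) via the regularized lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df, alpha):
    """Upper-tail critical value, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1 - alpha else (lo, mid)
    return (lo + hi) / 2


def chi_square_independence_test(table, alpha=0.05):
    """Steps 2-5: test value, requirement check, critical value, decision."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step 3 requirement: all expected counts >= 5
    if min(min(row) for row in expected) < 5:
        raise ValueError("expected count below 5: chi-square approximation doubtful")
    # Steps 2 and 4: the test value chi^2_calc
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    crit = chi2_critical(df, alpha)
    p_value = 1 - chi2_cdf(stat, df)
    # Step 5: reject H0 only for large values of the test value
    return stat, df, crit, p_value, stat > crit


handedness = [[12, 108], [24, 156]]
stat, df, crit, p, reject = chi_square_independence_test(handedness)
print(round(stat, 4), df, round(crit, 3), round(p, 3), reject)
# 0.7576 1 3.841 0.384 False
```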

SLIDE 17

2 × 2-TABLES

Suppose there are only two rows and two columns
▪ e.g., GATT/no GATT and Christ/no Christ
▪ or male/female and right-handed/left-handed
We can still use contingency tables to check for independence.
But there is a more versatile way:
▪ the test for two proportions (see: comparing two $\pi$s)

SLIDE 18

2 × 2-TABLES

Using a contingency table:
▪ $\chi^2 = 0.7576$
▪ $\chi^2_{\text{crit}} = \chi^2_{1;0.05} = 3.841$
▪ $p$-value $= 0.384$
▪ independence not rejected
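For one degree of freedom, you don't need a table to get the $p$-value. A $\chi^2_1$ variable is the square of a standard normal $Z$, so $P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$. Here is a minimal check with Python's `math.erfc`. It also shows that $\chi^2_{1;0.05} = z_{0.025}^2$.

```python
import math

chi2_calc = 0.7576   # test value from the slide
# For df = 1: P(chi^2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_calc / 2))
print(round(p_value, 3))          # 0.384

# The 5% critical value of chi^2_1 is the square of the two-sided z critical value
print(round(1.959964 ** 2, 3))    # 3.841
```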

SLIDE 19

2 × 2-TABLES

SLIDE 20

2 × 2-TABLES

▪ Approach 1: (see also next lecture)
  ▪ group 1 = female; group 2 = male
  ▪ “success” = left-handed
  ▪ $H_0$: $\pi_1 = \pi_2$
  ▪ $p_1 = \frac{12}{120} = 0.10$; $p_2 = \frac{24}{180} = 0.1333$
  ▪ pooled proportion: $\bar{p} = \frac{12 + 24}{120 + 180} = 0.12$
  ▪ $z_{\text{calc}} = \dfrac{0.10 - 0.1333}{\sqrt{0.12\,(1 - 0.12)\left(\frac{1}{120} + \frac{1}{180}\right)}} = -0.87$
  ▪ $z_{\text{crit}} = z_{0.025} = -1.96$
  ▪ $p$-value $= 2 \times 0.192 = 0.384$
  ▪ there is no indication that the proportion of left-handed persons depends on gender
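Here is a sketch of the pooled two-proportion $z$-test for Approach 1. The function name is hypothetical, and the two-sided $p$-value is computed as $\operatorname{erfc}(|z|/\sqrt{2}) = 2\,P(Z > |z|)$. The sketch also shows that, for a 2 × 2 table, $z_{\text{calc}}^2$ equals the $\chi^2$ test value of 0.7576.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test of H0: pi1 = pi2 (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                     # pooled proportion
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # = 2 * P(Z > |z|)
    return z, p_value


# Approach 1: group 1 = female, group 2 = male, "success" = left-handed
z, p = two_proportion_z(12, 120, 24, 180)
print(round(z, 2), round(p, 3))   # -0.87 0.384
print(round(z ** 2, 4))           # 0.7576, the chi-square test value
```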

SLIDE 21

2 × 2-TABLES

▪ Approach 2: (see also next lecture)
  ▪ group 1 = left-handed; group 2 = right-handed
  ▪ “success” = female
  ▪ $H_0$: $\pi_1 = \pi_2$
  ▪ $p_1 = \frac{12}{36} = 0.3333$; $p_2 = \frac{108}{264} = 0.4091$
  ▪ pooled proportion: $\bar{p} = \frac{12 + 108}{36 + 264} = 0.40$
  ▪ $z_{\text{calc}} = \dfrac{0.3333 - 0.4091}{\sqrt{0.40\,(1 - 0.40)\left(\frac{1}{36} + \frac{1}{264}\right)}} = -0.87$
  ▪ $z_{\text{crit}} = z_{0.025} = -1.96$
  ▪ $p$-value $= 2 \times 0.192 = 0.384$
  ▪ there is no indication that the proportion of females depends on handedness
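Swapping the roles of the two variables does not change the test value. This short sketch uses a hypothetical helper `pooled_z` to compute both approaches from the same counts.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """z for H0: pi1 = pi2, using the pooled proportion in the standard error."""
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


z_gender = pooled_z(12, 120, 24, 180)   # Approach 1: success = left-handed
z_hand = pooled_z(12, 36, 108, 264)     # Approach 2: success = female
print(round(z_gender, 4), round(z_hand, 4))   # -0.8704 -0.8704
```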

SLIDE 22

2 × 2-TABLES

Why is this method more versatile?
▪ It also allows testing hypotheses other than “no relation” or “$\pi_1 = \pi_2$”
▪ For instance:
  ▪ $\pi_1 \geq \pi_2$
  ▪ $\pi_1 = \pi_2 + 0.2$
▪ The $\chi^2$-test can only test independence (= “no relation”)
  ▪ but it has the benefit of also working for tables larger than 2 × 2
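As a sketch of this versatility, the example below uses the handedness counts purely for illustration. For $H_0: \pi_1 \geq \pi_2$ the pooled $z$ is unchanged, but the $p$-value becomes the lower-tail probability 0.192, which is half the two-sided 0.384. For a hypothesized difference such as $\pi_1 - \pi_2 = 0.2$, the pooled proportion no longer applies. In that case the code assumes the unpooled standard error $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, which is one common textbook choice. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def z_one_sided(x1, n1, x2, n2):
    """H0: pi1 >= pi2 vs H1: pi1 < pi2 (pooled SE); returns (z, lower-tail p-value)."""
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, normal_cdf(z)


def z_difference(x1, n1, x2, n2, d0):
    """H0: pi1 - pi2 = d0 with d0 != 0; unpooled standard error (an assumption here)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - d0) / se


z, p = z_one_sided(12, 120, 24, 180)
print(round(z, 2), round(p, 3))                          # -0.87 0.192
print(round(z_difference(12, 120, 24, 180, 0.2), 2))     # H0: pi1 = pi2 + 0.2
```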

SLIDE 23

OLD EXAM QUESTION

26 March 2015, Q1k

SLIDE 24

FURTHER STUDY

▪ Doane & Seward 5/E, 15.1
▪ Tutorial exercises week 5: $\chi^2$-test, comparing two proportions