Statistical Literacy: Confounding (UTSA, 13 Jan 2011)


MILO SCHIELD
Augsburg College

  • Director, W. M. Keck Statistical Literacy Project
  • Vice President, National Numeracy Network
  • US Rep., International Statistical Literacy Project

January 13, 2011, University of Texas San Antonio (UTSA)
Slides at www.StatLit.org/pdf/2011-Schield-UTSA-Confounding-Slides.pdf


Statistical Literacy

Statistical literacy is the ability to read and interpret summary statistics in everyday life. Statistical literacy studies (1) the relation between statistical associations and causation, and (2) the full range of influences on a statistic or on a statistical association [Take CARE].

Take CARE: Context

The influence of factors taken into account by
  • data broken out by subgroups in tables and graphs
  • averages, ratios and comparisons of averages and ratios
  • epidemiological models (cf., deaths attributed to obesity)
  • regression models, and
  • the study design (cf., longitudinal vs. cross-sectional; experiment vs. observational study).

The influence of related factors (confounders) not taken into account in the study and not blocked by the study design.


Controlling for a confounder can DECREASE an association

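  • MN has 3.8 times as much prison expense as ME.
  • MN has 3.4 times as many inmates as ME.
  • MN has about 12% more prison expense per inmate than ME.

| State | Total expense | # Inmates | Expense per inmate |
|-------|---------------|-----------|--------------------|
| MN    | $184M         | 4,865     | $37,825            |
| ME    | $48M          | 1,424     | $33,711            |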


Controlling for a confounder can NULLIFY an association

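  • MD has 3 times as much prison expense as KS.
  • MD has 3 times as many inmates as KS.
  • MD has the same prison expense per inmate as KS.

| State | Total expense | # Inmates | Expense per inmate |
|-------|---------------|-----------|--------------------|
| MD    | $481M         | 21,623    | $22,250            |
| KS    | $159M         | 7,148     | $22,250            |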


Controlling for a confounder can REVERSE an association

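  • CA has 50% more prison expense than NY.
  • CA has almost twice as many inmates as NY.
  • CA has 25% less prison expense per inmate than NY.

| State | Total expense | # Inmates | Expense per inmate |
|-------|---------------|-----------|--------------------|
| CA    | $2.9B         | 136K      | $21,385            |
| NY    | $1.9B         | 69K       | $28,426            |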


Controlling for a confounder can INCREASE an association

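  • MN has 27% more prison expense than IA.
  • MN has 18% fewer inmates than IA.
  • MN has 56% more prison expense per inmate than IA.

| State | Total expense | # Inmates | Expense per inmate |
|-------|---------------|-----------|--------------------|
| MN    | $184M         | 4,865     | $37,825            |
| IA    | $144M         | 5,929     | $24,286            |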
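All four state comparisons rest on one identity: the per-inmate ratio equals the ratio of totals divided by the ratio of inmate counts. The sketch below recomputes the ratios from the figures printed on the slides and labels what controlling for inmate count did to each association. The decision rule in the hypothetical `classify` helper (nullify when the per-inmate ratio is within 1% of 1, reverse when the two ratios fall on opposite sides of 1, otherwise compare their distances from 1) is an assumption of this sketch, not a rule taken from the slides. Per-inmate ratios use the printed per-inmate column, because the rounded CA and NY totals ($2.9B, $1.9B) do not reproduce it exactly.

```python
# Figures as printed on the slides: (total expense in $, inmates, expense per inmate in $).
STATES = {
    "MN": (184e6, 4_865, 37_825),
    "ME": (48e6, 1_424, 33_711),
    "MD": (481e6, 21_623, 22_250),
    "KS": (159e6, 7_148, 22_250),
    "CA": (2.9e9, 136_000, 21_385),
    "NY": (1.9e9, 69_000, 28_426),
    "IA": (144e6, 5_929, 24_286),
}


def ratios(a, b):
    """Return (total-expense ratio, inmate ratio, per-inmate ratio) of state a to state b."""
    total_a, inmates_a, per_a = STATES[a]
    total_b, inmates_b, per_b = STATES[b]
    return total_a / total_b, inmates_a / inmates_b, per_a / per_b


def classify(crude, controlled, tol=0.01):
    """Hypothetical rule labeling how controlling for the confounder changed the association."""
    if abs(controlled - 1) <= tol:
        return "NULLIFY"
    if (crude - 1) * (controlled - 1) < 0:  # opposite sides of 1: the direction flipped
        return "REVERSE"
    return "DECREASE" if abs(controlled - 1) < abs(crude - 1) else "INCREASE"


labels = {}
for a, b in [("MN", "ME"), ("MD", "KS"), ("CA", "NY"), ("MN", "IA")]:
    total, inmates, per_inmate = ratios(a, b)
    labels[(a, b)] = classify(total, per_inmate)
    print(f"{a} vs {b}: total x{total:.2f}, inmates x{inmates:.2f}, "
          f"per inmate x{per_inmate:.2f} -> {labels[(a, b)]}")
```

For MN vs. ME this prints `total x3.83, inmates x3.42, per inmate x1.12 -> DECREASE`, matching the slide's 3.8 and 3.4.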


Association vs. Causation

[Figure: Season Wins vs. Total Payroll, US Major League Baseball. 1995 season wins vs. total payroll ($ millions). Labeled teams: Yankees, Blue Jays, Indians, Twins, Marlins, Rangers, Mets, Padres, Braves, Orioles, Red Sox, Reds, Expos, Pirates, Tigers.]


Adjusting for Land Size: Standardize on Average Lot

[Figure: House prices ($50,000 to $450,000) vs. land size (1 to 6 acres), with a best-fit line; average lot = 1.6 acres. Data: 2004AssessMTB.]


SAT VERBAL SCORES: FLAT

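Mean score, with each group's share of test takers in parentheses:

| Group           | 1981       | 2002       | Change   |
|-----------------|------------|------------|----------|
| White           | 519 (85%)  | 527 (65%)  | +8       |
| Black           | 412 (9%)   | 431 (11%)  | +19      |
| Asian           | 474 (3%)   | 501 (10%)  | +27      |
| Mexican         | 438 (2%)   | 446 (4%)   | +8       |
| Puerto Rican    | 437 (1%)   | 455 (3%)   | +18      |
| American Indian | 471 (0%)   | 479 (1%)   | +8       |
| All test takers | 504 (100%) | 504 (100%) | 0 (zero) |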
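Every listed group improved, yet the all-taker average did not move, because the mix of test takers shifted away from the highest-scoring group. The sketch below, assuming the parenthesized percentages are each group's share of test takers, checks the subgroup gains and re-weights the 2002 group means to the 1981 mix (direct standardization). Because the six listed groups account for only 94% of 2002 test takers, it does not try to reproduce the published 2002 average of 504.

```python
# Mean SAT verbal score and share of test takers, as printed on the slide:
# group: (1981 mean, 1981 share, 2002 mean, 2002 share)
SAT = {
    "White": (519, 0.85, 527, 0.65),
    "Black": (412, 0.09, 431, 0.11),
    "Asian": (474, 0.03, 501, 0.10),
    "Mexican": (438, 0.02, 446, 0.04),
    "Puerto Rican": (437, 0.01, 455, 0.03),
    "American Indian": (471, 0.00, 479, 0.01),
}


def weighted_mean(pairs):
    """Mean of (value, weight) pairs, normalized by the total weight."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Change in each group's mean score, 1981 to 2002.
changes = {g: m02 - m81 for g, (m81, _, m02, _) in SAT.items()}

# 1981 mean of the listed groups at the 1981 mix.
mean_1981 = weighted_mean([(m81, w81) for m81, w81, _, _ in SAT.values()])
# 2002 group means re-weighted to the 1981 mix.
mean_2002_std = weighted_mean([(m02, w81) for _, w81, m02, _ in SAT.values()])

print("Every group improved:", all(c > 0 for c in changes.values()))
print(f"At the 1981 mix: {mean_1981:.1f} (1981 scores) vs {mean_2002_std:.1f} (2002 scores)")
```

Under these assumptions, holding the 1981 mix fixed raises the listed groups' average by roughly 10 points, the gain that the flat all-taker average hides.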


Multivariate Analysis can be Complex

To simplify, consider cases with

  • a binary outcome,
  • a binary predictor and
  • a binary confounder.

What are the necessary conditions for nullification or a reversal?

See Schield (1999) and Schield and Burnham (2003)


City Hospital: Hospital of Death??

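| Hospital | Total | Died | Death rate |
|----------|-------|------|------------|
| City     | 1,000 | 55   | 5.50%      |
| Rural    | 1,000 | 35   | 3.50%      |
| Both     | 2,000 | 90   | 4.50%      |

| Condition | Total | Died | Death rate |
|-----------|-------|------|------------|
| Good      | 800   | 15   | 1.90%      |
| Poor      | 1,200 | 75   | 6.30%      |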


Can this confounder nullify or reverse this association?

Death rates by hospital: City 5.5%, Rural 3.5%, overall 4.5%. City's rate is 2 percentage points (about 60%) higher.

Death rates by patient condition: Poor health 6.3%, Good health 1.9%. The Poor-health rate is 4.4 percentage points (about 230%) higher.
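Both comparisons can be recomputed from the hospital counts, using the City/Rural breakdown by condition given in the table on the next slide. Schield (1999) ties the question "can this confounder nullify or reverse this association?" to Cornfield's conditions; the sketch below checks the classic Cornfield prevalence condition: for patient condition to explain away City's higher death rate, Poor patients must be more concentrated at City than at Rural by a ratio larger than the observed death-rate ratio. The condition is necessary, not sufficient: passing it says only that nullification or reversal is possible.

```python
# Patient counts (died, total) by hospital and condition, from the City/Rural breakdown.
CITY = {"Good": (1, 100), "Poor": (54, 900)}
RURAL = {"Good": (14, 700), "Poor": (21, 300)}


def crude_rate(hospital):
    """Death rate ignoring patient condition."""
    died = sum(d for d, _ in hospital.values())
    total = sum(t for _, t in hospital.values())
    return died / total


def poor_share(hospital):
    """Fraction of a hospital's patients who are in Poor condition."""
    total = sum(t for _, t in hospital.values())
    return hospital["Poor"][1] / total


def pooled_rate(condition):
    """Death rate for one condition, both hospitals combined."""
    died = CITY[condition][0] + RURAL[condition][0]
    total = CITY[condition][1] + RURAL[condition][1]
    return died / total


# Observed association: City vs Rural death rates (5.5% vs 3.5%).
observed_ratio = crude_rate(CITY) / crude_rate(RURAL)
# Confounder-outcome association: Poor vs Good death rates (6.25% vs 1.875%).
condition_ratio = pooled_rate("Poor") / pooled_rate("Good")
# Confounder-predictor association: share of Poor patients, City vs Rural (90% vs 30%).
prevalence_ratio = poor_share(CITY) / poor_share(RURAL)

# Cornfield's prevalence condition: necessary (not sufficient) for nullification.
can_nullify_or_reverse = prevalence_ratio > observed_ratio

print(f"City vs Rural: {observed_ratio - 1:.0%} more deaths per patient")
print(f"Poor vs Good:  {condition_ratio - 1:.0%} more deaths per patient")
print(f"Poor share, City vs Rural: x{prevalence_ratio:.1f}; condition met: {can_nullify_or_reverse}")
```

The unrounded relative differences are 57% and 233%, which the slide rounds to 60% and 230%.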


Confounder Reverses; City Hospital is Better

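| Condition | Hospital | Total | Died | Death rate |
|-----------|----------|-------|------|------------|
| Good      | City     | 100   | 1    | 1.00%      |
| Good      | Rural    | 700   | 14   | 2.00%      |
| Good      | Total    | 800   | 15   | 1.90%      |
| Poor      | City     | 900   | 54   | 6.00%      |
| Poor      | Rural    | 300   | 21   | 7.00%      |
| Poor      | Total    | 1,200 | 75   | 6.30%      |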
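The reversal (Simpson's paradox) can be checked mechanically: City's crude death rate is higher, yet within each condition City's rate is lower. A minimal sketch using the counts in the table above:

```python
# (died, total) for each (condition, hospital) cell of the table.
CELLS = {
    ("Good", "City"): (1, 100),
    ("Good", "Rural"): (14, 700),
    ("Poor", "City"): (54, 900),
    ("Poor", "Rural"): (21, 300),
}
HOSPITALS = ("City", "Rural")
CONDITIONS = ("Good", "Poor")


def rate(cells):
    """Death rate for a collection of (died, total) cells."""
    died = sum(d for d, _ in cells)
    total = sum(t for _, t in cells)
    return died / total


# Crude rates: pool both conditions within each hospital.
crude = {h: rate([CELLS[(c, h)] for c in CONDITIONS]) for h in HOSPITALS}
# Condition-specific rates: compare the hospitals within each condition.
by_condition = {c: {h: rate([CELLS[(c, h)]]) for h in HOSPITALS} for c in CONDITIONS}

city_worse_overall = crude["City"] > crude["Rural"]
city_better_within_each = all(
    by_condition[c]["City"] < by_condition[c]["Rural"] for c in CONDITIONS
)

print(f"Crude: City {crude['City']:.1%}, Rural {crude['Rural']:.1%}")
for c in CONDITIONS:
    print(f"{c}: City {by_condition[c]['City']:.1%}, Rural {by_condition[c]['Rural']:.1%}")
print("Reversal:", city_worse_overall and city_better_within_each)
```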


Two-Group Rates with a Binary Confounder

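[Diagram: rates Ra, Rb, Rc, Rd for two groups with a binary confounder; corner labels (0,0), (1,0), (0,1), (1,1); other labels AP, AQ, XP, XQ, BP, BQ, XN, XM. Legend: A: associated; B: confounder; E: effect.]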


Compare Hospital Death Rates Confounder: Patient Condition

A Confounder Can Influence a Difference

[Figure: death rate (0% to 7%) vs. percentage of patients in "Poor" condition (0% to 100%).]
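One way to read this graph, assuming each hospital is drawn as a straight line from its Good-condition death rate (at 0% Poor) to its Poor-condition death rate (at 100% Poor): a hospital's overall death rate is the point on its line at its own share of Poor patients, p. With the table figures (City 90% Poor, Rural 30% Poor):

```latex
R_h(p) = (1 - p)\,R_{h,\text{Good}} + p\,R_{h,\text{Poor}}

R_{\text{City}}(0.9)  = 0.1\,(1.0\%) + 0.9\,(6.0\%) = 5.5\%
R_{\text{Rural}}(0.3) = 0.7\,(2.0\%) + 0.3\,(7.0\%) = 3.5\%
```

City's line (1% to 6%) lies one point below Rural's (2% to 7%) at every p, yet City's overall rate sits far to the right, where both lines are high.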


Standardize on combined confounder percentage

Standardizing Can Reverse a Difference

[Figure: death rate (0% to 7%) vs. percentage of patients in "Poor" condition (0% to 100%).]
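Standardizing moves both hospitals to the same point on the horizontal axis: the combined share of Poor patients, 1,200 of 2,000 = 60%. A minimal sketch of this direct standardization, assuming each hospital's rate at a given mix is the mix-weighted average of its Good and Poor death rates (a straight line between them); the helper names are illustrative:

```python
# (died, total) by hospital and condition, from the City/Rural breakdown table.
COUNTS = {
    "City": {"Good": (1, 100), "Poor": (54, 900)},
    "Rural": {"Good": (14, 700), "Poor": (21, 300)},
}


def rate_at_mix(hospital, poor_share):
    """Death rate the hospital would have if poor_share of its patients were in Poor condition."""
    good_died, good_total = COUNTS[hospital]["Good"]
    poor_died, poor_total = COUNTS[hospital]["Poor"]
    return (1 - poor_share) * good_died / good_total + poor_share * poor_died / poor_total


def own_poor_share(hospital):
    """The hospital's actual share of Poor-condition patients."""
    cells = COUNTS[hospital]
    return cells["Poor"][1] / (cells["Good"][1] + cells["Poor"][1])


# Combined confounder percentage: Poor patients across both hospitals (1,200 / 2,000).
all_poor = sum(c["Poor"][1] for c in COUNTS.values())
all_patients = sum(c["Good"][1] + c["Poor"][1] for c in COUNTS.values())
combined_share = all_poor / all_patients

crude = {h: rate_at_mix(h, own_poor_share(h)) for h in COUNTS}      # each at its own mix
standardized = {h: rate_at_mix(h, combined_share) for h in COUNTS}  # both at the 60% mix

for h in COUNTS:
    print(f"{h}: crude {crude[h]:.1%}, standardized {standardized[h]:.1%}")
```

City goes from 2 points worse (5.5% vs. 3.5%) to 1 point better (4.0% vs. 5.0%) once both hospitals are evaluated at the same patient mix.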


Auto Deaths and Airbag Presence: Confounded by Seatbelt Use

[Figure: death rate per 10,000 accidents (axis 15 to 125) vs. percentage who wear seatbelts (0% to 100%). Labels: None, All, Airbag, No Airbag, Airbag Standardized.]


Subscription Renewal Rates by Month: Confounded by Change in Subscription Mix

[Figure: renewal rate (10% to 80%) vs. percentage of renewals that are agent renewals (0% to 100%). Labels: January, February, Standardize; values shown: 10%, 40%, 46%.]


Confounder: Race

2000 NAEP 4th Grade Math Standardized Scores: LA vs. WV

[Figure: NAEP scores (200 to 230) vs. percentage who are White (0% to 100%). Series: LA, WV, standardized (Std.); values shown: 204, 230, 203, 226.]


Confounder: Family Structure

Income: US Families by Race & Structure

[Figure: mean income ($10,000 to $65,000) vs. percentage of families headed by a married couple (0% to 100%). Series: Black families, White families, population; percentages shown: 78%, 82%, 48%.]


Control for Mom’s Age


Controlling Can Change Statistical Significance


Conclusion

Statistical educators must show students how confounders can influence associations and change statistical significance. The failure of educators to do this may be seen as “statistical negligence.”

  • Schield, Milo (1999). Simpson's Paradox and Cornfield's Conditions. www.StatLit.org/pdf/1999SchieldASA.pdf
  • Schield, Milo (2006). Presenting Confounding and Standardization Graphically. STATS Magazine, ASA, Fall 2006, pp. 14-18. Draft at www.StatLit.org/pdf/2006SchieldSTATS.pdf
  • Schield, Milo (2009). Confound Those Speculative Statistics. 2009 ASA Proceedings of the Section on Statistical Education [CD-ROM], pp. 4255-4266. www.StatLit.org/pdf/2009SchieldASA.pdf
