STAT 201 Defence against the Dark Arts (for the Life Sciences)

SLIDE 1

STAT 201 Defence against the Dark Arts (for the Life Sciences)

Instructor: Professor Lockhart
Office: 10561
E-mail: lockhart@sfu.ca
Phone: 3264
Web site: http://www.stat.sfu.ca/~lockhart

SLIDE 2

Text: The Basic Practice of Statistics, 3rd Edition, by David S. Moore, W.H. Freeman Publishers. On 2 hour reserve in library.
Lectures: Tuesday 10:30–12:10 without break. Thursday 10:30–11:20.
Help: go to Stat Workshop K9516 inside K9510.
Office Hours: Thursday 11:30–12:30; Friday 12:30–13:30.

SLIDE 3

Assignments: six, of which the first five are marked. Due: every second Tuesday by 4:30 PM in boxes outside Stat Workshop. First due on 21 Sept. Late assignments not accepted. Worth 20% of mark. Based on best 4 of 5 marked. Returned through Stat Workshop.
Exams: Midterm worth 30% on 21 October 2004. CLOSED BOOK.
Final: 15 December 2004. OPEN BOOK, worth 50%.
Make-up exams: medical note required. Missed midterm will be replaced by Final.

SLIDE 4

Web material: slides posted on the web when possible. Assignment questions posted. Solutions posted evening after assignment due. Midterm solutions to be posted eventually.
Extra material: perhaps but probably not.
Computing: some questions to be done using program JMP. Assignment lab accounts created. JMP available on PCs and Macs in assignment lab. JMP also available in Stat Workshop.

SLIDE 5

Outline

Topic                                    Hours  Chapters
Univariate Descriptive Statistics        4      1, 2, 3
Bivariate Descriptive Statistics         6      4, 5, 6
Experimental Design                      2      ?
Probability                              3      ?
Binomial, Poisson distributions          3      ?
Hypothesis Tests, confidence intervals   2      ?
Midterm                                  1
Hypothesis Tests, confidence intervals   2      ?
Two Sample tests                         3      ?
Inference in Regression                  3      ?
1 and 2 way ANOVA                        3      ?
Count data                               3      ?

SLIDE 6

Definition: Defence against the Dark Arts is the science of Data. How should it be collected? How should it be summarized? How should it be displayed? How should it be interpreted? Where are the pitfalls?

SLIDE 7

Jargon
Usual structure of data set.
Individuals, subjects, cases, experimental units: all jargon for the people or animals or plants or things on which measurements are made.
Variables: the things measured.

SLIDE 8

Example: case by variable presentation. Data on sea urchins:

Urchin ID  Age   Size
3997       6.91  57.5
991        0.91   9.5
2163       2.41  29.5
15         0.49   0.5
2202       2.41  30.5
2862       3.42  44.5
1575       1.41  24.5
293        0.49   2.5
358        0.49   3.5
...        ...   ...

Comment: 9 cases (of 250) shown, 3 variables.
Comment: Notice poor scientific form: no units listed for Age or Size in on-line source.

SLIDE 9

Example: weather in Central Park, New York for May

Day  Max Temp  Sunshine  Weather
1    72        5         18
2    75        4         1
3    65        4         NA
4    63        NA
...  ...       ...       ...

Comment: “Weather” is a code. ‘1’ means Fog. ‘18’ not listed.
Comment: NA means not available.
Comment: ‘5’ for sunshine means partly cloudy. ‘0’ is clear.

SLIDE 10

Jargon: variable types.
Nominal: categories with no particular order. Examples: Variable Sex has 2 “levels”: Male and Female. Variable Eye colour has levels like “blue”, “hazel”, “brown”.
Ordinal: categories with an order. Examples: 5 point scales: “Paul Martin is doing a good job: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree”. Sunshine in NY: 0 is most sunny, 10 is most cloudy.
Categorical: either Nominal or Ordinal. Also called “qualitative”.

SLIDE 11

Quantitative: numerical variable like value in $, age, height, weight.
Interval: quantitative variable for which distance from 1 to 2 is same as from 3 to 4.
Ratio: like Interval but with a natural value for 0.
Discrete: used both for Categorical variables and for variables with only integer values.
Continuous: values between integers are possible (in principle as finely measured as desired).
Examples: mass is ratio, temperature in degrees Celsius is interval, number of murders in a week in Vancouver is quantitative but discrete, temperature is continuous.
Note: 5 point scales (“Likert” scales in Psychology) are often assigned numbers, say 1-5 or 0-4. But is difference between “Strongly agree” and “agree” same as between “agree” and “neutral”?

SLIDE 12

Why the jargon? Sometimes helps identify suitable methods of data presentation, summarization and analysis.
WARNING: many different forms of statistical jargon in use in different disciplines.
Social Sciences: use nominal, ordinal, interval and ratio.
Math Stat: use categorical, quantitative, discrete, continuous.
WARNING: all labels are sometimes open to debate. Is money “discrete”? (Integer number of pennies but huge number of possible values.)

SLIDE 13

Data Collection Exercise: VOLUNTARY On blank sheet of paper please provide: 1) Height 2) Weight 3) Sex 4) Value of Coins in pocket / purse 5) SFU credits completed. PLEASE DO NOT PUT YOUR NAME ON THIS. Give to me at end of class or put in box outside Stat Workshop. PURPOSE: provide data set to display and summarize

SLIDE 14

Univariate Descriptive Statistics
Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stem-leaf plots, tables, lists.
Example: sea urchin sizes

[Figure: four displays of urchin size (mm): boxplot; histogram (vertical axis: number of urchins); dot plot; density estimate (vertical axis: density).]

SLIDE 15

Points:
1) Useful for quantitative variables.
2) Boxplot shows five point summary: minimum, first quartile, median, third quartile, maximum.
3) Dot Plot illegible with 250 data points. (1 dot for each size plotted on line.)
4) Histogram, density plot serve similar purposes.
5) Density estimate extends below 0 mm: bad.
6) Histogram doesn’t show the clustering that the density plot shows.

SLIDE 16

Example: Categorical: Weather in Central Park

[Figure: pie chart and bar graph of the number of clear, partly cloudy and cloudy days.]

Pie chart harder to read. General summary: Pie Charts are bad. More useful with more categories. Ordering of categories important for nominal variables. Cloudiness is ordinal.

SLIDE 17

Pie charts: wedge has area proportional to # of individuals in category.
Bar chart: bar has height equal to # of individuals in category.
Density estimates not discussed in this course.
Histogram:
1) Divide range of values into intervals.
2) Count numbers of individuals in each interval.
3) Bar AREA is proportional to # of individuals in interval; width is length of interval.
4) Equal width bars best: then height is proportional to # of individuals.
5) Label x-axis; include units.
6) Label y-axis.
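
A minimal sketch of histogram rule 3) in Python, assuming made-up interval edges and counts (they are not from any data set in these slides). The function name density_heights is hypothetical. It picks each bar height so that width times height equals the fraction of individuals in the interval.

```python
def density_heights(edges, counts):
    """Bar heights chosen so each bar's AREA is the fraction of individuals in its interval."""
    total = sum(counts)
    heights = []
    for left, right, count in zip(edges[:-1], edges[1:], counts):
        width = right - left                   # length of the interval
        heights.append(count / total / width)  # proportion per x unit
    return heights


# Hypothetical classes of unequal width and made-up counts.
edges = [0, 10, 20, 40, 100]
counts = [30, 40, 20, 10]

heights = density_heights(edges, counts)
areas = [h * (right - left) for h, left, right in zip(heights, edges[:-1], edges[1:])]
print(heights)
print(sum(areas))  # total area is 1, up to rounding error
```

With equal width intervals the heights are just the counts rescaled, which is why equal widths are easiest to read.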

SLIDE 18

Example: Personal Income for BC (ages 15+). (For those with income.) Source: 2001 Census.

[Figure: histogram “Adult Personal Income (BC)”: density against income ($000s).]

SLIDE 19

Points:
1) Bar widths unequal: census tables given that way.
2) So take width times height to get area = fraction of population in that income group.
3) Last group on right open ended: artificially cut off at $100,000 by me.
4) Plot is “long-tailed to the right” or “skewed to the right”.
5) Based on 20% sample of 1,523,720 people aged 15+ in BC on census day, 2001.

SLIDE 20

Comparison of 1996, 2001.

[Figure: two histograms, “1996 Income” and “2001 Income”, each showing density against income.]

SLIDE 21

Summarizing the pictures.
Purposes: less space in text than a graph; precise numerical comparison between groups.
Summarizing a histogram:
Where is centre of the x-axis values? Jargon: location or centre.
How far do the x values extend on either side? Jargon: spread, variation, width.
Is the picture symmetric or does it extend farther to right than left?
Location and number of bumps.

SLIDE 22

Measures of location:
Mean, Arithmetic Mean, Average, Arithmetic Average: total of x-values divided by number of x values. Histogram balances at mean. (First Moment in physics.) Think of See-Saw: small kid far from centre balances big kid close to centre.
Formula: data \(X_1, \ldots, X_n\).
\(\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}\)
Utility of summation notation in this course: NIL. But \(\bar{X}\) is standard notation for average of X.
Median: number such that 1/2 of X values at least that large, and 1/2 of X values at least that small. Sort list: if n is odd median is middle of sorted list. If n is even take average of two middle values.

SLIDE 23

Numerical examples: ages in my family: 50, 50, 20, 15, 8, 8.
\(\bar{A} = \frac{50 + 50 + 20 + 15 + 8 + 8}{6} = \frac{151}{6} \approx 25.2\)
Median age: middle numbers are 15, 20. Halfway between is median = 17.5.
Mode: most common value. Not useful concept in most cases. Location of tallest bar in histogram (affected by definition of classes). Mode of ages is not unique: 50 or 8. Not useful summary of centre.
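
The same three summaries can be checked with Python's standard statistics module. This is only a sketch of the arithmetic on the family ages; the course itself uses JMP.

```python
from statistics import mean, median, multimode

ages = [50, 50, 20, 15, 8, 8]  # ages from the slide

average = mean(ages)     # total 151 divided by 6
middle = median(ages)    # n is even: average of the two middle values, 15 and 20
modes = multimode(ages)  # every value tied for most common

print(round(average, 1))  # 25.2
print(middle)             # 17.5
print(modes)              # [50, 8]
```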

SLIDE 24

Comparison:
Advantages of mean:
1) If your average weekly income is $100 you know how you will do in the long run; not so if median weekly income is $100.
2) Same point: average and sample size tells you total.
3) Has simpler mathematical behaviour than median.
Advantages of median:
Not influenced by extreme members of list. Median income, for instance, gives more information about typical person.

SLIDE 25

Measures of spread: Standard Deviation, Interquartile Range, Mean Absolute Deviation.
Deviations from the mean: subtract mean from each number in list: \(X_i - \bar{X}\). For my family deviations are 24.8, 24.8, −5.2, −10.2, −17.2, −17.2.
Summarize size of deviations: Average is 0. Not useful as measure of size since pluses cancel minuses.

SLIDE 26

Mean absolute deviation: take absolute values (ignore − signs) and average:
\(\frac{24.8 + 24.8 + 5.2 + 10.2 + 17.2 + 17.2}{6} = 16.6\) years
Standard deviation: square deviations, average, take square root:
\(s = \sqrt{\frac{(24.8)^2 + \cdots + (-17.2)^2}{5}} = 19.8\) years.
WARNING: notice the 5 not 6. This is Traditional. Not important in large data sets.
Jargon: variance is \(s^2\):
\(s^2 = \frac{(24.8)^2 + \cdots + (-17.2)^2}{5} = 390.6\) years\(^2\)
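
A short Python sketch of the three calculations on the family ages, written out by hand so that the divisor (6 for the mean absolute deviation, 5 for the variance) is visible. The variable names are mine.

```python
from math import sqrt

ages = [50, 50, 20, 15, 8, 8]
n = len(ages)
average = sum(ages) / n

# Deviations from the mean; they always add up to 0.
deviations = [age - average for age in ages]

# Mean absolute deviation: ignore signs, divide by n.
mean_abs_dev = sum(abs(d) for d in deviations) / n

# Variance and standard deviation: note the n - 1 divisor.
variance = sum(d * d for d in deviations) / (n - 1)
sd = sqrt(variance)

print(round(mean_abs_dev, 1))  # 16.6
print(round(variance, 1))      # 390.6
print(round(sd, 1))            # 19.8
```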

SLIDE 27

Interquartile Range: First define quartiles, quintiles, etc. First, second and third quartiles split list into 4 equal pieces. One quarter of list below first quartile, two quarters below second, three quarters below third. Second quartile is median. Interquartile range is third quartile minus first quartile. Book gives method to find quartiles. Quintiles split list into 5 equal parts. Percentiles split list into 100 equal parts.
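
A sketch of one common quartile convention: the first quartile is the median of the lower half of the sorted list and the third quartile is the median of the upper half, leaving out the middle value when n is odd. This is an assumption; the textbook's exact rule is not reproduced in these slides and may differ in detail. The function name quartiles is hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, leave the middle value out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


ages = [50, 50, 20, 15, 8, 8]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```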

SLIDE 28

Comparison:
Advantages of IQR: like median not influenced by extremes. Easily related to proportions of population.
But: rather than use 2 number summary (median, IQR) typically use 3 number summary (quartiles) or 5 number summary (min, max, quartiles). Boxplot is graph of 5 number summary.
Advantages of Mean Absolute Deviation: Seems intuitive. Less influenced by extremes than Standard Deviation. But: poor mathematical properties.
We mostly use Standard Deviation.

SLIDE 29

Why the Standard Deviation?
Usual explanation: squares nicer mathematically than absolute values.
Real explanation (WARNING: personal view): ONLY the SD works in normal approximations for sums.
Normal approximations? A common summary for curves.
Rule of thumb: in many lists of data about 2/3 of the observations are within 1 SD of the mean, about 95% within 2 SDs of the mean and almost all within 3 SDs of the mean.
NEXT TOPIC: the normal curve. (bell curve, Gaussian)

SLIDE 30

The Normal Curve
Where does the rule of thumb come from? Making Normal approximation.
Draw smooth curve on top of histogram. One curve for each mean and SD.
Formula: \(f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-(x - \mu)^2/(2\sigma^2)\}\)
Use of formula: NONE in this course. Instead use tables or computers to compute areas under curve (Integrals!)

SLIDE 31

[Figure: four normal density curves (density against x): N(0,1), N(0,100), N(10,1) and N(10,100).]

SLIDE 32

Notice: all curves look the same when the axes are drawn to corresponding scales! Now superimpose normal curve over data set.

[Figure: density histogram of father’s height (inches) with normal curve superimposed.]

SLIDE 33

Notice general shapes similar.
Use: proportion of fathers with height in given range is AREA under histogram in range. Approximate this area by area under normal curve.
Total area under histogram is 1 if units on vertical axis chosen as “density” (proportion per x unit).
Total area under normal curve is 1. (Fact from 2nd year calculus.)

SLIDE 34

Making a normal approximation: 1) Sketch curve. 2) Label x axis and mark desired range. 3) Convert range to standard units: subtract mean from x values and divide by SD. 4) Look up area under standard normal curve using these standardized limits. See Table A in text.

SLIDE 35

Example: Proportion of fathers under 5 feet 10 inches = 70 inches.
SKETCH: Desired range: area under curve left of 70 inches.
Convert 70 to standard units:
\(\frac{70 - \bar{x}}{s} = \frac{70 - 67.69}{2.74} = 0.84\)
Look up area to left of 0.84 under normal curve.

SLIDE 36

Get approximately 0.7995 ≈ 80% of fathers under 5 foot 10. This is 80% of 1078 or 862 fathers. Actual number is 856 fathers or 79.4%.
Extract of Table A:

z      ...  .04    .05    ...
-3.4   ...  .0003  .0003  ...
...    ...  ...    ...    ...
0.8    ...  .7995  .8023  ...
...    ...  ...    ...    ...

The numbers in the center are areas under the normal curve to the left of z. Value of z is number in column under z followed by last digit in top row. For example: area to left of -3.45 is under .05 and across from -3.4 (so is .0003).
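
The table lookup can be reproduced with Python's statistics.NormalDist, which plays the role of Table A here. A sketch using the fathers' mean, SD and count quoted on the slides; the variable names are mine.

```python
from statistics import NormalDist

mean_height = 67.69  # inches, from the slide
sd_height = 2.74     # inches, from the slide
n_fathers = 1078

# Step 3: convert 70 inches to standard units, rounded to 2 places as in the table.
z = round((70 - mean_height) / sd_height, 2)

# Step 4: area under the standard normal curve to the left of z.
area = NormalDist().cdf(z)

print(z)                        # 0.84
print(round(area, 4))           # 0.7995
print(round(area * n_fathers))  # 862
```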

SLIDE 37

Some areas under the normal curve from the tables:
Left of 0: 50%
Right of 0: 50%
Between -1 and 1: 68.3% ≈ 2/3
Between -2 and 2: 95.4% ≈ 95%
Between -3 and 3: 99.7%
Between -4 and 4: 99.994%
Between -6 and 6: \(1 - 1.97 \times 10^{-9}\)
Notice source of rule of thumb.

SLIDE 38

Finding areas:
Tables show areas to left of standard value. Get other areas by subtracting:
Area to left of 2 is 97.72%
Area to left of 0 is 50.00%
So: area from 0 to 2 is difference: 47.72%
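
The subtraction rule is easy to sketch in Python, again letting statistics.NormalDist stand in for the table. The helper name area_between is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def area_between(low, high):
    """Area under the standard normal curve between two standard values."""
    return standard_normal.cdf(high) - standard_normal.cdf(low)


# Source of the rule of thumb: areas within 1, 2 and 3 SDs of the mean.
for k in (1, 2, 3):
    print(k, round(100 * area_between(-k, k), 1))

# Area from 0 to 2: area to left of 2 minus area to left of 0.
zero_to_two = area_between(0, 2)
print(round(100 * zero_to_two, 1))
```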

SLIDE 39

One more example: fathers between 5 foot 2 and 5 foot 10.
Convert 62 inches and 70 inches to standard units:
\(\frac{62 - \bar{x}}{s} = -2.07\)
\(\frac{70 - \bar{x}}{s} = 0.84\)
Area to left of 0.84 is 0.7995. Area to left of -2.07 is 0.0192.
Subtract to get 0.7803 ≈ 78%.
Exact answer is 77.6%.

SLIDE 40

Normal approximation applied to income: poor.
Proportion of adults earning under $30,000?
Mean income is $29,250 approximately. SD of income is $23,600 approximately.
Convert $30,000 to standard units:
\(\frac{30000 - 29250}{23600} = 0.03\)
Area to left of 0.03 is 51%.
Correct percentage is 59%.
Income distribution is “skewed to the right”. It has a ‘long right hand tail’.

SLIDE 41

Incomes with normal curve on top:

[Figure: density histogram of income (in $000s) with normal curve superimposed.]

Notice that normal curve extends below 0. Normal approximation predicts many negative incomes!

SLIDE 42

Reversing the process.
What is the IQR of fathers’ heights?
First quartile of standard normal curve: -0.67. Third quartile is 0.67.
Convert back to original units: multiply standard units by s and add \(\bar{x}\).
So: -0.67 standard units is −0.67 × 2.74 + 67.69 = 65.85
and 0.67 standard units is 0.67 × 2.74 + 67.69 = 69.53
So IQR is approximately 69.53 − 65.85 = 3.68. Actual value 3.81.
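
Going from an area back to a value is the inverse lookup. A sketch with statistics.NormalDist.inv_cdf, using the fathers' mean and SD from the slides. Because it does not round the quartile to 0.67, the result differs slightly from the 3.68 above.

```python
from statistics import NormalDist

# Normal curve with the fathers' mean and SD (inches).
fathers = NormalDist(mu=67.69, sigma=2.74)

q1 = fathers.inv_cdf(0.25)  # one quarter of the area lies to the left
q3 = fathers.inv_cdf(0.75)  # three quarters of the area lie to the left
iqr = q3 - q1

# Third quartile of the standard normal curve, about 0.67.
z_q3 = NormalDist().inv_cdf(0.75)

print(round(z_q3, 2), round(iqr, 1))
```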

SLIDE 43

Summary:
Standard units: \(z = \frac{x - \bar{x}}{s}\)
Look up areas by converting limits to standard units and using standard normal curve.
From standard units z back to original x values: \(x = s z + \bar{x}\)

SLIDE 44

Most statistics concerns relationships between variables. How do they vary together?
Consider two variables at a time: bivariate problem.
Two nominal or two ordinal variables: usually presented in cross tabulation or contingency table:

              Male  Female
Left Handed    448     393
Right Handed  3581    3929
Both            40      44

Usually converted to percentages within groups to compare the groups:

              Male  Female
Left Handed     11       9
Right Handed    88      90
Both             1       1

Warning: data based on survey in UK 1992.

SLIDE 45

One categorical and one quantitative variable: superimposed (or back to back) histograms.

[Figure: superimposed histograms “STAT 201 Class Heights”: density (proportion per inch) against height (inches).]

SLIDE 46

Two quantitative variates: scatterplot

[Figure: scatterplot matrix of Height, Weight, Sex, Coins and Credits.]

SLIDE 47

Comments on the plots:
1) Notice outlier: someone 6 inches tall.
2) Sex treated as 0 1 variable but plots of doubtful utility.
Cleaned up data:

[Figure: scatterplot matrix of Height, Weight, Sex, Coins and Credits after cleaning up the data.]

SLIDE 48

Comments on cleaned up plots:
1) Height and weight increase together.
2) No relation between either coins or credits and height or weight.
3) Relation between sex and height or weight visible but not easy to see.
Note for clarity. Idea: for each case in data set have two variables x and y. For each case put one dot on scatterplot at (x, y). To add my height, weight to plot put dot at (71, 195).

SLIDE 49

Another example: 2000 US presidential election. Plot: one dot for each county at (x, y) where x is Bush vote, y is Buchanan vote.

[Figure: scatterplot of BUCHANAN votes against BUSH votes, one dot per county.]

SLIDE 50

Notice outlier dots: Dade County (large but in line with others) and Palm Beach (very surprising number of votes for Buchanan).
Conclusion: something unusual happened in Palm Beach: the butterfly ballot.
But: look at all candidates:

[Figure: scatterplot matrix of county vote counts for GORE, BUSH, BUCHANAN, NADER, BROWN, HAGELIN, HARRIS, MCREYNOLDS, MOOREHEAD, PHILLIPS and Total.]

SLIDE 51

Note the many outliers. Unless explained, they weaken the case for the claim that the butterfly ballot was at fault.
Next example: my old car (1980 MAZDA GLC). Plot distance driven x against gas used y.

[Figure: scatterplot of gas used (L) against distance driven (km).]

Notice outliers, not as linear as expected.

SLIDE 52

Summarizing bivariate plots.
Plot variable y against variable x.
Summarize y values by mean and standard deviation: \(\bar{y}\) and \(s_y\).
Summarize x values by mean and standard deviation: \(\bar{x}\) and \(s_x\).
Summarize relationship using correlation coefficient. Notation: r.
Formula:
\(r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)\)

SLIDE 53

Comments:
1) If we had used n to define \(s_x\) and \(s_y\) we could have used n in r.
2) So r is average of products of x and y values after converting to standard units.
3) r is unitless. Value is same no matter what units x or y measured in.
4) For every data set −1 ≤ r ≤ 1.
5) For r = 1 the data set must lie exactly on a straight line sloping up.
6) For r = −1 the data set must lie exactly on a straight line sloping down.
7) This summary is appropriate for data which are approximately bivariate normal. Meaning: scatterplot is roughly oval in shape and histograms of x and y each follow normal curve well.
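
Comments 2) to 6) can be checked numerically. The sketch below computes r as the sum of products in standard units divided by n − 1. The heights and weights are made-up illustration values, not the class data, and the function names are mine.

```python
from math import sqrt


def standard_units(values):
    """Subtract the mean and divide by the SD (n - 1 divisor)."""
    n = len(values)
    average = sum(values) / n
    sd = sqrt(sum((v - average) ** 2 for v in values) / (n - 1))
    return [(v - average) / sd for v in values]


def correlation(xs, ys):
    """Sum of products of x and y in standard units, divided by n - 1."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical heights (inches) and weights (pounds).
heights = [62, 65, 68, 70, 74]
weights = [115, 130, 150, 148, 185]

r = correlation(heights, weights)
r_metric = correlation([h * 2.54 for h in heights], weights)  # heights in cm
print(round(r, 3), round(r_metric, 3))  # same value: r is unitless
```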

SLIDE 54

Examples: 1) Identical twins, head circumferences

[Figure: scatterplot of Twin 1 and Twin 2 head circumference (cm); r = 0.88.]

SLIDE 55

2) Heights and Weights in STAT 201

[Figure: scatterplot of height (inches) and weight (lbs); r = 0.73.]

SLIDE 56

3) Father’s Height and Son’s height.

[Figure: scatterplot of father’s height (inches) and son’s height (inches); r = 0.5.]

SLIDE 57

4) Height and Credit hours: STAT 201.

[Figure: scatterplot of height (inches) and credit hours; r = 0.09.]

SLIDE 58

5) Distance Driven and fuel efficiency (L/100km)

[Figure: scatterplot of distance driven and litres per 100 km; r = −0.48.]

SLIDE 59

Some simulated data: negative r.

[Figure: four simulated scatterplots with r = −0.88, −0.45, −0.16 and 0.02.]

SLIDE 60

Some simulated data: positive r.

[Figure: four simulated scatterplots with r = 0.11, 0.48, 0.77 and 0.99.]

SLIDE 61

Fact: r does not depend on which variable you put on x axis and which on the y axis.
Often: interest centers on whether or not changes in x cause changes in y or on predicting y from x.
In this case: call y the dependent or outcome or response or endogenous variable. Response in this course.
Call x explanatory, or exogenous or independent or predictor.
Example: predict son’s height from father’s height. Suppose father is 70 inches tall. How to guess height of son?
Simple method: use average height of those sons whose fathers were 70 inches tall.

SLIDE 62

Pick out cases where father is between 69.5 and 70.5 inches tall. There were 115 such fathers.
Average son’s height in this group: 69.8 in
SD of son’s height in this group: 2.5 in

[Figure: histogram “Sons of 70 inch fathers”: frequency against son’s height (in).]

SLIDE 63

Now do same for fathers 59 inches tall, then 60 inches tall and so on. Get a \(\bar{y}\) for each x from 59 to 75.

[Figure: scatterplot of son’s height against father’s height, with an X marking the average son’s height at each father’s height.]

SLIDE 64

Notice the line: it is called a regression or least-squares line.
Formula for the least squares line?
\(y = a + bx\)
where \(a = \bar{y} - b\bar{x}\) and \(b = r \frac{s_y}{s_x}\).
I prefer to write:
\(\frac{y - \bar{y}}{s_y} = r \, \frac{x - \bar{x}}{s_x}\)
In words: predict y in standard units to be x in standard units times correlation coefficient.

SLIDE 65

Jargon: a is the intercept. b is the slope, also called regression coefficient.
“Least squares” because we find formulas for a and b by using calculus to minimize the Error Sum of Squares:
\(\sum_{i=1}^{n} (y_i - (a + b x_i))^2\)
Sum of vertical squared deviations between \((x_i, y_i)\) and the straight line with slope b and intercept a.
For the height data:
Fathers: \(\bar{x} = 67.7\), \(s_x = 2.74\) (inches)
Sons: \(\bar{y} = 68.7\), \(s_y = 2.81\) (inches)
Correlation: r = 0.50.
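
Plugging the height summaries into the slope and intercept formulas gives a prediction for the son of a 70 inch father. A sketch in Python using only the five summary numbers above; the function name predict_son is mine.

```python
# Summary statistics from the slide (inches).
x_bar, s_x = 67.7, 2.74  # fathers
y_bar, s_y = 68.7, 2.81  # sons
r = 0.50

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept


def predict_son(father_height):
    """Least squares prediction of son's height from father's height."""
    return a + b * father_height


prediction = predict_son(70)

# Same prediction via standard units: z for the son is r times z for the father.
z_father = (70 - x_bar) / s_x
via_standard_units = y_bar + r * z_father * s_y

print(round(b, 3), round(a, 2), round(prediction, 1))  # 0.513 33.99 69.9
```

The prediction of about 69.9 inches is close to the 69.8 inch average found directly for sons of 70 inch fathers.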

SLIDE 66

Average weight versus height for our class:

[Figure: scatterplot of weight against height for the class, with an X marking the average weight at each height.]

“Fit not as good”. Scatterplot not too oval; mixing sexes in same plot.

SLIDE 67

Numerical values:
Height: \(\bar{H} = 66.8\), \(s_H = 3.75\) (inches).
Weight: \(\bar{W} = 140\), \(s_W = 25.3\) (pounds).
Correlation: r = 0.73.
Regression line:
Slope: b = 0.73 × 25.3/3.75 = 4.93 (pounds per inch)
Intercept: a = 140 − 4.93 × 66.8 = −189 (pounds)
Meaning of intercept: NONE whatever. Not to be understood as weight of person 0 inches tall.
DO NOT USE regression line outside of range of x values!
DO NOT EXTRAPOLATE.
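
One way to make the warning concrete is a prediction function that refuses heights outside the observed range. In this sketch the limits of 60 and 75 inches are an assumption read off the axis of the class scatterplot, not exact data limits, and the names are hypothetical.

```python
# Summary statistics from the slide.
H_BAR, S_H = 66.8, 3.75   # height (inches)
W_BAR, S_W = 140.0, 25.3  # weight (pounds)
R = 0.73

# ASSUMPTION: observed heights run from roughly 60 to 75 inches (plot axis).
X_MIN, X_MAX = 60.0, 75.0

slope = R * S_W / S_H              # pounds per inch
intercept = W_BAR - slope * H_BAR  # pounds; no physical meaning


def predict_weight(height):
    """Predict weight from height, refusing to extrapolate."""
    if not X_MIN <= height <= X_MAX:
        raise ValueError(f"height {height} is outside the range {X_MIN} to {X_MAX}")
    return intercept + slope * height


print(round(slope, 2), round(intercept))  # 4.93 -189
print(round(predict_weight(70)))
```

Calling predict_weight(0) raises an error instead of returning −189 pounds.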
