SLIDE 1

Advanced Statistics

Janette Walde

janette.walde@uibk.ac.at Department of Statistics University of Innsbruck

Janette Walde Advanced Statistics

SLIDE 2

Introduction

“We are pattern-seeking story-telling animals.” (Edward Leamer)

Statistics does not hand truth to the user on a silver platter. However, statistics confines arbitrariness and provides comprehensible conclusions.

“Es gibt keine Tatsachen, es gibt nur Interpretationen.” (“There are no facts, only interpretations.”) (Friedrich Nietzsche)

SLIDE 3

Aims of the course

1. You will learn to apply statistical tools correctly, interpret the findings appropriately, and get an idea of the possibilities of analyzing research questions with statistics.

2. It is neither possible nor worthwhile to learn all statistical methods in such a course. However, this course is successful if it enables you to improve your knowledge of statistical methods on your own. Therefore this course gives you a profound knowledge base about analysis tools and shows you their correct application, using regression analysis as an example.

SLIDE 4

Aims of the course

3. Even when one knows the most sophisticated analysis instruments, one may be confronted with limits in obtaining results, finding appropriate interpretations, or applying the tools in the given framework. This has to be accepted (“If we torture the data long enough, they will confess.”).

4. Be aware: never confuse statistical significance with biological significance.

SLIDE 5

Scales of measurement

1. Nominal scale. Nominal data are attributes like sex or species and represent measurement at its weakest level. We can determine only whether one object is different from another; the only formal property of nominal scale data is equivalence.

2. Ranking scale. Some biological variables cannot be measured on a numerical scale, but individuals can be ranked in relation to one another. Two formal properties occur in ranking data: equivalence and greater than.

SLIDE 6

Scales of measurement

3. Interval and ratio scales. These have all the characteristics of the ranking scale, but in addition we know the distances between the classes. If there is a true zero point, we have a ratio scale of measurement.

SLIDE 7

Statistical tests and scientific hypotheses

A statistical test is a confrontation of the real world (observations) to a theory (model) with the aim of falsifying the model.

SLIDE 8

Statistical tests and scientific hypotheses

As such the statistical test (as a scientific method) fits directly into the philosophy of science described by the Austrian-British philosopher Karl Popper (1902–1994) (see e.g. The Logic of Scientific Discovery, 1972). Basically this philosophy says that (1) theories cannot be empirically verified, only falsified, and (2) scientific progress happens by holding a theory until it is falsified. That is, if we observe a phenomenon (data) which under the model (theory) is very unlikely, then we reject the model (theory).

SLIDE 9

Statistical tests and scientific hypotheses

“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.” (Albert Einstein)

In other words, experiments can mainly be used for falsifying a scientific hypothesis, never for confirming it! When we have a scientific theory, we conduct an experiment in order to falsify it. Therefore, the strong conclusion arising from an experiment is the rejection of a hypothesis. Accepting (more precisely, not rejecting) a hypothesis is not a very strong conclusion; perhaps acceptance is simply due to the experiment being too small.

SLIDE 10

Example

Suppose we have a coin, and our hypothesis is that the coin is fair, i.e. that P(head) = P(tail) = 1/2. Suppose we toss the coin n = 25 times and observe 21 heads. The probability of actually observing these data under the model is P(21 heads, 4 tails) ≈ 0.0004. It is a very unlikely (but possible) event to see such data if the model is true. In this falsification process we employ the interpretation principle of statistics: unlikely events do not occur.
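The probability quoted above can be checked directly from the binomial distribution. A minimal sketch, using only the Python standard library (the slides use R; this is an illustrative translation):

```python
from math import comb

# P(exactly 21 heads in 25 tosses of a fair coin)
p_exact = comb(25, 21) * 0.5**25
print(round(p_exact, 4))  # 0.0004, matching the slide

# The usual p-value counts outcomes at least as extreme:
# P(21 or more heads) is slightly larger than the point probability
p_tail = sum(comb(25, k) for k in range(21, 26)) * 0.5**25
```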

SLIDE 11

Statistical tests and scientific hypotheses

If we do not employ this principle, we can never say anything at all on the basis of statistics (observations): an opponent can always claim that the present observations are just an “unfortunate outcome” which, no matter how unlikely, is still possible. In practice the statistical interpretation principle needs more structure: in a large sample space, all possible outcomes have a very small probability, so it will always be unlikely to see exactly the data one has. In addition, there is the question of how small a probability must be for data to be classified as unlikely.

SLIDE 12

Two Types of Errors

Recall that the following four outcomes are possible when conducting a test:

Reality \ Decision      decide H0                  decide Ha
H0 true                 correct (Prob = 1 − α)     Type I error (Prob = α)
Ha true                 Type II error (Prob = β)   correct (Prob = 1 − β)

The significance level α of any fixed-level test is the probability of a Type I error.

SLIDE 13

Acceptable levels of errors

Type I error (α): typically α = 0.05 (this convention is due to R.A. Fisher); more stringent tests use α = 0.01 or α = 0.001; exploratory or preliminary experiments may use α = 0.10.

Type II error (β): typically 0.20; often unspecified and, in practice, frequently much greater than 0.20.

SLIDE 14

The power of a statistical test

The power of a significance test measures its ability to detect an alternative hypothesis. The power against a specific alternative is calculated as the probability that the test will reject H0 when that specific alternative is true. Statistical power is the probability of rejecting H0 given population effect size (ES), α and sample size (n). This calculation also requires knowledge of the sampling distribution of the test statistic under the alternative hypothesis.

SLIDE 15

The power of a statistical test

Statistical power = 1 − β. In practice, we first choose an α and consider only tests with probability of Type I error no greater than α. Among all such level-α tests, we select one that makes the probability of a Type II error as small as possible (i.e. the most powerful test).

SLIDE 16

Example: Computing statistical power

Does exercise make strong bones? Can a 6-month exercise program increase the total body bone mineral content (TBBMC) of young women? A team of researchers is planning a study to examine this question. Based on the results of a previous study, they are willing to assume that σ = 2 for the percent change in TBBMC over the 6-month period. A change in TBBMC of 1% would be considered important, and the researcher would like to have a reasonable chance of detecting a change this large or larger. Are 25 subjects a large enough sample for this project?

SLIDE 17

Example (cont.)

1. State the hypotheses. Let µ denote the mean percent change: H0: µ = 0 versus Ha: µ > 0.

2. Calculate the rejection region. The z test rejects H0 at the α = 0.05 level whenever

   z = (x̄ − µ0)/(σ/√n) = x̄/(2/√25) ≥ 1.645,

   i.e. we reject H0 when x̄ ≥ 0.658.

SLIDE 18

Example (cont.)

3. Compute the power at a specific alternative. The power of the test at the alternative µ = 1 is P(x̄ ≥ 0.658 | µ = 1) ≈ 0.8. (Plotting this probability across a range of alternatives gives the power curve.)
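The power computation above can be reproduced with the normal distribution of the sample mean. A minimal Python sketch (standard library only; the course's own examples are in R):

```python
from statistics import NormalDist

sigma, n = 2.0, 25
se = sigma / n**0.5          # standard error of the mean: 0.4
cutoff = 1.645 * se          # reject H0 when xbar >= 0.658

# Power at the alternative mu = 1: P(xbar >= cutoff | mu = 1)
power = 1 - NormalDist(mu=1.0, sigma=se).cdf(cutoff)
print(round(power, 2))  # 0.8, as on the slide
```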

SLIDE 19

Ways to Increase the Power

• Increase α. A 5% significance test has a greater chance of rejecting H0 than a 1% test, because less strength of evidence is required for rejection.

• Consider a particular alternative that is farther away from µ0. Values of µ in Ha that lie close to the hypothesized value µ0 are harder to detect than values of µ far from µ0.

• Increase the sample size. More data provide more information about x̄, so we have a better chance of distinguishing values of µ.

SLIDE 20

Ways to Increase the Power

Decrease σ. This has the same effect as increasing the sample size: it provides more information about µ. Improving the measurement process and restricting attention to a subpopulation are two common ways to decrease σ.

SLIDE 21

How many samples are needed to achieve a power of 0.8 in a t-test?

Effect size index for the t-test for a difference between two independent means:

d = (µ1 − µ2)/σ

where d is the effect size index, µ1 and µ2 are the two population means, and σ is the common within-population standard deviation. Effect size indices are available for many statistical tests.

SLIDE 22

How many samples are needed to achieve a power of 0.8 in a t-test?

Effect size               α = 0.10   α = 0.05   α = 0.01
Large effect (d = 0.8)        20         26         38
Medium effect (d = 0.5)       50         64         95
Small effect (d = 0.2)       310        393        586

Source: Cohen (1992), p. 158.
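Cohen's tabulated values can be approximated closely with the normal approximation n ≈ 2(z₁₋α/₂ + z_power)²/d² per group; the exact t-based values (as in the table) are slightly larger for big effects. A hedged Python sketch:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample test,
    via the normal approximation n = 2 * (z_{1-a/2} + z_{power})^2 / d^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

print(n_per_group(0.2))  # 393, matching the table for a small effect
print(n_per_group(0.5))  # 63 (table: 64; the exact t-based value is slightly larger)
```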

SLIDE 23

Is lack of statistical power a widespread problem?

“We estimated the statistical power of the first and last statistical test presented in 697 papers from 10 behavioral journals ... On average statistical power was 13-16% to detect a small effect and 40-47% to detect a medium effect. This is far lower than the general recommendation of a power of 80%. By this criterion, only 2-3%, 13-21%, and 37-50% of the tests examined had a requisite power to detect a small, medium, or large effect, respectively.”

Jennions, M.D., and A.P. Moeller 2003. Behavioral Ecology 14, 438-455.

SLIDE 24

An alternative viewpoint

“... the central focus of good data analysis should be to find which parameter values are supported by the data and which are not.” Hoenig, J.M., and D.M. Heisey 2001. American Statistician 55: 19-24.

Recommendation: use estimates of statistical power as a guide to planning experiments.

SLIDE 25

Further readings

Cohen, J. 1992. A power primer. Psychological Bulletin 112: 155-159.

Jennions, M.D., and A.P. Moeller 2003. A survey of the statistical power of research in behavioral ecology and animal behavior. Behavioral Ecology 14: 438-455.

Hoenig, J.M., and D.M. Heisey 2001. The abuse of power: the pervasive fallacy of power calculations for data analysis. American Statistician 55: 19-24.

SLIDE 26

Recapitulation of statistical concepts

• Histogram
• Scatter plot
• Box plot
• Summary statistics
• Standard deviation and standard error
• Q-Q plot

SLIDE 27

Q-Q Plot

Many statistical methods make some assumptions about the distribution of the data (e.g. normal). The quantile-quantile plot provides a way to visually investigate such an assumption. The QQ-plot shows the theoretical quantiles versus the empirical quantiles. If the distribution assumed (theoretical one) is indeed the correct one, we should observe a straight line.
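The quantile pairs behind a Q-Q plot are easy to compute by hand: sort the data and match each order statistic to a theoretical quantile at a plotting position. A sketch in Python (standard library; plotting positions (i − 0.5)/n are one common convention among several):

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
data = sorted(random.gauss(0, 1) for _ in range(100))
n = len(data)

# Theoretical standard-normal quantiles at plotting positions (i - 0.5)/n
theo = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# For normally distributed data, the points (theo[i], data[i])
# fall close to a straight line
pairs = list(zip(theo, data))
```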

SLIDE 28

Q-Q Plot

[Figure: two normal Q-Q plots (theoretical quantiles vs. sample quantiles) and a density plot.]

SLIDE 29

Why multivariate analysis?

               Male   Female
Accept           35       20
Refuse entry     45       40
Total            80       60

Example: 44% of male applicants are admitted by the university, but only 33% of female applicants. Does this mean there is unfair discrimination? The university investigates and breaks down the figures for the Engineering and English programmes.

SLIDE 30

Simpson’s Paradox

Engineering    Male   Female
Accept           30       10
Refuse entry     30       10
Total            60       20

English        Male   Female
Accept            5       10
Refuse entry     15       30
Total            20       40

There is no relationship between sex and acceptance in either programme (Engineering accepts 50% of each sex, English 25%), so there is no evidence of discrimination. Why the aggregate difference? More females apply to the English programme, which is hard to get into; more males apply to Engineering, which has a higher acceptance rate than English. One must look deeper than a single cross-tab to find this out!
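The paradox in the tables above can be verified with a few lines of arithmetic. A minimal Python sketch using the slide's counts:

```python
def rate(accepted, total):
    return accepted / total

# Per-programme acceptance rates (from the slide tables)
eng_m, eng_f = rate(30, 60), rate(10, 20)    # 0.50 vs 0.50
engl_m, engl_f = rate(5, 20), rate(10, 40)   # 0.25 vs 0.25

# Aggregate rates over both programmes
agg_m = rate(30 + 5, 60 + 20)   # 35/80 ~ 0.44
agg_f = rate(10 + 10, 20 + 40)  # 20/60 ~ 0.33

# Equal within each programme, yet unequal in aggregate: Simpson's paradox
print(eng_m == eng_f, engl_m == engl_f, agg_m > agg_f)  # True True True
```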

SLIDE 31

Origin of the word ”Regression”

Sir Francis Galton (1822–1911), a famous geneticist, studied the sizes of seeds and their offspring and the heights of fathers and their sons. Tall fathers tend to have sons slightly shorter than themselves, while sons of short fathers are on average taller than their fathers. He called this phenomenon “regression towards mediocrity”.

SLIDE 32

A scatterplot of the heights of 1078 sons versus the heights of their fathers.

[Figure: scatterplot of son's height (in) against father's height (in), both axes from 58 to 80.]

SLIDE 33

The regression effect must be taken into account in test-retest situations. Suppose, for example, that a group of preschool children are given an IQ test at age four and another test at age five. The results of the two tests will certainly be correlated, and, as noted above, children who do poorly on the first test will tend to score higher on the second test. If, on the basis of the first test, low-scoring children were selected for supplementary educational assistance, their gains might be mistakenly attributed to the program. A comparable control group is needed in this situation to tighten up the experimental design.

SLIDE 34

Linear regression problem

Using a case study published in “Analysing Ecological Data (Statistics for Biology and Health)” by A.F. Zuur, E.N. Ieno, G.M. Smith, Springer 2007, New York. The Dutch governmental institute RIKZ started a research project whose aim was to find relationships between the macrofauna of the intertidal area and abiotic variables (e.g., sediment composition, slope of the beach). Sampling was carried out in June 2002. Abundances of around 75 invertebrate species from 45 sites were measured on various beaches along the Dutch coast.

SLIDE 35

Linear regression problem, II

One of the collected variables was “NAP”, which measures the height of a sampling site relative to mean sea level and thus indicates how long a site is under water. The species data were converted into a diversity index, the Shannon-Weaver index, which for these data can also be seen as an indicator of the number of different species.

SLIDE 36

Linear regression problem, III

Hypothesis: The tidal environment creates a harsh environment for the animals living there, and it is reasonable to assume that different species and species abundances will be found in beaches with different NAP values. A simple starting point is therefore to compare species diversity (species richness) with the NAP values from different areas of the beach.

SLIDE 37

Scatterplot of the data

[Figure: scatterplot of species richness against NAP.]

SLIDE 38

Regression line for the data

Regression line: yi = β0 + β1xi + εi

[Figure: the same scatterplot of richness against NAP with the fitted regression line.]

SLIDE 39

Decomposition of y

Linear functional form:

yi = β0 + β1xi + εi

This decomposes y into a systematic component β0 + β1xi, explained by the variable x, and an unexplained component ε. The derivative β1 = dy/dx gives the marginal impact of a change in x.

SLIDE 43

Decomposition of y

[Figure: five observations (xi, yi) plotted against x; the fitted line ŷi = 1.5 + 0.4xi with fitted values ŷ1, …, ŷ5; and the residuals e1, …, e5.]

yi = ŷi + ei

SLIDE 48

Computation of β0 and β1

[Figure: the observations yi, the model yi = β0 + β1xi + εi, the fitted line ŷi = β̂0 + β̂1xi with fitted values ŷ1, …, ŷ5, and the residuals ei = yi − ŷi with their squares e1², …, e5².]

The least-squares estimates β̂0 and β̂1 minimize the residual sum of squares:

min Σi ei² = Σi (yi − ŷi)²
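The least-squares minimization has a closed-form solution for the simple regression case. A Python sketch with hypothetical data lying exactly on the figure's line ŷ = 1.5 + 0.4x (the slides use R; this is an illustrative translation):

```python
# Hypothetical data lying exactly on the line y = 1.5 + 0.4x
x = [1, 2, 3, 4, 5]
y = [1.9, 2.3, 2.7, 3.1, 3.5]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Closed-form least-squares solution of min sum_i e_i^2
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
print(b0, b1)  # recovers intercept 1.5 and slope 0.4
```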

SLIDE 52

These estimators, β̂0 and β̂1, based on the sample data, then act as estimators of their population counterparts β0 and β1. The four assumptions that allow the sample data to be used to estimate the population parameters are, for the model yi = β0 + β1xi + εi with εi ∼ N(0, σ²):

1. normality,
2. homogeneity (constant variance),
3. independence, and
4. fixed X.

SLIDE 53

lm(Richness ∼ NAP, data = RIKZ)

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
NAP          1  357.53   357.53    20.66  0.0000
Residuals   43  744.12    17.31

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    6.6857      0.6578    10.16    0.0000
NAP           -2.8669      0.6307    -4.55    0.0000

Residual standard error: 4.16 on 43 degrees of freedom
Multiple R-squared: 0.3245, Adjusted R-squared: 0.3088

SLIDE 54

Coefficient of determination

R² = SSregression / SStotal = 1 − SSresidual / SStotal

Note: do not compare models with different data transformations using R²! A model with more explanatory variables will always have a higher R².
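The R² values in the RIKZ output above can be reproduced from the sums of squares. A Python sketch (the adjusted R² formula appears later in the deck, on the model-selection slide):

```python
# Sums of squares from the RIKZ lm() output (slide 53)
ss_regression, ss_residual = 357.53, 744.12
ss_total = ss_regression + ss_residual
n, p = 45, 1  # 45 sites, one explanatory variable (NAP)

r2 = 1 - ss_residual / ss_total
adj_r2 = 1 - (ss_residual / (n - (p + 1))) / (ss_total / (n - 1))
print(round(r2, 4), round(adj_r2, 4))  # 0.3245 0.3088, matching the output
```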

SLIDE 55

Anscombe data

[Figure: Anscombe's four regression data sets (x1-x4 vs. y1-y4), each with the same fitted line.]

Same intercept, slope and confidence bands! The F statistics and t values indicate that the regression parameter is significantly different from zero, and all four data sets have R² = 0.67.

Reference: Anscombe, F.J. (1973). Graphs in statistical analysis. American Statistician, 27, 17-21.

SLIDE 56

Assessing the important assumptions

• The normality assumption can be checked using a histogram of the residuals or a Q-Q plot.

• Homogeneity can be assessed by plotting the residuals against X to check for any increase (or decrease) in the spread of the residuals along the x-axis. This procedure can also assess model misspecification and model fit. Plotting fitted values against residuals can show increasing spread for larger fitted values: a strong indicator of heterogeneity.

SLIDE 57

Assessing the important assumptions

As to independence: for time-series data, the residuals can be checked with the autocorrelation function; for spatial correlation in the data, the residuals can be checked with Moran's I.

SLIDE 58

Non-random structure of the residuals

1. Apply a transformation.
2. Add other explanatory variables.
3. Add interactions.
4. Add non-linear terms of the explanatory variables (e.g., quadratic terms).
5. Use smoothing techniques like additive modelling.
6. Allow for different spread using generalized least squares (GLS).
7. Apply mixed modelling.

SLIDE 59

Allow for heterogeneity using variance structures

εi ∼ N(0, Xi · σ²)
εi ∼ N(0, |Xi|^(2δ) · σ²)
εi ∼ N(0, σ² · exp(2δXi))
εi ∼ N(0, σj²)

where δ is an unknown parameter. The first three options allow for an increase (or decrease) in the residual variance depending on the values of the variance covariate Xi. The fourth option allows for a different spread per level of a nominal variable.

SLIDE 60

Back to our example: RIKZ

[Figure: residuals vs. fitted values and normal Q-Q plot for lm(Richness ~ NAP); observations 9, 10, and 22 stand out.]

SLIDE 61

Influential points

Leverage is a tool that identifies observations with rather extreme values of the explanatory variables, which may potentially bias the regression:

leverage_i = hi = 1/n + (xi − x̄)² / Σj (xj − x̄)²

Cook's distance statistic identifies single observations that are influential on all regression parameters:

Di = Σj (Ŷj − Ŷj(i))² / (p · MSE)

where p is the number of fitted parameters in the model and MSE is the mean square error of the regression model. Alternatively, apply a jackknife procedure and plot the changes in the parameters.

SLIDE 62

Leverage and Cook’s distance statistic

[Figure: hat values and Cook's distance by observation number for lm(Richness ~ NAP); observations 9, 10, and 22 have the largest Cook's distances.]

SLIDE 63

Jackknife procedure

Changes in intercept and slope

[Figure: changes in the intercept and the NAP slope as each observation is omitted in turn.]

SLIDE 64

Standardized residuals

The standardized residuals are defined as

(yi − ŷi) / √(MSresidual · (1 − hi))

Standardized residuals are assumed to be normally distributed with expectation 0 and standard deviation 1: N(0, 1). Taking the square root of their absolute values reduces skewness and makes non-constant variance more noticeable.

SLIDE 65

Studentized residuals

To obtain the ith Studentised residual, the regression model is fitted on all data except observation i, and MSresidual is based on the n − 1 remaining points (whereas the residual ei and the hat value hi are based on the full data set). If the ith Studentised residual is much larger than the standardised residual, this is an influential observation, because the variance without this point is smaller.

SLIDE 66

Plot of standardized and studentized residuals

[Figure: standardized and Studentised residuals plotted against fitted values.]

SLIDE 67

Example: body weight [kg] and brain weight [g] of 62 mammal species (and 3 dinosaurs)

[Figure: scatterplot of brain weight (g) against body weight (kg) for the 65 species.]

SLIDE 68

Example (cont.)

[Figure: the same data on log-log axes, with species labelled from the Lesser Short-tailed Shrew up to the Blue Whale; on this scale the relationship is approximately linear.]

SLIDE 69

Example (cont.)

Call: lm(formula = brain_weight ∼ body_weight, data = data, subset = extinct == "no")

Residuals:
    Min      1Q  Median      3Q     Max
 -633.8  -202.0  -193.6  -125.2  4679.0

Coefficients:
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  204.42722   111.15156    1.839     0.071
body_weight    0.12452     0.01462    8.518  8.36e-12

Residual standard error: 846.5 on 58 degrees of freedom
Multiple R-squared: 0.5557, Adjusted R-squared: 0.5481
F-statistic: 72.55 on 1 and 58 DF, p-value: 8.36e-12

SLIDE 70

Example (cont.)

plot(model$fitted.values, model$residuals, log = "x")

[Figure: residuals against fitted values (log scale on the x-axis); the spread of the residuals increases with the fitted values.]

SLIDE 71

Variance-stabilizing transformation

We see that the residuals' variance depends on the fitted values (or on body weight): “heteroscedasticity”. The model, however, assumes homoscedasticity, i.e. the random deviations must be (almost) independent of the explanatory variable (body weight) and of the fitted values. A variance-stabilizing transformation helps: taking logarithms rescales body and brain size so that the deviations become independent of the variables' magnitude.

SLIDE 72

Cause for heteroscedasticity

Actually not so surprising: An elephant’s brain of typically 5 kg can easily be 500 g lighter or heavier from individual to individual. This cannot happen for a mouse brain of typically 5 g. The latter will rather also vary by 10%, i.e. 0.5 g. Thus, the variance is not additive but rather multiplicative.

SLIDE 73

Cause for heteroscedasticity

brain mass = (expected brain mass) · random factor

We can convert this into something with additive randomness by taking the log:

log(brain mass) = log(expected brain mass) + log(random factor)

SLIDE 74

Example: Logarithmic transformation

Call: lm(formula = log(brain_weight) ∼ log(body_weight), subset = extinct == "no")

Residuals:
    Min      1Q  Median      3Q     Max
-2.6500 -0.4957  0.0391  0.6776  1.6083

Coefficients:
                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        1.93277     0.12570    15.38    <2e-16
log(body_weight)   0.62395     0.02982    20.93    <2e-16

Residual standard error: 0.867 on 58 degrees of freedom
Multiple R-squared: 0.883, Adjusted R-squared: 0.881
F-statistic: 437.9 on 1 and 58 DF, p-value: < 2.2e-16

SLIDE 75

Example (cont.)

[Figure: residuals against fitted values for the log-log model; the spread is now roughly constant.]

SLIDE 76

Multiple linear regression

yi = β0 + β1x1i + ... + βpxpi + εi,  εi i.i.d. N(0, σ²)

Ri = constant + β1NAPi + β2Grainsizei + β3Humusi + Weeki + β4Anglei + εi

lm(formula = Richness ∼ angle2 + NAP + grainsize + humus + factor(week), data = RIKZ)

SLIDE 77

Residuals:
    Min      1Q  Median      3Q     Max
-5.0454 -1.2865 -0.3314  0.7048 12.0917

Coefficients:
                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     9.298448    7.967002    1.167  0.250629
angle2          0.016760    0.042934    0.390  0.698496
NAP            -2.274093    0.529411   -4.296  0.000121
grainsize       0.002249    0.021066    0.107  0.915570
humus           0.519686    8.703910    0.060  0.952710
factor(week)2  -7.065098    1.761492   -4.011  0.000282
factor(week)3  -5.719055    1.827616   -3.129  0.003411
factor(week)4  -1.481816    2.720089   -0.545  0.589182

Residual standard error: 3.092 on 37 degrees of freedom
Multiple R-squared: 0.679, Adjusted R-squared: 0.6182
F-statistic: 11.18 on 7 and 37 DF, p-value: 1.664e-07

SLIDE 78

Multicollinearity

Multicollinearity is the phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with others. Tolerance measures the degree of multicollinearity.

SLIDE 79

Model selection with respect to independent variables

Akaike information criterion: AIC = n · log(SSresidual) + 2(p + 1) − n · log(n)

Adjusted R² = 1 − [SSresidual/(n − (p + 1))] / [SStotal/(n − 1)]

Forward selection, backward selection, or a combination of forward and backward selection. Caveats: multiple comparisons, collinearity, explorative analysis.
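The slide's AIC formula is easy to evaluate; note that this variant drops additive constants, so only differences in AIC between models fitted to the same data are meaningful. A Python sketch using the RIKZ simple-regression numbers from slide 53 as an assumed example:

```python
from math import log

# AIC as defined on the slide: n*log(SS_residual) + 2(p+1) - n*log(n)
def aic(ss_residual, n, p):
    return n * log(ss_residual) + 2 * (p + 1) - n * log(n)

# RIKZ simple regression (slide 53): n = 45, p = 1, SS_residual = 744.12
print(round(aic(744.12, 45, 1), 1))  # about 130.2
```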

SLIDE 80

Partial linear regression

Yi = constant + β1Xi + β2Wi + β3Zi + εi

What is the pure X effect? Filter out the effects of W and Z:

1. Yi = constant + β4Wi + β5Zi + ε1i

2. Xi = constant + β6Wi + β7Zi + ε2i

ε2 can be seen as the information in X after filtering out the effects of W and Z.

SLIDE 81

Partial linear regression

3. ε1i = β·ε2i + noisei

This model shows the relationship between Y and X after partialling out the effects of W and Z. Hence the regression of ε1 on ε2 shows the pure X effect. ⇒ Draw the partial regression plot and interpret the significance of the slope.
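The residual-on-residual regression described above recovers exactly the coefficient of X from the full multiple regression (the Frisch-Waugh result). A sketch with hypothetical noiseless data and a single confounder W, to keep the code short:

```python
# Hypothetical noiseless data: Y = 1 + 2*X + 3*W
def simple_fit(x, y):
    """Return (intercept, slope) of a least-squares simple regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

def residuals(x, y):
    b0, b1 = simple_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

W = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [2.0, 1.0, 4.0, 3.0, 6.0]          # correlated with W, but not collinear
Y = [1 + 2 * xi + 3 * wi for xi, wi in zip(X, W)]

e1 = residuals(W, Y)   # Y after filtering out W
e2 = residuals(W, X)   # X after filtering out W
_, pure_x_effect = simple_fit(e2, e1)
print(round(pure_x_effect, 6))  # 2.0: the coefficient of X in the full model
```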

SLIDE 82

Part correlation coefficient / decomposition of the variation

To obtain the variance components, Legendre and Legendre (1998) give the following algorithm:

1. Apply the linear regression model Yi = constant + Xiβ + Wiγ + noisei and obtain R². This is equal to [a + b + c], and [d] is equal to 1 − [a + b + c].

2. Apply the linear regression model Yi = constant + Xiβ̃ + noisei and obtain R². This gives [a + b].

SLIDE 83

Part correlation coefficient / decomposition of the variation

3. Apply the linear regression model Yi = constant + Wiγ̃ + noisei and obtain R². This gives [b + c]. The following computation then gives [b] = [a + b] + [b + c] − [a + b + c].

SLIDE 84

Variance partitioning for the RIKZ data

The following model was fitted: Ri = constant + β1NAPi + β2Anglei + β3Exposurei + noisei. All regression parameters are significantly different from zero at the 5% level. What is the pure NAP effect?

SLIDE 85

Variance partitioning for the RIKZ data

1. Ri = constant + β1NAPi + β2Anglei + β3Exposurei + noisei → R² = 0.636 → [a + b + c] = 0.636 and [d] = 1 − 0.636 = 0.364.

2. Ri = constant + β̃1NAPi + noisei → R² = 0.325 → [a + b] = 0.325.

3. Ri = constant + β̃2Anglei + β̃3Exposurei + noisei → R² = 0.344 → [b + c] = 0.344.

Hence [b] = 0.325 + 0.344 − 0.636 = 0.033, and therefore [a] = 0.292 and [c] = 0.311. These results indicate that the pure NAP effect is 29.2%, meaning that about 29% of the variation in species richness is explained purely by NAP.
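The variance-partitioning arithmetic above is just additions and subtractions of R² values, and can be checked in a few lines (using the slide's R² values as input):

```python
# R^2 values from the three RIKZ fits above
abc = 0.636   # full model: [a + b + c]
ab = 0.325    # richness ~ NAP: [a + b]
bc = 0.344    # richness ~ angle + exposure: [b + c]

b = ab + bc - abc   # shared fraction
a = ab - b          # pure NAP effect
c = bc - b          # pure angle/exposure effect
d = 1 - abc         # unexplained

print(round(a, 3), round(b, 3), round(c, 3), round(d, 3))  # 0.292 0.033 0.311 0.364
```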
