Lecture 4.2: Quiz 2 prep; HW Solutions; Probability Theory Topics


Slide 1
Lecture 4.2: Quiz 2 prep; HW Solutions; Probability Theory

  • Topics for Quiz 2, Fri 10/24:
  • HW-3, especially # B, H, J, K, L, M
  • HW-2, especially # 8-10, 13
  • HW-1 (except # 9)
  • ‘Old’ Topics (content)

– t-tests, χ² tests; Confidence Intervals for µ, p, and σ²; Power; Simple Regression; CLT
– lm() output to examine Interaction, poly(X, 2)
– Calculate µ and σ² for a discrete distribution
– Simple calculations of Probability; e.g., P(1 < χ²₃ < 6.25).

Slide 2

New Topics for Quiz 2

  • Conditional Probability
  • Examples, Axioms and Laws of Probability Theory
  • ‘By hand’ calculations in Regression
  • More on linear vs non-linear effects, and interaction
  • Examples of prob calculations. (Already can solve “sample size reqd for margin of error to be 5%”, “budget required so that P(solvency) = 95%”)
  • Homogeneity of variance – test and transformations that remedy heterogeneity

Slide 3

Psy 10/Stat 60: Quiz 7, 2006

Students use the course material (lectures, stat text, own notes, problems and solutions, etc.) to varying degrees, and level of use defines “an inverse index of a student’s ‘preparedness’, X, on a 0-10 scale: “X = 0” means “highly prepared”, …, and “X = 10” means “unprepared.” Y measures a student’s statistical sophistication one year after the course, 0 ≤ Y ≤ 10. It may be assumed that a relationship between X and Y, if one exists, would be approximately linear. The (xi, yi) values for a random sample of 30 students were obtained, and a partial summary of the data follows.

Slide 4

Ans: We must first use these quantities to calculate the 5 or so statistics needed for correlation & regression:

Slide 5

Slide 6
  • 3. Calculate the correlation coefficient, r, between X and Y. Test whether the observed r is statistically significant, stating your significance level, α, and your alternative hypothesis. (50)
  • 4. State briefly your reasoning for your choice of alternative hypothesis in #3 above. (20)
  • 5. Calculate the proportion of variance in Y that is explained by X. [Ans. Proportion of explained variance = r² = 17.1%.]
  • 6. Calculate the regression equation of Y on X. [Ans. b = −0.529, a = 8.17, so the regression eqn is: y = 8.17 − 0.529x]
  • 7. What do you conclude about the relation between the variables in this study?

Slide 7

Example (HW-3, #M): The cost, X, of treating a patient varies with the type and seriousness of the medical problem, and with other factors. X is coded as $100, $300, $500, or $700, and its probability distribution is:

µ ≡ E(X) = Σᵢ pᵢxᵢ = 3.2;  σ² ≡ E[(X − µ)²] = E(X²) − µ² = 13.8 − 3.2² = 3.56;  σ = 1.89.

Slide 8

µ_T = 16µ = 16 · 3.2 = 51.2;  σ²_T = 16σ² = 16 · 3.56 = 56.96;  σ_T = 7.55.
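As a numerical check of the two slides above, here is a Python sketch (the course itself uses R). The slide's probability table is not reproduced in this text, so the probabilities below are an assumed distribution chosen to be consistent with the slide's stated results E(X) = 3.2 and E(X²) = 13.8, with X coded in hundreds of dollars:

```python
# Assumed distribution, consistent with the slide's E(X) = 3.2 and
# E(X^2) = 13.8; the slide's actual table is not reproduced here.
x = [1, 3, 5, 7]          # $100, $300, $500, $700, coded in hundreds
p = [0.3, 0.4, 0.2, 0.1]  # assumed probabilities

mu  = sum(pi * xi for pi, xi in zip(p, x))      # E(X) = 3.2
ex2 = sum(pi * xi**2 for pi, xi in zip(p, x))   # E(X^2) = 13.8
var = ex2 - mu**2                               # 3.56
sd  = var ** 0.5                                # ~ 1.89

# Total cost T of 16 independent patients: means and variances add.
mu_T  = 16 * mu        # 51.2
var_T = 16 * var       # 56.96
sd_T  = var_T ** 0.5   # ~ 7.55
```

The key fact being exercised is that, for independent patients, both means and variances add; standard deviations do not.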

Slide 9

Probability as Area under the Curve

Slide 10

Joint prob: P(‘X < b’ AND ‘X > a’) = P(‘a < X < b’)

Slide 11

> pnorm(1.645)
[1] 0.950015
> pnorm(1.645, lower = F)
[1] 0.04998491
> qnorm(.05)
[1] -1.644854
> qnorm(.05, lower = F)
[1] 1.644854
> pt(2.2, 8, lower = F)
[1] 0.02949695
> pchisq(3.84, 1, lower = F)
[1] 0.050043
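The normal-tail calls above can be cross-checked from first principles. A minimal Python sketch using the standard library's error function, via the identity Φ(z) = ½[1 + erf(z/√2)]:

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

upper = 1 - phi(1.645)   # upper-tail area, as in pnorm(1.645, lower = F)

print(phi(1.645))   # ~ 0.9500, matching pnorm(1.645)
print(upper)        # ~ 0.0500
```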

Examples

Slide 12

Conditional Probability

  • Suppose we know the distribution of the scores, X, of 100 persons. E.g., 80 have X > 10; 50 have X > 15; 20 have X > 19, etc.
  • What proportion have X > 15? Ans. .5
  • Among those with X > 10, what proportion have X > 15? This is the conditional prob, P(X > 15 | X > 10).
  • Ans. 50/80 = .625 (not .5!)
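The point of the example, restricting the denominator to the conditioning event, can be checked with the counts from the slide:

```python
# Counts from the slide: of 100 persons, 80 have X > 10 and 50 have X > 15.
n_total, n_gt10, n_gt15 = 100, 80, 50

p_gt15 = n_gt15 / n_total               # unconditional: 0.5
p_gt15_given_gt10 = n_gt15 / n_gt10     # conditional: restrict to the 80 with X > 10

print(p_gt15, p_gt15_given_gt10)        # 0.5 vs 0.625
```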
Slide 13

Conditional Probability

Slide 14

  • (Psy 10/Stat 60, Quiz 2) A careful assessment of the quality (Q) of a large pool of applicants for a certain type of job gives the following distribution of quality. For a given level, x, of quality, we show the proportion, Prob(Q > x), of applicants with quality greater than x.

    x         1.2   2.3   3.0   4.4   6.1   7.3   11.1  13.4
    P(Q > x)  .95   .80   .70   .50   .30   .20   .05   .02

Slide 15

  • Calculate Prob(4.4 < Q < 7.3). (20) (Ans. .5 − .2 = .3.)
  • Only applicants with Q values of 2.3 or more are invited for an interview. Among interviewees, (i) what is the conditional probability that the quality of an interviewee would be greater than 4.4? (30) (ii) what is the conditional probability that the quality of an interviewee would be less than 6.1? (30)

    x         1.2   2.3   3.0   4.4   6.1   7.3   11.1  13.4
    P(Q > x)  .95   .80   .70   .50   .30   .20   .05   .02

Slide 16

Answers

(i) P(Q > 4.4 | Q > 2.3) = P(Q > 4.4 and Q > 2.3)/P(Q > 2.3) = P(Q > 4.4)/P(Q > 2.3) = .5/.8 = .625.
(ii) P(Q < 6.1 | Q > 2.3) = P(Q < 6.1 and Q > 2.3)/P(Q > 2.3) = P(2.3 < Q < 6.1)/P(Q > 2.3) = (.8 − .3)/.8 = .5/.8 = .625.
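The same answers fall out mechanically from the tail table; a quick Python check:

```python
# P(Q > x) table from the quiz, keyed by x.
tail = {1.2: .95, 2.3: .80, 3.0: .70, 4.4: .50,
        6.1: .30, 7.3: .20, 11.1: .05, 13.4: .02}

p_44_73 = tail[4.4] - tail[7.3]              # P(4.4 < Q < 7.3) = .3
p_i  = tail[4.4] / tail[2.3]                 # P(Q > 4.4 | Q > 2.3) = .625
p_ii = (tail[2.3] - tail[6.1]) / tail[2.3]   # P(Q < 6.1 | Q > 2.3) = .625
```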

Slide 17

Probability and Causal Reasoning

  • In a recent (10/9/09) article, since retracted:
    – 4% of healthy persons carry the virus, XMRV
    – 66 out of 101 chronic fatigue syndrome patients carry XMRV
    – Maybe XMRV is a ‘passenger virus’: CFS → XMRV
    – Maybe Cause0 causes people to get CFS and XMRV
    – Or maybe XMRV causes CFS: XMRV → CFS
  • The statistics here is trivial. Under the null, 1 – pbinom(65, 101, .04) = 0!
  • It’s the causal story that’s complex.

The initial reports (2006) were followed by a large number of studies in which no association was found between XMRV and cancer or CFS. It has not been established [in 2013] that XMRV can infect humans, nor has it been demonstrated that XMRV is associated with or causes human disease.
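To see why the slide can call the statistics trivial: under the null that CFS patients carry XMRV at the healthy-carrier rate of 4%, the binomial tail probability of 66+ carriers among 101 patients can be summed directly. A Python analogue of the slide's R call (R prints a rounded 0; the exact tail is merely astronomically small):

```python
from math import comb

# Null model: 101 independent patients, each carrying XMRV with p = .04.
# Tail probability of 66 or more carriers, as in 1 - pbinom(65, 101, .04).
p = 0.04
tail = sum(comb(101, k) * p**k * (1 - p)**(101 - k) for k in range(66, 102))

print(tail)   # tiny but nonzero; R's "0" is rounding
```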

Slide 18

Motivation for studying PT

  • Probability Theory (PT) is inherently interesting; but it also has instrumental value!
  • PT provides theory underlying Stats and Psych:
    (a) ‘large’ or ‘extreme’ deviation = deviation with a ‘low’ probability (basis for Statistical Inference)
    (b) Mean (i.e., Expected Value), median, mode; variance, etc. are probabilistic concepts
    (c) Laws of Large Numbers; Central Limit Theorem
    (d) ‘best’ estimate of a parameter (e.g., µ) is the value that makes the observed data most probable or likely (the Maximum Likelihood principle)
    (e) ‘Law of small numbers’ – Kahneman & Tversky
  • But not every concept is probabilistic! E.g., ‘least squares’

Slide 19

The CLT per Wiki

Tijms (2004, p. 169) writes:

The central limit theorem has an interesting history. The first version of this theorem was postulated by the French-born mathematician Abraham de Moivre who, in a remarkable article published in 1733, used the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This finding was far ahead of its time, and was nearly forgotten until the famous French mathematician Pierre-Simon Laplace rescued it from obscurity in his monumental work Théorie Analytique des Probabilités, which was published in 1812. Laplace expanded De Moivre's finding … It was not until the nineteenth century was at an end that the importance of the central limit theorem was discerned, when, in 1901, Russian mathematician Aleksandr Lyapunov defined it in general terms and proved precisely how it worked mathematically. Nowadays, the central limit theorem is considered to be the unofficial sovereign of probability theory.

Slide 20

Sir Francis Galton (Natural Inheritance, 1889) described the Central Limit Theorem as:

I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error". The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.

Slide 21

Technical Note

  • The proof of the CLT relies, in turn, on various limits involving the logarithm and exponential functions, such as:
  • As n → ∞, [1 + (a/n)]^n → e^a.
  • For small x, log(1 + x) ≈ x, or, equivalently,
  • For small x, e^x ≈ 1 + x.
  • The ‘x’ in these approximations refers to 1/n, n large, in the proof of the CLT. The CLT is an example of ‘large sample’ theory.
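Both limits are easy to see numerically; a quick Python sketch with n = 10⁶:

```python
import math

n = 10**6
a = 1.0

# [1 + a/n]^n approaches e^a as n grows.
approx_e = (1 + a / n) ** n
print(approx_e, math.e)           # agree to ~6 decimal places

# log(1 + x) ~ x for small x (here x = 1/n).
print(math.log(1 + 1e-6), 1e-6)   # agree to ~12 decimal places
```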

Slide 22

Least Squares: An algebraic, not probabilistic, principle for choosing ‘best’ estimates of parameters. We used it in regression estimation earlier.

Ex: x1, x2, …, xn are independent obs from a population with mean, µ. What is the ‘best’ est, m, of µ? For each xi, define ‘error’ or ‘loss’ as (xi − m)². What value of m minimizes (makes ‘least’) the sum of squared errors, Σ(xi − m)²? Recall the definition of the mean, x̄, and note that the Sum of Squares, SS = Σ(xi − x̄)², does not depend on m.

Slide 23

  • Then, we wish to minimize

C ≡ Σ(xi − m)²
  = Σ[(xi − x̄) + (x̄ − m)]²
  = Σ(xi − x̄)² + 2Σ(xi − x̄)(x̄ − m) + Σ(x̄ − m)²
  = SS + 2(x̄ − m)Σ(xi − x̄) + n(x̄ − m)²
  = SS + 0 + n(x̄ − m)², which is a min when m = x̄.

  • This shows that the sample mean has the useful, ‘least squares’ property.
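The identity C(m) = SS + n(x̄ − m)² can be verified numerically on a made-up sample (the x values below are arbitrary, chosen only for illustration):

```python
# Made-up sample, for illustration only.
xs = [2.0, 3.0, 5.0, 6.0, 9.0]
n = len(xs)
xbar = sum(xs) / n
SS = sum((x - xbar) ** 2 for x in xs)   # does not depend on m

def C(m):
    """Sum of squared errors for candidate estimate m."""
    return sum((x - m) ** 2 for x in xs)

# C(m) = SS + n*(xbar - m)^2 at every m, so the minimum is at m = xbar.
for m in (0.0, xbar, 7.0):
    assert abs(C(m) - (SS + n * (xbar - m) ** 2)) < 1e-9
print(C(xbar), SS)   # equal: the mean achieves the minimum, SS
```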

Slide 24

Definitions, Axioms and Laws of Probability Theory

  • Perform an expt (e.g., toss a coin twice). The set of possible outcomes (e.g., HH, HT, TH, and TT) is the sample space. Any subset, A, of the sample space is an event. Let p(A) denote the "probability that A is the outcome of the experiment". We now consider some rules for calculating p(A) in certain situations.

Slide 25

Examples of events

  • A1 = "exactly 1 H in 2 tosses" is {HT, TH}.
  • A2 = "at least 1 T in 2 tosses" is {HT, TH, TT}.
  • A3 = "2 H's in 2 tosses" is {HH}.
  • A1 and A2 can both occur if the experiment is performed once; i.e., if HT or TH occurs. We say that A1 and A2 are not mutually exclusive.

Slide 26

  • A1 = "exactly 1 H in 2 tosses" is {HT, TH}.
  • A2 = "at least 1 T in 2 tosses" is {HT, TH, TT}.
  • A3 = "2 H's in 2 tosses" is {HH}.
  • However, A1 and A3 cannot both occur in a single run of the experiment; either one or the other can occur, but not both. In this case, we say that A1 and A3 are mutually exclusive or disjoint.

Slide 27

Ā (sometimes Aᶜ, ‘c’ for complement) means ‘not A’. Φ refers to the ‘null event’ or ‘empty set’ that contains no elements of the sample space. A ∪ B means the event ‘A or B (or both)’, A ∩ B means the event ‘A and B’, and A | B means the conditional event, ‘that A occurs given that B has occurred’. Let S refer to the whole sample space. Two events A and B are independent if the occurrence or nonoccurrence of one has no effect on the occurrence or nonoccurrence of the other.

Slide 28

Probability function, p(A): For each event A, there is a real number, p(A), called the probability of A, that captures the probability that A occurs on each given ‘experiment’. Probability can sometimes be defined via relative frequency or proportion: probability is the limit, as the number of trials increases, of the relative frequency of occurrence of the event. In the limit, probability matches the long-run proportion.

Slide 29

Axioms

  • (i) p(A) ≥ 0,
  • (ii) p(S) = 1,
  • (iii) if A1, A2, ... are disjoint events, p(A1 ∪ A2 ∪ …) = p(A1) + p(A2) + … That is,

p(∪ᵢ₌₁^∞ Aᵢ) = Σᵢ₌₁^∞ p(Aᵢ)

Slide 30

Laws of Probability

  • The above axioms imply:
  • (a) p(Ā) = 1 − p(A).
  • (b) 0 ≤ p(A) ≤ 1.
  • (c) p(Φ) = 0.
  • (d) As a special case of Axiom (iii), if A and B are mutually exclusive events (i.e., if one of them occurs the other cannot, or, equivalently, p(A and B) = 0), then p(A or B) = p(A) + p(B).
  • (e) The addition rule: When A and B are not mutually exclusive, i.e., when p(A and B) > 0, then p(A or B) = p(A) + p(B) − p(A and B).
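The addition rule can be checked on the two-toss sample space from the earlier slide, treating events as sets of equally likely outcomes:

```python
from fractions import Fraction

# Two fair coin tosses: each of the 4 outcomes has probability 1/4.
space = {"HH", "HT", "TH", "TT"}
A1 = {"HT", "TH"}             # exactly 1 H
A2 = {"HT", "TH", "TT"}       # at least 1 T

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(space))

# Addition rule for non-disjoint events:
# p(A1 or A2) = p(A1) + p(A2) - p(A1 and A2) = 2/4 + 3/4 - 2/4 = 3/4.
lhs = p(A1 | A2)                        # set union = "A1 or A2"
rhs = p(A1) + p(A2) - p(A1 & A2)        # set intersection = "A1 and A2"
print(lhs, rhs)
```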

Slide 31

Slide 32

Conditional Probability: Bayes’ Rule

(f) p(A | B) = p(A and B)/p(B).
(g) A and B are independent events if knowing that one of them has occurred does not change the probability that the other will occur, i.e., if p(A | B) = p(A). From (f) above, this implies that, if A and B are independent events, the multiplication rule holds: p(A and B) = p(A)p(B), and that p(B | A) = p(B).

Slide 33

  • (h) When we have a bivariate distribution, the following rule is often used:
  • P(B) = P(B | A)P(A) + P(B | not A)P(not A).
  • Note that the events "A" and "not A" are mutually exclusive [i.e., p("A" and "not A") = 0] and exhaustive [i.e., p(A) + p(not A) = 1].
Slide 34

         B            ~B
A        P(A & B)     P(A & ~B)     P(A)
~A       P(~A & B)    P(~A & ~B)    P(~A)
         P(B)         P(~B)         1

P(A|B) = P(A & B)/P(B);  P(B|A) = P(A & B)/P(A)
P(B) = P(A & B) + P(~A & B) = P(B|A)·P(A) + P(B|~A)·P(~A)
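The table's margins and the law of total probability can be exercised numerically. The joint probabilities below are made up for illustration (the slide shows only symbols):

```python
# Made-up joint probabilities (must sum to 1), for illustration only.
p_A_B, p_A_notB     = 0.10, 0.30   # row A
p_notA_B, p_notA_notB = 0.20, 0.40 # row ~A

p_A = p_A_B + p_A_notB             # row margin: 0.40
p_B = p_A_B + p_notA_B             # column margin: 0.30

p_A_given_B = p_A_B / p_B          # P(A|B) = .10/.30 = 1/3
p_B_given_A = p_A_B / p_A          # P(B|A) = .10/.40 = 1/4

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A).
p_B_check = p_B_given_A * p_A + (p_notA_B / (1 - p_A)) * (1 - p_A)
print(p_B, p_B_check)
```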

Slide 35

  • 25% of students failed math (FM), 15% failed chem (FC), and 10% failed both (FC & FM). Calculate P(FM|FC), P(FC|FM), P(FC or FM), P(FM|PassC):

             Chem
             Fail   Pass   Total
Math  Fail   .10    .15    .25
      Pass   .05    .70    .75
      Total  .15    .85    1.00
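The four requested probabilities follow directly from the table; a Python check:

```python
p_FM, p_FC, p_both = 0.25, 0.15, 0.10   # marginal and joint failure rates

p_FM_given_FC = p_both / p_FC                      # .10/.15 = 2/3
p_FC_given_FM = p_both / p_FM                      # .10/.25 = .40
p_FC_or_FM    = p_FM + p_FC - p_both               # addition rule: .30
p_FM_given_passC = (p_FM - p_both) / (1 - p_FC)    # .15/.85 ~ .176
```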

Slide 36

Examples of Simple Probability

  • We think a bag has 9 marbles, 6 red and 3 white. We draw 2 marbles simultaneously, note their color, and replace them. There are 3 possible outcomes: 2R, 1R1W or 2W. We actually make 120 draws of pairs of marbles and note which of the 3 possible outcomes occurs on each draw. The data are: 61 ‘2R’s, 53 ‘1R1W’s, and 6 ‘2W’s. Are these frequencies consistent with those expected (at the .05 level)?

Slide 37

  • We first need to calculate the probabilities, pi, of each possible outcome, i = 1, 2, 3. Then Ei = N pi; etc. Here, we focus on the calculation of the pi's.
  • P(2R) = P(1st marble is R and 2nd marble is R).
  • P(1st is R) = 6/9, because there are 9 balls and 6 of them are R.
  • After the 1st is drawn and is known to be R (and is not yet replaced; we replace the 2 marbles after both are drawn!), we have 8 marbles left in the bag, 5 of them R. Thus, the probability that the 2nd is R, given that the 1st is R, is equal to 5/8.
  • Thus, P(2R) = (6/9)(5/8) = 5/12.
Slide 38

  • Similarly, P(2W) = (3/9)(2/8) = 1/12.
  • Since there are only 3 possible outcomes, P(1R1W) = 1 − P(2R) − P(2W) = 1/2.
  • We can now find the expected frequencies and finish the problem.
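To finish the problem as the slide suggests: with N = 120 the expected frequencies are 50, 60, and 10, and the observed frequencies give a χ² statistic on 3 − 1 = 2 df. A Python sketch with exact fractions:

```python
from fractions import Fraction

# Drawing 2 of 9 marbles (6 red, 3 white), without replacement within a pair.
p2R   = Fraction(6, 9) * Fraction(5, 8)   # 5/12
p2W   = Fraction(3, 9) * Fraction(2, 8)   # 1/12
p1R1W = 1 - p2R - p2W                     # 1/2

N = 120
E = [N * p2R, N * p1R1W, N * p2W]         # expected counts: 50, 60, 10
O = [61, 53, 6]                           # observed counts

# Goodness-of-fit chi-square, df = 3 - 1 = 2; .05 critical value is 5.99.
chisq = sum((o - e) ** 2 / e for o, e in zip(O, E))
print(float(chisq))   # ~ 4.84 < 5.99, so do not reject at the .05 level
```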

Slide 39

Binomial Expt

  • A test has 5 multiple-choice questions on it, and each question has 3 possible answers. A person guesses on all items. What is:
    – P(item #2 is correct)? (Ans. 1/3).
    – P(0 items correct)? (Ans. (2/3)^5, by the multiplication rule)
    – P(1 item correct)? (Ans. 5·(1/3)·(2/3)^4)
  • For the last one: There are 5 ways to get exactly 1 correct: C on 1st qu only, C on 2nd only, …, C on 5th. These are mutually exclusive events, so the prob that any one of them occurs is the sum of the probs (all equal):
  • P(1 C) = 5·P(1 C and 4 Inc’s) = 5·(1/3)·(2/3)^4
  • Or, use R: dbinom(1, 5, 1/3) = 0.329.
Slide 40

Binomial Expt

  • A test has 5 multiple-choice questions on it, and each question has 3 possible answers. A person guesses on all items. What is:
    – P(item #2 is correct)? (Ans. 1/3).
    – P(0 items correct)? (Ans. (2/3)^5, by the multiplication rule)
    – P(1 item correct)? (Ans. 5·(1/3)·(2/3)^4)
  • P(0 Correct): dbinom(0, 5, 1/3) = 0.1317 = (2/3)^5
  • P(3 or more C’s) = P(3 C’s) + P(4 C’s) + P(5 C’s) = dbinom(3, 5, 1/3) + dbinom(4, 5, 1/3) + dbinom(5, 5, 1/3) = …
  • Or, P(3 or more C’s) = 1 − P(2 or less C’s) = 1 − pbinom(2, 5, 1/3) = …
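The binomial pmf behind R's dbinom is simple enough to write out; a Python sketch that reproduces the slide's values:

```python
from math import comb

def dbinom(k, n, p):
    """Binomial pmf: P(k successes in n trials), analogous to R's dbinom."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p0 = dbinom(0, 5, 1/3)                            # (2/3)^5 ~ 0.1317
p1 = dbinom(1, 5, 1/3)                            # 5*(1/3)*(2/3)^4 ~ 0.329
p3plus = sum(dbinom(k, 5, 1/3) for k in (3, 4, 5))

# Same tail via the complement, as in 1 - pbinom(2, 5, 1/3):
p3plus_alt = 1 - sum(dbinom(k, 5, 1/3) for k in (0, 1, 2))
print(p0, p1, p3plus)
```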

Slide 41

Non-linear and interaction effects

  • Orientation to data analysis (HW-1, #7). Why?
  • Separating linear from non-linear effects of a quantitative predictor – in theory & in R.
  • Examples: (i) Food preference, (ii) Optimism, (iii) Perceived guilt, (iv) Abstract examples

Slide 42

Orienting notes & questions

  • What 1 to 3 relationships are ‘interesting’ and empirical? Why is data analysis needed?
  • What variables, and is each qualitative or quantitative; dependent or independent variables. Which is the mediator variable, the moderator variable? Are the relationships linear or non-linear?
  • Why should you care about linear vs non-linear relations?
    – r only measures linear relns; relying only on r would miss non-linear relns
    – The psychological or other mechanisms leading to linear relns are often different from those leading to non-linearity. Helpful to know which set generated the data
Slide 43

There is a clear U-shaped reln, but r measures only the linear component.
Slide 44

Often, ‘non-linear’ ≈ ‘quadratic’

  • Often, small departures from linearity in X are (i) interesting, and (ii) modeled approx by a quadratic term, e.g., X² or (X − m)². So, for U- and inverted-U-shaped relations, including a quadratic term in lm() is a sufficient way to detect non-linearity.
  • Exceptions: S-shaped curve; curve with an asymptote (‘ceiling’ or ‘floor’); logistic curve; exponential curve; …

Slide 45

  • Example 1: X = size of meal, Y = attractiveness of meal.
    – What are possible relns between X and Y?
    – How might this relation depend on Age? Culture?
    – Discuss interaction.
  • Student answers!
Slide 46

  • When motivation is ‘hunger’ or ‘greed’, Y might depend linearly on X.
  • When motivation is ‘dietary’, there may be an optimum X, such that Y is max at optimum X. This is a non-linear, e.g., quadratic, reln. Evidence of non-linearity is evidence of a ‘dietary’ motivation.

Slide 47

  • HO-1, #11.2. Is ‘optimism’ related to ‘party’, as well as ‘age’? Plots are interpretable.

[Figure: Estimated marginal means of OPTMISM plotted against AGECAT (24, 35, 45, 57, 70), with separate lines for PARTY = Democ, Repub, Other.]

Slide 48

  • The ‘age’ main effect is mainly due to the difference between very young voters and the rest. Republicans are less optimistic than the rest of the voters (‘party’ main effect).
  • The plots above show a hint of non-linearity in each Party. Is this quadratic effect of ‘age’ (averaged across parties) statistically significant?
  • Interactions
    – Is ‘age(linear)’ effect same for all parties?
    – Is ‘age(quad)’ effect same for all parties?
Slide 49

  • From ‘sfield1.r’, HO-1

res1 = lm(optmism ~ age + party, na.action=na.omit, d0)              # additive model
res2 = lm(optmism ~ poly(age, 2) + party, na.action=na.omit, d0)     # additive model with quadratic term
res3 = lm(optmism ~ poly(age, 2) * party, na.action=na.omit, d0)     # interactive model with quadratic term
print(anova(res1, res2, res3))

Slide 50

Call: lm(formula = optmism ~ poly(age, 2) + party, data = d0, na.action = na.omit)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     6.2900     0.1748  35.977  < 2e-16 ***
poly(age, 2)1   7.5534     1.6356   4.618 7.02e-06 ***
poly(age, 2)2  -3.2733     1.6580  -1.974   0.0498 *
partyrepub     -1.7777     0.2579  -6.894 7.33e-11 ***
partyother      0.1019     0.3227   0.316   0.7525

Describe the effects in this additive model (i.e., model with no interaction terms).

Slide 51

Call: lm(formula = optmism ~ poly(age, 2) * party, data = d0, na.action = na.omit)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)               6.28724    0.17702  35.518  < 2e-16 ***
poly(age, 2)1             6.81239    2.34250   2.908  0.00407 **
poly(age, 2)2            -3.01154    2.32628  -1.295  0.19703
partyrepub               -1.75986    0.26557  -6.627 3.41e-10 ***
partyother                0.09563    0.33657   0.284  0.77663
poly(age, 2)1:partyrepub  0.69865    3.93001   0.178  0.85909
poly(age, 2)2:partyrepub  0.52216    4.03495   0.129  0.89717
poly(age, 2)1:partyother  3.25014    4.33597   0.750  0.45443
poly(age, 2)2:partyother -2.82347    4.40866  -0.640  0.52266

None of the interaction terms is sig, so I prefer the additive model. Using the additive model, I conclude that there is sig non-linearity in the ‘age’ effect (p < .05). (Note this age(quad) effect is not sig in the model with interaction.)

Slide 52

Analysis of Variance Table
Model 1: optmism ~ age + party
Model 2: optmism ~ poly(age, 2) + party
Model 3: optmism ~ poly(age, 2) * party
  Res.Df    RSS Df Sum of Sq      F  Pr(>F)
1    196 517.78
2    195 507.64  1     10.15 3.8364 0.05161 .
3    191 505.16  4      2.48 0.2343 0.91877

Slide 53

Generic description of ‘interaction’

  • G (= 1, 2) and X (e.g., quant) are predictors; Y is DV. Is the G*X interaction sig? “No”, if the effect of X on Y is the same at each level of G. Or: “No”, if the effect of G on Y is the same at each level of X.

** Describe effect of X on Y when G = 1
** Describe effect of X on Y when G = 2
** Compare/contrast the above 2 descriptions
* Add any other ‘interesting’ features; e.g., any non-linear effects, or is the interaction due to only 1 group, or crossover effect, …?

Slide 54

  • Abstract examples:
    – Interaction of ‘group’ with X(linear)
    – Interaction of ‘group’ with X(quadratic)
  • If interaction is sig, take care in describing data. Main effects may not be meaningful.
  • Describe 3 main effects, namely, due to ‘group’, X(linear) and X(quad), in addition to the 2 interactions, G*X(linear) and G*X(quad).

Slide 55

Ex: health vs stress: decreasing function if S is ‘not inoculated’, but flat if S is ‘inoculated’.

Slide 56

  • 1. No main effect of ‘Group’.
  • 2. Scores decline with ‘age’.
  • 3. The 2 lines have same slope, so no age(lin)*Group interaction.
  • 4. However, Group 2 has a non-linear effect of ‘age’, so there is an age(quad)*Group interaction.

Slide 57

  • 1. Main effect of Group is sig.
  • 2. Main effect of age(lin) is not sig. (Why?)
  • 3. Main effect of age(quad) is sig. (Why?)
  • 4. age(lin)*Group is not sig. (Why?)
  • 5. age(quad)*Group is not sig. (Why?)

Slide 58

  • 1. Main effect of Group is not sig. (Why?)
  • 2. Main effect of age(lin) is not sig. (Why?)
  • 3. Main effect of age(quad) may be sig. (Why?)
  • 4. age(lin)*Group is very sig. (Why?)
  • 5. age(quad)*Group is sig. (Why?)
Slide 59

More Examples

  • Does the relation between ‘Future Threat’ and ‘Perceived Guilt’ depend on whether or not the defendant is seen as ‘Mentally Ill’? (HW-3)
  • Does the relation between ‘Weeks of CBT’ and ‘depression’ depend on whether the patient is treated with an SSRI or a sugar pill?
  • In a romantic relationship, does the relation between ‘target’s satisfaction with the relation’ and ‘partner’s positive image of target’ depend on the target’s self-image (‘hi’ vs ‘lo’)?

Slide 60
  • Ex. 2. Perceived guilt (HW-3)

Assess the relations among jurors’ perceptions of a defendant’s being (i) ‘mentally ill’ (yes/no), (ii) a ‘future threat’ to society (0-10 scale), and (iii) guilty, as charged (1-4 scale).

Model 1: guiltcat ~ mentill
Model 2: guiltcat ~ mentill + futhrt
Model 3: guiltcat ~ mentill * futhrt
Model 4: guiltcat ~ mentill * poly(futhrt, 2)

Slide 61

rs24 = anova(rs1, rs2, rs3, rs4)
print(rs24)

  Res.Df    RSS Df  SumSq      F   Pr(>F)
1    238 152.29
2    237 148.26  1 4.0359 6.6130 0.010743 *
3    236 143.08  1 5.1774 8.4833 0.003931 **
4    234 142.81  2 0.2675 0.2191 0.803376

The degrees of freedom, “Df” above, for each test is the difference in the number of parameters between ‘reduced’ and ‘full’ model. Here is the output for the most general model.

Slide 62

lm(formula = guiltcat ~ mentill * poly(futhrt, 2), data = d0, na.action = na.omit)

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)                3.3595     0.0739  45.463  < 2e-16 ***
mentill                   -1.6306     0.1047 -15.575  < 2e-16 ***
poly(futhrt, 2)1           4.8078     1.2342   3.896 0.000128 ***
poly(futhrt, 2)2           0.7306     1.1463   0.637 0.524480
mentill:poly(futhrt, 2)1  -5.0125     1.6990  -2.950 0.003497 **
mentill:poly(futhrt, 2)2  -0.5200     1.6435  -0.316 0.751981

poly(futhrt, 2)1 is the linear effect of ‘futhrt’ on guilt; and poly(futhrt, 2)2 is the non-linear effect of ‘futhrt’ on guilt. It is helpful to think of these as 2 orthogonal predictors, each having a main effect, and each possibly interacting with ‘mentill’. Only the linear term interacts with ‘mentill’.

Slide 63

Homogeneity of variance, a key assumption in lm()

  • The t-test and tests done with lm() assume that all groups have the same within-group variance. What remedial steps to take if this assn is violated? (Handout 1, pp. 8-13)

Slide 64
  • One solution is to transform the DV and hope that stabilizes the variance.
    – log(Y), or log(Y + ε) to remove 0’s (ε = .5?)
    – sqrt(Y)
    – 1/Y, for Y = Reaction Time, is ‘speed’
    – arcsin(Y), if Y is a proportion

lphapp = log(Pasthapp + .5); sphapp = sqrt(Pasthapp)
rs3a = bartlett.test(Pasthapp ~ memtype, na.action=na.omit, data=dat0)  # compare variances on score for all 3 groups
print(rs3a)
rs3b = bartlett.test(lphapp ~ memtype, na.action=na.omit, data=dat0)
print(rs3b)
rs3c = bartlett.test(sphapp ~ memtype, na.action=na.omit, data=dat0)
print(rs3c)

Slide 65

Bartlett test of homogeneity of variances
data: Pasthapp by memtype
Bartlett's K-squared = 8.3253, df = 2, p-value = 0.01557

Bartlett test of homogeneity of variances
data: lphapp by memtype
Bartlett's K-squared = 1.0049, df = 2, p-value = 0.605

Bartlett test of homogeneity of variances
data: sphapp by memtype
Bartlett's K-squared = 1.8797, df = 2, p-value = 0.3907

Slide 66
  • The homogeneity of variance assumption is violated in the analysis of raw scores, but not in the analysis of the transformed scores – both transforms succeed in stabilizing the variance.
  • The conclusion about group differences is unchanged. Is there a ‘theoretical’ or ‘aesthetic’ reason for preferring the log over the sqrt transformation? It depends on the context.

Slide 67
  • The benefit of transforming the data is that the homogeneity of variance assumption is satisfied. After the analysis, one then reports the effects in the usual way.
  • There is nothing wrong with an analysis based on transformed data and a report based on the original data. Finding an appropriate transformation is a frankly empirical matter – use the transformation that is most successful in stabilizing the variance.
  • But there are theoretical reasons for expecting certain transformations to be successful in certain situations.

Slide 68

A theoretical approach

  • Here we base the choice of transform on the precise relationship across the k groups between the group mean and the group variance (or s.d.). This is our diagnosis.
  • The log transform is appropriate when the standard deviation is proportional to (or, more loosely, is linearly related to) the mean; and
  • The square root transform is appropriate when the variance is proportional to (or, more loosely, is linearly related to) the mean.

Slide 69

Both plots are approximately linear.

Slide 70

Sketch of proof [Optional]

  • We often can see a rough relation between σ and µ, σ = g(µ). We wish to find a transformation, Y* = f(Y), such that the variance of Y* does not depend on the mean of Y*. We may argue as follows.
  • First, let us write the dependent variable, Y = µ + e; i.e., µ is the mean of Y and σ is the standard deviation of Y (and, therefore, of e). The transformed variable is Y* = f(Y) = f(µ + e). Then, by a standard (Taylor) approx in Calculus, Y* ≈ f(µ) + f′(µ)e, where f′(.) is the first derivative of f(.). Approximately then, Y* is a linear transformation of e.

Slide 71
  • By the known result on linear transformations, var(Y*) = [f′(µ)]² var(e) = [f′(µ)]² σ².
  • But, through observation, we know that σ = g(µ). Therefore, in order that var(Y*) be constant (= c², say), we must have that [f′(µ)]² σ² = [f′(µ)]² [g(µ)]² = c²; i.e., f′(µ)g(µ) = c, which implies that f′(µ) = c/g(µ).
  • If we know g(.), we integrate the above to get f(.), the required transformation. The most popular cases are tabulated next.
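In symbols, the integration step and the two cases cited earlier (log when σ ∝ µ, square root when σ² ∝ µ) can be sketched as follows, consistent with the derivation above:

```latex
f'(\mu)\,g(\mu) = c
\;\Longrightarrow\;
f(\mu) = \int \frac{c}{g(\mu)}\, d\mu .
% If sigma is proportional to mu:
%   g(\mu) = k\mu \;\Rightarrow\; f(\mu) = \tfrac{c}{k}\log \mu   (log transform)
% If sigma^2 is proportional to mu:
%   g(\mu) = k\sqrt{\mu} \;\Rightarrow\; f(\mu) = \tfrac{2c}{k}\sqrt{\mu}   (square-root transform)
```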