Data analysis and Basic Statistics. FAO training course, June 11, 2013. Yoshiki Tsukakoshi, Ph.D. (statistical science), National Food Research Institute, National Agriculture and Food Research Organization. Email: Yoshiki.tsukakoshi@gmail.com


SLIDE 1

SLIDE 2

Today’s Agenda

  • Descriptive statistics
    – Summary statistics
    – Graphical presentation
  • Statistical inference
    – Estimation (point/interval)
    – Hypothesis testing
    – Regression analysis
    – Analysis of variance
  • Statistical packages
SLIDE 3

Analytical Results (Repeated Analysis) AAS for cereal

  Test  Cadmium (mg/kg)
  1     0.0394
  2     0.0412
  3     0.0420
  4     0.0398
  5     0.0407
  6     0.0395
SLIDE 4

Know Data Types

  • Univariate data
    – A single variable from a sample
  • Bivariate data
    – Two variables from a sample
    – Analytical value and brand or origin
    – Comparison of two methods
  • Multivariate data
    – Multiple measurements from a single sample
    – ICP-MS, DNA arrays
SLIDE 5

Univariate Data

  • Single set of samples

  Test     Cadmium (mg/kg)
  1        0.0394
  2        0.0412
  3        0.0420
  4        0.0398
  5        0.0407
  6        0.0395
  Average  0.0404
SLIDE 6

What can be done with Univariate Data: Descriptive Statistics

  • Location / shift / central tendency
    – Mean (arithmetic, geometric, harmonic)
      • Trimmed mean
      • Interquartile mean
    – Median, mode
  • Scatter / scale / dispersion
    – Variance, median absolute deviation
    – Range, interquartile range
  • Shape
    – Skewness, kurtosis
  • Frequency distribution
    – Histogram, cumulative distribution, stem-and-leaf
SLIDE 7

Location Parameters

  • Sample data set: (5, 4, 5, 6, 5)
  • Arithmetic mean
    – (5+4+5+6+5)/5 = 5
  • Geometric mean
    – (5×4×5×6×5)^(1/5) = 4.959344…
    – Defined for positive numbers only
  • Harmonic mean
    – 5 / ((1/5) + (1/4) + (1/5) + (1/6) + (1/5)) = 4.918033…
  • Arithmetic ≥ geometric ≥ harmonic
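The three means of the sample on this slide can be checked in base R (this code is not on the original slide; `exp(mean(log(x)))` is one standard way to compute the geometric mean):

```r
x <- c(5, 4, 5, 6, 5)

arith <- mean(x)            # arithmetic mean
geom  <- exp(mean(log(x)))  # geometric mean (positive data only)
harm  <- 1 / mean(1 / x)    # harmonic mean

c(arith, geom, harm)        # 5, 4.959344..., 4.918033...
```

The ordering arithmetic ≥ geometric ≥ harmonic holds for any positive data set.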
SLIDE 8

Location Parameters - 2

  • Location based on quantiles
    – Median = 5 (in the symmetric case, equal to the mean)
    – Quartiles
      • 1st, 2nd, 3rd
    – Percentiles: 1st, 2nd, …, 100th
  • Mode
    – For continuous data, estimate with a kernel density method
  • Empirically, 3(mean − median) ≒ (mean − mode)
SLIDE 9

Scale Parameters

  • Variance, standard deviation
  • Coefficient of variation
    – Geometric coefficient of variation
  • Range
    – (minimum, maximum)
SLIDE 10

Population Variance

  Test     Result   Deviation   Deviation²
  1        0.0394   -0.00103    1.06778E-06
  2        0.0412    0.000767   5.87778E-07
  3        0.0420    0.001567   2.45444E-06
  4        0.0398   -0.00063    4.01111E-07
  5        0.0407    0.000267   7.11111E-08
  6        0.0395   -0.00093    8.71111E-07
  Average  0.0404               9.08889E-07

Coefficient of variation (relative standard deviation) = s/μ
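The table's calculation can be reproduced in R (not on the original slide). Note that R's `var()` divides by n − 1, so the population (divisor n) variance on this slide is computed by hand:

```r
cd <- c(0.0394, 0.0412, 0.0420, 0.0398, 0.0407, 0.0395)

pop_var <- mean((cd - mean(cd))^2)  # population variance, divisor n: ~9.09e-07
unb_var <- var(cd)                  # var() uses divisor n - 1 (unbiased)
cv      <- sd(cd) / mean(cd)        # coefficient of variation, about 2.6%

c(pop_var, unb_var, cv)
```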
SLIDE 11

Descriptive Statistics: Graphical representation of data

  • Histogram
  • Stem-and-leaf
    – Data: (1.1, 1.1, 1.1, 1.2, 1.2, 2.5, 2.5)
    – 1 | 1 1 1 2 2
    – 2 | 5 5

[Figure: histogram of r0]
SLIDE 12

Starting R

Command line window

SLIDE 13

Exercise: Plotting a Histogram

  • > Orange
  •   Tree  age  circumference
  • 1    1  118             30
  • 2    1  484             58
  • 3    1  664             87
  • > hist(Orange$circumference)

[Figure: histogram of Orange$circumference]
SLIDE 14

Exercise: Plotting a Stem-and-Leaf Plot

  • > stem(Orange$circumference)
  • The decimal point is 1 digit(s) to the right of the |
  •  2 | 00023
  •  4 | 918
  •  6 | 295
  •  8 | 17
  • 10 | 81255
  • 12 | 059
  • 14 | 02256
  • 16 | 72479
  • 18 |
  • 20 | 3394
SLIDE 15

Exercise: Plotting the Empirical Distribution

  • > plot(ecdf(Orange$circumference))

[Figure: ecdf(Orange$circumference), x versus Fn(x)]
SLIDE 16

Shape Parameters

  • Skewness
  • Kurtosis

[Figures: histogram of rnorm(1000) (symmetric) and of rnorm(100)^2 (right-skewed)]
SLIDE 17

Skewness

  • Nonparametric skew = (mean − median) / s.d.

[Figures: histograms illustrating negative and positive skew, e.g. rnorm(1000) and rnorm(100)^2]
SLIDE 18

Calculating Skewness

  Test     Result   Deviation   Deviation³
  1        0.0394   -0.00103    -1.10337E-09
  2        0.0412    0.000767    4.5063E-10
  3        0.0420    0.001567    3.8453E-09
  4        0.0398   -0.00063    -2.54037E-10
  5        0.0407    0.000267    1.8963E-11
  6        0.0395   -0.00093    -8.13037E-10
  Average  0.0404                3.57407E-10

  • Divide the mean deviation³ by s.d.³
SLIDE 19

Interval Range

  • 1 standard deviation (σ)
  • Inter-quartile range (IQR)
    – 3rd quartile (75th percentile) − 1st quartile (25th percentile)
  • Normalized IQR = 0.7413 × IQR
    – If the data are normally distributed, this converges to the standard deviation
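Where the 0.7413 factor comes from can be verified in R (a supplementary check, not on the original slide): for a normal distribution the IQR spans qnorm(0.75) − qnorm(0.25) ≈ 1.349 standard deviations, so dividing by 1.349 rescales the IQR to σ.

```r
# reciprocal of the normal IQR in sigma units
1 / (qnorm(0.75) - qnorm(0.25))   # 0.7413011
```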
SLIDE 20

Terminology: Shape

  • Unimodal
    – Single peak
  • Bimodal
    – Two peaks
  • Multimodal
    – More than two peaks

[Figures: histogram of rnorm(1e+06, 0, 1) (unimodal) and of c(rnorm(1e+06, 0, 1), rnorm(1e+06, 8, 2)) (bimodal)]
SLIDE 21

Terminology: Shape - 2

  • Symmetric
  • Asymmetric
    – Skewness not zero

[Figures: histogram of c(rnorm(1e+06, 0, 1), rnorm(1e+06, 4, 2)) (asymmetric) and of rnorm(1e+06, 0, 1) (symmetric)]
SLIDE 22

Terminology: Shape - 3

  • Heavy tail
    – Decreases more slowly than the exponential distribution
  • Light tail
    – Decreases faster than the exponential distribution

[Figure: histogram of rlnorm(1000, 0, 1), a heavy-tailed example]
SLIDE 23

Graphical Distribution: Box Plot

[Figure: box plot annotated with the median, 1st and 3rd quartiles, whiskers at ±1.5 IQR, and outliers outside 1.5 IQR]
SLIDE 24

Graphical Distribution: Error Bars

  • The width can be:
    – Standard error
    – Confidence interval

[Figures: bar charts of five replicates with error bars; x: Replicates, y: Value (arbitrary units)]
SLIDE 25

Exercise on Calculation

  • =RAND()
  • Copy
  • =NORM.INV(RAND(), 0, 1)

SLIDE 26

SLIDE 27

Generating (Pseudo) Random Numbers

  • > rnorm(10, 0, 1)
  •  0.11896589  0.34946077  0.69093896  0.12069319
  •  0.39211479  0.04787566  0.17911326  0.67266441  1.16644580 -0.35335929
  • > r0 = rnorm(1e5, 0, 1)
  • > r1 = rnorm(1e5, 1, 1)
  • > hist(r0, xlim=c(-6,6))
  • > hist(r1, xlim=c(-6,6))

[Figures: histograms of r0 and r1]
SLIDE 28

Expectation and variance

  • E[aX+bY]=aE[X]+bE[Y]
  • V[X+Y] =V[X]+V[Y]+2cov(X,Y)
  • SD[X+Y] = √V[X+Y]
SLIDE 29

Exercise: Adding Variables (cont.)

  • # E[aX+bY]=aE[X]+bE[Y]
  • > mean(r0)
  • [1] 0.001574978
  • > mean(r1)
  • [1] 0.9990033
  • > mean(r0+r1)
  • [1] 1.000578
  • > mean(r0-r1)
  • [1] -0.9974283
SLIDE 30

Exercise Estimating Standard Deviation

  • # V[X+Y] = V[X] + V[Y] + 2cov(X,Y)
  • > sd(r0)
  • [1] 0.999807 (1)
  • > sd(r1)
  • [1] 0.997682 (1)
  • > sd(r0+r1) # (Adding independent variables)
  • [1] 1.411946 # √(1×1 + 1×1 )
  • > sd(r0-r1)
  • [1] 1.412932 # √(1×1 + 1×1 )
  • > sd(r0+r0) # (Adding correlated variables)
  • [1] 1.999614 # √(1×1 + 1×1 + 2×1)
  • > sd(r0-r0)
  • [1] 0 (0) # √(1×1 + 1×1 - 2×1)
SLIDE 31

Probability Process and Distributions (Univariate)

  • Discrete Distribution

– Count data (colony count) – Food poisoning incidents

  • Continuous Distribution

– Concentration of chemicals

SLIDE 32

Discrete Distributions

  • Constant
  • Binomial
  • Poisson
  • Negative binomial
SLIDE 33

Distribution and Random Process: Binomial Distribution

  • Toss of a coin
    – Heads or tails
    – Number of heads
  • Throwing dice
    – Probability of getting the number one
  • Bernoulli trial
    – n trials
    – probability of success p
SLIDE 34

Binomial Distribution

  • Expectation: np
  • Variance: np(1 − p)
  • Standard deviation: √(np(1 − p))

[Figures: probability mass functions for n = 10, p = 0.1 and n = 10, p = 0.3]
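The expectation and variance formulas can be checked directly against the probability mass function in R (a supplementary check, not on the original slide):

```r
n <- 10; p <- 0.3
k <- 0:n
pmf <- dbinom(k, n, p)          # P(X = k) for a binomial(n, p)

sum(pmf)                        # 1: the mass function sums to one
sum(k * pmf)                    # n*p = 3 (expectation)
sum((k - n * p)^2 * pmf)        # n*p*(1-p) = 2.1 (variance)
```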
SLIDE 35

Poisson Distribution

  • Well-dispersed solution
  • Limit of the binomial distribution: n large, p small, np constant
SLIDE 36

Poisson Distribution: Probability mass functions

[Figures: Poisson distribution λ=1 versus binomial distribution n=10, p=0.1; Poisson distribution λ=3 versus binomial distribution n=10, p=0.3]
SLIDE 37

Other Discrete Distributions

  • Geometric distribution
    – The number of trials needed to get one success
    – Parameter p
  • Hypergeometric distribution
    – k successes in n draws, without replacement, from a population of N
  • Negative binomial distribution
    – The number of successes before n failures
SLIDE 38

Central Limit Theorem

  • The sum of several variables from any distribution approaches a normal distribution

[Figures: histogram of sample(1:6, 10000, replace = TRUE) (one die, uniform) and of the sum of several such samples (bell-shaped)]
SLIDE 39

Central Limit Theorem: an example

[Figures: histograms of sums of repeated sample(1:6, 10000, replace = TRUE) draws; with more terms the shape approaches a bell curve]
SLIDE 40

Continuous Distribution: Normal (Gaussian) Distribution, distributional shape

  • Bell shape
  • Symmetrical
  • Converges to zero at ±infinity

[Figure: probability density function, dnorm(k, 0, 1)]
SLIDE 41

Normal Distribution: cumulative distribution function

  • 68% within ±1 σ
  • 95% within ±2 σ

[Figure: pnorm(k, 0, 1)]
SLIDE 42

Log-normal Distribution

  • Skewed shape
  • Mean and mode differ
  • Heavy tail

[Figure: dlnorm(k, 0, 1)]
SLIDE 43

Exponential Distribution and Gamma Distribution

  • Exponential distribution
    – Time until an event occurs, for events expected to occur at a constant rate
  • Gamma distribution
    – Time until k events occur, for events expected to occur at a constant rate
SLIDE 44

Gamma Distribution

  • Shape parameter k
  • Scale parameter θ

[Figures: histograms of rgamma(1e+05, 1) (k=1, θ=1) and rgamma(1e+05, 2) (k=2, θ=1)]
SLIDE 45

Gamma Distribution

  • Mean = k·θ

[Figures: histograms of rgamma(1e+05, 3) (k=3, θ=1) and rgamma(1e+05, 4) (k=4, θ=1)]
SLIDE 46

Exponential Distribution

  • Density proportional to exp(−λx)
  • Monotonically decreasing
  • Gamma distribution with shape k = 1

[Figures: histograms of rexp(1e+05) and rexp(1e+05, 3)]
SLIDE 47

Generating Random Numbers

  • R
    – rnorm(n, mean, s.d.)
    – > hist(rnorm(1e5, 0, 1))
    – > hist(rnorm(1e5, 0, 2))
  • Excel 2010
    – =NORM.INV(RAND(), mean, s.d.)

[Figures: histograms of rnorm(1e+05, 0, 1) and rnorm(1e+05, 0, 2)]
SLIDE 48

Central Limit Theorem: exercise

  • > hist(runif(1e5))

[Figure: histogram of runif(1e+05), roughly flat]

  • > hist(runif(1e5) + runif(1e5))

[Figure: histogram of runif(1e+05) + runif(1e+05), triangular]
SLIDE 49

Central Limit Theorem: exercise (cont.)

[Figures: histograms of the sum of three and of four runif(1e+05) draws, increasingly bell-shaped]
SLIDE 50

Chi-square Distribution

  • Distribution of the sum of squares of standard normal random numbers

[Figures: histograms of scaled sample variances, e.g. apply(r0, 1, var) * 3; sample size = 10 gives 9 degrees of freedom, sample size = 4 gives 3]
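The connection between sample variance and the chi-square distribution can be simulated in R (a supplementary sketch, not on the original slide): (n − 1)·s²/σ² follows a chi-square distribution with n − 1 degrees of freedom, so with σ = 1 and n = 4, 3·var(x) behaves like chi-square(3).

```r
set.seed(1)
v <- replicate(1e5, var(rnorm(4)))  # sample variances of n = 4 normals
mean(3 * v)                         # close to E[chi-square(3)] = 3
hist(3 * v)                         # shape matches the slide's histogram
```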
SLIDE 51

Exercise: Plant Growth

  • > data(PlantGrowth)
  • > PlantGrowth
  •   weight group
  • 1   4.17  ctrl
  • 2   5.58  ctrl
  • 3   5.18  ctrl
SLIDE 52

Exercise: Plotting a Notched Box Plot

boxplot(weight ~ group, data = PlantGrowth, main = "PlantGrowth data", ylab = "Dried weight of plants", col = "lightgray", notch = TRUE, varwidth = TRUE)

[Figure: notched box plots for groups ctrl, trt1, trt2]
SLIDE 53

Level of Measurement

  • Ratio data (quantitative)
  • Interval data (quantitative)
  • Ordinal data (qualitative)
    – Can be ranked
  • Categorical data (qualitative)
    – Binary data
SLIDE 54

Descriptive Statistics: Bivariate Data

  • Dependence indices
    – Correlation: Pearson’s chi-square, Kendall’s τ, Spearman’s ρ
  • Cross-tabulation
    – Binary and binary
    – Binary and nominal
    – Nominal and nominal
  • Scatter plots
    – Ordinal/interval and ordinal/interval
  • Quantile-quantile plots
    – Ordinal and ordinal

[Figure: scatter plot of c(rr0) versus c(rr1)]
SLIDE 55

Statistical Inference

  • Drawing conclusions from data based on a model/assumption
  • Data are independently and identically distributed
    – Random sampling from a population
    – Randomized experiment
  • Set a model or assumption
  • Estimate
    – Parameter (mean, proportion, variance)
  • Interval
    – Confidence, tolerance, prediction
  • Test of hypothesis
SLIDE 56

Types of Statistical Inference

  • Point estimation
    – Obtain a single estimate
  • Interval estimation
    – Interval of possible values
  • Hypothesis testing
    – Making decisions from data
  • Checking model assumptions
SLIDE 57

Point Estimation

  • Obtain the best single value of a population parameter from a subset
  • Unbiasedness
  • Minimum variance
  • Parametric distribution
    – Maximum likelihood estimator
    – Moment estimator
SLIDE 58

Unbiasedness

  • True parameter: θ0
  • Estimate: θ̂
  • E[θ̂] = θ0
  • Estimator of standard deviation
    – Variance calculated from 5 normal samples
    – Left: mean = 0.8, right: mean = 1.0

[Figures: histograms of the estimates (aa)]
SLIDE 59

Unbiased Variance

  Test     Result   Deviation   Deviation²
  1        0.0394   -0.00103    1.06778E-06
  2        0.0412    0.000767   5.87778E-07
  3        0.0420    0.001567   2.45444E-06
  4        0.0398   -0.00063    4.01111E-07
  5        0.0407    0.000267   7.11111E-08
  6        0.0395   -0.00093    8.71111E-07
  Average  0.0404

  Biased estimate   = (sum of deviation²)/6 = 9.08889E-07
  Unbiased estimate = (sum of deviation²)/(6 − 1)
SLIDE 60

Minimum Variance

  • Normal distribution mean
  • 5 samples
  • True mean = 0

[Figures: histograms of apply(matrix(rnorm(5e+05), ncol = 5), 1, mean) and of the corresponding medians; the sample median spreads more widely than the sample mean]
SLIDE 61

Goodness-of-fit Test

  • Graphical method

– Quantile-Quantile plot

SLIDE 62

Exercise: Plotting a Q-Q Plot

  • Fit to the normal distribution
  • > qqnorm(rnorm(1e2))
  • > qqnorm(rlnorm(1e2))

[Figures: normal Q-Q plots (theoretical versus sample quantiles) for a normal and a log-normal sample]
SLIDE 63

Pearson’s Chi-square

  • Yates’s correction

  Observed    10   2   7  9
  Hypothesis   8   4   9  7
  Diff         2  -2  -2  2
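The chi-square statistic for this table can be computed in R (a supplementary check, not on the original slide; `chisq.test` takes the hypothesis as a probability vector):

```r
obs  <- c(10, 2, 7, 9)
expd <- c(8, 4, 9, 7)   # hypothesized counts

# Pearson's statistic by hand: sum of (O - E)^2 / E
sum((obs - expd)^2 / expd)            # 2.515873

# The same goodness-of-fit test via chisq.test
chisq.test(obs, p = expd / sum(expd))
```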
SLIDE 64

Other Tests of Fit

  • Based on the empirical distribution function
    – Kolmogorov-Smirnov test
    – Anderson-Darling test
    – Lilliefors test
    – Cramér-von Mises test
  • For normality
    – Jarque-Bera test
      • Based on skewness and kurtosis
    – Shapiro-Wilk test
      • Statistic based on variance and covariance of ranks
SLIDE 65

Interval Estimation: Types of Interval

  • Confidence
    – Contains the true parameter with probability alpha
    – Nominal versus actual coverage probability
  • Prediction
    – Another sample falls within the prediction interval with probability alpha
  • Tolerance
    – N percent of the data fall within the interval, with confidence level alpha
SLIDE 66

Confidence Intervals Example of 95%

SLIDE 67

Table of t-values

  d.f.   t0.95   t0.975   t0.995
  1      6.3     12.7     63.7
  2      2.9     4.3      10
  3      2.4     3.2      5.8
  4      2.13    2.8      4.6
  5      2.02    2.6      4.0
  6      1.94    2.4      3.7
  z(∞)   1.645   1.960    2.576
SLIDE 68

Statistical Inference

  • Model, assumption, hypothesis
  • Parametric
    – The data-generating process is parameterized
  • Non-parametric
    – The data-generating process is not parameterized
  • Asymptotic (commonly used)
    – Critical value based on a table
  • Exact (computer intensive)
    – Critical values based on the data
SLIDE 69

Statistical Inference and Error

  • Type I error
    – False positive
    – α error
    – Rejecting a hypothesis that should have been accepted
  • Type II error
    – False negative
    – β error
    – Accepting a hypothesis that should have been rejected
SLIDE 70

Statistical Test

  • Test for location
  • Test for dispersion
  • Test for outliers
  • P-value, error
  • Detection power
  • Uniformly most powerful test
SLIDE 71

Ratio Data

  • Quantitative data
  • Unlike interval data, it has a natural zero
  • Multiplication and division are meaningful
  • Age, length, etc.
SLIDE 72

Interval Data

  • Quantitative data
  • Can add or subtract the data
  • Cannot multiply or divide
  • Example: temperature
SLIDE 73

Z-test

  • Critical value does not depend on sample size
  • Standard deviation: known
  • Exercise
    – Proficiency testing
    – Target: 700 μg/g
    – Standard deviation: 25 μg/g
    – Test whether a laboratory reporting 640 μg/g differs significantly from the target
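The proficiency-testing exercise can be worked through in R (a supplementary sketch, not on the original slide):

```r
# z-test: target 700, known sigma = 25, reported value 640
z <- (640 - 700) / 25    # -2.4
p <- 2 * pnorm(-abs(z))  # two-sided p-value, about 0.016
p < 0.05                 # TRUE: the laboratory differs significantly
```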
SLIDE 74

Test for Normal Interval Data: single set of samples

  • One-sample t-test
  • What is tested
    – Whether the population mean differs from a specified value
    – Standard deviation: unknown (cf. z-test)
  • Therefore, the s.d. is estimated from the data
    – Estimation error is included
  • Exercise: test whether the mean of the data set (150, 120, 180, 130) differs significantly from 100.
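The exercise on this slide can be run with R's `t.test` (the call is a supplementary sketch, not shown on the original slide):

```r
x <- c(150, 120, 180, 130)
res <- t.test(x, mu = 100)   # one-sample t-test against mu = 100
res$statistic                # t is about 3.40 on 3 degrees of freedom
res$p.value                  # about 0.04: significant at the 5% level
```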
SLIDE 75

Test for Interval Data: 2 levels

  • t-test (paired or unpaired)
  • Variance of the two groups
    – Equal: Student’s t-test
    – Unequal: Welch-Aspin test
SLIDE 76

t-distribution

  • Distribution of the sample mean divided by its estimated standard error
  • Normality assumed
  • Degrees of freedom
  • Probability
  • If the standard deviation is known, or the degrees of freedom are infinite, it becomes the z-test.
SLIDE 77

t-distribution and Normal Distribution

  • Green = 2 degrees of freedom (d.f.)
  • Blue = 10 d.f.
  • Red = infinite d.f. (the normal density)

[Figure: dnorm(seq(-10, 10, by = 0.01)) with t densities overlaid]
SLIDE 78

t-test Assumptions

  • Each of the two data sets follows a normal distribution
    – especially important when the sample size is small
  • The two data sets are sampled independently
  • These assumptions are rarely strictly met, so care is needed in strict discussions.
SLIDE 79

Robustness of Inference

  • How violations of the assumptions affect the test
  • Outliers
    – Robustness against outliers
  • Distribution
    – e.g. a mixture distribution of different s.d.
  • The t-test is somewhat robust to some violations
  • Some recommend pre-testing whether those assumptions hold in the data, but this practice is debated.
SLIDE 80

Case of Unequal Variance, equal sample size (n = 10, σ1/σ2 = 4)

  • Student
[Figure: histogram of p-values under the null]
  • Aspin-Welch
[Figure: histogram of p-values under the null]
SLIDE 81

T test exercise Sample Data

  • > ToothGrowth
  • len supp dose
  • 1 4.2 VC 0.5
  • 2 11.5 VC 0.5
  • 3 7.3 VC 0.5
  • 4 5.8 VC 0.5
  • 5 6.4 VC 0.5
  • 6 10.0 VC 0.5
  • 7 11.2 VC 0.5
SLIDE 82

T-test Exercise 2: Sample Data

  • d0 <- ToothGrowth$len[1:10]
    – Vitamin C, dose 0.5 mg
  • d1 <- ToothGrowth$len[11:20]
    – Vitamin C, dose 1.0 mg
SLIDE 83

T-test Using R: exercise 3

  • t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
  • > t.test(d0, d1)
  • Welch Two Sample t-test
  • data: d0 and d1
  • t = -7.4634, df = 17.862, p-value = 6.811e-07
  • alternative hypothesis: true difference in means is not equal to 0
  • 95 percent confidence interval:
  •  -11.265712  -6.314288
  • sample estimates:
  • mean of x mean of y
  •      7.98     16.77
SLIDE 84

T-test Exercise Using Random Numbers: alpha error

  • > t.test(rnorm(1e1), rnorm(1e1))
  • t = -0.9106, df = 17.085, p-value = 0.3752
  • t = 0.7685, df = 17.982, p-value = 0.4522
  • t = 0.8858, df = 12.341, p-value = 0.3927
  • t = -1.0532, df = 17.886, p-value = 0.3063
  • t = 0.0694, df = 17.496, p-value = 0.9455
  • t = -0.0784, df = 13.86, p-value = 0.9386
  • t = -0.606, df = 17.528, p-value = 0.5523
SLIDE 85

Summary

  Under the null hypothesis the p-value is uniformly distributed:

                   Nominal   Actual
  P-value >= 0.5   50%       about 50%
  P-value < 0.5    50%       about 50%
SLIDE 86

Test for Proportions

  • > prop.test(10,20,p=0.5)
  • 1-sample proportions test without continuity correction
  • data: 10 out of 20, null probability 0.5
  • X-squared = 0, df = 1, p-value = 1
  • alternative hypothesis: true p is not equal to 0.5
  • 95 percent confidence interval:
  • 0.299298 0.700702
  • sample estimates:
  • p
  • 0.5
SLIDE 87

One-sided (tailed) and Two-sided (tailed) Tests

  • One-sided
    – Null hypothesis μ = μ0
    – Alternative hypothesis μ > μ0
  • Two-sided
    – Null hypothesis μ = μ0
    – Alternative hypothesis μ ≠ μ0
  • One-sided p-value = ½ × two-sided p-value
SLIDE 88

Balanced vs. Unbalanced

  • Balanced
    – Equal numbers of samples or experiments assigned to each treatment
  • Unbalanced
    – Unequal numbers of samples or experiments assigned to each treatment
SLIDE 89

Ordinal Data

  • Several levels with an order
    – Excellent, good, fair
    – Ranks in a race
  • Interval data is ordinal data
  • But ordinal data is not always interval data
SLIDE 90

What Can Be Done with Ordinal Data

  • Wilcoxon rank sum test
    – Also known as the Mann-Whitney U test
    – Counterpart of the unpaired t-test for interval data
  • Wilcoxon signed rank test
    – Counterpart of the paired t-test for interval data
SLIDE 91

Wilcoxon Test: exercise

  • > wilcox.test(d0, d1)
  • Wilcoxon rank sum test with continuity correction
  • data: d0 and d1
  • W = 0, p-value = 0.0001796
  • alternative hypothesis: true location shift is not equal to 0
  • Warning message:
  • In wilcox.test.default(d0, d1) : cannot compute exact p-value with ties
SLIDE 92

Wilcoxon Test: unequal variance

[Figures: histograms of p-values under the null]

  Equal variance:   P(p < 0.05) = 0.044
  Unequal variance: P(p < 0.05) = 0.12
SLIDE 93

Comparison of the t-test and Wilcoxon Test

  • Detection power: about 98% of the t-test’s
  • For extremely small sample sizes and large differences
    – the t-test is preferable
SLIDE 94

Detection power of t-test

  • S.d. =1
  • μ1-μ2 = 1, 0.5, 0.1
SLIDE 95

Detection Power of the Wilcoxon Test (normal data, diff = 1)

  • S.d. = 1
  • μ1 − μ2 = 1

[Figure: power (powvec) versus sample size (mvec)]
SLIDE 96

T-test Detection power

  • > power.t.test(n=10,delta=1, sd=NULL, sig.level =0.05, power=.5)
  • Two-sample t test power calculation
  • n = 10
  • delta = 1
  • sd = 1.079782
  • sig.level = 0.05
  • power = 0.5
  • alternative = two.sided
  • NOTE: n is number in *each* group
SLIDE 97

Calculating sample size - t-test

  • power.t.test(delta=1, sd=1, sig.level =0.05, power=.95)
  • Two-sample t test power calculation
  • n = 26.98922
  • delta = 1
  • sd = 1
  • sig.level = 0.05
  • power = 0.95
  • alternative = two.sided
  • NOTE: n is number in *each* group
SLIDE 98

t.test Detection Power (β error): exercise

  • > t.test(rnorm(1e1, 0, sd=1.079782), rnorm(1e1, 1, sd=1.079782))
  • t = -1.0004, df = 10.752, p-value = 0.3391
  • t = 0.1531, df = 17.229, p-value = 0.8801

  (With these settings from the previous power calculation, repeated runs give P > 0.05 about as often as P <= 0.05.)
SLIDE 99

What is Nominal Data

  • Categorically discrete
  • Order of category is arbitrary
  • Red, blue, green
  • Origin(Region)
SLIDE 100

Analysis of Nominal Data

  • Contingency table
  • Binomial test
  • Chi-square test
  • Fisher’s exact test
  • McNemar test
SLIDE 101

Correlation: analysis of bivariate data

  • r = 0 (left), 1.0 (middle), 0.7 (right)

[Figures: scatter plots of (rr0, rr1), (rr0, rr0), and ((rr0 + rr1)/2, rr0)]
SLIDE 102

Exercise Formaldehyde

  • carb optden
  • 1 0.1 0.086
  • 2 0.3 0.269
  • 3 0.5 0.446
  • 4 0.6 0.538
  • 5 0.7 0.626
  • 6 0.9 0.782
SLIDE 103

Covariance

  Data A   Dev. A      Data B   Dev. B      D.A²       D.B²       D.A × D.B
  0.1      -0.416667   0.086    -0.371833   0.173611   0.138260   0.154931
  0.3      -0.216667   0.269    -0.188833   0.046944   0.035658   0.040914
  0.5      -0.016667   0.446    -0.011833   0.000278   0.000140   0.000197
  0.6       0.083333   0.538     0.080167   0.006944   0.006427   0.006681
  0.7       0.183333   0.626     0.168167   0.033611   0.028280   0.030831
  0.9       0.383333   0.782     0.324167   0.146944   0.105084   0.124264
  Mean: A = 0.516667, B = 0.457833; mean D.A² = 0.068056, mean D.B² = 0.052308, covariance = 0.059636

Correlation = covariance / ((s.d. of A) × (s.d. of B))
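The covariance table can be reproduced in R with the built-in Formaldehyde data (the variables carb and optden are the slide's Data A and Data B; this check is not on the original slide):

```r
a <- Formaldehyde$carb    # Data A
b <- Formaldehyde$optden  # Data B

# population (divisor n) covariance, as in the table
cov_pop <- mean((a - mean(a)) * (b - mean(b)))   # about 0.0596
r <- cov_pop /
  (sqrt(mean((a - mean(a))^2)) * sqrt(mean((b - mean(b))^2)))
c(cov_pop, r)             # r is about 0.9995, and equals cor(a, b)
```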
SLIDE 104

Exercise: Formaldehyde

> plot(Formaldehyde$carb, Formaldehyde$optden)
> cor(Formaldehyde$carb, Formaldehyde$optden)

[Figure: scatter plot of Formaldehyde$carb versus Formaldehyde$optden]
SLIDE 105

Correlation: exercise

  • > rr0 = rnorm(1e2, 0, 1)
  • > rr1 = rnorm(1e2, 0, 1)
  • > plot(rr0, rr1, xlim=c(-4,4), ylim=c(-4,4))
  • > plot(rr0, rr0, xlim=c(-4,4), ylim=c(-4,4))
  • > plot((rr0+rr1)/2, rr0, xlim=c(-4,4), ylim=c(-4,4))
SLIDE 106

Correlation Test

  • > cor.test(rr0, rr1)
  • Pearson's product-moment correlation
  • data: rr0 and rr1
  • t = 0.2471, df = 98, p-value = 0.8054
  • alternative hypothesis: true correlation is not equal to 0
  • 95 percent confidence interval:
  • -0.1723101 0.2202910
  • sample estimates:
  • cor
  • 0.02495251

[Figure: scatter plot of c(rr0) versus c(rr1)]
SLIDE 107

Regression and Correlation: Goodness-of-fit

  • R
    – Pearson’s correlation coefficient
  • R²
    – Coefficient of determination
    – Proportion of the variance in Y explained by a linear function of X
  • 1 − R²
    – Fraction of variance unexplained
SLIDE 108

Correlation and Outliers: exercise

  • > cor.test(c(rr0,5), c(rr1,5))
  • Pearson's product-moment correlation
  • data: c(rr0, 5) and c(rr1, 5)
  • t = 2.0633, df = 99, p-value = 0.0417
  • alternative hypothesis: true correlation is not equal to 0
  • 95 percent confidence interval:
  • 0.007928624 0.383282118
  • sample estimates:
  • cor
  • 0.2030532

[Figure: scatter plot of c(rr0, 5) versus c(rr1, 5); a single outlier makes the correlation significant]
SLIDE 109

Rank Correlation: ordinal data to ordinal data

  • Spearman or Kendall correlation
    – Spearman: convert the data to ranks and calculate Pearson’s correlation coefficient
    – Kendall: based on concordant and discordant pairs
  • They can also be used for interval data
SLIDE 110

Rank Correlation Test: exercise

  • > cor.test(c(rr0,5), c(rr1,-5), method="spearman")
  • Spearman's rank correlation rho
  • data: c(rr0, 5) and c(rr1, -5)
  • S = 164262, p-value = 0.6666
  • alternative hypothesis: true rho is not equal to 0
  • sample estimates:
  • rho
  • 0.04331974
  • > cor.test(c(rr0,5), c(rr1,-5), method="kendall")
SLIDE 111

Regression Analysis

  • Relation between two variables
    – Y ~ a + bX
  • Scatter plot
  • Univariate and multivariate
  • Linear regression
  • Non-linear regression
  • R² and prediction interval
SLIDE 112

Y ~ a + bX

  • Y
    – Dependent variable
    – Regressand
    – Endogenous variable
    – Response variable
    – Measured variable
  • X
    – Independent variable
    – Regressor
    – Exogenous variable
    – Explanatory variable
    – Covariate
    – Predictor variable

[Figure: scatter plot of x versus y with fitted line]
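Fitting Y ~ a + bX in R is a single call to `lm`; with the Formaldehyde data used earlier (this example is not on the original slide):

```r
fit <- lm(optden ~ carb, data = Formaldehyde)  # optden = a + b * carb
coef(fit)                 # intercept a about 0.0051, slope b about 0.8763
summary(fit)$r.squared    # R^2, very close to 1 for this calibration line
```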
SLIDE 113

Regression: Predicted and mean response

  • Standard deviation (S.D.)
  • Standard error of the mean (S.E.M.)

[Figure: regression line with two interval bands; outer: prediction, inner: mean response]
SLIDE 114

Regression: assumptions

  • The error has mean zero and constant variance across observations
  • If not, try transforming a variable
    – exp(X)
    – log(X)
    – sqrt(X)
  • Or use weighted least squares
SLIDE 115

Regression: Robustness against outliers

  • Ridge/Lasso regression
  • Least absolute deviation regression
SLIDE 116

Regression by Polynomials: when the relation is not a straight line

[Figure: scatter plot with fits using rr and rr * rr (linear and quadratic terms)]
SLIDE 117

Regression Diagnostics

  • Residual
  • Standardized residual
    – Residual divided by its standard error
  • Studentized residual
    – Remove the data point, refit the regression, then divide the residual by its standard error
  • Durbin-Watson statistic
  • Graphical methods
SLIDE 118

Regression Analysis: fitting binary data to a continuous function

  • Logistic regression
    – Logistic function
    – logit p = log(p/(1−p)) = a + bx
  • Probit regression
    – Inverse of the cumulative distribution function of the normal distribution

[Table: binary outcome (Positive = 0 or 1) observed at levels 1-9]
SLIDE 119

Logistic Regression

  • logit p = log(p/(1−p)) = a + bx

[Table: binary outcome (Positive = 0 or 1) at levels 1-9; figure: fitted probabilities res$fit versus level]
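A logistic regression of this kind is fitted in R with `glm` and `family = binomial`. The data below are hypothetical (made up for illustration, since the slide's exact table is not fully recoverable): a binary outcome at levels 1 to 9, with positives becoming more frequent at higher levels.

```r
x <- 1:9
y <- c(0, 0, 1, 0, 1, 0, 1, 1, 1)      # hypothetical binary outcomes

fit <- glm(y ~ x, family = binomial)   # logit p = a + b*x
coef(fit)                              # slope b is positive for these data
round(fitted(fit), 2)                  # fitted probabilities rise with x
```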
SLIDE 120

Other Data

  • Poisson regression
    – Count data
  • Multinomial logistic/probit regression
  • Ordered logistic/probit regression
SLIDE 121

Design of Experiments: Principles by R.A. Fisher

  • Comparison
  • Randomization
  • Replication
  • Blocking
  • Orthogonality
  • Factorial experiments
SLIDE 122

Completely Randomised Design

  • > data(InsectSprays)

[Table: insect counts under sprays A, B, and C from the InsectSprays data]
SLIDE 123

Analysis of Several Sets of Data: Analysis of Variance

  • Compare the variance between group means with the variance within groups
  • Extension of the two-sample test to multiple samples
  • One-way ANOVA
  • Two-way ANOVA
  • Repeated-measures ANOVA
    – Measurements on the same block or subject
  • 2 or more levels
SLIDE 124

One-way ANOVA

  • 1 interval and 1 nominal variable

  Group 1   1.1  1.3  1.4
  Group 2   1.3  1.7  1.5
  Group 3   1.2  1.1  1.0
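The small three-group example on this slide can be analysed with `aov` (the R call is a supplementary sketch, not on the original slide):

```r
y <- c(1.1, 1.3, 1.4,   # Group 1
       1.3, 1.7, 1.5,   # Group 2
       1.2, 1.1, 1.0)   # Group 3
g <- factor(rep(c("g1", "g2", "g3"), each = 3))

summary(aov(y ~ g))     # F is about 4.95 on 2 and 6 degrees of freedom
```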
SLIDE 125

t-test and ANOVA: one-way ANOVA and the unpaired t-test

  • Compare the results
  • Generate 10 normal random numbers
  • > r0 <- rnorm(10, 0, 1)
    – -0.73 0.61 1.03 1.81 -0.05 -1.21 1.04 0.65 -0.99 0.18
  • > r1 <- rnorm(10, 1, 1)
    – 1.94 0.82 0.23 2.07 -1.04 0.18 -1.80 1.35 -0.06 -0.61
SLIDE 126

Output for (Student’s) t-test

  • > t.test(r1, r0, var.equal=TRUE)
  • Two Sample t-test
  • data: r1 and r0
  • t = 2.827, df = 18, p-value = 0.01117
  • alternative hypothesis: true difference in means is not equal to 0
  • 95 percent confidence interval:
  • 0.2579993 1.7510822
  • sample estimates:
  • mean of x mean of y
  • 1.2375198 0.2329791
SLIDE 127

Output for anova

  • > group <- c(rep(0,10), rep(1,10))
  • > rs <- c(r0,r1)
  • > summary(aov(rs ~ group))
slide-128
SLIDE 128

ANOVA result

  • Df Sum Sq Mean Sq F value p-value
  • group 1 5.046 5.046 7.992 0.0112
  • Residuals 18 11.364 0.631
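Note that the ANOVA reproduces the t-test: the table gives F = 7.992 with p = 0.0112, and the earlier t-test gave t = 2.827 with the same p-value; indeed 2.827² ≈ 7.99. A sketch with seeded data (any seed; the values differ from the slides' random draws):

```python
import numpy as np
from scipy.stats import ttest_ind, f_oneway

rng = np.random.default_rng(0)
r0 = rng.normal(0, 1, 10)
r1 = rng.normal(1, 1, 10)

t, p_t = ttest_ind(r1, r0, equal_var=True)   # Student's two-sample t-test
F, p_F = f_oneway(r0, r1)                    # one-way ANOVA, two groups

# With exactly two groups: F = t^2 and the p-values coincide
print(t**2 - F, p_t - p_F)
```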

[Diagram: each observation s_ij is decomposed into the grand mean m12, the group effect (m_i - m12), and the within-group deviation (s_ij - m_i)]

slide-129
SLIDE 129

F-value

  • Distribution of the ratio of the sample variance of dataset A to the sample variance of dataset B

slide-130
SLIDE 130

F-table

alpha = 0.1

df2\df1     1     2     5    20
      1    40    50    57    62
      2   8.5   9.0   9.3   9.4
      5   4.1   3.8   3.5   3.2
     20   3.0   2.6   2.2   1.8

alpha = 0.05

df2\df1     1     2     5    20
      1   161   200   230   248
      2    19    19    19    19
      5   6.6   5.8   5.1   4.6
     20   4.4   3.5   2.7   2.1
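The table entries can be reproduced from the F quantile function; for instance with SciPy:

```python
from scipy.stats import f

# alpha = 0.05 entries: the upper 5% quantile of F with (df1, df2) d.o.f.
c_1_1 = f.ppf(0.95, 1, 1)       # ~161
c_5_5 = f.ppf(0.95, 5, 5)       # ~5.05
c_20_20 = f.ppf(0.95, 20, 20)   # ~2.12

# alpha = 0.1 entry for df1 = 1, df2 = 20
c_10 = f.ppf(0.90, 1, 20)       # ~2.97
print(c_1_1, c_5_5, c_20_20, c_10)
```

In R the same values come from `qf(0.95, df1, df2)`.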

slide-131
SLIDE 131

ANOVA with unequal variances

  • > oneway.test(rs ~ group)
  • One-way analysis of means (not assuming equal variances)

  • data: rs and group
  • F = 7.9918, num df = 1.000, denom df = 13.923, p-value = 0.01351

slide-132
SLIDE 132

t.test by Aspin-Welch

  • > t.test(r1, r0)
  • Welch Two Sample t-test
  • data: r1 and r0
  • t = 2.827, df = 13.923, p-value = 0.01351 # slightly larger p-value

  • alternative hypothesis: true difference in means is not equal to 0

  • 95 percent confidence interval:
  • 0.2420163 1.7670653
  • sample estimates:
  • mean of x mean of y
  • 1.2375198 0.2329791
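Notice that both outputs show the same t = 2.827: with equal group sizes the pooled and unpooled standard errors coincide, so Student's and Welch's t statistics are identical and only the degrees of freedom (and hence the p-value) differ. A sketch with seeded data:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
r0 = rng.normal(0, 1, 10)
r1 = rng.normal(1, 1, 10)

t_student, p_student = ttest_ind(r1, r0, equal_var=True)   # pooled variance
t_welch, p_welch = ttest_ind(r1, r0, equal_var=False)      # Aspin-Welch

# Equal sample sizes: identical t statistic, different degrees of freedom
print(t_student - t_welch)
```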
slide-133
SLIDE 133

Exercise Plant Growth

  • >data(PlantGrowth)
  • > PlantGrowth
  • weight group
  • 1 4.17 ctrl
  • 2 5.58 ctrl
  • 3 5.18 ctrl
slide-134
SLIDE 134

Exercise ANOVA

  • > anova(aov(weight ~ group, data = PlantGrowth))
  • Analysis of Variance Table
  • Response: weight
  • Df Sum Sq Mean Sq F value Pr(>F)
  • group 2 3.7663 1.8832 4.8461 0.01591 *
  • Residuals 27 10.4921 0.3886
  • ---
  • Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

slide-135
SLIDE 135

Multiple Comparison

  • Tukey’s range test

– Also known as Tukey’s honestly significant difference (HSD) test – Compares all possible pairs of means

  • Scheffé’s test
slide-136
SLIDE 136

Exercise Tukey HSD

  • > fit <- aov(weight ~ group, data = PlantGrowth)
  • > TukeyHSD(fit)
  • Tukey multiple comparisons of means
  • 95% family-wise confidence level
  • Fit: aov(formula = weight ~ group, data = PlantGrowth)
  • $group
  • diff lwr upr p adj
  • trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
  • trt2-ctrl 0.494 -0.1972161 1.1852161 0.1979960
  • trt2-trt1 0.865 0.1737839 1.5562161 0.0120064
  • For comparisons against a control, Dunnett’s test is better, but it requires installing the multcomp package (internet connection needed)

slide-137
SLIDE 137

Analysis of variance

  • Assumptions
  • Normality of data

– Deviations (residuals) are normally distributed

  • Homoscedasticity

– The variances of the groups are equal

  • All data are independent
slide-138
SLIDE 138

ANOVA for ordinal data

  • One-way

– Kruskal-Wallis – Van der Waerden test

  • One-way repeated measure

– Friedman test

slide-139
SLIDE 139

Exercise Kruskal-Wallis

  • >kruskal.test(weight~group,data=PlantGrowth)

  • Kruskal-Wallis rank sum test
  • data: weight by group
  • Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
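The same rank test is available outside R; a sketch with SciPy's `kruskal` (the weights below are transcribed from R's built-in PlantGrowth dataset):

```python
from scipy.stats import kruskal

# PlantGrowth weights, by group (transcribed from R's built-in dataset)
ctrl = [4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14]
trt1 = [4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69]
trt2 = [6.31, 5.12, 5.54, 5.50, 5.37, 5.29, 4.92, 6.15, 5.80, 5.26]

# Kruskal-Wallis rank sum test across the three groups
H, p = kruskal(ctrl, trt1, trt2)
print(H, p)
```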

slide-140
SLIDE 140

Comparison of Wilcoxon and Kruskal-Wallis tests

  • >wilcox.test(weight~group,data=PlantGrowth,subset=group!='trt1',exact=FALSE, correct=FALSE)
  • W = 25, p-value = 0.05878
  • >kruskal.test(weight~group,data=PlantGrowth,subset=group!='trt1')
  • Kruskal-Wallis chi-squared = 3.5714, df = 1, p-value = 0.05878

slide-141
SLIDE 141

One-way repeated measures

        Age 1  Age 2  Age 3  Age 4
Tree 1  30     58     87     115
Tree 2  33     69     111    156
Tree 3  30     51     75     108

  • data(Orange)
slide-142
SLIDE 142

Subset data

  • Compare circumference at age 118 and age 484

  • > Orange.1<-subset(Orange, age==118)
  • > Orange.2<-subset(Orange, age==484)
slide-143
SLIDE 143

Paired t-test

  • >t.test(Orange.1$circumference,Orange.2$circumference,paired=TRUE)

  • Paired t-test
  • data: Orange.1$circumference and Orange.2$circumference
  • t = -8.6768, df = 4, p-value = 0.000971
  • alternative hypothesis: true difference in means is not equal to 0
  • 95 percent confidence interval:
  • -35.37558 -18.22442
  • sample estimates:
  • mean of the differences
  • -26.8
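The paired t-test is easy to reproduce elsewhere; a sketch with SciPy's `ttest_rel` on the five Orange trees at the two ages:

```python
from scipy.stats import ttest_rel

# Orange tree circumferences (trees 1-5) at ages 118 and 484 days
age118 = [30, 33, 30, 32, 30]
age484 = [58, 69, 51, 62, 49]

# Paired t-test: t ~ -8.677, p ~ 0.00097, as in the R output above
t, p = ttest_rel(age118, age484)
print(t, p)
```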
slide-144
SLIDE 144

ANOVA

  • Repeated measures
  • > summary(aov(circumference ~ age + Error(Tree/age), data=rbind(Orange.1,Orange.2)))

  • Error: Tree
  • Df Sum Sq Mean Sq F value Pr(>F)
  • Residuals 4 179.4 44.85
  • Error: Tree:age
  • Df Sum Sq Mean Sq F value Pr(>F)
  • age 1 1795.6 1795.6 75.29 0.000971 ***
  • Residuals 4 95.4 23.8
  • ---
  • Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
slide-145
SLIDE 145

Wilcoxon signed rank

  • >wilcox.test(Orange.1$circumference,Orange.2$circumference,paired=TRUE)

  • Wilcoxon signed rank test
  • data: Orange.1$circumference and Orange.2$circumference

  • V = 0, p-value = 0.0625
  • alternative hypothesis: true location shift is not equal to 0
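A sketch of the same signed-rank test with SciPy's `wilcoxon`; with only n = 5 pairs, all shifted in the same direction, the exact two-sided p-value of 2/2⁵ = 0.0625 is the smallest attainable, which is why the test cannot reach 0.05 here:

```python
from scipy.stats import wilcoxon

age118 = [30, 33, 30, 32, 30]
age484 = [58, 69, 51, 62, 49]

# Exact signed-rank test (no ties, no zero differences)
stat, p = wilcoxon(age118, age484)
print(stat, p)  # p = 0.0625
```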

slide-146
SLIDE 146

Friedman test

  • friedman.test(circumference ~ age | Tree, data=rbind(Orange.1,Orange.2))

  • Friedman rank sum test
  • data: circumference and age and Tree
  • Friedman chi-squared = 5, df = 1, p-value = 0.02535
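SciPy's `friedmanchisquare` requires at least three repeated measurements, so this sketch adds a third age (664 days) from the Orange data rather than reproducing the two-age test above:

```python
from scipy.stats import friedmanchisquare

# Circumference of Orange trees 1-5 at ages 118, 484, and 664 days
a118 = [30, 33, 30, 32, 30]
a484 = [58, 69, 51, 62, 49]
a664 = [87, 111, 75, 112, 81]

# Every tree grows monotonically, so within-block ranks are always 1, 2, 3
chi2, p = friedmanchisquare(a118, a484, a664)
print(chi2, p)  # chi-squared = 10, p ~ 0.0067
```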

slide-147
SLIDE 147

Factorial Design

  • Two-way ANOVA

              Low Dose   Middle Dose  High Dose
Vitamin C     4.2/11.5   16.5/16.5    23.6/18.5
Orange Juice  15.2/21.5  19.7/23.3    25.5/26.4

> ToothGrowth

slide-148
SLIDE 148

ANOVA on ToothGrowth (2-way)

  • > summary(aov(len ~ supp*dose, data=ToothGrowth))

  • Df Sum Sq Mean Sq F value Pr(>F)
  • supp 1 205.3 205.3 12.317 0.000894 ***
  • dose 1 2224.3 2224.3 133.415 < 2e-16 ***
  • supp:dose 1 88.9 88.9 5.333 0.024631 *
  • Residuals 56 933.6 16.7
  • ---
  • Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
slide-149
SLIDE 149

Test for heteroscedasticity

  • F-test
  • Cochran’s C
  • Bartlett’s test
  • Levene’s test
  • Brown-Forsythe test
  • Goldfeld-Quandt test
  • Breusch-Pagan test
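Two of these tests are directly available in SciPy; a sketch on seeded samples whose standard deviations deliberately differ by a factor of two:

```python
import numpy as np
from scipy.stats import bartlett, levene

rng = np.random.default_rng(2)
a = rng.normal(0, 1, 100)   # sd = 1
b = rng.normal(0, 2, 100)   # sd = 2: clearly heteroscedastic

stat_b, p_b = bartlett(a, b)   # assumes normality
stat_l, p_l = levene(a, b)     # more robust to non-normality
print(p_b, p_l)                # both reject equal variances
```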

[Histograms of rnorm(10000, 0, 2) and rnorm(10000, 0, 1): the sample with the larger standard deviation shows a visibly wider spread]

slide-150
SLIDE 150

Cochran’s C

  • Data 1=(1,2,1,3,1,2)
  • Data 2=(1,3,5,2,5,2)
  • Data 3=(1,4,1,5,1,2)
  • Detects high within-group variation
  • H0: all variance equal
  • H1: at least one variance is unequal, or an outlier is included
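The statistic itself is simple: the largest group variance divided by the sum of the group variances. A minimal sketch (the significance test then compares C against a critical value, as `cochran.test` does in R):

```python
import numpy as np

def cochran_C(*groups):
    """Cochran's C: largest sample variance over the sum of sample variances."""
    variances = [np.var(g, ddof=1) for g in groups]
    return max(variances) / sum(variances)

d1 = [1, 2, 1, 3, 1, 2]
d2 = [1, 3, 5, 2, 5, 2]
d3 = [1, 4, 1, 5, 1, 2]

C = cochran_C(d1, d2, d3)
print(C)  # ~0.469; under H0 with k = 3 equal-variance groups, expect ~1/3
```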

slide-151
SLIDE 151

Cochran’s C

  • >library(outliers)
  • > cochran.test(rs ~ group)
  • Cochran test for outlying variance
  • data: rs ~ group
  • C = 0.7706, df = 10, k = 2, p-value = 0.0856
  • alternative hypothesis: Group 0 has outlying variance
  • sample estimates:
  • 0 1
  • 0.9729731 0.2896914
slide-152
SLIDE 152

Extension of ANOVA

  • ANCOVA

– Analysis of Covariance

  • MANOVA

– Multivariate ANOVA

slide-153
SLIDE 153

Mechanism of missing data

  • Missing Completely at Random
  • Missing at Random
  • Impossible to Observe
  • Censored data (Left censored, Right censored)
slide-154
SLIDE 154

Statistical packages

  • Open source

– Free redistribution, no royalty – Source code open

  • Freeware

– Free distribution, no royalty

  • Proprietary

– Requires royalty

slide-155
SLIDE 155

Excel 2010

  • One-way ANOVA
  • Two-way ANOVA
  • Two-way with replication
  • Correlation
  • Covariance
  • Basic Statistics
slide-156
SLIDE 156

Excel - 2

  • Generating random numbers
  • Rank and percentile
  • Regression
  • Sampling
  • Paired t-test
  • t-test (equal variances)
  • t-test (unequal variances)
  • z-test
slide-157
SLIDE 157

Excel - 3

  • Summary statistics
  • F test
  • Fourier analysis
  • Histogram
  • Moving average
slide-158
SLIDE 158

Paired t

slide-159
SLIDE 159

Two way ANOVA

slide-160
SLIDE 160

Exporting Data from Excel

  • Prepare the data
  • 1st line = title
  • 2nd and later lines = data

slide-161
SLIDE 161

Exporting Data from Excel

  • Save the data in CSV format

slide-162
SLIDE 162

Exporting data from R to Excel

  • >write.csv(data, "filename")
slide-163
SLIDE 163

Using real data Read the csv file

  • >sp = read.csv("filename")
  • To see the file path, drag and drop the file onto the console window
  • >load("C:\\Users\\yoshiEPSON\\Documents\\ICCAGSA2013.docx")
  • Change "load" to "read.csv"
  • >read.csv("C:\\Users\\yoshiEPSON\\Documents\\ICCAGSA2013.docx")