SLIDE 1 1
Determining Sample Size for Linear Models in Excel EERA Round Table Presentation 19 February 2016 Bryan W. Griffin Georgia Southern University bwgriffin@georgiasouthern.edu Sample Size in Excel file download location http://www.bwgriffin.com/samplesize Content
- 1. Purpose
- 2. Framework
- 3. Sample Size Worksheet Explanation
- a. Download Source
- b. Worksheet Content
- c. Excel Security Warning—Macros
- 4. Illustrated Examples of Sample Size Determination
- a. Overview of Use
- b. Independent Samples t-test
- c. Pearson Correlation Coefficient
- d. One-way ANOVA
- e. One-way ANCOVA
- f. Regression
- g. Regression with an Incremental Test of a Predictor Set
- 5. Illustrated Examples of Power Determination
- a. Overview of Use
- b. Independent Samples t-test
- c. Pearson Correlation Coefficient
- d. One-way ANOVA
- e. One-way ANCOVA
- f. Regression
- g. Regression with an Incremental Test of a Predictor Set
- 1. Purpose
The purpose of this presentation is to provide an Excel workbook developed to calculate sample sizes for linear models given alpha, power, effect size, and model degrees of freedom. This is the same format for sample size determination used by Cohen (1988) in his popular book Statistical Power Analysis for the Behavior Sciences. The workbook can also be used to calculate power given alpha, effect size, sample size, and model degrees of freedom. While other software exists for computing both sample size and power, this Excel workbook has some instructional, and research, benefit in that it is very fast, very easy to use, has a familiar interface with Excel, and shows both inputs and results on the same screen so students can see results without scrolling or changing screens. As an instructional tool in Excel, having both inputs and outputs on the same screen streamlines the process and brings focus not on how to use the software, but instead on how manipulating alpha, power, effect size, and model degrees of freedom impact sample size.
SLIDE 2 2
In short, this freely available Excel workbook may prove useful to instructors, students, and researchers who encounter situations in which sample size must be determined.
I regularly teach courses in which students are required to learn how to determine appropriate sample sizes for various designs and statistical models. The importance of sample size considerations has been recognized among academics and professionals engaged in empirical research (e.g., Freiman, Chalmers, Smith, & Kuebler, 1978; Hoenig & Heisey, 2001; Moher, Dulberg, & Wells, 1994; Wilkinson, 1999). While texts for sample size determination have existed for some time (Cohen, 1988; Kraemer & Theimann, 1987; Murphy & Myors, 2003), the widespread use
- f computers can and has facilitated power and sample size determination.
Over the past 10 to 15 years software for power and sample size calculations has appeared. For example, there are several commercial products designed for sample size estimation such as SamplePower (http://www- 03.ibm.com/software/products/en/spss-samplepower), PASS (www.ncss.com), and nQuery Advisor (http://www.statsols.com). There are also freely available to use on-line applets written in JAVA (www.java.com) to help researchers find sample sizes (e.g., www.statpages.org/#Power; www.cs.uiowa.edu/~rlenth/Power; http://danielsoper.com/statcalc3/default.aspx ), and also free to use software that can be installed on one’s computer (e.g., G*Power at http://www.gpower.hhu.de ; PS at http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). I found, however, that most of my students have access to and understand Microsoft Excel (https://products.office.com/en-us/excel), so to allow them work with Excel—software with which they are already familiar—I created a workbook that computes both sample size and power given specific input (i.e., alpha, effect size, model degrees of freedom). These inputs are used with a noncentral F distribution (Patnaik, 1949) to calculate
- n. Excel does not offer an in-built noncentral F distribution function, so that distribution was added via Visual Basic
code. In addition to the sample size calculation using the noncentral F distribution, I also added sample size estimates, and power estimates, obtained from four noncentral F distribution approximations: Laubscher’s cube root (Laubscher, 1960; Severo & Zelen, 1960), Laubscher’s square root (Laubscher, 1960), Tiku’s 3-moment (Tiku, 1965), and Patnaik’s 2-moment (Patnaik, 1949). I included these four for historical interest; until fast computers became widely available these approximations were used for power and sample size estimates because of their relative ease in calculation compared with the noncentral F.
- 3. Sample Size Workbook Explanation
(a) Download Source The Excel file can be downloaded from this page: http://www.bwgriffin.com/samplesize
SLIDE 3
3
(b) Workbook Content The workbook contains three worksheets (see tabbed labels at bottom left of Excel window). The first is labeled “Sample Size” and should be the worksheet to appear when the workbook is opened. This sheet is capable of providing sample size calculations for most linear models with independent data. These include, for example, independent samples t-tests, Pearson’s product moment correlation, linear regression models, ANOVA and ANCOVA models. The second worksheet is “ES Conversion” and provides a tool to convert effect sizes among different metrics (d, r,r2, f, f2, and η2). The third worksheet is entitled “Power” and is designed to calculate power for known alpha level, degrees of freedom, and n. (c) Excel Security Warning—Macros Once downloaded, open the workbook normally in Excel. The workbook was developed in Microsoft Excel version 2003, but also functions in more recent versions of Excel (e.g., 2010, 2016). When opening the workbook, the user is likely to experience a security warning that macros exist. The workbook executes macros so those must be enabled. The macros perform the necessary noncentral F calculations, sample size calculations, and also control user input. In Excel 2003 a pop-up window may appear that allows users to enable macros. In more recent versions of Excel, a security warning may appear directly below the ribbon. On the “Security Warning” ribbon, select Options and then select “Enable this content.”
SLIDE 4 4
- 4. Illustrated Examples of Sample Size Determination
(a) Overview of Use There are five steps to finding sample size:
- 1. Enter alpha error rate (often .05 or .01) – probability of a Type 1 error
- 2. Enter power level (often .80, .90, or .95) – probability of detecting an effect in the sample
- 3. Enter model degrees of freedom (df1; e.g., df = 1 for two group t-test or df1 = 2 for ANOVA with 3 groups)
- 4. Enter the effect size of interest in the appropriate box (d, r, r2, f, f2)
- 5. Click on the “Find n” button under the appropriate effect size box
The image below shows each step as indicated by an arrow. A note on model degrees of freedom. The value sought here is the number of parameter estimates required from a linear model perspective. So estimating a two group t-test via a liner model would require one degree of freedom to estimate the slope or mean difference. For a Pearson correlation, it is similarly one degree of free to estimate the slope in a regression model. If one wishes to compare three groups, the model df would be 2 (two dummy variables required, or df1 = 2 in ANOVA).
SLIDE 5
5
(b) Independent samples t-test Research Question: Is there a difference in motivation to read between females and males in 6th grade? Type 1 Error Rate: α = .05 Desired Power Level: Power = .90 Anticipated Effect Size: d = .35 Model Degrees of Freedom df = 1 for a two-group t-test Resulting total sample size is n = 346 or 346 / 2 = 173 per group. See the image below for the inputs of this example.
SLIDE 6
6
(c) Pearson Correlation Coefficient Research Question: Do mathematics test anxiety and mathematics self-efficacy correlate? Type 1 Error Rate: α = .01 Desired Power Level: Power = .95 Anticipated Effect Size: r = -.45 Model Degrees of Freedom df = 1 for a Pearson correlation Resulting total sample size is n = 74. See the image below for the inputs of this example. Note – n differs from that provided by Cohen (1988); Cohen’s tables of power and n for Pearson r slightly overstate the sample size needed for a given power level (often a difference in sample size of 1 to 3).
SLIDE 7
7
(d) One-way ANOVA Research Question: Are there differences in job satisfaction levels among administrators, faculty, and staff? Type 1 Error Rate: α = .05 Desired Power Level: Power = .80 Anticipated Effect Size: f = .41 Model Degrees of Freedom df = 2 for ANOVA with 3 groups Resulting total sample size is n = 61, so about 61 / 3 = 20 to 21 per group. See the image below for the inputs of this example.
SLIDE 8
8
(e) One-way ANCOVA Research Question: Are there differences in job satisfaction among administrators, faculty, and staff once salary is taken into account? Type 1 Error Rate: α = .05 Desired Power Level: Power = .80 Anticipated Effect Size: (a) Original f = .41 among groups without salary covariate included in model. (b) Covariate effect size: meta-analysis revealed r = .15 between salary and job satisfaction (c) The adjusted f taking into account the possible covariate contribution to model is Adjusted f =
𝑔 √(1−𝑠2) = .41 √(1−.152) = .41 √(1−.0225) = .41 √.9775 = .41 .9886 = .4147
Model Degrees of Freedom df = 2 for ANOVA with 3 groups (essentially ignore the covariate df here) Resulting total sample size is n = 60, so 20 per group. The covariate provides little benefit to reducing sample size in this example due to weak correlation between outcome and covariate. See the image below for the inputs of this example.
SLIDE 9
9
(f) Regression Research Question: Do mathematics test anxiety and mathematics self-efficacy predict mathematics test scores? Type 1 Error Rate: α = .01 Desired Power Level: Power = .90 Anticipated Effect Size: Prior research shows both predictors produce a model R2 of about .40. One may enter the R2 value directly into effect size box labeled r2, or compute the effect size measure f2 as outline by Cohen (1988). Use of either effect size will produce the same sample size. R2 = r2 = .40 Model Degrees of Freedom df = 2 since there are two predictors in the regression equation Resulting total sample size is n = 32. See the image below for the inputs of this example.
SLIDE 10 10
(g) Regression with an Incremental Test of a Predictor Set Research Question: Do mathematics test anxiety and mathematics self-efficacy predict mathematics test scores once background characteristics (hours studied, prior mathematics achievement) are controlled? Type 1 Error Rate: α = .01 Desired Power Level: Power = .90 Anticipated Effect Size: Assume that background characteristics produce an R2 = .25 when predicting test scores, and the addition
- f the anxiety and self-efficacy measures to the model raise R2 to .45. This means anxiety and self-efficacy
increase the model R2 by .20, i.e., Full model R2 with all predictors = .45 Reduced model R2 with only background characteristics = .25 Model R2 difference = ΔR2(anxiety,efficacy) = .45 - .25 = .20 The question of interest is what sample size is needed to ensure adequate power to test for the contribution of anxiety and self-efficacy once background characteristics are controlled. Unlike the previous example, the effect sizes for R2 and f2 cannot be used interchangeably. The effect size f2 is needed here to capture the effect of adding anxiety and efficacy to the model containing background characteristics: f2 (anxiety, efficacy) = ΔR2(anxiety, efficacy) / (1 – R2(full model)) = .20 / (1 – .45) = .20 / .55 = .3636 Model Degrees of Freedom df = 2 since we are adding two predictors to a model that already contains background characteristics Resulting total sample size is n = 53. See the image below for the inputs of this example.
SLIDE 11
11
SLIDE 12 12
- 5. Illustrated Examples of Power Determination
In addition to sample size determination, the Excel spreadsheet can also be used to assess power for a given study. To access the power sheet, click the Power tab on the bottom left of the sheet. See the figure below. (a) Overview of Use There are five steps to finding power:
- 1. Enter alpha error rate (often .05 or .01) – probability of a Type 1 error
- 2. Enter the study sample size, n
- 3. Enter model degrees of freedom (df1; e.g., df = 1 for two group t-test or df1 = 2 for ANOVA with 3 groups)
- 4. Enter the effect size of interest in the appropriate box (d, r, r2, f, f2)
- 5. Click on the “Find Power” button under the appropriate effect size box
The image below shows each step as indicated by an arrow.
SLIDE 13
13
(b) Independent samples t-test Research Question: Is there a difference in motivation to read between females and males in 6th grade? Type 1 Error Rate: α = .05 Sample Size: N = 346 Model Degrees of Freedom df = 1 for a two-group t-test Anticipated Effect Size: d = .35 Resulting power is .9008. See the image below for the inputs of this example.
SLIDE 14
14
(c) Pearson Correlation Coefficient Research Question: Do mathematics test anxiety and mathematics self-efficacy correlate? Type 1 Error Rate: α = .01 Sample Size: N = 74 Model Degrees of Freedom df = 1 for a Pearson correlation Anticipated Effect Size: r = -.45 Resulting power is .95135. See the image below for the inputs of this example.
SLIDE 15
15
(d) One-way ANOVA Research Question: Are there differences in job satisfaction levels among administrators, faculty, and staff? Type 1 Error Rate: α = .05 Sample Size: N = 61 Model Degrees of Freedom df = 2 for ANOVA with 3 groups Anticipated Effect Size: f = .41 Resulting power is .80426. See the image below for the inputs of this example.
SLIDE 16 16
(e) One-way ANCOVA Research Question: Are there differences in job satisfaction levels among administrators, faculty, and staff once salary is taken into account? Type 1 Error Rate: α = .05 Sample Size: N = 60 Model Degrees of Freedom df = 2 for ANOVA with 3 groups (essentially ignore the covariate df here) Anticipated Effect Size: (a) Original f = .41 among groups without salary covariate included in model. (b) Covariate effect size: meta-analysis revealed r = .15 between salary and job satisfaction (c) The adjusted f taking into account the possible covariate contribution to model is Adjusted f =
𝑔 √(1−𝑠2) = .41 √(1−.152) = .41 √(1−.0225) = .41 √.9775 = .41 .9886 = .4147
Resulting power is .80649. The covariate provides little benefit to increase power due to weak correlation between
- utcome of covariate. See the image below for the inputs of this example.
SLIDE 17
17
(f) Regression Research Question: Do mathematics test anxiety and mathematics self-efficacy predict mathematics test scores? Type 1 Error Rate: α = .01 Sample Size: N = 32 Model Degrees of Freedom df = 2 since there are two predictors in the regression equation Anticipated Effect Size: Prior research shows both predictors produce a model R2 of about .40. One may enter the R2 value directly into effect size box labeled r2, or compute the effect size measure f2 as outline by Cohen (1988). Use of either effect size will produce the same sample size. R2 = r2 = .40 Resulting power is .91286. See the image below for the inputs of this example.
SLIDE 18 18
(g) Regression with an Incremental Test of a Predictor Set Research Question: Do mathematics test anxiety and mathematics self-efficacy predict mathematics test scores once background characteristics (hours studied, prior mathematics achievement) are controlled? Type 1 Error Rate: α = .01 Sample Size: N = 53 Model Degrees of Freedom df = 2 since we are adding two predictors to a model that contains background characteristics Anticipated Effect Size: Assume that background characteristics produce an R2 = .25 when predicting test scores, and the addition
- f the anxiety and self-efficacy measures to the model raise R2 to .45. This means anxiety and self-efficacy
raise the model R2 by .20, i.e., Full model R2 with all predictors = .45 Reduced model R2 with only background characteristics = .25 Model R2 difference = ΔR2(anxiety,efficacy) = .45 - .25 = .20 The question of interest is what sample size is needed to ensure adequate power to test for the contribution of anxiety and self-efficacy once background characteristics are controlled. Unlike the previous example, the effect sizes for R2 and f2 cannot be used interchangeably. The effect size f2 is needed here to capture the effect of adding anxiety and efficacy to the model containing background characteristics: f2 (anxiety, efficacy) = ΔR2(anxiety, efficacy) / (1 – R2(full model) ) = .20 / (1 – .45) = .20 / .55 = .3636 Resulting power is .9023. See the image below for the inputs of this example.
SLIDE 19
19
References Cohen, J. (1988). Statistical Power Analysis for the Behavior Sciences (2nd edition). Hillsdale, New Jersey: Erlbaum. Cohen, J., & Nee, J.C.M. (1987). A comparison of two noncentral F approximations, with applications to power analysis in set correlation. Multivariate Behavioral Research, 22, 483-490. Freiman J.A., Chalmers T.C., Smith H. Jr., Kuebler R.R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials. N Engl J Med, 690-694. Hoenig, J.M, & Heisey, D.M. (2001). The abuse of power. The American Statistician, 55, 19-24. Kraemer, H.C. and Theimann, S. (1987). How many subjects? Statistical power analysis in research. Newbury Park, CA: Sage. Moher, D., Dulberg, C., & Wells, G. (1994). Statistical power, sample size, and their reporting in randomized controlled trials. JAMA, 272, 122-124. Murphy, K.R. and Myors, B. (2003). Statistical Power Analysis: A Simple and General Model for Traditional and Modern Hypothesis Tests, 2nd Edition. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc Patnaik, N.P. (1949). The non-central χ2 and F distributions and their applications. Biometrika, 36, 202-232. Severo & Zelen (1960). Normal approximations to the chi-square and non-central F probability functions. Biometrika, 52, 415-427; Laubscher, N.F. (1960). Normalizing the noncentral t and F distributions. Annals of Mathematical Statistics, 31, 1105-1112 Tiku, M.L. (1965). Laguerre series forms of non-central χ2 and F distributions. Biometrika, 52, 415-427. Wilkinson, L. (1999). Statistical methods in psychology journals guidelines and explanations. American Psychologist, 54, 594-604.