Introduction to Testing and Measurement

Testing: Basic Definitions • Assessment - process of documenting knowledge, skills, attitudes, and/or beliefs • Evaluation - the making of a judgment about the amount, number, or value • Measurement - quantitative (involves assigning numbers) • Testing - form of measurement

Basic Definitions (Continued) • Reliability - Measures consistency • Validity - A test is valid to the degree that it accomplishes its purpose • Objective - A test is objective to the degree that two or more reasonable persons, given a key, will agree on the score

Basic Statistics Mean, Median, and Standard Deviation

Mean (Arithmetic Average - the sum divided by the count) • Advantages – Calculation includes all scores – Indicates “typical” score for group • Disadvantages – Easily distorted by extreme scores

Median (Midpoint - place the numbers in value order and find the middle number) • Advantages – Not easily distorted by extremely high or low scores • Disadvantages – Does not take into account the value of all the scores in the group

Mean or median? “Rule of Thumb” • use the median when extremely high or low scores (outliers) are present; • use the mean for most other situations
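The rule of thumb can be seen with a small worked example. This is a minimal sketch using Python's standard `statistics` module; the score lists are hypothetical and chosen only to show how a single outlier moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical class scores (not from the slides).
scores = [70, 72, 75, 78, 80]

# The same class with one extremely low score added.
with_outlier = scores + [5]

mean_plain = mean(scores)               # sum 375 / count 5
median_plain = median(scores)           # middle value of the sorted list
mean_outlier = mean(with_outlier)       # pulled down sharply by the 5
median_outlier = median(with_outlier)   # average of the two middle values

print(mean_plain, median_plain)
print(round(mean_outlier, 1), median_outlier)
```

Without the outlier the two statistics agree. With it, the mean drops by more than ten points while the median barely moves, which is why the median is preferred when outliers are present.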

Standard Deviation • Indicates by how much the scores in a distribution typically deviate from the mean • In a normal distribution, the mean falls at the 50th percentile of the norm group, with about: – 68% of scores within 1 SD above or below the mean, – 95% within 2 SD above or below the mean, – 99.7% within 3 SD above or below the mean
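The percentages above can be checked numerically. The sketch below computes a population standard deviation for a hypothetical set of scores and then uses the standard normal distribution to recover the 68 / 95 / 99.7 figures. The helper name `share_within` and the data are illustrative assumptions, not part of the original slides.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical scores (not from the slides).
scores = [60, 70, 80, 90, 100]

m = mean(scores)      # arithmetic average
sd = pstdev(scores)   # population SD: sqrt of the mean squared deviation

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    nd = NormalDist()  # standard normal: mean 0, SD 1
    return nd.cdf(k) - nd.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

Real score distributions are only approximately normal, so these proportions are a guide rather than an exact description of any particular group.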

Normal Curve - Properties • Symmetrical, bell-shaped • Total area under the curve represents total number of scores in the distribution • Vertical lines mark sub-areas and represent proportions of scores falling in a particular range • Points along baseline correspond to standard deviations away from the mean

Testing and Measurement Validity & Reliability

Validity of Test Scores • The extent to which the scores on the test are representative of what you are trying to measure – Example - Does the science test measure only the knowledge of science, or is it dependent on reading ability and therefore measuring science and reading ability?

Types of Validity • Content Validity – Determined by the degree to which the questions or items are representative of the universe of behavior the test was designed to sample (does the test assess what it claims to assess?) • Criterion-Related Validity – Determined by whether there is a relationship between a test and an immediate criterion measure – example - a driving test, employment

Factors That Can Reduce Validity? • Factors in the Test – Vague Directions – Irrelevant Items – Poorly Constructed Items – Items that Contain Clues to the Correct Answer – Too Few or Improperly Sequenced Items

What Affects Validity (Continued) • Factors in Test Administration and Scoring – Insufficient Time to Complete the Test – Testing Environment – Undetected Cheating – Inappropriate Help or Coaching – Poorly Motivated Students – Unreliable Item Scoring

What Affects Validity (Continued) • Factors Affecting Pupil Responses – High Level of Fear or Anxiety About Taking the Test – A Tendency to Rush Through the Test – Guessing

Reliability of Test Scores • Consistency • A measure of confidence that the results could be replicated if the same individuals were retested under similar conditions

Types of Reliability • Test-Retest: Coefficient of Stability • Alternate Form: Coefficient of Equivalence • Internal Consistency: Consistency of examinee across test items • Interrater Reliability: Consistency of judges or scorers
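A coefficient of stability is commonly estimated as the correlation between scores from two administrations of the same test. The following is a minimal sketch that assumes a Pearson correlation is used; the function name `pearson_r` and the two score lists are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same five students on two occasions.
first_testing = [10, 12, 15, 18, 20]
second_testing = [11, 12, 14, 19, 20]

stability = pearson_r(first_testing, second_testing)
print(round(stability, 2))
```

The same calculation applied to scores from two alternate forms would give a coefficient of equivalence, and applied to two raters' scores it gives one simple index of interrater consistency.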

Reliability General Guidelines • Test scores used for decision about individuals require a much higher degree of reliability than those for making decisions about groups. • Higher reliability coefficients are essential if decisions based on test scores have long term consequences.

Reliability General Guidelines (Continued) • Lower reliability coefficients are tolerable if decisions are reversible or have only a temporary impact. • Reliability coefficients for standardized tests should be .90 or higher • Reliability coefficients are influenced by many factors.

How to Increase Reliability • Use objective tests • Use a more heterogeneous group • Make sure the difficulty level is appropriate for the individuals being tested • Increase the number of items
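The effect of increasing the number of items can be estimated with the Spearman-Brown formula, which the slides do not name but which is the usual tool for this purpose. It assumes the added items are comparable in quality to the existing ones. The function name and example values below are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the new items are parallel to (as good as) the existing ones.
    """
    r, k = reliability, length_factor
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical test with reliability 0.70, doubled in length.
doubled = spearman_brown(0.70, 2)
print(round(doubled, 2))
```

The gain shrinks as the test gets longer, so adding items helps most when the starting test is short.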

Reliability vs. Validity • Reliability means that the test-takers will get the same score in multiple takes (within reason of course). • Validity means measuring what it is supposed to measure • Reliability doesn't necessarily equate to validity: – A test can be reliable without being valid. – However, a test cannot be valid unless it is reliable.

Types of Tests Standardized Tests: Norm-Referenced and Criterion-Referenced Tests

Standardized Test • administered and scored in a consistent, predetermined, or "standard", manner. • designed in such a way that the questions, conditions for administering, scoring procedures, and interpretations are consistent • not necessarily high-stakes, time-limited, or multiple-choice.

Standardized Testing Benefits • Objectivity • Evidence of validity or reliability of results • Ability to compare across students, schools, states, etc. • Ease of administration and scoring • Efficiency (group testing) • Developed over time and supported with data and research

Standardized Testing Possible issues • Can only sample a portion of the domain • May not match school curriculum • May not answer relevant questions • Interpretations may not be relevant for all populations • Extraneous factors may prevent good measure of the student’s ability • May not be available for some constructs/concepts

Base test type according to decision to be made • Norm-Referenced: Level of achievement compared to other students • Criterion-Referenced: Level of achievement compared to external criterion

Norm-Referenced Scores • Based on the normal curve • Reflect student performance compared to other similar students • Show relative strengths and weaknesses • Are not standards of “what should be” - only indicators of what “is” • Examples: CogAT, Iowa, NNAT, WISC, Stanford, Terra Nova

Norms • A set standard of development or achievement usually derived from the average or median achievement of a large group • Used to compare one student’s results to those of a large sample of students: – National norms - based on a large sample from across the nation – Local norms - based on a large sample from local schools within a city, district, state, etc.

Norms (Continued) • Indicate what the current reality is – are not standards, or indicators of what should be • Derived by assessing students thought to be “typical” • For mental ability scores, use student age norms • For achievement scores, use student grade norms

Good Norms are… • Recent – When outdated norms are used, results can be misleading. Norms change every 5-7 years. (Tests with norms over 10 years old are not used for gifted evaluation in Cobb County.) • Representative – Because participation in the norm group is voluntary, norm groups might not be representative. • Relevant – The “normal” students used to establish the norms may not have been provided a “normal” instructional program.

Norm Referenced Tests (NRT) Appropriate Uses • Used to compare student performance with large, usually national or international, sample of similar students • Used to make relative comparisons among schools or school systems to a national sample

Criterion-Referenced Tests • Allow inferences about: – a curricular domain of skills and knowledge (e.g. the CCGPS, state standards) – a cognitive domain of skill • reading comprehension • math computation – standing with respect to a judgmental criterion • CRCT (Criterion Referenced Competency Test) • EOCT (End of Course Test) • Georgia Milestones

Criterion Referenced Tests (CRT) Appropriate Uses • To make instructional decisions about individual students • To make placement decisions about students, along with other information • To make evaluative (formative and summative) decisions about programs • To make decisions about the curriculum

Types of Scores NRT’s & CRT’s

Raw Scores • Actual number of points received on test – For example, 25 correct answers out of 30 questions equals a raw score of 25 • Have not been “cooked” in cauldron of statistics

Standard Scores • Raw scores converted to new scale • Can be used to make direct comparisons among classes, schools, or districts • Can be misinterpreted because somewhat arbitrary scale values are used from test to test • Commonly Reported Standard Scores • SAT, GRE, NCEs, Stanines, SAS
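Converting a raw score to a new scale usually goes through a z-score, which expresses the raw score in standard deviation units. The sketch below assumes a simple linear conversion; the group mean, group SD, and the target scale (mean 100, SD 15) are hypothetical values chosen for illustration, not the scale of any particular test.

```python
def z_score(raw, group_mean, group_sd):
    """Distance of a raw score from the group mean, in SD units."""
    return (raw - group_mean) / group_sd

def to_scale(z, scale_mean, scale_sd):
    """Place a z-score on a reporting scale with the given mean and SD."""
    return scale_mean + scale_sd * z

# Hypothetical: raw score 25 in a group with mean 20 and SD 5.
z = z_score(25, 20, 5)
scaled = to_scale(z, 100, 15)
print(z, scaled)
```

Because each test picks its own reporting scale, the same z-score produces different numbers on different tests, which is the source of the misinterpretation noted above.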

Normal Curve Equivalent (NCE) • “Normalized standard scores” used for reporting some standardized achievement tests • Converted to a scale with a mean of 50 and a standard deviation of 21.06 • Reported in a range between values of 1 and 99 • Are not particularly useful in reporting test results to parents
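The NCE scale described above can be sketched as a conversion from a percentile rank. This example assumes the usual approach of finding the normal-curve z-score for the percentile and rescaling it with the mean of 50 and SD of 21.06 given in the slide; the function name and the clamping to the 1 to 99 range are illustrative choices.

```python
from statistics import NormalDist

NCE_MEAN = 50
NCE_SD = 21.06

def nce_from_percentile(percentile_rank):
    """Approximate NCE for a percentile rank strictly between 0 and 100."""
    z = NormalDist().inv_cdf(percentile_rank / 100)  # z-score on the normal curve
    nce = NCE_MEAN + NCE_SD * z
    return min(99.0, max(1.0, nce))                  # keep within the reported range

for pr in (1, 25, 50, 75, 99):
    print(pr, round(nce_from_percentile(pr), 1))
```

With an SD of 21.06, NCEs match percentile ranks at 1, 50, and 99 but differ in between, because NCEs are spaced in equal intervals along the baseline of the normal curve.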
