

SLIDE 1

Introduction to Testing and Measurement

SLIDE 2

Testing: Basic Definitions

  • Assessment - process of documenting knowledge, skills, attitudes, and/or beliefs
  • Evaluation - the making of a judgment about the amount, number, or value
  • Measurement - quantitative (involves assigning numbers)
  • Testing - form of measurement
SLIDE 3

Basic Definitions

(Continued)

  • Reliability - Measures consistency
  • Validity - Valid to the degree that it accomplishes its purpose
  • Objective - To the degree that two or more reasonable persons given a key will agree

SLIDE 4

Basic Statistics

Mean, Median, and Standard Deviation

SLIDE 5

Mean

(Arithmetic Average - the sum divided by the count.)

  • Advantages
– Calculation includes all scores
– Indicates “typical” score for group
  • Disadvantages
– Easily distorted by extreme scores

SLIDE 6

Median

(Midpoint - place the numbers in value order and find the middle number)

  • Advantages
– Not easily distorted by extremely high or low scores
  • Disadvantages
– Does not take into account the value of all the scores in the group

SLIDE 7

Mean or median? “Rule of Thumb”

  • Use the median when extremely high or low scores (outliers) are present
  • Use the mean for most other situations
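The rule of thumb above is easy to see with a small example (the scores here are hypothetical):

```python
from statistics import mean, median

# Hypothetical test scores; one extremely low outlier is added below
scores = [70, 72, 75, 78, 80]
with_outlier = scores + [5]

print(mean(scores), median(scores))    # 75 75 - they agree without outliers
print(mean(with_outlier))              # mean drops sharply toward the outlier
print(median(with_outlier))            # 73.5 - median barely moves
```

One extreme score pulls the mean well below the "typical" score, while the median stays close to it.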

SLIDE 8

Standard Deviation

  • Indicates by how much the scores in a distribution typically deviate from the mean
  • The mean marks the midpoint of the norm group: 50% score at or below it
– 68% fall within 1 SD above or below the mean
– 95% fall within 2 SD above or below the mean
– 99.7% fall within 3 SD above or below the mean
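These percentages are properties of the normal distribution itself, and Python's standard library can verify them directly (a quick check, not part of the slides):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal curve: mean 0, SD 1
for k in (1, 2, 3):
    within = nd.cdf(k) - nd.cdf(-k)  # proportion within k SDs of the mean
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```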

SLIDE 9

Normal Curve - Properties

  • Symmetrical, bell-shaped
  • Total area under the curve represents total number of scores in the distribution
  • Vertical lines mark sub-areas and represent proportions of scores falling in a particular range
  • Points along baseline correspond to standard deviations away from the mean

SLIDE 10

Testing and Measurement

Validity & Reliability

SLIDE 11

Validity of Test Scores

  • The extent to which the scores on the test are representative of what you are trying to measure
– Example - Does the science test measure only knowledge of science, or does it depend on reading ability and therefore measure both science and reading ability?

SLIDE 12

Types of Validity

  • Content Validity
– Determined by the degree to which the questions or items are representative of the universe of behavior the test was designed to sample (does the test assess what it claims to assess?)
  • Criterion-Related Validity
– Determined by whether there is a relationship between a test and an immediate criterion measure
– Examples - a driving test, an employment test

SLIDE 13

Factors That Can Reduce Validity

  • Factors in the Test
– Vague Directions
– Irrelevant Items
– Poorly Constructed Items
– Items that Contain Clues to the Correct Answer
– Too Few or Improperly Sequenced Items

SLIDE 14

What Affects Validity

(Continued)

  • Factors in Test Administration and Scoring
– Insufficient Time to Complete the Test
– Testing Environment
– Undetected Cheating
– Inappropriate Help or Coaching
– Improperly Motivated Students
– Unreliable Item Scoring

SLIDE 15

What Affects Validity

(Continued)

  • Factors Affecting Pupil Responses
– High Level of Fear or Anxiety About Taking the Test
– A Tendency to Rush Through the Test
– Guessing

SLIDE 16

Reliability of Test Scores

  • Consistency
  • Measure of confidence that if the same individuals were retested under similar conditions, the results could be replicated

SLIDE 17

Types of Reliability

  • Test-Retest: Coefficient of Stability
  • Alternate Form: Coefficient of Equivalence
  • Internal Consistency: Consistency of examinee across test items
  • Interrater Reliability: Consistency of judges or scorers

SLIDE 18

Reliability

General Guidelines

  • Test scores used for decisions about individuals require a much higher degree of reliability than those used for decisions about groups.
  • Higher reliability coefficients are essential if decisions based on test scores have long-term consequences.

SLIDE 19

Reliability General Guidelines

(Continued)

  • Lower reliability coefficients are tolerable if decisions are reversible or have only a temporary impact.
  • Reliability coefficients for standardized tests should be .90 or higher.
  • Reliability coefficients are influenced by many factors.

SLIDE 20

How to Increase Reliability

  • Use objective tests
  • Use a more heterogeneous group
  • Make sure the difficulty level is appropriate for the individuals being tested
  • Increase the number of items
SLIDE 21

Reliability vs. Validity

  • Reliability means that the test-takers will get the same score in multiple takes (within reason, of course).
  • Validity means measuring what it is supposed to measure.
  • Reliability doesn't necessarily equate to validity:
– A test can be reliable without being valid.
– However, a test cannot be valid unless it is reliable.

SLIDE 22

Standardized Tests: Norm-Referenced and Criterion-Referenced Tests

Types of Tests

SLIDE 23

Standardized Test

  • Administered and scored in a consistent, or "standard", manner
  • Designed so that the questions, conditions for administering, scoring procedures, and interpretations are consistent and predetermined
  • Not necessarily high-stakes, time-limited, or multiple-choice

SLIDE 24

Standardized Testing

Benefits

  • Objectivity
  • Evidence of validity or reliability of results
  • Ability to compare across students, schools, states, etc.
  • Ease of administration and scoring
  • Efficiency (group testing)
  • Developed over time and supported with data and research

SLIDE 25

Standardized Testing

Possible issues

  • Can only sample a portion of the domain
  • May not match school curriculum
  • May not answer relevant questions
  • Interpretations may not be relevant for all populations
  • Extraneous factors may prevent good measure of the student’s ability
  • May not be available for some constructs/concepts

SLIDE 26

Base test type on the decision to be made

  • Norm-Referenced: Level of achievement compared to other students
  • Criterion-Referenced: Level of achievement compared to an external criterion

SLIDE 27

Norm-Referenced Scores

  • Based on the normal curve
  • Reflect student performance compared to other similar students
  • Show relative strengths and weaknesses
  • Are not standards of “what should be” - only indicators of what “is”

Examples: CogAT, Iowa, NNAT, WISC, Stanford, Terra Nova

SLIDE 28

Norms

  • A set standard of development or achievement, usually derived from the average or median achievement of a large group
  • Used to compare one student’s results to those of a large sample of students:
– National norms - based on a large sample from across the nation
– Local norms - based on a large sample from local schools within a city, district, state, etc.

SLIDE 29

Norms

(Continued)

  • Indicate what the current reality is
– Are not standards, or indicators of what should be
  • Derived by assessing students thought to be “typical”
  • For mental ability scores, use student age norms
  • For achievement scores, use student grade norms

SLIDE 30

Good Norms are…

  • Recent
– When outdated norms are used, results can be misleading. Norms change every 5-7 years. (Tests with norms over 10 years old are not used for gifted evaluation in Cobb County.)
  • Representative
– Because participation in the norm group is voluntary, norm groups might not be representative.
  • Relevant
– The “normal” students used to establish the norms may not have been provided a “normal” instructional program.

SLIDE 31

Norm Referenced Tests (NRT) Appropriate Uses

  • Used to compare student performance with a large, usually national or international, sample of similar students
  • Used to make relative comparisons among schools or school systems against a national sample

SLIDE 32

Criterion-Referenced Tests

  • Allow inferences about:
– a curricular domain of skills and knowledge (e.g., the CCGPS, state standards)
– a cognitive domain of skill (e.g., reading comprehension, math computation)
– standing with respect to a judgmental criterion
  • Examples: CRCT (Criterion-Referenced Competency Test), EOCT (End of Course Test), Georgia Milestones
SLIDE 33

Criterion Referenced Tests

(CRT) Appropriate Uses

  • To make instructional decisions about individual students
  • To make placement decisions about students, along with other information
  • To make evaluative (formative and summative) decisions about programs
  • To make decisions about the curriculum

SLIDE 34

Types of Scores

NRT’s & CRT’s

SLIDE 35

Raw Scores

  • Actual number of points received on the test
– For example, 25 correct answers out of 30 questions equals a raw score of 25
  • Have not been “cooked” in the cauldron of statistics

SLIDE 36

Standard Scores

  • Raw scores converted to a new scale
  • Can be used to make direct comparisons among classes, schools, or districts
  • Can be misinterpreted because somewhat arbitrary scale values are used from test to test
  • Commonly reported standard scores: SAT, GRE, NCEs, Stanines, SAS
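The conversion behind all of these scales is a linear rescaling of z-scores. A minimal sketch with hypothetical raw scores, mapped here onto a mean-100/SD-15 scale:

```python
from statistics import mean, stdev

def to_standard(raw_scores, new_mean, new_sd):
    """Map raw scores onto a standard-score scale via z-scores."""
    m, s = mean(raw_scores), stdev(raw_scores)
    return [new_mean + new_sd * (x - m) / s for x in raw_scores]

# Hypothetical raw scores, one SD apart, rescaled to mean 100 / SD 15
print(to_standard([60, 70, 80], 100, 15))  # [85.0, 100.0, 115.0]
```

Changing `new_mean` and `new_sd` yields the other common scales (e.g. 50/21.06 for NCEs).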
SLIDE 37

Normal Curve Equivalent (NCE)

  • “Normalized standard scores” used for reporting some standardized achievement tests
  • Converted to a scale with a mean of 50 and a standard deviation of 21.06
  • Reported in a range between values of 1 and 99
  • Are not particularly useful for reporting test results to parents
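The unusual SD of 21.06 is what makes NCEs of 1, 50, and 99 line up with percentile ranks 1, 50, and 99. A sketch of the conversion (an illustration, not a formula from the slides):

```python
from statistics import NormalDist

def percentile_to_nce(pr):
    """Convert a percentile rank (1-99) to a Normal Curve Equivalent."""
    z = NormalDist().inv_cdf(pr / 100)  # normal deviate for the percentile
    return 50 + 21.06 * z

for pr in (1, 50, 99):
    print(pr, round(percentile_to_nce(pr)))  # 1 -> 1, 50 -> 50, 99 -> 99
```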

SLIDE 38

Standard Age Scores (SAS)

  • Used to report the results of ability tests
  • Sometimes reported as “deviation” IQ scores
  • Converted to a scale with a mean of 100 and a standard deviation of 15
  • “Average” is considered 15 points above and below 100 - from 85 to 115 on the normal curve

SLIDE 39

Stanines

  • Standard scores with whole number values ranging from 1 to 9
  • Relate to percentile bands
  • Useful as a simple approximation of performance
  • May lead to a loss of precision in reporting
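Stanines are usually assigned from bands of percentile ranks. The exact cut points vary slightly by publisher; this sketch uses one common set:

```python
import bisect

# Common upper percentile cut points for stanines 1-8 (stanine 9 is everything above)
CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (1-99) to a stanine (1-9)."""
    return bisect.bisect_left(CUTS, percentile_rank) + 1

print(stanine(2), stanine(50), stanine(98))  # 1 5 9
```

The loss of precision is visible here: every percentile rank from 41 to 60 collapses into stanine 5.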

SLIDE 40

Percentile Scores

  • Commonly used in expressing results of standardized tests
  • Probably the best single derived score for general use in relaying test results
  • Indicate the percentage of students in the norm group scoring lower than the examinee
  • Range between values of 1 and 99
  • Used to interpret a student’s performance in comparison to other students
  • Can result in misinterpretation because percentile ranks are not equally spaced along the scale
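Under that definition, a percentile rank is simply the share of the norm group scoring below the examinee. A minimal sketch with a hypothetical norm group:

```python
def percentile_rank(score, norm_group):
    """Percentage of the norm group scoring lower than the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

norm = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical norm-group scores
print(percentile_rank(80, norm))   # 50.0 - half the norm group scored lower
print(percentile_rank(100, norm))  # 90.0
```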

SLIDE 41

Percentile Bands

  • Range of values thought to contain the student’s “true” percentile rank
– Smaller bands reflect higher reliability
  • Example: Susan might have a percentile band ranging between 76 and 86 for math computation on the ITBS, and a percentile band ranging between 82 and 92 for reading.
– The scores indicate that Susan probably performs better at reading than at math computation
– However, her exact percentile score for math could be higher than for reading
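Score bands like these are typically built from the standard error of measurement, SEM = SD × sqrt(1 − reliability), so a more reliable test yields a tighter band. A hypothetical sketch on a mean-100/SD-15 scale (the reliability values are made up for illustration):

```python
import math

def score_band(observed, sd, reliability):
    """Observed score +/- one standard error of measurement (SEM)."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - sem, observed + sem

print(score_band(100, 15, 0.91))  # narrow band: high reliability
print(score_band(100, 15, 0.75))  # (92.5, 107.5) - lower reliability, wider band
```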

SLIDE 42

Grade Equivalents

  • Identifies grade level at which “typical” student obtains same raw score
  • Expressed by grade and month
  • Are useful in measuring growth
  • Can be easily misinterpreted
SLIDE 43

Grade Equivalent Interpretation

  • Compares student performance on grade-level material against the average performance of students at other grade levels on the same material
  • Reported in terms of grade level and months
  • Does not mean a 5th grade student with a 9.5 GE score in reading can do 9th grade reading work
  • Does not mean the 5th grade student needs to be in 9th grade
  • Does mean the 5th grade student is performing better than peers at the same level
  • Does mean the 5th grade student reads 5th grade material as well as the average 9th grader

SLIDE 44

Grade Equivalents- Common Misinterpretations

  • Cannot be interpreted as an estimate of the grade where a student should be placed
  • Are not equal across the range of the scale
  • Are not necessarily equal across tests
  • Extremely high or low GE scores are not dependable estimates of student achievement

SLIDE 45

Things to Know

  • Know the Test – study the manual and understand the content and purpose
  • Know the Norms – you cannot interpret scores well if you don’t understand the norming population
  • Know the Score – is it a standard score, raw score, percentile rank, or something else?
  • Know the Background – test results don’t tell the whole story, so consider multiple sources of data and information on the student

SLIDE 46

More to know

  • Research on your own – the more you know, the more you can explain test results with accuracy and confidence
  • Communicate effectively – provide pertinent information in a clear, understandable manner to approved individuals
  • Use the test – understanding increases with multiple uses
  • Use caution – test scores can reflect ability but they do not determine ability

Reference: Lyman, H., Test Scores and What They Mean, 6th edition.