SLIDE 1

Science Simulations to Address Assessment Challenges

Dawn Cameron | Minnesota Department of Education | Specialty and Technical Innovations
Paul Katula | Maryland Department of Education | Scoring Specialist
Kristen DiCerbo | Pearson | VP of Learning Research and Design

SLIDE 2

Overview

  • Minnesota’s history and some challenges for developing sims
  • Maryland’s efforts to use sims to assess complex academic standards
  • Pearson’s research with rich sims and difficult-to-measure constructs


SLIDE 3

Minnesota Science Assessment Program

2008: Online-only science assessment

  • 50% Technology-enhanced (TE) items
  • 50% Multiple choice items

2011: Minnesota started developing simulations

  • 1 Operational and 3 Field Test Simulations per year
  • MC, TE, and Task Response item types associated with each simulation


SLIDE 4

SLIDE 5

SLIDE 6

Task Response Stats

Grade | n | P-value | pbis | IRT parameter a | IRT parameter b | IRT parameter c
5 | 13 | 0.38 (range 0.20 to 0.68) | 0.36 (range 0.27 to 0.44) | 0.70 (range 0.38 to 0.96) | 0.77 (range -0.68 to 1.84) | 0.04 (range 0.02 to 0.11)
8 | 11 | 0.58 (range 0.44 to 0.74) | 0.44 (range 0.30 to 0.52) | 0.71 (range 0.43 to 0.88) | 0.21 (range -1.12 to 0.48) | 0.04 (range 0.02 to 0.06)
HS | 7 | 0.41 (range 0.18 to 0.73) | 0.38 (range 0.32 to 0.58) | 0.67 (range 0.48 to 1.10) | 0.59 (range -1.06 to 1.58) | 0.04 (range 0.01 to 0.05)
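For context, the a, b, and c columns are presumably the discrimination, difficulty, and pseudo-guessing parameters of a three-parameter logistic (3PL) IRT model, under which the probability that a student with ability θ answers correctly is

$$P(X = 1 \mid \theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$$

(some calibration programs include a scaling constant D ≈ 1.7 in the exponent).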


SLIDE 7

Simulation items

  • Include both Task Response and more traditional TE and MC item types
  • More effectively measure critical thinking and problem solving
  • Open-ended, constructed response without the cost of hand scoring
  • Can decrease the amount of text and language load for a student


SLIDE 8

Prerequisite Knowledge

NGSS Structure

  • Three dimensions, intertwined
  • Phenomena & storylines

Test Question Formats

  • QTI Standard Interactions
  • Portable Customized Interactions

Standard interaction types: Choice, Hotspot, Drag & Drop, Inline, Media Upload, Extended Text, Simple Text, Ordering, Matching, Hot Text

SLIDE 9

Where We’re Headed

[Diagram: PCI Type / Input / Output / Score; Non-standard Interaction; Computer Animation; Simulation; Capture]

SLIDE 10

Example from Our Science Practice Test

  • Slightly better than a video
  • All results pre-programmed
  • Easy to convert for accommodations

SLIDE 11

Capture PCI Brainstorming

Games
  • Exploration, Dissections
  • Based on a video-game model, where players earn points for making moves.

Trials
  • Quality Testing, Life Science
  • In conjunction with questions, determine if test-takers have collected sufficient evidence to give the answers.

Random
  • Weather, Behavior
  • Where events can be introduced with random strength, frequency, timing, etc.

SLIDE 12

SLIDE 13

SLIDE 14

Scenario / Storyline

Phenomenon (Act 1)
  • Your high school’s football coach wants to know how to get the most distance on a field goal attempt in a real-game situation.

Background Research (Act 2)
  • How strong is the football team’s place kicker? What’s the initial velocity of the ball after it leaves his foot?

Investigation (Act 3)
  • Use a device that allows you to set the angle of a ball’s trajectory and the force with which the ball is “kicked.” Find the optimal angle.

SLIDE 15

Computations We Must Do

Trigonometry Knowledge Can’t Be Assumed
  • Vectors require trigonometry.
  • We can’t assume students would be able to perform trigonometric math on our science test.

Assessment Boundary Can’t Be Exceeded
  • 2D motion goes above the assessment limit.
  • The assessment boundary in the NGSS PE being assessed specifically limits us to motion in one dimension.

Evidence Statement 3.c (and 2.a.i)

Students express the relationship Fnet = ma in terms of causality, namely that a net force on an object causes the object to accelerate. A typical football weighs 0.4 kg, and a kicker’s foot pushes it for 0.1 sec. How much force is required for the football to leave his foot with a speed of 15 m/s? 30 m/s? What if the football had a mass of 0.5 kg?
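A quick check of the arithmetic these questions call for, assuming a constant net force over the 0.1 s contact time:

$$F_{net} = \frac{m\,\Delta v}{\Delta t}:\qquad \frac{0.4 \times 15}{0.1} = 60\ \text{N},\qquad \frac{0.4 \times 30}{0.1} = 120\ \text{N},\qquad \frac{0.5 \times 15}{0.1} = 75\ \text{N}\ \ (150\ \text{N at } 30\ \text{m/s}).$$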

SLIDE 16

Level Setting the Simulation Parameters

Evidence Statement 1.a requires that students be able to organize data that show the net force, acceleration, and mass (held constant) of a macroscopic object.

Rules allow footballs to weigh up to about 0.43 kg. Run the simulation to show the coach whether he should use a lighter or heavier football for his team to kick. Explain how any data points you obtained from the simulation support your advice to the coach.

The simulation allows students to vary the following parameters:

  • Launch angle (θ)
  • Initial speed of the kick

However, the simulation is reset after the preceding questions so that all students work with the correct force to get the launch velocity they set.

SLIDE 17

Mastery of the S.E.P. is NECESSARY and SUFFICIENT.

[Graphs: Height (m) vs. time (s); Vertical Velocity (m/s) vs. time (s)]

With each run, students are given graphs to describe the flight of the football and a data table is filled in, which students can sort or delete. Data must be used to answer the questions.

Evidence statements 3.a and 3.b call on students to use the data as evidence, so the simulation allows them to select rows from the table (which they “generated” themselves) and drag them into a receiving bay for the questions those trials are meant to support.
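Below is a minimal sketch of the kind of kinematics such a simulation could use to generate each trial’s graphs and data-table row, assuming ideal projectile motion with no air resistance; the function and field names are illustrative, not the actual item code.

```python
import math

def run_trial(angle_deg, speed, mass=0.4, dt=0.01, g=9.8):
    """Simulate one kick; return (t, height, vertical velocity) samples for the
    graphs plus a summary row like the one added to the student's data table."""
    vx = speed * math.cos(math.radians(angle_deg))
    vy = speed * math.sin(math.radians(angle_deg))
    t = x = y = 0.0
    series = []
    while y >= 0.0:
        series.append((round(t, 2), round(y, 2), round(vy, 2)))
        t += dt
        x += vx * dt
        vy -= g * dt
        y += vy * dt
    # mass is carried along for the data table only; it does not change an ideal, drag-free trajectory
    row = {"angle_deg": angle_deg, "speed_m_s": speed, "mass_kg": mass,
           "distance_m": round(x, 1), "hang_time_s": round(t, 2)}
    return series, row

# e.g. compare two launch angles at the same initial speed
for angle in (30, 45):
    _, row = run_trial(angle, speed=25)
    print(row)
```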

SLIDE 18

Scoring

We would expect students to drag in rows that hold all other variables (such as angle and initial speed) constant while varying the mass of the football. We would also expect to see a few trials at each mass, and could score responses accordingly, based on how well the student:

  • Used the data to make valid and reliable scientific claims and determined an optimal solution (SEP)
  • Integrated empirical evidence to distinguish between cause and correlation (CCC)

In answering a question like the one shown earlier (2.a.i), for example: “Rules allow footballs to weigh up to about 0.43 kg. Run the simulation to show the coach whether he should use a lighter or heavier football for his team to kick. Explain how any data points you obtained from the simulation support your advice to the coach.”
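As a purely hypothetical sketch, a rule like “hold angle and initial speed constant while varying mass, with a few trials at each mass” could be checked automatically against the rows a student drags into the receiving bay; the row fields and thresholds below are illustrative, not the actual rubric.

```python
def controlled_comparison(rows, varied="mass_kg", held=("angle_deg", "speed_m_s"),
                          min_trials_per_level=2):
    """True if the selected rows vary only `varied`, hold the `held` variables
    constant, and include enough trials at each level of `varied`."""
    if not rows:
        return False
    # every held variable must take a single value across the selected rows
    for var in held:
        if len({row[var] for row in rows}) != 1:
            return False
    # the varied variable must actually vary, with enough trials at each level
    levels = {}
    for row in rows:
        levels[row[varied]] = levels.get(row[varied], 0) + 1
    return len(levels) >= 2 and all(n >= min_trials_per_level for n in levels.values())

# e.g. rows the student dragged in (same angle and speed, two masses, two trials each)
selected = [
    {"angle_deg": 45, "speed_m_s": 25, "mass_kg": 0.40, "distance_m": 63.7},
    {"angle_deg": 45, "speed_m_s": 25, "mass_kg": 0.40, "distance_m": 63.9},
    {"angle_deg": 45, "speed_m_s": 25, "mass_kg": 0.43, "distance_m": 63.8},
    {"angle_deg": 45, "speed_m_s": 25, "mass_kg": 0.43, "distance_m": 63.6},
]
print(controlled_comparison(selected))  # True
```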

SLIDE 19


Evidence from Science Simulations

Pearson: Kristen DiCerbo, Jinnie Choi, Matthew Ventura, Emily Lai
Minnesota Department of Education: Dawn Cameron, Jim Wood, Judi Iverson

June 2019

SLIDE 20

Problem to be Solved

SLIDE 21

How can we...

  • Assess difficult to measure skills
  • Make use of students’ process data
  • Pull apart practices from knowledge for formative feedback

SLIDE 22

Approach

SLIDE 23

Evidence-centered design

SLIDE 24

Hard-to-measure standards


Five simulations were designed to help teach and assess Minnesota Science Standards. We also targeted the science-practice standards, which involve investigation and argument from evidence.

Physics Biome Satellite Submarine

Physical Science Motion, Energy 5.2.2.1.3 Force and motion 6.2.3.2.1 Kinetic-potential energy transformation Life Science Interdependence among living systems 5.4.2.1.2 Impact of changes in parts 7.4.2.1.1 Relationships among the parts in a stable ecosystem The Nature of Science and Engineering Interactions among science, technology, engineering, mathematics and society 8.1.3.4.1 Use maps to describe patterns and make prediction The Nature of Science and Engineering The practice of engineering 6.1.2.2.1 Apply an engineering design process

Gravity

Earth Science The universe 8.3.3.1.3 Describe how mass and distance affect the gravitational force

SLIDE 25

Simulation Templates

SLIDE 26

Evidence-centered design: We designed multiple tasks


Each simulation contained multiple activity/task variants. From a student’s interaction with a task, we collected multiple pieces of evidence based on the knowledge and skills we targeted in the design of the task.

  • Physics: Grade 5 Form, 6 task variants; Grade 8 Form, 4 task variants
  • Gravity: Grade 5 Form, 4 task variants; Grade 8 Form, 4 task variants
  • Biome: one form, 5 task variants
  • Satellite: one form, 6 task variants
  • Submarine: one form, 6 task variants

SLIDE 27

A priori hypotheses about evidence model

SLIDE 28

We collected data from two groups of students: lower and upper grade levels. The study sample included only students who had interaction data from the simulations as well as matching near- and far-transfer scores.

Study participants


For each simulation:

Simulation | Total number of students | 5th grade | 8th grade
Gravity | 316 | 193 | 123
Physics | 198 | 104 | 94
Biome | 304 | 156 | 148
Satellite | 281 | 6* | 275
Submarine | 253 | 89 | 164

SLIDE 29

We collected data from multiple sources

Evidence Pieces (Indicators): For each of the five sims, each learner’s interactions with the simulation activities were recorded in a .json log file. Based on the evidence model and scoring algorithms, each evidence piece was coded as 1 if it was present in the interaction and 0 otherwise.

Near-transfer Knowledge: For each of the five sims, we developed five near-transfer traditional multiple choice items about the targeted standard.

Far-transfer Knowledge: Each student’s state assessment score in science (a composite scaled score) was collected at the end of the school year.
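A rough sketch of how these three sources might be assembled into one analysis file per simulation; the file names, paths, and column names are assumptions for illustration, not the project’s actual pipeline.

```python
import json
from pathlib import Path

import pandas as pd

# 1. Evidence pieces: one .json log per learner, already coded 0/1 per indicator
records = []
for log_path in Path("logs/gravity").glob("*.json"):      # hypothetical location
    with open(log_path) as f:
        log = json.load(f)
    records.append({"student_id": log["student_id"], **log["evidence_pieces"]})
evidence = pd.DataFrame(records)

# 2. Near-transfer: five multiple-choice items per sim, scored 0/1
near = pd.read_csv("near_transfer_gravity.csv")            # student_id, item1..item5
near["near_score"] = near[[f"item{i}" for i in range(1, 6)]].sum(axis=1)

# 3. Far-transfer: state science assessment composite scaled score
far = pd.read_csv("state_science_scores.csv")              # student_id, scale_score

# Keep only students with all three sources, as in the study sample
analysis = (evidence.merge(near[["student_id", "near_score"]], on="student_id")
                    .merge(far, on="student_id"))
print(analysis.shape)
```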

SLIDE 30


Data storage, extraction, transformation

SLIDE 31

Evidence Identification

Part Four: RQ1 Internal Structure: Latent Variables


Sim | Grade | Construct | Evidence
Submarine | 5, 8 | Design a Solution | 1 if does not repeat the same error after failure
Submarine | 5, 8 | Design a Solution | 1 if selects tools that are relevant after failure (correct at second or later try)
Submarine | 5, 8 | Design a Solution | 1 if selects correct sizes for relevant tools (there are 9 successful solutions, including Medium hull, Small thruster, Small cockpit, Large battery, armSmall, box, light)
Submarine | 5, 8 | Design a Solution | 1 if runs minimum number of trials to reach solution (minimum: 3)
Submarine | 5, 8 | Design a Solution | 1 if selects tools that are relevant before failure (correct at first try)
Submarine | 5, 8 | Design a Solution | 1 if tests individual tools to identify the tool functions (either one more tool or one change, at least one of these changes)
Submarine | 5, 8 | Design a Solution | 1 if selects tools that are relevant after failure (correct at second or later try)
Submarine | 5, 8 | Design a Solution | 1 if selects correct sizes for relevant tools (2 solutions)
Submarine | 5, 8 | Design a Solution | 1 if runs minimum number of trials to reach solution (2)
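As a hypothetical illustration, a couple of these indicators could be computed from a learner’s ordered list of submarine trials roughly as follows; the log fields and the reading of the “minimum number of trials” rule are assumptions, not the actual scoring code.

```python
def no_repeated_error(trials):
    """1 if the learner never repeats the same failing tool configuration."""
    seen_failures = set()
    for t in trials:
        config = tuple(sorted(t["tools"]))
        if not t["success"]:
            if config in seen_failures:
                return 0
            seen_failures.add(config)
    return 1

def reaches_solution_in_minimum_trials(trials, minimum=3):
    """1 if the first successful design occurs within `minimum` trials (one plausible
    reading of the 'runs minimum number of trials to reach solution' indicator)."""
    for i, t in enumerate(trials, start=1):
        if t["success"]:
            return 1 if i <= minimum else 0
    return 0

# e.g. a learner who fails twice with different configurations, then succeeds
trials = [
    {"tools": ["Small hull", "Small thruster"], "success": False},
    {"tools": ["Medium hull", "Small thruster"], "success": False},
    {"tools": ["Medium hull", "Small thruster", "Large battery"], "success": True},
]
print(no_repeated_error(trials), reaches_solution_in_minimum_trials(trials))  # 1 1
```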

SLIDE 32

Results

SLIDE 33

Level of Support

SLIDE 34

Part Two: Descriptives


Tasks with step-level instructions are easier.

Gravity (Grade 5 Form: 4 task variants)
Task | Average Completion Rate | Average Success Rate
Task 1* | 94% | 57.2%
Task 2 | 100% | 14%
Task 3 | 100% | 30.6%
Task 4 | 100% | 21.2%

Satellite (one form, 6 task variants)
Task | Average Completion Rate | Average Success Rate
Task 1* | 99.7% | 78.6%
Task 2 | 77.8% | 45.9%
Task 3 | 84% | 28.7%
Task 4 | 74.2% | 49.6%
Task 5 | 82.6% | 42.5%
Task 6 | 71.8% | 30.9%

Biome (one form, 5 task variants)
Task | Number of Evidence Pieces | Average Completion Rate | Average Success Rate
Task 1* | 5 | 90% | 91.1%
Task 2 | 8 | 87.8% | 81.5%
Task 3 | 2 | 89% | 52%
Task 4 | 5 | 90.8% | 85.9%
Task 5 | 3 | 100% | 20.8%

*Indicates a training task designed to teach students to use the interface. No evidence about science knowledge and skills is gathered from this task.
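For completeness, completion and success rates like these could be computed from student-by-task records with a simple group-by; the columns below are illustrative.

```python
import pandas as pd

# one row per student-task attempt (hypothetical columns)
attempts = pd.DataFrame({
    "task":      ["Task 1", "Task 1", "Task 2", "Task 2"],
    "completed": [1, 1, 1, 0],
    "succeeded": [1, 0, 0, 0],
})

rates = attempts.groupby("task")[["completed", "succeeded"]].mean().mul(100).round(1)
rates.columns = ["Average Completion Rate (%)", "Average Success Rate (%)"]
print(rates)
```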

SLIDE 35

Separating Knowledge from Practice

SLIDE 36

Hypotheses: Latent Variables

Part Four: RQ1 Internal Structure: Latent Variables


Simulation | Form | Knowledge | Practices
Gravity | Grade 5 | Gravity | Experimental design; Using models; Measurement
Gravity | Grade 8 | Distance and gravity; Mass and gravity |
Physics | Grade 5 | Force and motion | Experimental design; Hypothesis testing; Cause and effect
Physics | Grade 8 | Force and motion | Experimental design; Link claims and evidence
Biome | One | Ecosystems | Observation; Link claims and evidence
Satellite | Grade 8 | Detect pattern; Predict | Cause and effect; Measurement
Submarine | One | Developing solutions | Optimizing solutions

SLIDE 37

Example: Physics latent variable structures

Part Four: RQ1 Internal Structure: Latent Variables


3LV vs. 4LV

Number of Evidence Pieces: 33 for Grade 5 Form; 18 for Grade 8 Form

Category | Subskill | Grade 5 Form | Grade 8 Form
Interface Use | SimUse | 11 | 4
Interface Use | LocateData | 1 | 1
Physics Knowledge | ForceAndMotion | 4 | 1
Physics Knowledge | EnergyIsTransferred | . | 2
Science Practice - Investigation | ExperimentalDesign | 10 | 6
Science Practice - Investigation | HypothesisTesting | 5 | .
Science Practice - Argument from Evidence | CauseAndEffect | 2 | .
Science Practice - Argument from Evidence | LinkClaimsAndEvidence | . | 4

SLIDE 38

Results: Decision on the number of latent variables

Part Four: RQ1 Internal Structure: Latent Variables


Simulation | Form | Number of Students | LV 1 | LV 2
Gravity | Grade 5 | 193 | Gravity on motion; Experimental design; Using models; Measurement |
Gravity | Grade 8 | 123 | Distance and gravity | Mass and gravity
Physics | Grade 5 | 104 | Force and motion | Experimental design; Hypothesis testing; Cause and effect
Physics | Grade 8 | 94 | Force and motion; Energy is transferred; Experimental design; Link claims and evidence |
Biome | One | 304 | Ecosystem | Observation; Link claims and evidence
Satellite | Grade 8 | 275 | Detect pattern; Predict | Cause and effect; Measurement
Submarine | One | 253 | Developing solution | Optimizing solution
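A rough illustration of one way to decide on the number of latent variables: fit competing factor structures to the 0/1 evidence matrix and compare an information criterion such as BIC. The sketch below uses ordinary linear factor analysis from scikit-learn as a stand-in for the categorical latent variable models actually used, so treat it as the shape of the workflow rather than the project’s analysis.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def fa_bic(X, n_factors):
    """Approximate BIC for a linear factor analysis model with n_factors."""
    n, p = X.shape
    fa = FactorAnalysis(n_components=n_factors).fit(X)
    log_lik = fa.score(X) * n                       # score() returns mean log-likelihood
    n_params = p * n_factors + p                    # loadings + unique variances (rough count)
    return -2 * log_lik + n_params * np.log(n)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(104, 33)).astype(float)   # stand-in for a 104-student, 33-indicator matrix

for k in (1, 2, 3, 4):
    print(k, "factors, BIC =", round(fa_bic(X, k), 1))  # prefer the structure with the lowest BIC
```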

SLIDE 39

Comparing Statistical Models

SLIDE 40

We compared four types of evidence aggregation methods

Part Five: RQ2 Models


How do we aggregate simulation activity data to skill-level information? We describe and empirically compare different ways to aggregate evidence of learning during students’ use of interactive science simulations. Among the many approaches to aggregating evidence of learning (Mislevy et al., 2014), we consider four methods:

  • Item Response Model: continuous latent variables; produces proficiency estimates for skills on a continuous -4 to 4 logit scale; hard to interpret
  • Cognitive Diagnostic Model: discrete latent variables; produces probability estimates for skills associated with mastery or non-mastery prediction; cut scores required
  • Bayesian Network: discrete latent variables; produces probability estimates for skills associated with mastery or non-mastery prediction; cut scores required
  • Percent Correct: non-parametric method; percentage of correct (1 vs. 0) scores for skills across all attempted evidence, on a continuous 0 to 1 scale; easy to interpret
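As a concrete example, the simplest of the four, Percent Correct, just averages the 0/1 evidence pieces mapped to each skill. A minimal sketch (the evidence-to-skill mapping shown is hypothetical):

```python
evidence_to_skill = {            # hypothetical mapping from evidence pieces to skills
    "holds_variables_constant": "ExperimentalDesign",
    "runs_enough_trials":       "ExperimentalDesign",
    "links_claim_to_rows":      "LinkClaimsAndEvidence",
}

def percent_correct(indicators, mapping):
    """Aggregate 0/1 evidence pieces into a 0-1 score per skill."""
    totals, counts = {}, {}
    for piece, value in indicators.items():
        skill = mapping[piece]
        totals[skill] = totals.get(skill, 0) + value
        counts[skill] = counts.get(skill, 0) + 1
    return {skill: totals[skill] / counts[skill] for skill in totals}

student = {"holds_variables_constant": 1, "runs_enough_trials": 0, "links_claim_to_rows": 1}
print(percent_correct(student, evidence_to_skill))
# {'ExperimentalDesign': 0.5, 'LinkClaimsAndEvidence': 1.0}
```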

SLIDE 41

Adding to preliminary conclusion: Best statistical approach

Part Five: RQ2 Models


Still, the results do not support any clear winner yet.

  • IRT results correlate well with both near- and far-transfer scores.
  • BN shows the second best correlations.

Criterion | IRT (MRCML) | CDM (GDINA) | Bayesian Network | Percent Correct
Type of outcome (as a measure of learning of skills and knowledge) | Proficiency estimate (-4 to 4 logit scale) | Mastery group classification (mastery or non-mastery) | Mastery group classification (mastery or non-mastery) | Percent correct score (0 to 1)
Ease of interpretation | Fair | Good | Good | Best
Model fit (BIC) | Good | Good | Fair |
Correlation among the four methods | Better | Good | Best | Better
Correlation with near- and far-transfer | Best | Good | Better | Good
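A small sketch of the kind of check behind the last row: correlate each method’s skill estimates with the near- and far-transfer scores for the same students (the data frame and column names are illustrative, not the study’s data).

```python
import pandas as pd

# hypothetical per-student results: one skill estimate per aggregation method,
# plus near-transfer (MC items) and far-transfer (state score) measures
df = pd.DataFrame({
    "irt_theta":       [-0.5, 0.2, 1.1, -1.3, 0.7],
    "percent_correct": [0.40, 0.55, 0.90, 0.20, 0.75],
    "near_transfer":   [2, 3, 5, 1, 4],
    "far_transfer":    [540, 552, 575, 521, 560],
})

for method in ("irt_theta", "percent_correct"):
    print(method,
          "near r =", round(df[method].corr(df["near_transfer"]), 2),
          "far r =",  round(df[method].corr(df["far_transfer"]), 2))
```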

SLIDE 42

Summary and Next Steps

SLIDE 43

Summary: What we learned

Part Eight: Summary and Next Steps


  • Legacy systems are not equipped to work with these data and statistical models.
  • Looking at interface use may be a key variable.
  • We were mostly successful in separating knowledge from practice, and where we were not, we have learned why.
  • The results across evidence aggregation models are relatively highly correlated and perform similarly; choice of models should be based on the purpose of the assessment.
SLIDE 44

Next steps

Part Eight: Summary and Next Steps


1. Building models across simulations
2. Demonstrating the use of the ‘interface use’ score in interpretation
3. Demonstration of how results could be used to guide instructional decisions

SLIDE 45

SLIDE 46

Questions, Contact Information

Dawn Cameron, 651-582-8551, dawn.cameron@state.mn.us
Paul Katula, 410-767-7510, paul.katula@maryland.gov
Kristen DiCerbo, 623-238-3511, kristen.dicerbo@pearson.com
