

SLIDE 1

Addressing the Testing Challenge with a Web-Based E-Assessment System that Tutors as it Assesses

Mingyu Feng, Worcester Polytechnic Institute (WPI) Neil T. Heffernan, Worcester Polytechnic Institute (WPI) Kenneth R. Koedinger, Carnegie Mellon University (CMU)

SLIDE 2

May 25th, 2006, WWW’06

The “ASSISTment” System

An e-assessment and e-learning system that does both ASSISTing of students and assessMENT (movie)

www.assistment.org

Massachusetts Comprehensive Assessment System (“MCAS”)

Web-based system built on the Common Tutoring Object Platform (CTOP) [1]

[1] Nuzzo-Jones, G., Macasek, M.A., Walonoski, J., Rasmussen, K.P., Heffernan, N.T. Common Tutor Object Platform, an e-Learning Software Development Strategy. WPI technical report WPI-CS-TR-06-08.

SLIDE 3

ASSISTment

We break multi-step problems into “scaffolding questions”

“Hint Messages”: given on demand, they give hints about what step to do next

“Buggy Messages”: context-sensitive feedback messages

“Knowledge Components”: skills, strategies, concepts

The state reports to teachers on 5 areas

We seek to report on 100 knowledge components

How does a student work with the ASSISTment? (movie)

(Demo/movie)

The original question

  • a. Congruence
  • b. Perimeter
  • c. Equation-Solving

[Figure: screenshot labeling the 1st scaffolding question (Congruence), the 2nd scaffolding question (Perimeter), a buggy message, and a hint message]
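The item structure described on this slide (an original question, scaffolding questions, on-demand hints, buggy messages, and knowledge-component tags) can be sketched as a small data model. This is a hypothetical illustration, not the CTOP schema: the class names `Question` and `Assistment`, their fields, and the generic fallback message are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A question with on-demand hints and context-sensitive buggy messages (hypothetical)."""
    prompt: str
    answer: str
    skills: set[str]                                  # knowledge components tagged here
    hints: list[str] = field(default_factory=list)    # requested one at a time
    buggy_messages: dict[str, str] = field(default_factory=dict)  # wrong answer -> feedback

    def check(self, response: str) -> tuple[bool, str]:
        """Return (correct?, feedback) for a student's response."""
        response = response.strip()
        if response == self.answer:
            return True, "Correct!"
        # Use a buggy message if this wrong answer was anticipated, else a generic reply.
        return False, self.buggy_messages.get(response, "Sorry, that is incorrect.")


@dataclass
class Assistment:
    """An original multi-step question broken into scaffolding questions (hypothetical)."""
    original: Question
    scaffolds: list[Question] = field(default_factory=list)

    def next_steps(self, original_correct: bool) -> list[Question]:
        # A student who misses the original question is walked through the scaffolds.
        return [] if original_correct else list(self.scaffolds)

    def skills(self) -> set[str]:
        # All knowledge components exercised by the item and its scaffolds.
        tags = set(self.original.skills)
        for s in self.scaffolds:
            tags |= s.skills
        return tags
```

Keeping buggy messages keyed by the anticipated wrong answer is what makes the feedback context-sensitive: an unanticipated error falls through to the generic reply.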

SLIDE 4

Goal

Help student learning (the goal of [2][3])

Assess students’ performance and present results to teachers (the focus of this work)

Online “Grade book” report

[2] Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., Rasmussen, K.P. (2005). The Assistment Project: Blending Assessment and Assisting. In C.K. Looi, G. McCalla, B. Bredeweg, & J. Breuker (Eds.) Proceedings of the 12th International Conference on Artificial Intelligence in Education, 555-562. Amsterdam: IOS Press.

[3] Razzaq, L., Heffernan, N.T. (in press). Scaffolding vs. Hints in the Assistment System. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 635-644. 2006.

SLIDE 5

Outline for the talk

Part I: Using Dynamic Measures

Part II: Longitudinal models tracking student learning over time

  • Able to tell which schools provide the most learning to students
  • Can we tell teachers which skills are being learned?

SLIDE 6

Data Source

600+ students from two middle schools used the ASSISTment system every other week from Sep. 2004 to June 2005

Real MCAS scores: test taken in May 2005

2 paper-and-pencil tests, administered in Sep. 2004 and March 2005

SLIDE 7

Part I: Using Dynamic Measures

Research Questions

Can we do a more accurate job of predicting students’ MCAS scores using the online assistance information (time, performance on scaffolding questions, number of attempts, number of hints)?

Can this online assessment system predict MCAS scores better than the traditional paper-and-pencil test does?
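The online assistance information in the first question has to be aggregated per student before it can enter a regression. A minimal sketch, assuming a hypothetical log format (`Response` records with `is_original`, `correct`, `attempts`, `hints`, `seconds` fields); the metric names mirror the variables listed for Model III (PERCENT_CORRECT, ORIGINAL_PERCENT_CORRECT, AVG_ATTEMPT, AVG_HINT_REQUEST, AVG_ITEM_TIME), but the authors’ exact definitions may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    """One logged answer (hypothetical log format)."""
    student: str
    is_original: bool   # original question vs. scaffolding question
    correct: bool       # correct on the first attempt
    attempts: int
    hints: int
    seconds: float


def dynamic_metrics(log: list[Response]) -> dict[str, dict[str, float]]:
    """Aggregate static and dynamic (assistance) measures per student."""
    by_student: dict[str, list[Response]] = {}
    for r in log:
        by_student.setdefault(r.student, []).append(r)

    metrics = {}
    for student, rows in by_student.items():
        originals = [r for r in rows if r.is_original]
        metrics[student] = {
            # Static measure: percent correct on original questions only (NaN if none seen).
            "ORIGINAL_PERCENT_CORRECT": (
                mean(1.0 if r.correct else 0.0 for r in originals) if originals else float("nan")
            ),
            # Percent correct over original and scaffolding questions together.
            "PERCENT_CORRECT": mean(1.0 if r.correct else 0.0 for r in rows),
            # Dynamic measures: how much assistance the student needed.
            "AVG_ATTEMPT": mean(r.attempts for r in rows),
            "AVG_HINT_REQUEST": mean(r.hints for r in rows),
            "AVG_ITEM_TIME": mean(r.seconds for r in rows),
        }
    return metrics
```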

SLIDE 8

Part I: Using Dynamic Measures

Approach

Run forward stepwise linear regression to train regression models using different independent variables

Result

Model | Independent Variables | # Variables Entered | R² | BIC+ | MAD*
Model I | Paper practice results only | 2 | .588 | -358 | 6.20881
Model II | The single online static metric of percent correct on original questions | 1 | .567 | -343 | 6.21108
Model III | Model II plus all other online measures | 5 | .663 | -423 | 5.44183

+ BIC: Bayesian Information Criterion
* MAD: Mean Absolute Deviance
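Forward stepwise selection can be sketched as a greedy loop: at each step, add the candidate variable that lowers BIC the most, and stop when no candidate lowers it. This sketch assumes BIC as the entry criterion and the Gaussian form BIC = n·ln(RSS/n) + k·ln(n); the slide does not say which entry rule or BIC variant was used (statistics packages often enter variables by an F-test p-value instead), so absolute BIC values from this code will not match the table. It is written in pure Python so it runs without dependencies.

```python
import math


def ols_fit(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in X]            # prepend an intercept column
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def bic_and_mad(X: list[list[float]], y: list[float]) -> tuple[float, float]:
    """BIC (Gaussian form, k = number of coefficients) and mean absolute deviation."""
    beta = ols_fit(X, y)
    resid = [yi - (beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x, yi in zip(X, y)]
    n, k = len(y), len(beta)
    rss = sum(e * e for e in resid)
    return n * math.log(rss / n) + k * math.log(n), sum(abs(e) for e in resid) / n


def forward_stepwise(columns: dict[str, list[float]], y: list[float]) -> list[str]:
    """Greedily add the variable that lowers BIC most; stop when none does."""
    chosen: list[str] = []
    best_bic, _ = bic_and_mad([[] for _ in y], y)  # intercept-only baseline
    while True:
        candidates = {}
        for name in columns:
            if name not in chosen:
                names = chosen + [name]
                X = [[columns[c][i] for c in names] for i in range(len(y))]
                candidates[name] = bic_and_mad(X, y)[0]
        if not candidates:
            return chosen
        name = min(candidates, key=candidates.get)
        if candidates[name] >= best_bic:
            return chosen
        chosen.append(name)
        best_bic = candidates[name]
```

`forward_stepwise` returns the variables in their order of entry, which is how the Model III coefficient table orders them.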

SLIDE 9

Part I: Using Dynamic Measures

Model III (variables in order of entry)

Order | Variables | Coeff. | Std. Coeff.
1 | PERCENT_CORRECT | 32.976 | .425
2 | AVG_ATTEMPT | -11.209 | -.199
3 | AVG_ITEM_TIME | -.037 | -.143
4 | AVG_HINT_REQUEST | -2.420 | -.121
5 | ORIGINAL_PERCENT_CORRECT | 12.618 | 1.66

What do we see from Model III?

The more hints, attempts, and time a student needs to solve a problem, the worse his or her predicted score.

SLIDE 10

Part II: Track Learning Longitudinally

What if we take time into consideration?

Note: Unlike Razzaq, Feng et al., which looks at student performance gains over learning-opportunity pairs within the ASSISTment system, here “learning” also includes students’ learning in class.

Recall the problems of prediction in the Grade book:

  • Based only on static measures (discussed in Part I)
  • Time is ignored (Part II)

Research Questions

Can our system detect performance improving over time?

Can we tell the difference in learning rate between students from different schools? Teachers? (Who cares?)

Do students differ in how they learn different skills?

Approach: longitudinal data analysis

SLIDE 11

Longitudinal Data Analysis

What do we get from a longitudinal model?

Average population trajectory for the specified group

  • Trajectory indicated by two parameters: intercept $\gamma_{00}$ and slope $\gamma_{10}$
  • The average estimated score for a group at time j is

$$\hat{Y}_{j} = \gamma_{00} + \gamma_{10}\,\mathrm{TIME}_{j}$$

One trajectory for every single student

  • Each student gets two parameters that vary from the group average: intercept $\gamma_{00} + \zeta_{0i}$ and slope $\gamma_{10} + \zeta_{1i}$
  • The estimated score for student i at time j is

$$\hat{Y}_{ij} = (\gamma_{00} + \zeta_{0i}) + (\gamma_{10} + \zeta_{1i})\,\mathrm{TIME}_{j}$$

Students’ initial knowledge is indicated by the intercept, while the slope shows the learning rate.

[4] Singer, J. D. & Willett, J. B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Occurrence. Oxford University Press, New York.
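Once the parameters are estimated, the two prediction equations can be evaluated directly. In this minimal sketch, γ00 = 18 and γ10 = 1.29 are the Model B estimates reported on slide 16 (initial average PredictedScore and average monthly learning rate); the student deviations ζ0i and ζ1i are hypothetical, and TIME is assumed to count months from the first month. Estimating the parameters themselves requires a mixed-effects fitting routine, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class GrowthModel:
    gamma00: float   # average initial score (intercept)
    gamma10: float   # average learning rate per unit of TIME (slope)

    def group_score(self, time: float) -> float:
        """Average estimated score for the group at time j."""
        return self.gamma00 + self.gamma10 * time

    def student_score(self, zeta0: float, zeta1: float, time: float) -> float:
        """Estimated score for student i, whose intercept and slope deviate by zeta0, zeta1."""
        return (self.gamma00 + zeta0) + (self.gamma10 + zeta1) * time


model_b = GrowthModel(gamma00=18.0, gamma10=1.29)
print(model_b.group_score(3))               # about 21.87
print(model_b.student_score(2.0, -0.3, 3))  # about 22.97: higher start, slower learner
```

A student with ζ0i = ζ1i = 0 follows the group trajectory exactly; the random effects are what let each student have a different starting point and learning rate.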

SLIDE 12

SLIDE 13

[Figure: percent correct (y-axis) by month (x-axis) for 17 students from one class]

Table 2. Regression Models

SLIDE 14

SLIDE 15

SLIDE 16

Part II: Track Learning Longitudinally

Model | Predictors | BIC | #param
Model A (unconditional means model) | none | 31712 | 3
Model B (unconditional growth model) | TIME | 31628 | 6
Model D | TIME + SCHOOL | 31616 | 8
Model E | TIME + TEACHER | 31672 | 20
Model F | TIME + CLASS | 31668 | 70

BIC difference: Model A to Model B = 84; Model B to Model D = 12

Result

Unconditional means model (Model A): no predictors

Unconditional growth model (Model B): estimated initial average PredictedScore = 18; estimated average monthly learning rate = 1.29

  • Observation: students were learning over time

Add in school/teacher/class (Models D/E/F)

  • Model D shows a statistically significant advantage as measured by BIC
  • Observation: students from different schools differ in both incoming knowledge and learning rate
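The model comparison comes down to picking the lowest BIC and judging the size of each BIC drop. This small sketch uses the BIC values from this slide; the evidence labels follow Raftery’s (1995) rule of thumb for BIC differences (0 to 2 weak, 2 to 6 positive, 6 to 10 strong, above 10 very strong), which is a common convention and an assumption here, not necessarily the criterion the authors applied.

```python
# BIC and number of parameters for each model, as reported on the slide.
MODELS = {
    "A (no predictor)": (31712, 3),
    "B (TIME)": (31628, 6),
    "D (TIME + SCHOOL)": (31616, 8),
    "E (TIME + TEACHER)": (31672, 20),
    "F (TIME + CLASS)": (31668, 70),
}


def evidence(diff: float) -> str:
    """Label a BIC drop using Raftery's rule of thumb."""
    if diff <= 0:
        return "none"
    if diff > 10:
        return "very strong"
    if diff > 6:
        return "strong"
    if diff > 2:
        return "positive"
    return "weak"


def compare(base: str, other: str) -> tuple[float, str]:
    """BIC drop when moving from `base` to `other` (positive favors `other`)."""
    diff = MODELS[base][0] - MODELS[other][0]
    return diff, evidence(diff)


best = min(MODELS, key=lambda m: MODELS[m][0])
print(best)                                     # D (TIME + SCHOOL)
print(compare("A (no predictor)", "B (TIME)"))  # (84, 'very strong')
print(compare("B (TIME)", "D (TIME + SCHOOL)")) # (12, 'very strong')
```

Because BIC penalizes each extra parameter by ln(n), Models E and F (20 and 70 parameters) score worse than Model B despite adding predictors.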

SLIDE 17

Part II: Track Learning Longitudinally

The last question

Can we detect differences in the learning rates of different skills?

SLIDE 18

Growth of 5 Skills over Time for One Student

[Figure: percent correct (y-axis, 10 to 80) for Geometry, Algebra, Measurement, Data Analysis, and Number Sense, by month from Sept to March (x-axis)]

SLIDE 19

Growth of 5 Skills over Time for One Student

[Figure: the same chart with a linear trend line for each skill: Geometry, Data Analysis, Algebra, Measurement, Number Sense]
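The per-skill linear trend lines on this chart appear to be ordinary least-squares lines, one per skill, so each slope reads as that skill’s change in percent correct per month. A minimal sketch of fitting such lines; the monthly percentages are made up because the chart’s underlying values are not recoverable, and a per-skill line for one student is only a descriptive trend, not the longitudinal model used for the group-level analysis.

```python
def trend(months: list[float], pct: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line pct ≈ intercept + slope * month."""
    n = len(months)
    mx, my = sum(months) / n, sum(pct) / n
    sxx = sum((x - mx) ** 2 for x in months)
    sxy = sum((x - mx) * (y - my) for x, y in zip(months, pct))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data for one student: month index 0 = Sept ... 6 = March.
months = [0, 1, 2, 3, 4, 5, 6]
skills = {
    "Geometry": [30, 35, 33, 42, 45, 50, 52],
    "Algebra": [40, 38, 45, 44, 50, 49, 55],
}
for name, pct in skills.items():
    intercept, slope = trend(months, pct)
    print(f"{name}: starts near {intercept:.1f}%, changes {slope:+.2f} points/month")
```

Comparing slopes across skills is the informal version of the question on the next slide: whether learning rates differ by skill.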

SLIDE 20

Part II: Track Learning Longitudinally

The last question

Can we detect differences in the learning rates of different skills?

Yes, we can! In this paper we showed that we can use the model with 5 skills to make more accurate predictions of the students’ own online data. More recent studies we have done show that even finer-grained models (98 skills) are better not only at predicting our online data but also at predicting students’ test scores.

[7] Pardos, Z. A., Heffernan, N. T., Anderson, B. & Heffernan, C. (in press). Using Fine-Grained Skill Models to Fit Student Performance with Bayesian Networks. Workshop in Educational Data Mining held at the Eighth International Conference on Intelligent Tutoring Systems. Taiwan. 2006.

[8] Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (in press). Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models. AAAI'06 Workshop on Educational Data Mining, Boston, 2006.

SLIDE 21

Large Scale: ASSISTment project

ASSISTments are tagged with skills

SLIDE 22

Large Scale: ASSISTment project

Is the skill/knowledge component mapping any good?

Teachers get reports that they think are credible and useful [6].

[6] Feng, M., Heffernan, N.T. (in press). Informing Teachers Live about Student Learning: Reporting in the Assistment System. To be published in Technology, Instruction, Cognition, and Learning Journal, Vol. 3. Old City Publishing, Philadelphia, PA. 2006.

SLIDE 23

SLIDE 24

SLIDE 25

Large Scale: ASSISTment project

We built 300 ASSISTments, providing ~8 hours of content, using the Builder [5]

[5] Heffernan, N.T., Turner, T.E., Lourenco, A.L.N., Macasek, M.A., Nuzzo-Jones, G., Koedinger, K.R. The ASSISTment Builder: Towards an Analysis of Cost Effectiveness of ITS Creation. Accepted by FLAIRS 2006, Florida, USA (2006).

Is the content we created good at producing learning?

Do students learn from these? [2]

Good enough that it is used by 1,500 8th graders in Worcester every two weeks as part of their math class (2nd year).

SLIDE 26

Large Scale: ASSISTment project

Other work using hints, attempts, and time:

  • Can detect who is “gaming” the system and prevent it
  • Machine learning of user models

[9] Walonoski, J., Heffernan, N.T. (accepted). Detection and Analysis of Off-Task Gaming Behavior in Intelligent Tutoring Systems. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 382-391. 2006.

[10] Walonoski, J., Heffernan, N.T. (accepted). Prevention of Off-Task Gaming Behavior in Intelligent Tutoring Systems. Proceedings of the Eighth International Conference on Intelligent Tutoring Systems.

SLIDE 27

Conclusion

Our online assessment system did a better job of predicting student knowledge because it takes into consideration how much tutoring assistance each student needed.

Promising evidence was found that the online system was able to track students’ learning well over the course of a year.

We found that the system could reliably track students’ learning of individual skills.

SLIDE 28

Leena RAZZAQ*, Mingyu FENG, Goss NUZZO-JONES, Neil T. HEFFERNAN, Kenneth KOEDINGER+, Brian JUNKER+, Steven RITTER, Andrea KNIGHT+, Edwin MERCADO*, Terrence E. TURNER, Ruta UPALEKAR, Jason A. WALONOSKI, Michael A. MACASEK, Christopher ANISZCZYK, Sanket CHOKSEY, Tom LIVAK, Kai RASMUSSEN

Some of the ASSISTMENT TEAM

* This research was made possible by the US Dept. of Education, Institute of Education Sciences, "Effective Mathematics Education Research" program grant #R305K03140, the Office of Naval Research grant #N00014-03-1-0221, an NSF CAREER award to Neil Heffernan, and the Spencer Foundation. Authors Razzaq and Mercado were funded by the National Science Foundation under Grant No. 0231773. All the opinions in this article are those of the authors, and not those of any of the funders.

Carnegie Learning

SLIDE 29

Future work

Predict Student State Test Scores

Regression + longitudinal analysis [9]

Incorporate finer-grained cognitive models

Item-level prediction [8]

Apply the models in the current reporting system

[9] Feng, M., Heffernan, N.T., & Koedinger, K.R. (in press). Predicting State Test Scores Better with Intelligent Tutoring Systems: Developing Metrics to Measure Assistance Required. In Ikeda, Ashley & Chan (Eds.). Proceedings of the Eighth International Conference on Intelligent Tutoring Systems. Springer-Verlag: Berlin. pp. 31-40. 2006.

[8] Feng, M., Heffernan, N., Mani, M., & Heffernan, C. (2006, accepted). Using Mixed-Effects Modeling to Compare Different Grain-Sized Skill Models. AAAI'06 Workshop on Educational Data Mining, Boston, 2006.