
Writing Reusable Code Feedback at Scale with Mixed-Initiative Program Synthesis



  1. Writing Reusable Code Feedback at Scale with Mixed-Initiative Program Synthesis Andrew Head*, Elena Glassman*, Gustavo Soares*, Ryo Suzuki, Lucas Figueredo, Loris D’Antoni, Björn Hartmann * These three authors contributed equally to the work.

  2. When Writing Feedback on Student Code, Teachers Can Draw on Deep Domain Knowledge …but it does not scale. Incorrect student code submissions are paired with teacher comments: “What happens when n is zero? Hint: look at lecture 5’s slides”; “While this helper function is useful, it does not handle the ca…”; “Have you considered what would happen if combiner was set…” (Motivation)

  3. In Lieu of Teacher-Written Feedback, an Autograder Shows Test Cases …but there’s still a gulf of evaluation. Diagram labels: Student Submission, Course Autograder, Test Case Results.

  4. Program Synthesis Techniques Can Shrink the Gulf by Automatically Finding and Suggesting Bug Fixes for Students …but the automatically generated feedback is often mechanical, formulaic. Example of a generated fix shown alongside the student submission and test case results: “In line 2, change total = 0 to total = 1”. Related systems: AutoGrader [PLDI13], AutomataTutor [TOCHI15], CodeAssist [FSE16]. Can we combine teachers’ deep domain knowledge with program synthesis to give students better feedback?
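
To make the slide's example concrete, here is a minimal sketch of a submission for which "change total = 0 to total = 1" is the right fix. The `product` assignment, the test cases, and the `run_autograder` helper are assumptions made for illustration; the slides show only the suggested fix, not the student code.

```python
def product_submitted(n, term):  # hypothetical student submission
    total = 0  # line 2: multiplying from 0 always gives 0
    for k in range(1, n + 1):
        total *= term(k)
    return total


def product_fixed(n, term):  # the same code after the suggested fix
    total = 1
    for k in range(1, n + 1):
        total *= term(k)
    return total


# Hypothetical autograder test cases: (arguments, expected result).
TESTS = [
    ((3, lambda k: k), 6),
    ((4, lambda k: k * k), 576),
    ((0, lambda k: k), 1),
]


def run_autograder(fn):
    """Return one 'pass' or 'fail' per test case, as a test-only autograder might."""
    return ["pass" if fn(*args) == expected else "fail" for args, expected in TESTS]


print(run_autograder(product_submitted))
print(run_autograder(product_fixed))
```

The autograder output alone tells the student that every test fails, but not why; the synthesized fix pinpoints the line, but says nothing about the underlying misconception. That remaining gap is what the teacher-written comment is meant to fill.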

  5. Program Synthesis: Learning Code Transformations from Pairs of Incorrect and Correct Submissions. Example: Student 1 fixes an iterative solution and Student 2 fixes a recursive solution; both fixes are captured by one generalized code transformation.
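
The idea of learning one transformation from several incorrect/correct pairs can be sketched in a few lines of Python. This is a toy stand-in, not Refazer's algorithm: it only learns "replace constant X with constant Y", and all names here (`learn_rewrite`, `candidate_fixes`, `first_passing_fix`, the `product` programs) are hypothetical. It also assumes each pair differs in exactly one constant and that a candidate fix is accepted only if it passes the test cases.

```python
import ast


def learn_rewrite(incorrect_src, correct_src):
    """Learn (old, new) from a pair of programs that differ in exactly one constant."""
    old = [n.value for n in ast.walk(ast.parse(incorrect_src)) if isinstance(n, ast.Constant)]
    new = [n.value for n in ast.walk(ast.parse(correct_src)) if isinstance(n, ast.Constant)]
    if len(old) != len(new):
        return None
    changed = [(o, n) for o, n in zip(old, new) if o != n]
    return changed[0] if len(changed) == 1 else None


def candidate_fixes(src, rewrite):
    """Yield one rewritten program per occurrence of the old constant."""
    old, new = rewrite
    tree = ast.parse(src)
    sites = [n for n in ast.walk(tree)
             if isinstance(n, ast.Constant) and type(n.value) is type(old) and n.value == old]
    for site in sites:
        site.value = new
        yield ast.unparse(tree)  # requires Python 3.9+
        site.value = old  # restore before trying the next site


def first_passing_fix(src, rewrite, func_name, tests):
    """Return the first candidate that passes all tests, or None."""
    for candidate in candidate_fixes(src, rewrite):
        namespace = {}
        try:
            exec(candidate, namespace)  # toy only: never exec untrusted code like this
            fn = namespace[func_name]
            if all(fn(*args) == expected for args, expected in tests):
                return candidate
        except Exception:
            continue  # e.g. a candidate that recurses forever
    return None


ITER_BAD = """
def product(n):
    total = 0
    for k in range(1, n + 1):
        total *= k
    return total
"""
ITER_GOOD = ITER_BAD.replace("total = 0", "total = 1")

REC_BAD = """
def product(n):
    if n == 0:
        return 0
    return n * product(n - 1)
"""
REC_GOOD = REC_BAD.replace("return 0", "return 1")

# The iterative and the recursive pair yield the same learned rewrite.
rule_a = learn_rewrite(ITER_BAD, ITER_GOOD)
rule_b = learn_rewrite(REC_BAD, REC_GOOD)

# A third, unseen submission with different names and operand order.
NEW = """
def product(count):
    if count == 0:
        return 0
    return product(count - 1) * count
"""
TESTS = [((3,), 6), ((0,), 1)]
fixed = first_passing_fix(NEW, rule_a, "product", TESTS)
print(fixed)
```

Trying every matching site and keeping only the candidate that passes the tests is what prevents the rewrite from changing the `count == 0` comparison instead of the wrong base-case value.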

  6. Program Synthesis: Learning Bug-Fixing Code Transformations

  7. We Scale Up a Little Teacher-Written Feedback by Attaching It to Code Transformations. Several incorrect student code submissions share one code transformation (add base case), which carries the teacher comment: “What happens when n is zero? Hint: look at lecture 5’s slides on base cases.”

  8. Two Interfaces for Attaching Feedback to Code Transformations. MistakeBrowser: giving feedback on clusters. Learn transformations from Autograder (each student's incorrect submissions … final correct submission); collect feedback from teachers into a Feedback Bank. Related Systems: Divide and Conquer [ITS14], AutoStyle [ITS16].

  9. Two Interfaces for Attaching Feedback to Code Transformations. FixPropagator: attaching feedback to individual fixes. Learns transformations from and collects feedback from… a teacher who picks a submission, fixes the submission, and writes a hint. The feedback goes into a Feedback Bank.

  10. Our Program Synthesis Backend: Refazer (/hɛ.fa.ˈze(h)/), which means “To redo.” Using Refazer [ICSE17] as a backend, our systems learn bug-fixing code transformations.

  11. Contributions
     • An approach for combining human expertise with program synthesis for delivering reusable, scalable code feedback
     • Implementations of two different systems that use our approach: FixPropagator, MistakeBrowser
     • In-lab studies that suggest that the systems fulfill our goals and also inform teachers about common student bugs

  12. Outline
     • Related Work
     • Program Synthesis
     • Systems
     • Evaluation

  13. System Design: mixed-initiative workflows. Diagram: Interfaces for Teachers [L@S ’17] and Refazer Program Synthesis [ICSE ’17], connected by two arrows labeled “Suggest fixes, feedback” and “Demonstrate fixes, write feedback”. (Systems)

  14. MistakeBrowser workflow. Students submit code, producing incorrect submissions and a final correct submission. The teacher uploads test cases (Test 1 … Test N) and writes feedback for each cluster. The system learns transformations (Trans 1 … Trans N) and clusters submissions by transformation. Next semester, when a student submits incorrect code, the system finds the transformation that fixes the next submission and returns the feedback written for it. (Systems: MistakeBrowser)
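
The MistakeBrowser workflow above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system's implementation: transformations are modeled as plain text rewrites rather than Refazer's learned transformations, and the `product` submissions, the helper names, and the first teacher comment are hypothetical (the second comment is quoted from the evaluation slides).

```python
from collections import defaultdict

TESTS = [((3,), 6), ((0,), 1)]  # hypothetical test cases uploaded by the teacher


def passes_tests(src):
    """Run a submission against the tests; any error counts as a failure."""
    namespace = {}
    try:
        exec(src, namespace)  # toy only: real systems must sandbox student code
        return all(namespace["product"](*args) == want for args, want in TESTS)
    except Exception:
        return False


def fixing_transformation(src, transformations):
    """Return the first (old, new) rewrite that makes the submission pass, else None."""
    for old, new in transformations:
        if old in src and passes_tests(src.replace(old, new, 1)):
            return (old, new)
    return None


def cluster_by_transformation(submissions, transformations):
    """Group incorrect submissions by the transformation that fixes them."""
    clusters = defaultdict(list)
    for src in submissions:
        clusters[fixing_transformation(src, transformations)].append(src)
    return clusters


# Transformations assumed to have been learned from last semester's history.
T_INIT = ("total = 0", "total = 1")
T_RANGE = ("range(1, n)", "range(1, n + 1)")
TRANSFORMATIONS = [T_INIT, T_RANGE]

A = "def product(n):\n    total = 0\n    for k in range(1, n + 1):\n        total *= k\n    return total\n"
B = "def product(n):\n    total = 1\n    for k in range(1, n):\n        total *= k\n    return total\n"
C = "def product(n):\n    total = 0\n    while n > 0:\n        total *= n\n        n -= 1\n    return total\n"

clusters = cluster_by_transformation([A, B, C], TRANSFORMATIONS)

# The teacher writes one comment per cluster; comments are keyed by transformation.
feedback_bank = {
    T_INIT: "What should a running product start at?",  # hypothetical comment
    T_RANGE: "Check if you have the product of the correct number of terms.",
}


def give_feedback(src):
    """Next semester: look up the comment attached to the fixing transformation."""
    return feedback_bank.get(fixing_transformation(src, TRANSFORMATIONS))


D = "def product(n):\n    total = 0\n    for i in range(n):\n        total *= i + 1\n    return total\n"
print(give_feedback(D))
```

Keying the feedback bank by transformation rather than by submission is what lets one comment reach every later submission that is fixed the same way.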

  15. Systems: MistakeBrowser

  16. Systems: MistakeBrowser

  17. “Looks like you’re writing a recursive call. What might you be missing to enable recursion?”

  18. But Not All Classes Have Submission Histories for Hundreds of Students. Diagram labels: incorrect submissions, Submit code.

  19. FixPropagator workflow. Students submit code, producing incorrect submissions. The teacher uploads test cases (Test 1 … Test N), picks a submission, fixes it, and writes a hint. The system learns transformations, makes clusters, attaches feedback, and suggests fixes and feedback. The teacher accepts or modifies the suggested fixes and feedback, and the system returns feedback to students. (Systems: FixPropagator)
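
A minimal sketch of the FixPropagator loop, assuming that a teacher's fix changes exactly one line and that a transformation can be modeled as a one-line text rewrite. Both are simplifications for illustration; the helper names, the student programs, and the hint text are hypothetical and are not taken from the system.

```python
TESTS = [((3,), 6), ((0,), 1)]  # hypothetical test cases uploaded by the teacher


def passes_tests(src, tests):
    """Run a submission against the tests; any error counts as a failure."""
    namespace = {}
    try:
        exec(src, namespace)  # toy only: real systems must sandbox student code
        return all(namespace["product"](*args) == want for args, want in tests)
    except Exception:
        return False


def learn_line_rewrite(before, after):
    """Learn (old_line, new_line) from a teacher fix that changes exactly one line."""
    b, a = before.splitlines(), after.splitlines()
    if len(b) != len(a):
        return None
    changed = [(x.strip(), y.strip()) for x, y in zip(b, a) if x != y]
    return changed[0] if len(changed) == 1 else None


def propagate(rewrite, hint, pending, tests):
    """Suggest the teacher's fix and hint for every pending submission it repairs."""
    old, new = rewrite
    suggestions = {}
    for student, src in pending.items():
        candidate = src.replace(old, new, 1)
        if old in src and passes_tests(candidate, tests):
            suggestions[student] = {"fix": candidate, "hint": hint}
    return suggestions


submissions = {
    "s1": "def product(n):\n    total = 0\n    for k in range(1, n + 1):\n        total *= k\n    return total\n",
    "s2": "def product(n):\n    total = 0\n    while n > 0:\n        total *= n\n        n -= 1\n    return total\n",
    "s3": "def product(n):\n    total = 1\n    for k in range(1, n):\n        total *= k\n    return total\n",
}

# The teacher picks s1, fixes it, and writes a hint.
teacher_fix = submissions["s1"].replace("total = 0", "total = 1")
hint = "What value should total start at?"  # hypothetical teacher hint
rewrite = learn_line_rewrite(submissions["s1"], teacher_fix)

# The system propagates the fix to the remaining submissions.
pending = {k: v for k, v in submissions.items() if k != "s1"}
suggestions = propagate(rewrite, hint, pending, TESTS)

# The teacher then accepts or modifies each suggestion before it reaches a student.
for student, suggestion in suggestions.items():
    print(student, "->", suggestion["hint"])
```

Submission `s3` has a different bug, so it receives no suggestion and stays in the queue until the teacher demonstrates a fix for it.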

  20. Systems: FixPropagator

  21. Systems: FixPropagator

  22. New Student Submission with Same Bug; Suggested Fix

  23. Systems: FixPropagator

  24. Both Fixes and Feedback Can Be Further Modified

  25. A Study of the Systems. Participants: current and former teaching staff from CS1; MistakeBrowser (N = 9), FixPropagator (N = 8). Procedure: interface walkthrough (5 mins.), then main task (30 mins.): giving feedback on student submissions. Measurements: feedback, manual corrections, response to feedback recommendations (accepted, changed, rejected), between-task surveys… Qualitative feedback: survey and post-interview. (Evaluation)

  26. 1. Can a few manual corrections fix many submissions?

  27. 1. Can a few manual corrections fix many submissions? FixPropagator propagates fixes from dozens of corrections to hundreds of submissions.

  28. 1. Can a few manual corrections fix many submissions? FixPropagator propagates fixes from dozens of corrections to hundreds of submissions. [Bar chart: median # submissions given feedback by Teacher vs. FixPropagator; axis from 0 to 250.] • Fixes were propagated within minutes (median = 2m20s, σ = 7m34s for each correction).

  29. 2. How often is a teacher’s feedback relevant when it is matched to other students’ submissions?

  30. 2. How often is a teacher’s feedback relevant when it is matched to other students’ submissions? Feedback propagated with FixPropagator was correct a majority of the time, but not always. Teachers reused feedback a median of 20 times, modifying it a median of 6 times (30%). Generalizable comment: “Check if you have the product of the correct number of terms.” Non-generalizable comment: “Your starting value of z should be a function, not an int.”

  31. 2. How often is a teacher’s feedback relevant when it is matched to other students’ submissions? MistakeBrowser created conceptually consistent clusters of student bugs.

  32. 2. How often is a teacher’s feedback relevant when it is matched to other students’ submissions? MistakeBrowser created conceptually consistent clusters of student bugs. [Bar chart: % of clusters (axis from 0% to 40%) by response to “Do these submissions share the same misconception?”; response options: No or “No idea”, 50%, 75%, Almost 100%, 100%; responses for N = 11 clusters.]

  33. Evaluation Questions. 1. Can a few manual corrections fix many submissions? With a median of 10 corrections, FixPropagator suggested fixes for a median of 201 submissions. 2. How often is a teacher’s feedback relevant when it is matched to another student submission? Matched feedback was relevant ~75% of the time.

  34. Limitations
     • The impact of teacher feedback on student learning outcomes has not been evaluated
     • Code transformations were created only for submissions that were one or two bugs away from correct

  35. Conclusion: We present an approach for combining human expertise with program synthesis for delivering reusable, scalable code feedback, and two systems implementing this approach: MistakeBrowser and FixPropagator.

  36. Conclusion: We present an approach for combining human expertise with program synthesis for delivering reusable, scalable code feedback, and two systems implementing this approach: MistakeBrowser and FixPropagator. Questions?
