
Teaching Multiple Concepts to Forgetful Learners (Yuxin Chen)



  1. Teaching Multiple Concepts to Forgetful Learners. Yuxin Chen, chenyuxin@uchicago.edu. Oisin Mac Aodha, Manuel Gomez Rodriguez, Anette Hunziker, Yuxin Chen, Andreas Krause, Pietro Perona, Yisong Yue, Adish Singla

  2. Applications: Language learning • More than 300 million students • Based on spaced repetition of flashcards • Can we compute an optimal personalized schedule of repetition?

  3. Teaching Interaction Using Flashcards. Interaction at time $t = 1, 2, \dots, T$: 1. Teacher displays a flashcard $x_t \in \{1, 2, \dots, n\}$ 2. Learner's recall is $y_t \in \{0, 1\}$ 3. Teacher provides the correct answer. [Figure: flashcard interface shown over Learning Phases (1) to (6); the learner types an answer (e.g., "jouet", "nachs"), presses Submit, and sees ✓ or x together with the correct answer (Spielzeug, Buch, Nachtisch).]

  4. Background on Teaching Policies. Example setup - $T = 20$ and $n = 5$ concepts given by $a, b, c, d, e$. Naïve teaching policies • Random: a → b → a → e → c → d → a → d → c → a → b → e → a → b → d → e → … • Round-robin: a → b → c → d → e → a → b → c → d → e → a → b → c → d → e → a → … Key limitation: Schedule agnostic to learning process
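
The two naïve policies on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the seed, and the choice to label the five concepts "a" to "e" are assumptions made for the example, not part of the talk.

```python
import random
from itertools import cycle, islice


def round_robin_schedule(concepts, T):
    """Cycle through the concepts in a fixed order for T steps."""
    return list(islice(cycle(concepts), T))


def random_schedule(concepts, T, seed=0):
    """Pick one concept uniformly at random at each of the T steps."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [rng.choice(concepts) for _ in range(T)]


concepts = ["a", "b", "c", "d", "e"]  # hypothetical labels for n = 5 concepts
print(" -> ".join(round_robin_schedule(concepts, 20)))
print(" -> ".join(random_schedule(concepts, 20)))
```

Neither function looks at the learner's answers, which is the limitation the slide points out: the schedule is fixed before any feedback arrives.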

  5. Background: Pimsleur Method (1967) • Used in mainstream language learning platforms • Based on spaced repetition ideas. Example schedule: a → b → a → b → c → a → c → b → d → c → d → a → b → d → c → e → …

  6. Background: Leitner System (1972) • Adaptive spacing intervals • Key limitation: No guarantees on the optimality of the schedule. Student 1: a → b → a → b → c → a → c → b → d → c → d → a → b → d → c → e → … Student 2: a → a → b → a → b → c → a → c → a → b → c → a → b → a → d → c → …

  7. Modeling Forgetfulness: Half-life Regression (HLR) model [Settles & Meeder, ACL 2016]. Recall probability of concept $i$ at time $t$: $p_i(t \mid \text{history}) = 2^{-\Delta_{t,i} / h_i}$, where $\Delta_{t,i}$ is the time since concept $i$ was last taught and $h_i$ is the half-life estimate, which depends on feedback: $h_i \mathrel{+}= a_i$ or $h_i \mathrel{+}= b_i$.
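
A minimal sketch of the recall formula on this slide, in Python. The function names are hypothetical, and one detail is an assumption rather than something the slide states: the half-life is taken to grow by a_i after a successful recall and by b_i after a failed one.

```python
def recall_probability(elapsed, half_life):
    """p = 2^(-elapsed / half_life): probability of recalling a concept
    `elapsed` time steps after it was last taught."""
    return 2.0 ** (-elapsed / half_life)


def update_half_life(half_life, recalled, a, b):
    """Additive half-life update after one teaching step.

    Assumption for this sketch: add `a` on a successful recall and
    `b` on a failed one.
    """
    return half_life + (a if recalled else b)


h = 2.0
print(recall_probability(0, h))  # right after teaching
print(recall_probability(2, h))  # one half-life later
h = update_half_life(h, recalled=True, a=2.0, b=1.0)
print(recall_probability(2, h))  # same delay, longer half-life
```

After exactly one half-life the recall probability is 0.5 by construction, which is what gives the parameter its name.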

  8. Interactive Teaching Protocol • For t = 1…T - Teacher chooses concept $i \in \{1, \dots, m\}$ (e.g., a flashcard) - Learner tries to recall concept (success or fail) - Teacher reveals answer (e.g., “Spielzeug”) • Goal: maximize $f(\text{history}) = \frac{1}{mT} \sum_{i=1}^{m} \sum_{t=1}^{T} p_i(t \mid \text{history}_{1:t-1})$ (“Area Under Curve”)
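
The protocol and the "area under curve" objective can be simulated as follows. This is a sketch under stated assumptions, not the authors' implementation: the learner follows the half-life recall formula from the previous slide, a concept that was never shown has recall probability 0, the half-life grows by `a` on success and `b` on failure, and the names `simulate`, `policy`, and `h0` are hypothetical.

```python
import random


def simulate(policy, m, T, a=2.0, b=1.0, h0=1.0, seed=0):
    """Run the teaching loop for T steps over m concepts and return the
    average recall probability over all concepts and time steps."""
    rng = random.Random(seed)
    half_life = [h0] * m
    last_taught = [None] * m  # None means the concept was never shown
    total = 0.0

    def recall(i, t):
        if last_taught[i] is None:
            return 0.0  # assumption: unseen concepts cannot be recalled
        return 2.0 ** (-(t - last_taught[i]) / half_life[i])

    for t in range(1, T + 1):
        # Recall probabilities at time t given the history up to t - 1.
        probs = [recall(i, t) for i in range(m)]
        total += sum(probs)
        i = policy(t, probs)                  # teacher chooses a concept
        recalled = rng.random() < probs[i]    # learner succeeds or fails
        half_life[i] += a if recalled else b  # feedback updates the half-life
        last_taught[i] = t                    # teacher reveals the answer
    return total / (m * T)


def round_robin(t, probs):
    return (t - 1) % len(probs)


print(simulate(round_robin, m=5, T=20))
```

Any policy with the signature `policy(t, probs)` can be plugged in, so different schedules can be compared on the same simulated learner.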

  9. Naive Approaches • Round Robin - Doesn’t adapt to new estimates of learner recall probabilities - Over-teaches easy concepts - Under-teaches hard concepts • Lowest Recall Probability - Generalization of Pimsleur method and Leitner system - Doesn’t consider change to recall probability
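
The lowest-recall-probability rule can be sketched as below, assuming the half-life recall formula from the modeling slide and treating never-shown concepts as having recall probability 0. Both function names are hypothetical.

```python
def recall_probabilities(t, last_taught, half_life):
    """Recall probability of every concept at time t under a half-life model."""
    probs = []
    for last, h in zip(last_taught, half_life):
        # Assumption: a concept that was never shown cannot be recalled.
        probs.append(0.0 if last is None else 2.0 ** (-(t - last) / h))
    return probs


def lowest_recall_choice(t, last_taught, half_life):
    """Teach the concept the learner is currently least likely to recall."""
    probs = recall_probabilities(t, last_taught, half_life)
    return min(range(len(probs)), key=lambda i: probs[i])


# Concept 1 was taught longest ago, so it is picked.
print(lowest_recall_choice(5, [4, 1, 3], [2.0, 2.0, 2.0]))
```

The rule only looks at the current recall probabilities. It does not ask how much teaching a concept now would change its future recall, which is the shortcoming the slide notes.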

  10. Greedy Teaching Algorithm (interactive) • Choose concept $i$ to maximize $\Delta_i(\text{history}) = \mathbb{E}_{y_t}\left[ f(\text{history} \oplus (i, y_t)) - f(\text{history}) \right]$, where $y_t$ is the success or failure of recall at time $t$ (randomness over model estimate) and $p_i(t \mid \text{history}) = 2^{-\Delta_{t,i} / h_i}$ ($h_i$ updated after observing $y_t$)
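
One way to compute the expected marginal gain for a half-life learner is sketched below. Several things here are assumptions made for illustration: the value of a partial history is taken to be the area under the recall curves up to the horizon T with no further teaching, the recall outcome is drawn with the model's current recall probability, the half-life grows by `a` on success and `b` on failure, and all function names are hypothetical.

```python
def recall(elapsed, h):
    return 2.0 ** (-elapsed / h)


def future_recall(last, h, t, T):
    """Sum of one concept's recall probabilities over times t+1..T."""
    if last is None:
        return 0.0  # assumption: unseen concepts cannot be recalled
    return sum(recall(tau - last, h) for tau in range(t + 1, T + 1))


def greedy_choice(t, T, half_life, last_taught, a, b):
    """Return (concept, gain) maximizing the expected gain of teaching at time t."""
    m = len(half_life)
    best, best_gain = 0, float("-inf")
    for i in range(m):
        h, last = half_life[i], last_taught[i]
        p = 0.0 if last is None else recall(t - last, h)  # P(success now)
        base = future_recall(last, h, t, T)               # without teaching i
        after_success = future_recall(t, h + a, t, T)     # taught, recalled
        after_failure = future_recall(t, h + b, t, T)     # taught, not recalled
        expected = p * after_success + (1.0 - p) * after_failure
        gain = (expected - base) / (m * T)
        if gain > best_gain:
            best, best_gain = i, gain
    return best, best_gain


# Concept 0 was just taught and decays slowly; concept 1 was never shown.
print(greedy_choice(t=2, T=10, half_life=[10.0, 1.0],
                    last_taught=[1, None], a=2.0, b=1.0))
```

Only the chosen concept's future recall curve changes, so the gain can be computed per concept without re-evaluating the whole objective.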

  11. Characteristics of the Optimization Problem • Non-submodular - Gain of a concept $x$ can increase given longer history - Captured by submodularity ratio $\gamma$ over sequences

  12. Characteristics of the Optimization Problem (cont.) • Post-fix non-monotone - $f(\text{orange} \oplus \text{blue}) < f(\text{blue})$ - Captured by curvature $\omega$

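  13. Theoretical Guarantees: General Case • Guarantees for the general case (any memory model) • Utility of $\pi^{\text{gr}}$ (greedy policy) compared to $\pi^{\text{opt}}$ is given by Theorem 1: $F(\pi^{\text{gr}}) \ge F(\pi^{\text{opt}}) \sum_{t=1}^{T} \frac{\gamma_{T-t}}{T} \prod_{\tau=0}^{t-1} \left(1 - \frac{\omega_\tau \gamma_\tau}{T}\right)$ and Corollary 2: $F(\pi^{\text{gr}}) \ge F(\pi^{\text{opt}}) \, \frac{1}{\omega_{\max}} \left(1 - e^{-\omega_{\max} \gamma_{\min}}\right)$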
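
To get a feel for the guarantee, the sketch below evaluates the approximation factor, assuming Corollary 2 has the form $F(\pi^{\text{gr}}) \ge F(\pi^{\text{opt}}) \cdot \frac{1}{\omega_{\max}} (1 - e^{-\omega_{\max} \gamma_{\min}})$. The function name and the sample values of the curvature and submodularity ratio are hypothetical.

```python
import math


def greedy_bound_factor(omega_max, gamma_min):
    """Fraction of the optimal utility guaranteed to the greedy policy,
    under the assumed form (1 / omega_max) * (1 - exp(-omega_max * gamma_min))."""
    return (1.0 - math.exp(-omega_max * gamma_min)) / omega_max


for omega_max, gamma_min in [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]:
    factor = greedy_bound_factor(omega_max, gamma_min)
    print(f"omega_max={omega_max}, gamma_min={gamma_min}: factor={factor:.3f}")
```

With both parameters equal to 1 the factor is 1 - 1/e, the classical greedy guarantee for monotone submodular maximization; a smaller submodularity ratio weakens the bound and a smaller curvature strengthens it.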

  14. Theoretical Guarantees: HLR Model • Consider the task of teaching $n$ concepts where each concept follows an independent HLR model with the same parameters $a_x = z$, $b_x = z$ $\forall x \in \{1, 2, \dots, n\}$. A sufficient condition for the algorithm to achieve $(1 - \epsilon)$ high utility is $z \ge \max\{\log T, \log 3n, \,\cdot\,\}$ [third term of the maximum garbled in extraction]

  15. Illustration: Simulation Results [Figure: objective value for the Round Robin, Greedy, and Optimal policies]

  16. User Study • 150 participants from the Mechanical Turk platform • T = 40, m = 15; total study time is about 25 minutes

  17. Figure 6: Samples from the German dataset. Results (GR: greedy, LR: lowest recall probability, RR: round-robin, RD: random):

      German       GR      LR      RR      RD
      Avg. gain    0.572   0.487   0.462   0.467
      p-value      -       0.0652  0.0197  0.0151

  18. (a) Common: Owl, Cat, Horse, Elephant, Lion, Tiger, Bear (b) Rare: Angwantibo, Olinguito, Axolotl, Ptarmigan, Patrijshond, Coelacanth, Pyrrhuloxia

      Biodiversity (all species)    GR      LR      RR      RD
      Avg. gain                     0.475   0.411   0.390   0.251
      p-value                       -       0.0017  0.0001  0.0001

      Biodiversity (rare species)   GR      LR      RR      RD
      Avg. gain                     0.766   0.668   0.601   0.396
      p-value                       -       0.0001  0.0001  0.0001

  19. Summary: Teaching Concepts to People • Teaching forgetful learners - Limited memory (modeling forgetfulness) - Engagement (interface design) • Challenges not covered in this talk: - Limited inference power and noise - Mismatch in representation - Interpretability (e.g., teaching via labels vs. rich feedback) - Safety (e.g., when teaching physical tasks) - Fairness (e.g., when teaching a class) - … Questions?
