

  1. Learning to Automatically Generate Fill-In-The-Blank Quizzes (NLPTEA 2018). Edison Marrese-Taylor, Ai Nakajima, Yutaka Matsuo, Yuichi Ono. Graduate School of Engineering, The University of Tokyo.

  2. Table of contents: 1. Introduction, 2. Proposed Approach, 3. Empirical Study, 4. Conclusions.

  3. Introduction

  4. Background

  5. Background: Web-Based Automatic Quiz Generation • Fill-in-the-blank questions (CQ)

  6. Automatic Quiz Generation
     • Multiple-choice questions (MCQs): commonly used for evaluating knowledge and reading comprehension skills.
     • Fill-in-the-blank questions as cloze questions (CQs): commonly used for evaluating the proficiency of language learners.
     • Fill-in-the-blank questions (FIB): commonly used for evaluating listening skill; characterized by the number of blanks and the words selected as blanks; easy to automate.

  7. Related Work
     • Sumita et al. (2005) proposed a cloze question generation system which focuses on distractor generation using search engines to automatically measure English proficiency.
     • Further related approaches: Lee and Seneff (2007), Lin et al. (2007), Pino et al. (2008).
     • Sakaguchi et al. (2013): a discriminative approach based on SVM classifiers for distractor generation and selection using a large-scale language learners' corpus.
     • Goto et al. (2009): machine learning techniques for multiple-choice cloze question generation.
     • Narendra and Agarwal (2013): present a system which adopts a semi-structured approach to generate CQs by making use of a knowledge base extracted from a Cricket portal.
     • Lin et al. (2015): a generic semi-automatic system for quiz generation using linked data and textual descriptions of RDF resources.
     • Kumar et al. (2015): an automatic approach for CQ generation for student self-assessment.

  8. Our Work: Contributions
     • We formalize the problem of automatic fill-in-the-blank question generation.
     • We present an empirical study using deep learning models for fill-in-the-blank question generation in the context of foreign language learning.

  9. Proposed Approach

  10. Approach: Formalizing the AQG Problem
     • We consider a training corpus of N pairs (S_n, C_n), n = 1...N, where S_n = s_1, ..., s_L(S_n) is a sequence of L(S_n) tokens and C_n ∈ [1, L(S_n)] is an index that indicates the position that should be blanked inside S_n.
     • This setting allows us to train from examples of single blank-annotated sentences. In this way, in order to obtain a sentence with several blanks, multiple passes over the model are required.
     • This approach works in a way analogous to humans, where blanks are provided one at a time.
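     As an illustration of this formulation, a minimal Python sketch of one training pair might look as follows; the class and field names are assumptions for illustration, not part of the original work.

```python
# Minimal sketch of one training pair (S_n, C_n): a tokenized sentence
# plus the index of the token to be blanked. Names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class BlankExample:
    tokens: List[str]   # S_n = s_1, ..., s_L(S_n)
    blank_index: int    # C_n, here 0-based for convenience

# Sentences with several blanks are produced by repeated single-blank passes:
example = BlankExample(tokens=["The", "dog", "is", "barking"], blank_index=3)
```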

  11. AQG as Sequence Labeling (1)
     • Embedded input sequence S_n = s_1, ..., s_L(n).
     • The label sequence is created by simply creating a one-hot vector of size L(S_n) for the given class C_n, i.e. a sequence of binary classes Y_n = y_1, ..., y_L(n), where only one item (the one in position C_n) belongs to the positive class.
     • We model the conditional probability of an output label using the classic sequence labeling scheme, as follows:

       \hat{p}(Y_n \mid S_n) \propto \prod_{i=1}^{L(n)} \hat{y}_i    (1)
       \hat{y}_i = H(y_{i-1}, y_i, s_i)    (2)
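     The label-sequence construction described above can be sketched in a few lines; the helper below is hypothetical and assumes 0-based indices.

```python
# Turn a blank index C_n into the binary label sequence Y_n:
# one positive label at the blanked position, negatives elsewhere.
def make_label_sequence(length: int, blank_index: int) -> list:
    return [1 if i == blank_index else 0 for i in range(length)]

print(make_label_sequence(4, 3))  # [0, 0, 0, 1] for "The dog is barking"
```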

  12. AQG as Sequence Labeling (2)
     • We model the function H using a bidirectional LSTM (Hochreiter and Schmidhuber, 1997):

       \overrightarrow{h}_i = \mathrm{LSTM}_{fw}(\overrightarrow{h}_{i-1}, x_i)    (3)
       \overleftarrow{h}_i = \mathrm{LSTM}_{bw}(\overleftarrow{h}_{i+1}, x_i)    (4)
       \hat{y}_i = \mathrm{softmax}([\overrightarrow{h}_i ; \overleftarrow{h}_i])    (5)

     • The loss function is the average cross entropy over the mini-batch between the predicted label distribution \hat{y}_t and the real labels y_t.
     Figure 1: Our sequence labeling model, based on an LSTM, for AQG (example input "The dog is barking", labeled O O O BLANK).
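     A compact PyTorch sketch of this labeling model is shown below. It follows Equations (3)-(5), but the class name and hyperparameter values are assumptions rather than the authors' exact implementation.

```python
# Sketch of the sequence labeling model in Figure 1: a bidirectional LSTM
# over embedded tokens followed by a per-token softmax over {O, BLANK}.
import torch
import torch.nn as nn

class SeqLabelingAQG(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=300, num_layers=2, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, 2)    # per-token scores for O / BLANK

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.dropout(self.embed(token_ids))    # (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                        # (batch, seq_len, 2 * hidden_dim)
        return self.out(self.dropout(h))           # (batch, seq_len, 2) logits

# Per-token cross entropy, as in the slide:
# loss = nn.CrossEntropyLoss()(logits.view(-1, 2), labels.view(-1))
```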

  13. AQG as Sequence Classification (1)
     The variable-class-size problem
     • The output of the model is a position in the input sequence S_n, so the size of the output dictionary for C_n is variable and depends on S_n.
     • Regular sequence classification models use a softmax distribution over a fixed output dictionary to compute p(C_n | S_n) and are therefore not suitable for our case.
     Proposed solution
     • We propose to use an attention-based approach that allows us to have a variable-size dictionary for the output softmax, in a way akin to Pointer Networks (Vinyals et al., 2015).

  14. AQG as Sequence Classification (2)
     • Embedded input vector sequence S_n = s_1, ..., s_L(n):

       \overrightarrow{h}_i = \mathrm{LSTM}_{fw}(\overrightarrow{h}_{i-1}, x_i)    (6)
       \overleftarrow{h}_i = \mathrm{LSTM}_{bw}(\overleftarrow{h}_{i+1}, x_i)    (7)
       h_i = [\overrightarrow{h}_i ; \overleftarrow{h}_i]    (8)
       u_i = v^{\top} W [h_i ; \bar{h}]    (9)
       p(C_n \mid S_n) = \mathrm{softmax}(u)    (10)

     • W and v are learnable parameters, and the softmax normalizes the vector u to be an output distribution over a dictionary of size L(S_n).
     • \bar{h} is obtained using pooling techniques such as max or mean.
     Figure 2: Our sequence classification model, based on an LSTM, for AQG (example input "The dog is barking", with attention over positions selecting the BLANK).
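     A PyTorch sketch of this classification model, following Equations (6)-(10), might look as follows. The exact layer shapes and the use of mean pooling for \bar{h} are assumptions, not the authors' implementation.

```python
# Sketch of the sequence classification model: a BiLSTM encoder plus a
# Pointer-Network-style scorer u_i = v^T W [h_i ; h_bar] over every
# position, so the output "dictionary" has size L(S_n).
import torch
import torch.nn as nn

class SeqClassificationAQG(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=300, num_layers=2, dropout=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers,
                            bidirectional=True, batch_first=True)
        d = 2 * hidden_dim                          # size of h_i = [h_fw ; h_bw]
        self.W = nn.Linear(2 * d, d, bias=False)    # mixes [h_i ; h_bar]
        self.v = nn.Linear(d, 1, bias=False)        # projects to the scalar score u_i

    def forward(self, token_ids):                   # token_ids: (batch, seq_len)
        x = self.dropout(self.embed(token_ids))
        h, _ = self.lstm(x)                         # (batch, seq_len, d)
        h_bar = h.mean(dim=1, keepdim=True)         # mean pooling for h_bar (assumption)
        h_bar = h_bar.expand_as(h)                  # broadcast to every position
        u = self.v(self.W(torch.cat([h, h_bar], dim=-1))).squeeze(-1)
        return u                                    # (batch, seq_len) scores over positions

# p(C_n | S_n) = softmax(u); nn.CrossEntropyLoss()(u, blank_index) applies it internally.
```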

  15. Empirical Study

  16. Data and Pre-processing (1): YouTutors
     • We use our on-line language learning platform, YouTutors (Nakajima and Tomimatsu, 2013; Ono and Nakajima; Ono et al., 2017), to obtain data.
     • YouTutors currently uses a rule-based system for AQG, but we would like to develop a more flexible approach.
     • With this empirical study, we would like to test to what extent our proposed AQG models are able to encode the behavior of the rule-based system.
     Figure 3: Quiz interface in YouTutors.


  19. Data and Pre-processing (2): Data
     • We extracted anonymized user interaction data in the manner of real quizzes, obtaining a corpus of approximately 300,000 sentences.
     • We tokenize using CoreNLP (Manning et al., 2014) to obtain 1.5 million single-quiz-question training examples.
     • We split this dataset using the regular 70/10/20 partition.
     • We build the vocabulary using the train partition, with a minimum frequency of 1. We do not keep case, obtaining an unknown-token vocabulary of size 2,029 and a total vocabulary of 66,431 tokens.
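     A hedged sketch of how this vocabulary-building step could be implemented is shown below; the helper names and the <unk>/<pad> symbols are assumptions, and CoreNLP tokenization itself is not reproduced here.

```python
# Build a lowercased vocabulary from the train partition with a minimum
# token frequency of 1; unseen tokens map to <unk> at lookup time.
from collections import Counter

def build_vocab(train_sentences, min_freq=1):
    counts = Counter(tok.lower() for sent in train_sentences for tok in sent)
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab):
    return [vocab.get(tok.lower(), vocab["<unk>"]) for tok in sentence]
```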

  20. Results on Sequence Labeling
     • We use a 2-layer bidirectional LSTM, which we train using Adam (Kingma and Ba, 2014) with a learning rate of 0.001, clipping the gradient of our parameters to a maximum norm of 5. We use a word embedding size and hidden state size of 300 and add dropout (Srivastava et al., 2014) before and after the LSTM, using a drop probability of 0.2.
     • For evaluation, as accuracy would be extremely unbalanced given the nature of the blanking scheme (there is only one positive-class example in each sentence), we use Precision, Recall and F1-Score over the positive class on the development and evaluation sets.

     Table 1: Results of the sequence labeling approach.
     Set     Loss     Precision   Recall   F1-Score
     Valid   0.0037   88.35       88.81    88.58
     Test    0.0037   88.80       88.34    88.56
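     A minimal training-step sketch matching the reported configuration (Adam, learning rate 0.001, gradient clipping at norm 5) is given below; it reuses the SeqLabelingAQG class sketched earlier and is illustrative rather than the authors' code.

```python
# Training configuration sketch: Adam (lr = 0.001) with gradient norm
# clipping at 5, cross-entropy over per-token {O, BLANK} labels.
import torch
import torch.nn as nn

model = SeqLabelingAQG(vocab_size=66431)          # class from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

def train_step(token_ids, labels):                # labels: (batch, seq_len) of 0/1
    optimizer.zero_grad()
    logits = model(token_ids)                     # (batch, seq_len, 2)
    loss = criterion(logits.view(-1, 2), labels.view(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    return loss.item()
```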

  21. Results on Sequence Classification
     • We use a 2-layer bidirectional LSTM, which we train using Adam with a learning rate of 0.001, also clipping the gradient of our parameters to a maximum norm of 5. We use a word embedding and hidden state size of 300, and add dropout with a drop probability of 0.2 before and after the LSTM.
     • Our results for different pooling strategies showed no noticeable performance difference in preliminary experiments, so we report results using the last hidden state.
     • For evaluation we use accuracy over the validation and test sets.

     Table 2: Results of the sequence classification approach.
     Set     Loss     Accuracy
     Valid   101.80   89.17
     Test    102.30   89.31
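     The pooling comparison mentioned above can be made concrete with a small helper; this is an illustrative assumption, matching the choice of the last hidden state in the reported results.

```python
# Alternative ways to build the summary vector h_bar; the reported
# results use the last hidden state rather than max or mean pooling.
import torch

def summary_vector(h: torch.Tensor, strategy: str = "last") -> torch.Tensor:
    # h: (batch, seq_len, dim) BiLSTM outputs
    if strategy == "max":
        return h.max(dim=1).values
    if strategy == "mean":
        return h.mean(dim=1)
    return h[:, -1, :]                            # last hidden state (default)
```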

  22. Conclusions

  23. Conclusions
     • We have formalized the problem of automatic fill-in-the-blank quiz generation using two well-defined learning schemes: sequence classification and sequence labeling.
     • We have proposed concrete architectures based on LSTMs to tackle the problem in both cases.
     • We have presented an empirical study showing that both proposed training schemes offer fairly good results, with an Accuracy/F1-Score of nearly 90%.

  24. Future Work
     Model improvements
     • Use pre-trained word embeddings and other features to further improve our results.
     • Test the power of the models in capturing different quiz styles from real questions created by professors.
     • Train different models for specific quiz difficulty levels.
     Platform improvements
     • It seems possible to transition from a heavily hand-crafted approach for AQG to a learning-based approach built on examples derived from the platform's unlabeled data.
     • We are eager to deploy our trained models on our platform and receive feedback.

  25. Questions?
