 
Learning to Ask: Neural Question Generation for Reading Comprehension

Xinya Du¹  Junru Shao²  Claire Cardie¹
¹ Department of Computer Science, Cornell University
² Zhiyuan College, Shanghai Jiao Tong University
{xdu, cardie}@cs.cornell.edu  yz_sjr@sjtu.edu.cn

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1342–1352, Vancouver, Canada, July 30 - August 4, 2017. © 2017 Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1123

Abstract

We study automatic question generation for sentences from text passages in reading comprehension. We introduce an attention-based sequence learning model for the task and investigate the effect of encoding sentence- vs. paragraph-level information. In contrast to all previous work, our model does not rely on hand-crafted rules or a sophisticated NLP pipeline; it is instead trainable end-to-end via sequence-to-sequence learning. Automatic evaluation results show that our system significantly outperforms the state-of-the-art rule-based system. In human evaluations, questions generated by our system are also rated as being more natural (i.e., grammaticality, fluency) and as more difficult to answer (in terms of syntactic and lexical divergence from the original text and reasoning needed to answer).

Sentence: Oxygen is used in cellular respiration and released by photosynthesis, which uses the energy of sunlight to produce oxygen from water.
Questions:
– What life process produces oxygen in the presence of light?
  photosynthesis
– Photosynthesis uses which energy to form oxygen from water?
  sunlight
– From what does photosynthesis get oxygen?
  water

Figure 1: Sample sentence from the second paragraph of the article Oxygen, along with the natural questions and their answers.

1 Introduction

Question generation (QG) aims to create natural questions from a given sentence or paragraph. One key application of question generation is in the area of education: generating questions for reading comprehension materials (Heilman and Smith, 2010). Figure 1, for example, shows three manually generated questions that test a user's understanding of the associated text passage. Question generation systems can also be deployed as chatbot components (e.g., asking questions to start a conversation or to request feedback (Mostafazadeh et al., 2016)) or, arguably, as a clinical tool for evaluating or improving mental health (Weizenbaum, 1966; Colby et al., 1971).

In addition to the above applications, question generation systems can aid in the development of annotated data sets for natural language processing (NLP) research in reading comprehension and question answering. Indeed, the creation of such datasets, e.g., SQuAD (Rajpurkar et al., 2016) and MS MARCO (Nguyen et al., 2016), has spurred research in these areas.

For the most part, question generation has been tackled in the past via rule-based approaches (e.g., Mitkov and Ha (2003); Rus et al. (2010)). The success of these approaches hinges critically on the existence of well-designed rules for declarative-to-interrogative sentence transformation, typically based on deep linguistic knowledge.

To improve over a purely rule-based system, Heilman and Smith (2010) introduced an overgenerate-and-rank approach that generates multiple questions from an input sentence using a rule-based approach and then ranks them using a supervised learning-based ranker. Although the ranking algorithm helps to produce more acceptable questions, it relies heavily on a manually crafted feature set, and the questions generated often overlap word for word with the tokens in the input sentence, making them very easy to answer.

Vanderwende (2008) points out that learning to ask good questions is an important task in NLP research in its own right, and should consist of more than the syntactic transformation of a declarative sentence. In particular, a natural sounding question often compresses the sentence on which it is based (e.g., question 3 in Figure 1), uses synonyms for terms in the passage (e.g., "form" for "produce" in question 2 and "get" for "produce" in question 3), or refers to entities from preceding sentences or clauses (e.g., the use of "photosynthesis" in question 2). Other times, world knowledge is employed to produce a good question (e.g., identifying "photosynthesis" as a "life process" in question 1). In short, constructing natural questions of reasonable difficulty would seem to require an abstractive approach that can produce fluent phrasings that do not exactly match the text from which they were drawn.

As a result, and in contrast to all previous work, we propose here to frame the task of question generation as a sequence-to-sequence learning problem that directly maps a sentence from a text passage to a question. Importantly, our approach is fully data-driven in that it requires no manually generated rules.

More specifically, inspired by the recent success in neural machine translation (Sutskever et al., 2014; Bahdanau et al., 2015), summarization (Rush et al., 2015; Iyer et al., 2016), and image caption generation (Xu et al., 2015), we tackle question generation using a conditional neural language model with a global attention mechanism (Luong et al., 2015a). We investigate several variations of this model, including one that takes into account paragraph- rather than sentence-level information from the reading passage as well as other variations that determine the importance of pre-trained vs. learned word embeddings.

In evaluations on the SQuAD dataset (Rajpurkar et al., 2016) using three automatic evaluation metrics, we find that our system significantly outperforms a collection of strong baselines, including an information retrieval-based system (Robertson and Walker, 1994), a statistical machine translation approach (Koehn et al., 2007), and the overgenerate-and-rank approach of Heilman and Smith (2010). Human evaluations also rated our generated questions as more grammatical, fluent, and challenging (in terms of syntactic divergence from the original reading passage and reasoning needed to answer) than the state-of-the-art Heilman and Smith (2010) system.

In the sections below we discuss related work (Section 2), specify the task definition (Section 3) and describe our neural sequence learning based models (Section 4). We explain the experimental setup in Section 5. Lastly, we present the evaluation results as well as a detailed analysis.

2 Related Work

Reading Comprehension is a challenging task for machines, requiring both understanding of natural language and knowledge of the world (Rajpurkar et al., 2016). Recently, many new datasets have been released, and in most of these datasets the questions are generated in a synthetic way. For example, bAbI (Weston et al., 2016) is a fully synthetic dataset featuring 20 different tasks. Hermann et al. (2015) released a corpus of cloze style questions by replacing entities with placeholders in abstractive summaries of CNN/Daily Mail news articles. Chen et al. (2016) claim that the CNN/Daily Mail dataset is easier than previously thought, and their system almost reaches the ceiling performance. Richardson et al. (2013) curated MCTest, in which crowdworker questions are paired with four answer choices. Although MCTest contains challenging natural questions, it is too small for training data-demanding question answering models.

Recently, Rajpurkar et al. (2016) released the Stanford Question Answering Dataset¹ (SQuAD), which overcomes the aforementioned small size and (semi-)synthetic issues. The questions are posed by crowd workers and are of relatively high quality. We use SQuAD in our work, and similarly, we focus on the generation of natural questions for reading comprehension materials, albeit via automatic means.

¹ https://stanford-qa.com

Question Generation has attracted the attention of the natural language generation (NLG) community in recent years, since the work of Rus et al. (2010).

Most work tackles the task with a rule-based approach. Generally, such systems first transform the input sentence into its syntactic representation, which