SLIDE 1

Cross-Lingual Machine Reading Comprehension

Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu

Research Center for Social Computing and Information Retrieval (SCIR), Harbin Institute of Technology, China; Joint Laboratory of HIT and iFLYTEK Research (HFL), Beijing, China

Nov 5, 2019 EMNLP-IJCNLP 2019, Hong Kong SAR, China

SLIDE 2

OUTLINE

  • Introduction
  • Related Work
  • Preliminaries
  • Back-Translation Approaches
  • Dual BERT
  • Experiments
  • Discussion
  • Conclusion & Future Work

SLIDE 3

INTRODUCTION

  • Comprehending human language is an essential goal of AI
  • Machine Reading Comprehension (MRC) has been a trending topic in recent NLP research

SLIDE 4

INTRODUCTION

  • Machine Reading Comprehension (MRC)
  • To read and comprehend a given article and answer the questions based on it
  • Type of MRC
  • Cloze-style: CNN / Daily Mail (Hermann et al., 2015), CBT (Hill et al., 2015)
  • Span-extraction: SQuAD (Rajpurkar et al., 2016)
  • Choice-selection: MCTest (Richardson et al., 2013), RACE (Lai et al., 2017)
  • Conversational: CoQA (Reddy et al., 2018), QuAC (Choi et al., 2018)

SLIDE 5
INTRODUCTION

  • Problem: Most MRC research focuses mainly on English
  • Languages other than English are not well addressed due to the lack of data


[Figure: ▲ English MRC Datasets (SQuAD, RACE, CNN/DailyMail, MCTest, MS MARCO, NewsQA, CoQA, QuAC, TriviaQA, HotpotQA, DROP, …) ▲ Chinese MRC Datasets (PD&CFT, DuReader, CMRC 2017/2018/2019, WebQA, DRCD, ChID, CJRC, C3, …)]

SLIDE 6

INTRODUCTION

  • How can we enrich the training data in a low-resource language?
  • Solution 1: Annotation by human experts

High quality, but time-consuming and expensive

SLIDE 7
INTRODUCTION

  • How can we enrich the training data in a low-resource language?
  • Solution 2: Cross-lingual approaches
  • Multilingual representations, translation-based approaches, etc.


[Figure: training data size — English: 100k vs. Traditional Chinese: 20k]

SLIDE 8

INTRODUCTION

  • Contributions
  • We propose a new task called Cross-Lingual Machine Reading Comprehension (CLMRC) to address MRC in low-resource languages.
  • Several back-translation based approaches are presented for cross-lingual MRC and yield state-of-the-art performance on Chinese, Japanese, and French data.
  • We propose a novel model called Dual BERT to simultaneously model <Passage, Question> in both the source and target language.
  • Dual BERT shows promising results on two public Chinese MRC datasets and sets new state-of-the-art performance, indicating the potential of CLMRC research.

SLIDE 9

RELATED WORK

  • Asai et al. (2018) propose to use runtime MT for multilingual MRC

SLIDE 10

RELATED WORK

  • Contemporaneous Works (not in the paper)
  • XQA: A Cross-lingual Open-domain Question Answering Dataset (Liu et al., ACL 2019)
  • Propose a cross-lingual QA dataset
  • Cross-Lingual Transfer Learning for Question Answering (Lee and Lee, arXiv 2019)
  • Propose transfer learning approaches for QA
  • Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model (Hsu et al., EMNLP 2019)

SLIDE 11

PRELIMINARIES

  • Task: Span-Extraction Machine Reading Comprehension
  • SQuAD (Rajpurkar et al., EMNLP 2016)
  • Passage: From Wikipedia pages, segmented into several small paragraphs
  • Question: Human-annotated, including various query types (what/when/where/who/how/why, etc.)
  • Answer: Continuous segments (text spans) in the passage, which have a larger search space and are much harder to answer than cloze-style RC

SLIDE 12

PRELIMINARIES

  • Terminology
  • Source Language (S): for extracting knowledge
  • Rich-resourced, large-scale training data
  • For example, English.
  • Target Language (T): to optimize on
  • Low-resourced, limited or no training data
  • For example, Japanese, French, Chinese, etc.
  • We aim to improve Chinese (target language) MRC using English (source language) resources

SLIDE 13

BACK-TRANSLATION APPROACHES

  • Google Neural Machine Translation (GNMT)
  • Easy API for translation, language detection, etc.
  • Results on NIST MT02~08 show state-of-the-art performance

▲ GNMT performance on NIST MT 02~08 datasets

SLIDE 14

BACK-TRANSLATION APPROACHES

  • GNMT♠

  • Step 1: Translate the target sample into the source language
  • Step 2: Answer the question using an RC system in the source language
  • Step 3: Back-translate the answer into the target language
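
A minimal sketch of this zero-shot pipeline, assuming a generic translate function (standing in for the GNMT API) and an answer_question function (standing in for an English RC model fine-tuned on SQuAD); both helper names are hypothetical placeholders, not the authors' code.

def back_translate_answer(passage_trg, question_trg, translate, answer_question,
                          src="en", trg="zh"):
    # Step 1: translate the target-language sample into the source language
    passage_src = translate(passage_trg, source=trg, target=src)
    question_src = translate(question_trg, source=trg, target=src)
    # Step 2: answer the question with the source-language RC system
    answer_src = answer_question(passage_src, question_src)
    # Step 3: back-translate the answer into the target language
    return translate(answer_src, source=src, target=trg)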

SLIDE 15

BACK-TRANSLATION APPROACHES

  • Simple Match♠
  • Motivation
  • Recover the translated answer into an EXACT passage span
  • Approach
  • Calculate character-level text overlap between the translated answer A_trans and arbitrary sliding windows in the target passage P_T[i:j]
  • Length of window: len(A_trans) ± δ, δ ∈ [0, 5]
  • We treat the window P_T[i:j] with the largest F1 score as the final answer (a minimal sketch follows below)
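
A minimal sketch of SimpleMatch, assuming a standard character-level F1 (bag-of-characters precision/recall); this is my reading of the slide, not the authors' exact implementation.

from collections import Counter

def char_f1(pred, gold):
    # Character-level F1 between two strings
    common = Counter(pred) & Counter(gold)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def simple_match(passage_trg, answer_trans, delta=5):
    # Slide windows of length len(answer_trans) ± delta over the target
    # passage and return the span with the highest character-level F1
    best_span, best_f1 = "", 0.0
    base_len = len(answer_trans)
    for win_len in range(max(1, base_len - delta), base_len + delta + 1):
        for i in range(len(passage_trg) - win_len + 1):
            span = passage_trg[i:i + win_len]
            f1 = char_f1(span, answer_trans)
            if f1 > best_f1:
                best_span, best_f1 = span, f1
    return best_span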

SLIDE 16

BACK-TRANSLATION APPROACHES

  • Answer Aligner
  • SimpleMatch stays at the token level and lacks semantic awareness between source/target answers
  • If we have some annotated data, we can further improve the answer span
  • Condition: a small amount of training data is available
  • Solution: use the translated answer and the target passage to extract the exact span

SLIDE 17

BACK-TRANSLATION APPROACHES

  • Answer Verifier
  • Answer Aligner does not utilize question information
  • Condition: a small amount of training data is available
  • Solution: feed the translated target span, the target question, and the target passage to the model to extract the target span (a hedged sketch follows below)
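
A hedged sketch of how the Answer Aligner and Answer Verifier inputs could be packed for a BERT-style encoder; the [CLS]/[SEP] segment layout below is an assumption for illustration, and only the ingredients of each input come from the slides.

def aligner_input(translated_answer, target_passage):
    # Answer Aligner: translated answer + target passage -> exact span
    return ["[CLS]", *translated_answer, "[SEP]", *target_passage, "[SEP]"]

def verifier_input(translated_answer, target_question, target_passage):
    # Answer Verifier: additionally conditions on the target question
    return ["[CLS]", *translated_answer, "[SEP]", *target_question,
            "[SEP]", *target_passage, "[SEP]"]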

SLIDE 18

DUAL BERT

  • Overview

  • Step 1: Create bilingual inputs
  • Step 2: Source representation generation
  • Step 3: Target representation generation
  • Step 4: Fusion and output

SLIDE 19

DUAL BERT

  • Dual Encoder

SLIDE 20

DUAL BERT

  • Dual Encoder
  • We use BERT (Devlin et al., NAACL 2019) for RC system

SLIDE 21

DUAL BERT

  • Bilingual Decoder
  • Raw dot attention
  • Self-Adaptive Attention (SAA)

SLIDE 22

DUAL BERT

  • Bilingual Decoder
  • Fully connected layer with residual layer normalization
  • Final output for start/end position in the target language
  • Training objective

(Figure: the training objective combines the loss for target prediction with the loss for source prediction)
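
A rough NumPy sketch of the bilingual decoder using plain (raw) dot attention; the Self-Adaptive Attention variant, the exact fusion step, and all tensor shapes are assumptions for illustration, not the paper's definitive architecture.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    sigma = x.std(-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def bilingual_decoder(h_trg, h_src, W_fc, W_start, W_end):
    # h_trg: (T, d) target BERT representation; h_src: (S, d) source BERT
    # representation. Returns start/end logits over target positions.
    d = h_trg.shape[-1]
    # raw dot attention from target tokens over source tokens
    attn = softmax(h_trg @ h_src.T / np.sqrt(d))      # (T, S)
    context = attn @ h_src                            # (T, d)
    # fully connected layer with residual layer normalization
    fused = layer_norm(h_trg + context @ W_fc)        # (T, d)
    # final output for start/end positions in the target language
    start_logits = fused @ W_start                    # (T,)
    end_logits = fused @ W_end                        # (T,)
    return start_logits, end_logits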

SLIDE 23

DUAL BERT

  • How to decide λ?
  • Idea: measure how well the translated samples resemble the real target samples
  • Approach: calculate the cosine similarity between the ground-truth span representations in the source and target language


  • λ → 1: translated samples are good, thus we would like to use L_aux
  • λ → 0: translated samples are bad, thus we would rather NOT use L_aux

(Figure: the span representation is built from the start/end representations)
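
Putting the slide together, a plausible form of the overall objective (a sketch consistent with the slide, not necessarily the paper's exact equation):

  L = L_target + λ · L_aux,   with λ = cos(v_S, v_T)

where L_target is the loss for target-span prediction, L_aux is the auxiliary loss for source-span prediction, and v_S, v_T are the ground-truth span representations (built from start/end representations) in the source and target language.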

SLIDE 24

EXPERIMENTS: DATASETS

  • Task: Span-Extraction MRC
  • Source Language: English
  • SQuAD (Rajpurkar et al., EMNLP 2016)
  • Target Language: Chinese
  • CMRC 2018 (Cui et al., EMNLP 2019)
  • DRCD (Shao et al., 2018)

▲ Statistics of CMRC 2018 & DRCD

SLIDE 25

EXPERIMENTS: SETUPS

  • Tokenization
  • WordPiece tokenizer (Wu et al., 2016) for English, character-level tokenizer for Chinese
  • BERT
  • Multilingual BERT (base): 12-layers, 110M parameters
  • Translation
  • Google Neural Machine Translation (GNMT) API (March, 2019)
  • Optimization
  • AdamW / lr 4e-5 / cosine lr decay / batch 64 / 2 epochs
  • Implementation
  • TensorFlow (Abadi et al., 2016) / Cloud TPU v2 (64G HBM)
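
For quick reference, the setup above as a single configuration sketch; the key names are illustrative, only the values come from the slide.

finetune_config = {
    "model": "multilingual BERT (base)",   # 12 layers, ~110M parameters
    "english_tokenizer": "WordPiece",
    "chinese_tokenizer": "character-level",
    "translation": "GNMT API (March 2019)",
    "optimizer": "AdamW",
    "learning_rate": 4e-5,
    "lr_schedule": "cosine decay",
    "batch_size": 64,
    "epochs": 2,
    "hardware": "Cloud TPU v2 (64G HBM)",
}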

SLIDE 26

EXPERIMENTS: RESULTS

  • Zero-shot Approaches♠
  • Zero-shot: no training data for the target language
  • Better source BERT, better target performance
  • Multilingual models exceed all other approaches

SLIDE 27

EXPERIMENTS: RESULTS

  • Back-Translation Approaches
  • SimpleMatch significantly improves performance
  • SimpleMatch → Aligner → Verifier: the more information we use, the better the performance we get
  • Without SQuAD Weights
  • Modeling the input in a bilingual space can substantially improve performance

SLIDE 28

EXPERIMENTS: RESULTS

  • With SQuAD Weights
  • Cascade Training
  • SQuAD → CMRC/DRCD
  • Mixed Training
  • SQuAD + CMRC/DRCD
  • Mixed > Cascade
  • Dual BERT again outperforms all previous methods

SLIDE 29

EXPERIMENTS: RESULTS

  • Japanese and French SQuAD
  • Better MT + Better RC = Better CLMRC
  • Translation attention is not essential for extracting answer span
  • Still, multi-lingual BERT (w/ SQuAD) yields best performance

▲ Results on Japanese and French SQuAD

SLIDE 30

EXPERIMENTS: ABLATIONS

  • Ablations on CMRC 2018 data
  • Pre-training with SQuAD is essential for improving performance
  • With source BERT (cascade training), simultaneously modeling the input has a positive impact
  • The other ablations also decrease performance, though the effect is less salient

▲ Ablation of Dual BERT on CMRC 2018 dev set

SLIDE 31

DISCUSSION

  • Question: larger data vs. closer language
  • Target Language: Simplified Chinese
  • Source Language: ?

[Figure: candidate source languages — English (100k samples) vs. Traditional Chinese (20k samples)]

SLIDE 32

DISCUSSION

  • Question: larger data vs. closer language
  • < 25k pre-training data
  • There is not much difference
  • Even English pre-trained models are better than Chinese ones
  • > 25k pre-training data
  • Downstream task performance continues to improve significantly


▲ Performance (average of EM and F1) using different amounts of pre-training data

SLIDE 33

DISCUSSION

  • Question: larger data vs. closer language
  • If the pre-training data are not abundant, there is no clear preference in the selection of the source language
  • If large-scale training data are available, choose the source language with more data rather than the one closer to the target language
  • One may also make use of data in various languages to further exploit knowledge; we leave this for future work

▲ Performance (average of EM and F1) using different amounts of pre-training data

SLIDE 34

CONCLUSION & FUTURE WORK

  • Conclusion
  • Propose Cross-Lingual Machine Reading Comprehension (CLMRC)
  • Back-translation approaches for basic cross-lingual MRC purposes
  • Dual BERT for modeling text in a bilingual space and enriching representations
  • State-of-the-art performance on Chinese (Simp./Trad.), Japanese, and French MRC data
  • Future Work
  • Utilize various types of English reading comprehension data
  • CLMRC without machine translation process

SLIDE 35

ACKNOWLEDGMENT

  • We would like to thank
  • Google TensorFlow Research Cloud (TFRC) Program
  • Anonymous reviewers for their valuable comments on our work
  • Supporting Funds
  • NSFC 61976072
  • NSFC 61632011
  • NSFC 61772153

SLIDE 36

USEFUL RESOURCES

  • CMRC 2018 (Cui et al., EMNLP 2019)
  • https://github.com/ymcui/cmrc2018
  • DRCD (Shao et al., 2018)
  • https://github.com/DRCKnowledgeTeam/DRCD
  • Multilingual BERT (Devlin et al., NAACL 2019)
  • https://github.com/google-research/bert/blob/master/multilingual.md
  • Google Neural Machine Translation
  • https://cloud.google.com/translate/

SLIDE 37

REFERENCES

  • Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283.
  • Akari Asai, Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. 2018. Multilingual extractive reading comprehension by runtime machine translation. arXiv preprint arXiv:1809.03275.
  • Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, and Yang Liu. 2018. Towards robust neural machine translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1756–1766. Association for Computational Linguistics.
  • Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, and Guoping Hu. 2017. Attention-over-Attention neural networks for reading comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 593–602. Association for Computational Linguistics.
  • Yiming Cui, Ting Liu, Wanxiang Che, Li Xiao, Zhipeng Chen, Wentao Ma, Shijin Wang, and Guoping Hu. 2019. A span-extraction dataset for Chinese machine reading comprehension. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing. Association for Computational Linguistics.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

SLIDE 38

REFERENCES

  • Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. 2017. Gated-attention readers for text comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1832–1846. Association for Computational Linguistics.
  • Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1684–1692.
  • Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. 2015. The Goldilocks principle: Reading children's books with explicit memory representations. arXiv preprint arXiv:1511.02301.
  • Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, and Dongsheng Li. 2019. Read + verify: Machine reading comprehension with unanswerable questions. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):6529–6537.
  • Rudolf Kadlec, Martin Schmid, Ondřej Bajgar, and Jan Kleindienst. 2016. Text understanding with the attention sum reader network. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 908–918. Association for Computational Linguistics.
  • Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • Ting Liu, Yiming Cui, Qingyu Yin, Wei-Nan Zhang, Shijin Wang, and Guoping Hu. 2017. Generating and exploiting large-scale pseudo training data for zero pronoun resolution. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 102–111. Association for Computational Linguistics.

SLIDE 39

REFERENCES

  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2383–2392. Association for Computational Linguistics.
  • Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bi-directional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
  • Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, and Sam Tsai. 2018. DRCD: a Chinese machine reading comprehension dataset. arXiv preprint arXiv:1806.00920.
  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
  • Shuohang Wang and Jing Jiang. 2016. Machine comprehension using match-LSTM and answer pointer. arXiv preprint arXiv:1608.07905.
  • Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  • Caiming Xiong, Victor Zhong, and Richard Socher. 2016. Dynamic coattention networks for question answering. arXiv preprint arXiv:1611.01604.
  • Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, and Jimmy Lin. 2019. Data augmentation for BERT fine-tuning in open-domain question answering. arXiv preprint arXiv:1904.06652.

SLIDE 40

THANK YOU !

me@ymcui.com https://github.com/ymcui/Cross-Lingual-MRC