Learning to reason by reading text and answering questions (Minjoon Seo) - PowerPoint PPT Presentation



SLIDE 1

Learning to reason by reading text and answering questions

Minjoon Seo Natural Language Processing Group University of Washington May 26, 2017

@ Kakao Brain

SLIDE 2

What is reasoning?

SLIDE 3

Simple Question Answering Model

What is “Hello” in French? Bonjour.

SLIDE 4

Examples

  • Most neural machine translation systems (Cho et al., 2014; Bahdanau et al., 2014)
  • Need very high hidden state size (~1000)
  • No need to query the database (context) → very fast
  • Most dependency and constituency parsers (Chen et al., 2014; Klein et al., 2003)
  • Sentiment classification (Socher et al., 2013)
  • Classifying whether a sentence is positive or negative
  • Most neural image classification systems
  • The question is always “What is in the image?”
  • Most classification systems
SLIDE 5

Simple Question Answering Model

What is “Hello” in French? Bonjour.

Problem: parametric model has finite capacity. “You can’t even fit a sentence into a single vector” -Dan Roth

SLIDE 6

QA Model with Context

Context (Knowledge Base): English-French table (Hello → Bonjour; Thank you → Merci)

What is “Hello” in French? Bonjour.
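As code, QA with a context of this kind is nothing more than a table lookup. A minimal sketch (the table and function name are illustrative, not from any system in the talk):

```python
# The "knowledge base" context is just a translation table.
kb = {"Hello": "Bonjour", "Thank you": "Merci"}

def answer(question_word, context):
    # Answer by looking the queried word up in the context (knowledge base);
    # returns None when the context does not cover the question.
    return context.get(question_word)
```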

SLIDE 7

Examples

  • Wiki QA (Yang et al., 2015)
  • QA Sent (Wang et al., 2007)
  • WebQuestions (Berant et al., 2013)
  • WikiAnswer (Wikia)
  • Free917 (Cai and Yates, 2013)
  • Many deep learning models with external memory (e.g. Memory Networks)

SLIDE 8

QA Model with Context

Context (Knowledge Base):
  Eats: (Amphibian, insect), (insect, flower)
  IsA: (Frog, amphibian), (Fly, insect)

What does a frog eat? Fly

Something is missing …

SLIDE 9

QA Model with Reasoning Capability

Context (Knowledge Base):
  Eats: (Amphibian, insect), (insect, flower)
  IsA: (Frog, amphibian), (Fly, insect)

First Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

What does a frog eat? Fly
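The slide's first-order rule can be sketched as naive forward chaining over the fact tuples. This is an illustrative toy, not the talk's actual system; entity capitalization is normalized and the helper names are mine:

```python
# Facts from the slide (capitalization normalized).
isa = {("Frog", "Amphibian"), ("Fly", "Insect")}
eats = {("Amphibian", "Insect"), ("Insect", "Flower")}

def forward_chain(isa, eats):
    # Apply IsA(A,B) ^ IsA(C,D) ^ Eats(B,D) -> Eats(A,C) until fixpoint.
    derived = set(eats)
    changed = True
    while changed:
        changed = False
        for (a, b) in isa:
            for (c, d) in isa:
                if (b, d) in derived and (a, c) not in derived:
                    derived.add((a, c))
                    changed = True
    return derived

facts = forward_chain(isa, eats)
# "What does a frog eat?" -> every X with Eats(Frog, X)
frog_eats = {x for (s, x) in facts if s == "Frog"}
```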

SLIDE 10

Examples

  • Semantic parsing
  • GeoQuery (Krishnamurthy et al., 2013; Artzi et al., 2015)
  • Science questions
  • Aristo Challenge (Clark et al., 2015)
  • ProcessBank (Berant et al., 2014)
  • Machine comprehension
  • MCTest (Richardson et al., 2013)
SLIDE 11

“Vague” line between non-reasoning QA and reasoning QA

  • Non-reasoning:
  • The required information is explicit in the context
  • The model often needs to handle lexical / syntactic variations
  • Reasoning:
  • The required information may not be explicit in the context
  • Need to combine multiple facts to derive the answer
  • There is no clear line between the two!
SLIDE 12

If our objective is to “answer” difficult questions …

  • We can try to make the machine more capable of reasoning (better model)

OR

  • We can try to make more information explicit in the context (more data)

SLIDE 13

QA Model with Reasoning Capability

Context (Knowledge Base):
  Eats: (Amphibian, insect), (insect, flower)
  IsA: (Frog, amphibian), (Fly, insect)

First Order Logic: IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

What does a frog eat? Fly

Who makes this? Tell me it’s not me…

SLIDE 14

Reasoning QA Model with Unstructured Data

Q: What does a frog eat? A: Fly

Context in natural language: “Frog is an example of amphibian. Flies are one of the most common insects around us. Insects are good sources of protein for amphibians. …”

SLIDE 15

I am interested in…

  • Natural language understanding
  • Natural language has diverse surface forms (lexically, syntactically)
  • Learning to read text and reason by question answering (dialog)
  • Text is unstructured data
  • Deriving new knowledge from existing knowledge
  • End-to-end training
  • Minimizing human efforts
SLIDE 16
SLIDE 17

Reasoning capability NLU capability End-to-end

SLIDE 18

AAAI 2014 EMNLP 2015 ECCV 2016 CVPR 2017 ICLR 2017 ACL 2017 ICLR 2017

SLIDE 19

Reasoning capability NLU capability End-to-end

Geometry QA

SLIDE 20

Geometry QA

In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?

a) 2 b) 4 c) 6 d) 8 e) 10

[Diagram: circle O with diameter AC, chord BD, and point E; radius 5, CE = 2]

SLIDE 21

Geometry QA Model

What is the length of BD? 8

Local context: In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD.

Global context: First Order Logic

SLIDE 22

Method

  • Learn to map question to logical form
  • Learn to map local context to logical form
  • Text → logical form
  • Diagram → logical form
  • Global context is already formal!
  • Manually defined
  • “If AB = BC, then ∠CAB = ∠ACB”
  • Solver on all logical forms
  • We created a reasonable numerical solver
SLIDE 23

Mapping question / text to logical form

In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17

B D E A C

IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))

Text Input Logical form

Difficult to directly map text to a long logical form!

SLIDE 24

Mapping question / text to logical form

In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC. (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17

B D E A C

IsTriangle(ABC) Parallel(AC, DE) Parallel(AC, DB) Equals(LengthOf(DB), 4) Equals(LengthOf(AD), 8) Equals(LengthOf(DE), 5) Equals(4, LengthOf(AD)) …

Over-generated literals

0.96 0.91 0.74 0.97 0.94 0.94 0.31 …

Text scores

1.00 0.99 0.02 n/a n/a n/a n/a …

Diagram scores Selected subset Text Input Logical form Our method

IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
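The selection step above can be sketched as follows. The real system solves a joint optimization over literals; this toy merely combines each literal's text and diagram scores and thresholds them (scores copied from the slide; the threshold and combination rule are my simplification):

```python
# Over-generated literals with (text score, diagram score); None means the
# diagram provides no evidence either way (e.g. lengths are text-only facts).
literals = [
    ("IsTriangle(ABC)",         0.96, 1.00),
    ("Parallel(AC, DE)",        0.91, 0.99),
    ("Parallel(AC, DB)",        0.74, 0.02),
    ("Equals(LengthOf(DB), 4)", 0.97, None),
    ("Equals(LengthOf(AD), 8)", 0.94, None),
    ("Equals(LengthOf(DE), 5)", 0.94, None),
    ("Equals(4, LengthOf(AD))", 0.31, None),
]

def select(literals, threshold=0.5):
    # Keep a literal when both modalities (where available) support it.
    chosen = []
    for lit, text_score, diagram_score in literals:
        score = text_score if diagram_score is None else min(text_score, diagram_score)
        if score >= threshold:
            chosen.append(lit)
    return chosen
```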

SLIDE 25

Numerical solver

Literal → Equation
  Equals(LengthOf(AB), d): (Ax-Bx)^2 + (Ay-By)^2 - d^2 = 0
  Parallel(AB, CD): (Ax-Bx)(Cy-Dy) - (Ay-By)(Cx-Dx) = 0
  PointLiesOnLine(B, AC): (Ax-Bx)(By-Cy) - (Ay-By)(Bx-Cx) = 0
  Perpendicular(AB, CD): (Ax-Bx)(Cx-Dx) + (Ay-By)(Cy-Dy) = 0

  • Translate literals to numeric equations
  • Find the solution to the equation system
  • Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988)
  • Numerical solver can choose not to answer the question
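The literal-to-equation table above translates directly into residual functions over point coordinates; a numerical solver then minimizes the sum of squared residuals. A minimal sketch (points are (x, y) tuples; the function names are mine):

```python
def eq_length(A, B, d):
    # Equals(LengthOf(AB), d): (Ax-Bx)^2 + (Ay-By)^2 - d^2 = 0
    return (A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2 - d ** 2

def parallel(A, B, C, D):
    # Parallel(AB, CD): cross product of the direction vectors is zero
    return (A[0] - B[0]) * (C[1] - D[1]) - (A[1] - B[1]) * (C[0] - D[0])

def perpendicular(A, B, C, D):
    # Perpendicular(AB, CD): dot product of the direction vectors is zero
    return (A[0] - B[0]) * (C[0] - D[0]) + (A[1] - B[1]) * (C[1] - D[1])

def point_on_line(B, A, C):
    # PointLiesOnLine(B, AC): B is collinear with A and C
    return (A[0] - B[0]) * (B[1] - C[1]) - (A[1] - B[1]) * (B[0] - C[0])

# A numerical minimizer would drive the sum of squared residuals to zero;
# here we just evaluate it on a configuration that satisfies both literals.
total = eq_length((0, 0), (3, 4), 5) ** 2 + perpendicular((0, 0), (1, 0), (5, 2), (5, 9)) ** 2
```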
SLIDE 26

Dataset

  • Training questions (67 questions, 121 sentences)
  • Seo et al., 2014
  • High school geometry questions
  • Test questions (119 questions, 215 sentences)
  • We collected them
  • SAT (US college entrance exam) geometry questions
  • We manually annotated the text parse of all questions

SLIDE 27

Results (EMNLP 2015)

[Bar chart: SAT score (%) for Text only, Diagram only, Rule-based, GeoS, and student average; 0.25 penalty for each incorrect answer]

SLIDE 28

Demo (geometry.allenai.org/demo)
SLIDE 29

Limitations

  • Dataset is small
  • Required level of reasoning is very high
  • A lot of manual efforts (annotations, rule definitions, etc.)
  • End-to-end system is simply hopeless
  • Collect more data?
  • Change task?
  • Curriculum learning? (Do more hopeful tasks first?)
SLIDE 30

Reasoning capability NLU capability End-to-end

Diagram QA

SLIDE 31

Diagram QA

Q: The process of water being heated by sun and becoming gas is called A: Evaporation

SLIDE 32

Is DQA a subset of VQA?

  • Diagrams and real images are very different
  • Diagram components are simpler than real images
  • Diagram contains a lot of information in a single image
  • Diagrams are few (whereas real images are almost infinitely many)
SLIDE 33

Problem

What comes before second feed? 8

Difficult to latently learn relationships

SLIDE 34

Strategy

What does a frog eat? Fly

Diagram Graph

SLIDE 35

Diagram Parsing

SLIDE 36

Question Answering

SLIDE 37

Attention visualization

SLIDE 38

Results (ECCV 2016)

Method | Training data | Accuracy
Random (expected) | - | 25.00
LSTM + CNN | VQA | 29.06
LSTM + CNN | AI2D | 32.90
Ours | AI2D | 38.47

SLIDE 39

Limitations

  • You can’t really call this reasoning…
  • Rather a matching algorithm
  • No complex inference involved
  • You need a lot of prior knowledge to answer some questions!
  • E.g. “Fly is an insect”, “Frog is an amphibian”
SLIDE 40

Textbook QA textbookqa.org (CVPR 2017)

SLIDE 41

Reasoning capability NLU capability End-to-end

Machine Comprehension

SLIDE 42

Question Answering Task (Stanford Question Answering Dataset, 2016)

Q: Which NFL team represented the AFC at Super Bowl 50? A: Denver Broncos

SLIDE 43

Why Neural Attention?

Q: Which NFL team represented the AFC at Super Bowl 50?

Allows a deep learning architecture to focus on the most relevant phrase of the context to the query in a differentiable manner.

SLIDE 44

Our Model: Bi-directional Attention Flow (BiDAF)

Attention → Modeling → MLP + softmax; predicted answer span: j_start = 0, j_end = 1

C: Barack Obama is the president of the U.S. Q: Who leads the United States?

SLIDE 45

(Bidirectional) Attention Flow

[Architecture diagram. Layers, bottom to top: Character Embed Layer and Word Embed Layer over context x1…xT and query q1…qJ; Phrase Embed Layer (bidirectional LSTMs producing h1…hT and u1…uJ); Attention Flow Layer (Query2Context and Context2Query attention, outputs g1…gT); Modeling Layer (LSTMs, outputs m1…mT); Output Layer (Start: LSTM + softmax; End: dense + softmax)]

SLIDE 46

Char/Word Embedding Layers


SLIDE 47

Character and Word Embedding

  • Word embedding is fragile against unseen words
  • Char embedding can’t easily learn semantics of words
  • Use both!
  • Char embedding as proposed by Kim (2015)

[Diagram: characters “S e a t t l e” → CNN + max pooling; concatenated with the word embedding of “Seattle” into a single embedding vector]
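A toy sketch of the char-CNN + max-pooling idea (Kim, 2015) combined with a word embedding. All dimensions, filters, and vectors below are made up for illustration:

```python
def char_cnn_embed(word, char_vecs, filters, width=3):
    # Slide each filter over width-character windows, then max-pool over positions.
    dim = len(next(iter(char_vecs.values())))
    chars = [char_vecs.get(c, [0.0] * dim) for c in word]
    pooled = []
    for f in filters:  # one max-pooled activation per filter
        acts = []
        for i in range(len(chars) - width + 1):
            window = [x for ch in chars[i:i + width] for x in ch]
            acts.append(sum(w * x for w, x in zip(f, window)))
        pooled.append(max(acts) if acts else 0.0)
    return pooled

# Toy 2-d "char embeddings" (one-hot on ord(c) % 2) and two width-3 filters.
char_vecs = {c: [1.0 if i == (ord(c) % 2) else 0.0 for i in range(2)] for c in "Seattle"}
filters = [[0.5] * 6, [-0.1] * 6]

char_part = char_cnn_embed("Seattle", char_vecs, filters)
word_part = [0.3, 0.7, 0.1]          # pretend pretrained word vector
embedding = word_part + char_part    # concatenate word and char representations
```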

SLIDE 48

Phrase Embedding Layer


SLIDE 49

Phrase Embedding Layer

  • Inputs: the char/word embedding of query and context words
  • Outputs: word representations aware of their neighbors (phrase-aware words)
  • Apply bidirectional RNN (LSTM) for both query and context

[Diagram: bidirectional LSTMs over context (h1…hT) and query (u1…uJ)]

SLIDE 50

Attention Layer


SLIDE 51

Attention Layer

  • Inputs: phrase-aware context and query words
  • Outputs: query-aware representations of context words
  • Context-to-query attention: for each (phrase-aware) context word, choose the most relevant word from the (phrase-aware) query words
  • Query-to-context attention: choose the context word that is most relevant to any of the query words

[Diagram: Context2Query attends from each context word h1…hT over query words u1…uJ with a softmax; Query2Context takes a max over query words, then a softmax over context words]
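The two attention directions can be sketched over a similarity matrix S between T context words and J query words. The values are made up, and this mirrors only the max/softmax structure described above, not the exact BiDAF parameterization:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

# S[t][j]: similarity between context word t and query word j (T=3, J=2).
S = [[0.2, 1.5],
     [2.0, 0.1],
     [0.3, 0.4]]

# Context-to-query: for each context word, a distribution over query words.
c2q = [softmax(row) for row in S]

# Query-to-context: max similarity over query words per context word,
# then a single softmax over context words.
q2c = softmax([max(row) for row in S])
```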

SLIDE 52

Context-to-Query Attention (C2Q)

Q: Who leads the United States? C: Barack Obama is the president of the USA. For each context word, find the most relevant query word.

SLIDE 53

Query-to-Context Attention (Q2C)

While Seattle’s weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S. LA is … Q: Which city is gloomy in winter?

SLIDE 54

Modeling Layer


SLIDE 55

Modeling Layer

  • Attention layer: modeling interactions between query and context
  • Modeling layer: modeling interactions within (query-aware) context words via RNN (LSTM)
  • Division of labor: let attention and modeling layers solely focus on their own tasks
  • We experimentally show that this leads to a better result than intermixing attention and modeling

SLIDE 56

Output Layer


SLIDE 57

Training

  • Minimizes the negative log probabilities of the true start index and the true end index

L = -(1/N) Σ_i [ log q^start(z_i^start) + log q^end(z_i^end) ]

where z_i^start / z_i^end are the true start/end indices of example i, and q^start / q^end are the predicted probability distributions over the start and end indices.
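A toy version of this objective, with made-up probability values (the function name is mine):

```python
import math

def span_loss(start_dists, end_dists, true_starts, true_ends):
    # Average negative log probability assigned to the true start/end indices.
    n = len(true_starts)
    total = 0.0
    for qs, qe, zs, ze in zip(start_dists, end_dists, true_starts, true_ends):
        total += -(math.log(qs[zs]) + math.log(qe[ze]))
    return total / n

# Two toy examples over a 3-token context.
start_dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
end_dists   = [[0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
loss = span_loss(start_dists, end_dists, true_starts=[0, 1], true_ends=[1, 2])
```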

SLIDE 58

Previous work

  • Using neural attention as a controller (Xiong et al., 2016)
  • Using neural attention within RNN (Wang & Jiang, 2016)
  • Most of these attentions are uni-directional
  • BiDAF (our model):
  • uses neural attention as a layer,
  • is separated from the modeling part (RNN),
  • is bidirectional
SLIDE 59

VGG-16


BiDAF (ours)

Image Classifier and BiDAF

SLIDE 60

Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016)

  • Most popular articles from Wikipedia
  • Questions and answers from Turkers
  • 90k train, 10k dev, ? test (hidden)
  • Answer must lie in the context
  • Two metrics: Exact Match (EM) and F1
SLIDE 61

SQuAD Results (http://stanford-qa.com) as of Dec 2 (ICLR 2017)

SLIDE 62

Now..

SLIDE 63

[Bar chart: EM and F1 for No Char Embedding, No Word Embedding, No C2Q Attention, No Q2C Attention, Dynamic Attention, and the Full Model]

Ablations on dev data

SLIDE 64

Interactive Demo

http://allenai.github.io/bi-att-flow/demo

SLIDE 65

Attention Visualizations

Where did Super Bowl 50 take place?

Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.

Attended context words per query word: at, the, at, Stadium, Levi, in, Santa, Ana; []; Super (×5); Bowl (×5); 50; initiatives

SLIDE 66

Embedding Visualization at Word vs Phrase Layers

Word layer neighbors of “May”: January, September, August, July, May, may

Phrase layer contexts: “effect and may result in”, “the state may not aid”, “of these may be more”, “Opening in May 1852 at”, “debut on May 5 ,”, “from 28 January to 25”, “but by September had been”

SLIDE 67

How does it compare with feature-based models?

SLIDE 68

CNN/DailyMail Cloze Test (Hermann et al., 2015)

  • Cloze Test (Predicting Missing words)
  • Articles from CNN/DailyMail
  • Human-written summaries
  • Missing words are always entities
  • CNN – 300k article-query pairs
  • DailyMail – 1M article-query pairs
SLIDE 69

CNN/DailyMail Cloze Test Results

SLIDE 70

Transfer Learning (ACL 2017)

SLIDE 71

Some limitations of SQuAD

SLIDE 72

Reasoning capability NLU capability End-to-end

bAbI QA & Dialog

SLIDE 73

Reasoning Question Answering

SLIDE 74

Dialog System

U: Can you book a table in Rome with Italian cuisine?
S: How many people in your party?
U: For four people, please.
S: What price range are you looking for?

SLIDE 75

Dialog task vs QA

  • A dialog system can be considered as a QA system:
  • The last user utterance is the query
  • All previous conversation is context for the query
  • The system’s next response is the answer to the query
  • Poses a few unique challenges:
  • A dialog system requires tracking states
  • A dialog system needs to look at multiple sentences in the conversation
  • Building an end-to-end dialog system is more challenging
SLIDE 76

Our approach: Query-Reduction

Story:
<START>
Sandra got the apple there.
Sandra dropped the apple.
Daniel took the apple there.
Sandra went to the hallway.
Daniel journeyed to the garden.

Q: Where is the apple?
Reduced query after each sentence: Where is the apple? → Where is Sandra? → Where is Sandra? → Where is Daniel? → Where is Daniel? → Where is Daniel? → garden
A: garden
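A symbolic sketch of that reduction chain; the real QRN performs the reduction in vector space, and the rule-based parsing here only covers this toy story:

```python
story = [
    "Sandra got the apple there.",
    "Sandra dropped the apple.",
    "Daniel took the apple there.",
    "Sandra went to the hallway.",
    "Daniel journeyed to the garden.",
]

def reduce_query(query, story):
    # "Where is the apple?" -> track whose location currently answers it.
    obj = query.split()[-1].rstrip("?")
    subject = obj
    answer = None
    for sent in story:
        words = sent.rstrip(".").split()
        actor, verb = words[0], words[1]
        if verb in ("got", "took") and obj in words:
            subject = actor            # reduce: "Where is <actor>?"
        elif verb in ("went", "journeyed") and actor == subject:
            answer = words[-1]         # a location resolves the reduced query
        # "dropped" is left as a no-op, mirroring the slide's chain,
        # where the reduced query stays "Where is Sandra?" after the drop.
    return subject, answer

holder, answer = reduce_query("Where is the apple?", story)
```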

SLIDE 77

Query-Reduction Networks

  • Reduce the query into an easier-to-answer query over the sequence of state-changing triggers (sentences), in vector space

[Diagram: the sentences “Sandra got the apple there.” through “Daniel journeyed to the garden.” are processed in order; at each step the query “Where is the apple?” is reduced (“Where is Sandra?”, “Where is Daniel?”, …) until the final reduced query yields the answer “garden”]

SLIDE 78

QRN Cell

[QRN cell: inputs are the sentence vector x_t and the query vector q_t; an update gate z_t = α(x_t, q_t) and a candidate reduced query h̃_t = ρ(x_t, q_t) (reduction function) combine as h_t = z_t · h̃_t + (1 − z_t) · h_{t−1} (update function), where h_t is the reduced query (hidden state)]
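One cell step can be sketched as follows; the gate and reduction functions here (dot product + sigmoid, elementwise product) are simple stand-ins, not the paper's exact parameterization:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def qrn_step(x_t, q_t, h_prev):
    # Update gate from sentence/query interaction (stand-in: sigmoid of dot product).
    z = sigmoid(sum(a * b for a, b in zip(x_t, q_t)))
    # Candidate reduced query (stand-in: elementwise product).
    h_cand = [a * b for a, b in zip(x_t, q_t)]
    # Gated update: h_t = z * h_cand + (1 - z) * h_prev
    return [z * c + (1.0 - z) * p for c, p in zip(h_cand, h_prev)]

h = [0.0, 0.0]
q = [0.5, 0.5]                          # query vector (made up)
for x_t in [[1.0, 0.0], [0.0, 1.0]]:    # sentence vectors (made up)
    h = qrn_step(x_t, q, h)
```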

SLIDE 79

Characteristics of QRN

  • Update gate can be considered as local attention
  • QRN chooses to consider / ignore each candidate reduced query
  • The decision is made locally (as opposed to global softmax attention)
  • Subclass of Recurrent Neural Network (RNN)
  • Two inputs, hidden state, gating mechanism
  • Able to handle sequential dependency (attention cannot)
  • Simpler recurrent update enables parallelization over time
  • Candidate hidden state (reduced query) is computed from inputs only
  • Hidden state can be explicitly computed as a function of inputs
SLIDE 80

Parallelization

  • Candidate hidden states are computed from inputs only, so they can be trivially parallelized
  • The hidden state can be explicitly expressed as the geometric sum of previous candidate hidden states
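The "geometric sum" can be written out explicitly (my reconstruction of the slide's claim). Writing $z_t$ for the update gate and $\tilde{h}_t$ for the candidate reduced query, unrolling the recurrence with $h_0 = 0$ gives:

```latex
h_t = z_t\,\tilde{h}_t + (1 - z_t)\,h_{t-1}
\quad\Longrightarrow\quad
h_t = \sum_{j=1}^{t}\Big(\prod_{k=j+1}^{t}(1 - z_k)\Big)\, z_j\,\tilde{h}_j
```

Since every $z_j$ and $\tilde{h}_j$ depends only on the inputs $(x_j, q_j)$, all terms can be computed in parallel across time.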

SLIDE 81

Parallelization

SLIDE 82

Characteristics of QRN

  • Update gate can be considered as local attention
  • Subclass of Recurrent Neural Network (RNN)
  • Simpler recurrent update enables parallelization over time

QRN sits between neural attention mechanism and recurrent neural networks, taking the advantage of both paradigms.

SLIDE 83

bAbI QA Dataset

  • 20 different tasks
  • 1k story-question pairs for each task (10k also available)
  • Synthetically generated
  • Many questions require looking at multiple sentences
  • For end-to-end system supervised by answers only
SLIDE 84

What’s different from SQuAD?

  • Synthetic
  • More than lexical / syntactic understanding
  • Different kinds of inferences
  • induction, deduction, counting, path finding, etc.
  • Reasoning over multiple sentences
  • Interesting testbed towards developing complex QA systems (and dialog systems)

SLIDE 85

bAbI QA Results (1k) (ICLR 2017)

[Bar chart: average error (%) on bAbI QA (1k) for LSTM, DMN+, MemN2N, GMemN2N, and QRN (ours)]

SLIDE 86

bAbI QA Results (10k)

[Bar chart: average error (%) on bAbI QA (10k) for MemN2N, DNC, GMemN2N, DMN+, and QRN (ours)]

SLIDE 87

Dialog Datasets

  • bAbI Dialog Dataset
  • Synthetic
  • 5 different tasks
  • 1k dialogs for each task
  • DSTC2* Dataset
  • Real dataset
  • Evaluation metric is different from original DSTC2: response generation instead of “state-tracking”
  • Each dialog is 800+ utterances
  • 2407 possible responses
SLIDE 88

bAbI Dialog Results (OOV)

[Bar chart: average error (%) on bAbI Dialog (OOV) for MemN2N, GMemN2N, and QRN (ours)]

SLIDE 89

DSTC2* Dialog Results

[Bar chart: average error (%) on DSTC2* dialog for MemN2N, GMemN2N, and QRN (ours)]

SLIDE 90

bAbI QA Visualization

A^l = local attention (update gate) at layer l

SLIDE 91

DSTC2 (Dialog) Visualization

A^l = local attention (update gate) at layer l

SLIDE 92

So…

SLIDE 93

Reasoning capability NLU capability End-to-end

Is this possible?

SLIDE 94

Reasoning capability NLU capability End-to-end

Or this?

SLIDE 95

So… What should we do?

  • Disclaimer: completely subjective!
  • Logic (reasoning) is discrete
  • Modeling logic with a differentiable model is hard
  • Relaxation: either hard to optimize, or converges to a bad optimum (poor generalization)
  • Estimation: low-bias or low-variance methods have been proposed (Williams, 1992; Jang et al., 2017), but improvements are not substantial
  • Big data: how much do we need? Exponentially many examples?
  • Perhaps a new paradigm is needed…
SLIDE 96

“If you got a billion dollars to spend on a huge research project, what would you like to do?” “I'd use the billion dollars to build a NASA-size program focusing on natural language processing (NLP), in all of its glory (semantics, pragmatics, etc).”

Michael Jordan Professor of Computer Science UC Berkeley

SLIDE 97

Towards Artificial General Intelligence…

Natural language is the best tool to describe and communicate “thoughts” Asking and answering questions is an effective way to develop deeper “thoughts”

SLIDE 98

Thank you!

  • minjoon@cs.uw.edu
  • http://seominjoon.github.io