Squashing Computational Linguistics. Noah A. Smith, Paul G. Allen School of Computer Science & Engineering. (PowerPoint presentation.)



SLIDE 1

Squashing

Computational Linguistics

Noah A. Smith

Paul G. Allen School of Computer Science & Engineering University of Washington Seattle, USA @nlpnoah

Research supported in part by: NSF, DARPA DEFT, DARPA CWC, Facebook, Google, Samsung, University of Washington.

SLIDE 2

data

SLIDE 3

Applications of NLP in 2017

  • Conversation, IE, MT, QA, summarization, text categorization
SLIDE 4

Applications of NLP in 2017

  • Conversation, IE, MT, QA, summarization, text categorization
  • Machine-in-the-loop tools for (human) authors

Elizabeth Clark: collaborate with an NLP model through an “exquisite corpse” storytelling game.
Chenhao Tan: revise your message with help from NLP. tremoloop.com

SLIDE 5

Applications of NLP in 2017

  • Conversation, IE, MT, QA, summarization, text categorization
  • Machine-in-the-loop tools for (human) authors
  • Analysis tools for measuring social phenomena

Lucy Lin: sensationalism in science news. bit.ly/sensational-news

… bookmark this survey!

Dallas Card: track ideas, propositions, and frames in discourse over time.

SLIDE 6

data

?

SLIDE 7

Squash

SLIDE 8

Squash Networks

  • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear

SLIDE 9

Squash Networks

  • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear

From Jack (2010), Dynamic System Modeling and Control, goo.gl/pGvJPS

*Yes, rectified linear units (relus) are only half-squash; hat-tip Martha White.
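That definition, composing parameterized differentiable functions with a squashing nonlinearity, fits in a few lines. A minimal sketch; the two-layer shape and all weights below are made up for illustration:

```python
import math

def affine(w, b):
    # a parameterized differentiable function: x -> w*x + b
    return lambda x: w * x + b

def squash(f):
    # compose f with the tanh "squashing" nonlinearity
    return lambda x: math.tanh(f(x))

# a tiny two-layer squash network: tanh(0.5 * tanh(2x - 1))
layer1 = squash(affine(2.0, -1.0))
layer2 = squash(affine(0.5, 0.0))
net = lambda x: layer2(layer1(x))
```

Every output lands in (-1, 1), because the last thing applied is the squash.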

SLIDE 10

Squash Networks

  • Parameterized differentiable functions composed out of simpler parameterized differentiable functions, some nonlinear

  • Estimate parameters using Leibniz (1676)

From existentialcomics.com
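“Leibniz (1676)” is, of course, the chain rule: differentiate the composed squash functions and follow the gradient. A one-weight sketch, where the toy data, learning rate, and iteration count are all invented:

```python
import math

def predict(w, x):
    # one-unit squash network: y = tanh(w * x)
    return math.tanh(w * x)

def loss(w, data):
    # squared error over the dataset
    return sum((predict(w, x) - y) ** 2 for x, y in data)

def grad(w, data):
    # chain rule: d/dw tanh(w*x) = (1 - tanh(w*x)**2) * x
    return sum(2.0 * (predict(w, x) - y) * (1.0 - predict(w, x) ** 2) * x
               for x, y in data)

# toy data roughly consistent with y = tanh(0.55 * x)
data = [(1.0, 0.5), (2.0, 0.8), (-1.0, -0.5)]
w = 0.0
for _ in range(200):
    w -= 0.1 * grad(w, data)  # gradient descent
```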

SLIDE 11

Who wants an all-squash diet?

wow very Cucurbita much festive many dropout

SLIDE 12
SLIDE 13

Linguistic Structure Prediction

input (text)

output (structure)
SLIDE 14

Linguistic Structure Prediction

input (text)

output (structure)

sequences, trees, graphs, …

SLIDE 15

“gold” output

Linguistic Structure Prediction

input (text)

output (structure)

sequences, trees, graphs, …

SLIDE 16

“gold” output

Linguistic Structure Prediction

input (text) → input representation

output (structure)

sequences, trees, graphs, …

SLIDE 17

“gold” output

Linguistic Structure Prediction

input (text) → input representation

output (structure)

clusters, lexicons, embeddings, … sequences, trees, graphs, …

SLIDE 18

“gold” output

Linguistic Structure Prediction

input (text) → input representation

output (structure)

training objective

clusters, lexicons, embeddings, … sequences, trees, graphs, …

SLIDE 19

“gold” output

Linguistic Structure Prediction

input (text) → input representation

output (structure)

training objective

clusters, lexicons, embeddings, … sequences, trees, graphs, … probabilistic, cost-aware, …

SLIDE 20

“gold” output

Linguistic Structure Prediction

input (text) → input representation → part representations

output (structure)

training objective

clusters, lexicons, embeddings, … sequences, trees, graphs, … probabilistic, cost-aware, …

SLIDE 21

“gold” output

Linguistic Structure Prediction

input (text) → input representation → part representations

output (structure)

training objective

clusters, lexicons, embeddings, … segments/spans, arcs, graph fragments, … sequences, trees, graphs, … probabilistic, cost-aware, …

SLIDE 22

“gold” output

Linguistic Structure Prediction

input (text) → input representation → part representations

output (structure)

training objective

SLIDE 23

“gold” output

Linguistic Structure Prediction

input (text) → input representation → part representations

output (structure)

training objective

error definitions & weights · regularization · annotation conventions & theory · constraints & independence assumptions

data selection

“task”

SLIDE 24

Inductive Bias

  • What does your learning algorithm assume?
  • How will it choose among good predictive functions?

See also: No Free Lunch Theorem (Mitchell, 1980; Wolpert, 1996)

SLIDE 25

data bias

SLIDE 26

Three New Models

  • Parsing sentences into predicate-argument structures
  • Fillmore frames
  • Semantic dependency graphs
  • Language models that dynamically track entities

SLIDE 27

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

Original story on Slate.com: http://goo.gl/Hp89tD

SLIDE 28

Frame-Semantic Analysis

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

FrameNet: https://framenet.icsi.berkeley.edu

SLIDE 29

Frame-Semantic Analysis

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

FrameNet: https://framenet.icsi.berkeley.edu

SLIDE 30

Frame-Semantic Analysis

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

cognizer: Democrats; topic: why … Clinton
FrameNet: https://framenet.icsi.berkeley.edu

SLIDE 31

Frame-Semantic Analysis

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

cognizer: Democrats; topic: why … Clinton; explanation: why; degree: so much; content: of Clinton; experiencer: ?; helper: Stephanopoulos … Carville; goal: to put over; time: in 1992; benefited_party: ?
FrameNet: https://framenet.icsi.berkeley.edu

landmark event: Democrats … Clinton; trajector event: they … 1992; entity: so … Clinton; degree: so; mass: resentment of Clinton; time: When … Clinton; required situation: they … to look … 1992; time: When … Clinton; cognizer agent: they; ground: much … 1992; sought entity: ?; topic: about … 1992; trajector event: the Big Lie … over; landmark period: 1992
SLIDE 32

FrameNet: https://framenet.icsi.berkeley.edu

brood, consider, contemplate, deliberate, …
appraise, assess, evaluate, …
commit to memory, learn, memorize, …
agonize, fret, fuss, lose sleep, …
translate
bracket, categorize, class, classify

SLIDE 33

Frame-Semantic Analysis

When Democrats wonder why there is so much resentment of Clinton, they don’t need to look much further than the Big Lie about philandering that Stephanopoulos, Carville helped to put over in 1992.

cognizer: Democrats; topic: why … Clinton; explanation: why; degree: so much; content: of Clinton; experiencer: ?; helper: Stephanopoulos … Carville; goal: to put over; time: in 1992; benefited_party: ?
FrameNet: https://framenet.icsi.berkeley.edu

landmark event: Democrats … Clinton; trajector event: they … 1992; entity: so … Clinton; degree: so; mass: resentment of Clinton; time: When … Clinton; required situation: they … to look … 1992; time: When … Clinton; cognizer agent: they; ground: much … 1992; sought entity: ?; topic: about … 1992; trajector event: the Big Lie … over; landmark period: 1992
SLIDE 34

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

words + frame

SLIDE 35

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

biLSTM (contextualized word vectors) · words + frame

SLIDE 36

parts: segments up to length d scored by another biLSTM, with labels

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

biLSTM (contextualized word vectors) · words + frame

SLIDE 37

parts: segments up to length d scored by another biLSTM, with labels

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

biLSTM (contextualized word vectors) · words + frame

output: covering sequence of nonoverlapping segments
SLIDE 38

Segmental RNN

(Lingpeng Kong, Chris Dyer, N.A.S., ICLR 2016)

biLSTM (contextualized word vectors) · input sequence
parts: segments up to length d, scored by another biLSTM, with labels
training objective: log loss

output: covering sequence of nonoverlapping segments, recovered in O(Ldn); see Sarawagi & Cohen, 2004
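The O(Ldn) claim is the semi-Markov Viterbi recurrence. A sketch, with a stand-in `score(i, j, label)` in place of the biLSTM segment scorer (the toy scorer and labels in the usage below are invented):

```python
def segmental_viterbi(n, d, labels, score):
    """Best covering sequence of nonoverlapping labeled segments (len <= d).

    O(L * d * n): n positions, up to d segment lengths, L labels
    (Sarawagi & Cohen, 2004).
    """
    best = [float("-inf")] * (n + 1)  # best[j]: max score over [0, j)
    best[0] = 0.0
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - d), j):        # candidate segment [i, j)
            for lab in labels:
                s = best[i] + score(i, j, lab)
                if s > best[j]:
                    best[j], back[j] = s, (i, lab)
    segs, j = [], n                               # recover the argmax
    while j > 0:
        i, lab = back[j]
        segs.append((i, j, lab))
        j = i
    return best[n], segs[::-1]
```

With a toy scorer that likes the span [0, 2) as a `cognizer`, the recovered segmentation covers all n tokens without overlap.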

SLIDE 39

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

SLIDE 40

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

SLIDE 41

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate segment labels: cognizer, topic, ∅]

SLIDE 42

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 43

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 44

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 45

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 46

Inference via dynamic programming in O(Ldn)

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 47

Open-SESAME

(Swabha Swayamdipta, Sam Thomson, Chris Dyer, N.A.S., arXiv:1706.09528)

biLSTM (contextualized word vectors) · words + frame
parts: segments with role labels, scored by another biLSTM
training objective: recall-oriented softmax margin (Gimpel et al., 2010)

output: labeled argument spans

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

cognizer: Democrats topic: why there is so much resentment of Clinton
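The softmax-margin objective (Gimpel et al., 2010) can be sketched over an explicit candidate list; real training sums over exponentially many segmentations via the dynamic program, and the toy scores and costs below are made up. The cost term inflates wrong outputs inside the log-sum-exp, and a recall-oriented cost charges more for dropping a gold argument than for predicting a spurious one:

```python
import math

def softmax_margin_loss(scores, costs, gold):
    # cost-augmented log loss: -score(gold) + log sum_y exp(score(y) + cost(y));
    # costs[gold] is 0, and outputs that miss gold segments cost the most
    log_z = math.log(sum(math.exp(scores[y] + costs[y]) for y in scores))
    return log_z - scores[gold]
```

Raising the cost of recall errors raises the loss whenever such outputs are in the candidate set, pushing the model away from them.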

SLIDE 48

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]

SLIDE 49

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

[candidate-segment lattice: labels cognizer / topic / ∅ over spans; target: wonder (Cogitation)]
syntax features?

SLIDE 50

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

yes no yes yes no
Penn Treebank (Marcus et al., 1993)

SLIDE 51

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

main task: [candidate-segment lattice: labels cognizer / topic / ∅; target: wonder (Cogitation)]
scaffold task: yes no yes yes no

SLIDE 52

Multitask Representation Learning

(Caruana, 1997)

[two copies of the structure-prediction diagram, one per task: “gold” output, input representation, input (text), part representations, output (structure), training objective]

main task: find and label semantic arguments
scaffold task: predict syntactic constituents

shared
  • training datasets need not overlap
  • output structures need not be consistent
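The multitask setup can be sketched as a shared representation feeding two task heads, with the scaffold loss down-weighted in a joint objective. Everything here (the one-weight "encoder", the squared-error heads, the weight `alpha`) is an illustrative stand-in, not the paper's parameterization:

```python
import math

def shared_repr(x, w):
    # stand-in for the shared biLSTM encoder
    return math.tanh(w * x)

def main_loss(r, y, v):
    # main task head (e.g. argument labeling)
    return (v * r - y) ** 2

def scaffold_loss(r, z, u):
    # scaffold head (e.g. constituent yes/no prediction)
    return (u * r - z) ** 2

def joint_loss(batch_main, batch_scaffold, w, v, u, alpha=0.5):
    # the two batches may come from non-overlapping datasets;
    # only the shared parameter w ties the tasks together
    lm = sum(main_loss(shared_repr(x, w), y, v) for x, y in batch_main)
    ls = sum(scaffold_loss(shared_repr(x, w), z, u) for x, z in batch_scaffold)
    return lm + alpha * ls
```

Gradients on the scaffold term flow only into the shared encoder parameters, which is how the scaffold shapes the main task's representation.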
SLIDE 53
[Bar chart: F1 on frame-semantic parsing (frames & arguments), FrameNet 1.5 test set; single models vs. ensembles; open-source systems marked. Systems: SEMAFOR 1.0 (Das et al., 2014); SEMAFOR 2.0 (Kshirsagar et al., 2015); Open-SESAME (ours); … with syntactic scaffold (ours); … with syntax features (ours); Framat (Roth, 2016); FitzGerald et al. (2015).]
SLIDE 54
[Bar chart, repeated: F1 on frame-semantic parsing (frames & arguments), FrameNet 1.5 test set; single models vs. ensembles; open-source systems marked. Systems: SEMAFOR 1.0 (Das et al., 2014); SEMAFOR 2.0 (Kshirsagar et al., 2015); Open-SESAME (ours); … with syntactic scaffold (ours); … with syntax features (ours); Framat (Roth, 2016); FitzGerald et al. (2015).]
SLIDE 55

biLSTM (contextualized word vectors) · words + frame
training objective: recall-oriented softmax margin (Gimpel et al., 2010)

output: labeled argument spans

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

cognizer: Democrats topic: why there is so much resentment of Clinton

segments get scores

Bias?

parts: segments with role labels, scored by another biLSTM

SLIDE 56

biLSTM (contextualized word vectors) · words + frame
training objective: recall-oriented softmax margin (Gimpel et al., 2010)

output: labeled argument spans

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

cognizer: Democrats topic: why there is so much resentment of Clinton

segments get scores · syntactic scaffold

Bias?

parts: segments with role labels, scored by another biLSTM

SLIDE 57

biLSTM (contextualized word vectors) · words + frame
training objective: recall-oriented softmax margin (Gimpel et al., 2010)

output: labeled argument spans

When Democrats wonder [Cogitation] why there is so much resentment of Clinton, they don’t need …

cognizer: Democrats topic: why there is so much resentment of Clinton

segments get scores · recall-oriented cost · syntactic scaffold

Bias?

parts: segments with role labels, scored by another biLSTM

SLIDE 58

Semantic Dependency Graphs

(DELPH-IN minimal recursion semantics-derived representation; “DM”)

Oepen et al. (SemEval 2014; 2015), see also http://sdp.delph-in.net

SLIDE 59

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Democrats wonder arg1

tanh(C [h_wonder ; h_Democrats] + b) · ψ_arg1
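That arc score can be written out directly. A sketch with plain Python lists, where the vectors h_wonder and h_Democrats, the matrix C, the bias b, and the label embedding ψ_arg1 are tiny made-up stand-ins for learned parameters:

```python
import math

def arc_score(h_head, h_dep, C, b, psi_label):
    # score = tanh(C [h_head ; h_dep] + b) . psi_label
    x = h_head + h_dep                       # concatenation [h_head ; h_dep]
    hidden = [math.tanh(sum(row[k] * x[k] for k in range(len(x))) + b_i)
              for row, b_i in zip(C, b)]     # squashed affine map
    return sum(h * p for h, p in zip(hidden, psi_label))
```

With one-dimensional toy vectors the score reduces to a single tanh times the label weight, which makes the formula easy to check by hand.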

SLIDE 60

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

[DM dependency graph over the sentence; bilexical arcs between word pairs: When–Democrats, Democrats–wonder, why–is, is–resentment, so–much, much–resentment, resentment–of, of–Clinton; labels: arg2, arg1, arg1, comp_so, arg1, arg1, arg1, arg1]
SLIDE 61

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

[DM dependency graph, continued; the arcs above plus wonder–why, wonder–there (arg1), and wonder–is (arg1)]

SLIDE 62

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Inference via AD3

(alternating directions dual decomposition; Martins et al., 2014)

[same DM dependency graph as the previous slide: arcs among When, Democrats, wonder, why, there, is, so, much, resentment, of, Clinton; labels arg1, arg2, comp_so]

SLIDE 63

Neurboparser

(Hao Peng, Sam Thomson, N.A.S., ACL 2017)

biLSTM (contextualized word vectors) · words
parts: labeled bilexical dependencies
training objective: structured hinge loss

output: labeled semantic dependency graph with constraints

When Democrats wonder why there is so much resentment of Clinton, they don’t need …
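The structured hinge objective can likewise be sketched over an explicit candidate list rather than the real combinatorial search over graphs (the numbers in the usage below are toys):

```python
def structured_hinge_loss(scores, costs, gold):
    # max(0, max_y [score(y) + cost(y)] - score(gold)):
    # cost-augmented decoding finds the most "dangerous" competing output,
    # and the loss is how far it beats the gold score, clipped at zero
    best = max(scores[y] + costs[y] for y in scores)
    return max(0.0, best - scores[gold])
```

When the gold graph outscores every competitor by more than its cost, the loss is exactly zero, so well-separated examples contribute no gradient.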

SLIDE 64

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Three Formalisms, Three Separate Parsers

formalism 1 (DM): Democrats–wonder arg1 · formalism 2 (PAS): Democrats–wonder arg1 · formalism 3 (PSD): Democrats–wonder act

SLIDE 65

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Shared Input Representations

(Daumé, 2007)

shared across all · formalism 1 (DM): Democrats–wonder arg1 · formalism 2 (PAS): Democrats–wonder arg1 · formalism 3 (PSD): Democrats–wonder act

SLIDE 66

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Cross-Task Parts

Democrats–wonder arg1 · Democrats–wonder arg1 · Democrats–wonder act · shared across all

SLIDE 67

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Both: Shared Input Representations & Cross-Task Parts

shared across all · formalism 1 (DM): Democrats–wonder arg1 · formalism 2 (PAS): Democrats–wonder arg1 · formalism 3 (PSD): Democrats–wonder act

SLIDE 68

Multitask Learning: Many Possibilities

  • Shared input representations, parts? Which parts?
  • Joint decoding?
  • Overlapping training data?
  • Scaffold tasks?
“gold” output (×3, one per formalism)

formalism 1 (DM) formalism 2 (PAS) formalism 3 (PSD)

SLIDE 69
[Bar chart: F1 averaged over three semantic dependency parsing formalisms, SemEval 2015 test set; WSJ (in-domain) and Brown (out-of-domain). Systems: Du et al., 2015; Almeida & Martins, 2015 (no syntax); Almeida & Martins, 2015 (syntax); Neurboparser (ours); … with shared input representations and cross-task parts (ours).]

SLIDE 70

[Bar chart: Neurboparser F1 on three semantic dependency parsing formalisms (DM, PAS, PSD), SemEval 2015 test set; WSJ (in-domain) and Brown (out-of-domain).]

good enough?

SLIDE 71

biLSTM (contextualized word vectors) · words
training objective: structured hinge loss

output: labeled semantic dependency graph with constraints

When Democrats wonder why there is so much resentment of Clinton, they don’t need …

Bias?

cross-formalism sharing

parts: labeled bilexical dependencies

SLIDE 72

Text

SLIDE 73

Text ≠ Sentences

larger context

SLIDE 74

Generative Language Models

history → next word

p(W | history)

SLIDE 75

Generative Language Models

history → next word

p(W | history)
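A generative language model factors p(W | history) word by word; a count-based bigram model is the smallest instance (the corpus below is a made-up toy, with the history truncated to one word):

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    # estimate p(w | prev) from counts: p(W | history) with a one-word history
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    probs = defaultdict(dict)
    for (prev, w), c in pair_counts.items():
        probs[prev][w] = c / context_counts[prev]
    return probs

toy_corpus = "the cat sat on the mat".split()
p = train_bigram(toy_corpus)
```

The neural models in this talk replace the count table with a squash network over a learned summary of the full history.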

SLIDE 76

When Democrats wonder why there is so much resentment of

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

SLIDE 77

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of

entity 1

SLIDE 78

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of Clinton,

  1. new entity with a new vector
  2. mention word will be “Clinton”

entity 1 · entity 2

SLIDE 79

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of Clinton,

entity 1 entity 2

SLIDE 80

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of Clinton, they

  1. coreferent of entity 1 (previously known as “Democrats”)
  2. mention word will be “they”
  3. embedding of entity 1 will be updated

entity 1 · entity 2

SLIDE 81

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of Clinton, they

entity 1 entity 2

SLIDE 82

Entity Language Model

(Yangfeng Ji, Chenhao Tan, Sebastian Martschat,

Yejin Choi, N.A.S., EMNLP 2017)

When Democrats wonder why there is so much resentment of Clinton, they don’t

  1. not part of an entity mention
  2. “don’t”

entity 1 · entity 2
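The entity bookkeeping in those steps can be sketched as a tiny store of entity vectors, updated at each mention. The interpolation update and the weight `lam` are illustrative stand-ins for the model's learned update, not the EMNLP 2017 parameterization:

```python
class EntityTracker:
    """Toy dynamic entity store: one vector per entity, updated per mention."""

    def __init__(self):
        self.entities = []                       # entity id -> vector

    def new_entity(self, mention_vec):
        # step 1 on the slides: a new entity gets a fresh vector
        self.entities.append(list(mention_vec))
        return len(self.entities) - 1

    def mention(self, ent_id, mention_vec, lam=0.5):
        # a coreferent mention pulls the stored vector toward the mention,
        # so "they" updates the entity introduced as "Democrats"
        old = self.entities[ent_id]
        self.entities[ent_id] = [(1 - lam) * a + lam * m
                                 for a, m in zip(old, mention_vec)]
        return self.entities[ent_id]
```

Words outside any mention (like “don’t”) simply leave the store untouched.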

SLIDE 83

[Charts: perplexity on the CoNLL 2012 test set (5-gram LM, RNN LM, entity language model); CoNLL 2012 coreference evaluation, MUC / B3 / CEAF / CoNLL F1 (Martschat and Strube, 2015, vs. reranked with entity LM); accuracy on InScript (always new, shallow features, Modi et al., 2017, entity LM, human).]

SLIDE 84

history → next word

p(W | history)

entities

Bias?

SLIDE 85

Bias in the Future?

  • Linguistic scaffold tasks.
SLIDE 86

Bias in the Future?

  • Linguistic scaffold tasks.
  • Language is by and about people.
SLIDE 87

Bias in the Future?

  • Linguistic scaffold tasks.
  • Language is by and about people.
  • NLP is needed when texts are costly to read.
SLIDE 88

Bias in the Future?

  • Linguistic scaffold tasks.
  • Language is by and about people.
  • NLP is needed when texts are costly to read.
  • Polyglot learning.
SLIDE 89

data bias

SLIDE 90

Thank you!