Structured Predictions: Practical Advancements and Applications



SLIDE 1

Structured Predictions: Practical Advancements and Applications

Kai-Wei Chang, University of Virginia, Department of Computer Science

References: http://kwchang.net/talks/sp.html

Kai-Wei Chang (http://kwchang.net/talks/sp.html)

SLIDE 2

Supervised learning

Input x ∈ X: an item x drawn from an instance space X
Output y ∈ Y: an item y drawn from a label space Y
Target function: y = g*(x)
Learned model: y = g(x)

SLIDE 3


Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

Q: [Chris] = [Mr. Robin] ?

Slide modified from Dan Roth

SLIDE 4

Complex Decision Structure


Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

SLIDE 5

Why is structure important? Handwritten recognition example: what is this letter?

SLIDE 6

Structured Prediction

Task: Part-of-speech tagging
  Input: They operate ships and banks.
  Output: Pronoun Verb Noun And Noun

Task: Dependency parsing
  Input: They operate ships and banks.
  Output: a dependency tree over "Root They operate ships and banks ."

Task: Segmentation
  (example shown as an image)

Assign values to a set of interdependent output variables

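The POS-tagging example above can be made concrete with a tiny Viterbi decoder: the tag of each word depends on its neighbors through transition scores, which is exactly what "interdependent output variables" means. This is a minimal sketch; all scores below are hypothetical, not taken from the talk.

```python
# Toy Viterbi decoder: tags interact through transition scores,
# illustrating "interdependent output variables". Scores are made up.

def viterbi(words, tags, emit, trans):
    """Highest-scoring tag sequence under additive emission/transition scores."""
    best = {t: (emit.get((words[0], t), 0), [t]) for t in tags}
    for w in words[1:]:
        new = {}
        for t in tags:
            s, path = max(
                (best[p][0] + trans.get((p, t), 0) + emit.get((w, t), 0),
                 best[p][1])
                for p in tags)
            new[t] = (s, path + [t])
        best = new
    return max(best.values())[1]

tags = ["Pronoun", "Verb", "Noun"]
emit = {("They", "Pronoun"): 2, ("operate", "Verb"): 2, ("operate", "Noun"): 1,
        ("ships", "Noun"): 2, ("ships", "Verb"): 1}
trans = {("Pronoun", "Verb"): 1, ("Verb", "Noun"): 1}
result = viterbi(["They", "operate", "ships"], tags, emit, trans)
```

Note how "operate" is tagged Verb rather than Noun only because of the transition scores from its neighbors, even though both emissions are plausible.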

SLIDE 7

Challenge: Scalability Issues

• Large amount of data
• Complex decision structure


(Background collage of text snippets: news sentences, paper abstracts, and repeated story passages, illustrating the volume of data and the complexity of the decisions.)

SLIDE 8

Solution Methods

• Assume a graphical structure; optimize
  • Used within various structured prediction algorithms (e.g., CRF, Structured Perceptron, M3N, Structured SVM) [Lafferty+ 01, Collins 02, Taskar 04]
  • See our AAAI16 tutorial (https://goo.gl/TF7cGj)
• Learning to search approaches
  • Assume the complex decision is incrementally constructed by a sequence of decisions
  • E.g., LaSO, DAgger, SEARN, transition-based methods
  • See our NAACL15 tutorials (http://hunch.net/~l2s)

SLIDE 9

Example: Dependency Parsing

• Identifying relations between words

I ate a cake with a fork


SLIDE 10

Graphical Model Approaches: Graph-Based Parser [McDonald+ 2005]

• Consider all word pairs and assign scores
• Score of a tree = sum of the scores of its edges
• Can be formulated as a maximum spanning tree (MST) problem
  • Solved by the Chu-Liu-Edmonds algorithm
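The "score of a tree = sum of edge scores" objective can be checked with a brute-force search over head assignments for a tiny sentence. Chu-Liu-Edmonds finds the same maximum spanning arborescence efficiently; this exponential sketch only illustrates the objective, and the edge scores are hypothetical.

```python
from itertools import product

def best_tree(n, score):
    """Brute-force max spanning arborescence over n words.
    Returns (heads, total): heads[m-1] is the head of word m (0 = ROOT)."""
    def reaches_root(heads):
        for m in range(1, n + 1):
            seen, cur = set(), m
            while cur != 0:
                if cur in seen:          # cycle: not a tree
                    return False
                seen.add(cur)
                cur = heads[cur - 1]
        return True

    best_s, best_h = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not reaches_root(heads):
            continue
        s = sum(score.get((heads[m - 1], m), 0) for m in range(1, n + 1))
        if s > best_s:
            best_s, best_h = s, heads
    return best_h, best_s

# "I ate a cake": 1=I, 2=ate, 3=a, 4=cake; 0 is ROOT. Scores invented.
score = {(0, 2): 10, (2, 1): 8, (2, 4): 8, (4, 3): 8, (0, 1): 3}
heads, total = best_tree(4, score)
```

The winning assignment attaches "I" and "cake" to "ate", "a" to "cake", and "ate" to ROOT: the tree whose edges sum highest.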

SLIDE 11

Learning to Search Approaches: Shift-Reduce Parser [Nivre 03, NIPS 16]

• Maintain a buffer and a stack
• Make predictions from left to right
• Three (four) types of actions: Shift, Reduce-Left, Reduce-Right

Credit: Google research blog
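The buffer/stack mechanics can be sketched in a few lines of arc-standard transition code. The oracle action sequence below is hand-written for illustration; in the learning-to-search setting described in the talk, a learned policy predicts each action instead.

```python
# Skeleton of an arc-standard shift-reduce parser: a buffer of upcoming
# word indices, a stack of partially processed ones, and three actions.

def parse(n_words, actions):
    """Apply SHIFT / LEFT / RIGHT actions; return (head, dependent) arcs."""
    stack, buffer, arcs = [], list(range(n_words)), []
    for a in actions:
        if a == "SHIFT":
            stack.append(buffer.pop(0))
        elif a == "LEFT":             # second-from-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif a == "RIGHT":            # top becomes dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "I ate a cake" (0=I, 1=ate, 2=a, 3=cake): ate->I, cake->a, ate->cake
arcs = parse(4, ["SHIFT", "SHIFT", "LEFT", "SHIFT", "SHIFT", "LEFT", "RIGHT"])
```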

SLIDE 12

What We Care about

• Prediction accuracy
• Training/test/dev speed
• Fairness (data biases)
• Learning signals

(Illustrated with a coreference-accuracy bar chart and a visual semantic role labeling example.)

SLIDE 13

Outline


SLIDE 14

Structured prediction application:

ESL Grammar Error Correction

[CoNLL 13, 14]

They believe that such situation must be avoided.
Candidate corrections: situation / a situation / situations / a situations

SLIDE 15

Structured prediction application:

Algebra Word Problems [EMNLP 16]

Problem → Equations → Solution: m = 40, n = 10

SLIDE 16

Structured prediction application:

Co-reference Resolution


Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

SLIDE 17

Structured prediction application: Co-reference Resolution [EMNLP 13a, ICML 14, CoNLL 11, 12, 15]

• Proposed a novel, principled, linguistically motivated model with a latent forest structure
• Winner of the CoNLL shared tasks 2011 and 2012
• Performance* (bar chart, 50–65 scale): Stanford, Chen+, Ours (2012), Martschat+, Ours (2013), Fernandes+, HOTCoref, Berkeley, Ours (2015)
  *Avg(MUC, B3, CEAF)
• The state-of-the-art approach using NN & RL achieves 65.73 (Clark+ 16)

SLIDE 18

Co-reference Resolution Demo


http://bit.ly/illinoisCoref

SLIDE 19

Co-reference Resolution

• Learn a pairwise similarity measure (local predictor)
  Example features:
  • same sub-string?
  • positions in the paragraph
  • other 30+ feature types
• Key components:
  • Pairwise classification
  • Clustering (jointly or not?)

Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

SLIDE 20

Decoupling Approach

A heuristic to learn the model [Soon+ 01, Bengtson+ 08, CoNLL 11]

• Decouple learning and inference:
  1. Learn a pairwise similarity function
  2. Cluster based on this function

SLIDE 21

Decoupling Approach-Learning


As a boy, Chris1 lived in a pretty home called Cotchfield Farm. When Chris2 was three years old, his father3 wrote a poem about him4. The poem was printed in a magazine for others to read. Mr. Robin5 then wrote a book

Positive Samples: (Chris1, Chris2), (Chris1, him4), (Chris2, him4), (his father3, Mr. Robin5)

Negative Samples: (Chris1, his father3), (Chris2, his father3), (him4, his father3), (Chris1, Mr. Robin5), (Chris2, Mr. Robin5), (him4, Mr. Robin5)
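The sample generation on this slide is mechanical: mention pairs in the same gold cluster are positive, cross-cluster pairs negative. A minimal sketch (real systems usually subsample the negatives; this one keeps them all):

```python
from itertools import combinations

def make_pairs(mentions, cluster_of):
    """Pairwise training data from gold clusters: same cluster -> positive."""
    pos, neg = [], []
    for a, b in combinations(mentions, 2):   # pairs in document order
        (pos if cluster_of[a] == cluster_of[b] else neg).append((a, b))
    return pos, neg

mentions = ["Chris1", "Chris2", "his father3", "him4", "Mr. Robin5"]
cluster_of = {"Chris1": 0, "Chris2": 0, "him4": 0,
              "his father3": 1, "Mr. Robin5": 1}
pos, neg = make_pairs(mentions, cluster_of)
```

On the slide's example this reproduces exactly the four positive and six negative pairs listed.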

SLIDE 22

Greedy Best-Left-Link Clustering


[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

SLIDE 23

Greedy Best-Left-Link Clustering


[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

SLIDE 24

Greedy Best-Left-Link Clustering


[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

SLIDE 25

Greedy Best-Left-Link Clustering

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

Best left-linking forest [Soon+ 01, Bengtson+ 08, CoNLL 11]
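Greedy best-left-link clustering can be sketched directly: scan mentions left to right, link each one to its highest-scoring preceding antecedent, or start a new cluster when nothing clears a threshold. The head-match scorer below is a stand-in for the learned pairwise similarity function.

```python
def best_left_link(mentions, score, threshold=0.0):
    """Greedy best-left-link clustering over mentions in document order."""
    link = {}
    for j in range(1, len(mentions)):
        i = max(range(j), key=lambda i: score(mentions[i], mentions[j]))
        if score(mentions[i], mentions[j]) > threshold:
            link[j] = i
    clusters = {}
    for j in range(len(mentions)):
        root = j
        while root in link:           # follow left-links back to the root
            root = link[root]
        clusters.setdefault(root, []).append(mentions[j])
    return list(clusters.values())

# Toy scorer: mentions with the same head word attract, others repel.
head_match = lambda a, b: 1.0 if a.split()[-1] == b.split()[-1] else -1.0
clusters = best_left_link(
    ["Bill Clinton", "Vladimir Putin", "President Clinton"], head_match)
```

The result groups the two "Clinton" mentions and leaves "Vladimir Putin" in its own cluster, mirroring the left-linking forest on the slide.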

SLIDE 26

Challenges

• Decoupling may lose information


Christopher Robin is alive and well. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book

SLIDE 27

Challenges

• In addition, we need world knowledge

As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him.

1. Complexity: need an efficient algorithm
2. Modeling: learn the metric while clustering
3. Knowledge: augment with knowledge
SLIDE 28

Structured Learning Approach

Learn the similarity function while clustering

• Cluster based on this function
• Update the similarity function

SLIDE 29

Attempt: All-Links Clustering

[McCallum+ 04, CoNLL 11]

• Define a global scoring function using all within-cluster pairs
  • The resulting inference problem is too hard

SLIDE 30

Latent Left-Linking Model (L3M)

[ICML 14, EMNLP 13]

Score(a clustering C) = Score(the best left-linking forest consistent with C) = Σ (scores of the edges in that forest)

SLIDE 31

Linguistic Constraints

• Must-link constraints: e.g., SameProperName, …
• Cannot-link constraints: e.g., ModifierMismatch, …
• Clustering with constraints [Basu+ 08, Zhi+ 14]

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

SLIDE 32

Inference in L3M [ICML 14, EMNLP 13]

• Represented using an ILP formulation [Scott+ 2004/2007]
• Inference can be done using a greedy heuristic

argmax_z Σ_{j,k} s_{j,k} z_{j,k}
s.t. Bz ≤ c; z_{j,k} ∈ {0,1}
z_{j,k} = 1 ⇔ (j, k) is an edge in the forest

(s_{j,k}: pairwise edge score; Bz ≤ c encodes both the modeling constraints and the linguistic constraints)
SLIDE 33

Learning L3M (simplified version) [ICML 14, EMNLP 13a]


Predicted forest vs. latent forest:

[Bill Clinton], recently elected as the [President of the USA], has been invited by the [Russian President], [Vladimir Putin], to visit [Russia]. [President Clinton] said that [he] looks forward to strengthening ties between [USA] and [Russia].

SLIDE 34

Learning L3M (simplified version) [ICML 14, EMNLP 13a]


Loop until a stopping condition is met:
  For each gold clustering y_i:
    (ŷ, ĥ) = argmax_{y,h} wᵀφ(y, h)   (predicted clustering and forest)
    h_i = argmax_h wᵀφ(y_i, h)   (best latent forest consistent with the gold clustering)
    w ← w + θ (φ(y_i, h_i) − φ(ŷ, ĥ)),   θ: learning rate

SLIDE 35

Extension: Probabilistic L3M

[ICML 14, EMNLP 13a]

• Define a log-linear model:

  Pr[a clustering C] = Σ Pr[forests consistent with C] = Σ Π Pr[edges in the forest]
  Pr[mention j links left] = Σ_{k ∈ L(j)} Pr[edge(j, k)], with Pr[edge(j, k)] ∝ exp(w · φ(j, k) / δ)   (δ: a temperature parameter)

• Regularized maximum log-likelihood estimation, inspired by [McCallum 04, Quattoni 07]:

  min_w LL(w) = γ‖w‖² + Σ_d log Z_d(w) − Σ_d Σ_j log( Σ_{k<j} exp(w · φ(j, k) / δ) D_d(j, k) )
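The edge probabilities in the probabilistic model are a temperature-controlled softmax over each mention's candidate left-links. A minimal sketch (mention names and scores are illustrative):

```python
import math

def left_link_probs(edge_scores, delta=1.0):
    """Softmax with temperature delta over candidate antecedents for one
    mention: edge_scores[k] = w . phi(j, k) for each candidate k."""
    exps = {k: math.exp(s / delta) for k, s in edge_scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

p = left_link_probs({"Chris1": 2.0, "his father3": 0.0}, delta=0.5)
```

As delta shrinks toward 0 the distribution concentrates on the single best antecedent, recovering the hard argmax of the non-probabilistic L3M.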

SLIDE 36

Coreference: OntoNotes 5.0 (with gold mentions)

(Bar chart, performance* on a 72–78 scale, higher is better: Decoupled < L3M < Probabilistic L3M. *Avg(MUC, B3, CEAF))

SLIDE 37

Latent Left-Linking Model (L3M)

[ICML 14, EMNLP 13]


Advantages:
• Complexity: very efficient
• Modeling: learn the metric while clustering
• Knowledge: easy to incorporate constraints (must-link or cannot-link)

Can be applied to other supervised clustering problems, e.g., the posts in a forum, error reports from users, …

SLIDE 38


Outline

SLIDE 39

Solution Methods

• Assume a graphical structure; optimize
  • Three ideas for improving learning/inference speed
  • See our AAAI16 tutorial (https://goo.gl/TF7cGj)
• Learning to search approaches
  • A programmable framework
  • See our NAACL15 tutorials (http://hunch.net/~l2s)

SLIDE 40

Graphical model approach: Speed up Inference/Learning

• Observation 1: some decisions are simpler than the others
• Idea: adaptively generate computationally costly features at test time [AAAI 17]

SLIDE 41

Graphical model approach: Speed up Inference/Learning

• Observation 2: many inference problems share the same solution
• Idea: exploit this redundancy by caching old inference solutions [AAAI 15]

SLIDE 42

Amortized inference – key components

• Formulate the inference as an Integer Linear Program:
  • A very general formulation [Roth & Yih 04, Sontag 10]
  • Inference can be solved by any (exact or approximate) method
• Check a condition to determine whether a new inference problem has the same solution as a previously observed one [Srikumar+ 12; Kundu+ 13]

argmax_{z ∈ {0,1}ⁿ} Σ_i s_i z_i   s.t. Bz ≤ c
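The caching idea can be sketched with a memoizing wrapper around a solver. Here the reuse test is exact coefficient equality, which only fires on identical problems; the cited work derives much weaker sufficient conditions that reuse solutions across merely similar problems. The toy unconstrained "solver" is invented for illustration.

```python
def make_amortized(solve):
    """Wrap an inference routine with a solution cache (amortized inference
    sketch; keys must be hashable)."""
    cache, stats = {}, {"calls": 0}
    def amortized(objective, constraints):
        key = (tuple(objective), constraints)
        if key not in cache:
            stats["calls"] += 1          # only count real solver invocations
            cache[key] = solve(objective, constraints)
        return cache[key]
    return amortized, stats

# Toy solver: pick z in {0,1}^n maximizing c.z, ignoring constraints.
solve = lambda c, cons: tuple(1 if ci > 0 else 0 for ci in c)
amortized, stats = make_amortized(solve)
z1 = amortized([2.0, -1.0], ())
z2 = amortized([2.0, -1.0], ())   # identical problem: served from the cache
```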

SLIDE 43

Graphical model approach: Speed up Inference/Learning

• Observation 3: inference can be solved in parallel
• Idea: decouple inference and learning in the dual space
• Works in both multi-thread [ECML 13] and multi-machine [NIPS OPT 15, journal in preparation] settings

SLIDE 44

Learning to search (L2S) approaches

1. Define a search space and features
2. Construct a reference policy (Ref) based on the gold label
3. Learn a policy that imitates Ref

SLIDE 45

Credit Assignment Problem

When making a mistake, which local decision should be blamed? Existing L2S algorithms give


SLIDE 46

Learning to search approaches: Credit Assignment Compiler [NIPS 16]

• Write the decoder, providing some side information for training
• Library functions:
  • predict: returns individual predictions
  • loss: declares the joint loss
• An analogy to Factorie [McCallum+ 09]
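The programming model can be sketched in miniature: the programmer writes the test-time decoder once, calling a predict function per decision and a loss function once. The names and signatures below are illustrative only, not the actual Vowpal Wabbit API; the training library would re-run this decoder with different exploration policies.

```python
def run(sentence, gold, predict, loss):
    """A decoder in credit-assignment-compiler style: one predict() call
    per local decision, one loss() call declaring the joint loss."""
    tags = []
    for i, word in enumerate(sentence):
        prev = tags[-1] if tags else "<s>"
        tags.append(predict(features=(word, prev), oracle=gold[i]))
    loss(sum(t != g for t, g in zip(tags, gold)))
    return tags

# With a policy that simply copies the oracle, the declared loss is zero.
losses = []
tags = run(["They", "operate", "ships"], ["Pronoun", "Verb", "Noun"],
           predict=lambda features, oracle: oracle,
           loss=losses.append)
```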

SLIDE 47

Credit Assignment Compiler [NIPS 16]

• Runs Run() many times to learn a predict() that yields low loss()
  → turns Run() and training data into model updates
• Reduces a joint prediction problem to (cost-sensitive) multi-class problems

SLIDE 48

Libraries for Structured Predictions

• Illinois-SL: graph-based structured prediction
  • Supports various algorithms; parallel ⇒ very fast
• Vowpal Wabbit: credit assignment compiler
  • A general online learning library
  • Supports search-based structured prediction

Provide a nice platform
• for developing novel methods
• for collaboration
• for education

More easy-access tools; more collaborations

SLIDE 49


Outline

SLIDE 50

Weak Supervision Challenges

[CRII grant]

• Implicit supervision
  • The loss is not decomposable and can be estimated only when the entire output structure is derived
• Structured contextual bandit
  • Only a few (single) structured labels can be observed

SLIDE 51

Implicit Supervision

• Consider an algebra word problem
• Build a semantic parser to translate the question into an equation system
• Then the answer can be derived: m = 40, n = 10

SLIDE 52

Implicit Supervision [EMNLP 16]


m=40, n=10

SLIDE 53

Structured Contextual Bandit Setting

[ICML15]

• Loss of only a single structured label can be observed


SLIDE 54

A Search Problem


SLIDE 55


Outline

SLIDE 56

Human Bias in Structured Models

[in submission]

• A visual semantic role labeling system [Mark+ 16]

activity: cooking; agent: woman; food: vegetable; container: bowl; tool: knife; place: kitchen

SLIDE 57

Word Embeddings can be Dreadfully Sexist

[NIPS 16]

• w_man − w_woman ≈ w_computer_programmer − w_homemaker   (man : computer programmer :: woman : homemaker)

SLIDE 58

Debiasing Learning Models

• Idea 1: remove the problematic correlation
  • E.g., remove the gender-bias subspace in word embeddings
• Idea 2: set corpus-wide constraints to calibrate the gender ratios
  • Technique: inference can be done by Lagrangian relaxation
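The first idea, removing a bias subspace from word embeddings, boils down to projecting each vector onto the complement of a gender direction g: w' = w − (w·g / g·g) g. A minimal sketch with a hypothetical 3-dimensional embedding and a one-dimensional gender direction:

```python
def project_out(w, g):
    """Remove the component of vector w along direction g, so that the
    debiased vector has zero dot product with g (the 'neutralize' step)."""
    coef = sum(wi * gi for wi, gi in zip(w, g)) / sum(gi * gi for gi in g)
    return [wi - coef * gi for wi, gi in zip(w, g)]

w_debiased = project_out([3.0, 1.0, 2.0], [1.0, 0.0, 0.0])
```

In the full hard-debiasing procedure the direction g is itself estimated from definitional pairs (she/he, woman/man, …), and explicitly gendered words are handled separately.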

SLIDE 59

Structured Prediction: an active direction

• Landscape of methods in Deep ∩ Structure
  • Deep learning / hidden representations, e.g., seq2seq, RNN, SP-energy networks
  • Deep features with traditional factor-graph inference, e.g., LSTM+CRF, graph transformer networks
• What is the right way to encode structures?
  • How to constrain the output?
  • How can we leverage different learning signals?

SLIDE 60

Conclusions

Goal: Practical Structured Prediction Approaches

Tutorials/Workshops:

1. AAAI-16: Learning and Inference in SP Models
2. NAACL15: Hands-on Learning to Search for SP
3. EMNLP 16, 17: workshop on SP for NLP

References/Code/Demos: http://kwchang.net Illinois-SL: a structured learning package Vowpal Wabbit: an online learning library
