A General-Purpose Algorithm for Constrained Sequential Inference



slide-1
SLIDE 1

A General-Purpose Algorithm for Constrained Sequential Inference

Daniel Deutsch,* Shyam Upadhyay,* and Dan Roth

*equal contribution

slide-2
SLIDE 2

2

Co-Authors

Dan Roth Shyam Upadhyay

slide-3
SLIDE 3

3

Structured Prediction

  • Structured prediction is everywhere

– Parsing, tagging, generation

  • Inference is hard: exponentially large search space
  • Not all outputs are “valid”
slide-4
SLIDE 4

4

Valid Output Structures: Parsing

Input: John kissed Mary

Gold Parse: (S (NP XX ) (VP XX (NP XX ) ) )

slide-5
SLIDE 5

5

Valid Output Structures: Parsing

Input: John kissed Mary

Gold Parse: (S (NP XX ) (VP XX (NP XX ) ) )

Invalid Parse (empty phrase): (S (NP ) (NP ) (VP XX XX (NP XX ) ) )

slide-6
SLIDE 6

6

Valid Output Structures: Parsing

Input: John kissed Mary

Gold Parse: (S (NP XX ) (VP XX (NP XX ) ) )

Invalid Parse (empty phrase): (S (NP ) (NP ) (VP XX XX (NP XX ) ) )

Invalid Parse (incorrect number of pre-terminals): (S (VP XX XX (NP XX XX ) ) )

slide-7
SLIDE 7

7

Valid Output Structures: Parsing

Input: John kissed Mary

Gold Parse: (S (NP XX ) (VP XX (NP XX ) ) )

Invalid Parse (empty phrase): (S (NP ) (NP ) (VP XX XX (NP XX ) ) )

Invalid Parse (incorrect number of pre-terminals): (S (VP XX XX (NP XX XX ) ) )

Invalid Parse (unbalanced parentheses): (S (NP XX ) (VP XX (NP XX ) ) ) )

slide-8
SLIDE 8

8

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Gold Tags: A0 A0 O O A1 O A2 A2

A0 ≈ agent, A1 ≈ patient, …

slide-9
SLIDE 9

9

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Gold Tags: A0 A0 O O A1 O A2 A2

Invalid Tags (duplicate A0): A0 A0 O O A0 O A2 A2

slide-10
SLIDE 10

10

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Legal Args: A0, A1, A2 (from Propbank for “gave”)

Gold Tags: A0 A0 O O A1 O A2 A2

Invalid Tags (duplicate A0): A0 A0 O O A0 O A2 A2

slide-11
SLIDE 11

11

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Legal Args: A0, A1, A2 (from Propbank for “gave”)

Gold Tags: A0 A0 O O A1 O A2 A2

Invalid Tags (duplicate A0): A0 A0 O O A0 O A2 A2

Invalid Tags (illegal argument A3): A3 A3 O O A1 O A2 A2

slide-12
SLIDE 12

12

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Spans: [Bob Smith] gave a flower to [Alice Jones]

Legal Args: A0, A1, A2 (from Propbank for “gave”)

Gold Tags: A0 A0 O O A1 O A2 A2

Invalid Tags (duplicate A0): A0 A0 O O A0 O A2 A2

Invalid Tags (illegal argument A3): A3 A3 O O A1 O A2 A2

slide-13
SLIDE 13

13

Valid Output Structures: Semantic Role Labelling

Input: Bob Smith gave a flower to Alice Jones

Spans: [Bob Smith] gave a flower to [Alice Jones]

Legal Args: A0, A1, A2 (from Propbank for “gave”)

Gold Tags: A0 A0 O O A1 O A2 A2

Invalid Tags (duplicate A0): A0 A0 O O A0 O A2 A2

Invalid Tags (illegal argument A3): A3 A3 O O A1 O A2 A2

Invalid Tags (spans not respected): A0 O O O A1 O A2 A2

slide-14
SLIDE 14

14

Sequential Inference

  • Many tasks have converged on the same solution

– Assume output structure is a sequence
– Beam search from left-to-right
– Seq2Seq

  • How are structural constraints enforced?

Example: The red cat → El gato rojo

slide-15
SLIDE 15

15

Enforcing Constraints

  • No single way to enforce constraints

– Not enforced at all (Lample et al., 2016; Choe and Charniak, 2016; Suhr et al., 2018)
– Post-hoc (Andreas et al., 2013; Vinyals et al., 2015; Upadhyay et al., 2018)
– Custom inference algorithm (Zhu et al., 2013)

slide-16
SLIDE 16

16

This Work

  • Propose a generic algorithm for enforcing constraints in sequential inference
  • Abstracts over many custom inference procedures in NLP
  • Built on expressing constraints as automata
  • The automaton guides inference to valid outputs
slide-17
SLIDE 17

17

Representing Constraints

  • Represent a constraint with an automaton

– Finite-state automata and push-down automata

  • Automaton’s language is all of the valid output structures that satisfy the constraint
  • Great automaton libraries

– OpenFST (Allauzen et al., 2007), Pynini (Gorman, 2016)
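To make the idea of "a constraint as an automaton" concrete, here is a minimal pure-Python sketch. It does not use OpenFST or Pynini; the `DFA` class, the `no_duplicate_a0` helper, and the state numbering are hypothetical names chosen for illustration. The automaton accepts exactly the tag sequences in which A0 occurs in at most one contiguous span.

```python
from dataclasses import dataclass

TAGS = ["O", "A0", "A1", "A2", "A3"]


@dataclass
class DFA:
    """A deterministic finite-state automaton over output symbols."""
    start: int
    accepting: set
    transitions: dict  # maps (state, symbol) -> next state

    def step(self, state, symbol):
        # None means the symbol is not allowed from this state.
        return self.transitions.get((state, symbol))

    def allowed(self, state):
        # Symbols with an outgoing transition from `state`.
        return {sym for (src, sym) in self.transitions if src == state}

    def accepts(self, sequence):
        state = self.start
        for symbol in sequence:
            state = self.step(state, symbol)
            if state is None:
                return False
        return state in self.accepting


def no_duplicate_a0():
    # State 0: no A0 seen yet; 1: inside the A0 span; 2: A0 span finished.
    transitions = {}
    for tag in TAGS:
        if tag == "A0":
            transitions[(0, tag)] = 1
            transitions[(1, tag)] = 1
            # No A0 transition out of state 2: a second A0 span is invalid.
        else:
            transitions[(0, tag)] = 0
            transitions[(1, tag)] = 2
            transitions[(2, tag)] = 2
    return DFA(start=0, accepting={0, 1, 2}, transitions=transitions)


constraint = no_duplicate_a0()
gold = "A0 A0 O O A1 O A2 A2".split()
duplicate = "A0 A0 O O A0 O A2 A2".split()
```

The automaton's language is the set of valid outputs: `constraint.accepts(gold)` holds, while the duplicate-A0 sequence is rejected at its fifth tag.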

slide-18
SLIDE 18

18

Representing Constraints as Automata

[Alice Smith] gave flowers to [Bob Jones]

[Figure: automaton for this sentence, with transitions labeled by the tags O, A0, A1, A2, A3 and by Σ]

Σ = Any tag

slide-19
SLIDE 19

19

Constrained Inference

  • Automaton will be traversed in lock-step with beam search, guiding inference to find a valid structure

[Figure: Inference builds the output $\hat{y} = \ldots, y_{i-1}, y_i, y_{i+1}, \ldots$ The Model provides $p_\theta(y_{i+1} \mid x, \hat{y}_{1:i})$; the Constraints provide $A_x(y_{i+1} \mid \hat{y}_{1:i})$; combining the two gives the constrained distribution $\tilde{p}_\theta(y_{i+1} \mid x, \hat{y}_{1:i})$.]
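The lock-step traversal can be sketched as a beam search in which every hypothesis carries its current automaton state, and symbols with no outgoing transition are dropped. This is an illustrative sketch under assumptions, not the authors' implementation: the function names, the fixed output length, the toy probabilities, and the choice to mask without renormalizing are all assumptions made here.

```python
import math


def constrained_beam_search(score_fn, step_fn, start_state, is_accepting,
                            length, beam_size=2):
    """score_fn(prefix) -> {symbol: probability};
    step_fn(state, symbol) -> next automaton state, or None if disallowed."""
    beam = [(0.0, (), start_state)]  # (log-probability, prefix, automaton state)
    for _ in range(length):
        candidates = []
        for logp, prefix, state in beam:
            for symbol, prob in score_fn(prefix).items():
                next_state = step_fn(state, symbol)
                if next_state is None:
                    continue  # the automaton has no such transition: masked out
                candidates.append(
                    (logp + math.log(prob), prefix + (symbol,), next_state))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    finished = [(logp, prefix) for logp, prefix, state in beam
                if is_accepting(state)]
    return max(finished, key=lambda f: f[0])[1] if finished else None


# Toy model (hypothetical numbers): one tag distribution per position.
TOY_PROBS = [{"A0": 0.6, "O": 0.4}, {"O": 0.7, "A0": 0.3}, {"O": 0.9, "A0": 0.1}]


def toy_model(prefix):
    return TOY_PROBS[len(prefix)]


def span_step(state, symbol):
    # Constraint: the first two tokens form a span and must share one tag.
    if state == "start":
        return ("need", symbol)
    if state == "free":
        return "free"
    return "free" if state[1] == symbol else None


constrained = constrained_beam_search(
    toy_model, span_step, "start", lambda s: s == "free", length=3)
unconstrained = constrained_beam_search(
    toy_model, lambda s, y: "free", "free", lambda s: True, length=3)
```

In this toy setting the unconstrained search returns `("A0", "O", "O")`, which splits the span, while the constrained search returns `("O", "O", "O")`.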
slide-20
SLIDE 20

20

Constrained Inference

[Figure, animated over slides 20 to 28: beam search tags the sentence "Alice Smith gave flowers to Bob Jones" from left to right. Each token has the candidate tags O, A0, A1, A2, A3. The automaton labels highlighted on successive slides are: {O, A0, A1, A2, A3}, A0, Σ, Σ, Σ, {O, A0, A1, A2, A3}, A2.]

Σ = Any tag

slide-29
SLIDE 29

29

Handling Multiple Constraints

  • What about multiple constraints?
slide-30
SLIDE 30

30

Handling Multiple Constraints

  • What about multiple constraints?
  • Can’t traverse jointly: need to intersect all of the automata

slide-31
SLIDE 31

31

Handling Multiple Constraints

  • What about multiple constraints?
  • Can’t traverse jointly: need to intersect all of the automata
  • Intersecting automata can be expensive
slide-32
SLIDE 32

32

Handling Multiple Constraints

  • What about multiple constraints?
  • Can’t traverse jointly: need to intersect all of the automata
  • Intersecting automata can be expensive
  • Idea: Intersect automata only as necessary (Tromble and Eisner, 2006)
  • Maintain an active set of enforced constraints
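To illustrate why intersection can be expensive, here is a small sketch of the standard product construction for two finite-state automata (the push-down case is not covered). The dictionary-based representation and the helper names are assumptions for illustration, not the OpenFST or Pynini API. The product can have up to |A| × |B| states, so intersecting many constraints up front may blow up.

```python
TAGS = ["O", "A0", "A1", "A2", "A3"]


def make_dfa(start, accepting, transitions):
    return {"start": start, "accepting": set(accepting),
            "transitions": transitions}


def accepts(dfa, sequence):
    state = dfa["start"]
    for symbol in sequence:
        state = dfa["transitions"].get((state, symbol))
        if state is None:
            return False
    return state in dfa["accepting"]


def intersect(a, b):
    """Product construction, building only states reachable from the start."""
    start = (a["start"], b["start"])
    transitions, accepting = {}, set()
    stack, seen = [start], {start}
    while stack:
        state = stack.pop()
        state_a, state_b = state
        if state_a in a["accepting"] and state_b in b["accepting"]:
            accepting.add(state)
        for (src, symbol), next_a in a["transitions"].items():
            if src != state_a:
                continue
            next_b = b["transitions"].get((state_b, symbol))
            if next_b is None:
                continue  # symbol allowed by a but not by b
            nxt = (next_a, next_b)
            transitions[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return make_dfa(start, accepting, transitions)


# Constraint 1: A0 occurs in at most one contiguous span (3 states).
t1 = {}
for tag in TAGS:
    if tag == "A0":
        t1[(0, tag)] = 1
        t1[(1, tag)] = 1
    else:
        t1[(0, tag)] = 0
        t1[(1, tag)] = 2
        t1[(2, tag)] = 2
no_duplicate_a0 = make_dfa(0, {0, 1, 2}, t1)

# Constraint 2: only O, A0, A1, A2 are legal (1 state, no A3 transition).
legal_args = make_dfa(0, {0}, {(0, tag): 0 for tag in TAGS if tag != "A3"})

both = intersect(no_duplicate_a0, legal_args)
product_states = {both["start"]} | set(both["transitions"].values())
```

Here the product stays small (three reachable states) because the second automaton has a single state; with larger automata the state count multiplies, which motivates intersecting only when needed.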
slide-33
SLIDE 33

33

Active Set Algorithm

Input: John kissed Mary

slide-34
SLIDE 34

34

Active Set Algorithm

Constraints: Balanced Parentheses, Non-Empty Phrases, Correct Num. Pre-Terminals

Input: John kissed Mary

slide-35
SLIDE 35

35

Active Set Algorithm

Constraints: Balanced Parentheses, Non-Empty Phrases, Correct Num. Pre-Terminals

Input: John kissed Mary

Table columns: Constraint Automaton, Inference Output, Violated Constraints

slide-36
SLIDE 36

36

Active Set Algorithm

Constraints: Balanced Parentheses, Non-Empty Phrases, Correct Num. Pre-Terminals

Input: John kissed Mary

Step 1
– Constraint Automaton: n/a
– Inference Output: (S (VP XX XX (NP XX XX ) ) )
– Violated Constraints: Correct Num. Pre-Terminals

slide-37
SLIDE 37

37

Active Set Algorithm

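Constraints: Balanced Parentheses, Non-Empty Phrases, Correct Num. Pre-Terminals

Input: John kissed Mary

Step 1
– Constraint Automaton: n/a
– Inference Output: (S (VP XX XX (NP XX XX ) ) )
– Violated Constraints: Correct Num. Pre-Terminals

Step 2
– Constraint Automaton: Correct Num. Pre-Terminals
– Inference Output: (S (NP XX ) (VP XX (NP XX ) ) ) )
– Violated Constraints: Balanced Parentheses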

slide-38
SLIDE 38

38

Active Set Algorithm

Constraints: Balanced Parentheses, Non-Empty Phrases, Correct Num. Pre-Terminals

Input: John kissed Mary

Step 1
– Constraint Automaton: n/a
– Inference Output: (S (VP XX XX (NP XX XX ) ) )
– Violated Constraints: Correct Num. Pre-Terminals

Step 2
– Constraint Automaton: Correct Num. Pre-Terminals
– Inference Output: (S (NP XX ) (VP XX (NP XX ) ) ) )
– Violated Constraints: Balanced Parentheses

Step 3
– Constraint Automaton: Correct Num. Pre-Terminals ∩ Balanced Parentheses
– Inference Output: (S (NP XX ) (VP XX (NP XX ) ) )
– Violated Constraints: None
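The walkthrough on this slide can be sketched as a loop that starts with no enforced constraints and adds violated ones until the output is valid. This is a simplified illustration, not the authors' code: constraints are plain Python predicates rather than automata, "constrained inference" is stood in for by picking the best-scoring candidate from a fixed list, the scores are made up, and adding every violated constraint at once is an assumption.

```python
def balanced(tokens, num_words):
    depth = 0
    for i, tok in enumerate(tokens):
        if tok.startswith("("):
            depth += 1
        elif tok == ")":
            depth -= 1
        # Reject a negative depth or a tree that closes before the last token.
        if depth < 0 or (depth == 0 and i != len(tokens) - 1):
            return False
    return depth == 0


def non_empty(tokens, num_words):
    # An opening bracket must not be followed directly by a closing one.
    return all(not (a.startswith("(") and b == ")")
               for a, b in zip(tokens, tokens[1:]))


def num_preterminals(tokens, num_words):
    return tokens.count("XX") == num_words


def constrained_inference(candidates, active, num_words):
    # Stand-in for automaton-guided beam search (assumption): best-scoring
    # candidate that satisfies every constraint in the active set.
    valid = [(score, parse) for score, parse in candidates
             if all(c(parse.split(), num_words) for c in active)]
    return max(valid)[1] if valid else None


def active_set_inference(candidates, constraints, num_words):
    active = []
    while True:
        output = constrained_inference(candidates, active, num_words)
        if output is None:
            return None, active
        violated = [c for c in constraints
                    if c not in active and not c(output.split(), num_words)]
        if not violated:
            return output, active
        active.extend(violated)  # enforce only the constraints that were violated


# Hypothetical scores for the three outputs shown on the slide.
candidates = [
    (0.5, "(S (VP XX XX (NP XX XX ) ) )"),
    (0.3, "(S (NP XX ) (VP XX (NP XX ) ) ) )"),
    (0.2, "(S (NP XX ) (VP XX (NP XX ) ) )"),
]
constraints = [balanced, non_empty, num_preterminals]
output, active = active_set_inference(candidates, constraints, num_words=3)
```

With these inputs the loop reproduces the three steps above: it ends with the gold parse after enforcing only two of the three constraints, and Non-Empty Phrases never needs to be added.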

slide-39
SLIDE 39

39

Generality of Algorithm

  • Abstract over constraint-specific inference algorithms:

– BIO tagging
– Shift-reduce parsing
– Require/disallow n-grams
– Constraining verbalizations in text-to-speech (Zhang et al., 2019)
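As one example of a constraint-specific rule that fits the automaton view, BIO tagging validity can be written as an automaton whose state is simply the previous tag. The sketch below is illustrative; the function names and the PER/LOC labels are assumptions, not taken from the talk.

```python
def bio_allowed(prev_tag, tag):
    """Return True if `tag` may follow `prev_tag` in a BIO sequence."""
    if tag.startswith("I-"):
        label = tag[2:]
        # I-X may only continue a span of the same type X.
        return prev_tag in ("B-" + label, "I-" + label)
    return True  # "O" and "B-X" may follow any tag


def is_valid_bio(tags):
    prev = "O"  # virtual start state: a sequence cannot begin with I-X
    for tag in tags:
        if not bio_allowed(prev, tag):
            return False
        prev = tag
    return True
```

During constrained decoding, `bio_allowed` would play the role of the transition check: any tag for which it returns False is removed from the candidates at that step.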

slide-40
SLIDE 40

40

Constituency Parsing Experiments

  • Seq2Seq constituency parsing on the Penn Treebank

John kissed Mary

(S (NP XX (NP XX …

slide-41
SLIDE 41

41

Necessity of Constraints

  • Do we need constraints at all?
  • Enforce three different constraints

– Balanced parentheses
– Correct number of pre-terminals
– No empty phrases

slide-42
SLIDE 42

42

Necessity of Constraints

[Figure, built up over slides 42 to 46: Percent Satisfied (y-axis) vs. Percentage of Training Data (x-axis)]

Correct number of pre-terminals isn’t learned with full Penn Treebank

slide-47
SLIDE 47

47

Inference Algorithm Comparison

  • Compare how well different algorithms satisfy constraints

– Unconstrained
– Constrained
– Post-Hoc

slide-48
SLIDE 48

48

Post-Hoc Inference

  • Run unconstrained beam search with a beam of size k
  • Return highest-scoring valid output

Beam candidates:

(S (NP XX ) (VP XX (NP XX ) ) )
(S (NP ) (VP XX XX (NP XX ) ) )
(S (VP XX (NP XX ) ) )
(S (NP XX ) (VP XX (NP XX ) ) ) )

Scores: 0.32 0.28 0.25 0.10
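Post-hoc inference as described in the two bullets can be sketched as a filter over a finished beam. The validity check below combines the three parsing constraints from the talk; the function names are hypothetical, and the pairing of scores with candidates is an assumption made for illustration, since the slide text does not say which score belongs to which parse.

```python
def is_valid_parse(tokens, num_words):
    depth = 0
    for i, tok in enumerate(tokens):
        if tok.startswith("("):
            depth += 1
            if i + 1 < len(tokens) and tokens[i + 1] == ")":
                return False  # empty phrase
        elif tok == ")":
            depth -= 1
            if depth < 0 or (depth == 0 and i != len(tokens) - 1):
                return False  # unbalanced parentheses
    # Balanced overall, and one pre-terminal (XX) per input word.
    return depth == 0 and tokens.count("XX") == num_words


def post_hoc(beam, num_words):
    """beam: list of (score, parse string) from unconstrained beam search."""
    for score, parse in sorted(beam, reverse=True):
        if is_valid_parse(parse.split(), num_words):
            return parse  # highest-scoring valid output
    return None  # no valid output anywhere in the beam


# Assumed pairing of the slide's scores and candidates, for illustration only.
beam = [
    (0.32, "(S (NP ) (VP XX XX (NP XX ) ) )"),
    (0.28, "(S (VP XX (NP XX ) ) )"),
    (0.25, "(S (NP XX ) (VP XX (NP XX ) ) )"),
    (0.10, "(S (NP XX ) (VP XX (NP XX ) ) ) )"),
]
best = post_hoc(beam, num_words=3)
```

The `None` return path shows the weakness of this approach: if every candidate in a beam of size k is invalid, post-hoc filtering has nothing to return.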

slide-49
SLIDE 49

49

Parsing Learning Curve

[Figure, built up over slides 49 to 52: Percent Satisfied (y-axis) vs. Percentage of Training Data (x-axis)]

F1 details available in the paper

slide-53
SLIDE 53

53

Semantic Role Labeling Experiments

  • Model of He et al. (2017) on CoNLL 2005
  • Assume gold predicates

Input: Alice Smith gave a flower to Bob Jones

Tags: A0 A0 O O A1 O A2 A2

slide-54
SLIDE 54

54

Semantic Role Labeling Learning Curve

  • Impose constraints at varying levels of supervision

– Disallow duplicate arguments
– Disallow invalid arguments (via Propbank)
– Require spans to have the same label
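The three SRL constraints can be written as simple checks over a tag sequence, shown here on the examples from the earlier slides. This is a hedged sketch: the function names are hypothetical, and treating consecutive identical tags as one argument span is an assumption (two adjacent spans with the same label would be merged under it).

```python
def spans_of(tags):
    """Group consecutive identical non-O tags into (label, start, end) spans."""
    spans, start = [], 0
    for i in range(1, len(tags) + 1):
        if i == len(tags) or tags[i] != tags[start]:
            if tags[start] != "O":
                spans.append((tags[start], start, i))
            start = i
    return spans


def no_duplicates(tags):
    # Each argument label may label at most one span.
    labels = [label for label, _, _ in spans_of(tags)]
    return len(labels) == len(set(labels))


def legal_arguments(tags, legal):
    # Every non-O tag must be licensed for the predicate.
    return all(tag == "O" or tag in legal for tag in tags)


def spans_respected(tags, spans):
    """spans: (start, end) token ranges whose tokens must share one label."""
    return all(len(set(tags[s:e])) == 1 for s, e in spans)


legal = {"A0", "A1", "A2"}   # legal arguments for "gave"
spans = [(0, 2), (6, 8)]     # [Bob Smith] ... [Alice Jones]
gold = "A0 A0 O O A1 O A2 A2".split()
duplicate = "A0 A0 O O A0 O A2 A2".split()
illegal = "A3 A3 O O A1 O A2 A2".split()
broken_span = "A0 O O O A1 O A2 A2".split()
```

The gold tags pass all three checks, and each invalid example fails exactly the check named on the slide.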

slide-55
SLIDE 55

55

Semantic Role Labeling Learning Curve

[Figure, built up over slides 55 to 59: CoNLL F1 (y-axis) vs. Percentage of Training Data (x-axis)]

Constraints consistently help at all levels of supervision

Constraints help most in low-supervision settings

slide-60
SLIDE 60

60

Active Set Size & Speedup

  • Measured the number of constraints added to the working set until a valid output is found
  • Calculated speed of inference relative to intersecting all constraints and then running inference

slide-61
SLIDE 61

61

Active Set Size & Speedup

[Figure, left panel: "Size Distribution of" vs. Percentage of Training Data]

[Figure, right panel: speed-up vs. Percentage of Training Data; labeled values: 3.5x, 4x, 5x, 5.5x, 5.2x]

Fewer constraints are necessary to impose at higher levels of supervision, which results in larger speedups

slide-62
SLIDE 62

62

Conclusion

  • Presented a general-purpose algorithm to enforce constraints in sequential inference
  • Demonstrated enforcing constraints is necessary and beneficial at all levels of supervision
  • Showed the active set method for imposing multiple constraints results in faster inference

slide-63
SLIDE 63

63

Thank you!

https://github.com/CogComp/gcd

slide-64
SLIDE 64

64

References

  • Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In International Conference on Implementation and Application of Automata.
  • Kyle Gorman. 2016. Pynini: A Python Library for Weighted Finite-state Grammar Compilation. In Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata.
  • Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep Semantic Role Labeling: What works and What’s Next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
  • Hao Zhang, Richard Sproat, Alex H. Ng, Felix Stahlberg, Xiaochang Peng, Kyle Gorman, and Brian Roark. 2019. Neural Models of Text Normalization for Speech Applications. In Computational Linguistics.
  • Muhua Zhu, Yue Zhang, Wenliang Chen, Min Zhang, and Jingbo Zhu. 2013. Fast and accurate shift-reduce constituency parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.