

SLIDE 1

Transition-based Dependency Parsing with Selectional Branching

Presented at the 4th workshop on Statistical Parsing in Morphologically Rich Languages October 18th, 2013 Jinho D. Choi University of Massachusetts Amherst

SLIDE 2

Greedy vs. Non-greedy Parsing

  • Greedy parsing
  • Considers only one head for each token.
  • Generates one parse tree per sentence.
  • e.g., transition-based parsing (2 ms / sentence).

  • Non-greedy parsing
  • Considers multiple heads for each token.
  • Generates multiple parse trees per sentence.
  • e.g., transition-based parsing with beam search, graph-based parsing, linear programming, dual decomposition (≥ 93%).


SLIDE 3

Motivation

  • How often do we need non-greedy parsing?
  • Our greedy parser performs as accurately as our non-greedy parser about 64% of the time.
  • The gap narrows further when the parsers are evaluated on non-benchmark data (e.g., tweets, chats, blogs).

  • Many applications are time-sensitive.
  • Some applications need at least one complete parse tree ready within a limited time period (e.g., search, dialog, Q/A).

  • Hard sentences are hard for any parser!
  • Considering more heads does not always guarantee more accurate parse results.


SLIDE 4

Transition-based Parsing

  • Transition-based dependency parsing (greedy)
  • Considers one transition for each parsing state.


[Diagram: a greedy parser follows a single sequence of transitions t1 … tL, picking one transition t′ per state S until the terminal state T.]

What if t′ is not the correct transition?
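As a concrete (and simplified) illustration, a greedy transition-based parser can be sketched as below. This uses the classic arc-standard system, not the talk's hybrid arc-eager/list-based algorithm, and `score` is a stand-in for a trained classifier:

```python
def legal_actions(stack, buffer):
    """Transitions permitted in the current parsing state."""
    acts = []
    if buffer:
        acts.append("SHIFT")
    if len(stack) > 2:                                   # root cannot be a dependent
        acts.append("LEFT-ARC")
    if len(stack) > 2 or (len(stack) == 2 and not buffer):
        acts.append("RIGHT-ARC")
    return acts

def greedy_parse(n_words, score):
    """Return {dependent: head} using exactly one transition per state."""
    stack, buffer = [0], list(range(1, n_words + 1))     # 0 = artificial root
    heads = {}
    while buffer or len(stack) > 1:
        actions = legal_actions(stack, buffer)
        act = max(actions, key=lambda a: score(stack, buffer, a))  # one choice, no backtracking
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":                          # stack[-2] becomes dependent of stack[-1]
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                                            # RIGHT-ARC: stack[-1] becomes dependent of stack[-2]
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads
```

Because each state commits to a single transition, parsing takes O(n) transitions per sentence, which is where the ~2 ms/sentence speed comes from; the cost is that one wrong choice of t′ cannot be undone.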

SLIDE 5

Transition-based Parsing

  • Transition-based dependency parsing with beam search
  • Considers the b highest-scoring transitions for each parsing state.


[Diagram: beam search expands b parallel transition sequences t′1 … t′b from states S1 … Sb, reaching terminal states T1 … Tb.]
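A minimal beam-search sketch, assuming the model assigns an additive score to each transition; `expand` and `is_final` are hypothetical stand-ins for the transition system:

```python
import heapq

def beam_parse(init_state, expand, is_final, b):
    """Keep the b highest-scoring partial sequences.

    expand(state) -> [(next_state, transition_score), ...]
    """
    beam = [(0.0, init_state)]                       # (cumulative score, state)
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))    # finished sequences stay in the beam
            else:
                for nxt, sc in expand(state):
                    candidates.append((total + sc, nxt))
        beam = heapq.nlargest(b, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]          # best complete state
```

Note that the beam size b is fixed for every sentence, which is exactly the issue selectional branching addresses on the next slides.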

SLIDE 6

Selectional Branching

  • Issues with beam search
  • Generates a fixed number of parse trees no matter how easy or hard the input sentence is.
  • Is it possible to dynamically adjust the beam size for each individual sentence?

  • Selectional branching
  • The one-best transition sequence is found by a greedy parser.
  • Collects the k-best state-transition pairs for each low-confidence transition used to generate the one-best sequence.
  • Generates transition sequences from the b−1 highest-scoring state-transition pairs in the collection.

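The procedure above can be sketched as follows; this is a schematic Python sketch with hypothetical `step` and `k_best` helpers, and it assumes states are immutable so saved states can be re-expanded later:

```python
def selectional_branching(init_state, step, k_best, is_final, b):
    """k_best(state) -> [(transition, score), ...], best first, within the margin."""
    # 1. One-best greedy pass, saving alternatives at low-confidence decisions.
    state, lam = init_state, []
    while not is_final(state):
        ranked = k_best(state)
        if len(ranked) > 1:                     # low confidence: margin admits > 1 transition
            for trans, sc in ranked[1:]:
                lam.append((sc, state, trans))  # remember (state, alternative) in λ
        state = step(state, ranked[0][0])
    trees = [state]
    # 2. Branch from the b-1 highest-scoring saved (state, transition) pairs.
    lam.sort(key=lambda x: x[0], reverse=True)
    for _, saved, trans in lam[:b - 1]:
        s = step(saved, trans)
        while not is_final(s):
            s = step(s, k_best(s)[0][0])        # continue each branch greedily
        trees.append(s)
    return trees
```

An easy sentence with no low-confidence decisions leaves λ empty and costs exactly one greedy pass, while a hard sentence branches up to b−1 extra times; this is how the effective beam size adapts per sentence.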

SLIDE 7

Selectional Branching


[Diagram: the one-best sequence visits states S1 → S2 → … → Sn; at each low-confidence transition, the k−1 next-best (state, transition) pairs, e.g. (S1, t′12) … (S1, t′1k), are stored in the list λ.]

Pick the b−1 pairs with the highest scores. For our experiments, k = 2 is used.

SLIDE 8

Selectional Branching

[Diagram: each stored pair (Si, t′i2) starts a new sequence Ti that reuses the one-best states up to Si and then parses on to its own terminal state.]

Carries on parsing states from the one-best sequence. Guarantees to generate fewer trees than beam search when |λ| ≤ b.

SLIDE 9

Low Confidence Transition

  • Let C1 be a classifier that finds the highest scoring transition given the parsing state x:

    C1(x) = argmax_{y ∈ Y} f(x, y), where f(x, y) = exp(w · Φ(x, y)) / Σ_{y′ ∈ Y} exp(w · Φ(x, y′))

  • Let Ck be a classifier that finds the k highest scoring transitions given the parsing state x and the margin m:

    Ck(x, m) = K-argmax_{y ∈ Y} f(x, y)  s.t.  f(x, C1(x)) − f(x, y) ≤ m

  • The highest scoring transition C1(x) is low-confidence if |Ck(x, m)| > 1.
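In code, the low-confidence test can be sketched like this (illustrative; `scores` maps each transition y to the raw score w · Φ(x, y)):

```python
import math

def softmax(scores):
    """f(x, y) over all transitions y, computed from raw scores w · Φ(x, y)."""
    mx = max(scores.values())                            # subtract max for numerical stability
    exps = {y: math.exp(s - mx) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def low_confidence(scores, margin):
    """True iff |Ck(x, m)| > 1: more than one transition within `margin` of the best."""
    f = softmax(scores)
    best = max(f.values())                               # f(x, C1(x))
    within = [y for y, p in f.items() if best - p <= margin]
    return len(within) > 1
```

With k = 2, a decision is low-confidence exactly when the runner-up transition's probability falls within the margin m of the winner's.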

SLIDE 10

Experiments

  • Parsing algorithm (Choi & McCallum, 2013)
  • Hybrid between Nivre's arc-eager and list-based algorithms.
  • Projective parsing: O(n).
  • Non-projective parsing: expected linear time.

  • Features
  • Rich non-local features from Zhang & Nivre, 2011.
  • For languages with coarse-grained POS tags, feature templates using fine-grained POS tags are replicated.
  • For languages with morphological features, morphologies of σ[0] and β[0] are used as unigram features.


SLIDE 11

Number of Transitions

  • # of transitions performed with respect to beam sizes.


[Plot: number of transitions performed (roughly 200,000 to 1,200,000) for beam sizes 1, 2, 4, 8, 16, 32, 64, and 80.]

SLIDE 12

Projective Parsing

  • The benchmark setup using WSJ.

Approach          UAS    LAS    Time (sec/sent)
bt = 80, bd = 80  92.96  91.93  0.009
bt = 80, bd = 64  92.96  91.93  0.009
bt = 80, bd = 32  92.96  91.94  0.009
bt = 80, bd = 16  92.96  91.94  0.008
bt = 80, bd = 8   92.89  91.87  0.006
bt = 80, bd = 4   92.76  91.76  0.004
bt = 80, bd = 2   92.56  91.54  0.003
bt = 80, bd = 1   92.26  91.25  0.002
bt = 1,  bd = 1   92.06  91.05  0.002

SLIDE 13

Projective Parsing

  • The benchmark setup using WSJ.

Approach                  UAS    LAS    Time
bt = 80, bd = 80          92.96  91.93  0.009
Zhang & Clark, 2008       92.1   –      –
Huang & Sagae, 2010       92.1   –      0.04
Zhang & Nivre, 2011       92.9   91.8   0.03
Bohnet & Nivre, 2012      93.38  92.44  0.4
McDonald et al., 2005     90.9   –      –
McDonald & Pereira, 2006  91.5   –      –
Sagae & Lavie, 2006       92.7   –      –
Koo & Collins, 2010       93.04  –      –
Zhang & McDonald, 2012    93.06  91.86  –
Martins et al., 2010      93.26  –      –
Rush et al., 2010         93.8   –      –

SLIDE 14

Non-projective Parsing

  • CoNLL-X shared task data

Approach                   Danish        Dutch         Slovene       Swedish
                           LAS    UAS    LAS    UAS    LAS    UAS    LAS    UAS
bt = 80, bd = 80           87.27  91.36  82.45  85.33  77.46  84.65  86.8   91.36
bt = 80, bd = 1            86.75  91.04  80.75  83.59  75.66  83.29  86.32  91.12
Nivre et al., 2006         84.77  89.8   78.59  81.35  70.3   78.72  84.58  89.5
McDonald et al., 2006      84.79  90.58  79.19  83.57  73.44  83.17  82.55  88.93
Nivre, 2009                84.2   –      –      –      75.2   –      –      –
F.-Gonz. & G.-Rodr., 2012  85.17  90.1   –      –      –      –      83.55  89.3
Nivre & McDonald, 2008     86.67  –      81.63  –      75.94  –      84.66  –
Martins et al., 2010       –      91.5   –      84.91  –      85.53  –      89.8
SLIDE 15

SPMRL 2013 Shared Task

  • Baseline results provided by ClearNLP.

Language   5K                      Full
           LAS    UAS    LS        LAS    UAS    LS
Arabic     81.72  84.46  93.41     84.19  86.48  94.43
Basque     78.01  84.62  82.71     79.16  85.32  83.63
French     73.39  85.3   81.42     74.51  86.41  82
German     82.58  85.36  90.49     86.73  88.8   92.95
Hebrew     75.09  81.74  82.84     –      –      –
Hungarian  81.98  86.09  88.26     82.68  86.56  88.8
Korean     76.28  80.39  87.32     83.55  86.82  92.39
Polish     80.64  88.49  86.47     81.12  89.24  86.59
Swedish    80.96  86.48  85.1      –      –      –

SLIDE 16

Conclusion

  • Selectional branching
  • Uses confidence estimates to decide when to employ a beam.
  • Shows accuracy comparable to traditional beam search.
  • Runs faster than any other non-greedy parsing approach.

  • ClearNLP
  • Provides several NLP tools, including a morphological analyzer, dependency parser, semantic role labeler, etc.
  • Webpage: clearnlp.com.
