SLIDE 1

A Fast and Accurate Dependency Parser using Neural Networks

Danqi Chen & Christopher D. Manning

Qiming Chen qc2195

  • Apr. 8, 2015
SLIDES 2-4

Dependency Parsing

  • Parsing: He has good control .
  • Output: a dependency tree over the sentence, e.g. nsubj(has, He), amod(control, good), dobj(has, control)
  • Goal: accurate and fast parsing
SLIDES 5-7

Transition-based Parsing

  • A configuration = a stack, a buffer, and a set of dependency arcs
  • The arc-standard system is employed
SLIDE 8

LEFT-ARC(l): add an arc s1 → s2 with label l and remove s2 from the stack

SLIDE 9

RIGHT-ARC(l): add an arc s2 → s1 with label l and remove s1 from the stack

SLIDE 10

SHIFT: move b1 from the buffer onto the stack (a code sketch of all three transitions follows)
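
Putting the three transitions together, here is a minimal Python sketch of the arc-standard system; the class and method names are illustrative, not the authors' implementation:

```python
# Minimal sketch of an arc-standard configuration and its three transitions.
class Configuration:
    def __init__(self, n_words):
        self.stack = [0]                          # word indices; 0 is the ROOT token
        self.buffer = list(range(1, n_words + 1)) # words not yet processed
        self.arcs = []                            # (head, dependent, label) triples

    def shift(self):                              # SHIFT: move b1 onto the stack
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, label):                    # LEFT-ARC(l): s1 -> s2, pop s2
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.append((s1, s2, label))
        del self.stack[-2]

    def right_arc(self, label):                   # RIGHT-ARC(l): s2 -> s1, pop s1
        s1, s2 = self.stack[-1], self.stack[-2]
        self.arcs.append((s2, s1, label))
        self.stack.pop()

    def is_terminal(self):                        # buffer empty, only ROOT remains
        return not self.buffer and len(self.stack) == 1
```

A sentence of n words is parsed in exactly 2n transitions (n SHIFTs plus n arc actions), which is what makes greedy transition-based parsing fast.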

SLIDES 11-12

Traditional Features

  • Sparse!
  • Incomplete
  • Computationally expensive
SLIDE 13

Neural Networks!

  • Learn a dense and compact feature representation
  • to encode all the available information
  • to model high-order features
SLIDE 14

Dense Feature Representation

  • Represent each word as a d-dimensional dense vector.
  • Meanwhile, part-of-speech tags and dependency labels are also represented as d-dimensional vectors.
  • E.g., NNS (plural noun) should be close to NN (singular noun).
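
As a minimal illustration of what "close" means here, assuming a hypothetical POS-tag embedding table (random below; in the trained model these are learned parameters):

```python
import numpy as np

d = 50                                    # embedding size used in the paper
# Hypothetical POS-tag embedding table (random here; learned during training).
pos_index = {"NN": 0, "NNS": 1, "VBD": 2}
E_pos = np.random.randn(len(pos_index), d)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# After training we would expect NN and NNS to end up with nearby vectors:
print(cosine(E_pos[pos_index["NN"]], E_pos[pos_index["NNS"]]))
```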

SLIDES 15-17

Extracting Tokens from Configuration

  • We extract a set of 18 tokens based on their positions in the stack and buffer (see the sketch below)
  • And look up their word, POS tag, and dependency label
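
The paper's template can be sketched as follows, reusing the Configuration sketch above; the helper names (s, b, lc, rc) are illustrative:

```python
# Sketch of the paper's 18-token template: top 3 of stack and buffer, the two
# leftmost/rightmost children of the top two stack words, and the
# leftmost-of-leftmost / rightmost-of-rightmost grandchildren of those words.
NULL = -1                                       # padding id for missing positions

def feature_tokens(config):
    def s(i): return config.stack[-i] if len(config.stack) >= i else NULL
    def b(i): return config.buffer[i - 1] if len(config.buffer) >= i else NULL
    def lc(h, k=1):                             # k-th leftmost child of h
        kids = sorted(d for hd, d, _ in config.arcs if hd == h and d < h)
        return kids[k - 1] if h != NULL and len(kids) >= k else NULL
    def rc(h, k=1):                             # k-th rightmost child of h
        kids = sorted((d for hd, d, _ in config.arcs if hd == h and d > h),
                      reverse=True)
        return kids[k - 1] if h != NULL and len(kids) >= k else NULL

    tokens = [s(1), s(2), s(3), b(1), b(2), b(3)]
    for i in (1, 2):
        tokens += [lc(s(i)), rc(s(i)), lc(s(i), 2), rc(s(i), 2)]
        tokens += [lc(lc(s(i))), rc(rc(s(i)))]
    return tokens                               # 18 positions in total
```

Each of the 18 positions contributes a word embedding and a POS-tag embedding; the 12 child positions also contribute an arc-label embedding, for 48 embedding lookups in total.
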
SLIDES 18-21

Model Architecture

  • Input layer: concatenated word, POS, and label embeddings
  • Hidden layer with cube activation function: g(x) = x^3
  • Softmax output layer over the parser transitions
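
A NumPy sketch of this forward pass, with illustrative shapes (the label count of 40, and hence the number of transitions, is an assumption for illustration):

```python
import numpy as np

d, hidden = 50, 200                        # sizes reported in the paper
n_features = 48                            # 18 words + 18 POS tags + 12 arc labels
n_labels = 40                              # illustrative label count
n_transitions = 2 * n_labels + 1           # LEFT-ARC(l), RIGHT-ARC(l), SHIFT

x = np.random.randn(n_features * d)        # concatenated embedding lookups
W1 = np.random.randn(hidden, n_features * d)
b1 = np.zeros(hidden)
W2 = np.random.randn(n_transitions, hidden)

h = (W1 @ x + b1) ** 3                     # cube activation: g(x) = x^3
scores = W2 @ h
p = np.exp(scores - scores.max())          # numerically stable softmax
p /= p.sum()                               # probability over all transitions
```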

SLIDE 22

Cube Activation Function

  • g(x) = x^3 lets the hidden layer capture products of any three input elements, i.e. high-order combinations of word, POS, and label features

SLIDE 23

Training

  • Data from the Penn Treebank (Wall Street Journal)
  • Training examples are generated using an oracle
  • Training objective: cross-entropy loss (see the sketch below)
  • Back-propagation trains all embeddings (word, POS, dep)
  • Word embeddings are initialized from pre-trained word vectors
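
A sketch of the per-example loss; `p` is the softmax output from the forward pass above and `gold` the oracle's transition (the paper's full objective sums this over examples and adds an L2 penalty on all parameters, including the embeddings):

```python
import numpy as np

def cross_entropy(p, gold):
    """Loss for one configuration: p is the softmax output over transitions,
    gold is the index of the transition chosen by the oracle."""
    return -np.log(p[gold])
```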

SLIDE 24

Parsing Speed-up

  • Hidden-layer products for frequent words, POS tags, and dep labels can be pre-computed and cached for speed-up (see the sketch below)
  • 8 to 10 times faster
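
A sketch of the caching trick under illustrative sizes (48 feature slots, 10,000 cached tokens); only frequent tokens are cached because the table grows with vocabulary size:

```python
import numpy as np

d, hidden, n_slots, n_frequent = 50, 200, 48, 10000   # illustrative sizes
E = np.random.randn(n_frequent, d)         # embeddings of the cached tokens
W1 = np.random.randn(hidden, n_slots * d)
b1 = np.zeros(hidden)

# For each feature slot, cache the product of that slot's W1 columns with
# every frequent embedding: cache[s, t] = W1[:, s*d:(s+1)*d] @ E[t].
cache = np.stack([E @ W1[:, s * d:(s + 1) * d].T for s in range(n_slots)])

def hidden_layer(token_ids):               # one cached-token id per feature slot
    pre = b1 + sum(cache[s, t] for s, t in enumerate(token_ids))
    return pre ** 3                        # cube activation
```
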
SLIDE 25

Indicator vs. Dense Features

  • Sparse?
  • Incomplete?
  • Computationally expensive?
SLIDE 26

Experimental Details

  • Embedding size = 50
  • Hidden size = 200
  • 0.5 dropout on hidden layer
  • A rich set of 18 tokens from the configuration
  • Pre-trained word embeddings:
  • C&W (Collobert & Weston) for English
  • word2vec for Chinese
SLIDE 30

Cube Activation Function

SLIDE 31

Pre-trained Word Vectors

SLIDE 32

POS Embeddings

SLIDE 33

Dependency Embeddings

SLIDE 34

Summary

  • Transition-based parser using NNs
  • State-of-the-art accuracy and speed
  • Introduced POS / dep. embeddings and the cube activation function

SLIDE 35

Future Work

  • Richer features (lemma, morphology, distance, etc.)
  • Beam search
  • Dynamic oracle