Introduction to dependency parsing (Marco Kuhlmann)




slide-1
SLIDE 1

This work is licensed under a Creative Commons Attribution 4.0 International License.

Introduction to dependency parsing

Marco Kuhlmann, Department of Computer and Information Science, Linköping University

Deep Learning for Natural Language Processing

slide-2
SLIDE 2

Dependency parsing

  • Syntactic parsing is the task of mapping a sentence to a formal representation of its syntactic structure.
  • We focus on representations in the form of dependency trees.
  • A syntactic dependency is an asymmetric relation between a head and a dependent.

[Figure: the sentence "Koller co-founded Coursera" with two dependency arcs, labelled subject and object.]
slide-3
SLIDE 3
slide-4
SLIDE 4

Dependency trees

  • A dependency tree for a sentence 𝑦 is a digraph 𝐻 = (𝑊, 𝐵) where 𝑊 = {1, …, |𝑦|} and where there exists a vertex 𝑠 ∈ 𝑊 such that every 𝑤 ∈ 𝑊 is reachable from 𝑠 via exactly one directed path.
  • The vertex 𝑠 is called the root of 𝐻.
  • The arcs of a dependency tree may be labelled to indicate the type of the syntactic relation that holds between the two elements.

Universal Dependencies v2 uses 37 universal syntactic relations.
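
The definition above can be checked mechanically. The following Python sketch is an illustration rather than part of the original slides; the function name `is_dependency_tree` and the representation of arcs as (head, dependent) pairs are assumptions made here. It relies on an equivalent characterisation: exactly one vertex has no incoming arc, every other vertex has exactly one, and every vertex is reachable from that root.

```python
from collections import defaultdict


def is_dependency_tree(n, arcs):
    """Check whether `arcs` forms a dependency tree over the vertices 1..n.

    `arcs` is a set of (head, dependent) pairs (hypothetical representation).
    """
    if n < 1:
        return False
    heads = defaultdict(list)      # dependent -> list of its heads
    children = defaultdict(list)   # head -> list of its dependents
    for head, dependent in arcs:
        if not (1 <= head <= n and 1 <= dependent <= n):
            return False
        heads[dependent].append(head)
        children[head].append(dependent)

    # Exactly one vertex without incoming arc: the root.
    roots = [v for v in range(1, n + 1) if not heads[v]]
    if len(roots) != 1:
        return False
    # Every vertex has at most one head, so paths from the root are unique.
    if any(len(heads[v]) > 1 for v in range(1, n + 1)):
        return False

    # Every vertex must be reachable from the root.
    seen = set()
    todo = [roots[0]]
    while todo:
        v = todo.pop()
        seen.add(v)
        todo.extend(children[v])
    return len(seen) == n


# "Koller co-founded Coursera": word 2 heads words 1 and 3.
print(is_dependency_tree(3, {(2, 1), (2, 3)}))  # True
```

A cycle that is disconnected from the root, such as the arcs {(1, 2), (2, 1)} over three vertices, fails the reachability check.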

slide-5
SLIDE 5

Two parsing paradigms

  • Graph-based dependency parsing

Cast parsing as a combinatorial optimisation problem over a (possibly restricted) set of dependency trees.

  • Transition-based dependency parsing

Cast parsing as a sequence of local classification problems: at each point in time, predict one of several parser actions.

slide-6
SLIDE 6

Graph-based dependency parsing

  • Given a sentence 𝑦 and a set 𝑍(𝑦) of candidate dependency trees for 𝑦, we want to find a highest-scoring tree 𝑧̂ ∈ 𝑍(𝑦):

𝑧̂ = argmax_{𝑧 ∈ 𝑍(𝑦)} score(𝑦, 𝑧)

  • The computational complexity of this problem depends on the choice of the set 𝑍(𝑦) and the scoring function.
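
To make the optimisation problem concrete, here is a minimal Python sketch that finds a highest-scoring tree by exhaustive search. It is an illustration under stated assumptions, not the method used in practice: the names `candidate_trees`, `reaches_root` and `best_tree` are hypothetical, the candidate set is taken to be all dependency trees over the sentence, and a tree is encoded as a tuple of heads with 0 marking the root. The number of candidates is n^(n-1) for a sentence of length n, so this only works for very short sentences.

```python
from itertools import product


def reaches_root(heads, v):
    """Follow head links from vertex v; True if they end at the root marker 0."""
    seen = set()
    while v != 0:
        if v in seen:          # cycle: v never reaches the root
            return False
        seen.add(v)
        v = heads[v - 1]
    return True


def candidate_trees(n):
    """Yield every dependency tree over vertices 1..n.

    A tree is a tuple `heads` where heads[i - 1] is the head of vertex i
    and 0 marks the root (hypothetical encoding).
    """
    for heads in product(range(n + 1), repeat=n):
        if heads.count(0) != 1:
            continue
        if all(reaches_root(heads, v) for v in range(1, n + 1)):
            yield heads


def best_tree(n, score):
    """Return a highest-scoring candidate tree under the given scoring function."""
    return max(candidate_trees(n), key=score)


# Toy scoring function: count how many heads agree with a reference tree.
reference = (2, 0, 2)
print(best_tree(3, lambda tree: sum(a == b for a, b in zip(tree, reference))))
```

Any scoring function that maps a head tuple to a number can be passed as `score`; restricting the candidate set would mean filtering inside `candidate_trees`.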
slide-7
SLIDE 7

The arc-factored model

  • Under the arc-factored model, the score of a dependency tree is expressed as the sum of the scores of its arcs:

score(𝑦, 𝑧) = Σ_{(ℎ → 𝑑) ∈ 𝑧} score(𝑦, ℎ → 𝑑)

  • The score of a single arc can be computed by means of a neural network that receives the head and the dependent as input. For example, a simple linear layer:

score(𝑦, ℎ → 𝑑) = [𝒉 ; 𝒅] · 𝒘 + 𝑏

Here ℎ → 𝑑 is a head–dependent arc, [𝒉 ; 𝒅] is the concatenation of the vector representations of the head and the dependent, 𝒘 is a weight vector and 𝑏 is a bias.
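
The arc-factored score with a linear layer can be sketched in a few lines of plain Python. This is an illustrative sketch, not the slides' implementation: the function names `arc_score` and `tree_score`, the parameter names `w` and `b` for the weight vector and the bias, and the encoding of a tree as a tuple of heads (0 marks the root) are all assumptions made here. The root itself contributes no arc score in this sketch.

```python
def arc_score(head_vec, dep_vec, w, b):
    """Linear layer applied to the concatenation of head and dependent vectors."""
    features = head_vec + dep_vec                      # list concatenation
    return sum(f * wi for f, wi in zip(features, w)) + b


def tree_score(vectors, heads, w, b):
    """Arc-factored score: the sum of the scores of all arcs in the tree.

    vectors[i - 1] is the vector for word i; heads[i - 1] is the head of
    word i, with 0 marking the root (hypothetical encoding).
    """
    total = 0.0
    for dependent, head in enumerate(heads, start=1):
        if head != 0:
            total += arc_score(vectors[head - 1], vectors[dependent - 1], w, b)
    return total


# Toy two-dimensional word vectors for a three-word sentence.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [1.0, 2.0, 3.0, 4.0]   # one weight per component of the concatenation
b = 0.5
print(tree_score(vectors, (2, 0, 2), w, b))  # 15.0
```

In a neural parser the word vectors would come from an encoder and `w`, `b` would be learned; here they are fixed toy values.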

slide-8
SLIDE 8

Computational complexity

  • Under the arc-factored model, the highest-scoring dependency tree can be found in 𝑂(𝑛³) time (𝑛 = sentence length).

Chu–Liu/Edmonds algorithm; McDonald et al. (2005)

  • Even seemingly minor extensions of the arc-factored model entail intractable parsing.

McDonald and Satta (2007)

  • For some of these extensions, polynomial-time parsing is possible for restricted classes of dependency trees.

slide-9
SLIDE 9

Transition-based dependency parsing

  • We cast parsing as a sequence of local classification problems such that solving these problems builds a dependency tree.
  • In most approaches, the number of classifications required for this is linear in the length of the sentence.

slide-10
SLIDE 10

Transition-based dependency parsing

  • The parser starts in the initial configuration.

empty dependency tree

  • It then calls a classifier, which predicts the transition that the parser should make to move to a next configuration.

extend the partial dependency tree

  • This process is repeated until the parser reaches a terminal configuration.

complete dependency tree
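
The loop described above can be sketched as follows. The slides do not name a transition system, so this sketch assumes the arc-standard system (transitions SHIFT, LEFT-ARC, RIGHT-ARC) without an artificial root node; the function names and the `classifier` callback are hypothetical. A configuration is a triple of stack, buffer and arc set, and the sentence is assumed to have at least one word.

```python
def initial_config(n):
    """Initial configuration: empty stack, all words in the buffer, no arcs."""
    return [], list(range(1, n + 1)), set()


def is_terminal(config):
    """Terminal configuration: empty buffer and a single word (the root) on the stack."""
    stack, buffer, _ = config
    return not buffer and len(stack) == 1


def valid_transitions(config):
    """Transitions that are applicable in the given configuration."""
    stack, buffer, _ = config
    moves = []
    if buffer:
        moves.append("SHIFT")
    if len(stack) >= 2:
        moves += ["LEFT-ARC", "RIGHT-ARC"]
    return moves


def step(config, transition):
    """Apply one transition and return the next configuration."""
    stack, buffer, arcs = list(config[0]), list(config[1]), set(config[2])
    if transition == "SHIFT":
        stack.append(buffer.pop(0))
    elif transition == "LEFT-ARC":      # second-topmost word depends on topmost
        dependent = stack.pop(-2)
        arcs.add((stack[-1], dependent))
    elif transition == "RIGHT-ARC":     # topmost word depends on second-topmost
        dependent = stack.pop()
        arcs.add((stack[-1], dependent))
    else:
        raise ValueError(f"unknown transition: {transition}")
    return stack, buffer, arcs


def parse(n, classifier):
    """Repeatedly ask the classifier for a transition until a terminal configuration."""
    config = initial_config(n)
    while not is_terminal(config):
        config = step(config, classifier(config, valid_transitions(config)))
    return config[2]


# A scripted stand-in for a trained classifier, for "Koller co-founded Coursera".
script = iter(["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"])
print(sorted(parse(3, lambda config, valid: next(script))))  # [(2, 1), (2, 3)]
```

Under these assumptions a sentence of n words takes n SHIFT transitions and n − 1 arc transitions, which matches the linear number of classifications mentioned earlier.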

slide-11
SLIDE 11

Training transition-based dependency parsers

  • To train a transition-based dependency parser, we need a treebank with gold-standard dependency trees.
  • In addition to that, we need an algorithm that tells us the gold-standard transition sequence for a tree in that treebank.
  • Such an algorithm is conventionally called an oracle.
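
As an illustration of what an oracle computes, the following Python sketch derives a gold-standard transition sequence from a gold tree. It is a sketch under assumptions not stated in the slides: it uses the arc-standard transition system (SHIFT, LEFT-ARC, RIGHT-ARC) without an artificial root, it handles only projective trees, and the function name `oracle_transitions` and the head-tuple encoding (0 marks the root) are hypothetical.

```python
def oracle_transitions(heads):
    """Return an arc-standard transition sequence that builds the gold tree.

    heads[i - 1] is the gold head of word i, with 0 marking the root.
    Raises ValueError if no sequence exists (non-projective tree).
    """
    n = len(heads)
    # pending[v]: number of dependents that word v still has to collect.
    pending = [0] * (n + 1)
    for head in heads:
        if head != 0:
            pending[head] += 1

    stack, buffer, sequence = [], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            # Attach a word only once it has collected all of its own dependents.
            if heads[second - 1] == top and pending[second] == 0:
                sequence.append("LEFT-ARC")
                stack.pop(-2)
                pending[top] -= 1
                continue
            if heads[top - 1] == second and pending[top] == 0:
                sequence.append("RIGHT-ARC")
                stack.pop()
                pending[second] -= 1
                continue
        if not buffer:
            raise ValueError("no arc-standard sequence: tree is not projective")
        sequence.append("SHIFT")
        stack.append(buffer.pop(0))
    return sequence


# Gold tree for "Koller co-founded Coursera": word 2 is the root.
print(oracle_transitions((2, 0, 2)))
# ['SHIFT', 'SHIFT', 'LEFT-ARC', 'SHIFT', 'RIGHT-ARC']
```

Each (configuration, transition) pair along such a sequence can then serve as a training example for the classifier.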
slide-12
SLIDE 12

Comparison of the two parsing paradigms

Graph-based parsing

  • slow (in practice, cubic in the length of the sentence)
  • restricted feature models (in practice, arc-factored)
  • features and weights directly defined on target structures

Transition-based parsing

  • fast (quasi-linear in the length of the sentence)
  • rich feature models defined on configurations
  • indirection: features and weights defined on transitions