
Tagging: An Overview Rule-based Disambiguation Example - PowerPoint PPT Presentation



  1. Tagging: An Overview

  2. Rule-based Disambiguation
     • Example after-morphology data (using Penn tagset):

           I     watch   a     fly   .
           NN    NN      DT    NN    .
           PRP   VB      NN    VB
                 VBP           VBP

     • Rules using
       – word forms, from context & current position
       – tags, from context and current position
       – tag sets, from context and current position
       – combinations thereof

     2018/2019 UFAL MFF UK NPFL068/Intro to statistical NLP II/Jan Hajic and Pavel Pecina
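The after-morphology data above can be held as one tag set per token. The following is a minimal sketch, not the course's implementation; the names `sentence` and `all_paths` are made up for illustration. It only shows how many tag sequences the rules have to choose from.

```python
from itertools import product

# After-morphology data from the slide: each token with its candidate tags.
sentence = [
    ("I", {"NN", "PRP"}),
    ("watch", {"NN", "VB", "VBP"}),
    ("a", {"DT", "NN"}),
    ("fly", {"NN", "VB", "VBP"}),
    (".", {"."}),
]

def all_paths(sent):
    """Enumerate every tag sequence licensed by the morphological analysis."""
    tag_lists = [sorted(tags) for _, tags in sent]
    return list(product(*tag_lists))

paths = all_paths(sentence)
print(len(paths))  # 2 * 3 * 2 * 3 * 1 = 36 candidate sequences
```

Disambiguation then amounts to discarding sequences from this set until, ideally, one is left.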

  3. Example Rules

           I     watch   a     fly
           NN    NN      DT    NN
           PRP   VB      NN    VB
                 VBP           VBP

     • If-then style:
       – DT eq,-1,Tag ⇒ NN (implies NN in,0,Set as a condition)
       – PRP eq,-1,Tag and DT eq,+1,Tag ⇒ VBP
       – {DT,NN} sub,0,Set ⇒ DT
       – {VB,VBZ,VBP,VBD,VBG} inc,+1,Tag ⇒ not DT
     • Regular expressions:
       – not (<*,*,DT>,<*,*,not NN>)
       – not (<*,*,PRP>,<*,*,not VBP>,<*,*,DT>)
       – not (<*,{DT,NN} sub,not DT>)
       – not (<*,*,DT>,<*,*,{VB,VBZ,VBP,VBD,VBG}>)
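One possible reading of the regular-expression rules is as forbidden patterns over <word, tag set, tag> triples. The sketch below assumes that reading. In particular it takes "{DT,NN} sub" to mean that {DT,NN} is a subset of the token's tag set, which is an interpretation rather than something the slide states. The names `R1` to `R4` follow the later slides; `forbidden` is a hypothetical helper.

```python
VERBS = {"VB", "VBZ", "VBP", "VBD", "VBG"}

# One predicate per <word, set, tag> triple; "*" in the slide becomes "no test".
R1 = [lambda w, s, t: t == "DT", lambda w, s, t: t != "NN"]
R2 = [lambda w, s, t: t == "PRP", lambda w, s, t: t != "VBP", lambda w, s, t: t == "DT"]
R3 = [lambda w, s, t: {"DT", "NN"} <= s and t != "DT"]  # assumed meaning of "sub"
R4 = [lambda w, s, t: t == "DT", lambda w, s, t: t in VERBS]

def forbidden(rule, window):
    """True if the tagged window, a list of (word, tag_set, tag), matches the pattern."""
    return len(window) == len(rule) and all(p(*tok) for p, tok in zip(rule, window))

A_SET = {"DT", "NN"}
FLY_SET = {"NN", "VB", "VBP"}

print(forbidden(R1, [("a", A_SET, "DT"), ("fly", FLY_SET, "VB")]))  # True: blocked
print(forbidden(R1, [("a", A_SET, "DT"), ("fly", FLY_SET, "NN")]))  # False: allowed
print(forbidden(R3, [("a", A_SET, "NN")]))                          # True: blocked
```

Writing each rule as "not (pattern)" means a rule never proposes a tag; it only rules out tag combinations, which is what makes the rules easy to combine.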

  4. Implementation
     • Finite State Automata
       – parallel (each rule ~ automaton);
         • algorithm: keep all paths for which all automata say yes
       – compile into single FSA (intersection)
     • Algorithm:
       – a version of Viterbi search, but:
         • no probabilities (“categorical” rules)
         • multiple input:
           – keep track of all possible paths
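The parallel variant can be sketched as a left-to-right search in which every rule automaton is advanced on each candidate tag and a path is dropped as soon as any automaton says no. This is an illustrative sketch only: the pattern encoding, `step`, and `disambiguate` are assumptions, and each automaton is simulated by tracking how far its forbidden pattern has matched rather than by an explicitly compiled FSA.

```python
VERBS = {"VB", "VBZ", "VBP", "VBD", "VBG"}

# Forbidden patterns; each element is a predicate over (tag_set, tag).
RULES = [
    [lambda s, t: t == "DT", lambda s, t: t != "NN"],                            # R1
    [lambda s, t: t == "PRP", lambda s, t: t != "VBP", lambda s, t: t == "DT"],  # R2
    [lambda s, t: {"DT", "NN"} <= s and t != "DT"],                              # R3
    [lambda s, t: t == "DT", lambda s, t: t in VERBS],                           # R4
]

def step(rule, active, tag_set, tag):
    """Advance one rule automaton; return its new state, or None if it says no."""
    new_active = set()
    for j in active | {0}:          # a match may also start at the current token
        if rule[j](tag_set, tag):
            if j + 1 == len(rule):  # the whole forbidden pattern has matched
                return None
            new_active.add(j + 1)
    return frozenset(new_active)

def disambiguate(sent):
    """Keep every tag sequence that all automata accept (no probabilities)."""
    start = tuple(frozenset() for _ in RULES)
    frontier = {start: [()]}        # combined automaton state -> partial paths
    for _, tag_set in sent:
        nxt = {}
        for states, paths in frontier.items():
            for tag in sorted(tag_set):
                new_states = []
                for rule, active in zip(RULES, states):
                    st = step(rule, active, tag_set, tag)
                    if st is None:
                        break       # some automaton rejects: drop this extension
                    new_states.append(st)
                else:
                    nxt.setdefault(tuple(new_states), []).extend(
                        p + (tag,) for p in paths)
        frontier = nxt
    return sorted(p for paths in frontier.values() for p in paths)

sentence = [("I", {"NN", "PRP"}), ("watch", {"NN", "VB", "VBP"}),
            ("a", {"DT", "NN"}), ("fly", {"NN", "VB", "VBP"}), (".", {"."})]
result = disambiguate(sentence)
for path in result:
    print(" ".join(path))
```

Under this encoding the four rules fix "a" as DT and "fly" as NN, and the sequence PRP VBP DT NN . is among the survivors; sequences with "I" tagged NN also survive, since none of the four patterns mentions that case. Grouping partial paths by combined automaton state is the Viterbi-like part; a compiled intersection FSA would replace the tuple of states with a single state.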

  5. Example: the FSA
     • R1: not (<*,*,DT>,<*,*,not NN>)
     • R2: not (<*,*,PRP>,<*,*,not VBP>,<*,*,DT>)
     • R3: not (<*,{DT,NN} sub,not DT>)
     • R4: not (<*,*,DT>,<*,*,{VB,VBZ,VBP,VBD,VBG}>)
     • R1 as an automaton [diagram]: states F1, F2, N3; arcs labeled <*,*,DT>, <*,*,not NN>, “anything”, “anything else”
     • R3 as an automaton [diagram]: states F1, N2; arcs labeled <*,{DT,NN} sub,not DT>, “anything”, “anything else”

  6. Applying the FSA

           I     watch   a     fly
           NN    NN      DT    NN
           PRP   VB      NN    VB
                 VBP           VBP

     • R1: not (<*,*,DT>,<*,*,not NN>)
     • R2: not (<*,*,PRP>,<*,*,not VBP>,<*,*,DT>)
     • R3: not (<*,{DT,NN} sub,not DT>)
     • R4: not (<*,*,DT>,<*,*,{VB,VBZ,VBP,VBD,VBG}>)
     • R1 blocks: a/DT fly/{VB,VBP}; remains: a/DT fly/NN, or a/NN fly/{NN,VB,VBP}
     • R2 blocks: I/PRP watch/{NN,VB} a/DT; remains e.g.: I/PRP watch/VBP a/DT, and more
     • R3 blocks: a/NN; remains only: a/DT
     • R4: subsumed by R1!

  7. Applying the FSA (Cont.)

           I     watch   a     fly
           NN    NN      DT    NN
           PRP   VB      NN    VB
                 VBP           VBP

     • Combine:
       – a/DT fly/NN, or a/NN fly/{NN,VB,VBP}
       – I/PRP watch/VBP a/DT
       – a/DT
     • Result:

           I     watch   a     fly   .
           PRP   VBP     DT    NN    .
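The combination step can be read as a set intersection: each rule leaves a set of whole-sentence tag sequences, and only sequences left by every rule survive. The sketch below brute-forces this over all 36 sequences. It is an illustration under assumptions: `blocked`, `remaining`, and `combined` are hypothetical names, and "sub" is taken to mean that {DT,NN} is a subset of the token's tag set.

```python
from itertools import product

VERBS = {"VB", "VBZ", "VBP", "VBD", "VBG"}

sentence = [("I", {"NN", "PRP"}), ("watch", {"NN", "VB", "VBP"}),
            ("a", {"DT", "NN"}), ("fly", {"NN", "VB", "VBP"}), (".", {"."})]

# Forbidden patterns; each element is a predicate over (tag_set, tag).
RULES = {
    "R1": [lambda s, t: t == "DT", lambda s, t: t != "NN"],
    "R2": [lambda s, t: t == "PRP", lambda s, t: t != "VBP", lambda s, t: t == "DT"],
    "R3": [lambda s, t: {"DT", "NN"} <= s and t != "DT"],
    "R4": [lambda s, t: t == "DT", lambda s, t: t in VERBS],
}

def blocked(rule, sent, path):
    """True if the forbidden pattern matches some contiguous stretch of the path."""
    k = len(rule)
    return any(
        all(rule[j](sent[i + j][1], path[i + j]) for j in range(k))
        for i in range(len(path) - k + 1)
    )

all_paths = set(product(*(tags for _, tags in sentence)))
remaining = {name: {p for p in all_paths if not blocked(rule, sentence, p)}
             for name, rule in RULES.items()}
combined = set.intersection(*remaining.values())

for name in RULES:
    print(name, "leaves", len(remaining[name]), "of", len(all_paths))
print("combined:", len(combined))
print("R4 adds nothing beyond R1:", remaining["R1"] <= remaining["R4"])
```

In this sketch every sequence that survives R1 also survives R4, which matches the remark that R4 is covered by R1. The combined set contains PRP VBP DT NN . along with the sequences in which "I" is NN, so the four example rules by themselves do not settle the tag of "I" under this reading.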

  8. Tagging by Parsing
     • Build a parse tree from the multiple input:
       [tree diagram: S, VP and NP nodes over the input]

           I     watch   a     fly
           NN    NN      DT    NN
           PRP   VB      NN    VB
                 VBP           VBP

     • Track down rules: e.g., NP → DT NN: extract (a/DT fly/NN)
     • More difficult than tagging itself; results mixed
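As a rough illustration of tagging by parsing, the sketch below runs a small CKY-style chart parser over the ambiguous input and reads the tags off the derivation of S. Only NP → DT NN comes from the slide; the other two productions, and the names `GRAMMAR` and `parse_tags`, are assumptions made up for this example.

```python
# Toy grammar in binary form: (left child, right child) -> parent.
# Only NP -> DT NN appears on the slide; the other productions are hypothetical.
GRAMMAR = {
    ("PRP", "VP"): "S",
    ("VBP", "NP"): "VP",
    ("DT", "NN"): "NP",
}

sentence = [("I", {"NN", "PRP"}), ("watch", {"NN", "VB", "VBP"}),
            ("a", {"DT", "NN"}), ("fly", {"NN", "VB", "VBP"})]

def parse_tags(sent, goal="S"):
    """Return the tag sequence of one derivation of `goal`, or None if there is none."""
    n = len(sent)
    # chart[i][j] maps a symbol spanning tokens i..j-1 to the tags of one derivation.
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, (_, tags) in enumerate(sent):
        for t in tags:
            chart[i][i + 1][t] = (t,)          # every candidate tag is a leaf
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for left, ltags in chart[i][k].items():
                    for right, rtags in chart[k][j].items():
                        parent = GRAMMAR.get((left, right))
                        if parent is not None and parent not in chart[i][j]:
                            chart[i][j][parent] = ltags + rtags
    return chart[0][n].get(goal)

tags = parse_tags(sentence)
print(tags)  # ('PRP', 'VBP', 'DT', 'NN')
```

With a realistic grammar the chart would hold many competing derivations, and choosing among them is where this approach becomes harder than tagging itself.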

  9. Statistical Methods (Overview) • “Probabilistic”: • HMM – Merialdo and many more (XLT) • Maximum Entropy – DellaPietra et al., Ratnaparkhi, and others • Rule-based: • TBEDL (Transformation Based, Error Driven Learning) – Brill’s tagger • Example-based – Daelemans, Zavrel, others • Feature-based (inflective languages) • Classifier Combination (Brill’s ideas)
