Joint Word Segmentation and POS-Tagging Using a Single Perceptron



SLIDE 1

Joint Word Segmentation and POS-Tagging Using a Single Perceptron

Yue Zhang and Stephen Clark, Oxford University Computing Laboratory, June 5, 2008

Oxford University Computing Laboratory

SLIDE 2

Introduction to Chinese POS-tagging

  • Chinese sentences are written as character sequences
  • I like reading
  • Word segmentation is a necessary step before POS-tagging

Input

  • Ilikereading

Segment: I | like | reading
Tag: I/PN like/V reading/N

  • The traditional approach treats word segmentation and POS-tagging as two separate steps

SLIDE 3

Two observations

  • Segmentation errors propagate to the POS-tagging step

Input

  • Ilikereading

Segment: Ili | ke | reading
Tag: Ili/N ke/V reading/N

  • Information about POS helps to improve segmentation

[Examples (the Chinese characters were lost in extraction): the same character sequence can be segmented differently depending on its POS analysis, e.g. /CD /M /N vs. /CD /JJ /CD vs. /CD /CD /CD /CD /CD]

SLIDE 4

Joint segmentation and tagging

  • The observations lead to the solution of joint segmentation and POS-tagging

Input

  • Ilikereading

Output

  • I/PN like/V reading/N

  • Consider segmentation and POS information simultaneously
  • The most appropriate output is chosen from all possible segmented and tagged outputs

SLIDE 5

Challenges

  • How to evaluate the correctness of outputs – the model
  • How to perform decoding – choose the best from all possible outputs

  • Difficulty in the large combined search space: O(2^(n−1) · T^n). Depending on the feature set, dynamic programming can be inefficient too (which is the case for this paper).

  • How to automatically train parameters in the model

The challenge of training features for segmentation and POS-tagging simultaneously.
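For tiny inputs, the size of the combined search space can be checked by brute-force enumeration. A minimal Python sketch (illustrative, not from the paper) that generates every segmented-and-tagged candidate and compares the count to the O(2^(n−1) · T^n) bound:

```python
from itertools import product

def segmentations(chars):
    """Yield every segmentation of a character string.

    Each of the n-1 gaps between characters either is a word
    boundary or is not, giving 2**(n-1) segmentations in total.
    """
    n = len(chars)
    for gaps in product([False, True], repeat=n - 1):
        words, start = [], 0
        for i, boundary in enumerate(gaps, start=1):
            if boundary:
                words.append(chars[start:i])
                start = i
        words.append(chars[start:])
        yield words

def joint_candidates(chars, tags):
    """Yield every segmented-and-tagged candidate for the input."""
    for words in segmentations(chars):
        for assignment in product(tags, repeat=len(words)):
            yield list(zip(words, assignment))

chars, tags = "abc", ["N", "V"]
candidates = list(joint_candidates(chars, tags))
n, T = len(chars), len(tags)
# 4 segmentations and 18 joint candidates here, within the
# upper bound 2**(n-1) * T**n = 32.
assert len(candidates) <= 2 ** (n - 1) * T ** n
```

The bound is loose because a segmentation with k words admits only T^k taggings, but the exponential growth in n is what makes exact search expensive.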

SLIDE 6

Existing solutions

Ng and Low (2004)

  • The model: maps joint segmentation and POS-tagging to a character tagging problem, assigning each character a combined tag that indicates both its position in a word (s = single, b = begin, e = end) and the POS of that word, e.g. s-PN b-V e-V b-N e-N for the characters of (I) (like) (reading)

  • Decoding: beam search
  • Training: maximum entropy model for sequence labeling
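The character-tagging mapping can be sketched as below, assuming the usual s/b/m/e boundary scheme (the slide's two-character words only exercise b and e). Placeholder ASCII words stand in for the Chinese characters lost in extraction:

```python
def to_char_tags(tagged_words):
    """Map a segmented, POS-tagged sentence to per-character tags.

    Each character receives a combined tag: a boundary marker
    (s=single, b=begin, m=middle, e=end) plus the POS of its word.
    """
    out = []
    for word, pos in tagged_words:
        if len(word) == 1:
            out.append((word, f"s-{pos}"))
        else:
            out.append((word[0], f"b-{pos}"))
            for ch in word[1:-1]:
                out.append((ch, f"m-{pos}"))
            out.append((word[-1], f"e-{pos}"))
    return out

# Placeholder words mirroring the slide's (I)(like)(reading) example:
result = to_char_tags([("A", "PN"), ("BC", "V"), ("DE", "N")])
assert [tag for _, tag in result] == ["s-PN", "b-V", "e-V", "b-N", "e-N"]
```

Under this mapping, a standard sequence labeler over characters performs segmentation and tagging in one pass, which is exactly what makes beam search applicable.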

SLIDE 7

Existing solutions

Shi and Wang (2007)

  • The model:

takes the N-best outputs from the word segmentor and passes them to a separate POS-tagger, ranking candidates by the overall probability score from the segmentor and tagger.

  • Decoding:

A* for word segmentation and dynamic programming for tagging.

  • Training: conditional random field for sequence labeling.

SLIDE 8

Existing solutions

A potential disadvantage of both models above is the restriction on the interaction between segmentation and POS information.

  • For the character-based method, whole-word information is not explicitly associated with POS tags.

  • For the reranking method, interaction is limited to the N-best output list from the word segmentor.

SLIDE 9

Our proposed model

The motivation is to impose no restriction on the interaction between word and POS information during processing.

  • The model: a linear model with both word segmentation and POS-tagging features.

  • Decoding: a multiple-beam search algorithm.
  • Training: the generalized perceptron.

SLIDE 10

The baseline

  • Word segmentor from our previous research (Zhang and Clark, 2007)
  • The perceptron POS-tagger from Collins (2002)

SLIDE 11

The baseline word segmentor

  • Linear model trained by the generalized perceptron
  • Features are extracted from a word bigram context
  • Encompass both word and character information
  • Standard beam search decoder

SLIDE 12

Features from the baseline segmentor

 1  word w
 2  word bigram w1 w2
 3  single-character word w
 4  a word of length l with starting character c
 5  a word of length l with ending character c
 6  space-separated characters c1 and c2
 7  character bigram c1 c2 in any word
 8  the first / last characters c1 / c2 of any word
 9  word w immediately before character c
10  character c immediately before word w
11  the starting characters c1 and c2 of two consecutive words
12  the ending characters c1 and c2 of two consecutive words
13  a word of length l with previous word w
14  a word of length l with next word w
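A few of these templates can be sketched as feature-string extraction over a word bigram. The naming scheme below is illustrative, not the paper's internal encoding:

```python
def bigram_features(w1, w2):
    """Extract a few of the segmentor templates above for word w2
    in the context of previous word w1 (illustrative names only)."""
    feats = [
        f"word={w2}",                       # template 1
        f"bigram={w1}|{w2}",                # template 2
        f"len,start={len(w2)},{w2[0]}",     # template 4
        f"len,end={len(w2)},{w2[-1]}",      # template 5
        f"starts={w1[0]},{w2[0]}",          # template 11
        f"ends={w1[-1]},{w2[-1]}",          # template 12
    ]
    if len(w2) == 1:
        feats.append(f"single_char={w2}")   # template 3
    return feats

assert "bigram=like|reading" in bigram_features("like", "reading")
```

Each candidate's global feature vector is then the sum of such local features over all of its word bigrams.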

SLIDE 13

The baseline POS-tagger

  • Linear model trained by the generalized perceptron
  • Features redefined for Chinese, including tag trigrams
  • Standard beam search decoder

SLIDE 14

Features from the baseline POS-tagger

1, 2, 3  tag t with word w; tag bigram t1 t2; tag trigram t1 t2 t3
 4  tag t followed by word w
 5  word w followed by tag t
 6  word w with tag t and previous character c
 7  word w with tag t and next character c
 8  tag t on single-character word w in character trigram c1 w c2
 9  tag t on a word starting with character c
10  tag t on a word ending with character c
11  tag t on a word containing character c in the middle
12  tag t on a word starting with character c0 and containing character c
13  tag t on a word ending with character c0 and containing character c
14  tag t on a word containing repeated character cc
15  tag t on a word starting with character category g
16  tag t on a word ending with character category g

SLIDE 15

The joint segmentor and POS-tagger

  • Linear model trained by the generalized perceptron
  • Features are the union of baseline segmentor and tagger features
  • Multiple beam search decoder

SLIDE 16

The joint segmentor and POS-tagger

  • Formulation of the joint segmentation and tagging problem

Given an input sentence x, the output F(x) satisfies:

    F(x) = arg max_{y ∈ GEN(x)} Score(y)

  • The model (denoting the global feature vector for y with Φ(y)):

    Score(y) = Φ(y) · w
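With sparse feature vectors, scoring a candidate reduces to a dot product. A minimal sketch (illustrative names, not the paper's code):

```python
from collections import Counter

def score(phi, w):
    """Score(y) = Φ(y) · w with sparse vectors: phi is a Counter of
    feature counts for candidate y, w a dict of feature weights."""
    return sum(w.get(feat, 0.0) * count for feat, count in phi.items())

phi = Counter({"word=like": 1, "tag=V": 2})
w = {"word=like": 0.5, "tag=V": 1.5}
assert score(phi, w) == 3.5  # 0.5*1 + 1.5*2
```

Features absent from w contribute zero, so the weight vector only needs entries for features seen during training.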

SLIDE 17

The joint segmentor and POS-tagger

Inputs: training examples (xi, yi)
Initialization: set w = 0
Algorithm:
  for t = 1..T, i = 1..N:
    calculate zi = arg max_{y ∈ GEN(xi)} Φ(y) · w
    if zi ≠ yi:
      w = w + Φ(yi) − Φ(zi)
Outputs: w
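As a concrete toy instance of the generalized perceptron above, the sketch below trains on tag sequences over a fixed segmentation, with GEN enumerating all candidates exhaustively. The tag set and feature templates are illustrative, not the paper's:

```python
from collections import Counter
from itertools import product

TAGS = ["N", "V"]  # toy tag set

def gen(words):
    """GEN(x): all tagged candidates for a fixed word sequence."""
    return [list(zip(words, t)) for t in product(TAGS, repeat=len(words))]

def phi(candidate):
    """Φ(y): word-tag and tag-bigram feature counts."""
    feats = Counter()
    prev = "<s>"
    for word, tag in candidate:
        feats[f"w={word},t={tag}"] += 1
        feats[f"tt={prev},{tag}"] += 1
        prev = tag
    return feats

def train(examples, iterations=5):
    w = Counter()
    for _ in range(iterations):
        for x, y in examples:
            z = max(gen(x),
                    key=lambda c: sum(w[f] * v for f, v in phi(c).items()))
            if z != y:               # update only on mistakes
                w.update(phi(y))     # w += Φ(yi)
                w.subtract(phi(z))   # w -= Φ(zi)
    return w

examples = [(["dogs", "bark"], [("dogs", "N"), ("bark", "V")])]
w = train(examples)
best = max(gen(["dogs", "bark"]),
           key=lambda c: sum(w[f] * v for f, v in phi(c).items()))
assert best == examples[0][1]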

SLIDE 18

The joint segmentor and POS-tagger

  • Decoding algorithm is one of the biggest challenges.

– Exact inference would be very slow even with dynamic programming
– The standard beam search gave inferior accuracy

  • A multiple beam search decoding algorithm

– An agenda is given to each character in the input sentence, recording the best segmented and POS-tagged candidates ending with that character
– The input sentence is processed incrementally, character by character
– When each character is processed, all possible words ending with that character are considered, each being combined with the partial candidates ending at the previous character to form new partial candidates
– The system returns the best item from the last agenda
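The agenda scheme above can be sketched as follows. The lexicon, tag set, and scoring function are toy placeholders standing in for the paper's feature-based model:

```python
def decode(chars, lexicon, tags, score, beam=4):
    """Multiple-beam search: agendas[i] holds the best partial
    candidates covering chars[:i]; toy stand-in for the paper's
    decoder."""
    agendas = [[] for _ in range(len(chars) + 1)]
    agendas[0] = [[]]                        # empty partial candidate
    for end in range(1, len(chars) + 1):
        items = []
        for start in range(end):
            word = chars[start:end]          # word ending at this char
            if word not in lexicon:
                continue
            for prev in agendas[start]:      # combine with earlier agenda
                for tag in tags:
                    items.append(prev + [(word, tag)])
        items.sort(key=score, reverse=True)
        agendas[end] = items[:beam]          # prune to the beam
    return agendas[-1][0] if agendas[-1] else None

# Toy run: the score prefers candidates with fewer (longer) words.
lexicon = {"a", "b", "ab", "c", "abc"}
best = decode("abc", lexicon, ["T1", "T2"], score=lambda cand: -len(cand))
assert best == [("abc", "T1")]
```

In the real system the score is Φ(y) · w over the union of segmentor and tagger features, so word-level and tag-level information compete within every agenda.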

SLIDES 19–25

The joint segmentor and POS-tagger: decoding example

[Animation over the input characters A B C D E, processed left to right. After A, its agenda holds A/T1 and A/T2. After B: AB/T1, AB/T2, A/T1 B/T1, A/T2 B/T2. After C, candidates such as ABC/T1, ABC/T2, A/T1 BC/T1, A/T2 BC/T2, AB/T1 C/T1 and A/T2 B/T2 C/T1 appear, and so on through D and E, with each agenda pruned to the best candidates in the beam.]
SLIDE 26

Optimization techniques

  • The tag dictionary

– Frequent words
– Closed-set tags

  • The maximum word length record for each tag
  • Only the best is stored among candidates in the same context.
  • All the above information is updated online
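The tag-dictionary idea can be sketched as follows; the frequency threshold and class shape are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

class TagDictionary:
    """For frequently seen words, only tags observed with that word
    are tried during decoding; rare or unseen words get the full
    tag set. Counts are updated online during training."""

    def __init__(self, threshold=2):   # threshold is an assumption
        self.threshold = threshold
        self.word_counts = defaultdict(int)
        self.word_tags = defaultdict(set)

    def update(self, word, tag):
        self.word_counts[word] += 1
        self.word_tags[word].add(tag)

    def candidate_tags(self, word, all_tags):
        if self.word_counts[word] >= self.threshold:
            return self.word_tags[word]
        return set(all_tags)

d = TagDictionary(threshold=2)
for _ in range(3):
    d.update("the-word", "N")
assert d.candidate_tags("the-word", ["N", "V"]) == {"N"}
assert d.candidate_tags("unseen", ["N", "V"]) == {"N", "V"}
```

Pruning candidate tags this way shrinks each agenda's expansion step, which is consistent with the decoding speed-up reported on the next slide.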

SLIDE 27

Experiments

  • The experimental data: Chinese Treebank 4
  • Test set: 10-fold cross-validation on Chinese Treebank 3
  • Development set: the rest of the data, used to determine the number of training iterations, analyse the influence of various factors, and plot the distribution of typical errors

SLIDE 28

The learning curves

[Two plots: F-score against the number of training iterations (1–10); F-score ranges 0.88–0.92 and 0.86–0.90]

SLIDE 29

The learning curves

[Plot: F-score (0.80–0.92) against the number of training iterations (1–10), showing segmentation accuracy and overall accuracy]

SLIDE 30

The influence of the tag dictionary

[Plot: accuracy (0.80–0.92) against the number of training iterations (1–10), comparing segmentation (without tag dictionary), tagging (without tag dictionary), segmentation (with tag dictionary), and overall tagging (with tag dictionary)]

Decoding time: 416 sec vs. 256 sec.

SLIDE 31

Analysis of typical errors in the joint tagging system

Tag    Seg     NN      NR      VV      AD      JJ      CD
NN     20.47   –       0.78    4.80    0.67    2.49    0.04
NR     5.95    3.61    –       0.19    0.04    0.07
VV     12.13   6.51    0.11    –       0.93    0.56    0.04
AD     3.24    0.30    0.71    –       0.33    0.22
JJ     3.09    0.93    0.15    0.26    0.26    –       0.04
CD     1.08    0.04    0.07    –

Segmentation errors: 51.47%

SLIDE 32

Comparison with the baseline

       Baseline               Joint
#      SF     TF     TA       SF     TF     TA
1      96.98  92.91  94.14    97.21  93.46  94.66
2      97.16  93.20  94.34    97.62  93.85  94.79
3      95.02  89.53  91.28    95.94  90.86  92.38
4      95.51  90.84  92.55    95.92  91.60  93.31
5      95.49  90.91  92.57    96.06  91.72  93.25
6      93.50  87.33  89.87    94.56  88.83  91.14
7      94.48  89.44  91.61    95.30  90.51  92.41
8      93.58  88.41  90.93    95.12  90.30  92.32
9      93.92  89.15  91.35    94.79  90.33  92.45
10     96.31  91.58  93.01    96.45  91.96  93.45
Av.    95.20  90.33  92.17    95.90  91.34  93.02
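The Av. row can be checked against the per-fold numbers; here for the segmentation F-scores (SF) of the baseline and joint systems:

```python
# Per-fold segmentation F-scores (SF) copied from the table above.
baseline_sf = [96.98, 97.16, 95.02, 95.51, 95.49,
               93.50, 94.48, 93.58, 93.92, 96.31]
joint_sf = [97.21, 97.62, 95.94, 95.92, 96.06,
            94.56, 95.30, 95.12, 94.79, 96.45]

def mean(xs):
    return sum(xs) / len(xs)

# The means agree with the reported Av. row to rounding precision.
assert abs(mean(baseline_sf) - 95.20) < 0.01
assert abs(mean(joint_sf) - 95.90) < 0.01
```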

SLIDE 33

Comparison with existing models

Model               SF      TF      TA
Baseline+ (Ng)      95.1    –       91.7
Joint+ (Ng)         95.2    –       91.9
Baseline+* (Shi)    95.85   91.67   –
Joint+* (Shi)       96.05   91.86   –
Baseline (ours)     95.20   90.33   92.17
Joint (ours)        95.90   91.34   93.02

+ knowledge about special characters; * knowledge from a semantic net outside CTB.

SLIDE 34

Conclusions and future work

  • Joint word segmentation and POS-tagging using a single linear model
  • The generalized perceptron training algorithm and a multiple-beam decoder
  • It is worth studying the loss from the beam and, with a properly defined range of features, experimenting with exact inference
  • There may be additional features that can improve joint accuracy; exploring such open features is left to future work
