Intro NLP Tools
Sporleder & Rehbein
WS 09/10
Approaches to POS tagging

rule-based
◮ look up words in the lexicon to get a list of potential POS tags
◮ apply hand-written rules to select the best candidate tag

statistical
◮ for a string of words W = w1, w2, w3, ..., wn, find the most probable sequence of POS tags
◮ mostly based on (first or second order) Markov Models
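A minimal sketch of the rule-based approach, purely for illustration: the toy lexicon and the two hand-written rules below are invented for this example and are not taken from any particular tagger.

# Minimal rule-based tagging sketch (illustrative only).
LEXICON = {
    "the": ["DET"],
    "white": ["ADJ", "N"],      # ambiguous: adjective or noun
    "house": ["N", "V"],        # ambiguous: noun or verb ("to house")
}

def tag(words):
    tags = []
    for i, w in enumerate(words):
        candidates = LEXICON.get(w.lower(), ["N"])   # default guess for unknown words
        if len(candidates) == 1:
            tags.append(candidates[0])
            continue
        prev = tags[-1] if tags else None
        nxt = LEXICON.get(words[i + 1].lower(), []) if i + 1 < len(words) else []
        # hand-written rules:
        # (1) between a determiner and a possible noun, prefer the adjective reading
        if prev == "DET" and "ADJ" in candidates and "N" in nxt:
            tags.append("ADJ")
        # (2) after a determiner or adjective, otherwise prefer the noun reading
        elif prev in ("DET", "ADJ") and "N" in candidates:
            tags.append("N")
        else:
            tags.append(candidates[0])
    return list(zip(words, tags))

print(tag(["the", "white", "house"]))
# [('the', 'DET'), ('white', 'ADJ'), ('house', 'N')]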
◮ p(t_n | t_{n-2} t_{n-1}) = F(t_{n-2} t_{n-1} t_n) / F(t_{n-2} t_{n-1})
◮ example: F(the/DET white/ADJ house/N) / F(the/DET white/ADJ)

Problems:
◮ zero probabilities (the trigram might be ungrammatical or just rare)
◮ unreliable counts for rare events
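The relative-frequency estimate above can be read off a tagged corpus directly. The sketch below shows this with a toy list of tag sequences (invented for illustration); unseen tag histories immediately expose the zero-probability problem just mentioned.

# Relative-frequency (MLE) estimate of the trigram tag probability from the slide:
# p(t_n | t_{n-2}, t_{n-1}) = F(t_{n-2} t_{n-1} t_n) / F(t_{n-2} t_{n-1}).
from collections import Counter

def trigram_probs(tag_sequences):
    trigrams, bigrams = Counter(), Counter()
    for tags in tag_sequences:
        padded = ["<s>", "<s>"] + tags            # pad so the first word also has a history
        for i in range(2, len(padded)):
            trigrams[tuple(padded[i - 2:i + 1])] += 1
            bigrams[tuple(padded[i - 2:i])] += 1
    def p(t, t_minus2, t_minus1):
        history = (t_minus2, t_minus1)
        if bigrams[history] == 0:
            return 0.0                            # the zero-probability problem
        return trigrams[(t_minus2, t_minus1, t)] / bigrams[history]
    return p

p = trigram_probs([["DET", "ADJ", "N", "V"], ["DET", "N", "V"]])
print(p("N", "DET", "ADJ"))   # 1.0 on this toy data; unseen histories get 0.0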
◮ a decision tree automatically determines the context size used for estimating the transition probabilities
◮ context: unigrams, bigrams, trigrams as well as negations of them
◮ the probability of an n-gram is determined by following the corresponding path through the decision tree down to a leaf
◮ improves on sparse data, avoids zero frequencies
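An illustrative sketch of such a decision tree over tag contexts (this is not the actual TreeTagger implementation, and the probabilities below are invented): inner nodes test properties of the preceding tags, and each leaf stores a distribution over the current tag, so no observed context receives probability zero.

# Toy decision tree over the two preceding tags; leaves hold p(tag | context).
TREE = {
    "test": lambda t1, t2: t1 == "ADJ",          # test on the previous tag
    "yes": {"leaf": {"N": 0.80, "ADJ": 0.15, "V": 0.05}},
    "no": {
        "test": lambda t1, t2: t2 == "DET",      # test on the tag two positions back
        "yes": {"leaf": {"N": 0.60, "V": 0.40}},
        "no":  {"leaf": {"N": 0.30, "V": 0.30, "DET": 0.20, "ADJ": 0.20}},
    },
}

def p_next_tag(tag, prev_tag, prev_prev_tag, node=TREE):
    """Follow the path of test outcomes down to a leaf and read off p(tag | context)."""
    while "leaf" not in node:
        node = node["yes"] if node["test"](prev_tag, prev_prev_tag) else node["no"]
    return node["leaf"].get(tag, 0.0)

print(p_next_tag("N", "ADJ", "DET"))   # 0.8: the tree only needed the previous tag here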
◮ more extensive treatment of capitalization for unknown words
◮ features for disambiguation of the tense form of verbs
◮ features for disambiguating particles from prepositions and adverbs
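A small sketch of the kind of word-shape features such a tagger might consult for unknown words; the exact feature set is not given on the slide, so the features below are illustrative only.

def unknown_word_features(word, position):
    return {
        "is_capitalized": word[:1].isupper(),
        "sentence_initial": position == 0,          # capitalization is less informative here
        "all_caps": word.isupper(),
        "contains_digit": any(c.isdigit() for c in word),
        "contains_hyphen": "-" in word,
        "suffix3": word[-3:].lower(),
        "suffix2": word[-2:].lower(),
    }

print(unknown_word_features("Rehbein", position=3))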
◮ deleting the correction feature for GIS (Generalised Iterative Scaling)
◮ smoothing of the parameters of the ME model: replacing simple frequency cutoffs with a Gaussian prior on the feature weights
  ⋆ penalises models that have very large positive or negative weights
  ⋆ allows the use of low-frequency features without overfitting
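The Gaussian prior amounts to subtracting a quadratic penalty on the weights from the conditional log-likelihood. Below is a minimal sketch of that objective for a simple conditional maximum-entropy (multiclass logistic) model, with sigma as the prior's standard deviation; the function name and array layout are my own, not taken from any particular tagger.

import numpy as np

def penalized_log_likelihood(weights, features, gold_labels, sigma=1.0):
    """Conditional log-likelihood of a multiclass logistic (ME) model minus the
    Gaussian-prior penalty sum_i w_i^2 / (2 * sigma^2).
    weights: (num_features, num_labels); features: (num_examples, num_features)."""
    gold = np.asarray(gold_labels)
    scores = features @ weights                      # (num_examples, num_labels)
    log_z = np.logaddexp.reduce(scores, axis=1)      # per-example log normaliser
    log_lik = np.sum(scores[np.arange(len(gold)), gold] - log_z)
    penalty = np.sum(weights ** 2) / (2.0 * sigma ** 2)
    return log_lik - penalty

A trainer maximises this objective (with GIS or a gradient-based optimiser); the larger sigma is, the weaker the smoothing.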
◮ conceptual simplicity
◮ each model can be improved separately
◮ effective A* parsing algorithm (enables efficient, exact inference)
◮ PCFG-PA: parent encoding (each nonterminal is annotated with the category of its parent)
◮ PCFG-LING: selective parent splitting, order-2 rule markovisation, and further linguistically motivated category splits
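A minimal sketch of parent encoding: every nonterminal in a treebank tree is relabelled with its parent's category before the PCFG rules are read off. Trees are represented here as (label, [children]) tuples with plain strings as leaves; this is an illustration, not the parser's own data structure.

def parent_annotate(tree, parent_label="TOP"):
    if isinstance(tree, str):                 # a word: leave it unchanged
        return tree
    label, children = tree
    new_label = f"{label}^{parent_label}"     # e.g. NP becomes NP^S
    return (new_label, [parent_annotate(c, label) for c in children])

tree = ("S", [("NP", ["it"]), ("VP", [("V", ["rains"])])])
print(parent_annotate(tree))
# ('S^TOP', [('NP^S', ['it']), ('VP^S', [('V^VP', ['rains'])])])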
◮ DEP-BASIC: generate a dependent conditioned on the head and the direction of attachment
◮ DEP-VAL: condition not only on direction, but also on distance and valence
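A sketch of how the basic dependency distribution can be estimated by relative frequency from (head, dependent, direction) tuples; the toy events below are invented, and DEP-VAL would additionally include distance and valence in the conditioning context (not shown).

from collections import Counter

def estimate_dep_basic(events):
    """DEP-BASIC style: p(dependent | head, direction) by relative frequency."""
    joint, context = Counter(), Counter()
    for head, dep, direction in events:
        joint[(head, direction, dep)] += 1
        context[(head, direction)] += 1
    return lambda dep, head, direction: (
        joint[(head, direction, dep)] / context[(head, direction)]
        if context[(head, direction)] else 0.0
    )

events = [("geben", "Verstärkungen", "left"), ("geben", "nicht", "left"),
          ("wird", "geben", "right")]
p = estimate_dep_basic(events)
print(p("nicht", "geben", "left"))   # 0.5 on this toy data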
[Figure: dependency parse of the German example sentence "Namhafte Verstärkungen hingegen wird es für die nächste Spielzeit nicht geben." ("There will, however, be no notable reinforcements for the next season."), with the dependency labels ATTR, OBJA, ADV, SUBJ, PP, DET, ATTR, PN, ADV, AUX.]
◮ learn an optimally refined grammar for parsing
◮ refine the observed trees with latent variables and learn subcategories
◮ basic nonterminal symbols are alternately split and merged to maximise the likelihood of the training treebank
◮ repeatedly split and re-train the grammar
◮ use Expectation Maximisation (EM) to learn a new grammar whose nonterminals are subsymbols of the previous grammar's nonterminals
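A sketch of the splitting step only: every nonterminal is divided into two subsymbols, each rule is expanded to all subsymbol combinations with its probability shared out evenly, and a little random noise is added so that EM can break the symmetry. The EM re-training and the merge step are not shown, and the toy grammar is invented.

import itertools, random

def split_grammar(rules, nonterminals):
    """rules: dict mapping (lhs, (rhs_symbols...)) -> probability."""
    new_rules = {}
    for (lhs, rhs), prob in rules.items():
        lhs_subs = [f"{lhs}-0", f"{lhs}-1"]
        rhs_subs = [[f"{s}-0", f"{s}-1"] if s in nonterminals else [s] for s in rhs]
        combos = list(itertools.product(*rhs_subs))
        for new_lhs in lhs_subs:
            for new_rhs in combos:
                share = prob / len(combos)                       # share mass evenly
                noise = share * 0.01 * (random.random() - 0.5)   # break symmetry for EM
                new_rules[(new_lhs, new_rhs)] = share + noise
    return new_rules

grammar = {("S", ("NP", "VP")): 1.0, ("NP", ("it",)): 1.0}
split = split_grammar(grammar, nonterminals={"S", "NP", "VP"})
print(len(split))   # 8 split S rules + 2 split NP rules = 10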