N-grams Motivation Simple n-grams Smoothing Backoff
N-grams
L445 / L545
- Dept. of Linguistics, Indiana University
Spring 2017
Morphosyntax
◮ We just finished talking about morphology (cf. words)
◮ Training data: data used to gather prior information
◮ Testing data: data used to test method accuracy
◮ Type: distinct word (e.g., like)
◮ Token: distinct occurrence of a word (e.g., each occurrence of the type like in a text is a separate token)
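The type/token distinction is easy to check in code; a minimal sketch (the example sentence is invented for illustration, not from the lecture):

```python
# Counting types vs. tokens in a toy sentence.
from collections import Counter

text = "we like what we like"
tokens = text.split()   # every occurrence is a token
types = set(tokens)     # each distinct word is a type

counts = Counter(tokens)
print(len(tokens))      # 5 tokens
print(len(types))       # 3 types: we, like, what
print(counts["like"])   # the type "like" occurs as 2 tokens
```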
◮ This is the probability of the sixth word given the preceding five: P(w6|w1, ..., w5)
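A conditional probability like P(w6|w1, ..., w5) can be estimated by relative frequency as C(w1 ... w6) / C(w1 ... w5). A minimal sketch of that estimate (the toy corpus is invented, not from the lecture):

```python
# Relative-frequency estimate: P(w_n | history) = C(history + w_n) / C(history).
corpus = "the cat sat on the mat . the cat ate .".split()

def count_seq(seq):
    """Count occurrences of a word sequence in the corpus."""
    n = len(seq)
    return sum(corpus[i:i + n] == seq for i in range(len(corpus) - n + 1))

# P(sat | the cat) = C(the cat sat) / C(the cat)
p = count_seq(["the", "cat", "sat"]) / count_seq(["the", "cat"])
print(p)  # 0.5: "the cat" occurs twice, once followed by "sat"
```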
◮ We would like to say that over has a higher probability in some contexts than in others
◮ The states in the FSA are words
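Under this view, a bigram model amounts to probabilities on the arcs leaving each word-state. A sketch of that construction (the toy sentences are invented; `<s>` marks the sentence start):

```python
# Build bigram transition probabilities: each word is an FSA state,
# and each arc carries P(next word | current word).
from collections import Counter, defaultdict

sents = [["<s>", "I", "like", "cats"], ["<s>", "I", "like", "dogs"]]

counts = defaultdict(Counter)
for sent in sents:
    for prev, nxt in zip(sent, sent[1:]):
        counts[prev][nxt] += 1

# Normalize the arcs leaving each state into probabilities
transitions = {
    state: {w: c / sum(ctr.values()) for w, c in ctr.items()}
    for state, ctr in counts.items()
}
print(transitions["like"])  # {'cats': 0.5, 'dogs': 0.5}
```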
◮ lickety split is a possible English bigram, but it may not occur in a given training corpus
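The sparse-data problem shows up directly in the counts: under maximum-likelihood estimation, any bigram absent from the training corpus gets probability zero, however plausible it is. A sketch (the toy corpus is invented):

```python
# MLE bigram probabilities assign zero to unseen bigrams.
from collections import Counter

corpus = "split the log and split the bill".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def p_mle(w1, w2):
    # P(w2 | w1) = C(w1 w2) / C(w1); zero if w1 was never seen
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(p_mle("split", "the"))      # 1.0: both occurrences of "split" precede "the"
print(p_mle("lickety", "split"))  # 0.0: a fine English bigram, but unseen here
```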
◮ vs. our general approach this semester of defining what is (and is not) possible in a language
◮ Trigram model trained on Shakespeare represents the language of that corpus, not English in general
◮ Choice of corpus depends upon the purpose
◮ We added one to every type of bigram, so we need to add V (the vocabulary size) to each denominator, so that the probabilities still sum to one
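A sketch of add-one (Laplace) smoothing for bigrams, assuming the standard formulation (C(w1 w2) + 1) / (C(w1) + V); the toy corpus is invented:

```python
# Add-one smoothing: every bigram, seen or not, gets a nonzero count.
from collections import Counter

corpus = "the cat sat on the mat".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(set(corpus))  # vocabulary size (number of types)

def p_laplace(w1, w2):
    # Adding 1 to every bigram count forces V into the denominator
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_laplace("the", "cat"))  # seen bigram: (1 + 1) / (2 + 5)
print(p_laplace("the", "sat"))  # unseen bigram, still nonzero: 1 / 7

# Summing over the whole vocabulary still gives 1 for any history:
total = sum(p_laplace("the", w) for w in set(corpus))
print(round(total, 10))  # 1.0
```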
◮ We won’t have a true probability model anymore
◮ This is why α1 was used in the previous equations, to weight the backed-off estimates so that the result is still a probability distribution
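One way to see why the α weights matter: the sketch below replaces properly computed α values with a fixed constant (0.4, as in so-called "stupid backoff"), which is exactly what leaves us without a true probability model. The corpus and the weight are invented for illustration:

```python
# Backoff with a fixed weight instead of computed alphas:
# scores are useful but no longer sum to one.
from collections import Counter

corpus = "we like cats and we like dogs".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
N = len(corpus)
ALPHA = 0.4  # fixed backoff weight; a real Katz model computes alpha per history

def score(w1, w2):
    # Use the bigram estimate when the bigram was seen,
    # otherwise back off to the weighted unigram estimate.
    if bigrams[(w1, w2)] > 0:
        return bigrams[(w1, w2)] / unigrams[w1]
    return ALPHA * unigrams[w2] / N

print(score("we", "like"))  # 1.0: seen bigram
print(score("cats", "we"))  # backed-off, alpha-weighted unigram score
```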