Natural Language Processing (CSE 490U): Featurized Language Models
Noah Smith
© 2017 University of Washington, nasmith@cs.washington.edu
January 9, 2017
What's wrong with n-grams? Data sparseness: most histories and most words are rarely (if ever) observed in training data.
◮ Each feature φ_k contributes a multiplicative factor e^{w_k} to the probability.
◮ If w_k < 0, then φ_k makes the event less likely, by a factor of 1/e^{w_k}.
◮ If w_k > 0, then φ_k makes the event more likely, by a factor of e^{w_k}.
◮ If w_k = 0, then φ_k has no effect.
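This multiplicative reading of the weights can be made concrete in a few lines of code. Below is a minimal sketch of a log-linear distribution over a tiny event set; the events, features, and weight values are invented for illustration and are not taken from the slides.

    import math

    # Hypothetical event set, purely for illustration.
    EVENTS = ["the", "dog", "barked"]

    def phi(event):
        # Feature vector phi(event); each entry phi_k is binary here.
        return [
            1.0 if event == "the" else 0.0,    # phi_0: event is "the"
            1.0 if len(event) > 3 else 0.0,    # phi_1: event longer than 3 characters
        ]

    w = [0.5, -1.2]  # weights w_k (arbitrary demo values)

    def score(event):
        # exp(w . phi(event)) = product over k of e^{w_k * phi_k(event)}
        return math.exp(sum(wk * fk for wk, fk in zip(w, phi(event))))

    def prob(event):
        # Normalize over the event set to obtain a proper distribution.
        Z = sum(score(e) for e in EVENTS)
        return score(event) / Z

    for e in EVENTS:
        print(e, round(prob(e), 3))

Each active feature multiplies the unnormalized score by e^{w_k}, exactly as the bullets above describe: a negative weight shrinks the score, a positive weight inflates it, and a zero weight leaves it untouched.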
◮ “Feature selection” methods, e.g., ignoring features with very low counts in the training data (see the sketch below).
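A count cutoff is the simplest such method. The sketch below keeps only features that fire at least min_count times; the feature names and the threshold are illustrative assumptions, not details from the slides.

    from collections import Counter

    def select_features(feature_lists, min_count=5):
        # Keep only feature names that occur at least min_count times.
        counts = Counter(f for feats in feature_lists for f in feats)
        return {f for f, c in counts.items() if c >= min_count}

    # Tiny illustrative input: one list of active feature names per event.
    data = [["bigram:the_dog", "suffix:-ed"], ["suffix:-ed"], ["bigram:a_cat"]]
    print(select_features(data, min_count=2))  # -> {'suffix:-ed'}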
◮ Some people would rather not spend their time on it!
◮ Just like for n-gram models! Only even more so, since featurized models typically have many more parameters than n-gram models.
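The standard remedy, consistent with the lasso citation later in the deck, is to penalize large weights during training. A common form of the regularized objective (my reconstruction, writing h for the history and x for the word, not copied from a slide) is:

    \hat{w} = \arg\max_{w} \sum_{i=1}^{N} \log p(x_i \mid h_i; w) \;-\; \lambda \, R(w)

where R(w) is, e.g., the squared L2 norm \|w\|_2^2 or the L1 norm \|w\|_1, and λ > 0 trades off fit to the training data against model complexity.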
notation change
◮ Many have argued that this is a good thing (Tibshirani, 1996); sparse models are smaller and easier to interpret.
◮ Do not confuse it with data sparseness (a problem to be overcome)!
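With the L1 penalty of the lasso (Tibshirani, 1996), the regularized objective from above becomes (again my reconstruction, using the same notation):

    \hat{w} = \arg\max_{w} \sum_{i=1}^{N} \log p(x_i \mid h_i; w) \;-\; \lambda \sum_{k} |w_k|

Because the L1 penalty is non-differentiable at zero, the optimum tends to set many w_k exactly to zero. This is model sparsity (few active features), which is the desirable kind; data sparseness means most events are observed too rarely to estimate well, which is the problem we are trying to solve.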