Natural Language Processing (CSE 490U): Language Models
Noah Smith
© 2017 University of Washington nasmith@cs.washington.edu
January 6–9, 2017
1 / 67
Very Quick Review of Probability
Event space (e.g., X, Y), in this class,
◮ For any x ∈ V†, p(x) ≥ 0
◮ This motivates a stricter constraint than we had before:
◮ For any x ∈ V†, p(x) > 0
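To see why the strict inequality matters, here is a minimal Python sketch, assuming the standard per-word definition of perplexity, 2 raised to the negative average log2-probability per word. The function name `perplexity` and its arguments (`sentence_probs`, `num_words`) are illustrative and do not come from the slides. A single test sentence with p(x) = 0 has log-probability of negative infinity, so the perplexity of the whole test set becomes infinite.

```python
import math


def perplexity(sentence_probs, num_words):
    """Per-word perplexity of a test set (hypothetical helper).

    sentence_probs: model probabilities p(x_i), one per test sentence.
    num_words: total number of words M in the test set.
    """
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    if any(p < 0.0 for p in sentence_probs):
        raise ValueError("probabilities must be non-negative")
    # One zero-probability sentence drives the log-likelihood to -infinity,
    # so the perplexity is infinite no matter how well the rest is modeled.
    if any(p == 0.0 for p in sentence_probs):
        return math.inf
    log_likelihood = sum(math.log2(p) for p in sentence_probs)
    return 2.0 ** (-log_likelihood / num_words)


# A model that is uniform over 4 choices per word assigns a 2-word
# sentence probability 1/16, giving a perplexity of 4.
uniform_ppl = perplexity([1 / 16], num_words=2)

# Adding one sentence the model considers impossible makes it infinite.
broken_ppl = perplexity([1 / 16, 0.0], num_words=4)
```

Under these assumptions, any model that is to be evaluated by perplexity needs p(x) > 0 for every x ∈ V†, which is the stricter constraint stated above.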
◮ Perplexity is only an intermediate measure of performance.
◮ Understanding the models is more important than
assumption
assumption
◮ p(the the the the) ≫
assumption
◮ Otherwise, perplexity calculations break
◮ (But not as bad as
◮ You cannot fairly compare two language models that apply
◮ Initial state s0 ∈ S
◮ Final states F ⊆ S
◮ Initial state s0 ∈ S