Feature selection
LING 572 Advanced Statistical Methods for NLP January 21, 2020
Announcements
HW1: avg 91.2, good job!
Two recurring patterns:
  Q2c: not using second derivatives to show global optimum
  Q4b: HMM trigram tagger
In this lecture, we will use “term” and “feature” interchangeably.
, with comparable performance
2|r| ⋅ (train + test)
measure the “importance” of the terms.
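As a rough illustration of measuring term "importance" and keeping only the best-scoring terms, here is a minimal sketch. It assumes document frequency (the number of documents that contain a term) as the scoring function; the function names `document_frequency` and `select_top_k` and the toy documents are hypothetical, not taken from the lecture, and any other per-term score could be passed in instead.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # set(): each term counts once per document
    return df


def select_top_k(docs, k, score_fn=document_frequency):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = score_fn(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus (hypothetical): each document is a list of tokens.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran", "ran"],
]

selected = select_top_k(docs, k=3)
print(selected)  # ['the', 'cat', 'sat']
```

Passing the scoring function as a parameter keeps the selection step independent of the particular importance measure, so the same ranking code could be reused with a different score.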
a b c d
$\sum_{i=1}^{|C|} \dots \qquad \sum_{i=1}^{|C|} \dots \; c_i$
, IG} > {#avg} >> {MI}
, where is the number of documents that contain .