Distributed Keyword Vector Representation for Document Categorization
Yu-Lun Hsieh, Shih-Hung Liu, Yung-Chun Chang, Wen-Lian Hsu Institute of Information Science, Academia Sinica, Taiwan morphe@iis.sinica.edu.tw
[Figure: network diagram; labels: neighbors, hidden layer, input vector and weights]
[Figure: network diagram; labels: word, vector and weights]
(Le and Mikolov, 2014)
[Figure: document representation; labels: document, keyword vector, mean vector, weighted mean of keyword vectors]
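The "weighted mean of keyword vectors" named on this slide can be illustrated with a minimal sketch. The slides do not show how the keyword weights are computed or how repeated keywords are handled, so the function name `document_vector`, the toy vectors, the toy weights, and the choice to count every occurrence are all assumptions made for illustration.

```python
from typing import Dict, List


def document_vector(keyword_vectors: Dict[str, List[float]],
                    keyword_weights: Dict[str, float],
                    tokens: List[str]) -> List[float]:
    """Return the weighted mean of the vectors of keywords found in `tokens`.

    Assumptions (not taken from the slides): `keyword_vectors` is non-empty,
    each occurrence of a keyword contributes once, and a keyword without an
    explicit weight gets weight 1.0.
    """
    dim = len(next(iter(keyword_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok in keyword_vectors:  # non-keywords are ignored
            w = keyword_weights.get(tok, 1.0)
            vec = keyword_vectors[tok]
            for i in range(dim):
                total[i] += w * vec[i]
            weight_sum += w
    if weight_sum == 0.0:
        # No keyword matched: fall back to the zero vector.
        return total
    return [x / weight_sum for x in total]


# Toy data, hypothetical: two keywords with 2-dimensional vectors.
vectors = {"goal": [1.0, 0.0], "league": [0.0, 1.0]}
weights = {"goal": 3.0, "league": 1.0}
doc_vec = document_vector(vectors, weights, ["the", "league", "goal"])
```

In this toy case the heavier keyword "goal" pulls the document vector toward its own direction. A vector built this way could then serve as the feature input to a classifier.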
documents for learning an SVM classifier
with various settings for the amount of keywords
Topic      NB     VSM    LDA    DM     DBOW   DKV
Sport      67.07  79.13  80.20  90.67  90.74  92.22
Health     40.41  63.65  80.35  86.73  86.67  90.29
Politics   42.86  66.89  67.31  85.41  85.70  86.78
Travel     42.52  66.31  80.37  74.08  74.40  72.01
Education  28.25  41.07  58.01  71.64  71.61  74.54
Average    44.22  63.41  73.25  81.71  81.82  83.17
satisfactory performances
to substantial success
topics into a dense vector, leading to the best overall performance
related to keyword size, however,
amount (~2,000 keywords)
simply adding more keywords would not lead to improvement
[Figure: Keyword size vs. F-score. x-axis: # keywords (200 to 4,000); y-axis: F-score (%), 80 to 86]