Chinese Hypernym-Hyponym Extraction from User Generated Categories
Chengyu Wang, Xiaofeng He
School of Computer Science and Software Engineering, East China Normal University Shanghai, China
– Chinese is-a relations are essential for constructing large-scale Chinese taxonomies and knowledge graphs.
– Such relations are difficult to extract due to the flexibility of Chinese language expression.
– User generated categories are valuable knowledge sources, providing fine-grained candidate hypernyms of entities.
– However, the semantic relations between an entity and its categories are not explicit.
– Example: Barack Obama
  Categories: Political figure, Foreign country, Leader, Person
[Figure: a sample taxonomy fragment linking classes (Person, Political Leader, Entity, Scientist, Country, Developed Country) to entities]
Learning Algorithm
– Key challenge: identify is-a relations from user generated categories
– Handcrafted patterns: high accuracy, low coverage
– Automatically generated patterns: higher coverage, lower accuracy; not suitable for Chinese with its flexible expressions
– Taxonomy construction based on existing knowledge sources
– Chinese is relatively low-resourced
– Infer relations using distributional similarity measures
– Assumption: a hyponym can only appear in the contexts of its hypernym, while a hypernym can appear in all contexts of its hyponyms
– Not suitable for Chinese, whose contexts are flexible and sparse
– Represent words as dense, low-dimensional vectors
– Learn semantic projection models from hyponyms to hypernyms
– State-of-the-art approach for Chinese is-a relation extraction (ACL'14)
Figures taken from Mikolov et al., 2013
– Projection model: $N\vec{w}_{x_i} + \vec{c} = \vec{w}_{y_i}$, where $N$ is the projection matrix, $\vec{w}$ a word vector, and $\vec{c}$ the offset vector
– Partition a collection of is-a relations $S^+ \subset S^*$ into $L$ clusters $(D_1,\cdots,D_l,\cdots,D_L)$
– Each cluster $D_l$ shares a projection matrix $N_l$ and an offset vector $\vec{c}_l$
– Optimization function:
$$K(N_l,\vec{c}_l;D_l)=\frac{1}{|D_l|}\sum_{(x_i,y_i)\in D_l}\left\|N_l\vec{w}_{x_i}+\vec{c}_l-\vec{w}_{y_i}\right\|^2$$
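The per-cluster objective is an ordinary least-squares problem, so each $N_l$ and $\vec{c}_l$ can be fit in closed form. A minimal NumPy sketch (the `fit_projection` helper and the synthetic data are illustrative, not the paper's implementation):

```python
import numpy as np

def fit_projection(X, Y):
    """Fit N and c minimizing (1/|D|) * sum ||N x_i + c - y_i||^2.

    X, Y: (n, d) arrays of hyponym / hypernym word vectors.
    Appending a bias column turns the affine map into one linear solve.
    """
    n, d = X.shape
    X_aug = np.hstack([X, np.ones((n, 1))])        # [x_i ; 1]
    # Solve X_aug @ W ~= Y in the least-squares sense; W is (d+1, d)
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    N = W[:d].T                                    # projection matrix (d, d)
    c = W[d]                                       # offset vector (d,)
    return N, c

# Toy check: recover a known projection from noiseless synthetic vectors
rng = np.random.default_rng(0)
d = 4
N_true = rng.normal(size=(d, d))
c_true = rng.normal(size=d)
X = rng.normal(size=(50, d))
Y = X @ N_true.T + c_true
N_hat, c_hat = fit_projection(X, Y)
```

On noiseless data the fitted parameters match the generating ones, which confirms the solver implements the stated objective.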
– Word pairs: positive is-a set $S^+$, unlabeled set $V$
– Model parameters: $N_l$ and $\vec{c}_l$ for each cluster
1. Sample $\delta|V|$ word pairs from $V$, denoted as $V^{(t)}$.
2. Use the current model to predict the relations between words; denote the "positive" word pairs as $V_P^{(t)}$.
3. Use the pattern-based relation selection method to select a high-confidence subset of $V_P^{(t)}$, denoted as $V_S^{(t)}$.
4. Remove $V_S^{(t)}$ from $V$ and add it to $S^+$.
5. Update cluster centroids incrementally based on $V_S^{(t)}$.
6. Update model parameters based on the new cluster assignments.
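One round of this self-training loop can be sketched as follows; the helper names (`predict_positive`, `select_confident`) and the toy data are hypothetical stand-ins for the model prediction and pattern-based selection steps:

```python
import random

def self_training_round(V, S_pos, predict_positive, select_confident, delta=0.1):
    """One iteration: sample from the unlabeled pool V, predict with the
    current model, keep only high-confidence positives, and move them
    into the labeled positive set S^+."""
    # 1. Sample delta * |V| word pairs from the unlabeled set V
    sample = random.sample(sorted(V), max(1, int(delta * len(V))))
    # 2. Predict with the current projection model
    predicted_pos = [p for p in sample if predict_positive(p)]
    # 3. Pattern-based selection keeps only high-confidence pairs
    confident = select_confident(predicted_pos)
    # 4. Move them from V into the positive set
    V.difference_update(confident)
    S_pos.update(confident)
    return confident

# Toy run with stub predictors (delta=1.0 samples the whole pool)
V = {("apple", "fruit"), ("apple", "red"), ("dog", "animal")}
S_pos = {("cat", "animal")}
is_a = {("apple", "fruit"), ("dog", "animal")}
picked = self_training_round(V, S_pos, lambda p: p in is_a, lambda ps: ps, delta=1.0)
```

Steps 5 and 6 (centroid and parameter updates) would follow each call in a full implementation.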
– Incremental centroid update ($\vec{d}_l^{(t)}$: old centroid; $\vec{d}_l^{(t+1)}$: new centroid; $\mu$: learning rate of the centroid shift; the summand is the distance from the centroid):
$$\vec{d}_l^{(t+1)}=\vec{d}_l^{(t)}+\mu\cdot\frac{1}{|V_l^{(t)}|}\sum_{(x_i,y_i)\in V_l^{(t)}}\left(\vec{w}_{x_i}-\vec{w}_{y_i}-\vec{d}_l^{(t)}\right)$$
– Updated optimization function:
$$K(N_l^{(t)},\vec{c}_l^{(t)};D_l^{(t)})=\frac{1}{|D_l^{(t)}|}\sum_{(x_i,y_i)\in D_l^{(t)}}\left\|N_l^{(t)}\vec{w}_{x_i}+\vec{c}_l^{(t)}-\vec{w}_{y_i}\right\|^2$$
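The centroid update moves each cluster centroid a $\mu$-sized step toward the mean offset vector of the newly added pairs. A minimal sketch (the `update_centroid` helper and the 2-D toy offsets are illustrative):

```python
import numpy as np

def update_centroid(d_old, offsets, mu=0.5):
    """Shift the cluster centroid d_l a step of size mu toward the mean
    offset (w_x - w_y) of the newly selected word pairs."""
    offsets = np.asarray(offsets)
    shift = offsets.mean(axis=0) - d_old   # average distance from the centroid
    return d_old + mu * shift

# Toy example: centroid at the origin, two new offset vectors
d_old = np.array([0.0, 0.0])
new_offsets = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]
d_new = update_centroid(d_old, new_offsets, mu=0.5)  # halfway to the mean [1, 1]
```

With $\mu = 1$ this reduces to replacing the centroid by the new mean; smaller $\mu$ damps the shift across iterations.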
Examples of Chinese Hypernym/Hyponym Patterns:

Category     Example            Translation                  Hypothesis
Is-A         $y_i$是一个$z$      "$y_i$ is a kind of $z$"     $y_i$ is-a $z$ (between $y_i$/$y_j$ and $z$)
Such-As      $z$,例如$y_i$、$y_j$  "$z$, such as $y_i$ and $y_j$"  $y_i$/$y_j$ is-a $z$
Co-Hyponym   $y_i$、$y_j$等       "$y_i$, $y_j$ and others"    $y_i$ not-is-a $y_j$; $y_j$ not-is-a $y_i$ (between $y_i$ and $y_j$)
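These surface patterns can be matched with simple regular expressions. A toy sketch, assuming only the exact forms in the table (a real system would use word segmentation and broader pattern families):

```python
import re

# \w matches CJK characters under Python 3's default Unicode semantics.
IS_A = re.compile(r"(?P<hypo>\w+)是一个(?P<hyper>\w+)")              # "y is a kind of z"
SUCH_AS = re.compile(r"(?P<hyper>\w+),例如(?P<h1>\w+)、(?P<h2>\w+)")  # "z, such as y1 and y2"
CO_HYPONYM = re.compile(r"(?P<h1>\w+)、(?P<h2>\w+)等")                # "y1, y2 and others"

def match_patterns(sentence):
    """Return (category, captured groups) for the first matching pattern."""
    for name, pat in [("Is-A", IS_A), ("Such-As", SUCH_AS), ("Co-Hyponym", CO_HYPONYM)]:
        m = pat.search(sentence)
        if m:
            return name, m.groupdict()
    return None, {}

cat1, g1 = match_patterns("奥巴马是一个政治人物")  # "Obama is a political figure"
cat2, g2 = match_patterns("水果,例如苹果、香蕉")   # "fruit, such as apple and banana"
```

Pattern hits of this kind supply the counts ($o_1$ for positive patterns, $o_2$ for negative ones) used by the selection scores.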
– Positive score (combines the confidence of the model prediction with statistics of "positive" patterns):
$$QT(x_i,y_i)=\beta\left(1-\frac{e^{(t)}(x_i,y_i)}{\max_{(x,y)\in V_P^{(t)}}e^{(t)}(x,y)}\right)+(1-\beta)\cdot\frac{o_1(x_i,y_i)+\delta}{\max_{(x,y)\in V_P^{(t)}}o_1(x,y)+\delta}$$
– Negative score:
$$OT(x_i,y_i)=\log\frac{o_2(x_i,y_i)+\delta}{(o_2(x_i)+\delta)\cdot(o_2(y_i)+\delta)}$$
– Target: select $n$ word pairs from $V_P^{(t)}$ to generate $V_S^{(t)}$:
$$\max\sum_{(x_i,y_i)\in V_S^{(t)}}QT(x_i,y_i)\quad\text{s.t.}\;\sum_{(x_i,y_i)\in V_S^{(t)}}OT(x_i,y_i)<\iota,\quad V_S^{(t)}\subseteq V_P^{(t)},\quad |V_S^{(t)}|=n$$
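A minimal sketch of the two scores plus a greedy stand-in for the constrained selection (the greedy heuristic and the toy candidate triples are illustrative assumptions, not the paper's exact optimization procedure):

```python
import math

def positive_score(err, max_err, o1, max_o1, beta=0.5, delta=1.0):
    """QT(x, y): blend of model confidence (low projection error) and
    normalized positive-pattern counts."""
    return beta * (1 - err / max_err) + (1 - beta) * (o1 + delta) / (max_o1 + delta)

def negative_score(o2_pair, o2_x, o2_y, delta=1.0):
    """OT(x, y): smoothed log-ratio over negative-pattern counts."""
    return math.log((o2_pair + delta) / ((o2_x + delta) * (o2_y + delta)))

def select_pairs(candidates, n, iota):
    """Greedy stand-in: take pairs by descending QT while the running OT
    sum stays below iota and at most n pairs are chosen."""
    chosen, ot_sum = [], 0.0
    for pair, qt, ot in sorted(candidates, key=lambda t: -t[1]):
        if len(chosen) < n and ot_sum + ot < iota:
            chosen.append(pair)
            ot_sum += ot
    return chosen

# Toy candidates: (pair, QT score, OT score)
cands = [(("apple", "fruit"), 0.9, -2.0),
         (("apple", "red"), 0.4, -0.1),
         (("dog", "animal"), 0.8, -1.5)]
picked = select_pairs(cands, n=2, iota=0.0)
```

Here the constraint keeps pairs with strongly negative (i.e. safe) OT scores while the objective prefers high-QT pairs.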
– Text corpus: Baidu Baike contents, 1.088B words
– Train 100-dimensional word vectors using the Skip-gram model
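The Skip-gram model trains on (center, context) word pairs drawn from a sliding window over the corpus. A toy sketch of that pair extraction (the `skipgram_pairs` helper and the four-token example are illustrative; actual training would use a library such as word2vec on the full corpus):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as consumed by skip-gram:
    each word predicts the words within `window` positions of it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

# Tokens from a segmented Chinese sentence ("Obama / is / political / figure")
pairs = skipgram_pairs(["奥巴马", "是", "政治", "人物"], window=1)
```

With window 1 every interior token yields two pairs and each boundary token one, so four tokens produce six pairs.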
– Training set: a subset of is-a relations derived from a Chinese taxonomy
– Unlabeled set: entities and categories from Baidu Baike
– Test set: a publicly available labeled dataset (ACL'14)
[Table: unlabeled set statistics]
– The performance increases at first and then becomes relatively stable.
– A few false positive pairs are still inevitably selected by our approach.
– The performance drops quickly despite the improvement in the first few iterations.
Baselines: Pattern-based, Dictionary-based, Distributional similarity-based, Word embedding-based
* The first author would like to thank COLING 2016 for the student support program.