Chinese Hypernym-Hyponym Extraction from User Generated Categories


SLIDE 1

Chinese Hypernym-Hyponym Extraction from User Generated Categories

Chengyu Wang, Xiaofeng He

School of Computer Science and Software Engineering, East China Normal University Shanghai, China

SLIDE 2

Outline

  • Introduction
  • Background and Related Work
  • Proposed Approach
  • Experiments
  • Conclusion


SLIDE 3

Chinese Is-A Relation Extraction

  • Chinese is-a relation extraction

– Chinese is-a relations are essential for constructing large-scale Chinese taxonomies and knowledge graphs.
– Such relations are difficult to extract due to the flexibility of Chinese language expression.

  • User generated categories

– User generated categories are valuable knowledge sources, providing fine-grained candidate hypernyms of entities.
– However, the semantic relations between an entity and its categories are not explicit.

SLIDE 4

Baidu Baike: one of the largest online encyclopedias in China, with 13M+ entries

Example entry: Barack Obama
Categories: Political figure, Foreign country, Leader, Person

SLIDE 5

Example entry: Barack Obama
Categories: Political figure (is-a), Leader (is-a), Person (is-a), Foreign country (not-is-a)

The task: distinguishing is-a and not-is-a relations between Chinese words/phrases

SLIDE 6

Outline

  • Introduction
  • Background and Related Work
  • Proposed Approach
  • Experiments
  • Conclusion


SLIDE 7

Background

  • Taxonomy: a hierarchical type system for knowledge graphs, consisting of is-a relations among classes and entities

– Example:

[Figure: an example taxonomy rooted at Entity, with classes Person (subclasses: Political Leader, Scientist) and Country (subclass: Developed Country), and entities attached beneath them]

SLIDE 8

Describing the Task

  • Learning is-a relations for taxonomy expansion

[Figure: the example taxonomy (Entity → Person → {Political Leader, Scientist}; Entity → Country → {Developed Country}) before and after expansion by the learning algorithm]

Key challenge: identify is-a relations from user generated categories

SLIDE 9

Modeling the Task

  • Taxonomy

– A directed acyclic graph T = (V, R) (V: entities/classes, R: is-a relations)

  • User generated categories

– A collection of entities E
– The set of user generated categories Cat(e) for each e ∈ E

  • Goal

– Predict whether there is an is-a relation between e and c, where e ∈ E and c ∈ Cat(e), based on the taxonomy T

SLIDE 10

Previous Approaches

  • Pattern matching-based approaches

– Handcrafted patterns: high accuracy, low coverage

  • Hearst patterns: NP1 such as NP2

– Automatically generated patterns: higher coverage, lower accuracy
– Not well suited to Chinese, whose expression is flexible

  • Thesauri and encyclopedia based approaches

– Taxonomy construction based on existing knowledge sources

  • YAGO: Wikipedia + WordNet
  • More precise, but limited in scope by the underlying sources

– Chinese is relatively low-resourced

  • No full Chinese versions of WordNet or Freebase are available
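For illustration, a Hearst-style "such as" matcher over English text might look like the sketch below; the regexes are simplified stand-ins for real noun-phrase chunking, and a Chinese variant would target cues like 例如 or 是一个 instead.

```python
import re

# Simplified "NP, such as A, B and C" matcher: capture the hypernym
# before the cue, then split the hyponym list after it.
PATTERN = re.compile(r"(\w+), such as ([\w,\s]+?)(?:[.;]|$)")

def extract_isa(sentence):
    pairs = []
    for hyper, hypos in PATTERN.findall(sentence):
        for hypo in re.split(r",\s*|\s+and\s+", hypos):
            if hypo:
                pairs.append((hypo, hyper))  # (hyponym, hypernym)
    return pairs

pairs = extract_isa("He visited countries, such as France, Germany and Japan.")
# pairs == [("France", "countries"), ("Germany", "countries"), ("Japan", "countries")]
```

Handcrafting such patterns gives high precision on sentences that match, but the coverage problem is visible immediately: any paraphrase outside the regex is missed.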

SLIDE 11

Previous Approaches

  • Text inference based approaches

– Infer relations using distributional similarity measures

  • Assumption: a hyponym can appear in only some of the contexts of its hypernym, while a hypernym can appear in all contexts of its hyponyms

– Not well suited to Chinese, where contexts are flexible and sparse

  • Word embedding based approaches

– Represent words as dense, low-dimensional vectors
– Learn semantic projection models from hyponyms to hypernyms
– The state-of-the-art approach for Chinese is-a relation extraction (ACL’14)

(Figures taken from Mikolov et al., 2013)

SLIDE 12

Learning from Previous Work

  • Lessons learned from the state of the art

– Use word embeddings to represent words
– Learn relations between hyponyms and hypernyms in the embedding space

  • Basic approaches

– Vector offsets
– Linear projection

(Figures taken from Mikolov et al., 2013)
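The two basic approaches differ in capacity: a single shared offset vector versus a full affine map. A toy 3-dimensional illustration (all vectors are made up for the example):

```python
import numpy as np

w_obama  = np.array([1.0, 0.0, 0.0])   # hypothetical hyponym vector
w_person = np.array([1.0, 1.0, 0.0])   # hypothetical hypernym vector

# Vector offset: predict the hypernym as w(x) + d for one shared offset d.
d = w_person - w_obama
pred_offset = w_obama + d

# Linear projection: predict as M @ w(x) + b. Here a trivial identity M
# reproduces the same prediction, but a learned M can bend the map per input.
M, b = np.eye(3), d
pred_proj = M @ w_obama + b
```

A single offset assumes every is-a pair shares one direction in embedding space; the projection relaxes that, and the piecewise model on the following slides relaxes it further by learning several projections.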

SLIDE 13

Observations

  • Word vector offsets between Chinese is-a pairs

– Multiple linguistic regularities may exist in is-a pairs

  • Different levels of hypernyms
  • Different types of is-a relations (instanceOf vs. subClassOf)
  • Different domains


SLIDE 14

Outline

  • Introduction
  • Background and Related Work
  • Proposed Approach
  • Experiments
  • Conclusion


SLIDE 15

General Framework

  • Initial stage

– Train piecewise linear projection models based on the Chinese taxonomy

  • Iterative learning stage

– Extract new is-a relations and adjust model parameters based on an incremental learning approach
– Use Chinese hypernym/hyponym patterns to prevent “semantic drift” in each iteration

SLIDE 16

Initial Model Training

  • Linear projection model

– Projection model: M·w(x_i) + b = w(y_i), where M is the projection matrix, b the offset vector, and w(·) the word vector of a hyponym x_i or hypernym y_i

  • Piecewise linear projection model

– Partition a collection of training is-a relations S into K clusters (C_1, ⋯, C_k, ⋯, C_K)
– All pairs in a cluster C_k share a projection matrix M_k and an offset vector b_k
– Optimization function:

  J(M_k, b_k; C_k) = (1/|C_k|) · Σ_{(x_i, y_i) ∈ C_k} ‖M_k·w(x_i) + b_k − w(y_i)‖²
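The initial training stage can be sketched as follows. This is a toy illustration with random vectors: plain k-means on embedding offsets and per-cluster least squares stand in for the paper's actual clustering and optimization, and all names are illustrative.

```python
import numpy as np

# Toy sketch: cluster is-a pairs by embedding offsets, then fit one
# linear projection (M_k, b_k) per cluster by least squares.
rng = np.random.default_rng(0)
dim, n_clusters = 10, 2

# Stand-in word vectors and (hyponym, hypernym) training pairs.
embed = {w: rng.standard_normal(dim) for w in range(100)}
train_pairs = [(i, (i + 1) % 100) for i in range(100)]

# Step 1: k-means on the offset vectors between each pair's embeddings.
offsets = np.array([embed[y] - embed[x] for x, y in train_pairs])
centroids = offsets[rng.choice(len(offsets), n_clusters, replace=False)]
for _ in range(10):
    labels = ((offsets[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    for k in range(n_clusters):
        if (labels == k).any():
            centroids[k] = offsets[labels == k].mean(axis=0)

# Step 2: per cluster, minimize ||M_k w(x) + b_k - w(y)||^2; appending a
# constant 1 to w(x) makes b_k the last row of the least-squares solution.
models = {}
for k in range(n_clusters):
    idx = labels == k
    if not idx.any():
        continue
    X = np.array([np.append(embed[x], 1.0) for (x, _), m in zip(train_pairs, idx) if m])
    Y = np.array([embed[y] for (_, y), m in zip(train_pairs, idx) if m])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (dim + 1, dim)
    models[k] = (W[:-1].T, W[-1])              # M_k: (dim, dim), b_k: (dim,)
```

A candidate pair (x, y) would then be scored by the projection error ‖M_k·w(x) + b_k − w(y)‖ under its nearest cluster's model.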

SLIDE 17

Iterative Learning (1)

  • Initialization

– Word pairs: positive is-a set S, unlabeled set U
– Model parameters: M_k and b_k for each cluster

  • Iterative process (t = 1, ⋯, T)

1. Sample δ·|U| word pairs from U, denoted U^(t).
2. Use the model to predict the relation for each sampled pair; denote the “positive” word pairs as U_P^(t).
3. Use the pattern-based relation selection method to select a high-confidence subset of U_P^(t), denoted U_S^(t).
4. Remove U_S^(t) from U and add it to S.

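Steps 1-4 can be sketched as a loop. This is a skeleton only: `predict_is_a` and `pattern_select` are hypothetical placeholders for the projection-model prediction and the pattern-based selection described on the following slides.

```python
import random

random.seed(0)

def predict_is_a(pair):
    """Placeholder for the piecewise projection model's prediction."""
    return pair[0] % 2 == 0

def pattern_select(pairs, n=5):
    """Placeholder for pattern-based high-confidence selection."""
    return set(sorted(pairs)[:n])

S = set()                               # positive is-a set
U = {(i, i + 1) for i in range(100)}    # unlabeled word pairs
delta, T = 0.2, 3

for t in range(T):
    sampled = random.sample(sorted(U), int(delta * len(U)))   # step 1
    positives = {p for p in sampled if predict_is_a(p)}       # step 2
    selected = pattern_select(positives)                      # step 3
    U -= selected                                             # step 4
    S |= selected
    # steps 5-6 (centroid and parameter updates) follow on the next slide
```

Because selected pairs move from U into S, the positive set grows monotonically while each iteration's pattern filter caps how much can drift in at once.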

SLIDE 18

Iterative Learning (2)

  • Iterative process (t = 1, ⋯, T), continued

5. Update cluster centroids incrementally based on U_S^(t):

  c_k^(t+1) = c_k^(t) + μ · (1/|U_k^(t)|) · Σ_{(x_i, y_i) ∈ U_k^(t)} ( (w(x_i) − w(y_i)) − c_k^(t) )

  (c_k^(t): old centroid; c_k^(t+1): new centroid; μ: learning rate of the centroid shift; U_k^(t): newly selected pairs assigned to cluster k; the summand is each pair’s offset distance from the centroid)

6. Update model parameters based on the new cluster assignments:

  J(M_k^(t), b_k^(t); C_k^(t)) = (1/|C_k^(t)|) · Σ_{(x_i, y_i) ∈ C_k^(t)} ‖M_k^(t)·w(x_i) + b_k^(t) − w(y_i)‖²
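The step-5 centroid update is an exponential-moving-average shift of the centroid toward the mean offset of the newly selected pairs. A minimal numeric sketch (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
c_k = rng.standard_normal(5)               # old centroid c_k^(t)
new_offsets = rng.standard_normal((8, 5))  # offset vectors of newly selected pairs
mu = 0.1                                   # learning rate of the centroid shift

# c_k^(t+1) = c_k^(t) + mu * mean(offset - c_k^(t))
c_k_new = c_k + mu * (new_offsets - c_k).mean(axis=0)
```

Since mean(offset − c) = mean(offset) − c, the update is equivalently (1 − μ)·c_k + μ·mean(offsets), i.e. the centroid drifts only a fraction μ per iteration, which keeps a few bad pairs from yanking a cluster away.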

SLIDE 19

Iterative Learning (3)

  • Model prediction

– The predictions of the final piecewise linear projection models
– The transitive closure of existing is-a relations

  • Discussion

– Combination of semantic and lexical extraction of is-a relations

  • Semantic level: word embedding based projection models
  • Lexical level: pattern-based relation selection

– Incremental learning

  • Updates of cluster centroids
  • Updates of model parameters
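The transitive-closure step can be sketched as a fixed-point loop over the is-a edge set (a naive illustration, with names of my own choosing): if x is-a y and y is-a z, then x is-a z.

```python
def transitive_closure(edges):
    """Repeatedly add (x, z) whenever (x, y) and (y, z) are both present."""
    closure = set(edges)
    while True:
        new = {(x, z)
               for x, y1 in closure
               for y2, z in closure
               if y1 == y2 and (x, z) not in closure}
        if not new:
            return closure
        closure |= new

isa = {("orchid", "monocotyledon"), ("monocotyledon", "plant")}
closed = transitive_closure(isa)
# closed now also contains ("orchid", "plant")
```

Real taxonomies would use a linear-time traversal of the DAG instead of this quadratic join, but the fixed point computed is the same.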

SLIDE 20

Pattern-based Relation Selection (1)

  • Two observations

– Positive evidence

  • Is-A patterns
  • Such-As patterns (between x_i/x_j and y)

  Hypothesis: x_i/x_j is-a y

– Negative evidence

  • Such-As patterns (between x_i and x_j)
  • Co-Hyponym patterns

  Hypothesis: x_i not-is-a x_j, and x_j not-is-a x_i

Examples of Chinese Hypernym/Hyponym Patterns:

  Is-A:        x_i是一个y (x_i is a kind of y)
  Such-As:     y，例如x_i、x_j (y, such as x_i and x_j)
  Co-Hyponym:  x_i、x_j等 (x_i, x_j and others)

SLIDE 21

Pattern-based Relation Selection (2)

  • Positive and negative evidence scores

– Positive score (first term: confidence of the model prediction; second term: statistics of “positive” patterns):

  PS(x_i, y_i) = β · (1 − ε^(t)(x_i, y_i) / max_{(x, y) ∈ U_P^(t)} ε^(t)(x, y)) + (1 − β) · (n₁(x_i, y_i) + δ) / (max_{(x, y) ∈ U_P^(t)} n₁(x, y) + δ)

  (ε^(t): projection error under the current model; n₁: count of positive pattern matches)

– Negative score, from counts n₂ of negative pattern matches:

  NS(x_i, y_i) = log( (n₂(x_i, y_i) + δ) / ((n₂(x_i) + δ) · (n₂(y_i) + δ)) )

  • Relation selection via optimization

– Target: select n word pairs from U_P^(t) to generate U_S^(t):

  max Σ_{(x_i, y_i) ∈ U_S^(t)} PS(x_i, y_i)
  s.t. Σ_{(x_i, y_i) ∈ U_S^(t)} NS(x_i, y_i) < ι,  U_S^(t) ⊆ U_P^(t),  |U_S^(t)| = n
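A hedged sketch of the scoring and selection step follows; the greedy constraint handling and all counts below are illustrative simplifications of mine, not the exact optimization procedure.

```python
import math

def positive_score(err, n_pos, max_err, max_pos, beta=0.5, delta=1.0):
    """PS: model confidence (low projection error) plus positive-pattern statistics."""
    return beta * (1 - err / max_err) + (1 - beta) * (n_pos + delta) / (max_pos + delta)

def negative_score(n_neg_pair, n_neg_x, n_neg_y, delta=1.0):
    """NS: PMI-style score built from negative-pattern counts."""
    return math.log((n_neg_pair + delta) / ((n_neg_x + delta) * (n_neg_y + delta)))

def select(candidates, n, iota):
    """Greedily take up to n high-PS pairs while the running NS sum stays below iota.
    candidates: list of (pair, PS, NS)."""
    chosen, ns_sum = [], 0.0
    for pair, ps, ns in sorted(candidates, key=lambda c: -c[1]):
        if len(chosen) == n:
            break
        if ns_sum + ns < iota:
            chosen.append(pair)
            ns_sum += ns
    return chosen

# Toy usage: the "marshal/strategist" pair has strong negative-pattern
# evidence, so the NS budget keeps it out despite a high positive score.
cands = [(("orchid", "plant"), 0.9, -3.0),
         (("marshal", "strategist"), 0.8, 2.0),
         (("obama", "person"), 0.7, -2.5)]
picked = select(cands, n=2, iota=-2.0)
# picked == [("orchid", "plant"), ("obama", "person")]
```

The point of the constraint is visible in the toy run: maximizing PS alone would admit the co-hyponym-looking pair, while the summed-NS budget vetoes it.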

SLIDE 22

Pattern-based Relation Selection (3)

  • Relation selection algorithm

[Figure: the relation selection algorithm]

SLIDE 23

Outline

  • Introduction
  • Background and Related Work
  • Proposed Approach
  • Experiments
  • Conclusion


SLIDE 24

Experimental Data

  • Text corpus

– Text contents from Baidu Baike, 1.088B words
– Train 100-dimensional word vectors using the Skip-gram model

  • Is-a relation sets

– Training: a subset of is-a relations derived from a Chinese taxonomy
– Unlabeled: entities and categories from Baidu Baike
– Testing: a publicly available labeled dataset (ACL’14)

[Table: unlabeled set statistics]

SLIDE 25

Model Performance

  • With pattern-based relation selection

– The performance increases at first and then becomes relatively stable.
– A few false positive pairs are still inevitably selected by our approach.

  • Without pattern-based relation selection

– The performance drops quickly despite improvement in the first few iterations.

SLIDE 26

Comparative Study

  • Comparing with the state of the art

[Table: comparison against pattern-based, dictionary-based, distributional similarity-based, and word embedding based methods]

SLIDE 27

Error Analysis

  • Hard to distinguish related-to vs. is-a relations (approx. 72% of errors)

– False positives:

  • 中药 (Traditional Chinese medicine), 药草 (Herb)
  • 元帅 (Marshal), 军事家 (Strategist)

  • Inaccurate representation learning for fine-grained hypernyms (approx. 28% of errors)

– True positive:

  • 兰科 (Orchid), 植物 (Plant)

– False negative:

  • 兰科 (Orchid), 单子叶植物纲 (Monocotyledon)

SLIDE 28

Outline

  • Introduction
  • Background and Related Work
  • Proposed Approach
  • Experiments
  • Conclusion


SLIDE 29

Conclusion

  • Chinese is-a relation extraction

– Initial model training: word embedding based piecewise linear projection models
– Iterative learning: incremental learning with pattern-based relation selection
– Application: weakly supervised taxonomy expansion

  • Future work

– Learning generalized Chinese pattern representations for relation extraction

SLIDE 30

Thanks!

Questions & Answers

* The first author would like to thank COLING 2016 for the student support program.