SLIDE 1

Supervised word sense disambiguation on polysemy with bidirectional LSTM:
A case study of BUN in Taiwan Hakka

Huei-Ling Lai, Hsiao-Ling Hsu, Jyi-Shane Liu, Chia-Hung Lin and Yanhong Chen
National Chengchi University, Taiwan

Presented at the 21st Chinese Lexical Semantics Workshop (CLSW2020), City University of Hong Kong, Hong Kong, May 28-30, 2020

SLIDE 2

TABLE OF CONTENTS

01 Introduction
02 Related work
   - Polysemous BUN
   - Bidirectional LSTM for WSD task in Taiwan Hakka
03 Methods
   - Overall Architecture
   - Input layers
   - Output layers
04 Experiments
   - Dataset and Evaluation Metrics
   - Results and Analysis
05 Conclusion and Future Work
06 References

SLIDE 3

INTRODUCTION

01

SLIDE 4

• Polysemy is ubiquitous in languages and may cause problems in NLP tasks performed by machines, for instance, part-of-speech (POS) tagging.

• Word sense disambiguation (WSD): Navigli (2009:3) defines WSD as 'the ability to computationally determine which sense of a word is activated by its use in a particular context'.

Introduction

SLIDE 5

• In the extant literature:
  - WSD focuses on a few dominant languages, such as English and Chinese.
  - Findings based on a few dominant languages may lead to narrow applications.
  - Low-resource languages still receive little attention, because most WSD systems rely on supervised learning, which requires a large amount of labeled data that is expensive and time-consuming to produce.

• A language-specific WSD system needs to be implemented for low-resource languages, for instance, Taiwan Hakka.

Introduction

SLIDE 6

• Polysemous phenomena in Taiwan Hakka have been examined by quite a few studies.

• A part-of-speech (POS) tagging system needs to be established for the classification and categorization of the corpus data.

Introduction

SLIDE 7

• Our aims:
  - to schematize a workable coding framework by integrating and modifying the findings of previous studies on Taiwan Hakka polysemous phenomena
  - to develop a model for automatic polysemous word sense disambiguation for a Taiwan Hakka corpus

Introduction

SLIDE 8

Related work

02

SLIDE 9

• Polysemous BUN in Taiwan Hakka

Related work

Table 1. The various usages of BUN.

SLIDE 10

[Table 1 was shown as an image; its contents are not captured in this extraction.]

SLIDE 11

• Bidirectional LSTM for the WSD task in Taiwan Hakka
  - A number of neural network models and algorithms have been proposed to simulate human understanding, using statistical procedures that capture patterns of co-occurrences of words in context.
  - Numerous studies have proposed different neural language models to solve the task of word sense disambiguation, based on contextual hypotheses for words and senses (cf. Li and Jurafsky, 2015; Peters et al., 2018).
  - Yuan et al. (2016) demonstrated state-of-the-art WSD results by employing a supervised and a semi-supervised LSTM neural network model.

Related work

SLIDE 12

• Bidirectional LSTM for the WSD task in Taiwan Hakka
  - To better capture the surrounding information of polysemous BUN, we employ a bidirectional LSTM (Graves & Schmidhuber, 2005; Graves et al., 2013) and train the model on human-annotated labeled data to disambiguate and predict the sense of BUN.
  - In disambiguating the sense of polysemous BUN, the contextual and syntactic information around BUN is crucial and should be taken into account.
  - The basic idea of a Bi-LSTM is to capture past and future information in two hidden states, which are then concatenated to form the final output; a minimal sketch follows below.
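To make the concatenation idea concrete, here is a minimal sketch in PyTorch (the framework is an assumption; the slides do not state one), with illustrative layer sizes rather than the authors' settings:

```python
import torch
import torch.nn as nn

# bidirectional=True runs one LSTM over the context left-to-right (past)
# and another right-to-left (future). Sizes here are illustrative.
bilstm = nn.LSTM(input_size=100, hidden_size=128,
                 bidirectional=True, batch_first=True)

x = torch.randn(2, 7, 100)   # a batch of 2 embedded context windows
output, (h_n, c_n) = bilstm(x)

# Each timestep's output is the concatenation [forward ; backward],
# so the last dimension is 2 * hidden_size.
print(output.shape)          # torch.Size([2, 7, 256])
```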

Related work

SLIDE 13

Methods

03

SLIDE 14

• Overall Architecture

[The overall architecture diagram was shown as an image and is not captured here.]

Methods

SLIDE 15

 Overall Architecture

Methods

Table 2. The number of tokens and types in each dataset

Dataset                                                   Feature                Token    Type
Dataset 1 (manually annotated instances containing BUN)   Word embedding        64,278   5,695
                                                          Character embedding   89,126   2,164
                                                          POS                   64,103      24
Dataset 2 (MOE Read Out Loud Tests)                       Word embedding        68,012   7,322
Dataset 3 (Hans Christian Andersen's fairy tales,         Character embedding  835,534   3,910
  translated into Hakka)
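The token/type distinction in Table 2 is the usual corpus one: tokens are running words, types are distinct words. A trivial sketch over a hypothetical tokenised corpus:

```python
corpus = ["a", "b", "a", "c", "a"]   # hypothetical tokenised corpus
tokens = len(corpus)                 # running words: 5
types = len(set(corpus))            # distinct words: 3
print(tokens, types)
```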

SLIDE 16

Experiments

04

SLIDE 17

• Dataset and Evaluation Metrics

Experiments

Table 3. The occurrences of VA1, VA2, P1, and P2 in Dataset 1

Dataset 1                       Label      Occurrence
Training Set (Training + Dev)   VA1            66   (4%)
                                VA2           753  (46%)
                                P1            238  (15%)
                                P2            576  (35%)
                                Subtotal    1,633 (100%)
Test Set                        VA1             7   (4%)
                                VA2            75  (46%)
                                P1             24  (15%)
                                P2             57  (35%)
                                Subtotal      163 (100%)

Table 4. The number of samples in the training set, dev set and test set

Dataset 1                  Feature               Token    Type
Training (around 80%)      Word embedding       14,410   2,463
                           Character embedding  14,410   1,226
                           POS                  14,410      24
Dev (around 10%)           Word embedding        1,610     672
                           Character embedding   1,610     513
                           POS                   1,610      23
Test (around 10%; fixed)   Word embedding        1,630     652
                           Character embedding   1,630     508
                           POS                   1,630      22
Total (100%)               Word embedding       17,650   2,708
                           Character embedding  17,650   1,283
                           POS                  17,650      24

As Table 3 shows, the test set preserves the label distribution of the training set; a sketch of such a stratified split follows below.
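A minimal sketch, assuming scikit-learn, of a label-stratified split that reproduces the proportions in Table 3; `instances` is a hypothetical stand-in for the annotated BUN contexts:

```python
from sklearn.model_selection import train_test_split

# Label counts recovered from Table 3 (training + test, per label).
labels = ["VA1"] * 73 + ["VA2"] * 828 + ["P1"] * 262 + ["P2"] * 633
instances = list(range(len(labels)))  # stand-ins for the 1,796 BUN contexts

train_x, test_x, train_y, test_y = train_test_split(
    instances, labels,
    test_size=163,      # the fixed test-set size in Table 3
    stratify=labels,    # keep the VA1/VA2/P1/P2 proportions identical
    random_state=0)
```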

SLIDE 18-20

• Results and Analysis

[The results tables and figures on these slides were shown as images and are not captured in this extraction.]

Experiments

SLIDE 21

• Results and Analysis

Experiments

• Larger corpus data is needed.
• Sentence embeddings could be explored (cf. Wang and Chang, 2016).

SLIDE 22

Conclusion and Future Work

05

SLIDE 23

• In this study, we propose a WSD model for the classification of polysemy in Taiwan Hakka, a low-resource language, especially in the field of NLP.

• The proposed model is a supervised bidirectional LSTM trained and tested on a small amount of labeled data.

• Four kinds of input features are utilized:
  - POS only
  - POS + word embeddings
  - POS + character embeddings
  - POS + word embeddings + character embeddings (this combination achieves the best performance; a sketch of such a combined-feature model follows below)
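A hedged sketch of what the best-performing combination could look like, again assuming PyTorch and assuming, for simplicity, that the three feature sequences are aligned one-per-token. The vocabulary sizes follow Table 4; the embedding and hidden dimensions are illustrative assumptions, not the reported hyperparameters:

```python
import torch
import torch.nn as nn

class BunWsdSketch(nn.Module):
    """Bi-LSTM sense classifier over concatenated feature embeddings."""
    def __init__(self, word_vocab=2708, char_vocab=1283, pos_vocab=24,
                 word_dim=100, char_dim=50, pos_dim=16,
                 hidden=128, n_senses=4):        # VA1, VA2, P1, P2
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.pos_emb = nn.Embedding(pos_vocab, pos_dim)
        self.bilstm = nn.LSTM(word_dim + char_dim + pos_dim, hidden,
                              bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_senses)

    def forward(self, words, chars, pos):
        # Concatenate the three embeddings at every context position.
        x = torch.cat([self.word_emb(words), self.char_emb(chars),
                       self.pos_emb(pos)], dim=-1)
        out, _ = self.bilstm(x)
        # Classify from the representation at the final position
        # (a common simplification in sketches like this one).
        return self.classifier(out[:, -1, :])
```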

Conclusion and Future Work

SLIDE 24

• To enhance the robustness and stability of the model, we will design and include other possible parameters to compare and contrast the performance of the experiments.

• To test the model with different window spans (from L1R1 to L10R10) and/or with whole sentences as inputs (see the window-extraction sketch after this list).

• To improve the research design, we will try random selection, without considering the overall distribution of the four labels in the test set, in future experiments.

• To test the model on other polysemous words in Taiwan Hakka.
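The window spans mentioned above (LnRn = n tokens to the left and n to the right of BUN) could be extracted as in this minimal sketch; the tokens and padding symbol are hypothetical placeholders:

```python
def context_window(tokens, target_idx, n, pad="<PAD>"):
    """Return n tokens left and n right of the target, padding at edges."""
    left = tokens[max(0, target_idx - n):target_idx]
    right = tokens[target_idx + 1:target_idx + 1 + n]
    left = [pad] * (n - len(left)) + left
    right = right + [pad] * (n - len(right))
    return left + [tokens[target_idx]] + right

sent = ["t0", "BUN", "t2", "t3", "t4"]   # placeholder tokens
print(context_window(sent, 1, 2))
# ['<PAD>', 't0', 'BUN', 't2', 't3']  -- an L2R2 window around BUN
```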

Conclusion and Future Work

SLIDE 25

References

06

SLIDE 26

1. Chiang, M. H. (2016). The Functions and Origin of Locative TU5 in Hailu Hakka, with a Note on the Origin of the Durative Marker TEN3. Bulletin of Chinese Linguistics, 9(1), 95-120.
2. Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 International Joint Conference on Neural Networks. Montreal, Canada.
3. Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Proceedings of ICASSP-2013 (pp. 6645-6649). IEEE.
4. Huang, H. C. (2014). Semantic Extensions and the Convergence of the Beneficiary Role: A Case Study of BUN and LAU in Hakka. Concentric: Studies in Linguistics, 40(1), 65-94.
5. Iacobacci, I., Pilehvar, M. T., & Navigli, R. (2016). Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 897-907).
6. Lai, H. L. (2001). On Hakka BUN: A case of polygrammaticalization. Language and Linguistics, 2(2), 137-153.

References

SLIDE 27

7. Lai, H. L. (2015). Profiling Hakka BUN1 Causative Constructions. Language and Linguistics, 16(3), 369-395.
8. Li, J., & Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2015).
9. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR.
10. Mikolov, T., Yih, W., & Zweig, G. (2013b). Linguistic regularities in continuous space word representations. In Proceedings of HLT-NAACL (pp. 746-751).
11. Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 1-69.
12. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of NAACL.

References

SLIDE 28
13. Tseng, Y. C. (2012). An Optimality Theoretic Analysis of the Distribution of Hakka Prepositions DI, DO, BUN, LAU, TUNG, ZIONG. Concentric: Studies in Linguistics, 38(2), 171-209.
14. Yuan, D., Richardson, J., Doherty, R., Evans, C., & Altendorf, E. (2016). Semi-supervised word sense disambiguation with neural models. In Proceedings of COLING (pp. 1374-1385).
15. Wang, W., & Chang, B. (2016). Graph-based dependency parsing with bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 2306-2315).

References

SLIDE 29

THANK YOU