
IS HOME WHERE THE WORD VECTORS LEAD? A Corpus-based Diachronic Study of Jia

The 21st Chinese Lexical Semantics Workshop (CLSW 2020). IS HOME WHERE THE WORD VECTORS LEAD? A Corpus-based Diachronic Study of Jia. Pei-Yi Chen and Shu-Kai Hsieh, Graduate Institute of Linguistics, National Taiwan University.


  1. The 21st Chinese Lexical Semantics Workshop (CLSW 2020) • “IS HOME WHERE THE WORD VECTORS LEAD?”: A Corpus-based Diachronic Study of Jia • Pei-Yi Chen and Shu-Kai Hsieh, Graduate Institute of Linguistics, National Taiwan University

  2. THE CONCEPT OF HOME • Corpus-based computational linguistics • Semantically related words to jia • How the concept of home takes shape through the lens of time

  3. THE CONCEPT OF HOME IN LITERATURE • Has been extensively studied in (environmental) psychology, sociology, anthropology, architecture, and other fields of study. [1, 2, 5, 6] • Discussed under specialized topics such as homelessness, journeying, migration, aging, and gender. • Intertwined with the words home, house, dwelling, and family. [1, 2] • “not only of belonging but also of potential alienation when attempts to make home fail or are subverted.” [5]

  4. SEMANTIC CHANGE • Semantic change encompasses changes to the “core meanings of words” or “subtle shifts of cultural associations”. [10] • Applying computational methods to larger sets of words across longer periods of time makes it possible to generalize regularities of semantic change. [9, 10]

  5. TYPES OF SEMANTIC CHANGE (BLOOMFIELD, 1933) A. Narrowing: e.g., ‘skyline’, ‘有事’ B. Widening: e.g., ‘Kleenex’, ‘舒跑’ C. Metaphor: e.g., ‘broadcast’ D. Metonymy: e.g., jaw ‘cheek’ → ‘mandible’ E. Synecdoche (whole-part relation): e.g., capital cities → countries or their governments F. Hyperbole (weaker to stronger): e.g., kill ‘torment’ → ‘slaughter’ G. Meiosis (stronger to weaker): e.g., astound ‘strike with thunder’ → ‘surprise strongly’ H. Degeneration: e.g., knave ‘boy’ → ‘servant’ → ‘deceitful or despicable man’ I. Elevation: e.g., knight ‘boy’ → ‘nobleman’

  6. METHODOLOGY – DISTRIBUTIONAL SEMANTIC APPROACH • Corpus-based / usage-based approach • A word’s collocational pattern • The use of word embeddings to trace semantic change relies on the idea that these changes synchronize with changes in word co-occurrences. [11] • Data-driven methods / language models: • Skip-gram with negative sampling (SGNS) [13, 14] • Singular value decomposition (SVD) [15] • t-Distributed stochastic neighbor embedding (t-SNE) [15]
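The distributional idea on this slide can be illustrated with a minimal sketch. It uses raw co-occurrence counts and cosine similarity rather than the SGNS embeddings named above, and the two tiny "corpora" and their English tokens are hypothetical stand-ins for period-specific texts, not data from the study.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Count, for every token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpora for an earlier and a later period.
early = [["home", "clan", "ancestor"], ["home", "clan", "ritual"]]
late = [["home", "return", "memory"], ["home", "leave", "memory"]]

early_vecs = cooccurrence_vectors(early)
late_vecs = cooccurrence_vectors(late)

# A low similarity means the word keeps different company in the two periods.
similarity = cosine(early_vecs["home"], late_vecs["home"])
```

Because count vectors share their dimensions (the vocabulary) across periods, they can be compared directly; dense embeddings trained separately per period cannot, which is why an alignment step is needed for them.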

  7. DATA COLLECTION

  8. NOTES ON METHODOLOGIES • Character and word embeddings: Character-based methods can sometimes produce better results than word-based ones, especially when the input data are “vulnerable to the presence of out-of-vocabulary (OOV) words.” [18] This does not mean that word segmentation is unnecessary, only that alternatives exist. • Vector alignment: Vector alignment uses Procrustes analysis, based on the code by Hamilton and Heuser available on GitHub. [9] • Dimensionality reduction: A two-dimensional data visualization is plotted using the t-SNE technique. [19, 20]
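The vector alignment step can be sketched as a generic orthogonal Procrustes solution in NumPy. This is an illustration of the technique under stated assumptions, not the cited GitHub code: the function names are hypothetical, the matrices are random stand-ins for embeddings, and both matrices are assumed to hold the same vocabulary in the same row order.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the best orthogonal matrix.

    Solves min_R ||source @ R - target||_F subject to R being orthogonal.
    Rows are word vectors for a shared vocabulary, in the same order.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


def cosine_distance(a, b):
    """1 - cosine similarity; a larger value suggests a larger shift."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical data: the "early" space is the "late" space under an
# arbitrary rotation, mimicking two independently trained models.
rng = np.random.default_rng(0)
late = rng.normal(size=(50, 8))
true_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
early = late @ true_rotation.T

aligned_early, rotation = procrustes_align(early, late)

# After alignment, a word's vectors from the two periods are comparable.
shift_of_first_word = cosine_distance(aligned_early[0], late[0])
```

In this toy case the two spaces differ only by a rotation, so the measured shift is essentially zero; with real period-specific embeddings, the residual distance after alignment is what is read as semantic change.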

  9. DISCUSSIONS – JIA IN PRE-MODERN TIME Adapted from [2]

  10. DISCUSSIONS – JIA IN MODERN TIME

  11. WORD VECTOR VISUALIZATION Fig. 2. Visualization of word vectors from the Qing dynasty and the Sinica Corpus

  12. CONCLUSION A compressed history of Chinese society and the Chinese language • The properties of a physical space and a structured social unit • Less associated with individuated roles such as a wife, but more closely focused on the self, depicting personal memories of leaving and returning home • The meaning conflation of home, house, and family can be explored as different components • Aspects of meaning are encoded in different two-character words in modern times • In the field of corpus and computational linguistics, changes of word choice and the inclusion of more senses allow for a closer look at the texts in snapshots of specific time frames, which resonates with studies in other disciplines.

  13. REFERENCES • 1. Mallett, S.: Understanding home: A critical review of the literature. Sociol. Rev. 52, 62–89 (2004). https://doi.org/10.1111/j.1467-954x.2004.00442.x. • 2. Sixsmith, J.: The meaning of home: An exploratory study of environmental experience. J. Environ. Psychol. 6, 281–298 (1986). https://doi.org/10.1016/S0272-4944(86)80002-0. • 3. Home, https://www.oed.com/view/Entry/87869?rskey=OqFwzy&result=1#contentWrapper, (2020). • 4. Jia, http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi?o=dcbdic&searchid=W00000005502, (2015). • 5. Samanani, F., Lenhard, J.: House and Home, http://www.anthroencyclopedia.com/entry/house-and-home, (2019). • 6. Moore, J.: Placing home in context. J. Environ. Psychol. 20, 207–217 (2000). https://doi.org/10.1006/jevp.2000.0178. • 7. Shen, M.-Y., Fu, C.-C.: Transformation of modern residential design in Taiwan: A case study on public housing projects from 1920s to 1960s. J. Des. 20, 43–62 (2015).

  14. • 8. NMTH: Abode architecture in Taiwan, (2020). • 9. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of semantic change. In: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers. pp. 1489–1501. Association for Computational Linguistics (ACL) (2016). https://doi.org/10.18653/v1/p16-1141. • 10. Kutuzov, A., Øvrelid, L., Szymanski, T., Velldal, E.: Diachronic word embeddings and semantic shifts: a survey. In: Proceedings of COLING 2018 (2018). • 11. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Cultural shift or linguistic drift? Comparing two computational measures of semantic change. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2116–2121. NIH Public Access (2016). • 12. Tahmasebi, N., Borin, L., Jatowt, A.: Survey of computational approaches to lexical semantic change. Comput. Linguist. 1, (2018). • 13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in neural information processing systems. pp. 3111–3119 (2013). • 14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (2013).

  15. • 15. Smilkov, D., Thorat, N., Nicholson, C., Reif, E., Viégas, F.B., Wattenberg, M.: Embedding Projector: Interactive visualization and interpretation of embeddings. In: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain (2016). • 16. Sturgeon, D.: Chinese Text Project: a dynamic digital library of premodern Chinese. Digit. Scholarsh. Humanit. (2018). • 17. Huang, C.-R., Chen, K.-J.C., Chang, L.-P.C., Hsu, H.-L.: An introduction to the Academia Sinica Balanced Corpus of Chinese. In: Proceedings of ROCLING. pp. 81–99 (1995). • 18. Li, X., Meng, Y., Sun, X., Han, Q., Yuan, A., Li, J.: Is word segmentation necessary for deep learning of Chinese representations? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). pp. 3242–3252. Association for Computational Linguistics (ACL) (2019). https://doi.org/10.18653/v1/p19-1314. • 19. Smetanin, S.: Google News and Leo Tolstoy: Visualizing Word2Vec word embeddings using t-SNE, https://towardsdatascience.com/google-news-and-leo-tolstoy-visualizing-word2vec-word-embeddings-with-t-sne-11558d8bd4d. • 20. Van Der Maaten, L., Hinton, G.: Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  16. • 21. Camacho-Collados, J., Pilehvar, M.T.: From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Intell. Res. 63, 743–788 (2018). https://doi.org/10.1613/jair.1.11259. • 22. Pelevina, M., Arefiev, N., Biemann, C., Panchenko, A.: Making Sense of Word Embeddings. In: Proceedings of the 1st Workshop on Representation Learning for NLP. pp. 174–183 (2016). https://doi.org/10.18653/V1/W16-1620. • 23. Jatowt, A., Campos, R., Bhowmick, S.S., Tahmasebi, N., Doucet, A.: Every word has its history: Interactive exploration and visualization of word sense evolution. In: Proceedings of International Conference on Information and Knowledge Management. pp. 1899–1902. Association for Computing Machinery (2018). https://doi.org/10.1145/3269206.3269218.

  17. THANKS FOR LISTENING
