SLIDE 1
The 21st Chinese Lexical Semantics Workshop (CLSW 2020)
IS HOME WHERE THE WORD VECTORS LEAD? A Corpus-based Diachronic Study of Jia
Pei-Yi Chen and Shu-Kai Hsieh
Graduate Institute of Linguistics, National Taiwan University
SLIDE 2
SLIDE 3
THE CONCEPT OF HOME IN LITERATURE
- Has been extensively studied in (environmental) psychology, sociology,
anthropology, architecture, and other fields of study. [1, 2, 5, 6]
- Discussed under specialized topics such as homelessness, journeying,
migration, aging, and gender.
- Intertwined with the words home, house, dwelling, and family. [1, 2]
- “not only of belonging but also of potential alienation when attempts
to make home fail or are subverted.” [5]
SLIDE 4
SEMANTIC CHANGE
- Semantic change encompasses changes to the “core meanings of words” as well as
“subtle shifts of cultural associations.” [10]
- Applying computational methods to larger sets of words across longer
periods of time makes it possible to generalize regularities of semantic
change. [9, 10]
SLIDE 5
TYPES OF SEMANTIC CHANGE (BLOOMFIELD, 1933)
A. Narrowing: e.g., ‘skyline’, ‘有事’
B. Widening: e.g., ‘Kleenex’, ‘舒跑’
C. Metaphor: e.g., ‘broadcast’
D. Metonymy: e.g., jaw ‘cheek’ → ‘mandible’
E. Synecdoche (whole-part relation): e.g., capital cities → countries or their governments
F. Hyperbole (weaker to stronger): e.g., kill ‘torment’ → ‘slaughter’
G. Meiosis (stronger to weaker): e.g., astound ‘strike with thunder’ → ‘surprise strongly’
H. Degeneration: e.g., knave ‘boy’ → ‘servant’ → ‘deceitful or despicable man’
I. Elevation: e.g., knight ‘boy’ → ‘nobleman’
SLIDE 6
METHODOLOGY – DISTRIBUTIONAL SEMANTIC APPROACH
- Corpus-based / Usage-based approach
- A word’s collocational pattern.
- The use of word embeddings to trace semantic change relies on the idea that these changes
synchronize with changes in word co-occurrences. [11]
- Data-driven approach / language models:
- Skip-gram with negative sampling (SGNS) [13, 14]
- Singular value decomposition (SVD) [15]
- t-Distributed stochastic neighbor embedding (t-SNE) [15]
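The distributional idea on this slide can be illustrated with a minimal sketch. It is not the SGNS pipeline used in the study: it relies on raw co-occurrence counts, and the toy English sentences, the window size, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, target in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            vectors.setdefault(target, Counter()).update(left + right)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, pre-tokenized toy corpora for two periods (not the study's data).
early = [["wife", "keeps", "home", "for", "clan"],
         ["clan", "home", "has", "ancestral", "hall"]]
late = [["i", "left", "home", "last", "year"],
        ["memory", "of", "returning", "home", "alone"]]

early_home = cooccurrence_vectors(early)["home"]
late_home = cooccurrence_vectors(late)["home"]

# Count vectors use the vocabulary itself as axes, so periods can be compared directly.
similarity = cosine(early_home, late_home)
print("top early collocates:", early_home.most_common(3))
print("top late collocates:", late_home.most_common(3))
print("cross-period similarity of 'home':", similarity)
```

A low cross-period similarity indicates that the target word keeps different company in the two periods. Dense embeddings such as SGNS are trained separately per period and do not share axes in this way, which is why an alignment step is needed before comparing them.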
SLIDE 7
DATA COLLECTION
SLIDE 8
NOTES ON METHODOLOGIES
- Character and word embeddings: Character-based methods may produce better
results than word-based ones in some cases, especially when the input data are “vulnerable to the presence of out-of-vocabulary (OOV) words.” [18] This does not mean that word segmentation is unnecessary, only that alternatives exist.
- Vector alignment: Vector alignment is based on the Procrustes analysis by
Hamilton and Heuser, available on GitHub. [9]
- Dimensionality reduction: Word vectors are plotted in two dimensions
using the t-SNE technique. [19, 20]
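The Procrustes-based vector alignment noted above can be sketched in a simplified form. This is not the Hamilton and Heuser code: it is restricted to two dimensions and to pure rotations so that the optimal angle has a closed form and no linear algebra library is needed, whereas full-size embeddings would call for an SVD-based solution. The toy vectors, the choice of anchor words, and all function names are assumptions made for illustration.

```python
from math import atan2, cos, sin, sqrt


def fit_rotation(source, target, anchors):
    """Angle of the rotation that maps the source anchor vectors onto the target
    ones with least squared error (2-D orthogonal Procrustes, rotations only)."""
    dot_sum = sum(source[w][0] * target[w][0] + source[w][1] * target[w][1]
                  for w in anchors)
    cross_sum = sum(source[w][0] * target[w][1] - source[w][1] * target[w][0]
                    for w in anchors)
    return atan2(cross_sum, dot_sum)


def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    return (x * cos(theta) - y * sin(theta), x * sin(theta) + y * cos(theta))


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return 1.0 - dot / (sqrt(u[0] ** 2 + u[1] ** 2) * sqrt(v[0] ** 2 + v[1] ** 2))


# Hypothetical 2-D embeddings from two separately trained models.
# The anchors are assumed stable; the new space is simply turned by 90 degrees.
old_space = {"water": (1.0, 0.0), "mountain": (0.0, 1.0), "jia": (1.0, 1.0)}
new_space = {"water": (0.0, 1.0), "mountain": (-1.0, 0.0), "jia": (-1.0, 0.0)}

theta = fit_rotation(old_space, new_space, anchors=["water", "mountain"])
aligned_jia = rotate(old_space["jia"], theta)

raw = cosine_distance(old_space["jia"], new_space["jia"])
aligned = cosine_distance(aligned_jia, new_space["jia"])
print("distance before alignment:", raw)
print("distance after alignment:", aligned)
```

Before alignment the distance mostly reflects the arbitrary orientation of the two spaces. After alignment the remaining distance (about 0.29 in this toy case) is what can be read as change in the target word itself.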
SLIDE 9
DISCUSSIONS – JIA IN PRE-MODERN TIME
Adapted from [2]
SLIDE 10
DISCUSSIONS – JIA IN MODERN TIME
SLIDE 11
WORD VECTOR VISUALIZATION
- Fig. 2. Visualization of word vectors from the Qing dynasty and the Sinica Corpus
SLIDE 12
CONCLUSION
- A compressed history of the Chinese society and the Chinese language
- The properties of a physical space and a structured social unit
- Less associated with individuated roles such as a wife, and more closely focused on
the self, depicting personal memories of leaving home and returning
- The conflated meanings of home, house, and family can be explored as separate
components
- Aspects of meaning are encoded in different two-character words in modern times
- In the field of corpus and computational linguistics, changes of word choice and the
inclusion of more senses allow for a closer look at the texts in snapshots of specific time frames, which resonates with studies in other disciplines.
SLIDE 13
REFERENCES
- 1. Mallett, S.: Understanding home: A critical review of the literature. Sociol. Rev. 52, 62–89 (2004).
https://doi.org/10.1111/j.1467-954x.2004.00442.x.
- 2. Sixsmith, J.: The meaning of home: An exploratory study of environmental experience. J. Environ.
Psychol. 6, 281–298 (1986). https://doi.org/10.1016/S0272-4944(86)80002-0.
- 3. Home, https://www.oed.com/view/Entry/87869?rskey=OqFwzy&result=1#contentWrapper, (2020).
- 4. Jia, http://dict.revised.moe.edu.tw/cgi-bin/cbdic/gsweb.cgi?o=dcbdic&searchid=W00000005502,
(2015).
- 5. Samanani, F., Lenhard, J.: House and Home,
http://www.anthroencyclopedia.com/entry/house-and-home, (2019).
- 6. Moore, J.: Placing home in context. J. Environ. Psychol. 20, 207–217 (2000).
https://doi.org/10.1006/jevp.2000.0178.
- 7. Shen, M.-Y., Fu, C.-C.: Transformation of modern residential design in Taiwan: A case study on
public housing projects from 1920s to 1960s. J. Des. 20, 43–62 (2015).
SLIDE 14
- 8. NMTH: Abode architecture in Taiwan, (2020).
- 9. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Diachronic word embeddings reveal statistical laws of
semantic change. In: 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers. pp. 1489–1501. Association for Computational Linguistics (ACL) (2016). https://doi.org/10.18653/v1/p16-1141.
- 10. Kutuzov, A., Øvrelid, L., Szymanski, T., Velldal, E.: Diachronic word embeddings and semantic
shifts: a survey. In: Proceedings of COLING 2018 (2018).
- 11. Hamilton, W.L., Leskovec, J., Jurafsky, D.: Cultural shift or linguistic drift? Comparing two
computational measures of semantic change. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 2116–2121. NIH Public Access (2016).
- 12. Tahmasebi, N., Borin, L., Jatowt, A.: Survey of computational approaches to lexical semantic
change. Comput. Linguist. 1, (2018).
- 13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed Representations of Words
and Phrases and their Compositionality. In: Advances in neural information processing systems. pp. 3111–3119 (2013).
- 14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space. In: Proceedings of the International Conference on Learning Representations (2013).
SLIDE 15
- 15. Smilkov, D., Thorat, N., Nicholson, C., Reif, E., Viégas, F.B., Wattenberg, M.:
Embedding Projector: Interactive visualization and interpretation of embeddings. In: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain (2016).
- 16. Sturgeon, D.: Chinese Text Project: a dynamic digital library of premodern Chinese. Digit.
Scholarsh. Humanit. (2018).
- 17. Huang, C.-R., Chen, K.-J.C., Chang, L.-P.C., Hsu, H.-L.: An introduction to the Academia Sinica
Balanced Corpus of Chinese. In: Proceedings of ROCLING. pp. 81–99 (1995).
- 18. Li, X., Meng, Y., Sun, X., Han, Q., Yuan, A., Li, J.: Is word segmentation necessary for deep
learning of Chinese representations? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). pp. 3242–3252. Association for Computational Linguistics (ACL) (2019). https://doi.org/10.18653/v1/p19-1314.
- 19. Smetanin, S.: Google News and Leo Tolstoy: Visualizing Word2Vec word embeddings using t-SNE,
https://towardsdatascience.com/google-news-and-leo-tolstoy-visualizing-word2vec-word-embeddings-with-t-sne-11558d8bd4d.
- 20. Van Der Maaten, L., Hinton, G.: Visualizing Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605
(2008).
SLIDE 16
- 21. Camacho-Collados, J., Pilehvar, M.T.: From word to sense embeddings: A survey on vector
representations of meaning. J. Artif. Intell. Res. 63, 743–788 (2018). https://doi.org/10.1613/jair.1.11259.
- 22. Pelevina, M., Arefiev, N., Biemann, C., Panchenko, A.: Making Sense of Word Embeddings. In:
Proceedings of the 1st Workshop on Representation Learning for NLP. pp. 174–183 (2016). https://doi.org/10.18653/V1/W16-1620.
- 23. Jatowt, A., Campos, R., Bhowmick, S.S., Tahmasebi, N., Doucet, A.: Every word has its history:
Interactive exploration and visualization of word sense evolution. In: Proceedings of International Conference on Information and Knowledge Management. pp. 1899–1902. Association for Computing Machinery (2018). https://doi.org/10.1145/3269206.3269218.
SLIDE 17