
Language Technology: R&D: Word Embeddings (Ali Basirat)



  1. Language Technology: R&D: Word Embeddings
     Ali Basirat, Department of Linguistics and Philology, Uppsala University, September 2020

  2. The Word
     • Linguistics: the minimal syntactic unit of language
     • Philosophy: the reflection of meaning in the mind
     • Theology: the nature of God
     • Cognitive science: clusters of perceptual signals
     • Artificial intelligence: a symbol, a vector, a distribution, or a complex algebraic system

  3. The Word: The Journey in AI/CL
     • Importance: why are words important to the AI/CL communities?
     • Use cases: which tasks would benefit from the study of words?
     • Which models has the community examined?
     • What are the active lines of research?

  4. Importance: Intelligent Machines
     • Artificial intelligence: designing machines that simulate human intelligence and think and behave like humans
     • Turing test: an intelligent machine should exhibit behaviour equivalent to that of a human
     • Communication system: natural language is used to communicate with an intelligent machine

  5. Importance: Language and Intelligence
     • Humans use natural languages to communicate their intelligence
     • Natural languages are products of the brain that have evolved gradually over centuries
     • Natural languages can model almost the whole world
     • Language is the jewel in the crown of cognition

  6. Importance: Words of Language
     • Words are the fundamental elements of language
     • Syntax is the study of sentence structure
     • The word is the atomic element of syntax

  7. Use Cases: Examples
     • Information retrieval, search engines, question answering, information extraction
     • Machine translation
     • Text analysis and language study
     • Dialogue systems and chatbots
     • Text summarization, storytellers, computational narrators
     • Speech recognition
     • Optical character recognition
     • Many other applications that deal with human language

  8. The Community
     • Association for Computational Linguistics (ACL):
        ◦ Journals: Computational Linguistics, Transactions of the ACL
        ◦ Conferences: ACL, EACL, NAACL, EMNLP, IJCNLP
     • Association for the Advancement of Artificial Intelligence (AAAI)
     • Other conferences on AI, linguistics, machine learning, and learning representations (e.g., COLING, NIPS, ICLR, and ICML)

  9. Which models are examined? One-hot encoding
     • Words are symbols, independent of each other
     • The relationships between words are modelled separately, within each downstream task
     (Figure: the → [1, 0, 0, ...], a → [0, 1, 0, ...], ..., sun → [0, ..., 1, 0, ...], each fed to a downstream task.)

  10. Which models are examined? One-hot encoding
     • Advantage: easy to implement, with sparse vectors
     • Disadvantages:
        ◦ It does not model the interrelationships between words
        ◦ The target tasks must perform complex feature engineering
        ◦ It says nothing about word properties (not useful for linguistic studies)
        ◦ It has no mechanism for handling out-of-vocabulary words
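The one-hot scheme from the two slides above can be sketched in a few lines of Python. This is a minimal illustration assuming a tiny hypothetical vocabulary; the names `vocab`, `one_hot`, and `dot` are chosen for the example only. It makes two of the listed disadvantages visible: distinct words always have a dot product of 0, so no similarity is encoded, and a word outside the vocabulary has no vector at all.

```python
# Hypothetical toy vocabulary; each word is assigned a unique index.
vocab = ["the", "a", "sun", "moon"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> list[int]:
    """Return the dense one-hot vector for a vocabulary word."""
    if word not in index:
        # One-hot encoding has no built-in way to represent unseen words.
        raise KeyError(f"out-of-vocabulary word: {word!r}")
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def dot(u: list[int], v: list[int]) -> int:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

print(one_hot("sun"))                        # [0, 0, 1, 0]
print(dot(one_hot("sun"), one_hot("moon")))  # 0: distinct words are always orthogonal
```

In a sparse implementation only the index (`index[word]`) would be stored, which is what makes one-hot vectors cheap to build.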

  11. Which models are examined? Word vectors
     • Each word is represented as a vector (a list of real numbers)
     • Vector similarity represents word similarity

  12. Which models are examined? Word vectors
     • A more complex word-embedding learner
     • Simpler feature engineering in the target task
     (Figure: the → (0.1, 0.4, ...), a → (0.2, 0.1, ...), ..., sun → (0.7, 0.4, ...), each fed to a downstream task.)
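The idea that vector similarity represents word similarity is commonly made concrete with cosine similarity. The sketch below assumes hypothetical two-dimensional vectors that reuse the first two components shown in the figure; the numbers are illustrative only and carry no linguistic meaning, and `most_similar` is a helper name invented for the example.

```python
import math

# Hypothetical 2-D word vectors (only the first two components from the figure).
embeddings = {
    "the": [0.1, 0.4],
    "a": [0.2, 0.1],
    "sun": [0.7, 0.4],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def most_similar(word):
    """Return the other vocabulary word whose vector has the highest cosine similarity."""
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine(embeddings[word], embeddings[w]))

print(most_similar("sun"))  # a (with these toy numbers)
```

Unlike one-hot vectors, two different words can now have any similarity between -1 and 1, so the downstream task receives word relationships ready-made instead of having to engineer them.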

  13. Which models are examined? Word vectors
     • Advantages:
        ◦ No data annotation needed
        ◦ Easy to train
        ◦ Linguistically rich: very little feature engineering is needed
     • Disadvantages:
        ◦ They do not encode polysemy or the dynamics of a word's meaning
        ◦ They do not encode certain semantic aspects of words (e.g., whether a noun is countable)
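The claim that word vectors need no data annotation can be illustrated with the simplest count-based approach: represent each word by how often other words appear near it in raw text. This is a minimal sketch, not the method of any particular embedding toolkit; the three-sentence corpus, the window size of 2, and the function names are assumptions for illustration. Real systems use far larger corpora and usually reweight and compress the counts.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical raw, unannotated corpus.
corpus = [
    "we flew to stockholm last summer",
    "we flew to london last winter",
    "they eat bread every morning",
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["stockholm"], vecs["london"]))  # 0.75
print(cosine(vecs["stockholm"], vecs["bread"]))   # 0.0
```

Stockholm and London share three of their four neighbours (flew, to, last), which is the kind of evidence from raw text that word vectors are built on; no labels were needed.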

  14. Which models are examined? Random word vectors
     • Words are associated with random vectors (probability distributions)
     • Each word occupies a region of a high-dimensional space
     • Word similarity is measured by the distance between distributions
     (Figure: the words Stockholm, London, have, can, and eat shown as regions in the space.)
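If each random word vector is taken to be a Gaussian N(μ, σ), as in the summary slide's figure, one common choice of distance between distributions is the Kullback-Leibler (KL) divergence. The sketch assumes diagonal covariance, treats σ as per-dimension standard deviations, and uses made-up parameters for three words from the figure; it is an illustration, not the specific model behind the slides.

```python
import math

# Hypothetical Gaussian word representations: (mean vector, per-dimension standard deviations).
words = {
    "stockholm": ([1.0, 1.0], [0.5, 0.5]),
    "london": ([1.2, 0.9], [0.5, 0.5]),
    "eat": ([-1.0, 0.5], [1.0, 1.0]),
}

def kl_divergence(p, q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Per dimension: 0.5 * (s_p^2 / s_q^2 + (m_q - m_p)^2 / s_q^2 - 1 + ln(s_q^2 / s_p^2)).
    """
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    total = 0.0
    for m_p, s_p, m_q, s_q in zip(mu_p, sd_p, mu_q, sd_q):
        var_p, var_q = s_p ** 2, s_q ** 2
        total += var_p / var_q + (m_q - m_p) ** 2 / var_q - 1.0 + math.log(var_q / var_p)
    return 0.5 * total

print(round(kl_divergence(words["stockholm"], words["london"]), 3))  # 0.1
print(round(kl_divergence(words["stockholm"], words["eat"]), 3))     # 2.761
```

A word with a larger σ covers a larger region of the space. KL divergence is asymmetric, so KL(p || q) and KL(q || p) generally differ; models built this way may symmetrise it or use another similarity function.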

  15. Which models are examined? Random word vectors
     • Advantages:
        ◦ All the advantages of word vectors
        ◦ They encode multiple word senses and model polysemy
        ◦ They allow complex semantic relations to be modelled
     • Disadvantages:
        ◦ Limited to a fixed number of senses per word
        ◦ Not yet studied enough in the literature
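One way a distributional representation can encode several senses is to give every word a fixed number of sense components, for example the means of a mixture of Gaussians, and to compare two words by their best-matching pair of senses. The sketch below is a hypothetical illustration: it assumes K = 2 senses for every word, made-up 2-D sense means, and ignores the variances for brevity. The fixed K is the same limitation noted on the slide.

```python
import math

# Hypothetical sense means, exactly K = 2 per word (e.g., a "finance" and a "river" sense of bank).
senses = {
    "bank": [[0.9, 0.1], [0.1, 0.9]],
    "money": [[1.0, 0.0], [0.8, 0.3]],
    "river": [[0.0, 1.0], [0.2, 0.9]],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def max_sense_similarity(w1, w2):
    """Similarity of two words = highest cosine over all pairs of their sense means."""
    return max(cosine(a, b) for a in senses[w1] for b in senses[w2])

for pair in [("bank", "money"), ("bank", "river"), ("money", "river")]:
    print(pair, round(max_sense_similarity(*pair), 3))
```

With these numbers, bank scores close to 1 against both money and river, while money and river score noticeably lower, because each comparison can pick a different sense of bank.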

  16. Which models are examined? Contextualized word vectors
     • Each occurrence of a word in a context is associated with its own vector
     • Word vectors are generated from the context in which the word appears
     • Word similarity is measured according to the contexts in which words occur
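The core idea, that the same word receives different vectors in different contexts, can be illustrated with a single step of scaled dot-product self-attention, the mechanism named in the summary slide's figure. This is a deliberately minimal sketch: the static input vectors are hypothetical, the query, key, and value projections are taken to be the identity, and there is one layer and one head, whereas real contextual models learn these projections and stack many layers.

```python
import math

# Hypothetical static 2-D input vectors; a real model learns its inputs and projections.
static = {
    "bank": [0.5, 0.5],
    "money": [1.0, 0.0],
    "deposit": [0.9, 0.1],
    "river": [0.0, 1.0],
    "water": [0.1, 0.9],
}

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def contextualize(tokens):
    """One self-attention step with identity query/key/value projections."""
    vecs = [static[t] for t in tokens]
    d = len(vecs[0])
    out = []
    for q in vecs:
        # Scaled dot-product scores of this token against every token in the sentence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vecs]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(d)])
    return out

bank_finance = contextualize(["money", "deposit", "bank"])[2]
bank_river = contextualize(["river", "water", "bank"])[2]
print([round(x, 3) for x in bank_finance])  # [0.8, 0.2]
print([round(x, 3) for x in bank_river])    # [0.2, 0.8]
```

Because every token attends to every other token, one such step computes n² dot products for a sentence of n tokens, which hints at why training contextual models is computationally heavy.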

  17. Which models are examined? Contextualized word vectors
     • Advantages:
        ◦ No data annotation: the vectors are usually trained on large raw corpora
        ◦ Linguistically rich: almost no feature engineering is needed in the target tasks
        ◦ They encode multiple word senses and model polysemy
     • Disadvantages:
        ◦ Training is computationally heavy
        ◦ Not suitable for modelling static properties of words (e.g., grammatical gender)

  18. Which models are examined? Summary
     • Word representation is becoming increasingly important in natural language processing
     • The target tasks become smaller and smaller as word representations improve
     (Figure: four pipelines side by side: one-hot vectors such as [1, 0, 0, ...]; word vectors such as (0.1, 0.4, ...); random word vectors N(μ₁, σ₁), ..., N(μₙ, σₙ); and contextualized vectors from an encoder-attention-decoder model; each feeds a downstream task.)

  19. Research Lines
     • New models and architectures for word embeddings
     • Interpretation of current models
     • Application of word embeddings to new tasks
     • Linguistic study of words, e.g., typology, nominal classification
     • Compositional semantics
     • Surveys of use cases and architectures

  20. Thank You
     Questions?
